From chandan.kr.singh at gmail.com Thu Feb 2 02:26:09 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Thu, 2 Feb 2006 12:56:09 +0530
Subject: [Bioperl-l] Sorry, failure in post on the net, so still via email
In-Reply-To: <001001c62793$bef08f70$93656785@zhur>
References: <001001c62793$bef08f70$93656785@zhur>
Message-ID: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>

Hi
It seems that it's not a proxy problem. I tried today and faced the same
problem. It has been months since my last try, so something might
have changed.
Try reading more on this problem.
I myself will try to do it.
Regards
Chandan

On 2/2/06, Huang Jian wrote:
>
> I tried some "Quick getting started scripts" in bptutorial.
>
> use Bio::Perl;
> $seq = get_sequence('swiss',"ROA1_HUMAN");
> # uses the default database - nr in this case
> $blast_result = blast_sequence($seq);
> write_blast(">roa1.blast",$blast_result);
>
> It returns "Submitted Blast for [ROA1_HUMAN] "
> It does not return me any error after I run the script. However, it does
> not
> return me any result either. The file "roa1.blast" is created but is
> always
> empty.
>
> I found the return is like the code below in function "blast_sequence"
> if( $verbose ) {
> print STDERR "Submitted Blast for [".$seq->id."] ";
> }
> sleep 5;
> ....
> I have tested "( env_proxy => 1 )" ...The problem remains the same...
>
> Help! By the way, could you send me an invitation letter of gmail, I want
> to have a gmail account too... :-)
>
> Best Regards!
> Jian Huang
>
>

From osborne1 at optonline.net Thu Feb 2 17:06:25 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 02 Feb 2006 17:06:25 -0500
Subject: [Bioperl-l] Sorry, failure in post on the net, so still via email
In-Reply-To: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
Message-ID:

Chandan,

I'd be interested in what you find.
This is not a new problem, this same code snippet has been mentioned many times, but for many others, like me, the code always works. Brian O. On 2/2/06 2:26 AM, "CHANDAN SINGH" wrote: > Hi > It seems that its not a proxy problem. I tried today and faced the same > problem. It has been months since my last try and therefore something might > have changed. > Try reading more on this problem. > I myself will try to do it. > Regards > Chandan > > On 2/2/06, Huang Jian wrote: >> >> I tried some "Quick getting started scripts" in bptutorial. >> >> use Bio::Perl; >> $seq = get_sequence('swiss',"ROA1_HUMAN"); >> # uses the default database - nr in this case >> $blast_result = blast_sequence($seq); >> write_blast(">roa1.blast",$blast_result); >> >> It returns "Submitted Blast for [ROA1_HUMAN] " >> It does not return me any error after I run the script. However, it does >> not >> return me any result either. The file "roa1.blast" is created but is >> always >> empty. >> >> I found the return is like the code below in function "blast_sequence" >> if( $verbose ) { >> print STDERR "Submitted Blast for [".$seq->id."] "; >> } >> sleep 5; >> .... >> I have tested "( env_proxy => 1 )" ...The problem remains the same... >> >> Help! By the way, could you send me an invitation letter of gmail, I want >> to have a gmail account too... :-) >> >> Best Regards! >> Jian Huang >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From nagesh.chakka at anu.edu.au Thu Feb 2 20:23:50 2006 From: nagesh.chakka at anu.edu.au (Nagesh Chakka) Date: Fri, 03 Feb 2006 12:23:50 +1100 Subject: [Bioperl-l] RemoteBlast.pm version 1.28 In-Reply-To: <003901c6285e$d1b36670$93656785@zhur> References: <43E28C39.2060308@anu.edu.au> <003901c6285e$d1b36670$93656785@zhur> Message-ID: <43E2B0A6.7000307@anu.edu.au> Hi Huang, Thanks for the message. 
The older version of RemoteBlast.pm decides whether the BLAST results are
ready by checking the size of the temporary file. That condition is no
longer being satisfied, possibly because of changes on the NCBI side. I had
this problem recently and found that the solution was to use the latest
version, which fixes it (it no longer uses the file-size logic) but is not
yet included in the BioPerl package.
Cheers
Nagesh

Huang Jian wrote:

> Dear Nagesh,
>
> I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send
> me. Now it works perfectly!!!
>
> Thank you!!
>
> Huang
>
> ----- Original Message ----- From: "Nagesh Chakka"
>
> To: "Huang Jian" ; "bioperl-l"
>
> Sent: Friday, February 03, 2006 7:48 AM
> Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> via email
>
>
>> Hi Huang,
>> I see that you are submitting a sequence for a remote blast search. Can
>> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If
>> not I have attached it with this email, try to replace it with the old
>> one which has a bug.
>> Let me know if it works.
>> Nagesh
>
>
>

From cjfields at uiuc.edu Fri Feb 3 10:45:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 3 Feb 2006 09:45:23 -0600
Subject: [Bioperl-l] RemoteBlast.pm version 1.28
In-Reply-To: <43E2B0A6.7000307@anu.edu.au>
Message-ID: <001501c628d8$d91cd430$15327e82@pyrimidine>

Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It will
work for saving text output. However, it will not parse anything using
next_result (it will likely hang) and will not save XML format. See these
bugs:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

for explanations and possible fixes (changes to RemoteBlast and
Bio::SearchIO::blast). Note that these haven't been checked in yet so are
still not included in bioperl-live; they may be further modified before
committing to CVS.
If you're not worried about XML, you could just try the
first fix, which is a change to SearchIO::blast.

Nagesh, I remember you posting to the list a month ago using a script which
had problems; the script you used saves the output but doesn't actually
parse it (i.e. you don't use next_result() to go through the data). Is the
version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
parsing the output using "-readmethod => SearchIO" or "-readmethod => blast"
using your version of RemoteBlast and method next_result()? Like below (from
perldoc):

while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                $factory->remove_rid($rid);
            }
            print STDERR "." if ( $v > 0 );
            sleep 5;
        } else {                              # parsing starts here
            my $result = $rc->next_result();  # it should hang here
            # save the output
            my $filename = $result->query_name()."\.out";
            $factory->save_output($filename);
            $factory->remove_rid($rid);
            print "\nQuery Name: ", $result->query_name(), "\n";
            while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0);
                print "\thit name is ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "\t\tscore is ", $hsp->score, "\n";
                }
            }
        }
    }
}

My script hung whenever I used next_result() in any way prior to the fixes.
I want to see how many others are having the same issues with parsing using
the CVS version of bioperl-live.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> Sent: Thursday, February 02, 2006 7:24 PM
> To: Huang Jian; bioperl-l
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>
> Hi Huang,
> Thanks for the message.
The older version of RemoteBlast.pm works on the > logic of checking the temporary file size to determine whether the Blast > results are ready. This condition is not getting satisfied may be due to > some changes brought about by NCBI. I had this problem recently and > figured out that the solution was to use the latest version which has > this problem fixed (does not use file size logic any more) which is not > yet included in the BioPerl package. > Cheers > Nagesh > > Huang Jian wrote: > > > Dear Nagesh, > > > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send > > me. Now it works perfectly!!! > > > > Thank you!! > > > > Huang > > > > ----- Original Message ----- From: "Nagesh Chakka" > > > > To: "Huang Jian" ; "bioperl-l" > > > > Sent: Friday, February 03, 2006 7:48 AM > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still > > via email > > > > > >> Hi Huang, > >> I see that you are submitting a sequence for a remote blast search. Can > >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If > >> not I have attached it with this email, try to replace it with the old > >> one which has a bug. > >> Let me know if it works. > >> Nagesh > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From osborne1 at optonline.net Fri Feb 3 13:05:44 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 03 Feb 2006 13:05:44 -0500 Subject: [Bioperl-l] Documentation in the Bioperl package Message-ID: bioperl-l, The recent work on the Bioperl Wiki moved much of the Bioperl documentation online. 
Since we cannot maintain 2 locations for all of this we'll be
removing a number of files from the package, specifically:

biodatabases.pod
biodesign.pod
bioperl.pod
bioscripts.pod
doc/howto/*
doc/faq/*
FAQ

Rest assured that all of these files have been gone over in detail to make
sure that no important information was lost during the migration. All of
this will be replaced by a single file, such as "README.docs", that explains
where all the documentation is. It's not entirely clear what will happen to
bptutorial.pl. Moving its content to different online locations is possible
but in this case we lose its functionality as a script.

Are there any comments or questions or concerns?

Brian O.

From saldroubi at yahoo.com Fri Feb 3 13:38:26 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Fri, 3 Feb 2006 10:38:26 -0800 (PST)
Subject: [Bioperl-l] Gibbs sampling algorithm?
Message-ID: <20060203183826.41696.qmail@web34306.mail.mud.yahoo.com>

Hi everyone,

I am wondering if anyone has implemented the Gibbs sampling algorithm for
finding motifs, in BioPerl or otherwise. I saw
Bio::Tools::Run::PiseApplication::gibbs, which I believe calls a Gibbs
program that is not free open source, I think. I would prefer not to write
my own Gibbs sampling algorithm if one is already out there. Any comments
are appreciated.

Thank you

Sincerely,
Sam Al-Droubi, M.S.
saldroubi at yahoo.com

From cjfields at uiuc.edu Fri Feb 3 14:34:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 3 Feb 2006 13:34:27 -0600
Subject: [Bioperl-l] Gibbs sampling algorithm?
In-Reply-To: <20060203183826.41696.qmail@web34306.mail.mud.yahoo.com>
Message-ID: <001901c628f8$d89917b0$15327e82@pyrimidine>

Do you mean this Gibbs program?

ftp://ncbi.nlm.nih.gov/pub/neuwald/

You can also request a license from the Gibbs Motif Sampler homepage, which
is more up to date: http://bayesweb.wadsworth.org/gibbs/gibbs.html.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sam Al-Droubi
> Sent: Friday, February 03, 2006 12:38 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Gibbs sampling algorithm?
>
> Hi everyone,
>
> I am wondering if anyone has implemented the Gibbs sampling algorithm in
> BioPerl or otherwise for finding motifs. I saw
> Bio::Tools::Run::PiseApplication::gibbs which I believe calls a Gibbs
> program which is not free open source, I think. I prefer not to write my
> own Gibbs sampling algorithm if it is already out there. Any comments are
> appreciated.
>
> Thank you
>
> Sincerely,
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From gyang at plantbio.uga.edu Fri Feb 3 14:44:50 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Fri, 03 Feb 2006 14:44:50 -0500
Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
In-Reply-To: <001501c628d8$d91cd430$15327e82@pyrimidine>
Message-ID: <20060203194450.792e8d4e@dogwood.plantbio.uga.edu>

Hi, Everybody,
I see this post and am wondering if this is the reason for the
malfunctioning of my webserver. We set up a webserver named MAK, for MITE
sequence analysis. It was working very well until around November 2005,
when it stopped returning any results (the site is up and seems to be doing
something after submission). In the CGI script, I used RemoteBlast (that
work was done in 2003) to do searches. I currently do not have access to
the server because I moved. Quite a few people have sent us emails about
its malfunctioning. Are there any suggestions for fixing the problem? Should
I simply ask that RemoteBlast.pm be replaced with the new version?
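
A minimal sketch of the symptom described in this thread (a job that "seems
to be doing something" but never returns): a polling loop with no upper bound
waits forever if its readiness test can never succeed, whereas a bounded loop
can report a timeout. Everything here is an assumption made for illustration:
poll_until_ready and the $check coderef are hypothetical names, not part of
RemoteBlast.pm, and the module's real readiness test is not reproduced.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): call $check until it returns
# a defined value or until $max_tries polls have been made.
# $check stands in for whatever "is the result ready?" test is in use.
sub poll_until_ready {
    my ($check, $max_tries, $delay) = @_;
    for my $try ( 1 .. $max_tries ) {
        my $result = $check->($try);
        return $result if defined $result;
        # wait between polls, but not after the last one
        sleep $delay if $delay && $try < $max_tries;
    }
    return;    # gave up: the caller can report a timeout instead of hanging
}

# Simulated job that becomes ready on the third poll.
my $ready = poll_until_ready( sub { $_[0] >= 3 ? "done" : undef }, 10, 0 );

# Simulated job whose readiness test never succeeds.
my $timeout = poll_until_ready( sub { undef }, 5, 0 );

print defined $ready   ? "got: $ready\n"   : "timed out\n";
print defined $timeout ? "got: $timeout\n" : "timed out\n";
```

In a CGI setting the cap matters more than the exact numbers: a request that
gives up after a fixed number of polls can show an error page, which is
easier to diagnose than a page that never finishes loading.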
Thanks a lot, Guojun Department of Plant Biology University of Georgia Tel: 706-542-1857 Fax: 706-542-1805 http://www.arches.uga.edu/~guojun _____ From: Chris Fields [mailto:cjfields at uiuc.edu] To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-l at bioperl.org] Sent: Fri, 03 Feb 2006 10:45:23 -0500 Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It will work for saving text output. However, it will not parse anything using next_result (it will likely hang) and will not save XML format. See these bugs: http://bugzilla.bioperl.org/show_bug.cgi?id=1934 http://bugzilla.bioperl.org/show_bug.cgi?id=1935 for explanations and possible fixes (changes to RemoteBlast and Bio::SearchIO::blast). Note that these haven't been checked in yet so are still not included in bioperl-live; they may be further modified before committing to CVS. If you're not worried about XML, you could just try the first fix, which is a change to SearchIO::blast. Nagesh, I remember you posting to the list a month ago using a script which had problems; the script you used saves the output but doesn't actually parse it (i.e. you don't use next_result() to go through the data). Is the version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried parsing the output using "-readmethod => SearchIO" or "-readmethod => blast" using your version of RemoteBlast and method next_result()? Like below (from perldoc): while ( my @rids = $factory->each_rid ) { foreach my $rid ( @rids ) { my $rc = $factory->retrieve_blast($rid); if( !ref($rc) ) { if( $rc < 0 ) { $factory->remove_rid($rid); } print STDERR "." 
if ( $v > 0 ); sleep 5; } else { # parsing starts here my $result = $rc->next_result(); # it should hang here #save the output my $filename = $result->query_name()."\.out"; $factory->save_output($filename); $factory->remove_rid($rid); print "\nQuery Name: ", $result->query_name(), "\n"; while ( my $hit = $result->next_hit ) { next unless ( $v > 0); print "\thit name is ", $hit->name, "\n"; while( my $hsp = $hit->next_hsp ) { print "\t\tscore is ", $hsp->score, "\n"; } } } } } } My script hanged if I used next_result() in any way prior to the fixes. I want to see how many others are having the same issues with parsing using the CVS version of bioperl-live. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > Sent: Thursday, February 02, 2006 7:24 PM > To: Huang Jian; bioperl-l > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > Hi Huang, > Thanks for the message. The older version of RemoteBlast.pm works on the > logic of checking the temporary file size to determine whether the Blast > results are ready. This condition is not getting satisfied may be due to > some changes brought about by NCBI. I had this problem recently and > figured out that the solution was to use the latest version which has > this problem fixed (does not use file size logic any more) which is not > yet included in the BioPerl package. > Cheers > Nagesh > > Huang Jian wrote: > > > Dear Nagesh, > > > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send > > me. Now it works perfectly!!! > > > > Thank you!! 
> >
> > Huang
> >
> > ----- Original Message ----- From: "Nagesh Chakka"
> >
> > To: "Huang Jian" ; "bioperl-l"
> >
> > Sent: Friday, February 03, 2006 7:48 AM
> > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > via email
> >
> >
> >> Hi Huang,
> >> I see that you are submitting a sequence for a remote blast search. Can
> >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If
> >> not I have attached it with this email, try to replace it with the old
> >> one which has a bug.
> >> Let me know if it works.
> >> Nagesh
> >
> >
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From gbazykin at Princeton.EDU Fri Feb 3 15:38:04 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Fri, 3 Feb 2006 15:38:04 -0500
Subject: [Bioperl-l] proposed additions to Tree and cladogram
In-Reply-To: <148174979677.20051026172707@princeton.edu>
References: <148174979677.20051026172707@princeton.edu>
Message-ID: <8010525745.20060203153804@princeton.edu>

Hi all,

a while ago, I mailed to bioperl-l some proposed additions to
phylogeny-related modules (see below). I am doing a project on HIV
phylogeny now, and rely on these additions heavily. They expand on what was
already present in the corresponding modules. I expected them to be of
general use as well (at least the first one). However, I never got any
answer, so I assumed that most people considered these additions
superfluous.

I am now working on an addition to the Tree::Draw::Cladogram module. For my
project, I need to color individual tree edges (including internal ones) on
a scale from red to blue (according to the nonsynonymous/synonymous ratios
of these branches). This should be technically easy (I guess I will add
-Rcolor, -Gcolor and -Bcolor tags to nodes and use them in Cladogram to
color preceding edges), but I have two questions:
- will this add-on be of general interest - should I try to do it "the
right way", updating the pods etc.;
- in general, are there any guidelines about how specific an issue a method
should address to be included in the bioperl distribution?

Thanks,
Yegor Bazykin

This is a forwarded message
From: Georgii Bazykin
To: bioperl-l at bioperl.org
Date: Wednesday, October 26, 2005, 4:27:07 PM
Subject: suggestions for additions to Tree

===8<==============Original message text===============
Hi,

here are some tree-related methods I needed and added to my bioperl.
Hope someone else finds any of them useful as well.

Yegor Bazykin

=============================================
To NodeI:

# modified from total_branch_length in Tree::Tree module
# gets sum of branches in the subtree - descendents of given node

=head2 children_branch_length

 Title   : children_branch_length
 Usage   : my $size = $node->children_branch_length
 Function: Returns the sum of the length of all branches of the
           subtree which starts at given node
 Returns : number
 Args    : none

=cut

sub children_branch_length {
    my ($self) = @_;
    return 0 if( $self->is_Leaf );
    my $sum = 0;
    for ( $self->get_all_Descendents ) {
        $sum += $_->branch_length || 0;
    }
    return $sum;
}

-----------------------------------

=head2 height_nodes

 Title   : height_nodes
 Usage   : my $len = $node->height_nodes
 Function: Returns the height of the tree starting at this node.
           Height here is the largest number of node-to-node steps
           needed to get to a tip (branch lengths are ignored).
 Returns : The longest path to a leaf, counted in nodes
 Args    : none

=cut

sub height_nodes{
    my ($self) = @_;
    return 0 if( $self->is_Leaf );
    my $max = 0;
    foreach my $subnode ( $self->each_Descendent ) {
        my $s = $subnode->height_nodes + 1;
        if( $s > $max ) { $max = $s; }
    }
    return $max;
}

----------------------------------

=head2 get_all_Descendent_Leaves

 Title   : get_all_Descendent_Leaves($sortby)
 Usage   : my @nodes = $node->get_all_Descendent_Leaves;
 Function: Recursively fetch all the nodes and their descendents,
           only selecting leaves
           *NOTE* This is different from each_Descendent
 Returns : Array of Bio::Tree::NodeI objects
 Args    : $sortby [optional] "height", "creation" or coderef to be used
           to sort the order of children nodes.

=cut

sub get_all_Descendent_Leaves{
    my ($self, $sortby) = @_;
    $sortby ||= 'height';
    my @nodes;
    foreach my $node ( $self->each_Descendent($sortby) ) {
        if ($node->is_Leaf) {
            push @nodes, $node;
        } else {
            # recurse so that only leaves are collected, not internal nodes
            push @nodes, ($node->get_all_Descendent_Leaves($sortby));
        }
    }
    return @nodes;
}

=====================================================
To Tree:

=head2 total_internal_branch_length

 Title   : total_internal_branch_length
 Usage   : my $size = $tree->total_internal_branch_length
 Function: Returns the sum of the length of all branches, excluding
           branches leading to leaves
 Returns : number
 Args    : none

=cut

sub total_internal_branch_length {
    my ($self) = @_;
    my $sum = 0;
    if( defined $self->get_root_node ) {
        for ( $self->get_root_node->get_Descendents() ) {
            unless ($_->is_Leaf) {   # YB: THIS IS ALL I ADDED
                $sum += $_->branch_length || 0;
            }
        }
    }
    return $sum;
}

=================================================
To TreeFunctionsI:

=head2 distance_nodes

 Title   : distance_nodes
 Usage   : distance_nodes(-nodes => \@nodes )
 Function: returns the distance between two given nodes in numbers of nodes
 Returns : numerical distance
 Args    : -nodes => arrayref of nodes to test

=cut

# YB: distance_nodes is very similar to distance method in TreeFunctionsI
# except that it estimates distances between nodes in numbers of nodes
# (e.g., 1 between mother and daughter, 2 between two sisters, etc.)

sub distance_nodes {
    my ($self,@args) = @_;
    my ($nodes) = $self->_rearrange([qw(NODES)],@args);
    if( ! defined $nodes ) {
        $self->warn("Must supply -nodes parameter to distance_nodes() method");
        return undef;
    }
    my ($node1,$node2) = $self->_check_two_nodes($nodes);

    # algorithm:
    # Find lca: Start with first node, find and save every node from it
    # to root, saving cumulative distance. Then start with second node;
    # for it and each of its ancestor nodes, check to see if it's in
    # the first node's ancestor list - if so it is the lca. Return sum
    # of (cumul. distance from node1 to lca) and (cumul. distance from
    # node2 to lca)

    # find and save every ancestor of node1 (including itself)
    my %node1_ancestors;    # keys are internal ids, values are objects
    my %node1_cumul_dist;   # keys are internal ids, values
                            # are cumulative distance from node1 to given node
    my $place = $node1;     # start at node1
    my $cumul_dist = 0;

    while ( $place ){
        $node1_ancestors{$place->internal_id} = $place;
        $node1_cumul_dist{$place->internal_id} = $cumul_dist;
        $cumul_dist++; # YB
        #YB if ($place->branch_length) {
        #YB     $cumul_dist += $place->branch_length; # include current branch
        #YB                                           # length in next iteration
        #YB }
        $place = $place->ancestor;
    }

    # now climb up node2, for each node checking whether
    # it's in node1_ancestors
    $place = $node2;        # start at node2
    $cumul_dist = 0;
    while ( $place ){
        foreach my $key ( keys %node1_ancestors ){   # ugh
            if ( $place->internal_id == $key){       # we're at lca
                return $node1_cumul_dist{$key} + $cumul_dist;
            }
        }
        # include current branch length in next iteration
        #YB $cumul_dist += $place->branch_length || 0;
        $cumul_dist++; # YB
        $place = $place->ancestor;
    }
    $self->warn("Could not find distance!"); # should never execute,
                                             # if so, there's a problem
    return undef;
}

===8<===========End of original message text===========

From cjfields at uiuc.edu Fri Feb 3 16:07:29 2006
From: cjfields
at uiuc.edu (Chris Fields) Date: Fri, 3 Feb 2006 15:07:29 -0600 Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 In-Reply-To: <20060203194450.792e8d4e@dogwood.plantbio.uga.edu> Message-ID: <001a01c62905$d7ef0920$15327e82@pyrimidine> I would say give the new code a try, but realize that it hasn't been checked in (like I said below). I will try going over the modified Bio::SearchIO::blast again this weekend to see if there is anything I might have missed. The changed order in the header of BLAST text output has me a bit worried that it might not catch everything, but it at least doesn't hang in the while() loop I described in the bug report below (bug #1934) and seems to process everything fine. If you want more stability in the code, you might consider changing over to XML output and parsing with Bio::SearchIO::blastxml. There are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML output, but I believe it parses everything regardless. If you look back the last month or so there has been a bit of discussion here about it. Jason describes a bit on how to set up RemoteBlast for XML: http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/ Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > Sent: Friday, February 03, 2006 1:45 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 > > Hi, Everybody, > I see this post and am wondering if this is the reason for the > malfunctionning of my webserver. We set up a webserver named MAK, for MITE > sequence analysis. It was working very well until around November 2005, > when it stopped returning any result (the site is fine and seems to be > doing sth after submission). 
In the CGI script, I used remoteblast (that > work was done in 2003) to do searches. I currently do not have access to > the server because I moved. Quite several people sent emails to us about > its malfunctioning. Is there any suggestion on fixing the problem? Should > I simplily ask the remoteblast.pm be replaced with the new version? > Thanks a lot, > Guojun > > Department of Plant Biology > University of Georgia > Tel: 706-542-1857 > Fax: 706-542-1805 > http://www.arches.uga.edu/~guojun > _____ > > From: Chris Fields [mailto:cjfields at uiuc.edu] > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl- > l at bioperl.org] > Sent: Fri, 03 Feb 2006 10:45:23 -0500 > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It > will > work for saving text output. However, it will not parse anything using > next_result (it will likely hang) and will not save XML format. See these > bugs: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > for explanations and possible fixes (changes to RemoteBlast and > Bio::SearchIO::blast). Note that these haven't been checked in yet so are > still not included in bioperl-live; they may be further modified before > committing to CVS. If you're not worried about XML, you could just try the > first fix, which is a change to SearchIO::blast. > > Nagesh, I remember you posting to the list a month ago using a script > which > had problems; the script you used saves the output but doesn't actually > parse it (i.e. you don't use next_result() to go through the data). Is the > version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried > parsing the output using "-readmethod => SearchIO" or "-readmethod => > blast" > using your version of RemoteBlast and method next_result()? 
Like below > (from > perldoc): > > while ( my @rids = $factory->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } else { # parsing > starts here > my $result = $rc->next_result(); # it should hang > here > #save the output > my $filename = $result->query_name()."\.out"; > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > } > } > > > My script hanged if I used next_result() in any way prior to the fixes. I > want to see how many others are having the same issues with parsing using > the CVS version of bioperl-live. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > > Sent: Thursday, February 02, 2006 7:24 PM > > To: Huang Jian; bioperl-l > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > Hi Huang, > > Thanks for the message. The older version of RemoteBlast.pm works on the > > logic of checking the temporary file size to determine whether the Blast > > results are ready. This condition is not getting satisfied may be due to > > some changes brought about by NCBI. I had this problem recently and > > figured out that the solution was to use the latest version which has > > this problem fixed (does not use file size logic any more) which is not > > yet included in the BioPerl package. 
> > Cheers > > Nagesh > > > > Huang Jian wrote: > > > > > Dear Nagesh, > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send > > > me. Now it works perfectly!!! > > > > > > Thank you!! > > > > > > Huang > > > > > > ----- Original Message ----- From: "Nagesh Chakka" > > > > > > To: "Huang Jian" ; "bioperl-l" > > > > > > Sent: Friday, February 03, 2006 7:48 AM > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still > > > via email > > > > > > > > >> Hi Huang, > > >> I see that you are submitting a sequence for a remote blast search. > Can > > >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If > > >> not I have attached it with this email, try to replace it with the > old > > >> one which has a bug. > > >> Let me know if it works. > > >> Nagesh > > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Fri Feb 3 18:11:03 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 3 Feb 2006 15:11:03 -0800 Subject: [Bioperl-l] Documentation in the Bioperl package In-Reply-To: References: Message-ID: Just to be sure, the wiki will be able to handle versions (releases)? (documentation and APIs may change between releases and hence a more recent doc page may not apply to an earlier release) -hilmar On 2/3/06, Brian Osborne wrote: > bioperl-l, > > The recent work on the Bioperl Wiki moved much of the Bioperl documentation > online. 
Since we cannot maintain 2 locations for all of this we'll be
> removing a number of files from the package, specifically:
>
> biodatabases.pod
> biodesign.pod
> bioperl.pod
> bioscripts.pod
> doc/howto/*
> doc/faq/*
> FAQ
>
> Rest assured that all of these files have been gone over in detail to make
> sure that no important information was lost during the migration. All of
> this will be replaced by a single file, such as 'README.docs', that explains
> where all the documentation is. It's not entirely clear what will happen to
> bptutorial.pl. Moving its content to different online locations is possible
> but in this case we lose its functionality as a script.
>
> Are there any comments or questions or concerns?
>
> Brian O.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------

From hubert.prielinger at gmx.at Fri Feb 3 17:47:37 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 03 Feb 2006 16:47:37 -0600
Subject: [Bioperl-l] standalone blast composition based statistics parameter
Message-ID: <43E3DD89.7080903@gmx.at>

Hi,
Does anybody know whether it is possible to perform a database search
with the standalone blast where the composition based statistics
parameter is on, and what's the abbreviation for the parameter?

thanks
Hubert

From osborne1 at optonline.net Fri Feb 3 22:32:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 03 Feb 2006 22:32:18 -0500
Subject: [Bioperl-l] Documentation in the Bioperl package
In-Reply-To:
Message-ID:

Hilmar,

MediaWiki supports such things as rollback based on date but it is not CVS,
where an entire set of pages is tagged by version.
It is also scriptable so it may be possible to emulate this type of tagging
by script, but I'm not entirely sure (see WWW::Mediawiki::Client, Jason
pointed this out to me). So the simple answer is probably "no".

But let's be honest: synchrony between code and documentation wasn't
achieved using the previous approach, CVS, either. What Jason, Torsten, and
I appreciated when adding content to this new site was that it was
relatively easy; our hope is that this approach will get more people
involved. The assumption is that more involvement will lead to better
documentation - Jason made this assumption when electing to move the site
to MediaWiki and I have to say that I completely agree with this assumption.

Jason, any thoughts on this question? An interesting one...

Brian O.

On 2/3/06 6:11 PM, "Hilmar Lapp" wrote:

> Just to be sure, the wiki will be able to handle versions (releases)?
> (documentation and APIs may change between releases and hence a more
> recent doc page may not apply to an earlier release)
>
> -hilmar
>
> On 2/3/06, Brian Osborne wrote:
>> bioperl-l,
>>
>> The recent work on the Bioperl Wiki moved much of the Bioperl documentation
>> online. Since we cannot maintain 2 locations for all of this we'll be
>> removing a number of files from the package, specifically:
>>
>> biodatabases.pod
>> biodesign.pod
>> bioperl.pod
>> bioscripts.pod
>> doc/howto/*
>> doc/faq/*
>> FAQ
>>
>> Rest assured that all of these files have been gone over in detail to make
>> sure that no important information was lost during the migration. All of
>> this will be replaced by a single file, such as 'README.docs', that explains
>> where all the documentation is. It's not entirely clear what will happen to
>> bptutorial.pl. Moving its content to different online locations is possible
>> but in this case we lose its functionality as a script.
>>
>> Are there any comments or questions or concerns?
>>
>> Brian O.
>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > ---------------------------------------------------------- > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > ---------------------------------------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From shameer at ncbs.res.in Sat Feb 4 05:15:33 2006 From: shameer at ncbs.res.in (Shameer Khadar) Date: Sat, 4 Feb 2006 15:45:33 +0530 (IST) Subject: [Bioperl-l] Calpha to Co-ordinates Program In-Reply-To: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com> References: <001001c62793$bef08f70$93656785@zhur> <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com> Message-ID: <47205.192.168.1.176.1139048133.squirrel@192.168.1.176> Dear All, Any one is aware of a perl script / Bio::PERL module that can be used to construct full atomic coordinates of a protein from a given C(alpha) trace and optimizes side chain geometry. I tried the original program Maxsprout from Holms Group, But it is not giving me proper results (am getting errors like segmentation fault - backbonchain failed etc.) Since I need to use as a part of a webs server - I would appreciate if any one could let me know about a perl script for the same. Thanks and cheers in advance, -- Mr. Shameer Khadar (JRF) Dr. R. Sowdhamini's Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675 W - http://www.ncbs.res.in -------------------------------------------------- "Refrain from illusions, insist on work and not words, patiently seek divine and scientific truth." 
MM From torsten.seemann at infotech.monash.edu.au Sat Feb 4 22:34:35 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sun, 05 Feb 2006 14:34:35 +1100 Subject: [Bioperl-l] standalone blast composition based statistics parameter In-Reply-To: <43E3DD89.7080903@gmx.at> References: <43E3DD89.7080903@gmx.at> Message-ID: <43E5724B.5070007@infotech.monash.edu.au> Hubert, > Does anybody know whether it is possible to perform a with the > standalone blast a database search where the composition based > statistics parameter is on > and what's the abbreviation for the parameter The StandAloneBlast only runs the "blastall" binary on your system. It accepts all the command line options (like "-d" etc.) that "blastall" does but just passes them as-is; it doesn't do anything special. On a Unix system, type "blastall -" to list all the options that your BLAST binary supports. -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ Phone: +61 3 9905 9010 From fernan at iib.unsam.edu.ar Sat Feb 4 23:34:27 2006 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Sun, 5 Feb 2006 01:34:27 -0300 Subject: [Bioperl-l] standalone blast composition based statistics parameter In-Reply-To: <43E3DD89.7080903@gmx.at> References: <43E3DD89.7080903@gmx.at> Message-ID: <20060205043427.GB39264@iib.unsam.edu.ar> +----[ Hubert Prielinger (03.Feb.2006 21:06): | | Hi, | Does anybody know whether it is possible to perform a with the | standalone blast a database search where the composition based | statistics parameter is on | and what's the abbreviation for the parameter | | thanks | Hubert | +----] only for tblastn. As Torsten said, 'blastall' with no arguments would have revealed it: [ ... 
]
  -C  Use composition-based statistics for tblastn:
      D or d: default (equivalent to F)
      0 or F or f: no composition-based statistics
      1 or T or t: Composition-based statistics as in NAR 29:2994-3005, 2001
      2: Composition-based score adjustment as in Bioinformatics 21:902-911,
          2005, conditioned on sequence properties
      3: Composition-based score adjustment as in Bioinformatics 21:902-911,
          2005, unconditionally
      For programs other than tblastn, must either be absent or be D, F or 0.
      [String]
    default = D

Fernan

PS: this is using the latest BLAST 2.2.13 (ncbi-toolkit 20051206)

From hubert.prielinger at gmx.at Sun Feb 5 21:56:07 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sun, 05 Feb 2006 20:56:07 -0600
Subject: [Bioperl-l] standalone blast composition based statistics parameter
In-Reply-To: <20060205043427.GB39264@iib.unsam.edu.ar>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar>
Message-ID: <43E6BAC7.5050707@gmx.at>

Hi,
thank you very much. If I use tblastn instead of blastp, I get the
following error message:

[blastall] WARNING: : Unable to open nr.00.nin

I looked in the folder, but I don't have that file, and if I download
the database and extract the file, it isn't there either...

thanks
Hubert

Fernan Aguero wrote:

>+----[ Hubert Prielinger (03.Feb.2006 21:06):
>|
>| Hi,
>| Does anybody know whether it is possible to perform a database search
>| with the standalone blast where the composition based statistics
>| parameter is on
>| and what's the abbreviation for the parameter
>|
>| thanks
>| Hubert
>|
>+----]
>
>only for tblastn.
>
>As Torsten said, 'blastall' with no arguments would have
>revealed it:
>
>[ ... ]
>  -C  Use composition-based statistics for tblastn:
>      D or d: default (equivalent to F)
>      0 or F or f: no composition-based statistics
>      1 or T or t: Composition-based statistics as in NAR 29:2994-3005, 2001
>      2: Composition-based score adjustment as in Bioinformatics 21:902-911,
>          2005, conditioned on sequence properties
>      3: Composition-based score adjustment as in Bioinformatics 21:902-911,
>          2005, unconditionally
>      For programs other than tblastn, must either be absent or be D, F or 0.
>      [String]
>    default = D
>
>Fernan
>
>PS: this is using the latest BLAST 2.2.13 (ncbi-toolkit 20051206)
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>

From torsten.seemann at infotech.monash.edu.au Sun Feb 5 23:29:11 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 06 Feb 2006 15:29:11 +1100
Subject: [Bioperl-l] standalone blast composition based statistics parameter
In-Reply-To: <43E6BAC7.5050707@gmx.at>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar> <43E6BAC7.5050707@gmx.at>
Message-ID: <43E6D097.7080304@infotech.monash.edu.au>

Hubert

> thank you very much, If I use the tblastn instead of blastp, I get the
> following error message
> [blastall] WARNING: : Unable to open nr.00.nin
> I looked up in the folder, but I don't have that file, and if I download
> the database and extract the file, it isn't there either...

"tblastn" requires a NUCLEOTIDE database to search. It appears that you
have specified a PROTEIN database with "-d nr" ("nr" is protein). You
probably want to install the "nt" blast database and use that instead.
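
The program/database pairing described above can be sketched in plain Perl. This is only an illustration: the %BLAST_TYPES table and the db_type_ok helper are hypothetical names made up for this sketch, not part of BioPerl or of the BLAST binaries. The table lists the sequence type each classic blastall program expects for its query and for its database.

```perl
use strict;
use warnings;

# Sequence types expected by each classic BLAST program.
# (Illustrative lookup table; the name is an assumption of this sketch.)
my %BLAST_TYPES = (
    blastn  => { query => 'nucleotide', db => 'nucleotide' },
    blastp  => { query => 'protein',    db => 'protein'    },
    blastx  => { query => 'nucleotide', db => 'protein'    },
    tblastn => { query => 'protein',    db => 'nucleotide' },
    tblastx => { query => 'nucleotide', db => 'nucleotide' },
);

# Hypothetical helper: true if a database of type $db_type suits $program.
sub db_type_ok {
    my ($program, $db_type) = @_;
    my $spec = $BLAST_TYPES{ lc $program }
        or die "Unknown BLAST program '$program'\n";
    return $spec->{db} eq $db_type;
}

# "nr" is a protein database, "nt" a nucleotide one.
print "tblastn + nr (protein): ",
    ( db_type_ok( 'tblastn', 'protein' ) ? "ok" : "mismatch" ), "\n";
print "tblastn + nt (nucleotide): ",
    ( db_type_ok( 'tblastn', 'nucleotide' ) ? "ok" : "mismatch" ), "\n";
```

A check like this, run before calling blastall, would flag the "-p tblastn -d nr" combination up front instead of leaving it to the "Unable to open nr.00.nin" warning.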
-- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ From hubert.prielinger at gmx.at Sun Feb 5 23:12:27 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Sun, 05 Feb 2006 22:12:27 -0600 Subject: [Bioperl-l] standalone blast composition based statistics parameter In-Reply-To: <43E6D097.7080304@infotech.monash.edu.au> References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar> <43E6BAC7.5050707@gmx.at> <43E6D097.7080304@infotech.monash.edu.au> Message-ID: <43E6CCAB.2060107@gmx.at> dear torsten, thanks for your quick reply, I have looked up at the ftp server and there are nt.00 to nt.04. Do I have to download all of them, are there differences? thanks Hubert Torsten Seemann wrote: >Hubert > > > >>thank you very much, If I use the tblastn instead of blastp, I get the >>following error message >>[blastall] WARNING: : Unable to open nr.00.nin >>I looked up in the folder, but I don't have that file, and if I download >>the database and extract the file, it isn't there either... >> >> > >"tblastn" requires a NUCLEOTIDE database to search. It appears that you >have specified a PROTEIN database with "-d nr" ("nr" is protein). You >probably want to install the "nt" blast database and use that instead. > > > From torsten.seemann at infotech.monash.edu.au Mon Feb 6 00:22:09 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Mon, 06 Feb 2006 16:22:09 +1100 Subject: [Bioperl-l] standalone blast composition based statistics parameter In-Reply-To: <43E6CCAB.2060107@gmx.at> References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar> <43E6BAC7.5050707@gmx.at> <43E6D097.7080304@infotech.monash.edu.au> <43E6CCAB.2060107@gmx.at> Message-ID: <43E6DD01.2010600@infotech.monash.edu.au> Hubert > thanks for your quick reply, I have looked up at the ftp server and > there are nt.00 to nt.04. 
Do I have to download all of them, are there > differences? You have to download them all. The "nt" database (actually the index files) is very big, and it is split up into gigabyte (?) parts. Although they are called "nt.00" "nt.01" etc, you still pass "-d nt" to "blastall", because together these parts are one "nt" database. The "blastall" program will automatically use the separate parts; you do not have to join them. You should read http://www.ncbi.nlm.nih.gov/BLAST/ to make sure you are using the correct BLAST search for your problem. -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ Phone: +61 3 9905 9010 From shameer at ncbs.res.in Mon Feb 6 03:27:50 2006 From: shameer at ncbs.res.in (Shameer Khadar) Date: Mon, 6 Feb 2006 13:57:50 +0530 (IST) Subject: [Bioperl-l] Need a slogan for OBF In-Reply-To: <47205.192.168.1.176.1139048133.squirrel@192.168.1.176> References: <001001c62793$bef08f70$93656785@zhur> <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com> <47205.192.168.1.176.1139048133.squirrel@192.168.1.176> Message-ID: <2888.192.168.4.38.1139214470.squirrel@192.168.4.38> Dear All, As we are moving to the all new look wiki-style-web - why dont we think about a unique logo + slogan that can express our spirit and excitement ??? For Example we can have a logo with O|B|F its full form and the slogan - any body is interested - i would be happy to design logos once we have done with the logo. I have a couple of suggestions -I hope all OBF members can sent much more powerful slogans than mine 'Let's Code for Life' 'Let's Decode Life' 'Let's Recode Life' 'Code your Life ' Happy O|B|!!! -- Mr. Shameer Khadar (JRF) Dr. R. 
Sowdhamini's Lab (# 25) The Computational Biology Group National Centre for Biological Sciences (TIFR) UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India T - 91-080-23636420-32 EXT 4241 F - 91-080-23636662/23636675 W - http://www.ncbs.res.in -------------------------------------------------- "Refrain from illusions, insist on work and not words, patiently seek divine and scientific truth." MM From olsonbr2 at msu.edu Fri Feb 3 15:54:22 2006 From: olsonbr2 at msu.edu (Bradley J. S. C. Olson) Date: Fri, 3 Feb 2006 15:54:22 -0500 Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter the method? Message-ID: <005e01c62904$02b2ad30$db4c0a23@dihedral> I have been working with the RemoteBlast.pm module and have found that it is a bit clunky to use loops to keep checking to see if you RID has finished. For example, every time you write a script, you need to add a code block (see example in the documentation) in order to keep checking if @rid is finished. Would it be better to maybe write this in as a method in the RemoteBlast module? It seems like it would be better for remoteblast to have a method we could call say retrieve_when_done that would return the blast report when the value of retrieve_blast is no longer 0. The only issue may be report parsing, but I wonder if it might be better to separate out submittal/retrieval of BLAST requests from the parsing step and make these more discrete processes? Since NCBI seems to be not supporting text results as a standard, maybe the module should work exclusively with XML and we could change report handling away from the headaches of text processing and just allow Bio::SeqIO or blastxml handle the task of making a blast reports into different forms (such as HTML, text etc). 
This would definitely simplifying coding using the RemoteBlast.pm module as then you could treat the report retrieval process as an object and just wait for the object to return its value, instead of coding in a bunch of test loops to see if it is done. This may also help keep bugs out of the module and make the module longer lasting and not require module users to rewrite their code every time NCBI makes changes. Any thoughts or ideas? Is anyone working on this? Thanks Brad Olson -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.375 / Virus Database: 267.15.0/249 - Release Date: 2/2/2006 From cjfields at uiuc.edu Mon Feb 6 12:27:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 6 Feb 2006 11:27:56 -0600 Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter themethod? In-Reply-To: <005e01c62904$02b2ad30$db4c0a23@dihedral> Message-ID: <002c01c62b42$ab7671a0$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Bradley J. S. C. Olson > Sent: Friday, February 03, 2006 2:54 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter > themethod? > > I have been working with the RemoteBlast.pm module and have found that it > is > a bit clunky to use loops to keep checking to see if you RID has finished. > > > > For example, every time you write a script, you need to add a code block > (see example in the documentation) in order to keep checking if @rid is > finished. > > Would it be better to maybe write this in as a method in the RemoteBlast > module? It seems like it would be better for remoteblast to have a method > we could call say retrieve_when_done that would return the blast report > when > the value of retrieve_blast is no longer 0. Sounds reasonable, though I'm not sure how easy it would be to implement. 
Why not drop by Bugzilla (http://bugzilla.bioperl.org/) and submit this as an enhancement? > The only issue may be report parsing, but I wonder if it might be better > to > separate out submittal/retrieval of BLAST requests from the parsing step > and > make these more discrete processes? Since NCBI seems to be not supporting > text results as a standard, maybe the module should work exclusively with > XML and we could change report handling away from the headaches of text > processing and just allow Bio::SeqIO or blastxml handle the task of making > a > blast reports into different forms (such as HTML, text etc). They are separated. RemoteBlast executes BLAST remotely (via HTTP). Results are parsed via various Bio::SearchIO modules depending on what you set '-readmethod' to. This is from perldoc: >From Bio::Tools::Run::RemoteBlast ________________________________________________________ DESCRIPTION Class for remote execution of the NCBI Blast via HTTP. For a description of the many CGI parameters see: http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html Various additional options and input formats are available. ________________________________________________________ >From Bio::SearchIO____________ ____________________________________________ DESCRIPTION This is a driver for instantiating a parser for report files from sequence database searches. This object serves as a wrapper for the format parsers in Bio::SearchIO::* - you should not need to ever use those format parsers directly. (For people used to the SeqIO system it, we are deliberately using the same pattern). Once you get a SearchIO object, calling next_result() gives you back a Bio::Search::Result::ResultI compliant object, which is an object that represents one Blast/Fasta/HMMER whatever report. 
A list of module names and formats is below: blast BLAST (WUBLAST, NCBIBLAST,bl2seq) fasta FASTA -m9 and -m0 blasttable BLAST -m9 or -m8 output (NCBI not WUBLAST tabular) megablast MEGABLAST psl UCSC PSL format waba WABA output axt AXT format sim4 Sim4 hmmer HMMER hmmpfam and hmmsearch exonerate Exonerate CIGAR and VULGAR format blastxml NCBI BLAST XML wise Genewise -genesf format See the SearchIO HOWTO linked from http://bioperl.org/HOWTOs/ ________________________________________________________ This is also in the wiki online now: http://www.bioperl.org/wiki/Module:Bio::SearchIO http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast I think the current line of thought is to make XML the default, but I also know you would irritate a LOT of people out there by cutting off text output parsing completely. Roger Hall or Jason pointed out that doing so will break many scripts out there. Furthermore, the problems with text output parsing are usually minimal. For instance, the last one was a small change which broke a regex, causing an infinite loop; the actual bug was in Bio::SearchIO::blast and not in RemoteBlast. A simple addition to the regex fixed it. The only change to RemoteBlast was to implement the option of saving XML formatted BLAST output. I do like the idea of using XML output to build custom (bioperl-specific) BLAST reports, but that also requires more work, likely a lot more work. Again, maybe add that as an enhancement in Bugzilla or, better yet, submit some sample code maybe as an example. > This would definitely simplifying coding using the RemoteBlast.pm module > as > then you could treat the report retrieval process as an object and just > wait > for the object to return its value, instead of coding in a bunch of test > loops to see if it is done. This may also help keep bugs out of the > module > and make the module longer lasting and not require module users to rewrite > their code every time NCBI makes changes. 
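
The wait-until-done idea in the paragraph quoted above can be sketched as a small wrapper around the polling loop from the perldoc. This is only an illustration under stated assumptions: retrieve_when_done and its delay / max_polls options are hypothetical and are not methods of Bio::Tools::Run::RemoteBlast. The sketch assumes only that the factory's each_rid, retrieve_blast and remove_rid behave as in the documented loop (a reference when the report is ready, 0 while pending, a negative value on failure).

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Tools::Run::RemoteBlast.
# Polls every pending RID until it yields a report object or fails,
# then returns two hash references keyed by RID: reports and failures.
sub retrieve_when_done {
    my ( $factory, %opt ) = @_;
    my $delay     = defined $opt{delay}     ? $opt{delay}     : 5;    # seconds between polls
    my $max_polls = defined $opt{max_polls} ? $opt{max_polls} : 120;  # give up eventually

    my ( %report, %failed );
    for my $poll ( 1 .. $max_polls ) {
        my @rids = $factory->each_rid;
        last unless @rids;                  # nothing left to wait for

        my $still_waiting = 0;
        for my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if ( ref $rc ) {                # report object: this RID is done
                $report{$rid} = $rc;
                $factory->remove_rid($rid);
            }
            elsif ( $rc < 0 ) {             # negative code: request failed
                $failed{$rid} = $rc;
                $factory->remove_rid($rid);
            }
            else {                          # 0: not ready yet
                $still_waiting = 1;
            }
        }
        sleep $delay if $still_waiting && $delay > 0;
    }
    return ( \%report, \%failed );
}
```

With a real factory this would be called as my ($reports, $failed) = retrieve_when_done($factory); each value in %$reports is whatever retrieve_blast returned, so the parsing step (next_result, next_hit, next_hsp) stays separate from submission and retrieval. RIDs still pending after max_polls attempts are simply left in the factory.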
I think the most stable way of submitting jobs is by using the netblast client (blastcl3) and parsing the results from that. No CGI, no HTML, just saving to a temp file and parsing through SearchIO. RemoteBlast was designed, I believe, with the idea of letting researchers with some basic knowledge of perl use an interface familiar to them (i.e. the BLAST interface at NCBI) and retrieve results on a regular basis. The results are parsed via SearchIO::blast/blastxml/blasttable. The problem is, though convenient, RemoteBlast is also reliant on the powers that be at NCBI not changing anything dramatically. It is possible that NCBI could modify the HTML code from the BLAST retrieval process, thus breaking RemoteBlast. Text output could change again, even more dramatically, thus severely breaking Bio::SearchIO::blast. Thus, we adapt to those changes by modifying the broken modules. It's evolution at its finest. It's also a fact of life that code breaks and needs to be fixed every once in a while to stay current. Okay, I'm waxing philosophical now so I know I've definitely had too much coffee. Must get back to work... > > > > Any thoughts or ideas? > > > > Is anyone working on this? > > > > Thanks > > > > Brad Olson > > > > > > > -- > No virus found in this outgoing message. > Checked by AVG Free Edition. > Version: 7.1.375 / Virus Database: 267.15.0/249 - Release Date: 2/2/2006 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From roger at iosea.com Mon Feb 6 13:14:11 2006 From: roger at iosea.com (Roger Hall) Date: Mon, 6 Feb 2006 12:14:11 -0600 Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter the method? 
In-Reply-To: <005e01c62904$02b2ad30$db4c0a23@dihedral> Message-ID: <000f01c62b49$25732d30$4301a8c0@LIBERAL> Brad, I decided to fix this module about ten days ago, and then was out all of last week with Strep plus a virus or two - it's one of the advantages of having young kids. I see that there have been quite a few messages about this module in just the last week. I am sitting down now to read through them. I'll get back to you (and the list) ASAP. If you have any other questions or suggestions about RemoteBlast, feel free to bug me with 'em. Roger Hall Technical Director MidSouth Bioinformatics Center University of Arkansas at Little Rock -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bradley J. S. C. Olson Sent: Friday, February 03, 2006 2:54 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter the method? I have been working with the RemoteBlast.pm module and have found that it is a bit clunky to use loops to keep checking to see if you RID has finished. For example, every time you write a script, you need to add a code block (see example in the documentation) in order to keep checking if @rid is finished. Would it be better to maybe write this in as a method in the RemoteBlast module? It seems like it would be better for remoteblast to have a method we could call say retrieve_when_done that would return the blast report when the value of retrieve_blast is no longer 0. The only issue may be report parsing, but I wonder if it might be better to separate out submittal/retrieval of BLAST requests from the parsing step and make these more discrete processes? 
Since NCBI seems to be not supporting text results as a standard, maybe the module should work exclusively with XML and we could change report handling away from the headaches of text processing and just allow Bio::SeqIO or blastxml handle the task of making a blast reports into different forms (such as HTML, text etc). This would definitely simplifying coding using the RemoteBlast.pm module as then you could treat the report retrieval process as an object and just wait for the object to return its value, instead of coding in a bunch of test loops to see if it is done. This may also help keep bugs out of the module and make the module longer lasting and not require module users to rewrite their code every time NCBI makes changes. Any thoughts or ideas? Is anyone working on this? Thanks Brad Olson -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.375 / Virus Database: 267.15.0/249 - Release Date: 2/2/2006 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From barry.m.dancis at gsk.com Mon Feb 6 12:17:13 2006 From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com) Date: Mon, 6 Feb 2006 12:17:13 -0500 Subject: [Bioperl-l] Handling miRNA's In-Reply-To: <003701c625c4$5527d790$2f01a8c0@GOLHARMOBILE1> Message-ID: Hi -- Are there any classes for manipulating miRNA's with functions such as parsing the name, storing and interlinking pri/pre/mat sequences, etc? 
Thanks,

Barry

From hubert.prielinger at gmx.at Mon Feb 6 18:16:01 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 06 Feb 2006 17:16:01 -0600
Subject: [Bioperl-l] no results with standalone tblastn
In-Reply-To: <43E6DD01.2010600@infotech.monash.edu.au>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar> <43E6BAC7.5050707@gmx.at> <43E6D097.7080304@infotech.monash.edu.au> <43E6CCAB.2060107@gmx.at> <43E6DD01.2010600@infotech.monash.edu.au>
Message-ID: <43E7D8B1.5030307@gmx.at>

dear torsten,
I have downloaded all the databases, as you recommended. It is
working, but I don't get any results; if I try it online it works fine.
My result file looks like this:

TBLASTN 2.2.13 [Nov-27-2005]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query=
         (8 letters)

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           3,749,503 sequences; 16,556,997,203 total letters

Searching..................................................done

                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

The program code for it looks like this:

#!/usr/local/bin/perl -w

BEGIN {
    $ENV{BLASTDIR}= "/home/Hubert/blast/blast-2.2.13/bin";
    $ENV{BLASTDATADIR}= "/home/Hubert/blast/blast-2.2.13/data";
}

use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;
use Bio::SeqIO;
use strict;

print "Please insert matrix:\t";
my $matrix_STD = <STDIN>;
chomp $matrix_STD;
print "Please insert count:\t";
my $count_STD = <STDIN>;
chomp $count_STD;

# parameters
my $expect_value = 20000;
#my $filter_query_sequence = 'T';
my $one_line_description = 1000;
my $alignments = 1000;
#my $matrix = 'BLOSUM80';
my $gapcost = 10;
my $gapextend = 1;
my $wordsize = 2;
#my $compbasedStat = '1';
#my $count = 1;
# my $strands = 1;
my @params = ('program' => 'tblastn','database' => 'nt');
#my $progress_interval = 100;

my $seqio_obj = Bio::SeqIO->new( -file => "aloneblosum62.txt",
                                 -format => "raw", );

# create factory object and set parameters
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
print "submitted parameters successfully \n";
$factory->e($expect_value);
#$factory->F($filter_query_sequence);
$factory->v($one_line_description);
$factory->b($alignments);
$factory->M($matrix_STD);
$factory->G($gapcost);
$factory->E($gapextend);
$factory->W($wordsize);
#$factory->C($compbasedStat);
#$factory->S($strands);
print "changed parameters successfully \n";
print "\n";

# get query
while ( my $query = $seqio_obj->next_seq) {
    print "entered while loop \n";
    my $blast_report = $factory->blastall($query);
    # print "$blast_report\n";
    $factory->outfile("nucleo80$count_STD.txt");
    $count_STD++;
    print $query->seq;
    print "\n";
}

thanks
Hubert

Torsten Seemann wrote:

>Hubert
>
>
>
>>thanks for your quick reply, I
have looked up at the ftp server and >>there are nt.00 to nt.04. Do I have to download all of them, are there >>differences? >> >> > >You have to download them all. The "nt" database (actually the index >files) is very big, and it is split up into gigabyte (?) parts. Although >they are called "nt.00" "nt.01" etc, you still pass "-d nt" to >"blastall", because together these parts are one "nt" database. The >"blastall" program will automatically use the separate parts; you do not >have to join them. > >You should read http://www.ncbi.nlm.nih.gov/BLAST/ to make sure you are >using the correct BLAST search for your problem. > > > From torsten.seemann at infotech.monash.edu.au Mon Feb 6 21:17:40 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 07 Feb 2006 13:17:40 +1100 Subject: [Bioperl-l] no results with standalone tblastn In-Reply-To: <43E7D8B1.5030307@gmx.at> References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar> <43E6BAC7.5050707@gmx.at> <43E6D097.7080304@infotech.monash.edu.au> <43E6CCAB.2060107@gmx.at> <43E6DD01.2010600@infotech.monash.edu.au> <43E7D8B1.5030307@gmx.at> Message-ID: <43E80344.5090207@infotech.monash.edu.au> > I have downloaded all the databases, as you recommended me. And it is > working, but I don't get any results, if I try it online it works fine. > my result file looks like that: > > TBLASTN 2.2.13 [Nov-27-2005] > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), > "Gapped BLAST and PSI-BLAST: a new generation of protein database search > programs", Nucleic Acids Res. 25:3389-3402. 
> Query= > (8 letters) > Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS, > GSS,environmental samples or phase 0, 1 or 2 HTGS sequences) > 3,749,503 sequences; 16,556,997,203 total letters > Searching..................................................done > Sequences producing significant alignments: Score > E (bits) Value Is your query only 8 amino acids long? This report looks like it did have alignments that were not displayed, otherwise it would print "**** No hits ****". This mailing list is not here to solve your BLAST problems unless it is a problem with the Perl module running BLAST. You first need to try and get your problem working on the command line *without* Perl. eg. /home/Hubert/blast/blast-2.2.13/bin/blastall -p tblastn -d nt -i YOUR_FASTA_FILE_WITH_SEQUENCE_IN_IT -o OUTPUT_FILE.txt -e 0.001 ... where "..." is the rest of the options you are setting in your Perl script. If it doesn't work that way, it will never work in Perl. -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ From rahall2 at ualr.edu Mon Feb 6 21:46:44 2006 From: rahall2 at ualr.edu (Roger Hall) Date: Mon, 6 Feb 2006 20:46:44 -0600 Subject: [Bioperl-l] RemoteBlast users - potentially major changes - please reply Message-ID: <002001c62b90$bb9dbe00$4301a8c0@LIBERAL> To everyone who uses RemoteBlast.pm: Would anyone object to RemoteBlast being rewritten in a way that requires NCBI's blastcl3 executable? Binary downloads of blastcl3 (column "netblast") are available for numerous platforms at: http://ncbi.nih.gov/BLAST/download.shtml Does anyone require or desire a "pure perl" implementation? If so, please explain the advantage you see with such an implementation. Thanks! 
Roger Hall Technical Director MidSouth Bioinformatics Center University of Arkansas at Little Rock (501) 569-8074 From osborne1 at optonline.net Tue Feb 7 12:05:56 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 07 Feb 2006 12:05:56 -0500 Subject: [Bioperl-l] Handling miRNA's In-Reply-To: Message-ID: Barry, If the sequence information is in one of the formats that Bioperl understands (Genbank, Swissprot flat, and so on) then the answer is yes. This assumes that the details on sequence that you mentioned are found in some sequence feature section in the file. But it looks to me like there's no specialized parser for miRNA sequence per se, I'll be corrected if I'm wrong. Brian O. On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" wrote: > Hi -- > > Are there any classes for manipulating miRNA's with functions such > as parsing the name, storing and interlinking pri/pre/mat sequences, etc? > > Thanks, > > Barry > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From barry.m.dancis at gsk.com Tue Feb 7 15:26:27 2006 From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com) Date: Tue, 7 Feb 2006 15:26:27 -0500 Subject: [Bioperl-l] Handling miRNA's In-Reply-To: Message-ID: It's the parser in particular that I need "Brian Osborne" Sent by: bioperl-l-bounces at lists.open-bio.org 07-Feb-2006 12:05 To barry.m.dancis at gsk.com, "bioperl-l" , bioperl-l-bounces at lists.open-bio.org cc Subject Re: [Bioperl-l] Handling miRNA's Barry, If the sequence information is in one of the formats that Bioperl understands (Genbank, Swissprot flat, and so on) then the answer is yes. This assumes that the details on sequence that you mentioned are found in some sequence feature section in the file. But it looks to me like there's no specialized parser for miRNA sequence per se, I'll be corrected if I'm wrong. Brian O. 
On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" wrote:

> Hi --
>
> Are there any classes for manipulating miRNA's with functions such
> as parsing the name, storing and interlinking pri/pre/mat sequences, etc?
>
> Thanks,
>
> Barry
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From deep.raman at gmail.com Tue Feb 7 15:16:48 2006
From: deep.raman at gmail.com (Raman Deep Singh)
Date: Wed, 8 Feb 2006 01:46:48 +0530
Subject: [Bioperl-l] Needed help
Message-ID:

Hi all

I have a huge task: retrieving a number of sequences from the Swiss-Prot database according to some fixed criteria. For that I want to index the Swiss-Prot database on my local disk. I have downloaded the whole Swiss-Prot database (the January 2006 release) to my local disk. I am currently using BioPerl on a Linux machine.
I am using the code listed below

=======================
use Bio::Index::Swissprot;

my $Index_File_Name = shift;
my $inx = Bio::Index::Swissprot->new('-filename' => $Index_File_Name,
                                     '-write_flag' => 'WRITE');
$inx->make_index(@ARGV);
-----------------------------------------
# Print out several sequences present in the index
# in gcg format
use Bio::Index::Swissprot;
use Bio::SeqIO;

my $out = Bio::SeqIO->new( '-format' => 'gcg', '-fh' => \*STDOUT );
my $Index_File_Name = shift;
my $inx = Bio::Index::Swissprot->new('-filename' => $Index_File_Name);

foreach my $id (@ARGV) {
    my $seq = $inx->fetch($id); # Returns Bio::Seq object
    $out->write_seq($seq);
}

# alternatively
my $seq1 = $inx->get_Seq_by_id($id);
my $seq2 = $inx->get_Seq_by_acc($acc);
-------------------------------

I am running the script as

perl getseqfromid.pl sample.dat

from the shell and I am getting this error repeatedly:

------------- EXCEPTION -------------
MSG: Can't open 'DB_File' dbm file 'swiss100.dat' : No such file or directory
STACK Bio::Index::Abstract::open_dbm /usr/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:389
STACK Bio::Index::Abstract::new /usr/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150
STACK Bio::Index::AbstractSeq::new /usr/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91
STACK toplevel i.pl:6
--------------------------

Somewhere online I also found a document saying that some variables need to be exported. I did that as well but still got the same errors.

Kindly help.

Ramandeep Singh

From cjfields at uiuc.edu Tue Feb 7 17:40:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Feb 2006 16:40:15 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To:
Message-ID: <007701c62c37$7914af60$15327e82@pyrimidine>

Are you talking about sequences or text output from a specific program? If you are talking about sequences in a particular format, then listen to Brian.
If you are talking about output, then we need to know which program you're using, as a parser may exist or could be built. There are a few modules in Bio::Tools that handle RNA (like QRNA, tRNAscan-SE), so check those out first. I'm currently finishing up a Bio::Tools module for RNAMotif and have plans for making an ERPIN parser. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > barry.m.dancis at gsk.com > Sent: Tuesday, February 07, 2006 2:26 PM > To: bioperl-l; bioperl-l-bounces at lists.open-bio.org > Subject: Re: [Bioperl-l] Handling miRNA's > > It's the parser in particular that I need > > > > > "Brian Osborne" Sent by: > bioperl-l-bounces at lists.open-bio.org > 07-Feb-2006 12:05 > > To > barry.m.dancis at gsk.com, "bioperl-l" , > bioperl-l-bounces at lists.open-bio.org > cc > > Subject > Re: [Bioperl-l] Handling miRNA's > > > > > > > Barry, > > If the sequence information is in one of the formats that > Bioperl understands (Genbank, Swissprot flat, and so on) then > the answer is yes. > This assumes that the details on sequence that you mentioned > are found in some sequence feature section in the file. But > it looks to me like there's no specialized parser for miRNA > sequence per se, I'll be corrected if I'm wrong. > > Brian O. > > > On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" > wrote: > > > Hi -- > > > > Are there any classes for manipulating miRNA's with > functions > such > > as parsing the name, storing and interlinking pri/pre/mat sequences, > etc? 
> >
> > Thanks,
> >
> > Barry
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Tue Feb 7 18:06:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Feb 2006 17:06:21 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To:
Message-ID: <000001c62c3b$1c6017b0$15327e82@pyrimidine>

Sorry if this gets posted twice.

Are you talking about sequences or text output from a specific program? If you are talking about sequences in a particular format, then Brian's right.

If you are talking about output, then we need to know which program you're using, as a parser may exist, or could probably be built from an existing one. There are a few modules in Bio::Tools that handle RNA (like QRNA, tRNAscan-SE), so check those out first. I'm currently finishing up a Bio::Tools module for RNAMotif and have plans for making an ERPIN parser.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > barry.m.dancis at gsk.com > Sent: Tuesday, February 07, 2006 2:26 PM > To: bioperl-l; bioperl-l-bounces at lists.open-bio.org > Subject: Re: [Bioperl-l] Handling miRNA's > > It's the parser in particular that I need > > > > > "Brian Osborne" Sent by: > bioperl-l-bounces at lists.open-bio.org > 07-Feb-2006 12:05 > > To > barry.m.dancis at gsk.com, "bioperl-l" , > bioperl-l-bounces at lists.open-bio.org > cc > > Subject > Re: [Bioperl-l] Handling miRNA's > > > > > > > Barry, > > If the sequence information is in one of the formats that > Bioperl understands (Genbank, Swissprot flat, and so on) then > the answer is yes. > This assumes that the details on sequence that you mentioned > are found in some sequence feature section in the file. But > it looks to me like there's no specialized parser for miRNA > sequence per se, I'll be corrected if I'm wrong. > > Brian O. > > > On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" > wrote: > > > Hi -- > > > > Are there any classes for manipulating miRNA's with > functions > such > > as parsing the name, storing and interlinking pri/pre/mat sequences, > etc? 
> > > > Thanks, > > > > Barry > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From paul.boutros at utoronto.ca Tue Feb 7 20:38:42 2006 From: paul.boutros at utoronto.ca (Paul Boutros) Date: Tue, 7 Feb 2006 20:38:42 -0500 Subject: [Bioperl-l] (no subject) Message-ID: <1139362722.43e94ba29ebcc@webmail.utoronto.ca> Hi Roger, I would definitely prefer a fully Perl-based implementation. For starters, I have not been successful in compiling the Toolkit that contains netblast for some platforms (e.g. AIX 5.2 w/gcc 4.0). I haven't been following the discussion: is there some compelling reason to prefer a netblast-based system that's come up recently? I'm guessing that adding a new non-perl dependency would only be done if there was considerable justification for this type of change, but I'm not clear from your message what that justification is. Paul ------------------------------ Message: 12 Date: Mon, 6 Feb 2006 20:46:44 -0600 From: "Roger Hall" Subject: [Bioperl-l] RemoteBlast users - potentially major changes - please reply To: Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL> Content-Type: text/plain; charset="us-ascii" To everyone who uses RemoteBlast.pm: Would anyone object to RemoteBlast being rewritten in a way that requires NCBI's blastcl3 executable? Binary downloads of blastcl3 (column "netblast") are available for numerous platforms at: http://ncbi.nih.gov/BLAST/download.shtml Does anyone require or desire a "pure perl" implementation? 
If so, please explain the advantage you see with such an implementation. Thanks! Roger Hall Technical Director MidSouth Bioinformatics Center University of Arkansas at Little Rock (501) 569-8074 From cjfields at uiuc.edu Tue Feb 7 23:52:36 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 7 Feb 2006 22:52:36 -0600 Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) Message-ID: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu> I want to submit a module for parsing RNAMotif output (Bio::Tools::RNAMotif). It is capable, at the moment, of scanning output and returning Bio::SeqFeature::Generic objects with added tags for descriptors/sequences/file info. I'm in the process of writing up tests and going through biodesign to make sure everything's kosher, but the module itself is essentially ready-to-go. What should I do next? Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From rahall2 at ualr.edu Wed Feb 8 00:16:44 2006 From: rahall2 at ualr.edu (Roger Hall) Date: Tue, 7 Feb 2006 23:16:44 -0600 Subject: [Bioperl-l] RemoteBlast [was: (no subject)] In-Reply-To: <1139362722.43e94ba29ebcc@webmail.utoronto.ca> Message-ID: <004401c62c6e$da906a40$4301a8c0@LIBERAL> Paul, I think that most core Bioperl folks have long since moved away from RemoteBlast and are using the functionality in StandAloneBlast to run their own local servers. More importantly, they are, in general, researchers who are coming to Bioinformatics from the life sciences side, and are particularly tired of dealing with the technical issues that RemoteBlast consistently generates due to changes in the text-formatted BLAST reports. They aren't code-for-code-sake geeks like me. ;} When RemoteBlast was written, XML was barely on the technology radar, and XML-formatted BLAST reports weren't even available. 
It seems that everyone recognizes that the XML reports now generated by NCBI's blast server are the wave of the future, but I think there is still some concern that not every flavor of BLAST produces XML yet. Even so, the XML parser is considered to be very strong, and only helps hasten the end of text-formatted support, since parsing text-formatted reports is the primary source of pain.

In discussing the shift from old to new, I think the idea of relying on NCBI's application (and NCBI's issue system and NCBI's developers) entered the realm of possibility, so as the guy who just showed up to adopt RemoteBlast, I am trying to air all options and beg for all requirements.

Personally, I am okay with the idea of maintaining text-formatted report parsing, but like I said, I'm pound foolish about code sometimes. Additional foolishness arises from the fact that the first money I earned in Bioinformatics was on a contract gig where I relied on RemoteBlast (and the related text parsers).

For my money, I just needed anyone, anywhere, to say they desired a pure perl implementation to meet my personal threshold. So far, you're the second. ;}

I do, however, see the advantage in shifting to XML-formatted reporting and parsing *only* as soon as every BLAST flavor supports it, if not before. (Anyone - is this still an issue? Please educate me.)

At the moment, I'm leaning towards adding an option to RemoteBlast. The default (no option) would use a "pure perl" implementation, and the enhancement (with explicit option) would merely wrap the NCBI executable. However, there are other issues (queuing, batches) that I don't fully understand in context, so I haven't zeroed in on a complete recommendation yet. Additionally, the end of text-formatted reports, while drawing near, is not yet agreed, although it is pretty clear that the only way text support will be continued is if I insist on it and then deliver the support myself.
:} In any case, I am very interested in a pure perl implementation for exactly the two reasons stated thus far: it's one less thing for a newbie to worry about, and it will run on every platform that runs perl. Thanks much for the input! Roger Hall Technical Director MidSouth Bioinformatics Center University of Arkansas at Little Rock (501) 569-8074 -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Paul Boutros Sent: Tuesday, February 07, 2006 7:39 PM To: BioPerl Mailing List Cc: Roger Hall Subject: [Bioperl-l] (no subject) Hi Roger, I would definitely prefer a fully Perl-based implementation. For starters, I have not been successful in compiling the Toolkit that contains netblast for some platforms (e.g. AIX 5.2 w/gcc 4.0). I haven't been following the discussion: is there some compelling reason to prefer a netblast-based system that's come up recently? I'm guessing that adding a new non-perl dependency would only be done if there was considerable justification for this type of change, but I'm not clear from your message what that justification is. Paul ------------------------------ Message: 12 Date: Mon, 6 Feb 2006 20:46:44 -0600 From: "Roger Hall" Subject: [Bioperl-l] RemoteBlast users - potentially major changes - please reply To: Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL> Content-Type: text/plain; charset="us-ascii" To everyone who uses RemoteBlast.pm: Would anyone object to RemoteBlast being rewritten in a way that requires NCBI's blastcl3 executable? Binary downloads of blastcl3 (column "netblast") are available for numerous platforms at: http://ncbi.nih.gov/BLAST/download.shtml Does anyone require or desire a "pure perl" implementation? If so, please explain the advantage you see with such an implementation. Thanks! 
Roger Hall Technical Director MidSouth Bioinformatics Center University of Arkansas at Little Rock (501) 569-8074 _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From heikki at sanbi.ac.za Wed Feb 8 01:53:58 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 8 Feb 2006 08:53:58 +0200 Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) In-Reply-To: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu> References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu> Message-ID: <200602080853.58889.heikki@sanbi.ac.za> Chris, Post your files to bugzilla (ticket type enhancement, add files to ticket after creation) and someone with commit ability will add them to CVS once the code is in satisfactory condition. Thanks, -Heikki On Wednesday 08 February 2006 06:52, Chris Fields wrote: > I want to submit a module for parsing RNAMotif output > (Bio::Tools::RNAMotif). It is capable, at the moment, of scanning > output and returning Bio::SeqFeature::Generic objects with added tags > for descriptors/sequences/file info. I'm in the process of writing > up tests and going through biodesign to make sure everything's > kosher, but the module itself is essentially ready-to-go. What > should I do next? > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From hlapp at gmx.net Wed Feb 8 00:48:40 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 7 Feb 2006 21:48:40 -0800 Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) In-Reply-To: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu> References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu> Message-ID: I presume you don't have a cvs write account yet - if you do just add and commit the module and test. Otherwise could you post the POD to the list please; either somebody with an account will hopefully volunteer or Jason or I or Heikki or Aaron will assume mentorship and commit the code with feedback to you. Unless you completely refuse to heed any and all advice ;) that person will then soon try to absolve him/herself of having to do this again for you and support you for receiving a cvs write account of your own. -hilmar On 2/7/06, Chris Fields wrote: > I want to submit a module for parsing RNAMotif output > (Bio::Tools::RNAMotif). It is capable, at the moment, of scanning > output and returning Bio::SeqFeature::Generic objects with added tags > for descriptors/sequences/file info. I'm in the process of writing > up tests and going through biodesign to make sure everything's > kosher, but the module itself is essentially ready-to-go. What > should I do next? 
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>

--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------

From cjfields at uiuc.edu Wed Feb 8 07:57:46 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 06:57:46 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To:
References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
Message-ID:

I'll probably go with Heikki's advice and post the module (with POD, tests, and test file) to bugzilla as an enhancement. That way it can be looked through before committing. I will likely have a few more modules for ERPIN and maybe Infernal in the next few months (if I can get it up and running).

Also, completely off-topic, I'll post what I have written up for installing bioperl-db on WinXP here soon. I think it should probably be included in the wiki in some way, maybe as a link from the bioperl-db wiki page.

Thanks Hilmar, Heikki!

Chris

On Feb 7, 2006, at 11:48 PM, Hilmar Lapp wrote:

> I presume you don't have a cvs write account yet - if you do just add
> and commit the module and test. Otherwise could you post the POD to
> the list please; either somebody with an account will hopefully
> volunteer or Jason or I or Heikki or Aaron will assume mentorship and
> commit the code with feedback to you. Unless you completely refuse to
> heed any and all advice ;) that person will then soon try to absolve
> him/herself of having to do this again for you and support you for
> receiving a cvs write account of your own.
> > -hilmar > > On 2/7/06, Chris Fields wrote: >> I want to submit a module for parsing RNAMotif output >> (Bio::Tools::RNAMotif). It is capable, at the moment, of scanning >> output and returning Bio::SeqFeature::Generic objects with added tags >> for descriptors/sequences/file info. I'm in the process of writing >> up tests and going through biodesign to make sure everything's >> kosher, but the module itself is essentially ready-to-go. What >> should I do next? >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > > -- > ---------------------------------------------------------- > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > ---------------------------------------------------------- > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Wed Feb 8 10:32:25 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 8 Feb 2006 09:32:25 -0600 Subject: [Bioperl-l] RemoteBlast [was: (no subject)] In-Reply-To: <004401c62c6e$da906a40$4301a8c0@LIBERAL> Message-ID: <000401c62cc4$de0cc9b0$15327e82@pyrimidine> Roger, It might be better to build a wrapper for the blastcl3 and make it a separate Bio::Tools::Run module, maybe branch it off from RemoteBlast or, better yet, StandAloneBlast. All the put/get parameters in the BEGIN{} block for RemoteBlast look like they are configured for NCBI's HTTP submission via CGI; I don't think you can use these for blastcl3. 
Ergo, you'll have to create a whole new set of hashes or parameter arrays inside RemoteBlast just for blastcl3 since everything is passed via command-line flags, like so (from http://www.ncbi.nlm.nih.gov/blast/docs/netblast.html):

blastcl3 -p blastp -d nr -i MY_QUERY -o MY_QUERY.out

However, StandAloneBlast looks like it has all the parameters mapped out in the BEGIN{} block. And it looks like the command line options support just about everything you get via the web version. It probably wouldn't take much modification from StandAloneBlast to get it to run blastcl3.

As for queueing, I don't think it's supported, though you can send in a FASTA file with multiple sequences for multiple BLAST queries (I tried this and it works). You could also create a queue using a sequence factory, sending them to the netblast client one at a time, though I'd suggest putting a delay in between cycles in that case so as not to make the guys at NCBI cranky.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Roger Hall
> Sent: Tuesday, February 07, 2006 11:17 PM
> To: Paul.Boutros at utoronto.ca; 'BioPerl Mailing List'
> Subject: Re: [Bioperl-l] RemoteBlast [was: (no subject)]
>
> Paul,
>
> I think that most core Bioperl folks have long since moved
> away from RemoteBlast and are using the functionality in
> StandAloneBlast to run their own local servers. More
> importantly, they are, in general, researchers who are coming
> to Bioinformatics from the life sciences side, and are
> particularly tired of dealing with the technical issues that
> RemoteBlast consistently generates due to changes in the
> text-formatted BLAST reports.
>
> They aren't code-for-code-sake geeks like me.
;} > > When RemoteBlast was written, XML was barely on the > technology radar, and XML-formatted BLAST reports weren't > even available. It seems that everyone recognizes that the > XML reports now generated by NCBI's blast server is the wave > of the future, but I think there is still some concern that > not every flavor of BLAST produces XML yet. Even so, the XML > parser is considered to be very strong, and only helps hasten > the end of text-formatted support, since parsing > text-formatted reports is the primary source of pain. > > In discussing the shift from old to new, I think the idea of > relying on NCBI's application (and NCBI's issue system and > NCBI's developers) entered the realm of possibility, so as > the guy who just showed up to adopt RemoteBlast, I am trying > to air all options and beg for all requirements. > > Personally, I am okay with the idea of maintaining > text-formatted report parsing, but like I said, I'm pound > foolish about code sometimes. Additional foolishness arises > from the fact that the first money I earned in Bioinformatics > was on a contract gig where I relied on RemoteBlast (and the > related text parsers). > > For my money, I just needed anyone, anywhere, to say they > desired a pure perl implementation to meet my personal > threshold. So far, you're the second. ;} > > I do, however, see the advantage in shifting to XML-formatted > reporting and parsing *only* as soon as every BLAST flavor > supports it, if not before. > (Anyone - is this still an issue. Please educate me.) > > At the moment, I'm leaning towards adding an option to > RemoteBlast. The default (no option) would use a "pure perl" > implementation, and the enhancement (with explicit option) > would merely wrap the NCBI executable. > However, there are other issues (queuing, batches) that I > don't fully understand in context, so I haven't zeroed in on > a complete recommendation yet. 
Additionally, the end of > text-formatted reports, while drawing near, is not yet > agreed, although it is pretty clear that the only way text > support will be continued is if I insist on it and then > deliver the support myself. > :} > > In any case, I am very interested in a pure perl > implementation for exactly the two reasons stated thus far: > it's one less thing for a newbie to worry about, and it will > run on every platform that runs perl. > > Thanks much for the input! > > Roger Hall > Technical Director > MidSouth Bioinformatics Center > University of Arkansas at Little Rock > (501) 569-8074 > > > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Paul Boutros > Sent: Tuesday, February 07, 2006 7:39 PM > To: BioPerl Mailing List > Cc: Roger Hall > Subject: [Bioperl-l] (no subject) > > Hi Roger, > > I would definitely prefer a fully Perl-based implementation. > For starters, I have not been successful in compiling the > Toolkit that contains netblast for some platforms (e.g. > AIX 5.2 w/gcc 4.0). > > I haven't been following the discussion: is there some > compelling reason to prefer a netblast-based system that's > come up recently? I'm guessing that adding a new non-perl > dependency would only be done if there was considerable > justification for this type of change, but I'm not clear from > your message what that justification is. > > Paul > > > > ------------------------------ > > Message: 12 > Date: Mon, 6 Feb 2006 20:46:44 -0600 > From: "Roger Hall" > Subject: [Bioperl-l] RemoteBlast users - potentially major changes - > please reply > To: > Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL> > Content-Type: text/plain; charset="us-ascii" > > To everyone who uses RemoteBlast.pm: > > Would anyone object to RemoteBlast being rewritten in a way > that requires NCBI's blastcl3 executable? 
> > Binary downloads of blastcl3 (column "netblast") are > available for numerous platforms at: > http://ncbi.nih.gov/BLAST/download.shtml > > Does anyone require or desire a "pure perl" implementation? > If so, please explain the advantage you see with such an > implementation. > > Thanks! > > > Roger Hall > > Technical Director > > MidSouth Bioinformatics Center > > University of Arkansas at Little Rock > > (501) 569-8074 > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hubert.prielinger at gmx.at Wed Feb 8 15:51:41 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Wed, 08 Feb 2006 14:51:41 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output Message-ID: <43EA59DD.1030608@gmx.at> Hi, If I want to parse a Blast Output (Version 2.2.12) with Bio::SearchIO, I get the following error message: MSG: no data for midline Query 1 WWWKWRW 7 STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 STACK toplevel /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 is that a bug...... If I want to parse Blast Output (version 2.2.13), I don't get anything..... 
I'm using bioperl 1.4.

Before I installed bioperl 1.4, parsing Blast output (version 2.2.12)
worked fine, but I don't remember which bioperl version I had installed
then.

thanks in advance

Hubert

From cjfields at uiuc.edu Wed Feb 8 17:15:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 16:15:23 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
In-Reply-To: <43EA59DD.1030608@gmx.at>
Message-ID: <001101c62cfd$28605df0$15327e82@pyrimidine>

My guess is you're running into text parsing problems in
Bio::SearchIO::blast. Upgrade to the latest developer version (1.5.1) or
bioperl-live (CVS), then see the bug below.

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

I think the first problem you ran into is solved in bioperl 1.5.1; the last
problem (more recent, not related to the first) has been fixed but hasn't
been committed to bioperl-live yet. The fixed SearchIO::blast is available
in the link above, but realize it hasn't been committed yet and may change.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From hubert.prielinger at gmx.at Wed Feb 8 16:41:04 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Wed, 08 Feb 2006 15:41:04 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
In-Reply-To: <001101c62cfd$28605df0$15327e82@pyrimidine>
References: <001101c62cfd$28605df0$15327e82@pyrimidine>
Message-ID: <43EA6570.9070909@gmx.at>

hi chris,
thanks, I have upgraded to version 1.5.1 but it still isn't working. Do you
have any other idea? The problem is that I have to parse a lot of text
files....
or shall I look for another option to parse those files...

regards
Hubert

From cjfields at uiuc.edu Wed Feb 8 18:00:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 17:00:21 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
In-Reply-To: <43EA6570.9070909@gmx.at>
Message-ID: <001201c62d03$703178c0$15327e82@pyrimidine>

Make sure you ran a full installation of bioperl-1.5.1 or bioperl-live (not
just the modules you want; mixing bioperl versions might work, but you might
run into interoperability problems). Then replace the Bio::SearchIO::blast
with the one in Bugzilla. The 'other option' you mentioned might be trying
XML instead of text, which is more stable in the long run.
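
As a minimal sketch of that XML route, assuming the reports are regenerated as BLAST XML (for example with blastall -m 7), that XML::SAX is installed for Bio::SearchIO's 'blastxml' format, and that report file names are passed on the command line (the sub name print_short_hits and the tab-separated output are illustrative choices, not part of BioPerl):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative helper (not a BioPerl function): print query name, hit name
# and subject string for every HSP whose subject length is <= $cutoff_len.
sub print_short_hits {
    my ($report_file, $cutoff_len) = @_;

    # Loaded at run time so the script still compiles where BioPerl is missing.
    require Bio::SearchIO;

    # 'blastxml' selects the XML parser instead of the plain-text one.
    my $search = Bio::SearchIO->new(-format => 'blastxml',
                                    -file   => $report_file);

    while (my $result = $search->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                next if $hsp->length('hit') > $cutoff_len;
                print join("\t", $result->query_name, $hit->name,
                           $hsp->hit_string), "\n";
            }
        }
    }
    return;
}

# Each command-line argument is taken to be one XML report.
print_short_hits($_, 10) for @ARGV;
```

The iteration (next_result, next_hit, next_hsp) is the same as for text reports; only the -format value changes, so an existing text-parsing loop should carry over largely unchanged.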
You will still need to run a full upgrade to bioperl 1.5.1 for that; make
sure you read this:

http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast

If you're using SearchIO directly instead of RemoteBlast, you should be able
to set the '-format' flag to 'blastxml'.

It also wouldn't hurt to know what OS you're using or see some code. Roger
is out there somewhere (I think) and may also have some input.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From hubert.prielinger at gmx.at Wed Feb 8 17:22:44 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Wed, 08 Feb 2006 16:22:44 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
In-Reply-To: <001201c62d03$703178c0$15327e82@pyrimidine>
References: <001201c62d03$703178c0$15327e82@pyrimidine>
Message-ID: <43EA6F34.4090007@gmx.at>

hi,
I installed Core, Run and Ext from the following page:
http://news.open-bio.org/archives/2005_10.html
I'm using only SearchIO, without the RemoteBlast module, because I already
have all my Blast output files.
My operating system is fedora core 4.

Code:

#!/usr/bin/perl -w

use Bio::SearchIO;

print "start program\n";
my $directory =
    "/home/Hubert/installed/eclipse/workspace/Database_Search/result_4";
opendir(DIR, $directory) || die("Cannot open directory");
print "opened directory\n";

foreach my $file (readdir(DIR)) {
    print "read file\n";

    my $search = new Bio::SearchIO (-format => 'blast',
                                    -file   => $file);

    my $cutoff_len = 10;

    #iterate over each query sequence
    while (my $result = $search->next_result) {
        print "entered 1st while loop\n";

        #iterate over each hit on the query sequence
        while (my $hit = $result->next_hit) {

            #iterate over each HSP in the hit
            while (my $hsp = $hit->next_hsp) {

                if ($hsp->length('sbjct') <= $cutoff_len) {
                    #print $hsp->hit_string, "\n";
                    for ($hsp->hit_string) {

                        if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 ||
                            tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {

                            # Print some tab-delimited data about this HSP

                            open (bigShot, ">>BlastOutputTrial.txt") ||
                                die ("Could not open file. $!");
                            #print $result->query_name, "\t";
                            # print $hit->significance, "\t";
                            print bigShot $hit->name, "-->";
                            print bigShot $hit->description, "\n";
                            #print bigShot "Query: ", $hsp->start('query'), " ", $hsp->query_string, " ", $hsp->end('query'), "\n";
                            print bigShot "Seq: ", $hsp->start('hit'), " ", $hsp->hit_string, " ", $hsp->end('hit'), "\n";
                            # print $hsp->rank, "\t";
                            # print $hsp->percent_identity, "\t";
                            # print $hsp->evalue, "\t";
                            # print $hsp->hsp_length, "\n";

                            close (bigShot);

                        };
                    }
                }
            }
        }
    }

}

closedir(DIR);

From rahall2 at ualr.edu Wed Feb 8 18:34:45 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Wed, 8 Feb 2006 17:34:45 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
In-Reply-To: <43EA6F34.4090007@gmx.at>
Message-ID: <000401c62d08$3ede6b70$4301a8c0@LIBERAL>

Hubert,

Give me a bit to look over your code and think this through. I am still
re-familiarizing myself with the relevant modules, so I can't give an answer
off the top of my head.
Also, please send me one or more of your blast reports (zipped) if you don't
mind (and maybe avoid including the list in your reply). Let's take this
"offline" relative to the list - we'll include the list again if there is a
Bioperl issue and solution. (In case you are concerned at all, I promise not
to share or study the actual BLAST results.)

I'm not particularly familiar with the Fedora distributions, but I'm sure I
can either chase down the perl problem or at least eliminate everything else
but Fedora as the culprit. ;}

(Chris - I'm not quite paying attention on an hourly basis yet, but I do
intend to help support these issues for the foreseeable future. Thanks as
always for the assist.)

Thanks!

Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock
(501) 569-8074

From injunjoel at hotmail.com Wed Feb 8 19:54:26 2006
From: injunjoel at hotmail.com (Joel Steele)
Date: Wed, 08 Feb 2006 16:54:26 -0800
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blastoutput
In-Reply-To: <43EA6F34.4090007@gmx.at>
Message-ID:

Greetings,
I'm not well versed in Bio::SearchIO, but I have a few comments about your
code that may or may not be relevant...

first thing:

=-=-=-=-=code snippet=-=-=-=-=

#!/usr/bin/perl -w
use strict; #save yourself the headaches and force yourself to write clean code.

=-=-=-=-=code snippet=-=-=-=-=

next thing:
when you are reading the files from the directory you are not doing any sort
of filtering as to what is returned. If you are on a Unix flavored system
you may be getting the '.' and '..' entries from your readdir(DIR) call. I
would suggest placing a grep in there somewhere to get only blast files.
something like:

=-=-=-=-=code snippet=-=-=-=-=

#assuming the file extension for blast files is .bls
#the -e and -f are filetests; you could probably get away with just
#-f. Here is a link for reference on the filetests available in Perl.
#
# http://www.perlmonks.org/?node_id=370

my @files_to_parse = grep{/\w+\.bls/ && -e && -f} readdir(DIR);
closedir(DIR);

#then proceed with your foreach but over @files_to_parse

foreach my $file(@files_to_parse){
    #do cool stuff here...
}

=-=-=-=-=code snippet=-=-=-=-=

Hope that helps.
-Joel Steele

"The surest way to corrupt a youth is to instruct him to hold in higher
regard those who think alike than those who think differently." -Nietzsche

"I do not feel obliged to believe that the same God who endowed us with
sense, reason and intellect has intended us to forego their use." -Galileo

From saldroubi at yahoo.com Wed Feb 8 20:12:16 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Wed, 8 Feb 2006 17:12:16 -0800 (PST)
Subject: [Bioperl-l] Documentation link?
Message-ID: <20060209011216.39949.qmail@web34311.mail.mud.yahoo.com>

All,

Forgive me but I don't see the documentation link on the new website. I
only see a link to the HOWTO's. I think I am looking for the Pdoc link.

Thank you.

Sincerely,
Sam Al-Droubi, M.S.
saldroubi at yahoo.com

From saldroubi at yahoo.com Wed Feb 8 20:24:23 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Wed, 8 Feb 2006 17:24:23 -0800 (PST)
Subject: [Bioperl-l] Count or weight matrix in bioperl?
Message-ID: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com>

All,

Say I have an array of nucleotide sequences of length N. I want to
calculate the count matrix (weight matrix). That is, for each position 1..N,
I want to know how many As, Cs, Ts and Gs there are. Is the code to build
this matrix already written in bioperl, if I pass it those strings?
Please excuse my lack of knowledge as I am a newcomer to bioinformatics.

Thank you.

Sincerely,
Sam Al-Droubi, M.S.
saldroubi at yahoo.com

From osborne1 at optonline.net Wed Feb 8 20:44:56 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 08 Feb 2006 20:44:56 -0500
Subject: [Bioperl-l] Documentation link?
In-Reply-To: <20060209011216.39949.qmail@web34311.mail.mud.yahoo.com>
Message-ID:

Sam,

http://bioperl.open-bio.org/wiki/Main_Page

Look for the API Docs under "main links".

Brian O.

From torsten.seemann at infotech.monash.edu.au Wed Feb 8 21:54:39 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 09 Feb 2006 13:54:39 +1100
Subject: [Bioperl-l] Count or weight matrix in bioperl?
In-Reply-To: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com>
References: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com>
Message-ID: <43EAAEEF.3000304@infotech.monash.edu.au>

> Say I have an array of nucleotide sequences of length N. I want to
> calculate the count matrix (weight matrix). That is, for each position 1..N,
> I want to know how many As, Cs, Ts and Gs there are. Is the code to build
> this matrix already written in bioperl, if I pass it those strings?
> Please excuse my lack of knowledge as I am a newcomer to bioinformatics.

Use the Bio::Tools::SeqStats module. The PDoc documentation even has an
example similar to what you want to do:

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html

--Torsten Seemann

From cjfields at uiuc.edu Thu Feb 9 00:07:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 23:07:15 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blastoutput
In-Reply-To:
References:
Message-ID:

On Feb 8, 2006, at 6:54 PM, Joel Steele wrote:

> Greetings,
> Im not well versed in Bio::SearchIO but there are a few comments about your
> code that may or may not be relevant...
>
> first thing:
>
> =-=-=-=-=code snippet=-=-=-=-=
>
> #!/usr/bin/perl -w
> use strict; #save yourself the headaches and force yourself to write clean code.
>
> =-=-=-=-=code snippet=-=-=-=-=
>

Tread very carefully here. Just about every book on perl suggests 'use
strict' and adding warnings for code development (ex.
the Camel, the Llama, and others); in fact, these are the very books most beginners start from. Some would consider NOT using -w or 'use strict' a bad habit; everybody has an opinion (I would repeat an oft- heard Texas saying, but I'll refrain). Just remember: try to be a little more constructive in your critique and insert a little less about your personal coding style. If you hit the wrong person, you might get flamed. Here's a link that may help a bit here: http://bioperl.org/Core/Latest/ biodesign.html#respect_people_s_code__in_particular_if_it_works_ > next thing: > when you are reading the files from the directory you are not doing > any sort > of filtering as to what is returned. If you are on a Unix flavored > system > you may be getting the '.' and '..' entries from your readdir(DIR) > call. I > would suggest placing a grep in there somewhere to get only blast > files. > something like: > I agree here. You could probably also use something like File::Find here to make things a bit easier with the file names as well; works wonderfully, esp. when traversing a directory tree. > =-=-=-=-=code snippet=-=-=-=-= > > #assuming the file extension for blast files is .bls > #the -e and -f are filetests; you could probably get away with just > #-f. Here is a link for reference on the filetests available in Perl. > # > # http://www.perlmonks.org/?node_id=370 > > my @files_to_parse = grep{/\w+\.bls/ && -e && -f} readdir(DIR); > closedir(DIR); > > #then proceed with your foreach but over @files_to_parse > > foreach my $file(@files_to_parse){ > #do cool stuff here... > } > Again, agreed. But, does it really solve the main problem, which is an issue with SearchIO::blast? It seemed to try parsing a blast file... > =-=-=-=-=code snippet=-=-=-=-= > > Hope that helps. > -Joel Steele > > > "The surest way to corrupt a youth is to instruct him to hold in > higher > regard those who think alike than those who think differently." 
> - Nietzsche
>
> "I do not feel obliged to believe that the same God who endowed us with
> sense, reason and intellect has intended us to forego their use." - Galileo
>
>> From: Hubert Prielinger
>> To: Chris Fields , bioperl-l at bioperl.org, rahall2 at ualr.edu
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blastoutput
>> Date: Wed, 08 Feb 2006 16:22:44 -0600
>>
>> hi,
>> I have installed from the following page:
>> http://news.open-bio.org/archives/2005_10.html, the Core, Run and Ext.
>> I'm using only the SearchIO without remoteblast module, because I have
>> already all my Blast output files.
>> My operating system is Fedora Core 4.
>>
>> Code:
>>
>> #!/usr/bin/perl -w
>>
>> use Bio::SearchIO;
>>
>> print "start program\n";
>> my $directory =
>>   "/home/Hubert/installed/eclipse/workspace/Database_Search/result_4";
>> opendir(DIR, $directory) || die("Cannot open directory");
>> print "opened directory\n";
>>
>> foreach my $file (readdir(DIR)) {
>>   print "read file\n";
>>
>>   my $search = new Bio::SearchIO (-format => 'blast',
>>                                   -file   => $file);
>>
>>   my $cutoff_len = 10;
>>
>>   #iterate over each query sequence
>>   while (my $result = $search->next_result) {
>>     print "entered 1st while loop\n";
>>
>>     #iterate over each hit on the query sequence
>>     while (my $hit = $result->next_hit) {
>>
>>       #iterate over each HSP in the hit
>>       while (my $hsp = $hit->next_hsp) {
>>
>>         if ($hsp->length('sbjct') <= $cutoff_len) {
>>           #print $hsp->hit_string, "\n";
>>           for ($hsp->hit_string) {
>>
>>             if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 ||
>>                 tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {
>>
>>               # Print some tab-delimited data about this HSP
>>
>>               open (bigShot, ">>BlastOutputTrial.txt") ||
>>                 die ("Could not open file. $!");
>>               #print $result->query_name, "\t";
>>
>>               # print $hit->significance, "\t";
>>               print bigShot $hit->name, "-->";
>>               print bigShot $hit->description, "\n";
>>               #print bigShot "Query: ", $hsp->start('query'), " ", $hsp->query_string, " ", $hsp->end('query'), "\n";
>>               print bigShot "Seq: ", $hsp->start('hit'),
>>                 " ", $hsp->hit_string, " ", $hsp->end('hit'), "\n";
>>
>>               # print $hsp->rank, "\t";
>>               # print $hsp->percent_identity, "\t";
>>               # print $hsp->evalue, "\t";
>>               # print $hsp->hsp_length, "\n";
>>
>>               close (bigShot);
>>
>>             };
>>
>>           }
>>         }
>>       }
>>     }
>>   }
>>
>> }
>>
>> closedir(DIR);
>>
>>
>> Chris Fields wrote:
>>
>>> Make sure you ran a full installation of bioperl-1.5.1 or bioperl-live (not
>>> just the modules you want; mixing bioperl versions might work, but you might
>>> run into interoperability problems). Then replace the Bio::SearchIO::blast
>>> with the one in Bugzilla. The 'other option' you mentioned might be trying
>>> XML instead of text, which is more stable in the long run. You will still
>>> need to run a full upgrade to bioperl 1.5.1 for that; make sure you read
>>> this:
>>>
>>> http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast
>>>
>>> If you're using SearchIO directly instead of Remoteblast, you should be able
>>> to set the '-readmethod' flag to 'blastxml'.
>>>
>>> It also wouldn't hurt to know what OS you're using or see some code. Roger
>>> is out there somewhere (I think) and may also have some input.
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept.
of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> >>> >>>> -----Original Message----- >>>> From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at] >>>> Sent: Wednesday, February 08, 2006 3:41 PM >>>> To: Chris Fields; bioperl-l at bioperl.org >>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>> parsing Blast output >>>> >>>> hi chris, >>>> thanks, I have upgraded to version 1.5.1 but it isn't still >>>> working, do you have any ohter idea, the problem I have is >>>> that I have to parse a lot of textfiles.... >>>> or shall I look for another option to parse those files... >>>> >>>> regards >>>> Hubert >>>> >>>> >>>> >>>> Chris Fields wrote: >>>> >>>> >>>> >>>>> My guess is you're running into text parsing problems in >>>>> Bio::SearchIO::blast. Upgrade to the latest developer >>>>> >>>>> >>>> version (1.5.1) >>>> >>>> >>>>> or bioperl-live (CVS), then see the bug below. >>>>> >>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>> >>>>> I think the first problem you ran into is solved in bioperl >>>>> >>>>> >>>> 1.5.1, the >>>> >>>> >>>>> last problem (more recent, not related to the first) has >>>>> >>>>> >>>> been fixed but >>>> >>>> >>>>> hasn't been committed to bioperl-live yet. The fixed >>>>> >>>>> >>>> SearchIO::blast >>>> >>>> >>>>> is available in the link above, but realize it hasn't been >>>>> >>>>> >>>> committed yet and may change. >>>> >>>> >>>>> Christopher Fields >>>>> Postdoctoral Researcher - Switzer Lab >>>>> Dept. 
of Biochemistry >>>>> University of Illinois Urbana-Champaign >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: bioperl-l-bounces at lists.open-bio.org >>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>>> Prielinger >>>>>> Sent: Wednesday, February 08, 2006 2:52 PM >>>>>> To: bioperl-l at bioperl.org >>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>> >>>>>> >>>> parsing Blast >>>> >>>> >>>>>> output >>>>>> >>>>>> Hi, >>>>>> If I want to parse a Blast Output (Version 2.2.12) with >>>>>> >>>>>> >>>> Bio::SearchIO, >>>> >>>> >>>>>> I get the following error message: >>>>>> >>>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>>> STACK Bio::SearchIO::blast::next_result >>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>> STACK toplevel >>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>> Blast.pl:21 >>>>>> >>>>>> is that a bug...... >>>>>> >>>>>> If I want to parse Blast Output (version 2.2.13), I don't get >>>>>> anything..... 
>>>>>> I'm using bioperl 1.4 >>>>>> >>>>>> before, I have installed bioperl 1.4, it worked fine parsing >>>>>> Blast >>>>>> Output (version 2.2.12), but I don't remember which bioperl >>>>>> >>>>>> >>>> version I >>>> >>>> >>>>>> had installed >>>>>> >>>>>> thanks in advance >>>>>> >>>>>> Hubert >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> >>> >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From golharam at umdnj.edu Wed Feb 8 23:46:43 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Wed, 08 Feb 2006 23:46:43 -0500 Subject: [Bioperl-l] Tool to mutate DNA sequence Message-ID: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> Does anyone know of tool to mutate a DNA sequence by a specified amount? For instance, say I have a DNA sequence 1000 bases long, and I want to simulate mutations to make it 75% (or 80%, etc) similar to the original. Ryan From torsten.seemann at infotech.monash.edu.au Thu Feb 9 06:15:28 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 09 Feb 2006 22:15:28 +1100 Subject: [Bioperl-l] Tool to mutate DNA sequence In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> Message-ID: <43EB2450.6000606@infotech.monash.edu.au> Ryan, > Does anyone know of tool to mutate a DNA sequence by a specified amount? 
> For instance, say I have a DNA sequence 1000 bases long, and I want to > simulate mutations to make it 75% (or 80%, etc) similar to the original. The EMBOSS suite comes with a tool called "msbar" which can controllably mutate sequences: http://emboss.sourceforge.net/apps/msbar.html -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ From cjfields at uiuc.edu Thu Feb 9 11:16:28 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 9 Feb 2006 10:16:28 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <57361AED-AFEA-4927-AF29-9944E7F0895B@duke.edu> Message-ID: <001b01c62d94$2e8bee50$15327e82@pyrimidine> > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at duke.edu] > Sent: Thursday, February 09, 2006 9:13 AM > To: Hubert Prielinger > Cc: Chris Fields; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > parsing Blast output > > On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: > > hi chris, > > thanks, I have upgraded to version 1.5.1 but it isn't still > working, > > do you have any ohter idea, the problem I have is that I > have to parse > > a lot of textfiles.... > > or shall I look for another option to parse those files... > > > > regards > > Hubert > > > The code from Bioperl 1.5.1 works fine for me for blast > 2.2.13 reports but unless you post your blast report we can't > really determine the problem. > > If you are still getting the same error like this I am not > convinced you have upgraded to 1.5.1 which includes a fix in > the fact that NCBI changed the HSP result format to remove > the ':' from the Query/Sbjct prefixes. We fixed this as soon > as it was apparent sometime in September. 
> > >>> MSG: no data for midline Query 1 WWWKWRW 7 > >>> STACK Bio::SearchIO::blast::next_result > >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>> STACK toplevel > >>> > /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > > If you are just getting no results but also no warnings wrt > parsing, are you sure your logic is correct? > > If you remove your filters do you see all the HSPS? > > > while (my $result = $search->next_result) { > print $result->query_name, "\n"; > #iterate over each hit on the query sequence > while (my $hit = $result->next_hit) { > print $hit->name, "\n"; > #iterate over each HSP in the hit > while (my $hsp = $hit->next_hsp) { > print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp- > >hit_string, "\n"; > } > } > } I tested some of the BLAST results that Hubert sent Roger and me with a similar script to the above. I removed the file parsing logic and it seemed to work just fine. It may very well be a logic issue or that he hasn't installed the latest fix. It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), even though the returned output was from nr, the top of the blast output showed that it was v2.2.12: BLASTP 2.2.12 [Aug-07-2005] I double-checked my local version and it's definitely v.2.2.13: ------------------------------------- C:\Perl\Scripts>blastcl3 - blastcl3 2.2.13 arguments:... ------------------------------------- If you use RemoteBlast using the same settings, the version in the header looks like this: BLASTP 2.2.13 [Nov-27-2005] I'm wondering if all the blast executables (blast and netblast) from NCBI have text output like v.2.2.12, while the wwwblast outputs a new format (2.2.13). I'll ask blast-help at NCBI about this. 
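A quick way to compare what the different clients report is to read the version straight from the top of each saved report. The following is only a minimal sketch, assuming plain-text reports whose header line looks like the two quoted above ("BLASTP 2.2.12 [Aug-07-2005]"); the helper name blast_header_version is made up for illustration and is not part of Bioperl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a Bioperl function): return (program, version)
# from the first header line of a plain-text BLAST report, for example
# "BLASTP 2.2.12 [Aug-07-2005]". Returns an empty list if no such line
# is found before end of file.
sub blast_header_version {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        return ($1, $2)
            if $line =~ /^\s*(T?BLAST[NPX])\s+(\d+(?:\.\d+)+)/;
    }
    return;
}

# Report the header version of every file named on the command line.
for my $file (@ARGV) {
    my $fh;
    unless (open($fh, '<', $file)) {
        warn "Cannot open $file: $!\n";
        next;
    }
    my ($program, $version) = blast_header_version($fh);
    close($fh);
    my $label = defined $version ? "$program $version" : "no BLAST header found";
    print "$file\t$label\n";
}
```

Running it over a directory of reports (for instance `perl blast_version.pl result_4/*`, with the script name being arbitrary) should show at a glance which reports carry a 2.2.12 header and which a 2.2.13 one.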
> > To clarify some stuff - > Chris I don't necessarily think the XML is best way forward > for BLAST reports generated locally, it isn't as detailed as > the Text format and it is what most people expect to be able > to scroll through and parse -- it is also harder for the > format to change dramatically if you have a static binary on > your machine =). I think for remoteblast the XML format > should be the way forward but I expect Bioperl to maintain > support of any plain text BLAST report format that people use > on a regular basis. > Does XML lack some specific info that text output has? Didn't know that. I believe that XML should be default in RemoteBlast since it will not break, but I agree with you about text output. I also agree that it will need somebody to maintain it constantly, much like RemoteBlast. > -jason > > > > > > Chris Fields wrote: > > > >> My guess is you're running into text parsing problems in > >> Bio::SearchIO::blast. Upgrade to the latest developer version > >> (1.5.1) or > >> bioperl-live (CVS), then see the bug below. > >> > >> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >> > >> I think the first problem you ran into is solved in bioperl 1.5.1, > >> the last problem (more recent, not related to the first) has been > >> fixed but hasn't been committed to bioperl-live yet. The fixed > >> SearchIO::blast is available in the link above, but > realize it hasn't > >> been committed yet and may change. > >> > >> Christopher Fields > >> Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry > >> University of Illinois Urbana-Champaign > >> > >> > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org > >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert > >>> Prielinger > >>> Sent: Wednesday, February 08, 2006 2:52 PM > >>> To: bioperl-l at bioperl.org > >>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > parsing Blast > >>> output > >>> > >>> Hi, > >>> If I want to parse a Blast Output (Version 2.2.12) with > >>> Bio::SearchIO, I get the following error message: > >>> > >>> MSG: no data for midline Query 1 WWWKWRW 7 > >>> STACK Bio::SearchIO::blast::next_result > >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>> STACK toplevel > >>> > /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > >>> > >>> is that a bug...... > >>> > >>> If I want to parse Blast Output (version 2.2.13), I don't get > >>> anything..... > >>> I'm using bioperl 1.4 > >>> > >>> before, I have installed bioperl 1.4, it worked fine > parsing Blast > >>> Output (version 2.2.12), but I don't remember which > bioperl version > >>> I had installed > >>> > >>> thanks in advance > >>> > >>> Hubert > >>> > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> > >> > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Feb 9 12:53:24 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 9 Feb 2006 11:53:24 -0600 Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) In-Reply-To: <200602080853.58889.heikki@sanbi.ac.za> Message-ID: <000001c62da1$ba346ba0$15327e82@pyrimidine> Heikki, I've added the Bio::Tools::RNAMotif module with test suite (24 tests) and two test data files to bugzilla. The first data file is needed for normal tests, the second is for testing parsing with modified data in the score tag (using sprintf() in the RNAMotif descriptor). I ran 'perl t\RNAMotif.t' and they all passed. Thanks! Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Heikki Lehvaslaiho > Sent: Wednesday, February 08, 2006 12:54 AM > To: bioperl-l at lists.open-bio.org > Cc: Chris Fields > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) > > Chris, > > Post your files to bugzilla (ticket type enhancement, add > files to ticket after creation) and someone with commit > ability will add them to CVS once the code is in satisfactory > condition. > > Thanks, > > -Heikki > > On Wednesday 08 February 2006 06:52, Chris Fields wrote: > > I want to submit a module for parsing RNAMotif output > > (Bio::Tools::RNAMotif). It is capable, at the moment, of scanning > > output and returning Bio::SeqFeature::Generic objects with > added tags > > for descriptors/sequences/file info. I'm in the process of > writing up > > tests and going through biodesign to make sure everything's kosher, > > but the module itself is essentially ready-to-go. What should I do > > next? > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. 
Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Thu Feb 9 10:13:09 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu, 9 Feb 2006 10:13:09 -0500 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <43EA6570.9070909@gmx.at> References: <001101c62cfd$28605df0$15327e82@pyrimidine> <43EA6570.9070909@gmx.at> Message-ID: <57361AED-AFEA-4927-AF29-9944E7F0895B@duke.edu> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: > hi chris, > thanks, I have upgraded to version 1.5.1 but it isn't still > working, do > you have any ohter idea, the problem I have is that I have to parse a > lot of textfiles.... > or shall I look for another option to parse those files... > > regards > Hubert The code from Bioperl 1.5.1 works fine for me for blast 2.2.13 reports but unless you post your blast report we can't really determine the problem. If you are still getting the same error like this I am not convinced you have upgraded to 1.5.1 which includes a fix in the fact that NCBI changed the HSP result format to remove the ':' from the Query/Sbjct prefixes. 
We fixed this as soon as it was apparent, sometime in September.

>>> MSG: no data for midline Query 1 WWWKWRW 7
>>> STACK Bio::SearchIO::blast::next_result
>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>> STACK toplevel
>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21

If you are just getting no results but also no warnings wrt parsing, are you sure your logic is correct?

If you remove your filters, do you see all the HSPs?

while (my $result = $search->next_result) {
    print $result->query_name, "\n";
    #iterate over each hit on the query sequence
    while (my $hit = $result->next_hit) {
        print $hit->name, "\n";
        #iterate over each HSP in the hit
        while (my $hsp = $hit->next_hsp) {
            print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
                  $hsp->hit_string, "\n";
        }
    }
}

To clarify some stuff -
Chris, I don't necessarily think XML is the best way forward for BLAST reports generated locally. It isn't as detailed as the text format, and the text format is what most people expect to be able to scroll through and parse -- it is also harder for the format to change dramatically if you have a static binary on your machine =). I think for remoteblast the XML format should be the way forward, but I expect Bioperl to maintain support for any plain text BLAST report format that people use on a regular basis.

-jason

> Chris Fields wrote:
>
>> My guess is you're running into text parsing problems in
>> Bio::SearchIO::blast. Upgrade to the latest developer version (1.5.1) or
>> bioperl-live (CVS), then see the bug below.
>>
>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>
>> I think the first problem you ran into is solved in bioperl 1.5.1, the last
>> problem (more recent, not related to the first) has been fixed but hasn't
>> been committed to bioperl-live yet. The fixed SearchIO::blast is available
>> in the link above, but realize it hasn't been committed yet and may change.
>> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>> Hubert Prielinger >>> Sent: Wednesday, February 08, 2006 2:52 PM >>> To: bioperl-l at bioperl.org >>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>> parsing Blast output >>> >>> Hi, >>> If I want to parse a Blast Output (Version 2.2.12) with >>> Bio::SearchIO, I get the following error message: >>> >>> MSG: no data for midline Query 1 WWWKWRW 7 >>> STACK Bio::SearchIO::blast::next_result >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>> STACK toplevel >>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>> >>> is that a bug...... >>> >>> If I want to parse Blast Output (version 2.2.13), I don't get >>> anything..... >>> I'm using bioperl 1.4 >>> >>> before, I have installed bioperl 1.4, it worked fine parsing >>> Blast Output (version 2.2.12), but I don't remember which >>> bioperl version I had installed >>> >>> thanks in advance >>> >>> Hubert >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From barry.m.dancis at gsk.com Wed Feb 8 16:44:55 2006 From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com) Date: Wed, 8 Feb 2006 16:44:55 -0500 Subject: [Bioperl-l] Handling miRNA's In-Reply-To: <007701c62c37$7914af60$15327e82@pyrimidine> Message-ID: Hi Chris-- The problem I am solving is given a mature miRna name, how do I use it to search for its pre/pri 
miRna and vice versa. For example, how to go from mir-102a* to hsa-mir-102a-1*. Yes, I can write a parser for it, but I'm hoping that someone else has already done it and has some bells and whistles to go with it. Below is a hierarchy chart of a data structure to hold the naming information. The parsing is not trivial and given data in that structure there could be all kinds of neat functions that return various aspects of the names. Barry "Chris Fields" Sent by: bioperl-l-bounces at lists.open-bio.org 07-Feb-2006 17:40 To barry.m.dancis at gsk.com, "'bioperl-l'" cc Subject Re: [Bioperl-l] Handling miRNA's Are you talking about sequences or text output from a specific program? If you are talking about sequences in a particular format, then listen to Brian. If you are talking about output, then we need to know which program you're using, as a parser may exist or could be built. There are a few modules in Bio::Tools that handle RNA (like QRNA, tRNAscan-SE), so check those out first. I'm currently finishing up a Bio::Tools module for RNAMotif and have plans for making an ERPIN parser. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > barry.m.dancis at gsk.com > Sent: Tuesday, February 07, 2006 2:26 PM > To: bioperl-l; bioperl-l-bounces at lists.open-bio.org > Subject: Re: [Bioperl-l] Handling miRNA's > > It's the parser in particular that I need > > > > > "Brian Osborne" Sent by: > bioperl-l-bounces at lists.open-bio.org > 07-Feb-2006 12:05 > > To > barry.m.dancis at gsk.com, "bioperl-l" , > bioperl-l-bounces at lists.open-bio.org > cc > > Subject > Re: [Bioperl-l] Handling miRNA's > > > > > > > Barry, > > If the sequence information is in one of the formats that > Bioperl understands (Genbank, Swissprot flat, and so on) then > the answer is yes. 
> This assumes that the details on sequence that you mentioned > are found in some sequence feature section in the file. But > it looks to me like there's no specialized parser for miRNA > sequence per se, I'll be corrected if I'm wrong. > > Brian O. > > > On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" > wrote: > > > Hi -- > > > > Are there any classes for manipulating miRNA's with > functions > such > > as parsing the name, storing and interlinking pri/pre/mat sequences, > etc? > > > > Thanks, > > > > Barry > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: image/gif Size: 8775 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060208/7f5bee48/attachment-0001.gif From pmr at ebi.ac.uk Thu Feb 9 03:25:24 2006 From: pmr at ebi.ac.uk (pmr at ebi.ac.uk) Date: Thu, 9 Feb 2006 08:25:24 -0000 (GMT) Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> Message-ID: <2714.86.132.216.50.1139473524.squirrel@webmail.ebi.ac.uk> Ryan Golhar writes: > Does anyone know of tool to mutate a DNA sequence by a specified amount? 
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.

EMBOSS has the msbar program ("mutate sequence beyond all recognition"), which allows you to select the number and type of changes.

With some tuning of options to match the sequence length you should be able to get results that match whatever your definition of 75% similar might be (amazing how much more similarity you can get by adding gaps in an alignment :-)

If you can specify a clear and generally useful way to define what you need, we could of course add a "percent change" option to the msbar program for a future release.

Hope that helps,

Peter

From sofia at neuro.utah.edu Thu Feb 9 13:00:05 2006
From: sofia at neuro.utah.edu (Sofia Robb)
Date: Thu, 09 Feb 2006 11:00:05 -0700
Subject: [Bioperl-l] Bio::Assembly::IO::phrap and Bio::Assembly::IO::ace with large files
Message-ID: <43EB8325.6050501@neuro.utah.edu>

I am having trouble parsing large (2030 contigs) phrap.out and ace.1 files. I have no problem with small files (1 contig).

Here are the errors I get when I try the code that is at the end of my email. My script fails on this line:

my $assembly = $in->next_assembly;

I think it may have something to do with BTREE in Collection.pm, but I have been unable to correct my errors.

-------
file with 2030 contigs
Bio::Assembly::IO::ace

Can't call method "get_dup" on an undefined value at /Library/Perl/5.8.6/Bio/SeqFeature/Collection.pm line 359, line 17699.

line 17699 of my ace file is the last line of the record for Contig253

------
file with 2030 contigs
Bio::Assembly::IO::phrap

Can't call method "put" on an undefined value at /Library/Perl/5.8.6/Bio/SeqFeature/Collection.pm line 225, line 39839.

line 39839 of my phrap.out file is the first line of the record for Contig253

------
use Bio::Assembly::IO;

my $filename = $ARGV[0];
my $in = Bio::Assembly::IO->new(-file=>"$filename",
                                -format=>"phrap"
                                #or -format=>"ace" for ace.1 files
                               );
my $assembly = $in->next_assembly;
my @contigs = $assembly->all_contigs();
foreach my $contig ($assembly->all_contigs){
    my $id = $contig->id();
    print "contig id = $id ";
    my $seqObj = $contig->get_consensus_sequence();
    my $seq = $seqObj->seq();
    print "is $seq\n";
}
my $id = $assembly->id();
print "$id\n";
-----

Thanks for any input,
Sofia

Sofia Robb
Molecular Biology Ph.D Program
Sanchez Laboratory
Department of Neurobiology and Anatomy
University of Utah
http://planaria.neuro.utah.edu

From hubert.prielinger at gmx.at Thu Feb 9 12:32:39 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 09 Feb 2006 11:32:39 -0600
Subject: [Bioperl-l] zip file
In-Reply-To:
References: <43EA75FF.7010504@gmx.at>
Message-ID: <43EB7CB7.7040602@gmx.at>

Hi Chris,

It doesn't work with the simple input line either. I have also tried my script on the command line with the file scanning part, and it runs, but it takes more than 10 minutes to read one file and it doesn't create the output file, so there is no output. Before that, I ran the script in the Eclipse IDE.

I'm trying to upgrade to bioperl 1.5.1 once more; hopefully that's the problem. I have installed the core, run and ext parts from bioperl.org...

The output as you got it is just fine, but nevertheless I need the script with the file scanning part, because I have a lot of files.

To Roger: I have tried it with different files, but always with the same result: it reads the files, but takes a very long time and produces no output result file.

Hubert

Chris Fields wrote:

> Hubert,
>
> I tried this script out and it managed to parse your reports. I
> removed the file scanning and replaced it with a simple arg line
> input (i.e. script.pl blast_file).
I attached one of the output files. > > Chris > > > > #!perl > > $file = shift @ARGV; > > use Bio::SearchIO; > my $cutoff_len = 10; > my $searchio = Bio::SearchIO->new( -format => 'blast', > -file => $file ); > while ( my $result = $searchio->next_result() ) { > while( my $hit = $result->next_hit ) { > while(my $hsp = $hit->next_hsp) { > if ($hsp->length('sbjct') <= $cutoff_len) { > for ($hsp->hit_string) { > if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 || > tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) { > #Print some tab-delimited data about this HSP > open (bigShot, ">>BlastOutputTrial.txt") || > die ("Could not open file. $!"); > #print $result->query_name, "\t"; > #print $hit->significance, "\t"; > print bigShot $hit->name, "-->"; > print bigShot $hit->description, "\n"; > print bigShot "Query: ", > $hsp->start('query'), " ", $hsp- > >query_string, " ", > $hsp->end('query'), "\n"; > print bigShot "Seq: ", $hsp->start('hit'), > " ", $hsp->hit_string, " ", > $hsp->end('hit'), "\n"; > # print $hsp->rank, "\t"; > # print $hsp->percent_identity, "\t"; > # print $hsp->evalue, "\t"; > # print $hsp->hsp_length, "\n"; > > close (bigShot); > > }; > > > } > } > } > } > } > >------------------------------------------------------------------------ > > > From heikki at sanbi.ac.za Thu Feb 9 09:54:30 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 9 Feb 2006 16:54:30 +0200 Subject: [Bioperl-l] Tool to mutate DNA sequence In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> Message-ID: <200602091654.30890.heikki@sanbi.ac.za> Ryan, I should have made this very clear in my first reply: You have to plan very carefully what rules you use when you mutate your sequence because it will affect directly the resulting sequences. Of course, all that depends on what you will be using the sequences for. 
If you are going to draw evolutionary conclusions from those sequences, you
must mutate them in a way that simulates evolutionary principles.

My earlier pseudocode example, for example, should allow mutations in every
location. Mutations do occur multiple times in the same places as sequences
get saturated by mutations. Also, you should decide the relative occurrence
of transversions versus transitions. Then there are indels; do you want to
take those into account?

Also, check the EMBOSS program 'msbar'.

You did not ask this, but... I remember that during the early days of Celera,
one of the tools that enabled them to estimate the feasibility of the whole
genome shotgun sequence assembly, was a very complete program to 'synthesize'
in-silico the whole complexity of the human genome. I have no idea if that
program is generally available now.

Yours,
-Heikki

On Thursday 09 February 2006 06:46, Ryan Golhar wrote:
> Does anyone know of tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From heikki at sanbi.ac.za Thu Feb 9 06:31:20 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 9 Feb 2006 13:31:20 +0200
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <200602091331.21690.heikki@sanbi.ac.za>

Ryan,

Instructions in pseudo code:

take the sequence string out of the object
use a hash to store changed locations
repeat
    pick a location in the string randomly
    if the location is not in the hash, i.e. not changed already,
        change it into something else
        add the changed location into the hash
    if enough locations have been changed (scalar keys hash), exit loop
put the sequence string back into the seq object

-Heikki

On Thursday 09 February 2006 06:46, Ryan Golhar wrote:
> Does anyone know of tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.
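
The pseudocode in Heikki's 13:31 message can be sketched in plain Perl roughly as follows. This is only an illustration and not code posted to the list: the subroutine name mutate_sequence and its arguments are made up here, it reads "75% similar" as 75% per-position identity, and it does substitutions only. It changes each position at most once, so it ignores the points raised in the follow-up message (repeated hits, transition/transversion ratio, indels).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): substitute bases at randomly
# chosen, distinct positions until the requested identity is reached.
# $identity is a fraction between 0 and 1, e.g. 0.75 for "75% similar".
sub mutate_sequence {
    my ($seq, $identity) = @_;
    my @bases  = qw(A C G T);
    my $len    = length $seq;
    my $target = int( $len * (1 - $identity) + 0.5 );  # positions to change
    my %changed;                                       # positions already mutated

    while ( keys(%changed) < $target ) {
        my $pos = int rand $len;                 # pick a location randomly
        next if $changed{$pos};                  # changed already, pick again
        my $old     = uc substr $seq, $pos, 1;
        my @choices = grep { $_ ne $old } @bases;
        substr( $seq, $pos, 1 ) = $choices[ rand @choices ];  # something else
        $changed{$pos} = 1;                      # remember the location
    }
    return $seq;
}

my $original = 'ACGT' x 250;                     # toy sequence, 1000 bases
my $mutated  = mutate_sequence( $original, 0.75 );

# Count positions that are still identical to the original.
my $same = grep { substr( $original, $_, 1 ) eq substr( $mutated, $_, 1 ) }
           0 .. length($original) - 1;
printf "identity: %.1f%%\n", 100 * $same / length $original;
```

With a Bio::Seq object, the string would presumably come out with $seq_obj->seq and go back in with $seq_obj->seq($mutated), which matches the first and last steps of the pseudocode.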
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From jason.stajich at duke.edu Thu Feb 9 14:10:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 9 Feb 2006 14:10:54 -0500
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>

Depending on whether or not you want to use evolutionary realistic models...

* evolver which comes with PAML lets you evolve sequences on a tree
* SeqGen from Andrew Rambaut http://evolve.zoo.ox.ac.uk/software.html? id=seqgen
also lets you do this

I believe there are PISE interfaces to both of these at the pasteur bioweb
site - http://bioweb.pasteur.fr/

-jason

On Feb 8, 2006, at 11:46 PM, Ryan Golhar wrote:

> Does anyone know of tool to mutate a DNA sequence by a specified
> amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the
> original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From heikki at sanbi.ac.za Thu Feb 9 14:41:33 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 9 Feb 2006 21:41:33 +0200
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <000001c62da1$ba346ba0$15327e82@pyrimidine>
References: <000001c62da1$ba346ba0$15327e82@pyrimidine>
Message-ID: <200602092141.34401.heikki@sanbi.ac.za>

Chris,

I committed your file. All tests pass; code looks like written by a long term
bioperl contributor! Impressive.

I truncated the larger test file from 270K to 20K (200 lines), to not bloat
the distribution unnecessarily. Tests pass which is the main thing. Shout if
if you disagree.

Great job!

-Heikki

On Thursday 09 February 2006 19:53, Chris Fields wrote:
> Heikki,
>
> I've added the Bio::Tools::RNAMotif module with test suite (24 tests) and
> two test data files to bugzilla. The first data file is needed for normal
> tests, the second is for testing parsing with modified data in the score
> tag (using sprintf() in the RNAMotif descriptor). I ran 'perl
> t\RNAMotif.t' and they all passed.
>
> Thanks!
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept.
of Biochemistry > University of Illinois Urbana-Champaign > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > > Heikki Lehvaslaiho > > Sent: Wednesday, February 08, 2006 12:54 AM > > To: bioperl-l at lists.open-bio.org > > Cc: Chris Fields > > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) > > > > Chris, > > > > Post your files to bugzilla (ticket type enhancement, add > > files to ticket after creation) and someone with commit > > ability will add them to CVS once the code is in satisfactory > > condition. > > > > Thanks, > > > > -Heikki > > > > On Wednesday 08 February 2006 06:52, Chris Fields wrote: > > > I want to submit a module for parsing RNAMotif output > > > (Bio::Tools::RNAMotif). It is capable, at the moment, of scanning > > > output and returning Bio::SeqFeature::Generic objects with > > > > added tags > > > > > for descriptors/sequences/file info. I'm in the process of > > > > writing up > > > > > tests and going through biodesign to make sure everything's kosher, > > > but the module itself is essentially ready-to-go. What should I do > > > next? > > > > > > Christopher Fields > > > Postdoctoral Researcher > > > Lab of Dr. 
Robert Switzer > > > Dept of Biochemistry > > > University of Illinois Urbana-Champaign > > > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ > > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > > _/ _/ _/ SANBI, South African National Bioinformatics Institute > > _/ _/ _/ University of Western Cape, South Africa > > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > > ___ _/_/_/_/_/________________________________________________________ > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From hubert.prielinger at gmx.at Thu Feb 9 15:13:31 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 09 Feb 2006 14:13:31 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <004301c62db4$c9bcbab0$d416a790@LIBERAL> References: <004301c62db4$c9bcbab0$d416a790@LIBERAL> Message-ID: <43EBA26B.4010907@gmx.at> dear roger, this error message I got, when I tried to parse Blast output (version 2.2.12) with bioperl 1.5.1, but it doesn't matter, 
because I have a lot of Blast output files with version 2.2.13 and for that I don't get any error message.....it just doesn't work Hubert Roger Hall wrote: >Guys - I'm looking at the error message: > >MSG: no data for midline Query 1 WWWKWRW 7 >STACK Bio::SearchIO::blast::next_result >/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >STACK toplevel >/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > >This is my line of thought: >1. "no data for midline $_" is a unique message generated by blast.pm in one >location only at the point of a. reading three lines b. dropping lines with >spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3) >2. There is a regexp match that fails in order to reach that error message >3. The $_ value "Query 1 WWWKWRW 7" should not fail the expression >4. It does anyway >5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast >reports > >I suspect a newline/chomp/metacharacter issue. Not finding the string >anywhere has me thoroughly confused - I asked Hubert for the additional >file, assuming that I didn't have it. > >My next thought is to write a quick script to test perl behavior on "Fedora >Core 9". > >Thoughts? > >Did I misread the issue entirely? 
:} > >Roger > > >-----Original Message----- >From: bioperl-l-bounces at lists.open-bio.org >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields >Sent: Thursday, February 09, 2006 10:16 AM >To: 'Jason Stajich'; 'Hubert Prielinger' >Cc: bioperl-l at bioperl.org >Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast >output > > > > >>-----Original Message----- >>From: Jason Stajich [mailto:jason.stajich at duke.edu] >>Sent: Thursday, February 09, 2006 9:13 AM >>To: Hubert Prielinger >>Cc: Chris Fields; bioperl-l at bioperl.org >>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>parsing Blast output >> >>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >> >> >>>hi chris, >>>thanks, I have upgraded to version 1.5.1 but it isn't still >>> >>> >>working, >> >> >>>do you have any ohter idea, the problem I have is that I >>> >>> >>have to parse >> >> >>>a lot of textfiles.... >>>or shall I look for another option to parse those files... >>> >>>regards >>>Hubert >>> >>> >>The code from Bioperl 1.5.1 works fine for me for blast >>2.2.13 reports but unless you post your blast report we can't >>really determine the problem. >> >>If you are still getting the same error like this I am not >>convinced you have upgraded to 1.5.1 which includes a fix in >>the fact that NCBI changed the HSP result format to remove >>the ':' from the Query/Sbjct prefixes. We fixed this as soon >>as it was apparent sometime in September. >> >> >> >>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>STACK Bio::SearchIO::blast::next_result >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>STACK toplevel >>>>> >>>>> >>>>> >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >> >>If you are just getting no results but also no warnings wrt >>parsing, are you sure your logic is correct? >> >>If you remove your filters do you see all the HSPS? 
>> >> >>while (my $result = $search->next_result) { >> print $result->query_name, "\n"; >> #iterate over each hit on the query sequence >> while (my $hit = $result->next_hit) { >> print $hit->name, "\n"; >> #iterate over each HSP in the hit >> while (my $hsp = $hit->next_hsp) { >> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp- >> >hit_string, "\n"; >> } >> } >>} >> >> > >I tested some of the BLAST results that Hubert sent Roger and me with a >similar script to the above. I removed the file parsing logic and it seemed >to work just fine. It may very well be a logic issue or that he hasn't >installed the latest fix. > >It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), even >though the returned output was from nr, the top of the blast output showed >that it was v2.2.12: > >BLASTP 2.2.12 [Aug-07-2005] > >I double-checked my local version and it's definitely v.2.2.13: >------------------------------------- >C:\Perl\Scripts>blastcl3 - > >blastcl3 2.2.13 arguments:... >------------------------------------- > >If you use RemoteBlast using the same settings, the version in the header >looks like this: > >BLASTP 2.2.13 [Nov-27-2005] > >I'm wondering if all the blast executables (blast and netblast) from NCBI >have text output like v.2.2.12, while the wwwblast outputs a new format >(2.2.13). I'll ask blast-help at NCBI about this. > > > >>To clarify some stuff - >>Chris I don't necessarily think the XML is best way forward >>for BLAST reports generated locally, it isn't as detailed as >>the Text format and it is what most people expect to be able >>to scroll through and parse -- it is also harder for the >>format to change dramatically if you have a static binary on >>your machine =). I think for remoteblast the XML format >>should be the way forward but I expect Bioperl to maintain >>support of any plain text BLAST report format that people use >>on a regular basis. >> >> >> > >Does XML lack some specific info that text output has? 
Didn't know that. I >believe that XML should be default in RemoteBlast since it will not break, >but I agree with you about text output. I also agree that it will need >somebody to maintain it constantly, much like RemoteBlast. > > > >>-jason >> >> >>>Chris Fields wrote: >>> >>> >>> >>>>My guess is you're running into text parsing problems in >>>>Bio::SearchIO::blast. Upgrade to the latest developer version >>>>(1.5.1) or >>>>bioperl-live (CVS), then see the bug below. >>>> >>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>> >>>>I think the first problem you ran into is solved in bioperl 1.5.1, >>>>the last problem (more recent, not related to the first) has been >>>>fixed but hasn't been committed to bioperl-live yet. The fixed >>>>SearchIO::blast is available in the link above, but >>>> >>>> >>realize it hasn't >> >> >>>>been committed yet and may change. >>>> >>>>Christopher Fields >>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry >>>>University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> >>>> >>>>>-----Original Message----- >>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>>Prielinger >>>>>Sent: Wednesday, February 08, 2006 2:52 PM >>>>>To: bioperl-l at bioperl.org >>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>> >>>>> >>parsing Blast >> >> >>>>>output >>>>> >>>>>Hi, >>>>>If I want to parse a Blast Output (Version 2.2.12) with >>>>>Bio::SearchIO, I get the following error message: >>>>> >>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>STACK Bio::SearchIO::blast::next_result >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>STACK toplevel >>>>> >>>>> >>>>> >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >> >> >>>>>is that a bug...... >>>>> >>>>>If I want to parse Blast Output (version 2.2.13), I don't get >>>>>anything..... 
>>>>>I'm using bioperl 1.4 >>>>> >>>>>before, I have installed bioperl 1.4, it worked fine >>>>> >>>>> >>parsing Blast >> >> >>>>>Output (version 2.2.12), but I don't remember which >>>>> >>>>> >>bioperl version >> >> >>>>>I had installed >>>>> >>>>>thanks in advance >>>>> >>>>>Hubert >>>>> >>>>> >>>>> >>>>>_______________________________________________ >>>>>Bioperl-l mailing list >>>>>Bioperl-l at lists.open-bio.org >>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l at lists.open-bio.org >>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>-- >>Jason Stajich >>Duke University >>http://www.duke.edu/~jes12 >> >> >> > >Christopher Fields >Postdoctoral Researcher - Switzer Lab >Dept. of Biochemistry >University of Illinois Urbana-Champaign > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > From rahall2 at ualr.edu Thu Feb 9 15:09:52 2006 From: rahall2 at ualr.edu (Roger Hall) Date: Thu, 09 Feb 2006 14:09:52 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <001b01c62d94$2e8bee50$15327e82@pyrimidine> Message-ID: <004301c62db4$c9bcbab0$d416a790@LIBERAL> Guys - I'm looking at the error message: MSG: no data for midline Query 1 WWWKWRW 7 STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 STACK toplevel /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 This is my line of thought: 1. "no data for midline $_" is a unique message generated by blast.pm in one location only at the point of a. reading three lines b. dropping lines with spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3) 2. There is a regexp match that fails in order to reach that error message 3. 
The $_ value "Query 1 WWWKWRW 7" should not fail the expression 4. It does anyway 5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast reports I suspect a newline/chomp/metacharacter issue. Not finding the string anywhere has me thoroughly confused - I asked Hubert for the additional file, assuming that I didn't have it. My next thought is to write a quick script to test perl behavior on "Fedora Core 9". Thoughts? Did I misread the issue entirely? :} Roger -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields Sent: Thursday, February 09, 2006 10:16 AM To: 'Jason Stajich'; 'Hubert Prielinger' Cc: bioperl-l at bioperl.org Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at duke.edu] > Sent: Thursday, February 09, 2006 9:13 AM > To: Hubert Prielinger > Cc: Chris Fields; bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > parsing Blast output > > On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: > > hi chris, > > thanks, I have upgraded to version 1.5.1 but it isn't still > working, > > do you have any ohter idea, the problem I have is that I > have to parse > > a lot of textfiles.... > > or shall I look for another option to parse those files... > > > > regards > > Hubert > > > The code from Bioperl 1.5.1 works fine for me for blast > 2.2.13 reports but unless you post your blast report we can't > really determine the problem. > > If you are still getting the same error like this I am not > convinced you have upgraded to 1.5.1 which includes a fix in > the fact that NCBI changed the HSP result format to remove > the ':' from the Query/Sbjct prefixes. We fixed this as soon > as it was apparent sometime in September. 
> > >>> MSG: no data for midline Query 1 WWWKWRW 7 > >>> STACK Bio::SearchIO::blast::next_result > >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>> STACK toplevel > >>> > /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > > If you are just getting no results but also no warnings wrt > parsing, are you sure your logic is correct? > > If you remove your filters do you see all the HSPS? > > > while (my $result = $search->next_result) { > print $result->query_name, "\n"; > #iterate over each hit on the query sequence > while (my $hit = $result->next_hit) { > print $hit->name, "\n"; > #iterate over each HSP in the hit > while (my $hsp = $hit->next_hsp) { > print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp- > >hit_string, "\n"; > } > } > } I tested some of the BLAST results that Hubert sent Roger and me with a similar script to the above. I removed the file parsing logic and it seemed to work just fine. It may very well be a logic issue or that he hasn't installed the latest fix. It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), even though the returned output was from nr, the top of the blast output showed that it was v2.2.12: BLASTP 2.2.12 [Aug-07-2005] I double-checked my local version and it's definitely v.2.2.13: ------------------------------------- C:\Perl\Scripts>blastcl3 - blastcl3 2.2.13 arguments:... ------------------------------------- If you use RemoteBlast using the same settings, the version in the header looks like this: BLASTP 2.2.13 [Nov-27-2005] I'm wondering if all the blast executables (blast and netblast) from NCBI have text output like v.2.2.12, while the wwwblast outputs a new format (2.2.13). I'll ask blast-help at NCBI about this. 
> > To clarify some stuff - > Chris I don't necessarily think the XML is best way forward > for BLAST reports generated locally, it isn't as detailed as > the Text format and it is what most people expect to be able > to scroll through and parse -- it is also harder for the > format to change dramatically if you have a static binary on > your machine =). I think for remoteblast the XML format > should be the way forward but I expect Bioperl to maintain > support of any plain text BLAST report format that people use > on a regular basis. > Does XML lack some specific info that text output has? Didn't know that. I believe that XML should be default in RemoteBlast since it will not break, but I agree with you about text output. I also agree that it will need somebody to maintain it constantly, much like RemoteBlast. > -jason > > > > > > Chris Fields wrote: > > > >> My guess is you're running into text parsing problems in > >> Bio::SearchIO::blast. Upgrade to the latest developer version > >> (1.5.1) or > >> bioperl-live (CVS), then see the bug below. > >> > >> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >> > >> I think the first problem you ran into is solved in bioperl 1.5.1, > >> the last problem (more recent, not related to the first) has been > >> fixed but hasn't been committed to bioperl-live yet. The fixed > >> SearchIO::blast is available in the link above, but > realize it hasn't > >> been committed yet and may change. > >> > >> Christopher Fields > >> Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry > >> University of Illinois Urbana-Champaign > >> > >> > >> > >>> -----Original Message----- > >>> From: bioperl-l-bounces at lists.open-bio.org > >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert > >>> Prielinger > >>> Sent: Wednesday, February 08, 2006 2:52 PM > >>> To: bioperl-l at bioperl.org > >>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > parsing Blast > >>> output > >>> > >>> Hi, > >>> If I want to parse a Blast Output (Version 2.2.12) with > >>> Bio::SearchIO, I get the following error message: > >>> > >>> MSG: no data for midline Query 1 WWWKWRW 7 > >>> STACK Bio::SearchIO::blast::next_result > >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>> STACK toplevel > >>> > /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > >>> > >>> is that a bug...... > >>> > >>> If I want to parse Blast Output (version 2.2.13), I don't get > >>> anything..... > >>> I'm using bioperl 1.4 > >>> > >>> before, I have installed bioperl 1.4, it worked fine > parsing Blast > >>> Output (version 2.2.12), but I don't remember which > bioperl version > >>> I had installed > >>> > >>> thanks in advance > >>> > >>> Hubert > >>> > >>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >> > >> > >> > >> > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From Lalancettec at AGR.GC.CA Thu Feb 9 15:53:10 2006 From: Lalancettec at AGR.GC.CA (Lalancette, Claudia) Date: Thu, 9 Feb 2006 15:53:10 -0500 Subject: [Bioperl-l] module for finding restriction site in batch of sequences? Message-ID: Greetings, I need to find a way to look for a specific restriction enzyme site in hundreds of sequences. Been looking at Bio::Restriction, but not sure if will work... Any suggestions? Thanks, Claudia From cjfields at uiuc.edu Thu Feb 9 16:25:01 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 9 Feb 2006 15:25:01 -0600 Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) In-Reply-To: <200602092141.34401.heikki@sanbi.ac.za> Message-ID: <000901c62dbf$49bfae20$15327e82@pyrimidine> Thanks! I think, as long as the tests pass everything is fine with me. I may be submitting another module or two in the next few weeks; just depends on how much time I can spend on them. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: Heikki Lehvaslaiho [mailto:heikki at sanbi.ac.za] > Sent: Thursday, February 09, 2006 1:42 PM > To: bioperl-l at lists.open-bio.org > Cc: Chris Fields > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) > > Chris, > > I committed your file. All tests pass; code looks like > written by a long term bioperl contributor! Impressive. > > I truncated the larger test file from 270K to 20K (200 > lines), to not bloat the distribution unnecessarily. Tests > pass which is the main thing. Shout if if you disagree. > > Great job! 
> > -Heikki > > > On Thursday 09 February 2006 19:53, Chris Fields wrote: > > Heikki, > > > > I've added the Bio::Tools::RNAMotif module with test suite > (24 tests) > > and two test data files to bugzilla. The first data file is needed > > for normal tests, the second is for testing parsing with > modified data > > in the score tag (using sprintf() in the RNAMotif > descriptor). I ran > > 'perl t\RNAMotif.t' and they all passed. > > > > Thanks! > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. of Biochemistry > > University of Illinois Urbana-Champaign > > > > > -----Original Message----- > > > From: bioperl-l-bounces at lists.open-bio.org > > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki > > > Lehvaslaiho > > > Sent: Wednesday, February 08, 2006 12:54 AM > > > To: bioperl-l at lists.open-bio.org > > > Cc: Chris Fields > > > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) > > > > > > Chris, > > > > > > Post your files to bugzilla (ticket type enhancement, add > files to > > > ticket after creation) and someone with commit ability will add > > > them to CVS once the code is in satisfactory condition. > > > > > > Thanks, > > > > > > -Heikki > > > > > > On Wednesday 08 February 2006 06:52, Chris Fields wrote: > > > > I want to submit a module for parsing RNAMotif output > > > > (Bio::Tools::RNAMotif). It is capable, at the moment, > of scanning > > > > output and returning Bio::SeqFeature::Generic objects with > > > > > > added tags > > > > > > > for descriptors/sequences/file info. I'm in the process of > > > > > > writing up > > > > > > > tests and going through biodesign to make sure everything's > > > > kosher, but the module itself is essentially ready-to-go. What > > > > should I do next? > > > > > > > > Christopher Fields > > > > Postdoctoral Researcher > > > > Lab of Dr. 
Robert Switzer > > > > Dept of Biochemistry > > > > University of Illinois Urbana-Champaign > > > > > > > > > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > ______ _/ > _/_____________________________________________________ > > > _/ _/ > > > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > > > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > > > _/ _/ _/ SANBI, South African National > Bioinformatics Institute > > > _/ _/ _/ University of Western Cape, South Africa > > > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > > > ___ > > > _/_/_/_/_/________________________________________________________ > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ > _/_/_/_/_/________________________________________________________ From golharam at umdnj.edu Thu Feb 9 16:19:46 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 09 Feb 2006 16:19:46 -0500 Subject: [Bioperl-l] Tool to mutate DNA sequence In-Reply-To: <200602091654.30890.heikki@sanbi.ac.za> Message-ID: <002801c62dbe$8d4d7e20$e6028a0a@GOLHARMOBILE1> Thanks all. The responses I got were definitely more than helpful. FYI - I did initially look at msbar. 
I glanced over the "Number of times to perform mutation operations", which is what I was looking for. I'm looking to statistically test some simply scoring matrices. I think msbar will do. Ryan -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho Sent: Thursday, February 09, 2006 9:55 AM To: bioperl-l at lists.open-bio.org; golharam at umdnj.edu Cc: 'The general forum at Bioinformatics.Org'; 'bioperl-l'; emboss at emboss.open-bio.org Subject: Re: [Bioperl-l] Tool to mutate DNA sequence Ryan, I should have made this very clear in my first reply: You have to plan very carefully what rules you use when you mutate your sequence because it will affect directly the resulting sequences. Of course, all that depends on what you will be using the sequences for. If you are going to draw evolutionary conclusions from those sequences, you must mutate them in a way that simulates evolutionary principles. My earlier pseudocode example, for example, should allow mutations in every location. Mutations do occur multiple times in same places as sequences get saturated by mutations. Also, you should decide the relative occurrence of transversions versus transitions. Then there are indels; do you want to take those into account? Also, check the EMBOSS program 'msbar'. You did not ask this, but... I remember that during the early days of Celera, one of the tools that enabled them to estimate the feasibility of the whole genome shotgun sequence assembly, was a very complete program to 'synthesize' in-silico the whole complexity of the human genome. I have no idea of that program is generally available now. Yours, -Heikki On Thursday 09 February 2006 06:46, Ryan Golhar wrote: > Does anyone know of tool to mutate a DNA sequence by a specified > amount? 
For instance, say I have a DNA sequence 1000 bases long, and I
> want to simulate mutations to make it 75% (or 80%, etc) similar to the
> original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From injunjoel at hotmail.com Thu Feb 9 16:33:45 2006
From: injunjoel at hotmail.com (Joel Steele)
Date: Thu, 09 Feb 2006 13:33:45 -0800
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsingBlast output
In-Reply-To: <43EBA26B.4010907@gmx.at>
Message-ID:

Greetings again,
It's the colon... observe.

-=Code Snippet=-
#!/usr/bin/perl -w
use strict;

#the string as reported from your error.
my $string1 = 'Query 1 WWWKWRW 7';

#your string with a colon thrown in for testing.
my $string2 = 'Query: 1 WWWKWRW 7';

foreach ($string1, $string2){
    if(/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/){
        print "Match Found in $_\n";
        print $1."\n";
        print $2."\n";
        print $3."\n";
        print $4."\n";
        print $5."\n";
    }else{
        print "no Match for $_\n";
    }
}

-=End Code=-

The Output

-=Code Snippet=-
no Match for Query 1 WWWKWRW 7
Match Found in Query: 1 WWWKWRW 7
Query: 1
Query
1
WWWKWRW
7

-=End Code=-

Now I would suggest changing the regexp

From:
/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/

To:
/^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/

in SearchIO::Blast.

General suggestion:
Again I would like to suggest that everyone get used to using the strict pragma.
Though it may not be applicable to this particular problem, it becomes
essential if you wish to progress in your use of Perl.
It is a core module so there is nothing to download from CPAN. It helps with
development and once your code can run without warnings and errors you can
remove it. This is not a targeted attack as some may interpret it, rather a
general FYI for those out there new to Perl or programming in general.
Better to start learning the rules early before bad habits creep in.
One more thing. There is a wonderfully supportive Perl community available
to anyone who wants to join at PerlMonks.org. Check it out; who knows, you may
even catch a glimpse of Larry Wall while you're there.

-Joel Steele

"The surest way to corrupt a youth is to instruct him to hold in higher
regard those who think alike than those who think differently." -Nietzsche

"I do not feel obliged to believe that the same God who endowed us with
sense, reason and intellect has intended us to forego their use." -Galileo

>From: Hubert Prielinger
>To: rahall2 at ualr.edu, bioperl-l at bioperl.org, Chris Fields
>, Jason Stajich
>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>parsingBlast output
>Date: Thu, 09 Feb 2006 14:13:31 -0600
>
>dear roger,
>this error message I got, when I tried to parse Blast output (version
>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have a lot
>of Blast output files
>with version 2.2.13 and for that I don't get any error message.....it
>just doesn't work
>
>Hubert
>
>
>
>Roger Hall wrote:
>
> >Guys - I'm looking at the error message:
> >
> >MSG: no data for midline Query 1 WWWKWRW 7
> >STACK Bio::SearchIO::blast::next_result
> >/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >STACK toplevel
> >/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >
> >This is my line of thought:
> >1. "no data for midline $_" is a unique message generated by blast.pm in
>one
> >location only at the point of a. reading three lines b. dropping lines
>with
> >spaces only c. identifying the Query, Midline, and Match lines (0 <= $i <
>3)
> >2.
There is a regexp match that fails in order to reach that error >message > >3. The $_ value "Query 1 WWWKWRW 7" should not fail the expression > >4. It does anyway > >5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast > >reports > > > >I suspect a newline/chomp/metacharacter issue. Not finding the string > >anywhere has me thoroughly confused - I asked Hubert for the additional > >file, assuming that I didn't have it. > > > >My next thought is to write a quick script to test perl behavior on >"Fedora > >Core 9". > > > >Thoughts? > > > >Did I misread the issue entirely? :} > > > >Roger > > > > > >-----Original Message----- > >From: bioperl-l-bounces at lists.open-bio.org > >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields > >Sent: Thursday, February 09, 2006 10:16 AM > >To: 'Jason Stajich'; 'Hubert Prielinger' > >Cc: bioperl-l at bioperl.org > >Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast > >output > > > > > > > > > >>-----Original Message----- > >>From: Jason Stajich [mailto:jason.stajich at duke.edu] > >>Sent: Thursday, February 09, 2006 9:13 AM > >>To: Hubert Prielinger > >>Cc: Chris Fields; bioperl-l at bioperl.org > >>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > >>parsing Blast output > >> > >>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: > >> > >> > >>>hi chris, > >>>thanks, I have upgraded to version 1.5.1 but it isn't still > >>> > >>> > >>working, > >> > >> > >>>do you have any ohter idea, the problem I have is that I > >>> > >>> > >>have to parse > >> > >> > >>>a lot of textfiles.... > >>>or shall I look for another option to parse those files... > >>> > >>>regards > >>>Hubert > >>> > >>> > >>The code from Bioperl 1.5.1 works fine for me for blast > >>2.2.13 reports but unless you post your blast report we can't > >>really determine the problem. 
> >> > >>If you are still getting the same error like this I am not > >>convinced you have upgraded to 1.5.1 which includes a fix in > >>the fact that NCBI changed the HSP result format to remove > >>the ':' from the Query/Sbjct prefixes. We fixed this as soon > >>as it was apparent sometime in September. > >> > >> > >> > >>>>>MSG: no data for midline Query 1 WWWKWRW 7 > >>>>>STACK Bio::SearchIO::blast::next_result > >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>>>>STACK toplevel > >>>>> > >>>>> > >>>>> > >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > >> > >>If you are just getting no results but also no warnings wrt > >>parsing, are you sure your logic is correct? > >> > >>If you remove your filters do you see all the HSPS? > >> > >> > >>while (my $result = $search->next_result) { > >> print $result->query_name, "\n"; > >> #iterate over each hit on the query sequence > >> while (my $hit = $result->next_hit) { > >> print $hit->name, "\n"; > >> #iterate over each HSP in the hit > >> while (my $hsp = $hit->next_hsp) { > >> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp- > >> >hit_string, "\n"; > >> } > >> } > >>} > >> > >> > > > >I tested some of the BLAST results that Hubert sent Roger and me with a > >similar script to the above. I removed the file parsing logic and it >seemed > >to work just fine. It may very well be a logic issue or that he hasn't > >installed the latest fix. > > > >It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), >even > >though the returned output was from nr, the top of the blast output >showed > >that it was v2.2.12: > > > >BLASTP 2.2.12 [Aug-07-2005] > > > >I double-checked my local version and it's definitely v.2.2.13: > >------------------------------------- > >C:\Perl\Scripts>blastcl3 - > > > >blastcl3 2.2.13 arguments:... 
> >------------------------------------- > > > >If you use RemoteBlast using the same settings, the version in the header > >looks like this: > > > >BLASTP 2.2.13 [Nov-27-2005] > > > >I'm wondering if all the blast executables (blast and netblast) from NCBI > >have text output like v.2.2.12, while the wwwblast outputs a new format > >(2.2.13). I'll ask blast-help at NCBI about this. > > > > > > > >>To clarify some stuff - > >>Chris I don't necessarily think the XML is best way forward > >>for BLAST reports generated locally, it isn't as detailed as > >>the Text format and it is what most people expect to be able > >>to scroll through and parse -- it is also harder for the > >>format to change dramatically if you have a static binary on > >>your machine =). I think for remoteblast the XML format > >>should be the way forward but I expect Bioperl to maintain > >>support of any plain text BLAST report format that people use > >>on a regular basis. > >> > >> > >> > > > >Does XML lack some specific info that text output has? Didn't know that. > I > >believe that XML should be default in RemoteBlast since it will not >break, > >but I agree with you about text output. I also agree that it will need > >somebody to maintain it constantly, much like RemoteBlast. > > > > > > > >>-jason > >> > >> > >>>Chris Fields wrote: > >>> > >>> > >>> > >>>>My guess is you're running into text parsing problems in > >>>>Bio::SearchIO::blast. Upgrade to the latest developer version > >>>>(1.5.1) or > >>>>bioperl-live (CVS), then see the bug below. > >>>> > >>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >>>> > >>>>I think the first problem you ran into is solved in bioperl 1.5.1, > >>>>the last problem (more recent, not related to the first) has been > >>>>fixed but hasn't been committed to bioperl-live yet. The fixed > >>>>SearchIO::blast is available in the link above, but > >>>> > >>>> > >>realize it hasn't > >> > >> > >>>>been committed yet and may change. 
> >>>> > >>>>Christopher Fields > >>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry > >>>>University of Illinois Urbana-Champaign > >>>> > >>>> > >>>> > >>>> > >>>> > >>>>>-----Original Message----- > >>>>>From: bioperl-l-bounces at lists.open-bio.org > >>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert > >>>>>Prielinger > >>>>>Sent: Wednesday, February 08, 2006 2:52 PM > >>>>>To: bioperl-l at bioperl.org > >>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > >>>>> > >>>>> > >>parsing Blast > >> > >> > >>>>>output > >>>>> > >>>>>Hi, > >>>>>If I want to parse a Blast Output (Version 2.2.12) with > >>>>>Bio::SearchIO, I get the following error message: > >>>>> > >>>>>MSG: no data for midline Query 1 WWWKWRW 7 > >>>>>STACK Bio::SearchIO::blast::next_result > >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>>>>STACK toplevel > >>>>> > >>>>> > >>>>> > >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > >> > >> > >>>>>is that a bug...... > >>>>> > >>>>>If I want to parse Blast Output (version 2.2.13), I don't get > >>>>>anything..... 
> >>>>>I'm using bioperl 1.4 > >>>>> > >>>>>before, I have installed bioperl 1.4, it worked fine > >>>>> > >>>>> > >>parsing Blast > >> > >> > >>>>>Output (version 2.2.12), but I don't remember which > >>>>> > >>>>> > >>bioperl version > >> > >> > >>>>>I had installed > >>>>> > >>>>>thanks in advance > >>>>> > >>>>>Hubert > >>>>> > >>>>> > >>>>> > >>>>>_______________________________________________ > >>>>>Bioperl-l mailing list > >>>>>Bioperl-l at lists.open-bio.org > >>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>> > >>>>> > >>>>> > >>>>> > >>>> > >>>> > >>>> > >>>> > >>>_______________________________________________ > >>>Bioperl-l mailing list > >>>Bioperl-l at lists.open-bio.org > >>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >>-- > >>Jason Stajich > >>Duke University > >>http://www.duke.edu/~jes12 > >> > >> > >> > > > >Christopher Fields > >Postdoctoral Researcher - Switzer Lab > >Dept. of Biochemistry > >University of Illinois Urbana-Champaign > > > >_______________________________________________ > >Bioperl-l mailing list > >Bioperl-l at lists.open-bio.org > >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Thu Feb 9 17:13:16 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Thu, 9 Feb 2006 17:13:16 -0500 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsingBlast output In-Reply-To: References: Message-ID: Uh, that was done in sept see the CVS log... On Feb 9, 2006, at 4:33 PM, Joel Steele wrote: > Greetings again, > Its the colon... > observe. > > -=Code Snippet=- > #!/usr/bin/perl -w > use strict; > > #the string as reported from your error. > my $string1 = 'Query 1 WWWKWRW 7'; > > #your string with a colon thrown in for testing. 
> my $string2 = 'Query: 1 WWWKWRW 7'; > > foreach ($string1, $string2){ > if(/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/){ > print "Match Found in $_\n"; > print $1."\n"; > print $2."\n"; > print $3."\n"; > print $4."\n"; > print $5."\n"; > }else{ > print "no Match for $_\n"; > } > } > > -=End Code=- > > The Output > > -=Code Snippet=- > no Match for Query 1 WWWKWRW 7 > Match Found in Query: 1 WWWKWRW 7 > Query: 1 > Query > 1 > WWWKWRW > 7 > > -=End Code=- > > > Now I would suggest changing the regexp > > From: > /^((Query|Sbjct)\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ > > To: > /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/ > > in SearchIO::Blast. > > General suggestion: > Again I would like to suggest that everyone get use to using the > strict > pragma. Though it may not applicable to this particular problem it > becomes > essential if you wish progress in your use of Perl. > It is a core module so there is nothing to download from CPAN. It > helps with > development and once your code can run without warnings and errors > you can > remove it. This is not a targeted attack as some may interpret it, > rather a > general FYI for those out there new to Perl or programming in general. > Better to start learning the rules early before bad habits creep in. > One more thing. There is a wonderfully supportive Perl community > available > to anyone who wants to join at PerlMonks.org check it out, who > knows you may > even catch a glimpse of Larry Wall while youre there. > > -Joel Steele > > "The surest way to corrupt a youth is to instruct him to hold in > higher > regard those who think alike than those who think differently." - > Nietzsche > > "I do not feel obliged to believe that the same God who endowed us > with > sense, reason and intellect has intended us to forego their use." 
-
> Galileo
>
>
>
>
>> From: Hubert Prielinger
>> To: rahall2 at ualr.edu, bioperl-l at bioperl.org, Chris Fields
>> , Jason Stajich
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>> parsingBlast output
>> Date: Thu, 09 Feb 2006 14:13:31 -0600
>>
>> dear roger,
>> this error message I got, when I tried to parse Blast output (version
>> 2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have
>> a lot
>> of Blast output files
>> with version 2.2.13 and for that I don't get any error message.....it
>> just doesn't work
>>
>> Hubert
>>
>>
>>
>> Roger Hall wrote:
>>
>>> Guys - I'm looking at the error message:
>>>
>>> MSG: no data for midline Query 1 WWWKWRW 7
>>> STACK Bio::SearchIO::blast::next_result
>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>> STACK toplevel
>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>> This is my line of thought:
>>> 1. "no data for midline $_" is a unique message generated by
>>> blast.pm in
>> one
>>> location only at the point of a. reading three lines b. dropping
>>> lines
>> with
>>> spaces only c. identifying the Query, Midline, and Match lines (0
>>> <= $i <
>> 3)
>>> 2. There is a regexp match that fails in order to reach that error
>> message
>>> 3. The $_ value "Query 1 WWWKWRW 7" should not fail the
>>> expression
>>> 4. It does anyway
>>> 5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in
>>> the blast
>>> reports
>>>
>>> I suspect a newline/chomp/metacharacter issue. Not finding the
>>> string
>>> anywhere has me thoroughly confused - I asked Hubert for the
>>> additional
>>> file, assuming that I didn't have it.
>>>
>>> My next thought is to write a quick script to test perl behavior on
>> "Fedora
>>> Core 9".
>>>
>>> Thoughts?
>>>
>>> Did I misread the issue entirely?
:} >>> >>> Roger >>> >>> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris >>> Fields >>> Sent: Thursday, February 09, 2006 10:16 AM >>> To: 'Jason Stajich'; 'Hubert Prielinger' >>> Cc: bioperl-l at bioperl.org >>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>> parsing Blast >>> output >>> >>> >>> >>> >>>> -----Original Message----- >>>> From: Jason Stajich [mailto:jason.stajich at duke.edu] >>>> Sent: Thursday, February 09, 2006 9:13 AM >>>> To: Hubert Prielinger >>>> Cc: Chris Fields; bioperl-l at bioperl.org >>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>> parsing Blast output >>>> >>>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>>> >>>> >>>>> hi chris, >>>>> thanks, I have upgraded to version 1.5.1 but it isn't still >>>>> >>>>> >>>> working, >>>> >>>> >>>>> do you have any ohter idea, the problem I have is that I >>>>> >>>>> >>>> have to parse >>>> >>>> >>>>> a lot of textfiles.... >>>>> or shall I look for another option to parse those files... >>>>> >>>>> regards >>>>> Hubert >>>>> >>>>> >>>> The code from Bioperl 1.5.1 works fine for me for blast >>>> 2.2.13 reports but unless you post your blast report we can't >>>> really determine the problem. >>>> >>>> If you are still getting the same error like this I am not >>>> convinced you have upgraded to 1.5.1 which includes a fix in >>>> the fact that NCBI changed the HSP result format to remove >>>> the ':' from the Query/Sbjct prefixes. We fixed this as soon >>>> as it was apparent sometime in September. 
>>>> >>>> >>>> >>>>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>> STACK Bio::SearchIO::blast::next_result >>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>> STACK toplevel >>>>>>> >>>>>>> >>>>>>> >>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ >>>> Blast.pl:21 >>>> >>>> If you are just getting no results but also no warnings wrt >>>> parsing, are you sure your logic is correct? >>>> >>>> If you remove your filters do you see all the HSPS? >>>> >>>> >>>> while (my $result = $search->next_result) { >>>> print $result->query_name, "\n"; >>>> #iterate over each hit on the query sequence >>>> while (my $hit = $result->next_hit) { >>>> print $hit->name, "\n"; >>>> #iterate over each HSP in the hit >>>> while (my $hsp = $hit->next_hsp) { >>>> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp- >>>>> hit_string, "\n"; >>>> } >>>> } >>>> } >>>> >>>> >>> >>> I tested some of the BLAST results that Hubert sent Roger and me >>> with a >>> similar script to the above. I removed the file parsing logic >>> and it >> seemed >>> to work just fine. It may very well be a logic issue or that he >>> hasn't >>> installed the latest fix. >>> >>> It's a funny thing, though. When I tried using blastcl3 (v. >>> 2.2.13), >> even >>> though the returned output was from nr, the top of the blast output >> showed >>> that it was v2.2.12: >>> >>> BLASTP 2.2.12 [Aug-07-2005] >>> >>> I double-checked my local version and it's definitely v.2.2.13: >>> ------------------------------------- >>> C:\Perl\Scripts>blastcl3 - >>> >>> blastcl3 2.2.13 arguments:... >>> ------------------------------------- >>> >>> If you use RemoteBlast using the same settings, the version in >>> the header >>> looks like this: >>> >>> BLASTP 2.2.13 [Nov-27-2005] >>> >>> I'm wondering if all the blast executables (blast and netblast) >>> from NCBI >>> have text output like v.2.2.12, while the wwwblast outputs a new >>> format >>> (2.2.13). 
I'll ask blast-help at NCBI about this. >>> >>> >>> >>>> To clarify some stuff - >>>> Chris I don't necessarily think the XML is best way forward >>>> for BLAST reports generated locally, it isn't as detailed as >>>> the Text format and it is what most people expect to be able >>>> to scroll through and parse -- it is also harder for the >>>> format to change dramatically if you have a static binary on >>>> your machine =). I think for remoteblast the XML format >>>> should be the way forward but I expect Bioperl to maintain >>>> support of any plain text BLAST report format that people use >>>> on a regular basis. >>>> >>>> >>>> >>> >>> Does XML lack some specific info that text output has? Didn't >>> know that. >> I >>> believe that XML should be default in RemoteBlast since it will not >> break, >>> but I agree with you about text output. I also agree that it >>> will need >>> somebody to maintain it constantly, much like RemoteBlast. >>> >>> >>> >>>> -jason >>>> >>>> >>>>> Chris Fields wrote: >>>>> >>>>> >>>>> >>>>>> My guess is you're running into text parsing problems in >>>>>> Bio::SearchIO::blast. Upgrade to the latest developer version >>>>>> (1.5.1) or >>>>>> bioperl-live (CVS), then see the bug below. >>>>>> >>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>> >>>>>> I think the first problem you ran into is solved in bioperl >>>>>> 1.5.1, >>>>>> the last problem (more recent, not related to the first) has been >>>>>> fixed but hasn't been committed to bioperl-live yet. The fixed >>>>>> SearchIO::blast is available in the link above, but >>>>>> >>>>>> >>>> realize it hasn't >>>> >>>> >>>>>> been committed yet and may change. >>>>>> >>>>>> Christopher Fields >>>>>> Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry >>>>>> University of Illinois Urbana-Champaign >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: bioperl-l-bounces at lists.open-bio.org >>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>>>>>> Hubert >>>>>>> Prielinger >>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM >>>>>>> To: bioperl-l at bioperl.org >>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>> >>>>>>> >>>> parsing Blast >>>> >>>> >>>>>>> output >>>>>>> >>>>>>> Hi, >>>>>>> If I want to parse a Blast Output (Version 2.2.12) with >>>>>>> Bio::SearchIO, I get the following error message: >>>>>>> >>>>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>> STACK Bio::SearchIO::blast::next_result >>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>> STACK toplevel >>>>>>> >>>>>>> >>>>>>> >>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ >>>> Blast.pl:21 >>>> >>>> >>>>>>> is that a bug...... >>>>>>> >>>>>>> If I want to parse Blast Output (version 2.2.13), I don't get >>>>>>> anything..... 
>>>>>>> I'm using bioperl 1.4 >>>>>>> >>>>>>> before, I have installed bioperl 1.4, it worked fine >>>>>>> >>>>>>> >>>> parsing Blast >>>> >>>> >>>>>>> Output (version 2.2.12), but I don't remember which >>>>>>> >>>>>>> >>>> bioperl version >>>> >>>> >>>>>>> I had installed >>>>>>> >>>>>>> thanks in advance >>>>>>> >>>>>>> Hubert >>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> -- >>>> Jason Stajich >>>> Duke University >>>> http://www.duke.edu/~jes12 >>>> >>>> >>>> >>> >>> Christopher Fields >>> Postdoctoral Researcher - Switzer Lab >>> Dept. of Biochemistry >>> University of Illinois Urbana-Champaign >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From boris.steipe at utoronto.ca Thu Feb 9 16:54:53 2006 From: boris.steipe at utoronto.ca (Boris Steipe) Date: Thu, 9 Feb 2006 16:54:53 -0500 Subject: [Bioperl-l] Tool to mutate DNA sequence In-Reply-To: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu> References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu> Message-ID: 
<1B7E8DA9-86F5-4411-B16C-E6573E5E8C36@utoronto.ca>

Golf, anyone?

#!/usr/bin/perl -nl
for(split//){push@a,$_}
END{ while($n/@a<0.5) { $p=rand(@a); if($a[$p]=~/[A-Z]/){$a[$p]=lc((grep!/$a[$p]/,split//,"ACGT")[rand (3)]); $n++; } } print @a; }

(144, not counting \s and the #! line) :-)

B.

>> Does anyone know of tool to mutate a DNA sequence by a specified >> amount? >> For instance, say I have a DNA sequence 1000 bases long, and I >> want to >> simulate mutations to make it 75% (or 80%, etc) similar to the >> original. >> >> >> Ryan >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From hubert.prielinger at gmx.at Thu Feb 9 17:20:46 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Thu, 09 Feb 2006 16:20:46 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output In-Reply-To: <000e01c62dca$bc66df60$15327e82@pyrimidine> References: <000e01c62dca$bc66df60$15327e82@pyrimidine> Message-ID: <43EBC03E.4040900@gmx.at>

Hi Chris, I'm incredibly sorry for causing so much inconvenience, yes you are right, I had only to change the blast.pm file, it is working very fine, thank you very much, and you are right, you have mentioned it earlier either to change the file... ;) but I have another question: does it work with the WU-Blast output too? regards Hubert

Chris Fields wrote: >Ha! I come back from meeting and there's a billion emails! What have we >started? ;p . Sorry about this Jason; I know you're busy. > >Hubert, if you're out there, I sent you an email with an attachment. You >said the output looks like what you were expecting. So I think we have two >problems: > >1) I haven't delved into the file scanning, but the fact that it takes so >long should tell you something's seriously wrong there.
Strip that part out >and start with a simple script, say, like the one Jason or that I sent you; >the script I used to generate that output works fine (on two OS's, WinXP and >Mac OS X). Use it on one file at a time. Do everything on command line >(not through Eclipse). IDE's can be notoriously flaky about running >scripts, esp. when they run debugging. > >2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still >not work whenever the text blast output has the following header, which >comes from the new web version of BLAST: > >----------------------------------------------------- >BLASTP 2.2.13 [Nov-27-2005] >Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, >Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman >(1997), "Gapped BLAST and PSI-BLAST: a new generation of >protein database search programs", Nucleic Acids Res. 25:3389-3402. > >RID: 1139501210-857-165793005128.BLASTQ1 > > >Database: All non-redundant GenBank CDS >translations+PDB+SwissProt+PIR+PRF excluding environmental samples > 3,292,813 sequences; 1,128,164,434 total letters >Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >tuberculosis >H37Rv]. >Length=193 >....... >----------------------------------------------------- > >It will work if the text output has the following header (or is an older >version of BLAST): > >----------------------------------------------------- >BLASTP 2.2.12 [Aug-07-2005] > > >Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, >Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), >"Gapped BLAST and PSI-BLAST: a new generation of protein database search >programs", Nucleic Acids Res. 25:3389-3402. > >Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >tuberculosis H37Rv].
> (193 letters) > >Database: All non-redundant GenBank CDS >translations+PDB+SwissProt+PIR+PRF excluding environmental samples > 2,895,325 sequences; 997,103,285 total letters >----------------------------------------------------- >You have the former (2.2.13) version. I know b/c I have your BLAST files. >Therefore, even bioperl-1.5.1 will not work! > >If you want the really gory details on why this is a problem, look here: > >http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >So, any text output with the above header will not work; it will either hang >or end abruptly (depending on OS, perl version, memory, patience). If you >look in the above, I have added a preliminary fix for this. I'll reiterate >for the billionth time, it hasn't been committed yet, so don't kill me if >blows your computer up ;> > >Here's the direct link: >http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view >This is a modified version of Bio::SearchIO::blast.pm (it says it's version >1.90, but it's lying, I didn't change the version, only the regex; sorry >Jason). From what you've been posting it doesn't sound like you've tried >this, and I believe I've suggested this fix before. > >Replace the one in your Bio/SearchIO directory (which looks like >'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev. >message) with this file. Make sure the filename stays the same (blast.pm). > >Run everything again, one file at a time. Make sure you use Jason's script >as well as the one I sent you. Do NOT rely on running through multiple >files yet. Fix one bug at a time. And heed Joel's words about file checks. > > >Here's a small chunk of output from one of your blast files using the >modifed script I sent you: > >sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1 >Query: 1 RWKWKRKK 8 >Seq: 542 RWAWRRKK 549 > >Look familiar? > >Christopher Fields >Postdoctoral Researcher - Switzer Lab >Dept. 
of Biochemistry >University of Illinois Urbana-Champaign > > > >>-----Original Message----- >>From: Roger Hall [mailto:rahall2 at ualr.edu] >>Sent: Thursday, February 09, 2006 3:24 PM >>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich' >>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>parsing Blast output >> >>In other words, yes, I'm on the wrong trail. :} >> >>Sorry - I'll look at the output issue this evening (or >>realize that Chris already solved the issue). ;} >> >>Thanks! >> >>Roger >> >>-----Original Message----- >>From: bioperl-l-bounces at lists.open-bio.org >>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>Hubert Prielinger >>Sent: Thursday, February 09, 2006 2:14 PM >>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; >>Jason Stajich >>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>parsing Blast output >> >>dear roger, >>this error message I got, when I tried to parse Blast output (version >>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I >>have a lot of Blast output files with version 2.2.13 and for >>that I don't get any error message.....it just doesn't work >> >>Hubert >> >> >> >>Roger Hall wrote: >> >> >> >>>Guys - I'm looking at the error message: >>> >>>MSG: no data for midline Query 1 WWWKWRW 7 >>>STACK Bio::SearchIO::blast::next_result >>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>STACK toplevel >>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>> >>>This is my line of thought: >>>1. "no data for midline $_" is a unique message generated by >>> >>> >>blast.pm >> >> >>>in >>> >>> >>one >> >> >>>location only at the point of a. reading three lines b. >>> >>> >>dropping lines >> >> >>>with spaces only c. identifying the Query, Midline, and >>> >>> >>Match lines (0 >> >> >>><= $i < >>> >>> >>3) >> >> >>>2. There is a regexp match that fails in order to reach that >>> >>> >>error message >> >> >>>3. 
The $_ value "Query 1 WWWKWRW 7" should not fail the >>> >>> >>expression >> >> >>>4. It does anyway >>>5. I cannot find the value "Query 1 WWWKWRW 7" anywhere >>> >>> >>in the blast >> >> >>>reports >>> >>>I suspect a newline/chomp/metacharacter issue. Not finding >>> >>> >>the string >> >> >>>anywhere has me thoroughly confused - I asked Hubert for the >>> >>> >>additional >> >> >>>file, assuming that I didn't have it. >>> >>>My next thought is to write a quick script to test perl behavior on >>>"Fedora Core 9". >>> >>>Thoughts? >>> >>>Did I misread the issue entirely? :} >>> >>>Roger >>> >>> >>>-----Original Message----- >>>From: bioperl-l-bounces at lists.open-bio.org >>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>> >>> >>Chris Fields >> >> >>>Sent: Thursday, February 09, 2006 10:16 AM >>>To: 'Jason Stajich'; 'Hubert Prielinger' >>>Cc: bioperl-l at bioperl.org >>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing >>>Blast output >>> >>> >>> >>> >>> >>> >>>>-----Original Message----- >>>>From: Jason Stajich [mailto:jason.stajich at duke.edu] >>>>Sent: Thursday, February 09, 2006 9:13 AM >>>>To: Hubert Prielinger >>>>Cc: Chris Fields; bioperl-l at bioperl.org >>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing >>>>Blast output >>>> >>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>>> >>>> >>>> >>>> >>>>>hi chris, >>>>>thanks, I have upgraded to version 1.5.1 but it isn't still >>>>> >>>>> >>>>> >>>>> >>>>working, >>>> >>>> >>>> >>>> >>>>>do you have any ohter idea, the problem I have is that I >>>>> >>>>> >>>>> >>>>> >>>>have to parse >>>> >>>> >>>> >>>> >>>>>a lot of textfiles.... >>>>>or shall I look for another option to parse those files... >>>>> >>>>>regards >>>>>Hubert >>>>> >>>>> >>>>> >>>>> >>>>The code from Bioperl 1.5.1 works fine for me for blast >>>>2.2.13 reports but unless you post your blast report we >>>> >>>> >>can't really >> >> >>>>determine the problem. 
>>>> >>>>If you are still getting the same error like this I am not >>>> >>>> >>convinced >> >> >>>>you have upgraded to 1.5.1 which includes a fix in the fact >>>> >>>> >>that NCBI >> >> >>>>changed the HSP result format to remove the ':' from the >>>> >>>> >>Query/Sbjct >> >> >>>>prefixes. We fixed this as soon as it was apparent sometime in >>>>September. >>>> >>>> >>>> >>>> >>>> >>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>STACK toplevel >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>>> >>>>If you are just getting no results but also no warnings wrt >>>> >>>> >>parsing, >> >> >>>>are you sure your logic is correct? >>>> >>>>If you remove your filters do you see all the HSPS? >>>> >>>> >>>>while (my $result = $search->next_result) { >>>> print $result->query_name, "\n"; >>>> #iterate over each hit on the query sequence >>>> while (my $hit = $result->next_hit) { >>>> print $hit->name, "\n"; >>>> #iterate over each HSP in the hit >>>> while (my $hsp = $hit->next_hsp) { >>>> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp- >>>> >>>> >>>>>hit_string, "\n"; >>>>> >>>>> >>>> } >>>> } >>>>} >>>> >>>> >>>> >>>> >>>I tested some of the BLAST results that Hubert sent Roger >>> >>> >>and me with a >> >> >>>similar script to the above. I removed the file parsing logic and it >>> >>> >>seemed >> >> >>>to work just fine. It may very well be a logic issue or >>> >>> >>that he hasn't >> >> >>>installed the latest fix. >>> >>>It's a funny thing, though. When I tried using blastcl3 (v. 
>>> >>> >>2.2.13), >> >> >>>even though the returned output was from nr, the top of the blast >>>output showed that it was v2.2.12: >>> >>>BLASTP 2.2.12 [Aug-07-2005] >>> >>>I double-checked my local version and it's definitely v.2.2.13: >>>------------------------------------- >>>C:\Perl\Scripts>blastcl3 - >>> >>>blastcl3 2.2.13 arguments:... >>>------------------------------------- >>> >>>If you use RemoteBlast using the same settings, the version in the >>>header looks like this: >>> >>>BLASTP 2.2.13 [Nov-27-2005] >>> >>>I'm wondering if all the blast executables (blast and netblast) from >>>NCBI have text output like v.2.2.12, while the wwwblast >>> >>> >>outputs a new >> >> >>>format (2.2.13). I'll ask blast-help at NCBI about this. >>> >>> >>> >>> >>> >>>>To clarify some stuff - >>>>Chris I don't necessarily think the XML is best way forward >>>> >>>> >>for BLAST >> >> >>>>reports generated locally, it isn't as detailed as the Text >>>> >>>> >>format and >> >> >>>>it is what most people expect to be able to scroll through >>>> >>>> >>and parse >> >> >>>>-- it is also harder for the format to change dramatically >>>> >>>> >>if you have >> >> >>>>a static binary on your machine =). I think for >>>> >>>> >>remoteblast the XML >> >> >>>>format should be the way forward but I expect Bioperl to maintain >>>>support of any plain text BLAST report format that people use on a >>>>regular basis. >>>> >>>> >>>> >>>> >>>> >>>Does XML lack some specific info that text output has? >>> >>> >>Didn't know that. >>I >> >> >>>believe that XML should be default in RemoteBlast since it will not >>>break, but I agree with you about text output. I also agree that it >>>will need somebody to maintain it constantly, much like RemoteBlast. >>> >>> >>> >>> >>> >>>>-jason >>>> >>>> >>>> >>>> >>>>>Chris Fields wrote: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>My guess is you're running into text parsing problems in >>>>>>Bio::SearchIO::blast. 
Upgrade to the latest developer version >>>>>>(1.5.1) or >>>>>>bioperl-live (CVS), then see the bug below. >>>>>> >>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>> >>>>>>I think the first problem you ran into is solved in >>>>>> >>>>>> >>bioperl 1.5.1, >> >> >>>>>>the last problem (more recent, not related to the first) has been >>>>>>fixed but hasn't been committed to bioperl-live yet. The fixed >>>>>>SearchIO::blast is available in the link above, but >>>>>> >>>>>> >>>>>> >>>>>> >>>>realize it hasn't >>>> >>>> >>>> >>>> >>>>>>been committed yet and may change. >>>>>> >>>>>>Christopher Fields >>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry >>>>>>University of Illinois Urbana-Champaign >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>-----Original Message----- >>>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf >>>>>>> >>>>>>> >>Of Hubert >> >> >>>>>>>Prielinger >>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM >>>>>>>To: bioperl-l at bioperl.org >>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>parsing Blast >>>> >>>> >>>> >>>> >>>>>>>output >>>>>>> >>>>>>>Hi, >>>>>>>If I want to parse a Blast Output (Version 2.2.12) with >>>>>>>Bio::SearchIO, I get the following error message: >>>>>>> >>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>STACK toplevel >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>>> >>>> >>>> >>>> >>>>>>>is that a bug...... >>>>>>> >>>>>>>If I want to parse Blast Output (version 2.2.13), I don't get >>>>>>>anything..... 
>>>>>>>I'm using bioperl 1.4 >>>>>>> >>>>>>>before, I have installed bioperl 1.4, it worked fine >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>parsing Blast >>>> >>>> >>>> >>>> >>>>>>>Output (version 2.2.12), but I don't remember which >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>bioperl version >>>> >>>> >>>> >>>> >>>>>>>I had installed >>>>>>> >>>>>>>thanks in advance >>>>>>> >>>>>>>Hubert >>>>>>> >>>>>>> >>>>>>> >>>>>>>_______________________________________________ >>>>>>>Bioperl-l mailing list >>>>>>>Bioperl-l at lists.open-bio.org >>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>_______________________________________________ >>>>>Bioperl-l mailing list >>>>>Bioperl-l at lists.open-bio.org >>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> >>>>-- >>>>Jason Stajich >>>>Duke University >>>>http://www.duke.edu/~jes12 >>>> >>>> >>>> >>>> >>>> >>>Christopher Fields >>>Postdoctoral Researcher - Switzer Lab >>>Dept. of Biochemistry >>>University of Illinois Urbana-Champaign >>> >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l at lists.open-bio.org >>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >>> >>> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > > > From olenka.m at gmail.com Thu Feb 9 17:49:48 2006 From: olenka.m at gmail.com (Olena Morozova) Date: Thu, 9 Feb 2006 17:49:48 -0500 Subject: [Bioperl-l] Bio::TreeIO Message-ID: <259a224c0602091449u353e4bf1g5a3cfbb46297217a@mail.gmail.com> Hi all, Probably a very stupid question, but the get_lca function does not work for unrooted trees, does it? I am trying to get the LCA for a set of nodes in a phylip tree, and I am using the script in the HOWTO. 
Thanks, Olena

On 2/8/06, Hubert Prielinger wrote: > Hi, > If I want to parse a Blast Output (Version 2.2.12) with Bio::SearchIO, > I get the following error message: > > MSG: no data for midline Query 1 WWWKWRW 7 > STACK Bio::SearchIO::blast::next_result > /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > STACK toplevel > /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > > is that a bug...... > > If I want to parse Blast Output (version 2.2.13), I don't get anything..... > I'm using bioperl 1.4 > > before, I have installed bioperl 1.4, it worked fine parsing Blast > Output (version 2.2.12), but I don't remember which bioperl version I > had installed > > thanks in advance > > Hubert > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l >

From victor.ruotti at gmail.com Thu Feb 9 18:22:11 2006 From: victor.ruotti at gmail.com (Victor) Date: Thu, 9 Feb 2006 17:22:11 -0600 Subject: [Bioperl-l] Running BLAT with BioPerl Message-ID: <36d7e5550602091522g114728a2w57f2a1cb7c1383ee@mail.gmail.com>

Hi, Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module is up to date in the latest bioperl release?

use Bio::Tools::Run::Alignment::Blat;
my $factory = Bio::Tools::Run::Alignment::Blat->new();
my $seq = "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA";
my @feats = $factory->run( $seq);

Here is what I get when trying to use it:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Blat call (/usr/local/bin/blat/blat -out=blast TGAAATAAAACTCAGTA /tmp/fB09bp5F76) crashed: -1

Notice that it is using "blat" twice in the path.
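One way the doubled name could arise, sketched here with the core File::Spec module (Bio::Root::IO->catfile is assumed to behave like File::Spec->catfile): if the first argument is already the full path to the binary, joining it with the program name appends "blat" a second time. The two values below are hypothetical, picked only to match the path in the error message; they are not taken from the Blat.pm source.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical values, chosen only to mirror the path in the error message.
my $executable   = '/usr/local/bin/blat';   # assumed: already the full path to the binary
my $program_name = 'blat';

# Joining a full executable path with the program name repeats the name.
my $joined = File::Spec->catfile($executable, $program_name);
print "$joined\n";
```

If that is what is happening, checking what the executable and program-name values actually contain before they are joined should show which one already carries the file name.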
The way that I fixed this is by going to the Blat.pm module and changing the following lines:

#my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
my $str= Bio::Root::IO->catfile($self->program_name);

Any ideas, maybe I'm missing the $ENV variable somewhere? I'd like to avoid making this change. Also does anyone have a known synopsis of this blat module (where to set the parameters, and whether it allows you to have a config file). I'll be happy to add a better synopsis to the module if needed. Thanks in advance, Victor

From osborne1 at optonline.net Thu Feb 9 20:37:39 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 09 Feb 2006 20:37:39 -0500 Subject: [Bioperl-l] module for finding restriction site in batch of sequences? In-Reply-To: Message-ID:

Claudia, Yes, Bio::Restriction does this, see bptutorial.pl for code examples. Note the statement "@fragments = $analysis->fragments($enzyme)". If the array @fragments has more than 1 element that means your sequence has a site for the enzyme in question. Alternatively it sounds like you could use some kind of regular expression. Brian O.

On 2/9/06 3:53 PM, "Lalancette, Claudia" wrote: > Greetings, > > > > I need to find a way to look for a specific restriction enzyme site in > hundreds of sequences. Been looking at Bio::Restriction, but not sure > if will work... Any suggestions?
> > > > Thanks, > > Claudia > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Thu Feb 9 20:52:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 9 Feb 2006 19:52:34 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output In-Reply-To: <43EBC03E.4040900@gmx.at> References: <000e01c62dca$bc66df60$15327e82@pyrimidine> <43EBC03E.4040900@gmx.at> Message-ID: From 'perldoc Bio::SearchIO::blast': DESCRIPTION This object encapsulated the necessary methods for generating events suitable for building Bio::Search objects from a BLAST report file. Read the Bio::SearchIO for more information about how to use this. This driver can parse: o NCBI produced plain text BLAST reports from blastall, this also includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports. NCBI XML BLAST output is parsed with the blastxml SearchIO driver o WU-BLAST all reports o Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT) o BLAST-like output from Paracel BTK output So, it should. Let us know if it doesn't. On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote: > Hi Chris, > I'm incredibly sorry for causing so much inconvenience, yes you are > right, I had only to change the blast.pm file, it is working very > fine, thank you very much, and you are right, you have mentioned it > ealier either to change the file... ;) > > but I have another question: does it work with the WU-Blast output > too? > regards > Hubert > > > Chris Fields wrote: > >> Ha! I come back from meeting and there's a billion emails! What >> have we >> started? ;p . Sorry about this Jason; I know you're busy. >> >> Hubert, if you're out there, I sent you an email with an >> attachment. You >> said the output looks like what you were expecting. 
So I think we >> have two >> problems: >> >> 1) I haven't delved into the file scanning, but the fact that it >> takes so >> long should tell you something's seriously wrong there. Strip >> that part out >> and start with a simple script, say, like the one Jason or that I >> sent you; >> the script I used to generate that output works fine (on two OS's, >> WinXP and >> Mac OS X). Use it on one file at a time. Do everything on >> command line >> (not through Eclipse). IDE's can be notoriously flaky about running >> scripts, esp. when they run debugging. >> 2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast >> will still >> not work whenever the text blast output has the following header, >> which >> comes from the new web version of BLAST: >> >> ----------------------------------------------------- >> BLASTP 2.2.13 [Nov-27-2005] >> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >> Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >> Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >> protein database search programs", Nucleic Acids Res. 25:3389-3402. >> >> RID: 1139501210-857-165793005128.BLASTQ1 >> >> >> Database: All non-redundant GenBank CDS >> translations+PDB+SwissProt+PIR+PRF excluding environmental samples >> 3,292,813 sequences; 1,128,164,434 total letters >> Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >> tuberculosis H37Rv]. >> Length=193 >> ....... >> ----------------------------------------------------- >> >> It will work if the text output has the following header (or is an >> older >> version of BLAST): >> >> ----------------------------------------------------- >> BLASTP 2.2.12 [Aug-07-2005] >> >> >> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >> Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >> Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >> protein database search >> programs", Nucleic Acids Res. 25:3389-3402.
>> >> Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >> tuberculosis H37Rv]. >> (193 letters) >> >> Database: All non-redundant GenBank CDS >> translations+PDB+SwissProt+PIR+PRF excluding environmental samples >> 2,895,325 sequences; 997,103,285 total letters >> ----------------------------------------------------- >> You have the former (2.2.13) version. I know b/c I have your >> BLAST files. >> Therefore, even bioperl-1.5.1 will not work! >> >> If you want the really gory details on why this is a problem, look >> here: >> >> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >> >> So, any text output with the above header will not work; it will >> either hang >> or end abruptly (depending on OS, perl version, memory, >> patience). If you >> look in the above, I have added a preliminary fix for this. I'll >> reiterate >> for the billionth time, it hasn't been committed yet, so don't >> kill me if >> blows your computer up ;> >> Here's the direct link: >> http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view >> This is a modified version of Bio::SearchIO::blast.pm (it says >> it's version >> 1.90, but it's lying, I didn't change the version, only the regex; >> sorry >> Jason). From what you've been posting it doesn't sound like >> you've tried >> this, and I believe I've suggested this fix before. >> >> Replace the one in your Bio/SearchIO directory (which looks like >> '/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your >> prev. >> message) with this file. Make sure the filename stays the same >> (blast.pm). >> >> Run everything again, one file at a time. Make sure you use >> Jason's script >> as well as the one I sent you. Do NOT rely on running through >> multiple >> files yet. Fix one bug at a time. And heed Joel's words about >> file checks. 
>> >> >> Here's a small chunk of output from one of your blast files using the >> modifed script I sent you: >> >> sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1 >> Query: 1 RWKWKRKK 8 >> Seq: 542 RWAWRRKK 549 >> >> Look familiar? >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. of Biochemistry >> University of Illinois Urbana-Champaign >> >>> -----Original Message----- >>> From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday, >>> February 09, 2006 3:24 PM >>> To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich' >>> Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>> parsing Blast output >>> >>> In other words, yes, I'm on the wrong trail. :} >>> >>> Sorry - I'll look at the output issue this evening (or realize >>> that Chris already solved the issue). ;} >>> >>> Thanks! >>> >>> Roger >>> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>> Prielinger >>> Sent: Thursday, February 09, 2006 2:14 PM >>> To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason >>> Stajich >>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>> parsing Blast output >>> >>> dear roger, >>> this error message I got, when I tried to parse Blast output >>> (version >>> 2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have >>> a lot of Blast output files with version 2.2.13 and for that I >>> don't get any error message.....it just doesn't work >>> >>> Hubert >>> >>> >>> >>> Roger Hall wrote: >>> >>> >>>> Guys - I'm looking at the error message: >>>> >>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>> STACK Bio::SearchIO::blast::next_result >>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>> STACK toplevel >>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ >>>> Blast.pl:21 >>>> >>>> This is my line of thought: >>>> 1. 
"no data for midline $_" is a unique message generated by >>> blast.pm >>>> in >>>> >>> one >>> >>>> location only at the point of a. reading three lines b. >>> dropping lines >>>> with spaces only c. identifying the Query, Midline, and >>> Match lines (0 >>>> <= $i < >>>> >>> 3) >>> >>>> 2. There is a regexp match that fails in order to reach that >>> error message >>> >>>> 3. The $_ value "Query 1 WWWKWRW 7" should not fail the >>> expression >>> >>>> 4. It does anyway >>>> 5. I cannot find the value "Query 1 WWWKWRW 7" anywhere >>> in the blast >>> >>>> reports >>>> >>>> I suspect a newline/chomp/metacharacter issue. Not finding >>> the string >>>> anywhere has me thoroughly confused - I asked Hubert for the >>> additional >>>> file, assuming that I didn't have it. >>>> >>>> My next thought is to write a quick script to test perl behavior >>>> on "Fedora Core 9". >>>> >>>> Thoughts? >>>> >>>> Did I misread the issue entirely? :} >>>> >>>> Roger >>>> >>>> >>>> -----Original Message----- >>>> From: bioperl-l-bounces at lists.open-bio.org >>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>> Chris Fields >>> >>>> Sent: Thursday, February 09, 2006 10:16 AM >>>> To: 'Jason Stajich'; 'Hubert Prielinger' >>>> Cc: bioperl-l at bioperl.org >>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>> parsing Blast output >>>> >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: Jason Stajich [mailto:jason.stajich at duke.edu] >>>>> Sent: Thursday, February 09, 2006 9:13 AM >>>>> To: Hubert Prielinger >>>>> Cc: Chris Fields; bioperl-l at bioperl.org >>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>> parsing Blast output >>>>> >>>>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>>>> >>>>> >>>>>> hi chris, >>>>>> thanks, I have upgraded to version 1.5.1 but it isn't still >>>>>> >>>>>> >>>>> working, >>>>> >>>>> >>>>>> do you have any ohter idea, the problem I have is that I >>>>>> >>>>>> >>>>> have to parse >>>>> 
>>>>> >>>>>> a lot of textfiles.... >>>>>> or shall I look for another option to parse those files... >>>>>> >>>>>> regards >>>>>> Hubert >>>>>> >>>>>> >>>>> The code from Bioperl 1.5.1 works fine for me for blast >>>>> 2.2.13 reports but unless you post your blast report we >>> can't really >>>>> determine the problem. >>>>> >>>>> If you are still getting the same error like this I am not >>> convinced >>>>> you have upgraded to 1.5.1 which includes a fix in the fact >>> that NCBI >>>>> changed the HSP result format to remove the ':' from the >>> Query/Sbjct >>>>> prefixes. We fixed this as soon as it was apparent sometime in >>>>> September. >>>>> >>>>> >>>>> >>>>>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>> STACK Bio::SearchIO::blast::next_result >>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>> STACK toplevel >>>>>>>> >>>>>>>> >>>>>>>> >>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>> Blast.pl:21 >>>>> >>>>> If you are just getting no results but also no warnings wrt >>> parsing, >>>>> are you sure your logic is correct? >>>>> >>>>> If you remove your filters do you see all the HSPS? >>>>> >>>>> >>>>> while (my $result = $search->next_result) { >>>>> print $result->query_name, "\n"; >>>>> #iterate over each hit on the query sequence >>>>> while (my $hit = $result->next_hit) { >>>>> print $hit->name, "\n"; >>>>> #iterate over each HSP in the hit >>>>> while (my $hsp = $hit->next_hsp) { >>>>> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp- >>>>> >>>>>> hit_string, "\n"; >>>>>> >>>>> } >>>>> } >>>>> } >>>>> >>>>> >>>> I tested some of the BLAST results that Hubert sent Roger >>> and me with a >>>> similar script to the above. I removed the file parsing logic >>>> and it >>>> >>> seemed >>> >>>> to work just fine. It may very well be a logic issue or >>> that he hasn't >>>> installed the latest fix. >>>> It's a funny thing, though. When I tried using blastcl3 (v. 
>>> 2.2.13), >>>> even though the returned output was from nr, the top of the >>>> blast output showed that it was v2.2.12: >>>> >>>> BLASTP 2.2.12 [Aug-07-2005] >>>> >>>> I double-checked my local version and it's definitely v.2.2.13: >>>> ------------------------------------- >>>> C:\Perl\Scripts>blastcl3 - >>>> >>>> blastcl3 2.2.13 arguments:... >>>> ------------------------------------- >>>> >>>> If you use RemoteBlast using the same settings, the version in >>>> the header looks like this: >>>> >>>> BLASTP 2.2.13 [Nov-27-2005] >>>> >>>> I'm wondering if all the blast executables (blast and netblast) >>>> from NCBI have text output like v.2.2.12, while the wwwblast >>> outputs a new >>>> format (2.2.13). I'll ask blast-help at NCBI about this. >>>> >>>> >>>> >>>>> To clarify some stuff - >>>>> Chris I don't necessarily think the XML is best way forward >>> for BLAST >>>>> reports generated locally, it isn't as detailed as the Text >>> format and >>>>> it is what most people expect to be able to scroll through >>> and parse >>>>> -- it is also harder for the format to change dramatically >>> if you have >>>>> a static binary on your machine =). I think for >>> remoteblast the XML >>>>> format should be the way forward but I expect Bioperl to >>>>> maintain support of any plain text BLAST report format that >>>>> people use on a regular basis. >>>>> >>>>> >>>>> >>>> Does XML lack some specific info that text output has? >>> Didn't know that. >>> I >>> >>>> believe that XML should be default in RemoteBlast since it will >>>> not break, but I agree with you about text output. I also agree >>>> that it will need somebody to maintain it constantly, much like >>>> RemoteBlast. >>>> >>>> >>>> >>>>> -jason >>>>> >>>>> >>>>>> Chris Fields wrote: >>>>>> >>>>>> >>>>>> >>>>>>> My guess is you're running into text parsing problems in >>>>>>> Bio::SearchIO::blast. Upgrade to the latest developer version >>>>>>> (1.5.1) or >>>>>>> bioperl-live (CVS), then see the bug below. 
>>>>>>> >>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>>> >>>>>>> I think the first problem you ran into is solved in >>> bioperl 1.5.1, >>>>>>> the last problem (more recent, not related to the first) has >>>>>>> been fixed but hasn't been committed to bioperl-live yet. >>>>>>> The fixed SearchIO::blast is available in the link above, but >>>>>>> >>>>>>> >>>>> realize it hasn't >>>>> >>>>> >>>>>>> been committed yet and may change. >>>>>>> >>>>>>> Christopher Fields >>>>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry >>>>>>> University of Illinois Urbana-Champaign >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: bioperl-l-bounces at lists.open-bio.org >>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf >>> Of Hubert >>>>>>>> Prielinger >>>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM >>>>>>>> To: bioperl-l at bioperl.org >>>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>>> >>>>>>>> >>>>> parsing Blast >>>>> >>>>> >>>>>>>> output >>>>>>>> >>>>>>>> Hi, >>>>>>>> If I want to parse a Blast Output (Version 2.2.12) with >>>>>>>> Bio::SearchIO, I get the following error message: >>>>>>>> >>>>>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>> STACK Bio::SearchIO::blast::next_result >>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>> STACK toplevel >>>>>>>> >>>>>>>> >>>>>>>> >>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>> Blast.pl:21 >>>>> >>>>> >>>>>>>> is that a bug...... >>>>>>>> >>>>>>>> If I want to parse Blast Output (version 2.2.13), I don't >>>>>>>> get anything..... 
>>>>>>>> I'm using bioperl 1.4 >>>>>>>> >>>>>>>> before, I have installed bioperl 1.4, it worked fine >>>>>>>> >>>>>>>> >>>>> parsing Blast >>>>> >>>>> >>>>>>>> Output (version 2.2.12), but I don't remember which >>>>>>>> >>>>>>>> >>>>> bioperl version >>>>> >>>>> >>>>>>>> I had installed >>>>>>>> >>>>>>>> thanks in advance >>>>>>>> >>>>>>>> Hubert >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>> -- >>>>> Jason Stajich >>>>> Duke University >>>>> http://www.duke.edu/~jes12 >>>>> >>>>> >>>>> >>>> Christopher Fields >>>> Postdoctoral Researcher - Switzer Lab >>>> Dept. of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> >> >> > Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From heikki at sanbi.ac.za Thu Feb 9 23:47:42 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 10 Feb 2006 06:47:42 +0200 Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) In-Reply-To: <000901c62dbf$49bfae20$15327e82@pyrimidine> References: <000901c62dbf$49bfae20$15327e82@pyrimidine> Message-ID: <200602100647.43173.heikki@sanbi.ac.za> On Thursday 09 February 2006 23:25, Chris Fields wrote: > Thanks! I think, as long as the tests pass everything is fine with me. I > may be submitting another module or two in the next few weeks; just depends > on how much time I can spend on them. Looking forward to them! -Heikki > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > -----Original Message----- > > From: Heikki Lehvaslaiho [mailto:heikki at sanbi.ac.za] > > Sent: Thursday, February 09, 2006 1:42 PM > > To: bioperl-l at lists.open-bio.org > > Cc: Chris Fields > > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) > > > > Chris, > > > > I committed your file. All tests pass; code looks like > > written by a long term bioperl contributor! Impressive. > > > > I truncated the larger test file from 270K to 20K (200 > > lines), to not bloat the distribution unnecessarily. Tests > > pass which is the main thing. Shout if you disagree. > > > > Great job! > > > > -Heikki > > > > On Thursday 09 February 2006 19:53, Chris Fields wrote: > > > Heikki, > > > > > > I've added the Bio::Tools::RNAMotif module with test suite > > > > (24 tests) > > > > > and two test data files to bugzilla. The first data file is needed > > > for normal tests, the second is for testing parsing with > > > > modified data > > > > > in the score tag (using sprintf() in the RNAMotif > > > > descriptor). I ran > > > > > 'perl t\RNAMotif.t' and they all passed. > > > > > > Thanks!
> > > > > > Christopher Fields > > > Postdoctoral Researcher - Switzer Lab > > > Dept. of Biochemistry > > > University of Illinois Urbana-Champaign > > > > > > > -----Original Message----- > > > > From: bioperl-l-bounces at lists.open-bio.org > > > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki > > > > Lehvaslaiho > > > > Sent: Wednesday, February 08, 2006 12:54 AM > > > > To: bioperl-l at lists.open-bio.org > > > > Cc: Chris Fields > > > > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif) > > > > > > > > Chris, > > > > > > > > Post your files to bugzilla (ticket type enhancement, add > > > > files to > > > > > > ticket after creation) and someone with commit ability will add > > > > them to CVS once the code is in satisfactory condition. > > > > > > > > Thanks, > > > > > > > > -Heikki > > > > > > > > On Wednesday 08 February 2006 06:52, Chris Fields wrote: > > > > > I want to submit a module for parsing RNAMotif output > > > > > (Bio::Tools::RNAMotif). It is capable, at the moment, > > > > of scanning > > > > > > > output and returning Bio::SeqFeature::Generic objects with > > > > > > > > added tags > > > > > > > > > for descriptors/sequences/file info. I'm in the process of > > > > > > > > writing up > > > > > > > > > tests and going through biodesign to make sure everything's > > > > > kosher, but the module itself is essentially ready-to-go. What > > > > > should I do next? > > > > > > > > > > Christopher Fields > > > > > Postdoctoral Researcher > > > > > Lab of Dr. 
Robert Switzer > > > > > Dept of Biochemistry > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > Bioperl-l mailing list > > > > > Bioperl-l at lists.open-bio.org > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > > -- > > > > ______ _/ > > > > _/_____________________________________________________ > > > > > > _/ _/ > > > > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > > > > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > > > > _/ _/ _/ SANBI, South African National > > > > Bioinformatics Institute > > > > > > _/ _/ _/ University of Western Cape, South Africa > > > > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > > > > ___ > > > > _/_/_/_/_/________________________________________________________ > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > ______ _/ _/_____________________________________________________ > > _/ _/ > > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > > _/ _/ _/ SANBI, South African National Bioinformatics Institute > > _/ _/ _/ University of Western Cape, South Africa > > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > > ___ > > _/_/_/_/_/________________________________________________________ > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate 
Professor skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___
_/_/_/_/_/________________________________________________________

From heikki at sanbi.ac.za Thu Feb 9 23:51:11 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 10 Feb 2006 06:51:11 +0200
Subject: [Bioperl-l] module for finding restriction site in batch of sequences?
In-Reply-To:
References:
Message-ID: <200602100651.12028.heikki@sanbi.ac.za>

It should:

    # loop over each seq
    my $ra = Bio::Restriction::Analysis->new(-seq => $seq1);
    @cuts = $ra->fragments('EcoRI');
    # or call some other method

or is it something else you are trying to do?

Yours,
-Heikki

On Thursday 09 February 2006 22:53, Lalancette, Claudia wrote:
> Greetings,
>
> I need to find a way to look for a specific restriction enzyme site in
> hundreds of sequences. Been looking at Bio::Restriction, but not sure
> if will work... Any suggestions?
> > > > Thanks, > > Claudia > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From heikki at sanbi.ac.za Fri Feb 10 02:06:11 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 10 Feb 2006 09:06:11 +0200 Subject: [Bioperl-l] planning sequence mutating modules Message-ID: <200602100906.11885.heikki@sanbi.ac.za> Ryan Golhar's mail got me thinking that we should have a simple framework for mutating sequences to a desired level. The model can then be extended to necessary complexity when needed by subclassing. 
To start with, I have been planning:

Bio::SeqEvolution::EvolutionI - interface file
  Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
  Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class (defaults to Bio::PrimarySeq)
  Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
  Bio::SeqEvolution::EvolutionI::each_seqs($count) - returns an array of $count seqs
  Bio::SeqEvolution::EvolutionI::_generate_seq()
  Bio::SeqEvolution::EvolutionI::matrix # Bio::Matrix::Scoring converted to probabilities of change internally

  various methods to define the extent of divergence; only one to start with:
  Bio::SeqEvolution::EvolutionI::pam() - percentage accepted mutation (= 100% - identity)

Bio::SeqEvolution::Factory - core class to call, instantiates subclasses, Bio::SeqEvolution::DNASimple for nucleotides
  Bio::SeqEvolution::EvolutionI::type() - evolution model, defaults to Bio::SeqEvolution::DNASimple for nucleotides

Bio::SeqEvolution::DNASimple - default for nucleotides
  Bio::SeqEvolution::DNASimple::transversion_rate - positive integer, e.g. 5 => 5:1, defaults to 1:1; simple alternative to a scoring matrix

I am soliciting the usual comments and suggestions about naming and minimal functionality.
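To make the pam() and transversion_rate ideas above concrete, here is a minimal plain-Perl sketch. It is not the proposed Bio::SeqEvolution API: the function name mutate_dna is hypothetical, the input is assumed to contain only A, C, G and T, and the "5 => 5:1" setting is assumed to mean five transitions for every transversion.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Transition partner and transversion partners for each base.
my %TRANSITION   = ( A => 'G', G => 'A', C => 'T', T => 'C' );
my %TRANSVERSION = (
    A => [qw(C T)], G => [qw(C T)],
    C => [qw(A G)], T => [qw(A G)],
);

# Hypothetical helper: mutate $pam percent of the positions in $seq.
# $ratio is the assumed transition:transversion ratio (default 1, i.e. 1:1).
sub mutate_dna {
    my ( $seq, $pam, $ratio ) = @_;
    $ratio = 1 unless defined $ratio;

    my @bases = split //, uc $seq;
    my $n     = int( scalar(@bases) * $pam / 100 );    # number of sites to change

    # Pick $n distinct positions so that identity ends up at 100% - pam.
    my @sites = ( shuffle( 0 .. $#bases ) )[ 0 .. $n - 1 ];

    for my $i (@sites) {
        my $base = $bases[$i];
        if ( rand() < $ratio / ( $ratio + 1 ) ) {
            $bases[$i] = $TRANSITION{$base};
        }
        else {
            $bases[$i] = $TRANSVERSION{$base}[ int( rand(2) ) ];
        }
    }
    return join '', @bases;
}

# Example: 10% divergence with a 5:1 ratio.
print mutate_dna( 'ACGTACGTACGTACGTACGT', 10, 5 ), "\n";
```

Because every chosen site is replaced by a different base, the number of differences from the input equals the number of sites picked. A scoring-matrix version would replace the two lookup tables with per-base substitution probabilities.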
-Heikki

--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___
_/_/_/_/_/________________________________________________________

From Pieter.Monsieurs at esat.kuleuven.be Fri Feb 10 03:53:43 2006
From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs)
Date: Fri, 10 Feb 2006 09:53:43 +0100
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output
In-Reply-To:
References: <000e01c62dca$bc66df60$15327e82@pyrimidine> <43EBC03E.4040900@gmx.at>
Message-ID: <43EC5497.3050505@esat.kuleuven.be>

Hi Chris,

The parsing of the Blast output still doesn't work for me with the bug fix download of blast.pm. The module keeps going around the while loop at line 487, looking for a database or query size:

    while( defined ($_) ) {
        if( /^Database:/ ) {
            $self->_pushback($_);
            last;
        }
        chomp;
        if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
            $size = $1;
            $size =~ s/,//g;
            last;
        } else {
            $q .= " $_";
            $q =~ s/ +/ /g;
            $q =~ s/^ | $//g;
        }
        $_ = $self->_readline;
    }

The code keeps looking for the database information; however, as you mentioned, this information is given before the query line in the new Blast output format. As a result, all hits and HSPs are stored in the query_description ($hit->query_description), no hits are found, and query_length is 0.
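The two query-length formats that the loop above tests for can be checked in isolation. This is a minimal sketch in plain Perl, not code from blast.pm: the helper name query_length is hypothetical, and only the two regular expressions are taken from the loop.

```perl
use strict;
use warnings;

# Hypothetical helper: extract the query length from one report line.
# Accepts the older "(193 letters)" form and the newer "Length=193" form.
sub query_length {
    my ($line) = @_;
    if ( $line =~ /\((-?[\d,]+)\s+letters.*\)/ || $line =~ /^Length=(-?[\d,]+)/ ) {
        ( my $size = $1 ) =~ s/,//g;    # drop thousands separators
        return $size;
    }
    return;    # no length on this line
}

for my $line ( '         (193 letters)', 'Length=193', 'Length=1,234', 'Database: nr' ) {
    my $len = query_length($line);
    printf "%-25s => %s\n", "'$line'", defined $len ? $len : 'no length';
}
```

Both header styles yield the same number, so the length regex itself is not what fails on 2.2.13 reports; the trouble described above comes from the order in which the Database and Query lines now appear.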
Because you already adapted the module to retrieve database information at another position in the module, deleting the while loop and adding the following lines after $_ = $self->_readline (line 486) worked fine for me (using blastn and blastp):

    if (/Length=([\d,]+)/) {
        $size = $1;
        $size =~ s/,//g;
    }

Regards,
Pieter

Chris Fields wrote:

> From 'perldoc Bio::SearchIO::blast':
>
>DESCRIPTION
> This object encapsulated the necessary methods for generating events
> suitable for building Bio::Search objects from a BLAST report file.
> Read the Bio::SearchIO for more information about how to use this.
>
> This driver can parse:
>
> o NCBI produced plain text BLAST reports from blastall, this also
> includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports. NCBI
> XML BLAST output is parsed with the blastxml SearchIO driver
>
> o WU-BLAST all reports
>
> o Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
>
> o BLAST-like output from Paracel BTK output
>
>So, it should. Let us know if it doesn't.
>
>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>
>>Hi Chris,
>>I'm incredibly sorry for causing so much inconvenience, yes you are
>>right, I had only to change the blast.pm file, it is working very
>>fine, thank you very much, and you are right, you have mentioned it
>>earlier either to change the file... ;)
>>
>>but I have another question: does it work with the WU-Blast output
>>too?
>>regards
>>Hubert
>>
>>
>>Chris Fields wrote:
>>
>>>Ha! I come back from meeting and there's a billion emails! What
>>>have we
>>>started? ;p . Sorry about this Jason; I know you're busy.
>>>
>>>Hubert, if you're out there, I sent you an email with an
>>>attachment. You
>>>said the output looks like what you were expecting. So I think we
>>>have two
>>>problems:
>>>
>>>1) I haven't delved into the file scanning, but the fact that it
>>>takes so
>>>long should tell you something's seriously wrong there.
Strip >>>that part out >>>and start with a simple script, say, like the one Jason or that I >>>sent you; >>>the script I used to generate that output works fine (on two OS's, >>>WinXP and >>>Mac OS X). Use it on one file at a time. Do everything on >>>command line >>>(not through Eclipse). IDE's can be notoriously flaky about running >>>scripts, esp. when they run debugging. >>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast >>>will still >>>not work whenever the text blast output has the following header, >>>which >>>comes from the new web version of BLAST: >>> >>>----------------------------------------------------- >>>BLASTP 2.2.13 [Nov-27-2005] >>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>Sch??ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >>>protein database search programs", Nucleic Acids Res. 25:3389-3402. >>> >>>RID: 1139501210-857-165793005128.BLASTQ1 >>> >>> >>>Database: All non-redundant GenBank CDS >>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>> 3,292,813 sequences; 1,128,164,434 total letters >>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >>>tuberculosis H37Rv]. >>>Length=193 >>>....... >>>----------------------------------------------------- >>> >>>It will work if the text output has the following header (or is an >>>older >>>version of BLAST): >>> >>>----------------------------------------------------- >>>BLASTP 2.2.12 [Aug-07-2005] >>> >>> >>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >>>protein database search >>>programs", Nucleic Acids Res. 25:3389-3402. >>> >>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >>>tuberculosis H37Rv]. 
>>> (193 letters) >>> >>>Database: All non-redundant GenBank CDS >>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>> 2,895,325 sequences; 997,103,285 total letters >>>----------------------------------------------------- >>>You have the former (2.2.13) version. I know b/c I have your >>>BLAST files. >>>Therefore, even bioperl-1.5.1 will not work! >>> >>>If you want the really gory details on why this is a problem, look >>>here: >>> >>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>> >>>So, any text output with the above header will not work; it will >>>either hang >>>or end abruptly (depending on OS, perl version, memory, >>>patience). If you >>>look in the above, I have added a preliminary fix for this. I'll >>>reiterate >>>for the billionth time, it hasn't been committed yet, so don't >>>kill me if >>>blows your computer up ;> >>>Here's the direct link: >>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view >>>This is a modified version of Bio::SearchIO::blast.pm (it says >>>it's version >>>1.90, but it's lying, I didn't change the version, only the regex; >>>sorry >>>Jason). From what you've been posting it doesn't sound like >>>you've tried >>>this, and I believe I've suggested this fix before. >>> >>>Replace the one in your Bio/SearchIO directory (which looks like >>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your >>>prev. >>>message) with this file. Make sure the filename stays the same >>>(blast.pm). >>> >>>Run everything again, one file at a time. Make sure you use >>>Jason's script >>>as well as the one I sent you. Do NOT rely on running through >>>multiple >>>files yet. Fix one bug at a time. And heed Joel's words about >>>file checks. >>> >>> >>>Here's a small chunk of output from one of your blast files using the >>>modifed script I sent you: >>> >>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1 >>>Query: 1 RWKWKRKK 8 >>>Seq: 542 RWAWRRKK 549 >>> >>>Look familiar? 
>>> >>>Christopher Fields >>>Postdoctoral Researcher - Switzer Lab >>>Dept. of Biochemistry >>>University of Illinois Urbana-Champaign >>> >>> >>> >>>>-----Original Message----- >>>>From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday, >>>>February 09, 2006 3:24 PM >>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich' >>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>parsing Blast output >>>> >>>>In other words, yes, I'm on the wrong trail. :} >>>> >>>>Sorry - I'll look at the output issue this evening (or realize >>>>that Chris already solved the issue). ;} >>>> >>>>Thanks! >>>> >>>>Roger >>>> >>>>-----Original Message----- >>>>From: bioperl-l-bounces at lists.open-bio.org >>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>Prielinger >>>>Sent: Thursday, February 09, 2006 2:14 PM >>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason >>>>Stajich >>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>parsing Blast output >>>> >>>>dear roger, >>>>this error message I got, when I tried to parse Blast output >>>>(version >>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have >>>>a lot of Blast output files with version 2.2.13 and for that I >>>>don't get any error message.....it just doesn't work >>>> >>>>Hubert >>>> >>>> >>>> >>>>Roger Hall wrote: >>>> >>>> >>>> >>>> >>>>>Guys - I'm looking at the error message: >>>>> >>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>STACK Bio::SearchIO::blast::next_result >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>STACK toplevel >>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>Blast.pl:21 >>>>> >>>>>This is my line of thought: >>>>>1. "no data for midline $_" is a unique message generated by >>>>> >>>>> >>>>blast.pm >>>> >>>> >>>>>in >>>>> >>>>> >>>>> >>>>one >>>> >>>> >>>> >>>>>location only at the point of a. reading three lines b. 
>>>>> >>>>> >>>>dropping lines >>>> >>>> >>>>>with spaces only c. identifying the Query, Midline, and >>>>> >>>>> >>>>Match lines (0 >>>> >>>> >>>>><= $i < >>>>> >>>>> >>>>> >>>>3) >>>> >>>> >>>> >>>>>2. There is a regexp match that fails in order to reach that >>>>> >>>>> >>>>error message >>>> >>>> >>>> >>>>>3. The $_ value "Query 1 WWWKWRW 7" should not fail the >>>>> >>>>> >>>>expression >>>> >>>> >>>> >>>>>4. It does anyway >>>>>5. I cannot find the value "Query 1 WWWKWRW 7" anywhere >>>>> >>>>> >>>>in the blast >>>> >>>> >>>> >>>>>reports >>>>> >>>>>I suspect a newline/chomp/metacharacter issue. Not finding >>>>> >>>>> >>>>the string >>>> >>>> >>>>>anywhere has me thoroughly confused - I asked Hubert for the >>>>> >>>>> >>>>additional >>>> >>>> >>>>>file, assuming that I didn't have it. >>>>> >>>>>My next thought is to write a quick script to test perl behavior >>>>>on "Fedora Core 9". >>>>> >>>>>Thoughts? >>>>> >>>>>Did I misread the issue entirely? :} >>>>> >>>>>Roger >>>>> >>>>> >>>>>-----Original Message----- >>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>>>> >>>>> >>>>Chris Fields >>>> >>>> >>>> >>>>>Sent: Thursday, February 09, 2006 10:16 AM >>>>>To: 'Jason Stajich'; 'Hubert Prielinger' >>>>>Cc: bioperl-l at bioperl.org >>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>parsing Blast output >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>-----Original Message----- >>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu] >>>>>>Sent: Thursday, February 09, 2006 9:13 AM >>>>>>To: Hubert Prielinger >>>>>>Cc: Chris Fields; bioperl-l at bioperl.org >>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>parsing Blast output >>>>>> >>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>hi chris, >>>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>working, >>>>>> 
>>>>>> >>>>>> >>>>>> >>>>>>>do you have any ohter idea, the problem I have is that I >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>have to parse >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>a lot of textfiles.... >>>>>>>or shall I look for another option to parse those files... >>>>>>> >>>>>>>regards >>>>>>>Hubert >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>The code from Bioperl 1.5.1 works fine for me for blast >>>>>>2.2.13 reports but unless you post your blast report we >>>>>> >>>>>> >>>>can't really >>>> >>>> >>>>>>determine the problem. >>>>>> >>>>>>If you are still getting the same error like this I am not >>>>>> >>>>>> >>>>convinced >>>> >>>> >>>>>>you have upgraded to 1.5.1 which includes a fix in the fact >>>>>> >>>>>> >>>>that NCBI >>>> >>>> >>>>>>changed the HSP result format to remove the ':' from the >>>>>> >>>>>> >>>>Query/Sbjct >>>> >>>> >>>>>>prefixes. We fixed this as soon as it was apparent sometime in >>>>>>September. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>>>STACK toplevel >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>>Blast.pl:21 >>>>>> >>>>>>If you are just getting no results but also no warnings wrt >>>>>> >>>>>> >>>>parsing, >>>> >>>> >>>>>>are you sure your logic is correct? >>>>>> >>>>>>If you remove your filters do you see all the HSPS? 
>>>>>> >>>>>> >>>>>>while (my $result = $search->next_result) { >>>>>> print $result->query_name, "\n"; >>>>>> #iterate over each hit on the query sequence >>>>>> while (my $hit = $result->next_hit) { >>>>>> print $hit->name, "\n"; >>>>>> #iterate over each HSP in the hit >>>>>> while (my $hsp = $hit->next_hsp) { >>>>>> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp- >>>>>> >>>>>> >>>>>> >>>>>>>hit_string, "\n"; >>>>>>> >>>>>>> >>>>>>> >>>>>> } >>>>>> } >>>>>>} >>>>>> >>>>>> >>>>>> >>>>>> >>>>>I tested some of the BLAST results that Hubert sent Roger >>>>> >>>>> >>>>and me with a >>>> >>>> >>>>>similar script to the above. I removed the file parsing logic >>>>>and it >>>>> >>>>> >>>>> >>>>seemed >>>> >>>> >>>> >>>>>to work just fine. It may very well be a logic issue or >>>>> >>>>> >>>>that he hasn't >>>> >>>> >>>>>installed the latest fix. >>>>> It's a funny thing, though. When I tried using blastcl3 (v. >>>>> >>>>> >>>>2.2.13), >>>> >>>> >>>>>even though the returned output was from nr, the top of the >>>>>blast output showed that it was v2.2.12: >>>>> >>>>>BLASTP 2.2.12 [Aug-07-2005] >>>>> >>>>>I double-checked my local version and it's definitely v.2.2.13: >>>>>------------------------------------- >>>>>C:\Perl\Scripts>blastcl3 - >>>>> >>>>>blastcl3 2.2.13 arguments:... >>>>>------------------------------------- >>>>> >>>>>If you use RemoteBlast using the same settings, the version in >>>>>the header looks like this: >>>>> >>>>>BLASTP 2.2.13 [Nov-27-2005] >>>>> >>>>>I'm wondering if all the blast executables (blast and netblast) >>>>>from NCBI have text output like v.2.2.12, while the wwwblast >>>>> >>>>> >>>>outputs a new >>>> >>>> >>>>>format (2.2.13). I'll ask blast-help at NCBI about this. 
>>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>To clarify some stuff - >>>>>>Chris I don't necessarily think the XML is best way forward >>>>>> >>>>>> >>>>for BLAST >>>> >>>> >>>>>>reports generated locally, it isn't as detailed as the Text >>>>>> >>>>>> >>>>format and >>>> >>>> >>>>>>it is what most people expect to be able to scroll through >>>>>> >>>>>> >>>>and parse >>>> >>>> >>>>>>-- it is also harder for the format to change dramatically >>>>>> >>>>>> >>>>if you have >>>> >>>> >>>>>>a static binary on your machine =). I think for >>>>>> >>>>>> >>>>remoteblast the XML >>>> >>>> >>>>>>format should be the way forward but I expect Bioperl to >>>>>>maintain support of any plain text BLAST report format that >>>>>>people use on a regular basis. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>Does XML lack some specific info that text output has? >>>>> >>>>> >>>>Didn't know that. >>>>I >>>> >>>> >>>> >>>>>believe that XML should be default in RemoteBlast since it will >>>>>not break, but I agree with you about text output. I also agree >>>>>that it will need somebody to maintain it constantly, much like >>>>>RemoteBlast. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>-jason >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>Chris Fields wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>My guess is you're running into text parsing problems in >>>>>>>>Bio::SearchIO::blast. Upgrade to the latest developer version >>>>>>>>(1.5.1) or >>>>>>>>bioperl-live (CVS), then see the bug below. >>>>>>>> >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>>>> >>>>>>>>I think the first problem you ran into is solved in >>>>>>>> >>>>>>>> >>>>bioperl 1.5.1, >>>> >>>> >>>>>>>>the last problem (more recent, not related to the first) has >>>>>>>>been fixed but hasn't been committed to bioperl-live yet. >>>>>>>>The fixed SearchIO::blast is available in the link above, but >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>realize it hasn't >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>been committed yet and may change. 
>>>>>>>> >>>>>>>>Christopher Fields >>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry >>>>>>>>University of Illinois Urbana-Champaign >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>-----Original Message----- >>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf >>>>>>>>> >>>>>>>>> >>>>Of Hubert >>>> >>>> >>>>>>>>>Prielinger >>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM >>>>>>>>>To: bioperl-l at bioperl.org >>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>parsing Blast >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>>output >>>>>>>>> >>>>>>>>>Hi, >>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with >>>>>>>>>Bio::SearchIO, I get the following error message: >>>>>>>>> >>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>>>STACK toplevel >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>>Blast.pl:21 >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>>is that a bug...... >>>>>>>>> >>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't >>>>>>>>>get anything..... 
>>>>>>>>>I'm using bioperl 1.4 >>>>>>>>> >>>>>>>>>before, I have installed bioperl 1.4, it worked fine >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>parsing Blast >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>>Output (version 2.2.12), but I don't remember which >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>bioperl version >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>>I had installed >>>>>>>>> >>>>>>>>>thanks in advance >>>>>>>>> >>>>>>>>>Hubert >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>_______________________________________________ >>>>>>>>>Bioperl-l mailing list >>>>>>>>>Bioperl-l at lists.open-bio.org >>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>_______________________________________________ >>>>>>>Bioperl-l mailing list >>>>>>>Bioperl-l at lists.open-bio.org >>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>-- >>>>>>Jason Stajich >>>>>>Duke University >>>>>>http://www.duke.edu/~jes12 >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>Christopher Fields >>>>>Postdoctoral Researcher - Switzer Lab >>>>>Dept. of Biochemistry >>>>>University of Illinois Urbana-Champaign >>>>> >>>>>_______________________________________________ >>>>>Bioperl-l mailing list >>>>>Bioperl-l at lists.open-bio.org >>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>_______________________________________________ >>>>Bioperl-l mailing list >>>>Bioperl-l at lists.open-bio.org >>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>>> >>>> >>> >>> >>> > >Christopher Fields >Postdoctoral Researcher >Lab of Dr. 
Robert Switzer
>Dept of Biochemistry
>University of Illinois Urbana-Champaign
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

From Pieter.Monsieurs at esat.kuleuven.be Fri Feb 10 04:44:10 2006
From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs)
Date: Fri, 10 Feb 2006 10:44:10 +0100
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output
In-Reply-To: <43EC5497.3050505@esat.kuleuven.be>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine> <43EBC03E.4040900@gmx.at> <43EC5497.3050505@esat.kuleuven.be>
Message-ID: <43EC606A.20003@esat.kuleuven.be>

Sorry for disturbing. It now works correctly with the bug fix of Chris.

Thanx,
Pieter

Pieter Monsieurs wrote:

>Hi Chris,
>
>The parsing of the Blast output still doesn't work for me with the bug
>fix download of blast.pm.
>The module keeps turning around in the while loop at line 487 looking
>for a database or query-size:
>
>while( defined ($_) ) {
>    if( /^Database:/ ) {
>        $self->_pushback($_);
>        last;
>    }
>    chomp;
>    if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
>        $size = $1;
>        $size =~ s/,//g;
>        last;
>    } else {
>        $q .= " $_";
>        $q =~ s/ +/ /g;
>        $q =~ s/^ | $//g;
>    }
>    $_ = $self->_readline;
>}
>
>The code keeps looking for the database information, however - as you
>mentioned - this information is given before the query line in the new
>Blast output format.
>This way, all hits and hsps are stored in the query_description
>($hit->query_description), no hits are found and query_length is 0.
>Because you already adapted the module to retrieve database information
>at another position in the module, deleting the while loop and adding
>the following lines after $_ = $self->_readline (line 486), worked fine
>for me (using blastn and blastp):
>
>if (/Length=([\d,]+)/) {
>    $size = $1;
>    $size =~ s/,//g;
>}
>
>Regards,
>Pieter
>
>Chris Fields wrote:
>
>>From 'perldoc Bio::SearchIO::blast':
>>
>>DESCRIPTION
>>    This object encapsulated the necessary methods for generating events
>>    suitable for building Bio::Search objects from a BLAST report file.
>>    Read the Bio::SearchIO for more information about how to use this.
>>
>>    This driver can parse:
>>
>>    o   NCBI produced plain text BLAST reports from blastall, this also
>>        includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports. NCBI
>>        XML BLAST output is parsed with the blastxml SearchIO driver
>>
>>    o   WU-BLAST all reports
>>
>>    o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
>>
>>    o   BLAST-like output from Paracel BTK output
>>
>>So, it should. Let us know if it doesn't.
>>
>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>>
>>>Hi Chris,
>>>I'm incredibly sorry for causing so much inconvenience, yes you are
>>>right, I had only to change the blast.pm file, it is working very
>>>fine, thank you very much, and you are right, you have mentioned it
>>>earlier either to change the file... ;)
>>>
>>>but I have another question: does it work with the WU-Blast output
>>>too?
>>>regards
>>>Hubert
>>>
>>>Chris Fields wrote:
>>>
>>>>Ha! I come back from meeting and there's a billion emails! What
>>>>have we started? ;p . Sorry about this Jason; I know you're busy.
>>>>
>>>>Hubert, if you're out there, I sent you an email with an attachment.
>>>>You said the output looks like what you were expecting. So I think
>>>>we have two problems:
>>>>
>>>>1) I haven't delved into the file scanning, but the fact that it
>>>>takes so long should tell you something's seriously wrong there.
>>>>Strip that part out and start with a simple script, say, like the
>>>>one that Jason or I sent you; the script I used to generate that
>>>>output works fine (on two OS's, WinXP and Mac OS X). Use it on one
>>>>file at a time. Do everything on command line (not through
>>>>Eclipse). IDE's can be notoriously flaky about running scripts,
>>>>esp. when they run debugging.
>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast
>>>>will still not work whenever the text blast output has the
>>>>following header, which comes from the new web version of BLAST:
>>>>
>>>>-----------------------------------------------------
>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>>
>>>>RID: 1139501210-857-165793005128.BLASTQ1
>>>>
>>>>
>>>>Database: All non-redundant GenBank CDS
>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>           3,292,813 sequences; 1,128,164,434 total letters
>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>tuberculosis H37Rv].
>>>>Length=193
>>>>.......
>>>>-----------------------------------------------------
>>>>
>>>>It will work if the text output has the following header (or is an
>>>>older version of BLAST):
>>>>
>>>>-----------------------------------------------------
>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>
>>>>
>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>>
>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>tuberculosis H37Rv].
>>>>         (193 letters)
>>>>
>>>>Database: All non-redundant GenBank CDS
>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>           2,895,325 sequences; 997,103,285 total letters
>>>>-----------------------------------------------------
>>>>You have the former (2.2.13) version. I know b/c I have your BLAST
>>>>files. Therefore, even bioperl-1.5.1 will not work!
>>>>
>>>>If you want the really gory details on why this is a problem, look
>>>>here:
>>>>
>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>
>>>>So, any text output with the above header will not work; it will
>>>>either hang or end abruptly (depending on OS, perl version, memory,
>>>>patience). If you look in the above, I have added a preliminary fix
>>>>for this. I'll reiterate for the billionth time, it hasn't been
>>>>committed yet, so don't kill me if it blows your computer up ;>
>>>>Here's the direct link:
>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>>>This is a modified version of Bio::SearchIO::blast.pm (it says it's
>>>>version 1.90, but it's lying, I didn't change the version, only the
>>>>regex; sorry Jason). From what you've been posting it doesn't sound
>>>>like you've tried this, and I believe I've suggested this fix
>>>>before.
>>>>
>>>>Replace the one in your Bio/SearchIO directory (which looks like
>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your
>>>>prev. message) with this file. Make sure the filename stays the
>>>>same (blast.pm).
>>>>
>>>>Run everything again, one file at a time. Make sure you use Jason's
>>>>script as well as the one I sent you. Do NOT rely on running
>>>>through multiple files yet. Fix one bug at a time.
And heed Joel's words about >>>>file checks. >>>> >>>> >>>>Here's a small chunk of output from one of your blast files using the >>>>modifed script I sent you: >>>> >>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1 >>>>Query: 1 RWKWKRKK 8 >>>>Seq: 542 RWAWRRKK 549 >>>> >>>>Look familiar? >>>> >>>>Christopher Fields >>>>Postdoctoral Researcher - Switzer Lab >>>>Dept. of Biochemistry >>>>University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> >>>> >>>>>-----Original Message----- >>>>>From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday, >>>>>February 09, 2006 3:24 PM >>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich' >>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>parsing Blast output >>>>> >>>>>In other words, yes, I'm on the wrong trail. :} >>>>> >>>>>Sorry - I'll look at the output issue this evening (or realize >>>>>that Chris already solved the issue). ;} >>>>> >>>>>Thanks! >>>>> >>>>>Roger >>>>> >>>>>-----Original Message----- >>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>>Prielinger >>>>>Sent: Thursday, February 09, 2006 2:14 PM >>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason >>>>>Stajich >>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>parsing Blast output >>>>> >>>>>dear roger, >>>>>this error message I got, when I tried to parse Blast output >>>>>(version >>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have >>>>>a lot of Blast output files with version 2.2.13 and for that I >>>>>don't get any error message.....it just doesn't work >>>>> >>>>>Hubert >>>>> >>>>> >>>>> >>>>>Roger Hall wrote: >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>Guys - I'm looking at the error message: >>>>>> >>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>STACK 
toplevel >>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>>Blast.pl:21 >>>>>> >>>>>>This is my line of thought: >>>>>>1. "no data for midline $_" is a unique message generated by >>>>>> >>>>>> >>>>>> >>>>>> >>>>>blast.pm >>>>> >>>>> >>>>> >>>>> >>>>>>in >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>one >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>location only at the point of a. reading three lines b. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>dropping lines >>>>> >>>>> >>>>> >>>>> >>>>>>with spaces only c. identifying the Query, Midline, and >>>>>> >>>>>> >>>>>> >>>>>> >>>>>Match lines (0 >>>>> >>>>> >>>>> >>>>> >>>>>><= $i < >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>3) >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>2. There is a regexp match that fails in order to reach that >>>>>> >>>>>> >>>>>> >>>>>> >>>>>error message >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>3. The $_ value "Query 1 WWWKWRW 7" should not fail the >>>>>> >>>>>> >>>>>> >>>>>> >>>>>expression >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>4. It does anyway >>>>>>5. I cannot find the value "Query 1 WWWKWRW 7" anywhere >>>>>> >>>>>> >>>>>> >>>>>> >>>>>in the blast >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>reports >>>>>> >>>>>>I suspect a newline/chomp/metacharacter issue. Not finding >>>>>> >>>>>> >>>>>> >>>>>> >>>>>the string >>>>> >>>>> >>>>> >>>>> >>>>>>anywhere has me thoroughly confused - I asked Hubert for the >>>>>> >>>>>> >>>>>> >>>>>> >>>>>additional >>>>> >>>>> >>>>> >>>>> >>>>>>file, assuming that I didn't have it. >>>>>> >>>>>>My next thought is to write a quick script to test perl behavior >>>>>>on "Fedora Core 9". >>>>>> >>>>>>Thoughts? >>>>>> >>>>>>Did I misread the issue entirely? 
:} >>>>>> >>>>>>Roger >>>>>> >>>>>> >>>>>>-----Original Message----- >>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>>>>> >>>>>> >>>>>> >>>>>> >>>>>Chris Fields >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>Sent: Thursday, February 09, 2006 10:16 AM >>>>>>To: 'Jason Stajich'; 'Hubert Prielinger' >>>>>>Cc: bioperl-l at bioperl.org >>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>parsing Blast output >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>-----Original Message----- >>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu] >>>>>>>Sent: Thursday, February 09, 2006 9:13 AM >>>>>>>To: Hubert Prielinger >>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org >>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>>parsing Blast output >>>>>>> >>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>hi chris, >>>>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>working, >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>do you have any ohter idea, the problem I have is that I >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>have to parse >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>a lot of textfiles.... >>>>>>>>or shall I look for another option to parse those files... >>>>>>>> >>>>>>>>regards >>>>>>>>Hubert >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>The code from Bioperl 1.5.1 works fine for me for blast >>>>>>>2.2.13 reports but unless you post your blast report we >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>can't really >>>>> >>>>> >>>>> >>>>> >>>>>>>determine the problem. 
>>>>>>> >>>>>>>If you are still getting the same error like this I am not >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>convinced >>>>> >>>>> >>>>> >>>>> >>>>>>>you have upgraded to 1.5.1 which includes a fix in the fact >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>that NCBI >>>>> >>>>> >>>>> >>>>> >>>>>>>changed the HSP result format to remove the ':' from the >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>Query/Sbjct >>>>> >>>>> >>>>> >>>>> >>>>>>>prefixes. We fixed this as soon as it was apparent sometime in >>>>>>>September. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>>>>STACK toplevel >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>>>Blast.pl:21 >>>>>>> >>>>>>>If you are just getting no results but also no warnings wrt >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>parsing, >>>>> >>>>> >>>>> >>>>> >>>>>>>are you sure your logic is correct? >>>>>>> >>>>>>>If you remove your filters do you see all the HSPS? >>>>>>> >>>>>>> >>>>>>>while (my $result = $search->next_result) { >>>>>>> print $result->query_name, "\n"; >>>>>>> #iterate over each hit on the query sequence >>>>>>> while (my $hit = $result->next_hit) { >>>>>>> print $hit->name, "\n"; >>>>>>> #iterate over each HSP in the hit >>>>>>> while (my $hsp = $hit->next_hsp) { >>>>>>> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp- >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>hit_string, "\n"; >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> } >>>>>>> } >>>>>>>} >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>I tested some of the BLAST results that Hubert sent Roger >>>>>> >>>>>> >>>>>> >>>>>> >>>>>and me with a >>>>> >>>>> >>>>> >>>>> >>>>>>similar script to the above. 
I removed the file parsing logic >>>>>>and it >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>seemed >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>to work just fine. It may very well be a logic issue or >>>>>> >>>>>> >>>>>> >>>>>> >>>>>that he hasn't >>>>> >>>>> >>>>> >>>>> >>>>>>installed the latest fix. >>>>>> It's a funny thing, though. When I tried using blastcl3 (v. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>2.2.13), >>>>> >>>>> >>>>> >>>>> >>>>>>even though the returned output was from nr, the top of the >>>>>>blast output showed that it was v2.2.12: >>>>>> >>>>>>BLASTP 2.2.12 [Aug-07-2005] >>>>>> >>>>>>I double-checked my local version and it's definitely v.2.2.13: >>>>>>------------------------------------- >>>>>>C:\Perl\Scripts>blastcl3 - >>>>>> >>>>>>blastcl3 2.2.13 arguments:... >>>>>>------------------------------------- >>>>>> >>>>>>If you use RemoteBlast using the same settings, the version in >>>>>>the header looks like this: >>>>>> >>>>>>BLASTP 2.2.13 [Nov-27-2005] >>>>>> >>>>>>I'm wondering if all the blast executables (blast and netblast) >>>>>> >>>>>> >>>>>>from NCBI have text output like v.2.2.12, while the wwwblast >>>>> >>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>outputs a new >>>>> >>>>> >>>>> >>>>> >>>>>>format (2.2.13). I'll ask blast-help at NCBI about this. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>To clarify some stuff - >>>>>>>Chris I don't necessarily think the XML is best way forward >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>for BLAST >>>>> >>>>> >>>>> >>>>> >>>>>>>reports generated locally, it isn't as detailed as the Text >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>format and >>>>> >>>>> >>>>> >>>>> >>>>>>>it is what most people expect to be able to scroll through >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>and parse >>>>> >>>>> >>>>> >>>>> >>>>>>>-- it is also harder for the format to change dramatically >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>if you have >>>>> >>>>> >>>>> >>>>> >>>>>>>a static binary on your machine =). 
I think for >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>remoteblast the XML >>>>> >>>>> >>>>> >>>>> >>>>>>>format should be the way forward but I expect Bioperl to >>>>>>>maintain support of any plain text BLAST report format that >>>>>>>people use on a regular basis. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>Does XML lack some specific info that text output has? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>Didn't know that. >>>>>I >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>believe that XML should be default in RemoteBlast since it will >>>>>>not break, but I agree with you about text output. I also agree >>>>>>that it will need somebody to maintain it constantly, much like >>>>>>RemoteBlast. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>-jason >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>Chris Fields wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>My guess is you're running into text parsing problems in >>>>>>>>>Bio::SearchIO::blast. Upgrade to the latest developer version >>>>>>>>>(1.5.1) or >>>>>>>>>bioperl-live (CVS), then see the bug below. >>>>>>>>> >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>>>>> >>>>>>>>>I think the first problem you ran into is solved in >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>bioperl 1.5.1, >>>>> >>>>> >>>>> >>>>> >>>>>>>>>the last problem (more recent, not related to the first) has >>>>>>>>>been fixed but hasn't been committed to bioperl-live yet. >>>>>>>>>The fixed SearchIO::blast is available in the link above, but >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>realize it hasn't >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>>been committed yet and may change. >>>>>>>>> >>>>>>>>>Christopher Fields >>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry >>>>>>>>>University of Illinois Urbana-Champaign >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>>-----Original Message----- >>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>Of Hubert >>>>> >>>>> >>>>> >>>>> >>>>>>>>>>Prielinger >>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM >>>>>>>>>>To: bioperl-l at bioperl.org >>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>parsing Blast >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>>>output >>>>>>>>>> >>>>>>>>>>Hi, >>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with >>>>>>>>>>Bio::SearchIO, I get the following error message: >>>>>>>>>> >>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>>>>STACK toplevel >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>>>Blast.pl:21 >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>>>is that a bug...... >>>>>>>>>> >>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't >>>>>>>>>>get anything..... 
>>>>>>>>>>I'm using bioperl 1.4 >>>>>>>>>> >>>>>>>>>>before, I have installed bioperl 1.4, it worked fine >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>parsing Blast >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>>>Output (version 2.2.12), but I don't remember which >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>bioperl version >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>>>I had installed >>>>>>>>>> >>>>>>>>>>thanks in advance >>>>>>>>>> >>>>>>>>>>Hubert >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>_______________________________________________ >>>>>>>>>>Bioperl-l mailing list >>>>>>>>>>Bioperl-l at lists.open-bio.org >>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>_______________________________________________ >>>>>>>>Bioperl-l mailing list >>>>>>>>Bioperl-l at lists.open-bio.org >>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>-- >>>>>>>Jason Stajich >>>>>>>Duke University >>>>>>>http://www.duke.edu/~jes12 >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>Christopher Fields >>>>>>Postdoctoral Researcher - Switzer Lab >>>>>>Dept. of Biochemistry >>>>>>University of Illinois Urbana-Champaign >>>>>> >>>>>>_______________________________________________ >>>>>>Bioperl-l mailing list >>>>>>Bioperl-l at lists.open-bio.org >>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>_______________________________________________ >>>>>Bioperl-l mailing list >>>>>Bioperl-l at lists.open-bio.org >>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>>> >>>> >>>> >>>> >>Christopher Fields >>Postdoctoral Researcher >>Lab of Dr. 
Robert Switzer
>>Dept of Biochemistry
>>University of Illinois Urbana-Champaign
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

From andrej.kastrin at guest.arnes.si Fri Feb 10 09:28:19 2006
From: andrej.kastrin at guest.arnes.si (Andrej Kastrin)
Date: Fri, 10 Feb 2006 15:28:19 +0100
Subject: [Bioperl-l] Medline to XML
Message-ID: <43ECA303.8090904@guest.arnes.si>

Dear users,

my problem is not directly related to this list, but I hope you can help
me. Is there any tool to convert a standard Medline record to XML format?
I know there is a built-in function (med2xml) in Pubmed, but I'm looking
for an independent perl script.

Thanks in advance for any suggestions or pointers.

Cheers, Andrej

From cjfields at uiuc.edu Fri Feb 10 12:01:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 10 Feb 2006 11:01:27 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To:
Message-ID: <001801c62e63$a4a71090$15327e82@pyrimidine>

I don't think there's anything like this in Bioperl, and I'm unfamiliar
with the naming scheme you're using. If you're searching for specific
miRNA's, the miRNA database looks like a good resource: it seems to be
updated regularly (http://microrna.sanger.ac.uk/sequences/) and uses the
same system for RNA annotation that you use (which, I'm guessing, is a
standardized annotation scheme of some sort). I believe the database is
downloadable and searchable by name, so you could probably build a
querying scheme using LWP or HTTP::Request (if the web interface allows
for this).
I know that Sean Eddy's Rfam database
(http://www.sanger.ac.uk/Software/Rfam/) also has information on
miRNA's, but it's somewhat limited.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> barry.m.dancis at gsk.com
> Sent: Wednesday, February 08, 2006 3:45 PM
> To: 'bioperl-l'; bioperl-l-bounces at lists.open-bio.org
> Cc: James.R.Brown at gsk.com
> Subject: Re: [Bioperl-l] Handling miRNA's
>
> Hi Chris--
>
> The problem I am solving is given a mature miRna name, how do I use
> it to search for its pre/pri miRna and vice versa. For example, how
> to go from mir-102a* to hsa-mir-102a-1*. Yes, I can write a parser
> for it, but I'm hoping that someone else has already done it and has
> some bells and whistles to go with it. Below is a hierarchy chart of
> a data structure to hold the naming information. The parsing is not
> trivial and given data in that structure there could be all kinds of
> neat functions that return various aspects of the names.
>
> Barry
>
> "Chris Fields"
> Sent by: bioperl-l-bounces at lists.open-bio.org
> 07-Feb-2006 17:40
>
> To
> barry.m.dancis at gsk.com, "'bioperl-l'"
> cc
>
> Subject
> Re: [Bioperl-l] Handling miRNA's
>
> Are you talking about sequences or text output from a specific
> program? If you are talking about sequences in a particular format,
> then listen to Brian. If you are talking about output, then we need
> to know which program you're using, as a parser may exist or could be
> built.
>
> There are a few modules in Bio::Tools that handle RNA (like QRNA,
> tRNAscan-SE), so check those out first. I'm currently finishing up a
> Bio::Tools module for RNAMotif and have plans for making an ERPIN
> parser.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> > barry.m.dancis at gsk.com
> > Sent: Tuesday, February 07, 2006 2:26 PM
> > To: bioperl-l; bioperl-l-bounces at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Handling miRNA's
> >
> > It's the parser in particular that I need
> >
> > "Brian Osborne"
> > Sent by: bioperl-l-bounces at lists.open-bio.org
> > 07-Feb-2006 12:05
> >
> > To
> > barry.m.dancis at gsk.com, "bioperl-l" ,
> > bioperl-l-bounces at lists.open-bio.org
> > cc
> >
> > Subject
> > Re: [Bioperl-l] Handling miRNA's
> >
> > Barry,
> >
> > If the sequence information is in one of the formats that Bioperl
> > understands (Genbank, Swissprot flat, and so on) then the answer is
> > yes. This assumes that the details on sequence that you mentioned
> > are found in some sequence feature section in the file. But it
> > looks to me like there's no specialized parser for miRNA sequence
> > per se, I'll be corrected if I'm wrong.
> >
> > Brian O.
> >
> > On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" wrote:
> >
> > > Hi --
> > >
> > > Are there any classes for manipulating miRNA's with functions
> > > such as parsing the name, storing and interlinking pri/pre/mat
> > > sequences, etc?
> > >
> > > Thanks,
> > >
> > > Barry
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From allenday at ucla.edu Fri Feb 10 11:13:39 2006
From: allenday at ucla.edu (Allen Day)
Date: Fri, 10 Feb 2006 08:13:39 -0800 (PST)
Subject: [Bioperl-l] Medline to XML
In-Reply-To: <43ECA303.8090904@guest.arnes.si>
References: <43ECA303.8090904@guest.arnes.si>
Message-ID:

why not just retrieve xml directly from the eutils service?

-allen

On Fri, 10 Feb 2006, Andrej Kastrin wrote:

> Dear users,
>
> my problem is not directly related to this list, but I hope you can help
> me. Is there any tool to convert a standard Medline record to XML format?
> I know there is a built-in function (med2xml) in Pubmed, but I'm looking
> for an independent perl script.
>
> Thanks in advance for any suggestions or pointers.
> > Cheers, Andrej > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Fri Feb 10 12:15:17 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 10 Feb 2006 12:15:17 -0500 Subject: [Bioperl-l] Remote BLAST support discussion In-Reply-To: <1139362722.43e94ba29ebcc@webmail.utoronto.ca> References: <1139362722.43e94ba29ebcc@webmail.utoronto.ca> Message-ID: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu> Paul - The reason for suggesting a change has to do with the instability of the CGI interface and the format of the returned data: the text format from the webserver is not stable, and reportedly it will cease to be reliably parseable. Yes, we can keep hacking the blast parser code to handle this, but the bioperl release cycle is certainly not tied to the NCBI blast release cycle, so I find it unsatisfying to know that we are going to have broken code when they change the output formats (but not know when). Mostly I think we need to try and support something that will "ALWAYS" work, so that individuals can set up webservices which rely on remote blast functionality. In theory, netblast/blastcl3 should always work since NCBI has to update the exe when they change their server setup. In terms of the web-based queues - I think the best change we can make is to have XML be the preferred retrieval method. I also see value in providing a wrapper for netblast since it should look an awful lot like running blast locally.
Ideally I'd like to see a more extensible system, something like (and please feel free to come up with better names for the modules!):

Bio::Tools::Run::Blast
  --> StandAlone (support for both WU-BLAST and NCBI-BLAST local binaries and MPI-BLAST too if simple)
  --> RemoteNCBI (currently the RemoteBlast server)
  --> RemoteEBISOAP (EBI has a nice SOAP interface that works quite well, but may not provide all the same databases as what people expect from NCBI)
  --> RemoteNetBlast (blastcl3 or netblast local executable)
  (other things that people want)

[note: Whether or not these ideas are appealing, someone should archive the discussions and decisions on the wiki page so we can rely less on people searching the mailing list archives for how a decision was made. Perhaps Roger can do this sort of editing in addition to the planning for support of this module].

-jason
On Feb 7, 2006, at 8:38 PM, Paul Boutros wrote: > Hi Roger, > > I would definitely prefer a fully Perl-based implementation. For > starters, I have not > been successful in compiling the Toolkit that contains netblast for > some platforms (e.g. > AIX 5.2 w/gcc 4.0). > > I haven't been following the discussion: is there some compelling > reason to prefer a > netblast-based system that's come up recently? I'm guessing that > adding a new non-perl > dependency would only be done if there was considerable > justification for this type of > change, but I'm not clear from your message what that justification > is. > > Paul > > > > ------------------------------ > > Message: 12 > Date: Mon, 6 Feb 2006 20:46:44 -0600 > From: "Roger Hall" > Subject: [Bioperl-l] RemoteBlast users - potentially major changes - > please reply > To: > Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL> > Content-Type: text/plain; charset="us-ascii" > > To everyone who uses RemoteBlast.pm: > > Would anyone object to RemoteBlast being rewritten in a way that > requires > NCBI's blastcl3 executable?
> > Binary downloads of blastcl3 (column "netblast") are available for > numerous > platforms at: http://ncbi.nih.gov/BLAST/download.shtml > > Does anyone require or desire a "pure perl" implementation? If so, > please > explain the advantage you see with such an implementation. > > Thanks! > > > Roger Hall > > Technical Director > > MidSouth Bioinformatics Center > > University of Arkansas at Little Rock > > (501) 569-8074 > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From hubert.prielinger at gmx.at Fri Feb 10 11:26:47 2006 From: hubert.prielinger at gmx.at (Hubert Prielinger) Date: Fri, 10 Feb 2006 10:26:47 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output In-Reply-To: <43EC606A.20003@esat.kuleuven.be> References: <000e01c62dca$bc66df60$15327e82@pyrimidine> <43EBC03E.4040900@gmx.at> <43EC5497.3050505@esat.kuleuven.be> <43EC606A.20003@esat.kuleuven.be> Message-ID: <43ECBEC7.7040506@gmx.at> Hi, I'm sorry for disturbing once more. Yesterday the script was working, today it isn't working at all, but I didn't change anything, I get the following error message: ------------- EXCEPTION ------------- MSG: Could not open comp80swiss2114.txt: No such file or directory STACK Bio::Root::IO::_initialize_io /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273 STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213 STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135 STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167 STACK toplevel ./Blast.pl:14 -------------------------------------- the file exists and the bug I have fixed yesterday thanks for help Hubert Pieter Monsieurs wrote: > Sorry for disturbing. I now works correctly with the bug fix of Chris. 
> Thanx, > Pieter > > Pieter Monsieurs wrote: > >>Hi Chris, >> >>The parsing of the Blast output still doesn't work for me with the bug >>fix download of blast.pm. >>The module keeps turning around in the while loop at line 487 looking >>for a database or query-size: >> >>while( defined ($_) ) { >> if( /^Database:/ ) { >> $self->_pushback($_); >> last; >> } >> chomp; >> if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) { >> $size = $1; >> $size =~ s/,//g; >> last; >> } else { >> $q .= " $_"; >> $q =~ s/ +/ /g; >> $q =~ s/^ | $//g; >> } >> $_ = $self->_readline; >>} >> >> >>The code keeps looking for the database information, however - as you >>mentioned - this information is given before the query line in the new >>Blast output format. >>This way, all hits and hsps are stored in the query_description >>($hit->query_description), no hits are found and query_length is 0. >>Because you already adapted the module to retrieve database information >>at another position in the module, deleting the while loop and adding >>the following lines after $_ = $self->_readline (line 486), worked fine >>for me (using blastn and blastp): >> >>if (/Length=([\d,]+)/) { >> $size = $1; >> $size =~ s/,//g; >>} >> >> >>Regards, >>Pieter >> >> >> >>Chris Fields wrote: >> >> >> >>>From 'perldoc Bio::SearchIO::blast': >>> >>>DESCRIPTION >>> This object encapsulated the necessary methods for generating >>>events >>> suitable for building Bio::Search objects from a BLAST report >>>file. >>> Read the Bio::SearchIO for more information about how to use >>>this. >>> >>> This driver can parse: >>> >>> o NCBI produced plain text BLAST reports from blastall, >>>this also >>> includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq >>>reports. NCBI >>> XML BLAST output is parsed with the blastxml SearchIO driver >>> >>> o WU-BLAST all reports >>> >>> o Jim Kent's BLAST-like output from his programs (BLASTZ, >>>BLAT) >>> >>> o BLAST-like output from Paracel BTK output >>> >>>So, it should. 
Let us know if it doesn't. >>> >>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote: >>> >>> >>> >>> >>> >>>>Hi Chris, >>>>I'm incredibly sorry for causing so much inconvenience, yes you are >>>>right, I had only to change the blast.pm file, it is working very >>>>fine, thank you very much, and you are right, you have mentioned it >>>>earlier either to change the file... ;) >>>> >>>>but I have another question: does it work with the WU-Blast output >>>>too? >>>>regards >>>>Hubert >>>> >>>> >>>>Chris Fields wrote: >>>> >>>> >>>> >>>> >>>> >>>>>Ha! I come back from meeting and there's a billion emails! What >>>>>have we >>>>>started? ;p . Sorry about this Jason; I know you're busy. >>>>> >>>>>Hubert, if you're out there, I sent you an email with an >>>>>attachment. You >>>>>said the output looks like what you were expecting. So I think we >>>>>have two >>>>>problems: >>>>> >>>>>1) I haven't delved into the file scanning, but the fact that it >>>>>takes so >>>>>long should tell you something's seriously wrong there. Strip >>>>>that part out >>>>>and start with a simple script, say, like the one Jason or that I >>>>>sent you; >>>>>the script I used to generate that output works fine (on two OS's, >>>>>WinXP and >>>>>Mac OS X). Use it on one file at a time. Do everything on >>>>>command line >>>>>(not through Eclipse). IDE's can be notoriously flaky about running >>>>>scripts, esp. when they run debugging. >>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast >>>>>will still >>>>>not work whenever the text blast output has the following header, >>>>>which >>>>>comes from the new web version of BLAST: >>>>> >>>>>----------------------------------------------------- >>>>>BLASTP 2.2.13 [Nov-27-2005] >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402. >>>>> >>>>>RID: 1139501210-857-165793005128.BLASTQ1 >>>>> >>>>> >>>>>Database: All non-redundant GenBank CDS >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>>>> 3,292,813 sequences; 1,128,164,434 total letters >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >>>>>tuberculosis H37Rv]. >>>>>Length=193 >>>>>....... >>>>>----------------------------------------------------- >>>>> >>>>>It will work if the text output has the following header (or is an >>>>>older >>>>>version of BLAST): >>>>> >>>>>----------------------------------------------------- >>>>>BLASTP 2.2.12 [Aug-07-2005] >>>>> >>>>> >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >>>>>protein database search >>>>>programs", Nucleic Acids Res. 25:3389-3402. >>>>> >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >>>>>tuberculosis H37Rv]. >>>>> (193 letters) >>>>> >>>>>Database: All non-redundant GenBank CDS >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>>>> 2,895,325 sequences; 997,103,285 total letters >>>>>----------------------------------------------------- >>>>>You have the former (2.2.13) version. I know b/c I have your >>>>>BLAST files. >>>>>Therefore, even bioperl-1.5.1 will not work! >>>>> >>>>>If you want the really gory details on why this is a problem, look >>>>>here: >>>>> >>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>> >>>>>So, any text output with the above header will not work; it will >>>>>either hang >>>>>or end abruptly (depending on OS, perl version, memory, >>>>>patience). If you >>>>>look in the above, I have added a preliminary fix for this. 
I'll >>>>>reiterate >>>>>for the billionth time, it hasn't been committed yet, so don't >>>>>kill me if >>>>>blows your computer up ;> >>>>>Here's the direct link: >>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view >>>>>This is a modified version of Bio::SearchIO::blast.pm (it says >>>>>it's version >>>>>1.90, but it's lying, I didn't change the version, only the regex; >>>>>sorry >>>>>Jason). From what you've been posting it doesn't sound like >>>>>you've tried >>>>>this, and I believe I've suggested this fix before. >>>>> >>>>>Replace the one in your Bio/SearchIO directory (which looks like >>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your >>>>>prev. >>>>>message) with this file. Make sure the filename stays the same >>>>>(blast.pm). >>>>> >>>>>Run everything again, one file at a time. Make sure you use >>>>>Jason's script >>>>>as well as the one I sent you. Do NOT rely on running through >>>>>multiple >>>>>files yet. Fix one bug at a time. And heed Joel's words about >>>>>file checks. >>>>> >>>>> >>>>>Here's a small chunk of output from one of your blast files using the >>>>>modifed script I sent you: >>>>> >>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1 >>>>>Query: 1 RWKWKRKK 8 >>>>>Seq: 542 RWAWRRKK 549 >>>>> >>>>>Look familiar? >>>>> >>>>>Christopher Fields >>>>>Postdoctoral Researcher - Switzer Lab >>>>>Dept. of Biochemistry >>>>>University of Illinois Urbana-Champaign >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>-----Original Message----- >>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday, >>>>>>February 09, 2006 3:24 PM >>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich' >>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>parsing Blast output >>>>>> >>>>>>In other words, yes, I'm on the wrong trail. :} >>>>>> >>>>>>Sorry - I'll look at the output issue this evening (or realize >>>>>>that Chris already solved the issue). ;} >>>>>> >>>>>>Thanks! 
>>>>>> >>>>>>Roger >>>>>> >>>>>>-----Original Message----- >>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>>>Prielinger >>>>>>Sent: Thursday, February 09, 2006 2:14 PM >>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason >>>>>>Stajich >>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>parsing Blast output >>>>>> >>>>>>dear roger, >>>>>>this error message I got, when I tried to parse Blast output >>>>>>(version >>>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have >>>>>>a lot of Blast output files with version 2.2.13 and for that I >>>>>>don't get any error message.....it just doesn't work >>>>>> >>>>>>Hubert >>>>>> >>>>>> >>>>>> >>>>>>Roger Hall wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>Guys - I'm looking at the error message: >>>>>>> >>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>STACK toplevel >>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>>>Blast.pl:21 >>>>>>> >>>>>>>This is my line of thought: >>>>>>>1. "no data for midline $_" is a unique message generated by >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>blast.pm >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>in >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>one >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>location only at the point of a. reading three lines b. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>dropping lines >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>with spaces only c. identifying the Query, Midline, and >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>Match lines (0 >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>><= $i < >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>3) >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>2. 
There is a regexp match that fails in order to reach that error message >>>>>>>3. The $_ value "Query 1 WWWKWRW 7" should not fail the expression >>>>>>>4. It does anyway >>>>>>>5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast reports >>>>>>> >>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string anywhere has me thoroughly confused - I asked Hubert for the additional file, assuming that I didn't have it. >>>>>>> >>>>>>>My next thought is to write a quick script to test perl behavior >>>>>>>on "Fedora Core 4". >>>>>>> >>>>>>>Thoughts? >>>>>>> >>>>>>>Did I misread the issue entirely?
:} >>>>>>> >>>>>>>Roger >>>>>>> >>>>>>> >>>>>>>-----Original Message----- >>>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>Chris Fields >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>Sent: Thursday, February 09, 2006 10:16 AM >>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger' >>>>>>>Cc: bioperl-l at bioperl.org >>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>>parsing Blast output >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>-----Original Message----- >>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu] >>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM >>>>>>>>To: Hubert Prielinger >>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org >>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>>>parsing Blast output >>>>>>>> >>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>hi chris, >>>>>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>working, >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>do you have any ohter idea, the problem I have is that I >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>have to parse >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>a lot of textfiles.... >>>>>>>>>or shall I look for another option to parse those files... >>>>>>>>> >>>>>>>>>regards >>>>>>>>>Hubert >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast >>>>>>>>2.2.13 reports but unless you post your blast report we >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>can't really >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>determine the problem. 
>>>>>>>> >>>>>>>>If you are still getting the same error like this I am not >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>convinced >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>you have upgraded to 1.5.1 which includes a fix in the fact >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>that NCBI >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>changed the HSP result format to remove the ':' from the >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>Query/Sbjct >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>prefixes. We fixed this as soon as it was apparent sometime in >>>>>>>>September. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>>>>>STACK toplevel >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>>>>Blast.pl:21 >>>>>>>> >>>>>>>>If you are just getting no results but also no warnings wrt >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>parsing, >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>are you sure your logic is correct? >>>>>>>> >>>>>>>>If you remove your filters do you see all the HSPS? 
>>>>>>>> >>>>>>>> >>>>>>>>while (my $result = $search->next_result) { >>>>>>>> print $result->query_name, "\n"; >>>>>>>> #iterate over each hit on the query sequence >>>>>>>> while (my $hit = $result->next_hit) { >>>>>>>> print $hit->name, "\n"; >>>>>>>> #iterate over each HSP in the hit >>>>>>>> while (my $hsp = $hit->next_hsp) { >>>>>>>> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp- >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>hit_string, "\n"; >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> } >>>>>>>> } >>>>>>>>} >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>I tested some of the BLAST results that Hubert sent Roger >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>and me with a >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>similar script to the above. I removed the file parsing logic >>>>>>>and it >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>seemed >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>to work just fine. It may very well be a logic issue or >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>that he hasn't >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>installed the latest fix. >>>>>>> It's a funny thing, though. When I tried using blastcl3 (v. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>2.2.13), >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>even though the returned output was from nr, the top of the >>>>>>>blast output showed that it was v2.2.12: >>>>>>> >>>>>>>BLASTP 2.2.12 [Aug-07-2005] >>>>>>> >>>>>>>I double-checked my local version and it's definitely v.2.2.13: >>>>>>>------------------------------------- >>>>>>>C:\Perl\Scripts>blastcl3 - >>>>>>> >>>>>>>blastcl3 2.2.13 arguments:... 
>>>>>>>------------------------------------- >>>>>>> >>>>>>>If you use RemoteBlast using the same settings, the version in >>>>>>>the header looks like this: >>>>>>> >>>>>>>BLASTP 2.2.13 [Nov-27-2005] >>>>>>> >>>>>>>I'm wondering if all the blast executables (blast and netblast) >>>>>>> >>>>>>> >>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>outputs a new >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>format (2.2.13). I'll ask blast-help at NCBI about this. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>To clarify some stuff - >>>>>>>>Chris I don't necessarily think the XML is best way forward >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>for BLAST >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>reports generated locally, it isn't as detailed as the Text >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>format and >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>it is what most people expect to be able to scroll through >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>and parse >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>-- it is also harder for the format to change dramatically >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>if you have >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>a static binary on your machine =). I think for >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>remoteblast the XML >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>format should be the way forward but I expect Bioperl to >>>>>>>>maintain support of any plain text BLAST report format that >>>>>>>>people use on a regular basis. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>Does XML lack some specific info that text output has? >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>Didn't know that. >>>>>>I >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>believe that XML should be default in RemoteBlast since it will >>>>>>>not break, but I agree with you about text output. I also agree >>>>>>>that it will need somebody to maintain it constantly, much like >>>>>>>RemoteBlast. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>-jason >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>Chris Fields wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>>My guess is you're running into text parsing problems in >>>>>>>>>>Bio::SearchIO::blast. Upgrade to the latest developer version >>>>>>>>>>(1.5.1) or >>>>>>>>>>bioperl-live (CVS), then see the bug below. >>>>>>>>>> >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>>>>>> >>>>>>>>>>I think the first problem you ran into is solved in >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>bioperl 1.5.1, >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>>>the last problem (more recent, not related to the first) has >>>>>>>>>>been fixed but hasn't been committed to bioperl-live yet. >>>>>>>>>>The fixed SearchIO::blast is available in the link above, but >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>realize it hasn't >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>>been committed yet and may change. >>>>>>>>>> >>>>>>>>>>Christopher Fields >>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry >>>>>>>>>>University of Illinois Urbana-Champaign >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>-----Original Message----- >>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>Of Hubert >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>>>>Prielinger >>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM >>>>>>>>>>>To: bioperl-l at bioperl.org >>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>parsing Blast >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>>>output >>>>>>>>>>> >>>>>>>>>>>Hi, >>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with >>>>>>>>>>>Bio::SearchIO, I get the following error message: >>>>>>>>>>> >>>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>>>>>STACK toplevel >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>>>>Blast.pl:21 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>>>is that a bug...... >>>>>>>>>>> >>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't >>>>>>>>>>>get anything..... 
>>>>>>>>>>>I'm using bioperl 1.4 >>>>>>>>>>> >>>>>>>>>>>before, I have installed bioperl 1.4, it worked fine >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>parsing Blast >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>>>Output (version 2.2.12), but I don't remember which >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>bioperl version >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>>>I had installed >>>>>>>>>>> >>>>>>>>>>>thanks in advance >>>>>>>>>>> >>>>>>>>>>>Hubert >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>_______________________________________________ >>>>>>>>>>>Bioperl-l mailing list >>>>>>>>>>>Bioperl-l at lists.open-bio.org >>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>_______________________________________________ >>>>>>>>>Bioperl-l mailing list >>>>>>>>>Bioperl-l at lists.open-bio.org >>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>-- >>>>>>>>Jason Stajich >>>>>>>>Duke University >>>>>>>>http://www.duke.edu/~jes12 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>Christopher Fields >>>>>>>Postdoctoral Researcher - Switzer Lab >>>>>>>Dept. 
of Biochemistry >>>>>>>University of Illinois Urbana-Champaign >>>>>>> >>>>>>>_______________________________________________ >>>>>>>Bioperl-l mailing list >>>>>>>Bioperl-l at lists.open-bio.org >>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>_______________________________________________ >>>>>>Bioperl-l mailing list >>>>>>Bioperl-l at lists.open-bio.org >>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> >>>Christopher Fields >>>Postdoctoral Researcher >>>Lab of Dr. Robert Switzer >>>Dept of Biochemistry >>>University of Illinois Urbana-Champaign >>> >>> >>> >>> >>>_______________________________________________ >>>Bioperl-l mailing list >>>Bioperl-l at lists.open-bio.org >>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >>> >>> >>> >> >> >>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> > > > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm for more > information. 
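The header difference discussed in this thread can be illustrated with a small standalone Perl sketch. It is not the BioPerl parser and not the patched blast.pm attached to bug 1934; the helper name query_length_from_header is hypothetical, and the two sample headers are abbreviated from the ones quoted in the thread. It only shows, under those assumptions, how one check can accept both the older "(193 letters)" line and the newer "Length=193" line, along the lines of the regex Pieter quotes.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the query length found in
# a BLAST text header, or undef if no length line is present.
sub query_length_from_header {
    my ($header) = @_;
    for my $line (split /\n/, $header) {
        # Older layout: "         (193 letters)"
        # Newer layout: "Length=193"
        if ($line =~ /\(([\d,]+)\s+letters\)/ || $line =~ /^Length=([\d,]+)/) {
            (my $size = $1) =~ s/,//g;    # drop thousands separators
            return $size;
        }
    }
    return;    # no length line found
}

# Abbreviated header in the older (2.2.12-style) layout: query first.
my $old_header = join "\n",
    'Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium',
    'tuberculosis H37Rv].',
    '         (193 letters)',
    'Database: All non-redundant GenBank CDS';

# Abbreviated header in the newer (2.2.13 web-style) layout: database first.
my $new_header = join "\n",
    'Database: All non-redundant GenBank CDS',
    '           3,292,813 sequences; 1,128,164,434 total letters',
    'Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium',
    'tuberculosis H37Rv].',
    'Length=193';

print query_length_from_header($old_header), "\n";    # prints 193
print query_length_from_header($new_header), "\n";    # prints 193
```

Note that the "total letters" line of the database block is not mistaken for the query length, because the older pattern requires the surrounding parentheses. Whether the real parser should stop at "Database:" or at the length line is a separate question that the sketch does not address.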
> From cjfields at uiuc.edu Fri Feb 10 12:45:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 10 Feb 2006 11:45:32 -0600 Subject: [Bioperl-l] Remote BLAST support discussion In-Reply-To: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu> Message-ID: <002201c62e69$ca8363d0$15327e82@pyrimidine> > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Jason Stajich > Sent: Friday, February 10, 2006 11:15 AM > To: Paul.Boutros at utoronto.ca > Cc: BioPerl Mailing List > Subject: [Bioperl-l] Remote BLAST support discussion > > Paul - > > The reason for suggesting a change has to do with the > instability of the CGI interface/format of the returned data, > the text format is not a stable format from the webserver > which reportedly will cease to be reliably parsed. Yes we > can keep hacking the blast parser code to handle this, but > the bioperl release cycle is certainly not tied to the NCBI > blast release cycle so I find it unsatisfying to know that we > are going to have broken code when they change the output > formats (but not know when). > > Mostly I think we need to try and support something that will > "ALWAYS" work so that individuals setting up webservices > which rely on remote blast functionality. In theory, > netblast/blastcl3 should always work since NCBI has to update > the exe when they change their server setup. > > In terms of the web-based queues - I think the best change we > can make is have the XML be the preferred retrieval method. > > I also see value in providing a wrapper for netblast since it > should look an awful lot like running blast locally. 
>
> Ideally I'd like to see a more extensible system, something like (and please feel free to come up with better names for the modules!):
>
> Bio::Tools::Run::Blast
> --> StandAlone (support for both WU-BLAST and NCBI-BLAST local binaries and MPI-BLAST too if simple)
> --> RemoteNCBI (currently the RemoteBlast server)
> --> RemoteEBISOAP (EBI has a nice SOAP interface that works quite well, but may not provide all the same databases as what people expect from NCBI)
> --> RemoteNetBlast (blastcl3 or netblast local executable)
> (other things that people want)

Sounds good to me. I think any wrapper for netblast could most easily be based on StandAloneBlast; the parameters look pretty much identical, though it'll probably need a little configuring as a quick text search through StandAloneBlast didn't show any 'xml' tags. Roger seemed to agree on this.

> [note: If these ideas are appealing or not, someone should archive the discussions and discussions on the wiki page so we can rely less on people searching the mailing archives for how a decision was made. Perhaps Roger can do this sort of editing in addition to the planning for support of this module].
>
> -jason
>
> On Feb 7, 2006, at 8:38 PM, Paul Boutros wrote:
>
> > Hi Roger,
> >
> > I would definitely prefer a fully Perl-based implementation. For starters, I have not been successful in compiling the Toolkit that contains netblast for some platforms (e.g. AIX 5.2 w/gcc 4.0).
> >
> > I haven't been following the discussion: is there some compelling reason to prefer a netblast-based system that's come up recently? I'm guessing that adding a new non-perl dependency would only be done if there was considerable justification for this type of change, but I'm not clear from your message what that justification is.
> >
> > Paul
> >
> > ------------------------------
> >
> > Message: 12
> > Date: Mon, 6 Feb 2006 20:46:44 -0600
> > From: "Roger Hall"
> > Subject: [Bioperl-l] RemoteBlast users - potentially major changes - please reply
> > To:
> > Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL>
> > Content-Type: text/plain; charset="us-ascii"
> >
> > To everyone who uses RemoteBlast.pm:
> >
> > Would anyone object to RemoteBlast being rewritten in a way that requires NCBI's blastcl3 executable?
> >
> > Binary downloads of blastcl3 (column "netblast") are available for numerous platforms at: http://ncbi.nih.gov/BLAST/download.shtml
> >
> > Does anyone require or desire a "pure perl" implementation? If so, please explain the advantage you see with such an implementation.
> >
> > Thanks!
> >
> > Roger Hall
> > Technical Director
> > MidSouth Bioinformatics Center
> > University of Arkansas at Little Rock
> > (501) 569-8074
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From rahall2 at ualr.edu Fri Feb 10 12:54:23 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Fri, 10 Feb 2006 11:54:23 -0600
Subject: [Bioperl-l] Remote BLAST support discussion
In-Reply-To: <002201c62e69$ca8363d0$15327e82@pyrimidine>
Message-ID: <002501c62e6b$0686be30$d416a790@LIBERAL>

It seems so obvious now.
:} The only issue I see is likely obvious to those of you who have maintained this over the years - no backward compatibility, but I can live with that if y'all can.

I will document on the wiki as suggested and then build the RemoteNCBI module described. After that is tested and committed, I will contact Torsten to see if I can help with the rest.

Thanks!

Roger

>
> Bio::Tools::Run::Blast
> --> StandAlone (support for both WU-BLAST and NCBI-BLAST local binaries and MPI-BLAST too if simple)
> --> RemoteNCBI (currently the RemoteBlast server)
> --> RemoteEBISOAP (EBI has a nice SOAP interface that works quite well, but may not provide all the same databases as what people expect from NCBI)
> --> RemoteNetBlast (blastcl3 or netblast local executable)
> (other things that people want)

Sounds good to me. I think any wrapper for netblast could most easily be based on StandAloneBlast; the parameters look pretty much identical, though it'll probably need a little configuring as a quick text search through StandAloneBlast didn't show any 'xml' tags. Roger seemed to agree on this.

From rahall2 at ualr.edu Fri Feb 10 13:00:51 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Fri, 10 Feb 2006 12:00:51 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output
In-Reply-To: <43ECBEC7.7040506@gmx.at>
Message-ID: <002701c62e6b$edd845b0$d416a790@LIBERAL>

Hubert,

I got the same message when I first ran your script. The issue for me was that "readdir(DIR)" doesn't return the full path, only the file name.

I edited your script to include:

$file = $directory . '/' . $file;

just before the Bio::SearchIO call.
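
A minimal sketch of the fix described above, assuming the script loops over a directory of BLAST reports with opendir/readdir. The helper name report_paths is hypothetical and is not part of Hubert's script or of BioPerl; File::Spec->catfile stands in for the string concatenation shown above so that the path separator is portable.

```perl
use strict;
use warnings;
use File::Spec;

# Collect the full path of every plain file in $directory.
# readdir() returns bare entry names, so each name is joined back onto
# the directory before it is tested or opened.
# (report_paths is a hypothetical helper name used only for illustration.)
sub report_paths {
    my ($directory) = @_;
    opendir(my $dh, $directory) or die "Cannot open $directory: $!";
    my @names = readdir $dh;
    closedir $dh;

    my @paths;
    for my $name (sort @names) {
        my $path = File::Spec->catfile($directory, $name);
        next unless -f $path;    # skips '.', '..' and subdirectories
        push @paths, $path;
    }
    return @paths;
}

# Each returned path can then be handed to the parser, for example:
#   my $search = Bio::SearchIO->new(-format => 'blast', -file => $path);
```

Without the directory prefix, the bare name is resolved against the current working directory, which would explain a script that works when run from inside the report directory and fails with "No such file or directory" when run from anywhere else.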
Roger

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
Sent: Friday, February 10, 2006 10:27 AM
To: Pieter Monsieurs; bioperl-l at bioperl.org; Chris Fields; rahall2 at ualr.edu
Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output

Hi,
I'm sorry for disturbing once more. Yesterday the script was working, today it isn't working at all, but I didn't change anything, I get the following error message:

------------- EXCEPTION -------------
MSG: Could not open comp80swiss2114.txt: No such file or directory
STACK Bio::Root::IO::_initialize_io /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167
STACK toplevel ./Blast.pl:14
--------------------------------------

the file exists and the bug I have fixed yesterday
thanks for help

Hubert

Pieter Monsieurs wrote:

> Sorry for disturbing. I now works correctly with the bug fix of Chris.
> Thanx,
> Pieter
>
> Pieter Monsieurs wrote:
>
>>Hi Chris,
>>
>>The parsing of the Blast output still doesn't work for me with the bug fix download of blast.pm.
>>The module keeps turning around in the while loop at line 487 looking for a database or query-size:
>>
>>while( defined ($_) ) {
>>    if( /^Database:/ ) {
>>        $self->_pushback($_);
>>        last;
>>    }
>>    chomp;
>>    if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
>>        $size = $1;
>>        $size =~ s/,//g;
>>        last;
>>    } else {
>>        $q .= " $_";
>>        $q =~ s/ +/ /g;
>>        $q =~ s/^ | $//g;
>>    }
>>    $_ = $self->_readline;
>>}
>>
>>The code keeps looking for the database information, however - as you mentioned - this information is given before the query line in the new Blast output format.
>>This way, all hits and hsps are stored in the query_description >>($hit->query_description), no hits are found and query_length is 0. >>Because you already adapted the module to retrieve database information >>at another position in the module, deleting the while loop and adding >>the following lines after $_ = $self->_readline (line 486), worked fine >>for me (using blastn and blastp): >> >>if (/Length=([\d,]+)/) { >> $size = $1; >> $size =~ s/,//g; >>} >> >> >>Regards, >>Pieter >> >> >> >>Chris Fields wrote: >> >> >> >>>From 'perldoc Bio::SearchIO::blast': >>> >>>DESCRIPTION >>> This object encapsulated the necessary methods for generating >>>events >>> suitable for building Bio::Search objects from a BLAST report >>>file. >>> Read the Bio::SearchIO for more information about how to use >>>this. >>> >>> This driver can parse: >>> >>> o NCBI produced plain text BLAST reports from blastall, >>>this also >>> includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq >>>reports. NCBI >>> XML BLAST output is parsed with the blastxml SearchIO driver >>> >>> o WU-BLAST all reports >>> >>> o Jim Kent's BLAST-like output from his programs (BLASTZ, >>>BLAT) >>> >>> o BLAST-like output from Paracel BTK output >>> >>>So, it should. Let us know if it doesn't. >>> >>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote: >>> >>> >>> >>> >>> >>>>Hi Chris, >>>>I'm incredibly sorry for causing so much inconvenience, yes you are >>>>right, I had only to change the blast.pm file, it is working very >>>>fine, thank you very much, and you are right, you have mentioned it >>>>ealier either to change the file... ;) >>>> >>>>but I have another question: does it work with the WU-Blast output >>>>too? >>>>regards >>>>Hubert >>>> >>>> >>>>Chris Fields wrote: >>>> >>>> >>>> >>>> >>>> >>>>>Ha! I come back from meeting and there's a billion emails! What >>>>>have we >>>>>started? ;p . Sorry about this Jason; I know you're busy. 
>>>>> >>>>>Hubert, if you're out there, I sent you an email with an >>>>>attachment. You >>>>>said the output looks like what you were expecting. So I think we >>>>>have two >>>>>problems: >>>>> >>>>>1) I haven't delved into the file scanning, but the fact that it >>>>>takes so >>>>>long should tell you something's seriously wrong there. Strip >>>>>that part out >>>>>and start with a simple script, say, like the one Jason or that I >>>>>sent you; >>>>>the script I used to generate that output works fine (on two OS's, >>>>>WinXP and >>>>>Mac OS X). Use it on one file at a time. Do everything on >>>>>command line >>>>>(not through Eclipse). IDE's can be notoriously flaky about running >>>>>scripts, esp. when they run debugging. >>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast >>>>>will still >>>>>not work whenever the text blast output has the following header, >>>>>which >>>>>comes from the new web version of BLAST: >>>>> >>>>>----------------------------------------------------- >>>>>BLASTP 2.2.13 [Nov-27-2005] >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>Sch??ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402. >>>>> >>>>>RID: 1139501210-857-165793005128.BLASTQ1 >>>>> >>>>> >>>>>Database: All non-redundant GenBank CDS >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>>>> 3,292,813 sequences; 1,128,164,434 total letters >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >>>>>tuberculosis H37Rv]. >>>>>Length=193 >>>>>....... 
>>>>>----------------------------------------------------- >>>>> >>>>>It will work if the text output has the following header (or is an >>>>>older >>>>>version of BLAST): >>>>> >>>>>----------------------------------------------------- >>>>>BLASTP 2.2.12 [Aug-07-2005] >>>>> >>>>> >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. >>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of >>>>>protein database search >>>>>programs", Nucleic Acids Res. 25:3389-3402. >>>>> >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium >>>>>tuberculosis H37Rv]. >>>>> (193 letters) >>>>> >>>>>Database: All non-redundant GenBank CDS >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples >>>>> 2,895,325 sequences; 997,103,285 total letters >>>>>----------------------------------------------------- >>>>>You have the former (2.2.13) version. I know b/c I have your >>>>>BLAST files. >>>>>Therefore, even bioperl-1.5.1 will not work! >>>>> >>>>>If you want the really gory details on why this is a problem, look >>>>>here: >>>>> >>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>> >>>>>So, any text output with the above header will not work; it will >>>>>either hang >>>>>or end abruptly (depending on OS, perl version, memory, >>>>>patience). If you >>>>>look in the above, I have added a preliminary fix for this. I'll >>>>>reiterate >>>>>for the billionth time, it hasn't been committed yet, so don't >>>>>kill me if >>>>>blows your computer up ;> >>>>>Here's the direct link: >>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view >>>>>This is a modified version of Bio::SearchIO::blast.pm (it says >>>>>it's version >>>>>1.90, but it's lying, I didn't change the version, only the regex; >>>>>sorry >>>>>Jason). From what you've been posting it doesn't sound like >>>>>you've tried >>>>>this, and I believe I've suggested this fix before. 
>>>>> >>>>>Replace the one in your Bio/SearchIO directory (which looks like >>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your >>>>>prev. >>>>>message) with this file. Make sure the filename stays the same >>>>>(blast.pm). >>>>> >>>>>Run everything again, one file at a time. Make sure you use >>>>>Jason's script >>>>>as well as the one I sent you. Do NOT rely on running through >>>>>multiple >>>>>files yet. Fix one bug at a time. And heed Joel's words about >>>>>file checks. >>>>> >>>>> >>>>>Here's a small chunk of output from one of your blast files using the >>>>>modifed script I sent you: >>>>> >>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1 >>>>>Query: 1 RWKWKRKK 8 >>>>>Seq: 542 RWAWRRKK 549 >>>>> >>>>>Look familiar? >>>>> >>>>>Christopher Fields >>>>>Postdoctoral Researcher - Switzer Lab >>>>>Dept. of Biochemistry >>>>>University of Illinois Urbana-Champaign >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>>-----Original Message----- >>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday, >>>>>>February 09, 2006 3:24 PM >>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich' >>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>parsing Blast output >>>>>> >>>>>>In other words, yes, I'm on the wrong trail. :} >>>>>> >>>>>>Sorry - I'll look at the output issue this evening (or realize >>>>>>that Chris already solved the issue). ;} >>>>>> >>>>>>Thanks! 
>>>>>> >>>>>>Roger >>>>>> >>>>>>-----Original Message----- >>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>>>Prielinger >>>>>>Sent: Thursday, February 09, 2006 2:14 PM >>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason >>>>>>Stajich >>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>parsing Blast output >>>>>> >>>>>>dear roger, >>>>>>this error message I got, when I tried to parse Blast output >>>>>>(version >>>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have >>>>>>a lot of Blast output files with version 2.2.13 and for that I >>>>>>don't get any error message.....it just doesn't work >>>>>> >>>>>>Hubert >>>>>> >>>>>> >>>>>> >>>>>>Roger Hall wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>Guys - I'm looking at the error message: >>>>>>> >>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>STACK toplevel >>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>>>Blast.pl:21 >>>>>>> >>>>>>>This is my line of thought: >>>>>>>1. "no data for midline $_" is a unique message generated by >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>blast.pm >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>in >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>one >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>location only at the point of a. reading three lines b. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>dropping lines >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>with spaces only c. identifying the Query, Midline, and >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>Match lines (0 >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>><= $i < >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>3) >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>2. 
There is a regexp match that fails in order to reach that error message
>>>>>>>3. The $_ value "Query 1 WWWKWRW 7" should not fail the expression
>>>>>>>4. It does anyway
>>>>>>>5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast reports
>>>>>>>
>>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string anywhere has me thoroughly confused - I asked Hubert for the additional file, assuming that I didn't have it.
>>>>>>>
>>>>>>>My next thought is to write a quick script to test perl behavior on "Fedora Core 4".
>>>>>>>
>>>>>>>Thoughts?
>>>>>>>
>>>>>>>Did I misread the issue entirely?
:} >>>>>>> >>>>>>>Roger >>>>>>> >>>>>>> >>>>>>>-----Original Message----- >>>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>Chris Fields >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>Sent: Thursday, February 09, 2006 10:16 AM >>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger' >>>>>>>Cc: bioperl-l at bioperl.org >>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>>parsing Blast output >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>-----Original Message----- >>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu] >>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM >>>>>>>>To: Hubert Prielinger >>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org >>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>>>parsing Blast output >>>>>>>> >>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>hi chris, >>>>>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>working, >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>do you have any ohter idea, the problem I have is that I >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>have to parse >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>a lot of textfiles.... >>>>>>>>>or shall I look for another option to parse those files... >>>>>>>>> >>>>>>>>>regards >>>>>>>>>Hubert >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast >>>>>>>>2.2.13 reports but unless you post your blast report we >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>can't really >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>determine the problem. 
>>>>>>>> >>>>>>>>If you are still getting the same error like this I am not >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>convinced >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>you have upgraded to 1.5.1 which includes a fix in the fact >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>that NCBI >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>changed the HSP result format to remove the ':' from the >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>Query/Sbjct >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>prefixes. We fixed this as soon as it was apparent sometime in >>>>>>>>September. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>>>>>STACK toplevel >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>>>>Blast.pl:21 >>>>>>>> >>>>>>>>If you are just getting no results but also no warnings wrt >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>parsing, >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>are you sure your logic is correct? >>>>>>>> >>>>>>>>If you remove your filters do you see all the HSPS? 
>>>>>>>> >>>>>>>> >>>>>>>>while (my $result = $search->next_result) { >>>>>>>> print $result->query_name, "\n"; >>>>>>>> #iterate over each hit on the query sequence >>>>>>>> while (my $hit = $result->next_hit) { >>>>>>>> print $hit->name, "\n"; >>>>>>>> #iterate over each HSP in the hit >>>>>>>> while (my $hsp = $hit->next_hsp) { >>>>>>>> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp- >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>hit_string, "\n"; >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> } >>>>>>>> } >>>>>>>>} >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>I tested some of the BLAST results that Hubert sent Roger >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>and me with a >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>similar script to the above. I removed the file parsing logic >>>>>>>and it >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>seemed >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>to work just fine. It may very well be a logic issue or >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>that he hasn't >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>installed the latest fix. >>>>>>> It's a funny thing, though. When I tried using blastcl3 (v. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>2.2.13), >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>even though the returned output was from nr, the top of the >>>>>>>blast output showed that it was v2.2.12: >>>>>>> >>>>>>>BLASTP 2.2.12 [Aug-07-2005] >>>>>>> >>>>>>>I double-checked my local version and it's definitely v.2.2.13: >>>>>>>------------------------------------- >>>>>>>C:\Perl\Scripts>blastcl3 - >>>>>>> >>>>>>>blastcl3 2.2.13 arguments:... 
>>>>>>>------------------------------------- >>>>>>> >>>>>>>If you use RemoteBlast using the same settings, the version in >>>>>>>the header looks like this: >>>>>>> >>>>>>>BLASTP 2.2.13 [Nov-27-2005] >>>>>>> >>>>>>>I'm wondering if all the blast executables (blast and netblast) >>>>>>> >>>>>>> >>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>outputs a new >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>format (2.2.13). I'll ask blast-help at NCBI about this. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>To clarify some stuff - >>>>>>>>Chris I don't necessarily think the XML is best way forward >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>for BLAST >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>reports generated locally, it isn't as detailed as the Text >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>format and >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>it is what most people expect to be able to scroll through >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>and parse >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>-- it is also harder for the format to change dramatically >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>if you have >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>a static binary on your machine =). I think for >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>remoteblast the XML >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>format should be the way forward but I expect Bioperl to >>>>>>>>maintain support of any plain text BLAST report format that >>>>>>>>people use on a regular basis. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>Does XML lack some specific info that text output has? >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>Didn't know that. >>>>>>I >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>believe that XML should be default in RemoteBlast since it will >>>>>>>not break, but I agree with you about text output. I also agree >>>>>>>that it will need somebody to maintain it constantly, much like >>>>>>>RemoteBlast. 
>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>>-jason >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>Chris Fields wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>>My guess is you're running into text parsing problems in >>>>>>>>>>Bio::SearchIO::blast. Upgrade to the latest developer version >>>>>>>>>>(1.5.1) or >>>>>>>>>>bioperl-live (CVS), then see the bug below. >>>>>>>>>> >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>>>>>> >>>>>>>>>>I think the first problem you ran into is solved in >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>bioperl 1.5.1, >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>>>the last problem (more recent, not related to the first) has >>>>>>>>>>been fixed but hasn't been committed to bioperl-live yet. >>>>>>>>>>The fixed SearchIO::blast is available in the link above, but >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>realize it hasn't >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>>been committed yet and may change. >>>>>>>>>> >>>>>>>>>>Christopher Fields >>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry >>>>>>>>>>University of Illinois Urbana-Champaign >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>-----Original Message----- >>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org >>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>Of Hubert >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>>>>>>Prielinger >>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM >>>>>>>>>>>To: bioperl-l at bioperl.org >>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>parsing Blast >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>>>output >>>>>>>>>>> >>>>>>>>>>>Hi, >>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with >>>>>>>>>>>Bio::SearchIO, I get the following error message: >>>>>>>>>>> >>>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>>>>>>>STACK toplevel >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ >>>>>>>>Blast.pl:21 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>>>is that a bug...... >>>>>>>>>>> >>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't >>>>>>>>>>>get anything..... 
>>>>>>>>>>>I'm using bioperl 1.4 >>>>>>>>>>> >>>>>>>>>>>before, I have installed bioperl 1.4, it worked fine >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>parsing Blast >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>>>Output (version 2.2.12), but I don't remember which >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>bioperl version >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>>>I had installed >>>>>>>>>>> >>>>>>>>>>>thanks in advance >>>>>>>>>>> >>>>>>>>>>>Hubert >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>_______________________________________________ >>>>>>>>>>>Bioperl-l mailing list >>>>>>>>>>>Bioperl-l at lists.open-bio.org >>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>_______________________________________________ >>>>>>>>>Bioperl-l mailing list >>>>>>>>>Bioperl-l at lists.open-bio.org >>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>-- >>>>>>>>Jason Stajich >>>>>>>>Duke University >>>>>>>>http://www.duke.edu/~jes12 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>Christopher Fields >>>>>>>Postdoctoral Researcher - Switzer Lab >>>>>>>Dept. 
of Biochemistry
>>>>>>>University of Illinois Urbana-Champaign

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Fri Feb 10 13:08:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 10 Feb 2006 12:08:37 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output
In-Reply-To: <002701c62e6b$edd845b0$d416a790@LIBERAL>
Message-ID: <002501c62e6d$04158530$15327e82@pyrimidine>

Makes sense. I didn't see this since I passed the files directly from command-line.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: Roger Hall [mailto:rahall2 at ualr.edu] > Sent: Friday, February 10, 2006 12:01 PM > To: 'Hubert Prielinger'; 'Pieter Monsieurs'; > bioperl-l at bioperl.org; 'Chris Fields' > Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > parsing blast output > > Hubert, > > I got the same message when I first ran your script. The > issue for me was that "readdir(DIR)" doesn't return the full > path, only the file name. > > I edited your script to include: > > $file = $directory . '/' . $file; > > just before the Bio::SearchIO call. > > Roger > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Hubert Prielinger > Sent: Friday, February 10, 2006 10:27 AM > To: Pieter Monsieurs; bioperl-l at bioperl.org; Chris Fields; > rahall2 at ualr.edu > Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > parsing blast output > > Hi, > I'm sorry for disturbing once more. Yesterday the script was > working, today it isn't working at all, but I didn't change > anything, I get the following error message: > > ------------- EXCEPTION ------------- > MSG: Could not open comp80swiss2114.txt: No such file or > directory STACK Bio::Root::IO::_initialize_io > /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273 > STACK Bio::Root::IO::new > /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213 > STACK Bio::SearchIO::new > /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135 > STACK Bio::SearchIO::new > /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167 > STACK toplevel ./Blast.pl:14 > > -------------------------------------- > > the file exists and the bug I have fixed yesterday thanks for help > > Hubert > > > > > Pieter Monsieurs wrote: > > > Sorry for disturbing. I now works correctly with the bug > fix of Chris. 
> > Thanx, > > Pieter > > > > Pieter Monsieurs wrote: > > > >>Hi Chris, > >> > >>The parsing of the Blast output still doesn't work for me > with the bug > >>fix download of blast.pm. > >>The module keeps turning around in the while loop at line > 487 looking > >>for a database or query-size: > >> > >>while( defined ($_) ) { > >> if( /^Database:/ ) { > >> $self->_pushback($_); > >> last; > >> } > >> chomp; > >> if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) { > >> $size = $1; > >> $size =~ s/,//g; > >> last; > >> } else { > >> $q .= " $_"; > >> $q =~ s/ +/ /g; > >> $q =~ s/^ | $//g; > >> } > >> $_ = $self->_readline; > >>} > >> > >> > >>The code keeps looking for the database information, > however - as you > >>mentioned - this information is given before the query line > in the new > >>Blast output format. > >>This way, all hits and hsps are stored in the query_description > >>($hit->query_description), no hits are found and query_length is 0. > >>Because you already adapted the module to retrieve database > >>information at another position in the module, deleting the > while loop > >>and adding the following lines after $_ = $self->_readline > (line 486), > >>worked fine for me (using blastn and blastp): > >> > >>if (/Length=([\d,]+)/) { > >> $size = $1; > >> $size =~ s/,//g; > >>} > >> > >> > >>Regards, > >>Pieter > >> > >> > >> > >>Chris Fields wrote: > >> > >> > >> > >>>From 'perldoc Bio::SearchIO::blast': > >>> > >>>DESCRIPTION > >>> This object encapsulated the necessary methods for > generating > >>>events > >>> suitable for building Bio::Search objects from a > BLAST report > >>>file. > >>> Read the Bio::SearchIO for more information about > how to use > >>>this. > >>> > >>> This driver can parse: > >>> > >>> o NCBI produced plain text BLAST reports from blastall, > >>>this also > >>> includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq > >>>reports. 
NCBI > >>> XML BLAST output is parsed with the blastxml SearchIO > >>>driver > >>> > >>> o WU-BLAST all reports > >>> > >>> o Jim Kent's BLAST-like output from his programs > (BLASTZ, > >>>BLAT) > >>> > >>> o BLAST-like output from Paracel BTK output > >>> > >>>So, it should. Let us know if it doesn't. > >>> > >>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote: > >>> > >>> > >>> > >>> > >>> > >>>>Hi Chris, > >>>>I'm incredibly sorry for causing so much inconvenience, > yes you are > >>>>right, I had only to change the blast.pm file, it is working very > >>>>fine, thank you very much, and you are right, you have > mentioned it > >>>>ealier either to change the file... ;) > >>>> > >>>>but I have another question: does it work with the > WU-Blast output > >>>>too? > >>>>regards > >>>>Hubert > >>>> > >>>> > >>>>Chris Fields wrote: > >>>> > >>>> > >>>> > >>>> > >>>> > >>>>>Ha! I come back from meeting and there's a billion > emails! What > >>>>>have we started? ;p . Sorry about this Jason; I know > you're busy. > >>>>> > >>>>>Hubert, if you're out there, I sent you an email with an > >>>>>attachment. You said the output looks like what you were > >>>>>expecting. So I think we have two > >>>>>problems: > >>>>> > >>>>>1) I haven't delved into the file scanning, but the > fact that it > >>>>>takes so long should tell you something's seriously > wrong there. > >>>>>Strip that part out and start with a simple script, say, > like the > >>>>>one Jason or that I sent you; the script I used to generate that > >>>>>output works fine (on two OS's, WinXP and Mac OS X). > Use it on one > >>>>>file at a time. Do everything on command line (not through > >>>>>Eclipse). IDE's can be notoriously flaky about running scripts, > >>>>>esp. when they run debugging. 
> >>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast
> >>>>>will still not work whenever the text blast output has the
> >>>>>following header, which comes from the new web version of BLAST:
> >>>>>
> >>>>>-----------------------------------------------------
> >>>>>BLASTP 2.2.13 [Nov-27-2005]
> >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
> >>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
> >>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
> >>>>>
> >>>>>RID: 1139501210-857-165793005128.BLASTQ1
> >>>>>
> >>>>>
> >>>>>Database: All non-redundant GenBank CDS
> >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> >>>>>           3,292,813 sequences; 1,128,164,434 total letters
> >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
> >>>>>tuberculosis H37Rv].
> >>>>>Length=193
> >>>>>.......
> >>>>>-----------------------------------------------------
> >>>>>
> >>>>>It will work if the text output has the following header (or is an
> >>>>>older version of BLAST):
> >>>>>
> >>>>>-----------------------------------------------------
> >>>>>BLASTP 2.2.12 [Aug-07-2005]
> >>>>>
> >>>>>
> >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
> >>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
> >>>>>protein database search programs", Nucleic Acids Res.
> >>>>>25:3389-3402.
> >>>>>
> >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
> >>>>>tuberculosis H37Rv].
> >>>>> (193 letters) > >>>>> > >>>>>Database: All non-redundant GenBank CDS > >>>>>translations+PDB+SwissProt+PIR+PRF excluding > environmental samples > >>>>> 2,895,325 sequences; 997,103,285 total letters > >>>>>----------------------------------------------------- > >>>>>You have the former (2.2.13) version. I know b/c I have > your BLAST > >>>>>files. > >>>>>Therefore, even bioperl-1.5.1 will not work! > >>>>> > >>>>>If you want the really gory details on why this is a > problem, look > >>>>>here: > >>>>> > >>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >>>>> > >>>>>So, any text output with the above header will not work; it will > >>>>>either hang or end abruptly (depending on OS, perl > version, memory, > >>>>>patience). If you look in the above, I have added a preliminary > >>>>>fix for this. I'll reiterate for the billionth time, it hasn't > >>>>>been committed yet, so don't kill me if blows your > computer up ;> > >>>>>Here's the direct link: > >>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view > >>>>>This is a modified version of Bio::SearchIO::blast.pm > (it says it's > >>>>>version 1.90, but it's lying, I didn't change the > version, only the > >>>>>regex; sorry Jason). From what you've been posting it doesn't > >>>>>sound like you've tried this, and I believe I've > suggested this fix > >>>>>before. > >>>>> > >>>>>Replace the one in your Bio/SearchIO directory (which looks like > >>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging > from your > >>>>>prev. > >>>>>message) with this file. Make sure the filename stays the same > >>>>>(blast.pm). > >>>>> > >>>>>Run everything again, one file at a time. Make sure you use > >>>>>Jason's script as well as the one I sent you. Do NOT rely on > >>>>>running through multiple files yet. Fix one bug at a time. And > >>>>>heed Joel's words about file checks. 
> >>>>> > >>>>> > >>>>>Here's a small chunk of output from one of your blast > files using > >>>>>the modifed script I sent you: > >>>>> > >>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1 > >>>>>Query: 1 RWKWKRKK 8 > >>>>>Seq: 542 RWAWRRKK 549 > >>>>> > >>>>>Look familiar? > >>>>> > >>>>>Christopher Fields > >>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry > >>>>>University of Illinois Urbana-Champaign > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>>>-----Original Message----- > >>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday, > >>>>>>February 09, 2006 3:24 PM > >>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich' > >>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't > work parsing > >>>>>>Blast output > >>>>>> > >>>>>>In other words, yes, I'm on the wrong trail. :} > >>>>>> > >>>>>>Sorry - I'll look at the output issue this evening (or realize > >>>>>>that Chris already solved the issue). ;} > >>>>>> > >>>>>>Thanks! 
> >>>>>>
> >>>>>>Roger
> >>>>>>
> >>>>>>-----Original Message-----
> >>>>>>From: bioperl-l-bounces at lists.open-bio.org
> >>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
> >>>>>>Prielinger
> >>>>>>Sent: Thursday, February 09, 2006 2:14 PM
> >>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason
> >>>>>>Stajich
> >>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
> >>>>>>Blast output
> >>>>>>
> >>>>>>dear roger,
> >>>>>>this error message I got, when I tried to parse Blast output
> >>>>>>(version 2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have
> >>>>>>a lot of Blast output files with version 2.2.13 and for that I
> >>>>>>don't get any error message.....it just doesn't work
> >>>>>>
> >>>>>>Hubert
> >>>>>>
> >>>>>>
> >>>>>>Roger Hall wrote:
> >>>>>>
> >>>>>>>Guys - I'm looking at the error message:
> >>>>>>>
> >>>>>>>MSG: no data for midline Query 1 WWWKWRW 7
> >>>>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>>>STACK toplevel
> >>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/
> >>>>>>>Blast.pl:21
> >>>>>>>
> >>>>>>>This is my line of thought:
> >>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm
> >>>>>>>in one location only at the point of a. reading three lines b. dropping
> >>>>>>>lines with spaces only c. identifying the Query, Midline, and Match
> >>>>>>>lines (0 <= $i < 3)
> >>>>>>>2. There is a regexp match that fails in order to reach that error message
> >>>>>>>3. The $_ value "Query 1 WWWKWRW 7" should not fail the expression
> >>>>>>>4. It does anyway
> >>>>>>>5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast
> >>>>>>>reports
> >>>>>>>
> >>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
> >>>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional
> >>>>>>>file, assuming that I didn't have it.
> >>>>>>>
> >>>>>>>My next thought is to write a quick script to test perl behavior
> >>>>>>>on "Fedora Core 9".
> >>>>>>>
> >>>>>>>Thoughts?
> >>>>>>>
> >>>>>>>Did I misread the issue entirely?
:} > >>>>>>> > >>>>>>>Roger > >>>>>>> > >>>>>>> > >>>>>>>-----Original Message----- > >>>>>>>From: bioperl-l-bounces at lists.open-bio.org > >>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>Chris Fields > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>Sent: Thursday, February 09, 2006 10:16 AM > >>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger' > >>>>>>>Cc: bioperl-l at bioperl.org > >>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > >>>>>>>parsing Blast output > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>>-----Original Message----- > >>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu] > >>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM > >>>>>>>>To: Hubert Prielinger > >>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org > >>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > >>>>>>>>parsing Blast output > >>>>>>>> > >>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>>hi chris, > >>>>>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>working, > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>>do you have any ohter idea, the problem I have is that I > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>have to parse > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>>a lot of textfiles.... > >>>>>>>>>or shall I look for another option to parse those files... 
> >>>>>>>>> > >>>>>>>>>regards > >>>>>>>>>Hubert > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast > >>>>>>>>2.2.13 reports but unless you post your blast report we > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>can't really > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>determine the problem. > >>>>>>>> > >>>>>>>>If you are still getting the same error like this I am not > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>convinced > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>you have upgraded to 1.5.1 which includes a fix in the fact > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>that NCBI > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>changed the HSP result format to remove the ':' from the > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>Query/Sbjct > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>prefixes. We fixed this as soon as it was apparent > sometime in > >>>>>>>>September. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 > >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result > >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>>>>>>>>>>STACK toplevel > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ > >>>>>>>>Blast.pl:21 > >>>>>>>> > >>>>>>>>If you are just getting no results but also no warnings wrt > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>parsing, > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>are you sure your logic is correct? > >>>>>>>> > >>>>>>>>If you remove your filters do you see all the HSPS? 
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>while (my $result = $search->next_result) {
> >>>>>>>>    print $result->query_name, "\n";
> >>>>>>>>    #iterate over each hit on the query sequence
> >>>>>>>>    while (my $hit = $result->next_hit) {
> >>>>>>>>        print $hit->name, "\n";
> >>>>>>>>        #iterate over each HSP in the hit
> >>>>>>>>        while (my $hsp = $hit->next_hsp) {
> >>>>>>>>            print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n";
> >>>>>>>>        }
> >>>>>>>>    }
> >>>>>>>>}
> >>>>>>>>
> >>>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a
> >>>>>>>similar script to the above. I removed the file parsing logic and it
> >>>>>>>seemed to work just fine. It may very well be a logic issue or that he
> >>>>>>>hasn't installed the latest fix.
> >>>>>>>
> >>>>>>>It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13),
> >>>>>>>even though the returned output was from nr, the top of the blast
> >>>>>>>output showed that it was v2.2.12:
> >>>>>>>
> >>>>>>>BLASTP 2.2.12 [Aug-07-2005]
> >>>>>>>
> >>>>>>>I double-checked my local version and it's definitely v.2.2.13:
> >>>>>>>-------------------------------------
> >>>>>>>C:\Perl\Scripts>blastcl3 -
> >>>>>>>
> >>>>>>>blastcl3 2.2.13 arguments:...
> >>>>>>>------------------------------------- > >>>>>>> > >>>>>>>If you use RemoteBlast using the same settings, the version in > >>>>>>>the header looks like this: > >>>>>>> > >>>>>>>BLASTP 2.2.13 [Nov-27-2005] > >>>>>>> > >>>>>>>I'm wondering if all the blast executables (blast and netblast) > >>>>>>> > >>>>>>> > >>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast > >>>>>> > >>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>outputs a new > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>format (2.2.13). I'll ask blast-help at NCBI about this. > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>>To clarify some stuff - > >>>>>>>>Chris I don't necessarily think the XML is best way forward > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>for BLAST > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>reports generated locally, it isn't as detailed as the Text > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>format and > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>it is what most people expect to be able to scroll through > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>and parse > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>-- it is also harder for the format to change > dramatically > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>if you have > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>a static binary on your machine =). I think for > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>remoteblast the XML > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>format should be the way forward but I expect Bioperl to > >>>>>>>>maintain support of any plain text BLAST report format that > >>>>>>>>people use on a regular basis. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>Does XML lack some specific info that text output has? > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>Didn't know that. 
> >>>>>>I > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>believe that XML should be default in RemoteBlast > since it will > >>>>>>>not break, but I agree with you about text output. I > also agree > >>>>>>>that it will need somebody to maintain it constantly, > much like > >>>>>>>RemoteBlast. > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>>-jason > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>>Chris Fields wrote: > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>>My guess is you're running into text parsing problems in > >>>>>>>>>>Bio::SearchIO::blast. Upgrade to the latest > developer version > >>>>>>>>>>(1.5.1) or > >>>>>>>>>>bioperl-live (CVS), then see the bug below. > >>>>>>>>>> > >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >>>>>>>>>> > >>>>>>>>>>I think the first problem you ran into is solved in > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>bioperl 1.5.1, > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>>>the last problem (more recent, not related to the > first) has > >>>>>>>>>>been fixed but hasn't been committed to bioperl-live yet. > >>>>>>>>>>The fixed SearchIO::blast is available in the link > above, but > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>realize it hasn't > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>>>been committed yet and may change. > >>>>>>>>>> > >>>>>>>>>>Christopher Fields > >>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry > >>>>>>>>>>University of Illinois Urbana-Champaign > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>>-----Original Message----- > >>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org > >>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>Of Hubert > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>>>>>>Prielinger > >>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM > >>>>>>>>>>>To: bioperl-l at bioperl.org > >>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>parsing Blast > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>>>>output > >>>>>>>>>>> > >>>>>>>>>>>Hi, > >>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with > >>>>>>>>>>>Bio::SearchIO, I get the following error message: > >>>>>>>>>>> > >>>>>>>>>>>MSG: no data for midline Query 1 WWWKWRW 7 > >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result > >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > >>>>>>>>>>>STACK toplevel > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ > >>>>>>>>Blast.pl:21 > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>>>>is that a bug...... > >>>>>>>>>>> > >>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), > I don't get > >>>>>>>>>>>anything..... 
> >>>>>>>>>>>I'm using bioperl 1.4 > >>>>>>>>>>> > >>>>>>>>>>>before, I have installed bioperl 1.4, it worked fine > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>parsing Blast > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>>>>Output (version 2.2.12), but I don't remember which > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>bioperl version > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>>>>I had installed > >>>>>>>>>>> > >>>>>>>>>>>thanks in advance > >>>>>>>>>>> > >>>>>>>>>>>Hubert > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>_______________________________________________ > >>>>>>>>>>>Bioperl-l mailing list > >>>>>>>>>>>Bioperl-l at lists.open-bio.org > >>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>_______________________________________________ > >>>>>>>>>Bioperl-l mailing list > >>>>>>>>>Bioperl-l at lists.open-bio.org > >>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>-- > >>>>>>>>Jason Stajich > >>>>>>>>Duke University > >>>>>>>>http://www.duke.edu/~jes12 > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>Christopher Fields > >>>>>>>Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry > >>>>>>>University of Illinois Urbana-Champaign > >>>>>>> > >>>>>>>_______________________________________________ > >>>>>>>Bioperl-l mailing list > >>>>>>>Bioperl-l at lists.open-bio.org > >>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>_______________________________________________ > >>>>>>Bioperl-l mailing list > >>>>>>Bioperl-l at lists.open-bio.org > >>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>Christopher Fields > >>>Postdoctoral Researcher > >>>Lab of Dr. Robert Switzer > >>>Dept of Biochemistry > >>>University of Illinois Urbana-Champaign > >>> > >>> > >>> > >>> > >>>_______________________________________________ > >>>Bioperl-l mailing list > >>>Bioperl-l at lists.open-bio.org > >>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>> > >>> > >>> > >>> > >>> > >> > >> > >>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm > >> > >>_______________________________________________ > >>Bioperl-l mailing list > >>Bioperl-l at lists.open-bio.org > >>http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > >> > > > > > > Disclaimer: > http://www.kuleuven.be/cwis/email_disclaimer.htm for more > > information. > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From victor.ruotti at gmail.com Fri Feb 10 15:09:16 2006 From: victor.ruotti at gmail.com (Victor) Date: Fri, 10 Feb 2006 14:09:16 -0600 Subject: [Bioperl-l] Running BLAT with BioPerl In-Reply-To: References: Message-ID: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com> Hi Jason, Well, in my env. BLATDIR was not setup at all. When setting BLATDIR to /usr/local/bin, I get the same problem. 
I think this might have to do with the _run internal method/sub. If you look
at that subroutine, you'll see that it is using both $self->executable and
$self->program_name. The test passes fine, but we might need to write a
better test for this particular case.

Instead of saying:
my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
I think the author meant to say:
my $str= Bio::Root::IO->catfile($self->program_dir,$self->program_name);

I quickly used Data::Dumper on both executable and program_name and this is
what I get:
$VAR1 = 'blat';
$VAR1 = 'blat';

So the path is hardcoded to be /usr/local/bin/blat/blat when calling run
through the factory.

I'd like to change the constructor a bit to deal with the params a little
better and include a config file using Config::General. Also, I noticed that
there is another Blat.pm module, a parser module. Should we integrate this
parser with the blat run module?

Brian/Jason, does that sound like a good idea?

Victor

On 2/10/06, Jason Stajich wrote:
>
> brian - just FYI -
>
> The AUTOLOAD stuff is present in a great number of the run modules so this
> is standard per se in that set.
>
> I think Victor's problem may have been the BLATDIR env variable pointing
> to /usr/local/bin/blat instead of /usr/local/bin - is that the case victor?
>
> tests passed for me before I did the 1.5.1 release for this module so it
> basically works. It definitely needs a carekeeper as a lot of these run
> modules were built during the fugu group annotation project and never got
> audited/re-vised after that.
>
>
> -jason
> On Feb 10, 2006, at 11:34 AM, Brian Osborne wrote:
>
> Victor,
>
> Fantastic, this is certainly a module in need, in fact there was already a
> note on this in the Wiki, I'll update it:
>
> http://bioperl.open-bio.org/wiki/Orphan_modules
>
> So all I did was:
>
> >cd bioperl-run
> >perl -I. -w t/Blat.t
>
> This is the most recent bioperl-run, the live version, and all tests
> passed.
> I'd downloaded the most recent binaries and put them in my
> /usr/local/bin, already in my PATH. That's it.
>
> That's the saddest looking new() I've ever seen in Bioperl, a mixture of
> named and unnamed parameters like that, how bizarre. The "proper" way, of
> course, is to use _rearrange, and not use AUTOLOAD.
>
> Thanks again,
>
> Brian O.
>
>
> On 2/10/06 11:02 AM, "Victor" wrote:
>
> Brian,
> I'd be happy to do that. Can you send me a quick snap on how you got it to
> work first. I'd like to see what is working first, before I start fixing
> things.
>
> And yes I'll take a look at the Blat.t to see more on it.
>
> Victor
>
>
> On 2/9/06, *Brian Osborne* wrote:
>
> Victor,
>
> Yes, it may be that blat is not in your path, bioperl-run/t/Blat.t is
> working for me even though I haven't set BLATDIR. This is using the latest
> blat, v. 33.
>
> There is a problem here though, you can see it if you read Blat.t. The
> constructor does not look like your usual new():
>
> my $factory = Bio::Tools::Run::Alignment::Blat->new('quiet' => 1,
>                                                     -verbose => $verbose,
>                                                     "DB" => $db);
>
> Unfortunate - would you be willing to do more than add a useful SYNOPSIS and
> actually fix new()? There is a subtext here, we're trying to find people who
> would be willing to maintain useful modules like these, the ideal person in
> this case would be someone who'd regularly use the module.
>
> Brian O.
>
>
> On 2/9/06 6:22 PM, "Victor" wrote:
>
> > Hi,
> > Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module is up to date
> > in the latest bioperl release?
> >
> > use Bio::Tools::Run::Alignment::Blat;
> > my $factory = Bio::Tools::Run::Alignment::Blat->new();
> > my $seq =
> > "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA";
> >
> > my @feats = $factory->run( $seq);
> >
> > Here is what I get when trying to use it:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Blat call (/usr/local/bin/blat/blat -out=blast TGAAATAAAACTCAGTA
> > /tmp/fB09bp5F76) crashed: -1
> >
> > Notice that it is using "blat" twice in the path. The way that I fixed this
> > is by going to the blat.pm module and changing the following lines:
> > #my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
> > my $str= Bio::Root::IO->catfile($self->program_name);
> >
> > Any ideas, maybe I'm missing the $ENV variable somewhere?
> > I'd like to avoid making this change. Also does anyone have a known synopsis
> > of this blat module (where to set the parameters, and whether it allows you
> > to have a config file).
> > I'll be happy to add a better synopsis to the module if needed.
> >
> > Thanks in advance,
> > Victor
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
>
From jason.stajich at duke.edu Fri Feb 10 15:36:04 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 10 Feb 2006 15:36:04 -0500
Subject: [Bioperl-l] Running BLAT with BioPerl
In-Reply-To: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com>
References: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com>
Message-ID: <7F520AFA-84C9-485B-A408-7A9DEFC1186E@duke.edu>

On Feb 10, 2006, at 3:09 PM, Victor wrote:

> Hi Jason,
> Well, in my env. BLATDIR was not setup at all. When setting BLATDIR
> to /usr/local/bin, I get the same problem.
> I think this might have
> to do with the _run internal method/sub. If you look at that
> subroutine, you'll see that it is using both $self->executable and
> $self->program_name. The test passes fine, but we might need to
> write a better test for this particular case.
>
> Instead of saying:
> my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
> I think the author meant to say:
> my $str= Bio::Root::IO->catfile($self->program_dir,$self->program_name);
>
> I quickly used Data::Dumper on both executable and program_name and
> this is what I get:
> $VAR1 = 'blat';
> $VAR1 = 'blat';
>
> So the path is hardcoded to be /usr/local/bin/blat/blat when
> calling run through the factory.
>

Hmm, are you sure you are looking at the 1.5.1 code and/or what is in CVS?

> I'd like to change the constructor a bit to deal with the params a
> little better and include a config file using
> Config::General. Also, I noticed that there is another Blat.pm
> module, a parser module. Should we integrate this parser with the
> blat run module?
>

Well, maybe as another parser option - I believe I added/edited it to use the PSL parser in Bio::SearchIO, is that not what you see?

Ick, there are also some system commands in this module which need to be removed and replaced with File::Copy, or we need to figure out how to remove them altogether.

> Brian/Jason. Does that sound like a good idea?

But yes, it needs some TLC. I'm not sure I know enough about Config::General to say yes or no - but all of the run modules need some help in standardization, so I would propose trying to integrate some changes into the base class (WrapperBase) that can be utilized by all the sub-classes -- if you want to use this as a model for how to do it that would be great too.

thx,
-j

>
> Victor
>
>
> On 2/10/06, Jason Stajich wrote:
> brian -
> just FYI -
>
> The AUTOLOAD stuff is present in a great number of the run modules so
> this is standard per se in that set.
>
> I think Victor's problem may have been the BLATDIR env variable
> pointing to /usr/local/bin/blat instead of /usr/local/bin - is that
> the case victor?
>
> tests passed for me before I did the 1.5.1 release for this module
> so it basically works. It definitely needs a carekeeper as a lot of
> these run modules were built during the fugu group annotation
> project and never got audited/re-vised after that.
>
>
> -jason
>
> On Feb 10, 2006, at 11:34 AM, Brian Osborne wrote:
>
>> Victor,
>>
>> Fantastic, this is certainly a module in need, in fact there was
>> already a note on this in the Wiki, I'll update it:
>>
>> http://bioperl.open-bio.org/wiki/Orphan_modules
>>
>> So all I did was:
>>
>> >cd bioperl-run
>> >perl -I. -w t/Blat.t
>>
>> This is the most recent bioperl-run, the live version, and all
>> tests passed. I'd downloaded the most recent binaries and put them
>> in my /usr/local/bin, already in my PATH. That's it.
>>
>> That's the saddest looking new() I've ever seen in Bioperl, a
>> mixture of named and unnamed parameters like that, how bizarre.
>> The "proper" way, of course, is to use _rearrange, and not use
>> AUTOLOAD.
>>
>> Thanks again,
>>
>> Brian O.
>>
>>
>> On 2/10/06 11:02 AM, "Victor" wrote:
>>
>>> Brian,
>>> I'd be happy to do that. Can you send me a quick snap on how you
>>> got it to work first. I'd like to see what is working first,
>>> before I start fixing things.
>>>
>>> And yes I'll take a look at the Blat.t to see more on it.
>>>
>>> Victor
>>>
>>>
>>> On 2/9/06, Brian Osborne < osborne1 at optonline.net> wrote:
>>>> Victor,
>>>>
>>>> Yes, it may be that blat is not in your path, bioperl-run/t/Blat.t is
>>>> working for me even though I haven't set BLATDIR. This is using
>>>> the latest blat, v. 33.
>>>>
>>>> There is a problem here though, you can see it if you read
>>>> Blat.t. The constructor does not look like your usual new():
>>>>
>>>> my $factory = Bio::Tools::Run::Alignment::Blat->new('quiet' => 1,
>>>>                                                     -verbose => $verbose,
>>>>                                                     "DB" => $db);
>>>>
>>>> Unfortunate - would you be willing to do more than add a useful
>>>> SYNOPSIS and actually fix new()? There is a subtext here, we're trying to
>>>> find people who would be willing to maintain useful modules like these, the
>>>> ideal person in this case would be someone who'd regularly use the module.
>>>>
>>>> Brian O.
>>>>
>>>>
>>>> On 2/9/06 6:22 PM, "Victor" wrote:
>>>>
>>>> > Hi,
>>>> > Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module
>>>> > is up to date in the latest bioperl release?
>>>> >
>>>> > use Bio::Tools::Run::Alignment::Blat;
>>>> > my $factory = Bio::Tools::Run::Alignment::Blat->new();
>>>> > my $seq =
>>>> > "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA";
>>>> >
>>>> > my @feats = $factory->run( $seq);
>>>> >
>>>> > Here is what I get when trying to use it:
>>>> >
>>>> > ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> > MSG: Blat call (/usr/local/bin/blat/blat -out=blast TGAAATAAAACTCAGTA
>>>> > /tmp/fB09bp5F76) crashed: -1
>>>> >
>>>> > Notice that it is using "blat" twice in the path. The way that
>>>> > I fixed this is by going to the blat.pm module and
>>>> > changing the following lines:
>>>> > #my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
>>>> > my $str= Bio::Root::IO->catfile($self->program_name);
>>>> >
>>>> > Any ideas, maybe I'm missing the $ENV variable somewhere?
>>>> > I'd like to avoid making this change. Also does anyone have a
>>>> > known synopsis of this blat module (where to set the parameters, and whether
>>>> > it allows you to have a config file).
>>>> > I'll be happy to add a better synopsis to the module if needed.
>>>> > >>>> > Thanks in advance, >>>> > Victor >>>> > >>>> > _______________________________________________ >>>> > Bioperl-l mailing list >>>> > Bioperl-l at lists.open-bio.org >>>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >>> >> > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From hlapp at gmx.net Fri Feb 10 16:39:39 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 10 Feb 2006 13:39:39 -0800 Subject: [Bioperl-l] Bio::Ontology::Ontology In-Reply-To: <000001c62e60$9acecca0$c2987ca5@pc13> References: <000001c62e60$9acecca0$c2987ca5@pc13> Message-ID: Sohel, please allow me to copy the list in my response. There are many good and insightful people on the list who may have something to add or different ideas. I've come across that problem myself, for instance with InterPro. What I've done so far is simply to stick it unstructured into the definition slot, which is not helpful if your purpose goes further than just displaying it in an unstructured fashion. I'm not sure you would want to create another class for this (like AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the implementation, probably not the interface) annotatable (i.e., implement Bio::AnnotatableI), which supposedly would be simple to do (AnnotationCollection is already implemented, you'd just return an instance of it). Even though tag/value pairs sound like a quick and fast way to go, I'm leaning against it; in essence we're moving away from that elsewhere (SeqFeatureI) and hence I don't think we should restart it here. I'm not giving a definitive answer here, just my (initial) thoughts. Hope that helps nonetheless. Can you fancy yourself trying the Annotatable approach and let us know how it goes? -hilmar On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote: > Hi Hilmar, > How are you doing?
I am Sohel Merchant, a programmer at dictyBase, > Northwestern University. I am working on a parser for an ontology > file. I really like the ontology object model which you have > contributed to Bioperl. I think its just Awesome!! One of things which > I thought would be great to capture is the ontology headers. Right now > one can specify only the name, authority information. I was wondering > if there is any way, I could also capture other ontology file headers > like version of the file, date when that ontology file was made. I was > thinking of making a header class or alternatively it could go as Hash > of values in the Bio::Ontology::Ontology class itself. I wanted to > know whets your thoughts about on this. > > Thanks, > Sohel Merchant > dictyBase > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From osborne1 at optonline.net Fri Feb 10 16:49:18 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Fri, 10 Feb 2006 16:49:18 -0500 Subject: [Bioperl-l] Running BLAT with BioPerl In-Reply-To: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com> Message-ID: Victor, Just a note on "convention", excuse me if this is obvious. A few different greps on the modules in bioperl-run show that executable() gets or sets the full path to the program in question, program() or program_name() gets or sets the name of the app (e.g. "blat"). program_dir() does what it sounds like. So you're right, "($self->executable,$self->program_name)" doesn't make sense. I can't speak to Config::General but I'd say that my first concern would be that the thing works in the normal way, either by naming parameters or by passing an array of arguments, but not a mixture of both!
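To make the convention above concrete, here is a minimal plain-Perl sketch of why joining a full executable path with the program name repeats the last path component, while joining a program directory with the program name does not. This is not the Blat.pm code: the variable names and the /usr/local/bin location are assumptions borrowed from the error message earlier in the thread, and the core module File::Spec stands in for Bio::Root::IO->catfile so that the snippet runs without Bioperl installed.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical values that follow the convention described above;
# they illustrate the idea and are not what Blat.pm actually stores.
my $program_dir  = '/usr/local/bin';
my $program_name = 'blat';

# Under the convention, executable() would hold the full path.
my $executable = File::Spec->catfile($program_dir, $program_name);

# Joining the full path with the name repeats the last component.
my $doubled = File::Spec->catfile($executable, $program_name);

# Joining the directory with the name gives the intended path.
my $intended = File::Spec->catfile($program_dir, $program_name);

print "doubled:  $doubled\n";
print "intended: $intended\n";
```

If the module really stores a full path in executable(), the second form (or simply using executable() on its own) would avoid the repeated "blat" component seen in the exception message.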
Of course you're right in thinking that tying execution to parsing is a good idea, and it looks like this is done already, just glancing at t/Blat.t. Brian O. On 2/10/06 3:09 PM, "Victor" wrote: > Hi Jason, > Well, in my env. BLATDIR was not setup at all. When setting BLATDIR to > /usr/local/bin, I get the same problem. I think this might have to do with > the _run internal method/sub. If you look at that subroutine, you'll see > that it is using both $self->executable and $self->program_name. The test > passes fine, but we might need to write a better test for this particular > case. > > Instead of saying: > my $str= Bio::Root::IO->catfile($self->executable,$self->program_name); > I think the author meant to say: > my $str= > Bio::Root::IO->catfile($self->program_dir,$self->program_name); > > I quickly used Data::Dumper on both executate and program_name and this is > what I get: > $VAR1 = 'blat'; > $VAR1 = 'blat'; > > So the path is hardcoded to be /usr/local/bin/blat/blat when calling run > though factory. > > I'd like to change the constructor a bit to deal with the params a little > better and include a config file using > Config::General. Also, I noticed that there is a another Blat.pm module, a > parser module. Should we integrate this parser with the blat run module? > > Brian/Jason. Does that sound like a good idea? > > Victor > > > On 2/10/06, Jason Stajich wrote: >> >> brian - just FYI - >> >> The AUTOLOAD stuff is present a great number of the run modules so this >> is standard per se in that set. >> >> I think Victor's problem may have been the BLATDIR env variable pointing >> to /usr/local/bin/blat instead of /usr/local/bin - is that the case victor? >> >> tests passed for me before I did the 1.5.1 release for this module so it >> basically works. It definitely needs a carekeeper as lot of these run >> modules were built during the fugu group annotation project and never got >> audited/re-vised after that. 
>> >> >> -jason >> On Feb 10, 2006, at 11:34 AM, Brian Osborne wrote: >> >> Victor, >> >> Fantastic, this is certainly a module in need, in fact there was already a >> note on this in the Wiki, I'll update it: >> >> http://bioperl.open-bio.org/wiki/Orphan_modules >> >> So all I did was: >> >>> cd bioperl-run >>> perl -I. -w t/Blat.t >> >> This is the most recent bioperl-run, the live version, and all tests >> passed. I'd downloaded the most recent binaries and put them in my >> /usr/local/bin, already in my PATH. That's it. >> >> That's the saddest looking new() I've ever seen in Bioperl, a mixture of >> named and unnamed parameters like that, how bizarre. The "proper" way, of >> course, is to use _rearrange, and not use AUTOLOAD. >> >> Thanks again, >> >> Brian O. >> >> >> On 2/10/06 11:02 AM, "Victor" wrote: >> >> Brian, >> I'd be happy to do that. Can you send me a quick snap on how you got it to >> work first. I'd like to see what is working first, before I start fixing >> things. >> >> And yes I'll take a look at the Blat.t to see more on it. >> >> Victor >> >> >> On 2/9/06, *Brian Osborne* wrote: >> >> Victor, >> >> Yes, it may be that blat is not in your path, bioperl-run/t/Blat.t is >> working for me even though I haven't set BLATDIR. This is using the latest >> blat, v. 33. >> >> There is a problem here though, you can see it if you read Blat.t. The >> constructor does not look like your usual new(): >> >> my $factory = Bio::Tools::Run::Alignment::Blat->new('quiet' => 1, >> >> -verbose => $verbose, >> "DB" => $db); >> >> Unfortunate - would you be willing to do more than add a useful SYNOPSIS >> and >> actually fix new()? There is a subtext here, we're trying to find people >> who >> would be willing to maintain useful modules like these, the ideal person >> in >> this case would be someone who'd regularly use the module. >> >> Brian O.
>> >> >> On 2/9/06 6:22 PM, "Victor" wrote: >> >>> Hi, >>> Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module is up to >> date >>> in the lastest bioperl release? >>> >>> >>> >>> use Bio::Tools::Run::Alignment::Blat; >>> my $factory = Bio::Tools::Run::Alignment::Blat->new(); >>> my $seq = >>> "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA"; >>> >>> my @feats = $factory->run( $seq); >>> >>> Here is what I get when tring to use it: >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: Blat call (/usr/local/bin/blat/blat -out=blast TGAAATAAAACTCAGTA >>> /tmp/fB09bp5F76) crashed: -1 >>> >>> Notice that it is using "blat' twice in the path. The way that I fixed >> this >>> is by going to the blat.pm module and >> changing the following lines: >>> #my $str= Bio::Root::IO->catfile($self->executable,$self->program_name); >>> my $str= Bio::Root::IO->catfile($self->program_name); >>> >>> Any ideas, maybe I'm missing the $ENV variable somewhere? >>> I'd like to avoid making this change. Also does anyone have a known >> synopsis >>> of this blat module (where to set the parameters, and whether it allows >> you >>> to have a config file). >>> I'll be happy to add a better synopsis to the module if needed. 
>>> >>> Thanks in advance, >>> Victor >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >> >> >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From heikki at sanbi.ac.za Sat Feb 11 01:54:51 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Sat, 11 Feb 2006 08:54:51 +0200 Subject: [Bioperl-l] Bio::Ontology::Ontology In-Reply-To: References: <000001c62e60$9acecca0$c2987ca5@pc13> Message-ID: <200602110854.52116.heikki@sanbi.ac.za> I second Hilmar's suggestion to use Bio::Annotation::Collection for database (ontology database in this case) metadata. While you are at it, why not define, or use an existing (?), public ontology to do that? ;-) -Heikki On Friday 10 February 2006 23:39, Hilmar Lapp wrote: > Sohel, > > please allow me to copy the list in my response. There's many good and > insightful people on the list who may have something to add or > different ideas. > > I've come across that problem myself, for instance with InterPro. What > I've done so far simply is to stick it unstructured into the definition > slot, which is not helpful if your purpose goes further than just > displaying it in an unstructured fashion. > > I'm not sure you would want to create another class for this (like > AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the > implementation, probably not the interface) annotatable (i.e., > implement Bio::Annotatable), which supposedly would be simple to do > (AnnotationCollection is already implemented, you'd just return an > instance of it).
> > Even though tag/value pairs sound like quick&fast way to go I'm leaning > against it; in essence we're moving away from that elsewhere > (SeqFeatureI) and hence I don't think we should restart it here. > > I'm not giving a definitive answer here, just my (initial) thoughts. > Hope that helps nonetheless. Can you fancy yourself trying the > Annotatable approach and let us know how it goes? > > -hilmar > > On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote: > > Hi Hilmar, > > ? How are you doing? I am Sohel Merchant, a programmer at dictyBase, > > Northwestern University. I am working on a parser for an ontology > > file. I really like the ontology object model which you have > > contributed to Bioperl. I think its just Awesome!! One of things which > > I thought would be great to capture is the ontology headers. Right now > > one can specify only the name, authority information. I was wondering > > if there is any way, I could also capture other ontology file headers > > like version of the file, date when that ontology file was made. I was > > thinking of making a header class or alternatively it could go as Hash > > of values in the Bio::Ontology::Ontology class itself. I wanted to > > know whets your thoughts about on this. > > ? 
> > Thanks, > > Sohel Merchant > > dictyBase -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From hlapp at gmx.net Sun Feb 12 00:10:35 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 11 Feb 2006 21:10:35 -0800 Subject: [Bioperl-l] Bio::Ontology::Ontology In-Reply-To: <000001c62e9a$4f82eee0$c2987ca5@pc13> References: <000001c62e9a$4f82eee0$c2987ca5@pc13> Message-ID: <3666b00b7322d2bfe4d82129b047e5ce@gmx.net> Sohel, please do keep the discussion on the list, in your own interest as there's a multitude of people who can respond to you. SimpleValue would probably be what I'd use too. As Heikki hinted, you might even create an ontology for annotating ontologies, which would allow you to use Annotation::OntologyTerm for annotation, but then there's no qualifier value ... Bioperl 1.5.1 was released last year; please check the website. -hilmar On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote: > Hi Hilmar, > I really like your suggestion of implementing the Bio::AnnotatableI > interface in the Bio::Ontology::Ontology class. I am going to implement > this and play around a little with it. I am planning to use > Bio::Annotation::SimpleValue for annotating the header as it provides a > good way of specifying the Tag/value pair. What are your thoughts on > using this? > > Also, I was wondering if you have any idea about the scheduled date > for the Bioperl 1.51 release. I would like to contribute some stuff in > the next release. > > Thanks, > Sohel.
> > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Friday, February 10, 2006 3:40 PM > To: Sohel Merchant > Cc: Bioperl > Subject: Re: Bio::Ontology::Ontology > > Sohel, > > please allow me to copy the list in my response. There's many good and > insightful people on the list who may have something to add or > different ideas. > > I've come across that problem myself, for instance with InterPro. What > I've done so far simply is to stick it unstructured into the definition > slot, which is not helpful if your purpose goes further than just > displaying it in an unstructured fashion. > > I'm not sure you would want to create another class for this (like > AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the > implementation, probably not the interface) annotatable (i.e., > implement Bio::Annotatable), which supposedly would be simple to do > (AnnotationCollection is already implemented, you'd just return an > instance of it). > > Even though tag/value pairs sound like quick&fast way to go I'm leaning > against it; in essence we're moving away from that elsewhere > (SeqFeatureI) and hence I don't think we should restart it here. > > I'm not giving a definitive answer here, just my (initial) thoughts. > Hope that helps nonetheless. Can you fancy yourself trying the > Annotatable approach and let us know how it goes? > > -hilmar > > > On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote: > >> Hi Hilmar, >> ? How are you doing? I am Sohel Merchant, a programmer at dictyBase, >> Northwestern University. I am working on a parser for an ontology >> file. I really like the ontology object model which you have >> contributed to Bioperl. I think its just Awesome!! One of things which > >> I thought would be great to capture is the ontology headers. Right now > >> one can specify only the name, authority information. 
I was wondering >> if there is any way, I could also capture other ontology file headers >> like version of the file, date when that ontology file was made. I was > >> thinking of making a header class or alternatively it could go as Hash > >> of values in the Bio::Ontology::Ontology class itself. I wanted to >> know whets your thoughts about on this. >> ? >> Thanks, >> Sohel Merchant >> dictyBase >> > -- > ------------------------------------------------------------- > Hilmar Lapp email: lapp at gnf.org > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > ------------------------------------------------------------- > > > > -- ------------------------------------------------------------- Hilmar Lapp email: lapp at gnf.org GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 ------------------------------------------------------------- From hjm at tacgi.com Sun Feb 12 01:46:38 2006 From: hjm at tacgi.com (Harry Mangalam) Date: Sat, 11 Feb 2006 22:46:38 -0800 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs Message-ID: <200602112246.38926.hjm@tacgi.com> Hi All, After perusing the tutorial and other docs for a an evening, I still can't find the answer to this. Forgive me if I've missed something obvious. This should not be a novel request, but I've not found it answered. If bioperl isn't the best way to do this, I'd be grateful to a pointer to a better way, especially if it includes an illuminating bit of code. The problem is to retrieve genomic sequences plus & minus some offset from a locus determined by HUGO keyword or GeneID. This would be a common followup chore for some extra analysis from a gene expression expt. Or maybe this is in the DBFetch routines, but I've missed the sequence type to specify...? TIA! 
--
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com

From osborne1 at optonline.net Sun Feb 12 11:37:39 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 12 Feb 2006 11:37:39 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs
In-Reply-To: <200602112246.38926.hjm@tacgi.com>
Message-ID:

Harry,

Hope you're doing well. The approach could be based on Bio::DB::Fasta. So, from its documentation:

  use Bio::DB::Fasta;

  # create database from directory of fasta files
  my $db = Bio::DB::Fasta->new('/path/to/fasta/files');

  # simple access (for those without Bioperl)
  my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
  my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
  my @ids = $db->ids;
  my $length = $db->length('CHROMOSOME_I');
  my $alphabet = $db->alphabet('CHROMOSOME_I');
  my $header = $db->header('CHROMOSOME_I');

  # Bioperl-style access
  my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
  my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
  my $seq = $obj->seq;
  my $subseq = $obj->subseq(4_000_000 => 4_100_000);

Do you already have the offsets?

Brian O.

On 2/12/06 1:46 AM, "Harry Mangalam" wrote:

> Hi All,
>
> After perusing the tutorial and other docs for a an evening, I still can't
> find the answer to this. Forgive me if I've missed something obvious.
>
> This should not be a novel request, but I've not found it answered. If
> bioperl isn't the best way to do this, I'd be grateful to a pointer to a
> better way, especially if it includes an illuminating bit of code.
>
> The problem is to retrieve genomic sequences plus & minus some offset from a
> locus determined by HUGO keyword or GeneID. This would be a common followup
> chore for some extra analysis from a gene expression expt. Or maybe this is
> in the DBFetch routines, but I've missed the sequence type to specify...?
>
>
> TIA!
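Once a locus has start and end coordinates on a chromosome, the "plus and minus some offset" part of the question reduces to a little coordinate arithmetic. The following is a minimal sketch under stated assumptions: flanked_window is a hypothetical helper (not a Bioperl function), coordinates are taken to be 1-based and inclusive, and the example numbers are invented. Mapping a HUGO name or GeneID to coordinates is a separate step that this sketch does not cover.

```perl
use strict;
use warnings;

# Hypothetical helper: extend a locus by $flank bases on each side and
# clamp the result to 1..$seq_length (1-based, inclusive coordinates).
sub flanked_window {
    my ($start, $end, $flank, $seq_length) = @_;

    # Accept the coordinates in either order.
    ($start, $end) = ($end, $start) if $start > $end;

    my $from = $start - $flank;
    $from = 1 if $from < 1;

    my $to = $end + $flank;
    $to = $seq_length if $to > $seq_length;

    return ($from, $to);
}

# Invented example: a locus at 4,000,000..4,100,000 with 5 kb flanks
# on a sequence assumed to be 15,000,000 bases long.
my ($from, $to) = flanked_window(4_000_000, 4_100_000, 5_000, 15_000_000);
print "$from..$to\n";   # prints 3995000..4105000
```

With the Bio::DB::Fasta synopsis quoted above, the sequence length could come from $db->length($id) and the resulting pair could be passed on as $db->seq($id, $from => $to).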
From pmiguel at purdue.edu Sun Feb 12 15:05:47 2006 From: pmiguel at purdue.edu (Phillip SanMiguel) Date: Sun, 12 Feb 2006 15:05:47 -0500 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <004301c62db4$c9bcbab0$d416a790@LIBERAL> References: <004301c62db4$c9bcbab0$d416a790@LIBERAL> Message-ID: <43EF951B.4030601@purdue.edu> Roger, Just a data point, but in case you were not already aware of it, the characters W, K and R may be included in some DNA sequences. 'W' means 'A' or 'T', [AT], 'K' means [TG] and 'R' means [AG] if I remember correctly. These are ambiguous bases, where a basecaller isn't sure, for example, whether a particular peak is an A or a T. Although I see these ambiguous bases less frequently these days, even common modern basecallers (such as Applied Biosystems basecallers) can generally be configured so they will generate them. Downstream applications may not like them, however. I may be just stating the obvious, or this might be irrelevant to the issue at hand. If so, my apologies. Phillip Roger Hall wrote: > Guys - I'm looking at the error message: > > MSG: no data for midline Query 1 WWWKWRW 7 > STACK Bio::SearchIO::blast::next_result > /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 > STACK toplevel > /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 > > This is my line of thought: > 1. "no data for midline $_" is a unique message generated by blast.pm in one > location only at the point of a. reading three lines b. dropping lines with > spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3) > 2. There is a regexp match that fails in order to reach that error message > 3. The $_ value "Query 1 WWWKWRW 7" should not fail the expression > 4. It does anyway > 5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the blast > reports > > I suspect a newline/chomp/metacharacter issue. 
Not finding the string > anywhere has me thoroughly confused - I asked Hubert for the additional > file, assuming that I didn't have it. > > My next thought is to write a quick script to test perl behavior on "Fedora > Core 9". > > Thoughts? > > Did I misread the issue entirely? :} > > Roger > > > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Thursday, February 09, 2006 10:16 AM > To: 'Jason Stajich'; 'Hubert Prielinger' > Cc: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast > output > > > >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich at duke.edu] >> Sent: Thursday, February 09, 2006 9:13 AM >> To: Hubert Prielinger >> Cc: Chris Fields; bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >> parsing Blast output >> >> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >> >>> hi chris, >>> thanks, I have upgraded to version 1.5.1 but it isn't still >>> >> working, >> >>> do you have any ohter idea, the problem I have is that I >>> >> have to parse >> >>> a lot of textfiles.... >>> or shall I look for another option to parse those files... >>> >>> regards >>> Hubert >>> >> The code from Bioperl 1.5.1 works fine for me for blast >> 2.2.13 reports but unless you post your blast report we can't >> really determine the problem. >> >> If you are still getting the same error like this I am not >> convinced you have upgraded to 1.5.1 which includes a fix in >> the fact that NCBI changed the HSP result format to remove >> the ':' from the Query/Sbjct prefixes. We fixed this as soon >> as it was apparent sometime in September. 
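The format change described above (newer reports dropping the ':' after the Query/Sbjct prefixes) can be handled by making the colon optional in the line pattern. The sketch below only illustrates that idea: the pattern and the parse_hsp_line helper are assumptions made for this example, not the expression used in Bio::SearchIO::blast, and the second sample line is constructed by hand rather than taken from a real report.

```perl
use strict;
use warnings;

# Illustrative pattern only; not the one in Bio::SearchIO::blast.
# The colon after the Query/Sbjct prefix is optional, so both the
# older "Query: 1 ..." and the newer "Query  1 ..." layouts match.
my $hsp_line = qr/^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;

# Hypothetical helper: return a hash ref of the parsed fields,
# or undef (in scalar context) when the line does not match.
sub parse_hsp_line {
    my ($line) = @_;
    return unless $line =~ $hsp_line;
    return { label => $1, start => $2, seq => $3, end => $4 };
}

for my $line ('Query: 1 WWWKWRW 7', 'Query  1    WWWKWRW 7') {
    my $parsed = parse_hsp_line($line);
    print "$parsed->{label} $parsed->{start}-$parsed->{end} $parsed->{seq}\n";
}
```

A parser that insists on the colon would reject the second layout, which is consistent with the "no data for midline" symptom reported when an older Bioperl reads newer BLAST text output.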
>> >> >>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>> STACK Bio::SearchIO::blast::next_result >>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>> STACK toplevel >>>>> >>>>> >> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >> >> If you are just getting no results but also no warnings wrt >> parsing, are you sure your logic is correct? >> >> If you remove your filters do you see all the HSPS? >> >> >> while (my $result = $search->next_result) { >> print $result->query_name, "\n"; >> #iterate over each hit on the query sequence >> while (my $hit = $result->next_hit) { >> print $hit->name, "\n"; >> #iterate over each HSP in the hit >> while (my $hsp = $hit->next_hsp) { >> print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp- >> >hit_string, "\n"; >> } >> } >> } >> > > I tested some of the BLAST results that Hubert sent Roger and me with a > similar script to the above. I removed the file parsing logic and it seemed > to work just fine. It may very well be a logic issue or that he hasn't > installed the latest fix. > > It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), even > though the returned output was from nr, the top of the blast output showed > that it was v2.2.12: > > BLASTP 2.2.12 [Aug-07-2005] > > I double-checked my local version and it's definitely v.2.2.13: > ------------------------------------- > C:\Perl\Scripts>blastcl3 - > > blastcl3 2.2.13 arguments:... > ------------------------------------- > > If you use RemoteBlast using the same settings, the version in the header > looks like this: > > BLASTP 2.2.13 [Nov-27-2005] > > I'm wondering if all the blast executables (blast and netblast) from NCBI > have text output like v.2.2.12, while the wwwblast outputs a new format > (2.2.13). I'll ask blast-help at NCBI about this. 
> > >> To clarify some stuff - >> Chris I don't necessarily think the XML is best way forward >> for BLAST reports generated locally, it isn't as detailed as >> the Text format and it is what most people expect to be able >> to scroll through and parse -- it is also harder for the >> format to change dramatically if you have a static binary on >> your machine =). I think for remoteblast the XML format >> should be the way forward but I expect Bioperl to maintain >> support of any plain text BLAST report format that people use >> on a regular basis. >> >> > > Does XML lack some specific info that text output has? Didn't know that. I > believe that XML should be default in RemoteBlast since it will not break, > but I agree with you about text output. I also agree that it will need > somebody to maintain it constantly, much like RemoteBlast. > > >> -jason >> >>> Chris Fields wrote: >>> >>> >>>> My guess is you're running into text parsing problems in >>>> Bio::SearchIO::blast. Upgrade to the latest developer version >>>> (1.5.1) or >>>> bioperl-live (CVS), then see the bug below. >>>> >>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>> >>>> I think the first problem you ran into is solved in bioperl 1.5.1, >>>> the last problem (more recent, not related to the first) has been >>>> fixed but hasn't been committed to bioperl-live yet. The fixed >>>> SearchIO::blast is available in the link above, but >>>> >> realize it hasn't >> >>>> been committed yet and may change. >>>> >>>> Christopher Fields >>>> Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces at lists.open-bio.org >>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>> Prielinger >>>>> Sent: Wednesday, February 08, 2006 2:52 PM >>>>> To: bioperl-l at bioperl.org >>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>> >> parsing Blast >> >>>>> output >>>>> >>>>> Hi, >>>>> If I want to parse a Blast Output (Version 2.2.12) with >>>>> Bio::SearchIO, I get the following error message: >>>>> >>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>> STACK Bio::SearchIO::blast::next_result >>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>> STACK toplevel >>>>> >>>>> >> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >> >>>>> is that a bug...... >>>>> >>>>> If I want to parse Blast Output (version 2.2.13), I don't get >>>>> anything..... >>>>> I'm using bioperl 1.4 >>>>> >>>>> before, I have installed bioperl 1.4, it worked fine >>>>> >> parsing Blast >> >>>>> Output (version 2.2.12), but I don't remember which >>>>> >> bioperl version >> >>>>> I had installed >>>>> >>>>> thanks in advance >>>>> >>>>> Hubert >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>>> >>>> >>>> >>>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at uiuc.edu Sun Feb 12 17:30:07 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 12 Feb 2006 16:30:07 -0600 Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output In-Reply-To: <43EF951B.4030601@purdue.edu> References: <004301c62db4$c9bcbab0$d416a790@LIBERAL> <43EF951B.4030601@purdue.edu> Message-ID: <855DEC6F-8057-47BA-9D1D-9BDC16D1D83B@uiuc.edu> Sequences are converted to FASTA format in RemoteBlast using Bio::SeqIO, which I think includes IUPAC base and amino acid ambiguities like you mention, so my guess is any errors (like odd non- IUPAC letters in nucleotide or aa queries) are likely caught there. As long as it passes Bio::SeqIO it shouldn't be a problem. Haven't tried this myself, though, so I can't say that with absolute certainty. Chris On Feb 12, 2006, at 2:05 PM, Phillip SanMiguel wrote: > Roger, > Just a data point, but in case you were not already aware of it, the > characters W, K and R may be included in some DNA sequences. 'W' means > 'A' or 'T', [AT], 'K' means [TG] and 'R' means [AG] if I remember > correctly. These are ambiguous bases, where a basecaller isn't > sure, for > example, whether a particular peak is an A or a T. Although I see > these > ambiguous bases less frequently these days, even common modern > basecallers (such as Applied Biosystems basecallers) can generally be > configured so they will generate them. Downstream applications may not > like them, however. > I may be just stating the obvious, or this might be irrelevant to > the issue at hand. If so, my apologies. 
> > Phillip > Roger Hall wrote: >> Guys - I'm looking at the error message: >> >> MSG: no data for midline Query 1 WWWKWRW 7 >> STACK Bio::SearchIO::blast::next_result >> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >> STACK toplevel >> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >> >> This is my line of thought: >> 1. "no data for midline $_" is a unique message generated by >> blast.pm in one >> location only at the point of a. reading three lines b. dropping >> lines with >> spaces only c. identifying the Query, Midline, and Match lines (0 >> <= $i < 3) >> 2. There is a regexp match that fails in order to reach that error >> message >> 3. The $_ value "Query 1 WWWKWRW 7" should not fail the >> expression >> 4. It does anyway >> 5. I cannot find the value "Query 1 WWWKWRW 7" anywhere in the >> blast >> reports >> >> I suspect a newline/chomp/metacharacter issue. Not finding the string >> anywhere has me thoroughly confused - I asked Hubert for the >> additional >> file, assuming that I didn't have it. >> >> My next thought is to write a quick script to test perl behavior >> on "Fedora >> Core 9". >> >> Thoughts? >> >> Did I misread the issue entirely? 
:} >> >> Roger >> >> >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org >> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris >> Fields >> Sent: Thursday, February 09, 2006 10:16 AM >> To: 'Jason Stajich'; 'Hubert Prielinger' >> Cc: bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing >> Blast >> output >> >> >> >>> -----Original Message----- >>> From: Jason Stajich [mailto:jason.stajich at duke.edu] >>> Sent: Thursday, February 09, 2006 9:13 AM >>> To: Hubert Prielinger >>> Cc: Chris Fields; bioperl-l at bioperl.org >>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>> parsing Blast output >>> >>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote: >>> >>>> hi chris, >>>> thanks, I have upgraded to version 1.5.1 but it isn't still >>>> >>> working, >>> >>>> do you have any ohter idea, the problem I have is that I >>>> >>> have to parse >>> >>>> a lot of textfiles.... >>>> or shall I look for another option to parse those files... >>>> >>>> regards >>>> Hubert >>>> >>> The code from Bioperl 1.5.1 works fine for me for blast >>> 2.2.13 reports but unless you post your blast report we can't >>> really determine the problem. >>> >>> If you are still getting the same error like this I am not >>> convinced you have upgraded to 1.5.1 which includes a fix in >>> the fact that NCBI changed the HSP result format to remove >>> the ':' from the Query/Sbjct prefixes. We fixed this as soon >>> as it was apparent sometime in September. >>> >>> >>>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>>> STACK Bio::SearchIO::blast::next_result >>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>> STACK toplevel >>>>>> >>>>>> >>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>> >>> If you are just getting no results but also no warnings wrt >>> parsing, are you sure your logic is correct? 
>>>
>>> If you remove your filters do you see all the HSPS?
>>>
>>>
>>> while (my $result = $search->next_result) {
>>>     print $result->query_name, "\n";
>>>     # iterate over each hit on the query sequence
>>>     while (my $hit = $result->next_hit) {
>>>         print $hit->name, "\n";
>>>         # iterate over each HSP in the hit
>>>         while (my $hsp = $hit->next_hsp) {
>>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>                 $hsp->hit_string, "\n";
>>>         }
>>>     }
>>> }
>>>
>>
>> I tested some of the BLAST results that Hubert sent Roger and me with a
>> similar script to the above. I removed the file parsing logic and it
>> seemed to work just fine. It may very well be a logic issue or that he
>> hasn't installed the latest fix.
>>
>> It's a funny thing, though. When I tried using blastcl3 (v. 2.2.13), even
>> though the returned output was from nr, the top of the blast output
>> showed that it was v2.2.12:
>>
>> BLASTP 2.2.12 [Aug-07-2005]
>>
>> I double-checked my local version and it's definitely v.2.2.13:
>> -------------------------------------
>> C:\Perl\Scripts>blastcl3 -
>>
>> blastcl3 2.2.13 arguments:...
>> -------------------------------------
>>
>> If you use RemoteBlast using the same settings, the version in the header
>> looks like this:
>>
>> BLASTP 2.2.13 [Nov-27-2005]
>>
>> I'm wondering if all the blast executables (blast and netblast) from NCBI
>> have text output like v.2.2.12, while the wwwblast outputs a new format
>> (2.2.13). I'll ask blast-help at NCBI about this.
>>
>>
>>> To clarify some stuff -
>>> Chris I don't necessarily think the XML is best way forward
>>> for BLAST reports generated locally, it isn't as detailed as
>>> the Text format and it is what most people expect to be able
>>> to scroll through and parse -- it is also harder for the
>>> format to change dramatically if you have a static binary on
>>> your machine =).
I think for remoteblast the XML format >>> should be the way forward but I expect Bioperl to maintain >>> support of any plain text BLAST report format that people use >>> on a regular basis. >>> >>> >> >> Does XML lack some specific info that text output has? Didn't >> know that. I >> believe that XML should be default in RemoteBlast since it will >> not break, >> but I agree with you about text output. I also agree that it will >> need >> somebody to maintain it constantly, much like RemoteBlast. >> >> >>> -jason >>> >>>> Chris Fields wrote: >>>> >>>> >>>>> My guess is you're running into text parsing problems in >>>>> Bio::SearchIO::blast. Upgrade to the latest developer version >>>>> (1.5.1) or >>>>> bioperl-live (CVS), then see the bug below. >>>>> >>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>> >>>>> I think the first problem you ran into is solved in bioperl 1.5.1, >>>>> the last problem (more recent, not related to the first) has been >>>>> fixed but hasn't been committed to bioperl-live yet. The fixed >>>>> SearchIO::blast is available in the link above, but >>>>> >>> realize it hasn't >>> >>>>> been committed yet and may change. >>>>> >>>>> Christopher Fields >>>>> Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry >>>>> University of Illinois Urbana-Champaign >>>>> >>>>> >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: bioperl-l-bounces at lists.open-bio.org >>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert >>>>>> Prielinger >>>>>> Sent: Wednesday, February 08, 2006 2:52 PM >>>>>> To: bioperl-l at bioperl.org >>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work >>>>>> >>> parsing Blast >>> >>>>>> output >>>>>> >>>>>> Hi, >>>>>> If I want to parse a Blast Output (Version 2.2.12) with >>>>>> Bio::SearchIO, I get the following error message: >>>>>> >>>>>> MSG: no data for midline Query 1 WWWKWRW 7 >>>>>> STACK Bio::SearchIO::blast::next_result >>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151 >>>>>> STACK toplevel >>>>>> >>>>>> >>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21 >>> >>>>>> is that a bug...... >>>>>> >>>>>> If I want to parse Blast Output (version 2.2.13), I don't get >>>>>> anything..... >>>>>> I'm using bioperl 1.4 >>>>>> >>>>>> before, I have installed bioperl 1.4, it worked fine >>>>>> >>> parsing Blast >>> >>>>>> Output (version 2.2.12), but I don't remember which >>>>>> >>> bioperl version >>> >>>>>> I had installed >>>>>> >>>>>> thanks in advance >>>>>> >>>>>> Hubert >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> -- >>> Jason Stajich >>> Duke University >>> http://www.duke.edu/~jes12 >>> >>> >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. 
of Biochemistry
>> University of Illinois Urbana-Champaign

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From torsten.seemann at infotech.monash.edu.au Sun Feb 12 18:56:32 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 13 Feb 2006 10:56:32 +1100
Subject: [Bioperl-l] RemoteBlast
In-Reply-To: <004401c62c6e$da906a40$4301a8c0@LIBERAL>
References: <004401c62c6e$da906a40$4301a8c0@LIBERAL>
Message-ID: <1139788592.29375.13.camel@chauvel.csse.monash.edu.au>

Roger,

> I think that most core Bioperl folks have long since moved away from
> RemoteBlast and are using the functionality in StandAloneBlast to run their
> own local servers.

Agreed. Even smaller centres like my workplace need the throughput that a local PC, SMP system or Cluster can provide.

> wave of the future, but I think there is still some concern that not every
> flavor of BLAST produces XML yet. Even so, the XML parser is considered to
> be very strong, and only helps hasten the end of text-formatted support,
> since parsing text-formatted reports is the primary source of pain.

If BioPerl switches primarily to XML parsing, the tool authors will soon add support for XML (not very difficult really) due to BioPerl's pervasiveness?

> I do, however, see the advantage in shifting to XML-formatted reporting and
> parsing *only* as soon as every BLAST flavor supports it, if not before.
> (Anyone - is this still an issue. Please educate me.)

The four BLAST flavours I utilise all support XML output:
1) NCBI BLAST
2) WU-BLAST
3) MPI-BLAST
4) FSA-BLAST.

> At the moment, I'm leaning towards adding an option to RemoteBlast. The
> default (no option) would use a "pure perl" implementation, and the
> enhancement (with explicit option) would merely wrap the NCBI executable.

If the API is done correctly both of these could co-exist with very little redundant code. (I personally rarely use remote blast).

--
Torsten Seemann
Victorian Bioinformatics Consortium

From torsten.seemann at infotech.monash.edu.au Sun Feb 12 19:35:06 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 13 Feb 2006 11:35:06 +1100
Subject: [Bioperl-l] Remote BLAST support discussion
In-Reply-To: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu>
References: <1139362722.43e94ba29ebcc@webmail.utoronto.ca> <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu>
Message-ID: <1139790906.29375.27.camel@chauvel.csse.monash.edu.au>

> Mostly I think we need to try and support something that will
> "ALWAYS" work so that individuals setting up webservices which rely
> on remote blast functionality. In theory, netblast/blastcl3 should
> always work since NCBI has to update the exe when they change their
> server setup.

What usually happens when an older 'blastcl3' binary is used on a newer server setup? I guess it fails in a deterministic manner so the BioPerl user can throw a useful exception.

> I also see value in providing a wrapper for netblast since it should
> look an awful lot like running blast locally.

Agreed - they are virtually indistinguishable.

> Ideally I'd like to see a more extensible system, something like (and
> please feel free to come up with better names for the modules!):

Do BioPerl coding standards require "::Blast" over "::BLAST"?
(not important anyway)

> Bio::Tools::Run::Blast
> --> StandAlone (support for [..as many flavours as poss])
> --> RemoteNCBI (currently the RemoteBlast server)
> --> RemoteEBISOAP (EBI has a nice SOAP interface that
> --> RemoteNetBlast (blastcl3 or netblast local executable)
> (other things that people want)

Looks reasonable. I assume there are some interfaces in there like Bio::Tools::Blast::BlastI etc. Could probably call "RemoteNetBlast" just "RemoteNet" because it is already in the Blast:: namespace. (not important though)

My only suggestion for StandAlone (and RemoteNetBlast) is that they both do a generic "run a local binary with env. vars and parameters and capture the stdout, stderr and return code". This needs to be abstracted away (or re-use existing code from bioperl-run?). Jason mentioned Ensembl::Runnable as a source of code we could incorporate into Bioperl.

--
Torsten Seemann
Victorian Bioinformatics Consortium

From cjfields at uiuc.edu Mon Feb 13 11:45:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 13 Feb 2006 10:45:14 -0600
Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
In-Reply-To: <20060213152603.ed3f3118@dogwood.plantbio.uga.edu>
Message-ID: <001801c630bc$dd35bff0$15327e82@pyrimidine>

If you're using RemoteBlast 1.28, then you've likely updated from CVS which isn't the latest fix.

Make sure that you check the following:

1) Always post to the mailing list: http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .

2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed first. Perform a clean installation; do not upgrade only Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing modules from old and new distributions (1.4 and 1.5.1, for instance) will work.
A bioperl-1.5.1 or bioperl-live installation will allow text output from BLAST v.2.2.12 to be saved and parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13) but it should still save it. I believe as long as next_result() isn't called, it will work.

3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output are NOT in CVS; they haven't been cleared and checked in by Roger Hall (who's now taking care of RemoteBlast) and the powers that be (Jason or whoever is in charge of Bio::SearchIO). They can be found in Bugzilla:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of saving XML output, so isn't necessary if you don't plan on using this option. And, remember, they haven't been committed yet to CVS, which means that the final version will change to reflect the new version.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

_____

From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
Sent: Monday, February 13, 2006 9:26 AM
To: Chris Fields
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

Hi, Chris

Thanks for your suggestion, however, it doesn't seem to work for my cgi even after I replace both blast.pm and RemoteBlast.pm. I didn't even get any RID. Is there any suggestion?

Guojun

Guojun Yang
Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun

_____

From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
Sent: Fri, 03 Feb 2006 16:07:29 -0500
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

I would say give the new code a try, but realize that it hasn't been checked in (like I said below).
I will try going over the modified Bio::SearchIO::blast again this weekend to see if there is anything I might have missed. The changed order in the header of BLAST text output has me a bit worried that it might not catch everything, but it at least doesn't hang in the while() loop I described in the bug report below (bug #1934) and seems to process everything fine. If you want more stability in the code, you might consider changing over to XML output and parsing with Bio::SearchIO::blastxml. There are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML output, but I believe it parses everything regardless. If you look back the last month or so there has been a bit of discussion here about it. Jason describes a bit on how to set up RemoteBlast for XML: http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/ Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > Sent: Friday, February 03, 2006 1:45 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 > > Hi, Everybody, > I see this post and am wondering if this is the reason for the > malfunctionning of my webserver. We set up a webserver named MAK, for MITE > sequence analysis. It was working very well until around November 2005, > when it stopped returning any result (the site is fine and seems to be > doing sth after submission). In the CGI script, I used remoteblast (that > work was done in 2003) to do searches. I currently do not have access to > the server because I moved. Quite several people sent emails to us about > its malfunctioning. Is there any suggestion on fixing the problem? Should > I simplily ask the remoteblast.pm be replaced with the new version? 
> Thanks a lot, > Guojun > > Department of Plant Biology > University of Georgia > Tel: 706-542-1857 > Fax: 706-542-1805 > http://www.arches.uga.edu/~guojun > _____ > > From: Chris Fields [mailto:cjfields at uiuc.edu] > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl- > l at bioperl.org] > Sent: Fri, 03 Feb 2006 10:45:23 -0500 > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It > will > work for saving text output. However, it will not parse anything using > next_result (it will likely hang) and will not save XML format. See these > bugs: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > for explanations and possible fixes (changes to RemoteBlast and > Bio::SearchIO::blast). Note that these haven't been checked in yet so are > still not included in bioperl-live; they may be further modified before > committing to CVS. If you're not worried about XML, you could just try the > first fix, which is a change to SearchIO::blast. > > Nagesh, I remember you posting to the list a month ago using a script > which > had problems; the script you used saves the output but doesn't actually > parse it (i.e. you don't use next_result() to go through the data). Is the > version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried > parsing the output using "-readmethod => SearchIO" or "-readmethod => > blast" > using your version of RemoteBlast and method next_result()? Like below > (from > perldoc): > > while ( my @rids = $factory->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." 
if ( $v > 0 ); > sleep 5; > } else { # parsing > starts here > my $result = $rc->next_result(); # it should hang > here > #save the output > my $filename = $result->query_name()."\.out"; > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > } > } > > > My script hanged if I used next_result() in any way prior to the fixes. I > want to see how many others are having the same issues with parsing using > the CVS version of bioperl-live. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > > Sent: Thursday, February 02, 2006 7:24 PM > > To: Huang Jian; bioperl-l > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > Hi Huang, > > Thanks for the message. The older version of RemoteBlast.pm works on the > > logic of checking the temporary file size to determine whether the Blast > > results are ready. This condition is not getting satisfied may be due to > > some changes brought about by NCBI. I had this problem recently and > > figured out that the solution was to use the latest version which has > > this problem fixed (does not use file size logic any more) which is not > > yet included in the BioPerl package. > > Cheers > > Nagesh > > > > Huang Jian wrote: > > > > > Dear Nagesh, > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send > > > me. Now it works perfectly!!! > > > > > > Thank you!! 
> > > > > > Huang > > > > > > ----- Original Message ----- From: "Nagesh Chakka" > > > > > > To: "Huang Jian" ; "bioperl-l" > > > > > > Sent: Friday, February 03, 2006 7:48 AM > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still > > > via email > > > > > > > > >> Hi Huang, > > >> I see that you are submitting a sequence for a remote blast search. > Can > > >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If > > >> not I have attached it with this email, try to replace it with the > old > > >> one which has a bug. > > >> Let me know if it works. > > >> Nagesh > > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From gyang at plantbio.uga.edu Mon Feb 13 13:32:14 2006 From: gyang at plantbio.uga.edu (Guojun Yang) Date: Mon, 13 Feb 2006 13:32:14 -0500 Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 In-Reply-To: <001801c630bc$dd35bff0$15327e82@pyrimidine> Message-ID: <20060213183214.342b90da@dogwood.plantbio.uga.edu> Hi, Chris, I do have different versions of bioperl on my Linux machine (1.4. and 1.5.0), this may be the problem. Should I just install bioperl-1.5.1 or I need to uninstall and remove the previous versions. I could not find any hint on uninstalling bioperl on linux. Could you please give me some suggestion? 
Thanks, Guojun Department of Plant Biology University of Georgia _____ From: Chris Fields [mailto:cjfields at uiuc.edu] To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org Sent: Mon, 13 Feb 2006 11:45:14 -0500 Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 If you?re using RemoteBlast 1.28, then you?ve likely updated from CVS which isn?t the latest fix. Make sure that you check the following: 1) Always post to the mailing list: http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance . 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed first. Perform a clean installation; do not upgrade only Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing modules from old and new distributions (1.4 and 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live installation will allow text output from BLAST v.2.2.12 to be saved and parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13) but it should still save it. I believe as long as next_results() isn?t called, it will work. 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output are NOT in CVS; they haven?t been cleared and checked in by Roger Hall (who?s now taking care of RemoteBlast) and the powers that be (Jason or whomever is in charge of Bio::SearchIO). They can be found in Bugzilla: http://bugzilla.bioperl.org/show_bug.cgi?id=1934 http://bugzilla.bioperl.org/show_bug.cgi?id=1935 The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of saving XML output, so isn?t necessary if you don?t plan on using this option. And, remember, they haven?t been committed yet to CVS, which means that the final version will change to refle the new version. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign _____ From: Guojun Yang [mailto:gyang at plantbio.uga.edu] Sent: Monday, February 13, 2006 9:26 AM To: Chris Fields Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 Hi, Chris Thanks for your suggestion, however, it doesn't seem to work for my cgi even after I replace both blast.pm and RemoteBlast.pm. I didn't even get any RID. Is there any suggestion? Guojun Guojun Yang Department of Plant Biology University of Georgia Tel: 706-542-1857 Fax: 706-542-1805 http://www.arches.uga.edu/~guojun _____ From: Chris Fields [mailto:cjfields at uiuc.edu] To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org Sent: Fri, 03 Feb 2006 16:07:29 -0500 Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 I would say give the new code a try, but realize that it hasn't been checked in (like I said below). I will try going over the modified Bio::SearchIO::blast again this weekend to see if there is anything I might have missed. The changed order in the header of BLAST text output has me a bit worried that it might not catch everything, but it at least doesn't hang in the while() loop I described in the bug report below (bug #1934) and seems to process everything fine. If you want more stability in the code, you might consider changing over to XML output and parsing with Bio::SearchIO::blastxml. There are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML output, but I believe it parses everything regardless. If you look back the last month or so there has been a bit of discussion here about it. Jason describes a bit on how to set up RemoteBlast for XML: http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/ Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > Sent: Friday, February 03, 2006 1:45 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 > > Hi, Everybody, > I see this post and am wondering if this is the reason for the > malfunctionning of my webserver. We set up a webserver named MAK, for MITE > sequence analysis. It was working very well until around November 2005, > when it stopped returning any result (the site is fine and seems to be > doing sth after submission). In the CGI script, I used remoteblast (that > work was done in 2003) to do searches. I currently do not have access to > the server because I moved. Quite several people sent emails to us about > its malfunctioning. Is there any suggestion on fixing the problem? Should > I simplily ask the remoteblast.pm be replaced with the new version? > Thanks a lot, > Guojun > > Department of Plant Biology > University of Georgia > Tel: 706-542-1857 > Fax: 706-542-1805 > http://www.arches.uga.edu/~guojun > _____ > > From: Chris Fields [mailto:cjfields at uiuc.edu] > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl- > l at bioperl.org] > Sent: Fri, 03 Feb 2006 10:45:23 -0500 > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It > will > work for saving text output. However, it will not parse anything using > next_result (it will likely hang) and will not save XML format. See these > bugs: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > for explanations and possible fixes (changes to RemoteBlast and > Bio::SearchIO::blast). 
Note that these haven't been checked in yet so are > still not included in bioperl-live; they may be further modified before > committing to CVS. If you're not worried about XML, you could just try the > first fix, which is a change to SearchIO::blast. > > Nagesh, I remember you posting to the list a month ago using a script > which > had problems; the script you used saves the output but doesn't actually > parse it (i.e. you don't use next_result() to go through the data). Is the > version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried > parsing the output using "-readmethod => SearchIO" or "-readmethod => > blast" > using your version of RemoteBlast and method next_result()? Like below > (from > perldoc): > > while ( my @rids = $factory->each_rid ) { > foreach my $rid ( @rids ) { > my $rc = $factory->retrieve_blast($rid); > if( !ref($rc) ) { > if( $rc < 0 ) { > $factory->remove_rid($rid); > } > print STDERR "." if ( $v > 0 ); > sleep 5; > } else { # parsing > starts here > my $result = $rc->next_result(); # it should hang > here > #save the output > my $filename = $result->query_name()."\.out"; > $factory->save_output($filename); > $factory->remove_rid($rid); > print "\nQuery Name: ", $result->query_name(), "\n"; > while ( my $hit = $result->next_hit ) { > next unless ( $v > 0); > print "\thit name is ", $hit->name, "\n"; > while( my $hsp = $hit->next_hsp ) { > print "\t\tscore is ", $hsp->score, "\n"; > } > } > } > } > } > } > > > My script hanged if I used next_result() in any way prior to the fixes. I > want to see how many others are having the same issues with parsing using > the CVS version of bioperl-live. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > > Sent: Thursday, February 02, 2006 7:24 PM > > To: Huang Jian; bioperl-l > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > Hi Huang, > > Thanks for the message. The older version of RemoteBlast.pm works on the > > logic of checking the temporary file size to determine whether the Blast > > results are ready. This condition is not getting satisfied may be due to > > some changes brought about by NCBI. I had this problem recently and > > figured out that the solution was to use the latest version which has > > this problem fixed (does not use file size logic any more) which is not > > yet included in the BioPerl package. > > Cheers > > Nagesh > > > > Huang Jian wrote: > > > > > Dear Nagesh, > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send > > > me. Now it works perfectly!!! > > > > > > Thank you!! > > > > > > Huang > > > > > > ----- Original Message ----- From: "Nagesh Chakka" > > > > > > To: "Huang Jian" ; "bioperl-l" > > > > > > Sent: Friday, February 03, 2006 7:48 AM > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still > > > via email > > > > > > > > >> Hi Huang, > > >> I see that you are submitting a sequence for a remote blast search. > Can > > >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If > > >> not I have attached it with this email, try to replace it with the > old > > >> one which has a bug. > > >> Let me know if it works. 
> > >> Nagesh
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Mon Feb 13 15:39:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 13 Feb 2006 14:39:38 -0600
Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
In-Reply-To: <20060213183214.342b90da@dogwood.plantbio.uga.edu>
Message-ID: <000901c630dd$9be54f40$15327e82@pyrimidine>

How do you know two versions are installed (i.e. how are you checking the
version)? Do you have two complete bioperl distributions (in two separate
directories) or are you looking in modules? Here's the way to check the
version (from the FAQ):

    perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'

If you have two full bioperl distributions on your computer, normally only
one will be in use unless you have explicitly set the environment variable
PERL5LIB. The PERL5LIB directories will be searched first before your
normal perl directory list (@INC) is searched. You MAY get some mixing
then, but only if perl can't find a particular module in the path designated
in PERL5LIB; then it will progress through the directories listed in @INC.
This may happen if a module is unique to a particular release, but shouldn't
happen for the majority of modules, including RemoteBlast. You can check
what @INC and PERL5LIB are set to by using 'perl -V'. @INC will differ
depending on your OS, perl build, etc.

Regardless, if you follow the directions for installing bioperl for your
system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you
explicitly change the installation directory when using 'perl Makefile.PL'),
then 'uninstalling' Bioperl shouldn't be a problem as it will install the
Bioperl distribution you downloaded over the old version in @INC. See this
page:

http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL

for more details.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Monday, February 13, 2006 12:32 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
>
> Hi, Chris,
> I do have different versions of bioperl on my Linux machine (1.4 and
> 1.5.0), this may be the problem. Should I just install bioperl-1.5.1 or do I
> need to uninstall and remove the previous versions? I could not find any
> hint on uninstalling bioperl on linux. Could you please give me some
> suggestion?
> Thanks,
> Guojun
>
> Department of Plant Biology
> University of Georgia
> _____
>
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> Sent: Mon, 13 Feb 2006 11:45:14 -0500
> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> 1.28
>
> If you're using RemoteBlast 1.28, then you've likely updated from CVS
> which isn't the latest fix.
>
> Make sure that you check the following:
>
> 1) Always post to the mailing list:
> http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
>
> 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> installed first. Perform a clean installation; do not upgrade only
> Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> guarantee that mixing modules from old and new distributions (1.4 and
> 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live
> installation will allow text output from BLAST v.2.2.12 to be saved and
> parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13)
> but it should still save it. I believe as long as next_result() isn't
> called, it will work.
>
> 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output
> are NOT in CVS; they haven't been cleared and checked in by Roger Hall
> (who's now taking care of RemoteBlast) and the powers that be (Jason or
> whomever is in charge of Bio::SearchIO). They can be found in Bugzilla:
>
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>
> The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> saving XML output, so isn't necessary if you don't plan on using this
> option. And, remember, they haven't been committed yet to CVS, which
> means that the final version will change to reflect the new version.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> _____
>
> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> Sent: Monday, February 13, 2006 9:26 AM
> To: Chris Fields
> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> 1.28
>
> Hi, Chris
>
> Thanks for your suggestion, however, it doesn't seem to work for my cgi
> even after I replace both blast.pm and RemoteBlast.pm. I didn't even get
> any RID. Is there any suggestion?
>
> Guojun
>
> Guojun Yang
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun

From gyang at plantbio.uga.edu Mon Feb 13 16:00:11 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 13 Feb 2006 16:00:11 -0500
Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
Message-ID: <20060213160011.1e89108c@dogwood.plantbio.uga.edu>

Thanks, Chris,
I installed version 1.5.1 and replaced the blast.pm file with the one from
your bug report. The running version is 1.5 when I use the command you sent
me. But when I tried the script, it doesn't change much.

My remoteblast code (portion) is here:

sub search {
    local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
    local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
    local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
    local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
    local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
    my $query = Bio::Seq -> new ( -seq=>"$_[0]", -id=>"query", -desc=>"new seq");
    my $len=$query->length();
    @db=('nr','htgs','wgs');
    foreach my $db (@db) {
        my $factory = Bio::Tools::Run::RemoteBlast->new('-prog'   =>'blastn',
                                                        '-data'   =>"$db",
                                                        '-expect' =>"$E_value");
        my $blast_report = $factory->submit_blast($query);
        my @rids = $factory->each_rid();
        foreach my $rid ( @rids ) { print STDERR "$rid\n"; }
        # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
        print STDERR "waiting...";
        sleep 60;
        foreach my $rid ( @rids ) {
            my $rc = $factory->retrieve_blast($rid);
            while (!ref($rc) ) {
                if( $rc < 0 ) {     # retrieve_blast returns -1 on error
                    $factory->remove_rid($rid);
                    print "Error!\n";
                    send_error($email,$function,$seqname,$queryname[$ST]);
                    die "Can't retrieve $rid";
                }
                if ($rc==0) {       # retrieve_blast returns 0 on 'job not finished'
                    sleep 60;
                    $rc = $factory->retrieve_blast($rid);
                }
            }
            if (ref($rc)) {
                print STDERR "Done.\n";
                while( my $result = $rc->next_result) {
                    while( my $hit = $result->next_hit()) {
                        $hit_name=$hit->name;
                        $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
                        $name=$1;
                        @left_plus_start=();
                        @left_plus_end=();
                        @left_minus_start=();
                        @left_minus_end=();
                        @right_plus_start=();
                        @right_plus_end=();
                        @right_minus_start=();
                        @right_minus_end=();
                        if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
                            while( my $hsp = $hit->next_hsp()) {
                                ......

It was working quite well until around October last year, but it has stopped
since then. When a submission is sent via a webpage, the cgi starts to work
and uses about 20 Mb of memory.
Then it hangs there, finally the expected email is received but without real
results although it does contain something from other parts of the script.
Apparently the search sub did not return anything (I know there is something
should be returned.). Is it also possible the format of the NCBI output for
each result has changed?

Thank you,
Guojun

Department of Plant Biology
University of Georgia

From akarger at CGR.Harvard.edu Mon Feb 13 15:57:08 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 13 Feb 2006 15:57:08 -0500
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
Message-ID: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>

I'm trying to get the sequences of each exon in a gene. I have a genbank
file with mRNA and exon features (among others) that look like:

     mRNA            join(complement(22257..22386),complement(22067..22186),
                     complement(16753..17101),complement(13840..13962),
                     complement(10649..10820),complement(502..3028))
                     /gene="ENSG00000005812"
                     /note="transcript_id=ENST00000355619"
     exon            complement(13840..13962)
                     /note="exon_id=ENSE00000802462"

I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
the mRNA above. I tried writing the below code, but it doesn't do what I
want. (You'll note that the code is stolen from the Bio::Seq and Feature
HOWTOs.)
my $inseq = Bio::SeqIO->new(-file => "<$file", -format => $format ); while (my $seq = $inseq->next_seq) { my @features = $seq->get_SeqFeatures(); # just top level foreach my $feat ( @features ) { my $type = $feat->primary_tag; if ($type eq "mRNA") { print "Feature ",$feat->primary_tag, " starts ",$feat->start," ends ", $feat->end, " strand ",$feat->strand,"\n"; my @feats = $feat->get_SeqFeatures(); print "Found ", scalar @feats, " sub-features\n"; } elsif ($type eq "exon") { print "Feature ",$feat->primary_tag, " starts ",$feat->start," ends ", $feat->end, " strand ",$feat->strand,"\n"; } } } When I run the above, it says that the mRNA features have no sub-features. So how do I pull out the 6 sequences? Thanks, - Amir Karger Computational Biology Group Bauer Center for Genomics Research Harvard University 617-496-0626 From cjfields at uiuc.edu Mon Feb 13 18:18:24 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 13 Feb 2006 17:18:24 -0600 Subject: [Bioperl-l] INSTALL.WIN in wiki Message-ID: <000001c630f3$c9efa5f0$15327e82@pyrimidine> I just added "Installing Bioperl on Windows" to the wiki. It needs some major updating and changes in formatting: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows Jason has mentioned changing up some of the INSTALL docs for the wiki (http://www.bioperl.org/wiki/Talk:Getting_BioPerl). Any thoughts? Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry
University of Illinois Urbana-Champaign

From osborne1 at optonline.net Mon Feb 13 20:38:30 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 13 Feb 2006 20:38:30 -0500
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
In-Reply-To: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
Message-ID:

Amir,

The idea is to look at the sub-locations in the SplitLocation object, this
is discussed in FAQ 5.2:

http://www.bioperl.org/wiki/FAQ#How_do_I_parse_the_CDS_join_or_complement_statements_in_GenBank_or_EMBL_files_to_get_the_sub-locations.3F

The sequence of the feature itself can be obtained by using the entire_seq()
method:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Getting_Sequences

Brian O.

On 2/13/06 3:57 PM, "Amir Karger" wrote:

> I'm trying to get the sequences of each exon in a gene. I have a genbank
> file with mRNA and exon features (among others) that look like:
>      mRNA            join(complement(22257..22386),complement(22067..22186),
>                      complement(16753..17101),complement(13840..13962),
>                      complement(10649..10820),complement(502..3028))
>                      /gene="ENSG00000005812"
>                      /note="transcript_id=ENST00000355619"
>      exon            complement(13840..13962)
>                      /note="exon_id=ENSE00000802462"
>
> I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
> the mRNA above. I tried writing the below code, but it doesn't do what I
> want. (You'll note that the code is stolen from the Bio::Seq and Feature
> HOWTOs.)
>
> my $inseq = Bio::SeqIO->new(-file => "<$file", -format => $format );
> while (my $seq = $inseq->next_seq) {
>     my @features = $seq->get_SeqFeatures(); # just top level
>     foreach my $feat ( @features ) {
>         my $type = $feat->primary_tag;
>         if ($type eq "mRNA") {
>             print "Feature ",$feat->primary_tag,
>                   " starts ",$feat->start," ends ", $feat->end,
>                   " strand ",$feat->strand,"\n";
>             my @feats = $feat->get_SeqFeatures();
>             print "Found ", scalar @feats, " sub-features\n";
>         } elsif ($type eq "exon") {
>             print "Feature ",$feat->primary_tag,
>                   " starts ",$feat->start," ends ", $feat->end,
>                   " strand ",$feat->strand,"\n";
>         }
>     }
> }
>
> When I run the above, it says that the mRNA features have no sub-features.
> So how do I pull out the 6 sequences?
>
> Thanks,
> - Amir Karger
> Computational Biology Group
> Bauer Center for Genomics Research
> Harvard University
> 617-496-0626

From hlapp at gmx.net Mon Feb 13 18:58:46 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 13 Feb 2006 15:58:46 -0800
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
In-Reply-To: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
References: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
Message-ID:

Why do you want subfeatures? This is genbank format you're parsing, right?
Your mRNA features will have a split location. Loop over
$feat->location->each_Location() and get $seq->subseq() with the start and
end of each sublocation. If you don't know how to do this, check out the
implementation of $feature->spliced_seq().

This should be in the HOWTO. Is it not?

-hilmar

On 2/13/06, Amir Karger wrote:
> I'm trying to get the sequences of each exon in a gene.
I have a genbank > file with mRNA and exon features (among others) that look like: > mRNA join(complement(22257..22386),complement(22067..22186), > complement(16753..17101),complement(13840..13962), > complement(10649..10820),complement(502..3028)) > /gene="ENSG00000005812" > /note="transcript_id=ENST00000355619" > exon complement(13840..13962) > /note="exon_id=ENSE00000802462" > > I want to make a FASTA file with 6 sequences corresponding to the 6 exons in > the mRNA above. I tried writing the below code, but it doesn't do what I > want. (You'll note that the code is stolen from the Bio::Seq and Feature > HOWTOs.) > > my $inseq = Bio::SeqIO->new(-file => "<$file", -format => $format ); > while (my $seq = $inseq->next_seq) { > my @features = $seq->get_SeqFeatures(); # just top level > foreach my $feat ( @features ) { > my $type = $feat->primary_tag; > if ($type eq "mRNA") { > print "Feature ",$feat->primary_tag, > " starts ",$feat->start," ends ", $feat->end, > " strand ",$feat->strand,"\n"; > my @feats = $feat->get_SeqFeatures(); > print "Found ", scalar @feats, " sub-features\n"; > } elsif ($type eq "exon") { > print "Feature ",$feat->primary_tag, > " starts ",$feat->start," ends ", $feat->end, > " strand ",$feat->strand,"\n"; > } > } > } > > When I run the above, it says that the mRNA features have no sub-features. > So how do I pull out the 6 sequences? 
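
[Editor's sketch] The approach suggested in the replies (walk the sub-locations of the split location and take one subsequence per exon) can be illustrated in plain Perl, without BioPerl installed. This is only a minimal sketch under stated assumptions: the helper names revcomp and exon_seqs, the toy sequence, and the coordinates are hypothetical and not from the thread. In a real script the [start, end, strand] triples would presumably be read from the sub-locations returned by $feat->location->each_Location(), as mentioned in the replies.

```perl
use strict;
use warnings;

# Reverse-complement a plain DNA string (A, C, G, T, N in either case).
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;              # scalar context: reverses the string
    $rc =~ tr/ACGTNacgtn/TGCANtgcan/;   # complement each base
    return $rc;
}

# Given the full sequence and a list of [start, end, strand] sub-locations
# (1-based, inclusive, as in a GenBank join(...) statement), return one
# string per sub-location. Minus-strand pieces are reverse-complemented.
sub exon_seqs {
    my ($dna, @sublocs) = @_;
    my @exons;
    for my $loc (@sublocs) {
        my ($start, $end, $strand) = @$loc;
        my $piece = substr($dna, $start - 1, $end - $start + 1);
        $piece = revcomp($piece) if defined $strand && $strand < 0;
        push @exons, $piece;
    }
    return @exons;
}

# Toy data standing in for a GenBank record (hypothetical coordinates),
# equivalent to join(complement(7..12),4..6).
my $dna   = 'AAACCCGGGTTTACGT';
my @exons = exon_seqs($dna, [7, 12, -1], [4, 6, 1]);

# Print the pieces as FASTA records, one per exon.
my $n = 0;
for my $exon (@exons) {
    $n++;
    print ">exon_$n\n$exon\n";
}
```

The sub-locations are kept in the order given in the join(), so for a minus-strand transcript the exons come out in transcript order, as in the feature table shown in the original question.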
> > Thanks, > - Amir Karger > Computational Biology Group > Bauer Center for Genomics Research > Harvard University > 617-496-0626 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From osborne1 at optonline.net Mon Feb 13 21:11:33 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 13 Feb 2006 21:11:33 -0500 Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA In-Reply-To: Message-ID: Hilmar, It could be spelled out a bit more explicitly. Brian O. On 2/13/06 6:58 PM, "Hilmar Lapp" wrote: > This should be in the HOWTO. Is it not? From rmb32 at cornell.edu Mon Feb 13 17:12:10 2006 From: rmb32 at cornell.edu (Robert Buels) Date: Mon, 13 Feb 2006 17:12:10 -0500 Subject: [Bioperl-l] game xml SeqIO Message-ID: <43F1043A.2000205@cornell.edu> Hi all, Currently, the SeqIO for doing GAME XML does not seem to support writing (or reading?) elements. Am I correct? If I am, are there any plans to add this functionality? Can I help / do it? If there are plans to add this, how would one distinguish SeqFeatures that should be rendered as from SeqFeatures that should be rendered as ? Would we do that with Bio::SeqFeature::Computation? I assume that a given Seq can have SeqFeatures of different types associated with it (I don't know, I'm a bioperl newb). 
Rob -- Robert Buels SGN Bioinformatics Analyst 252A Emerson Hall, Cornell University Ithaca, NY 14853 Tel: 607-255-2360 rmb32 at cornell.edu http://www.sgn.cornell.edu From heikki at sanbi.ac.za Tue Feb 14 01:59:29 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Tue, 14 Feb 2006 08:59:29 +0200 Subject: [Bioperl-l] planning sequence mutating modules In-Reply-To: <200602100906.11885.heikki@sanbi.ac.za> References: <200602100906.11885.heikki@sanbi.ac.za> Message-ID: <200602140859.30136.heikki@sanbi.ac.za> I've committed an interim solution to the sequence evolution problem: $newseq = Bio::SeqUtils-> evolve ($seq, $similarity, $transition_transversion_rate); I will go on to transform this code to fully OO, extensible solution. -Heikki On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote: > Ryan Golhar's mail got me thinking that we should have a simple framework > for mutating sequences to a desired level. The model can then be extended > to necessary complexity when needed by subclassing. 
>
> To start with, I have been planning:
>
>
> Bio::SeqEvolution::EvolutionI - interface file
> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>     (defaults to Bio::PrimarySeq)
> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
> Bio::SeqEvolution::EvolutionI::each_seqs($count)
>     - returns an array of $count seqs
> Bio::SeqEvolution::EvolutionI::_generate_seq()
> Bio::SeqEvolution::EvolutionI::matrix # Bio::Matrix::Scoring
>     converted to probabilities of change internally
>
> various methods to define the extent of divergence:
> only one to start with:
> Bio::SeqEvolution::EvolutionI::pam() - percentage accepted mutation
>     (= 100% - identity)
>
> Bio::SeqEvolution::Factory - core class to call,
>     instantiates subclasses, Bio::SeqEvolution::DNASimple for nucleotides
> Bio::SeqEvolution::EvolutionI::type() - evolution model,
>     defaults to Bio::SeqEvolution::DNASimple for nucleotides
>
>
> Bio::SeqEvolution::DNASimple - default for nucleotides
> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>     e.g. 5 => 5:1, defaults to 1:1
>     simple alternative to a scoring matrix
>
>
> I am soliciting usual comments and suggestions about naming and minimal
> functionality.
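
[Editor's sketch] To make the simplest model in the plan concrete (a target identity plus a single rate ratio instead of a scoring matrix), here is a plain-Perl illustration. It is not the committed Bio::SeqUtils code and makes several assumptions: the function names mutate_to_identity and count_differences are hypothetical, the ratio is read as transitions:transversions, the input is assumed to contain only A, C, G and T, and the number of changed sites is fixed exactly at (100 - identity) percent of the length rather than drawn at random.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Substitution tables: a transition stays within purines or pyrimidines,
# a transversion switches between the two classes.
my %transition    = (A => 'G', G => 'A', C => 'T', T => 'C');
my %transversions = (A => [qw(C T)], G => [qw(C T)],
                     C => [qw(A G)], T => [qw(A G)]);

# Return a copy of $dna in which exactly (100 - $identity)% of the positions
# differ from the original. $ts_tv is the assumed transition:transversion
# ratio, e.g. 5 means 5:1.
sub mutate_to_identity {
    my ($dna, $identity, $ts_tv) = @_;
    my @bases = split //, uc $dna;
    my $n_mut = int(@bases * (100 - $identity) / 100 + 0.5);

    # Pick distinct sites so that no position is mutated twice.
    my @sites = (shuffle(0 .. $#bases))[0 .. $n_mut - 1];
    my $p_ts  = $ts_tv / ($ts_tv + 1);   # probability of a transition

    for my $i (@sites) {
        my $old = $bases[$i];
        if (rand() < $p_ts) {
            $bases[$i] = $transition{$old};
        } else {
            my $choices = $transversions{$old};
            $bases[$i] = $choices->[ int rand @$choices ];
        }
    }
    return join '', @bases;
}

# Count the positions at which two equal-length strings differ.
sub count_differences {
    my ($x, $y) = @_;
    return scalar grep { substr($x, $_, 1) ne substr($y, $_, 1) }
                       0 .. length($x) - 1;
}

my $original = 'ACGT' x 25;                              # 100 bases
my $mutated  = mutate_to_identity($original, 75, 5);     # 75% identity, 5:1
printf "%d of %d positions changed\n",
       count_differences($original, $mutated), length $original;
```

Because every chosen site is replaced by a different base and no site is chosen twice, the sequences above differ at exactly 25 positions. A model based on a scoring matrix or on branch lengths would replace the fixed site count with per-site probabilities.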
> > > -Heikki -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From gbazykin at Princeton.EDU Tue Feb 14 09:34:54 2006 From: gbazykin at Princeton.EDU (Georgii A Bazykin) Date: Tue, 14 Feb 2006 09:34:54 -0500 Subject: [Bioperl-l] planning sequence mutating modules In-Reply-To: <200602140859.30136.heikki@sanbi.ac.za> References: <200602100906.11885.heikki@sanbi.ac.za> <200602140859.30136.heikki@sanbi.ac.za> Message-ID: <214316262.20060214093454@princeton.edu> Hi, Just a thought: I really think that in perspective, it would be nice to be able to evolve the sequence along a tree of given shape. I think PAML's "evolver" has this functionality. I've already been doing this in my scripts, but I am not sure how to couple the tree and the sequence data properly. Yegor (George) Bazykin ------------------------------ Tuesday, February 14, 2006, 1:59:29 AM, you wrote: > I've committed an interim solution to the sequence evolution problem: > $newseq = Bio::SeqUtils-> evolve > ($seq, $similarity, $transition_transversion_rate); > I will go on to transform this code to fully OO, extensible solution. > -Heikki > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote: >> Ryan Golhar's mail got me thinking that we should have a simple framework >> for mutating sequences to a desired level. The model can then be extended >> to necessary complexity when needed by subclassing. 
>> >> To start with, I have been planning: >> >> >> Bio::SeqEvolution::EvolutionI - interface file >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class, >> (defaults to Bio::PrimarySeq) >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses >> Bio::SeqEvolution::EvolutionI::each_seqs($count) >> - returns an array of $count seqs >> Bio::SeqEvolution::EvolutionI::_generate_seq() >> Bio::SeqEvolution::EvolutionI::matrix # Bio::Matrix::Scoring >> converteed to probabilites of change internally >> >> various methods to define the extent of divergence: >> only one to start with: >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation >> (= 100% - identity) >> >> Bio::SeqEvolution::Factory - core class to call, >> instantiates subclasses, Bio::SeqEvolution::DNASimple for >> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model, >> defaults to Bio::SeqEvolution::DNASimple for nucleotides >> >> >> Bio::SeqEvolution::DNASimple - default for nucleotides >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer, >> e.g. 5 => 5:1, defaults to 1:1 >> simple alternative to a scoring matrix >> >> >> I am soliciting usual comments and suggestions about naming and minimal >> functionality. >> >> >> -Heikki From maximilianh at gmail.com Tue Feb 14 05:11:42 2006 From: maximilianh at gmail.com (Maximilian Haeussler) Date: Tue, 14 Feb 2006 11:11:42 +0100 Subject: [Bioperl-l] [BiO BB] Re: Tool to mutate DNA sequence In-Reply-To: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu> References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu> Message-ID: <76f031ae0602140211n2a0bbf4fl@mail.gmail.com> The tool ROSE also evolves sequences on a tree. 
There is a web interface and downloadable source at http://bibiserv.techfak.uni-bielefeld.de/rose/ Max On 09/02/06, Jason Stajich wrote: > Depending on whether or not you want to use evolutionary realistic > models... > * evolver which comes with PAML lets you evolve sequences on a tree > * SeqGen from Andrew Rambaut http://evolve.zoo.ox.ac.uk/software.html? > id=seqgen > also lets you do this > I believe there are PISE interfaces to both of these at the pasteur > bioweb site - http://bioweb.pasteur.fr/ > > -jason > On Feb 8, 2006, at 11:46 PM, Ryan Golhar wrote: > > > Does anyone know of tool to mutate a DNA sequence by a specified > > amount? > > For instance, say I have a DNA sequence 1000 bases long, and I want to > > simulate mutations to make it 75% (or 80%, etc) similar to the > > original. > > > > > > Ryan > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioinformatics.Org general forum - BiO_Bulletin_Board at bioinformatics.org > https://bioinformatics.org/mailman/listinfo/bio_bulletin_board > -- Maximilian Haeussler, CNRS Gif-sur-Yvette, Paris tel: +33 6 12 82 76 16 icq: 3825815 -- msn: maximilian.haeussler at hpi.uni-potsdam.de skype: maximilianhaeussler From heikki at sanbi.ac.za Tue Feb 14 11:09:27 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Tue, 14 Feb 2006 18:09:27 +0200 Subject: [Bioperl-l] planning sequence mutating modules In-Reply-To: <214316262.20060214093454@princeton.edu> References: <200602100906.11885.heikki@sanbi.ac.za> <200602140859.30136.heikki@sanbi.ac.za> <214316262.20060214093454@princeton.edu> Message-ID: <200602141809.28057.heikki@sanbi.ac.za> Yegor, Like you said, there are examples how it is done.. It should be possible to evolve sequences based on a rooted tree. 
You just walk the tree and evolve each sequence from its parent. If there is an agreement how the branch lengths get translated to mutations, even that could be done. Do you have any suggestions? -Heikki On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote: > Hi, > > Just a thought: I really think that in perspective, it would be nice > to be able to evolve the sequence along a tree of given shape. I think > PAML's "evolver" has this functionality. I've already been doing this > in my scripts, but I am not sure how to couple the tree and the > sequence data properly. > > Yegor (George) Bazykin > > > ------------------------------ > > Tuesday, February 14, 2006, 1:59:29 AM, you wrote: > > I've committed an interim solution to the sequence evolution problem: > > > > $newseq = Bio::SeqUtils-> evolve > > ($seq, $similarity, $transition_transversion_rate); > > > > I will go on to transform this code to fully OO, extensible solution. > > > > -Heikki > > > > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote: > >> Ryan Golhar's mail got me thinking that we should have a simple > >> framework for mutating sequences to a desired level. The model can then > >> be extended to necessary complexity when needed by subclassing. 
> >> > >> To start with, I have been planning: > >> > >> > >> Bio::SeqEvolution::EvolutionI - interface file > >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate > >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class, > >> (defaults to Bio::PrimarySeq) > >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses > >> Bio::SeqEvolution::EvolutionI::each_seqs($count) > >> - returns an array of $count seqs > >> Bio::SeqEvolution::EvolutionI::_generate_seq() > >> Bio::SeqEvolution::EvolutionI::matrix # Bio::Matrix::Scoring > >> converteed to probabilites of change internally > >> > >> various methods to define the extent of divergence: > >> only one to start with: > >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation > >> (= 100% - identity) > >> > >> Bio::SeqEvolution::Factory - core class to call, > >> instantiates subclasses, Bio::SeqEvolution::DNASimple for > >> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model, > >> defaults to Bio::SeqEvolution::DNASimple for nucleotides > >> > >> > >> Bio::SeqEvolution::DNASimple - default for nucleotides > >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer, > >> e.g. 5 => 5:1, defaults to 1:1 > >> simple alternative to a scoring matrix > >> > >> > >> I am soliciting usual comments and suggestions about naming and minimal > >> functionality. 
> >> > >> > >> -Heikki > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From golharam at umdnj.edu Tue Feb 14 12:01:38 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Tue, 14 Feb 2006 12:01:38 -0500 Subject: [Bioperl-l] planning sequence mutating modules In-Reply-To: <200602141809.28057.heikki@sanbi.ac.za> Message-ID: <016401c63188$52c9d4b0$2f01a8c0@GOLHARMOBILE1> Here are my two cents.... 1. Allow sequences to be mutated by some percent amount. 2. Use mutation patterns implied by PAM matrices or some known models of mutation. 3. Have the output show the original sequences and the mutated sequence so you can easily identify what was mutated and what is conserved. Ryan -----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho Sent: Tuesday, February 14, 2006 11:09 AM To: bioperl-l at lists.open-bio.org; Georgii A Bazykin Subject: Re: [Bioperl-l] planning sequence mutating modules Yegor, Like you said, there are examples how it is done.. It should be possible to evolve sequences based on a rooted tree. You just walk the tree and evolve each sequence from its parent. If there is an agreement how the branch lengths get translated to mutations, even that could be done. Do you have any suggestions? 
-Heikki On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote: > Hi, > > Just a thought: I really think that in perspective, it would be nice > to be able to evolve the sequence along a tree of given shape. I think > PAML's "evolver" has this functionality. I've already been doing this > in my scripts, but I am not sure how to couple the tree and the > sequence data properly. > > Yegor (George) Bazykin > > > ------------------------------ > > Tuesday, February 14, 2006, 1:59:29 AM, you wrote: > > I've committed an interim solution to the sequence evolution > > problem: > > > > $newseq = Bio::SeqUtils-> evolve > > ($seq, $similarity, $transition_transversion_rate); > > > > I will go on to transform this code to fully OO, extensible > > solution. > > > > -Heikki > > > > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote: > >> Ryan Golhar's mail got me thinking that we should have a simple > >> framework for mutating sequences to a desired level. The model can > >> then be extended to necessary complexity when needed by > >> subclassing. 
> >> > >> To start with, I have been planning: > >> > >> > >> Bio::SeqEvolution::EvolutionI - interface file > >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate > >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class, > >> (defaults to Bio::PrimarySeq) > >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by > >> subclasses > >> Bio::SeqEvolution::EvolutionI::each_seqs($count) > >> - returns an array of $count seqs > >> Bio::SeqEvolution::EvolutionI::_generate_seq() > >> Bio::SeqEvolution::EvolutionI::matrix # Bio::Matrix::Scoring > >> converteed to probabilites of change internally > >> > >> various methods to define the extent of divergence: > >> only one to start with: > >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation > >> (= 100% - identity) > >> > >> Bio::SeqEvolution::Factory - core class to call, > >> instantiates subclasses, Bio::SeqEvolution::DNASimple for > >> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model, > >> defaults to Bio::SeqEvolution::DNASimple for nucleotides > >> > >> > >> Bio::SeqEvolution::DNASimple - default for nucleotides > >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer, > >> e.g. 5 => 5:1, defaults to 1:1 > >> simple alternative to a scoring matrix > >> > >> > >> I am soliciting usual comments and suggestions about naming and > >> minimal functionality. 
> >> > >> > >> -Heikki > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From hjm at tacgi.com Tue Feb 14 12:15:11 2006 From: hjm at tacgi.com (Harry Mangalam) Date: Tue, 14 Feb 2006 09:15:11 -0800 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: References: Message-ID: <200602140915.11604.hjm@tacgi.com> Hi Brian, Thanks very much for the pointers and the speed of your reply and apologies for the speed of mine. This looks good, but what I was looking for was a bioP approach for hooking to an API at NCBI or EBI so I could get this info and seqs from them. In this case, speed of retrieval is not critical and I'd rather not download the entirety of the sequences to a local disk to hack at them. I've determined a screen-scraping approach to get them and could script that, but I thought that bioP had a method for using NCBI's external API's, tho it may be that my memory is faulty or the approach is no longer supported due to overload. Does NCBI make such APIs available anymore? I searched a bit for docs on them but couldn't find anything (unless it's buried in the NCBI tookit, which I haven't started to excavate). Failing that, would SEALS provide such a service? Any PerlPinipeds listening? 
Harry On Sunday 12 February 2006 08:37, Brian Osborne wrote: > Harry, > > Hope you're doing well. The approach could be based on Bio::DB::Fasta. So, > from its documentation: > > use Bio::DB::Fasta; > > # create database from directory of fasta files > my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); > > # simple access (for those without Bioperl) > my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); > my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); > my @ids = $db->ids; > my $length = $db->length('CHROMOSOME_I'); > my $alphabet = $db->alphabet('CHROMOSOME_I'); > my $header = $db->header('CHROMOSOME_I'); > > # Bioperl-style access > my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); > > my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); > my $seq = $obj->seq; > my $subseq = $obj->subseq(4_000_000 => 4_100_000); > > Do you already have the offsets? > > Brian O. > > On 2/12/06 1:46 AM, "Harry Mangalam" wrote: > > Hi All, > > > > After perusing the tutorial and other docs for a an evening, I still > > can't find the answer to this. Forgive me if I've missed something > > obvious. > > > > This should not be a novel request, but I've not found it answered. If > > bioperl isn't the best way to do this, I'd be grateful to a pointer to a > > better way, especially if it includes an illuminating bit of code. > > > > The problem is to retrieve genomic sequences plus & minus some offset > > from a locus determined by HUGO keyword or GeneID. This would be a > > common followup chore for some extra analysis from a gene expression > > expt. Or maybe this is in the DBFetch routines, but I've missed the > > sequence type to specify...? > > > > > > TIA! 
-- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com <> From jason.stajich at duke.edu Tue Feb 14 13:25:21 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue, 14 Feb 2006 13:25:21 -0500 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: <200602140915.11604.hjm@tacgi.com> References: <200602140915.11604.hjm@tacgi.com> Message-ID: <13B3724F-3716-4C4B-95A7-6849EF167A80@duke.edu> Are you working spp that are in Ensembl? Is what you need not provided by Ensembl/EnsMart? Seems like they are doing the best job integrating gene ids to a central place. It is not exactly clear what API you are referring to - you can query Entrez via Bio::DB::Query::GenBank so if you can construct your query via the Entrez syntax you can access and retrieve it in bioperl. -jason On Feb 14, 2006, at 12:15 PM, Harry Mangalam wrote: > Hi Brian, > > Thanks very much for the pointers and the speed of your reply and > apologies > for the speed of mine. > > This looks good, but what I was looking for was a bioP approach for > hooking to > an API at NCBI or EBI so I could get this info and seqs from them. > In this > case, speed of retrieval is not critical and I'd rather not > download the > entirety of the sequences to a local disk to hack at them. > > I've determined a screen-scraping approach to get them and could > script that, > but I thought that bioP had a method for using NCBI's external > API's, tho it > may be that my memory is faulty or the approach is no longer > supported due to > overload. > > Does NCBI make such APIs available anymore? I searched a bit for > docs on them > but couldn't find anything (unless it's buried in the NCBI tookit, > which I > haven't started to excavate). > > Failing that, would SEALS provide such a service? Any PerlPinipeds > listening? > > Harry > > > > > > > On Sunday 12 February 2006 08:37, Brian Osborne wrote: >> Harry, >> >> Hope you're doing well. 
The approach could be based on >> Bio::DB::Fasta. So, >> from its documentation: >> >> use Bio::DB::Fasta; >> >> # create database from directory of fasta files >> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >> >> # simple access (for those without Bioperl) >> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); >> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); >> my @ids = $db->ids; >> my $length = $db->length('CHROMOSOME_I'); >> my $alphabet = $db->alphabet('CHROMOSOME_I'); >> my $header = $db->header('CHROMOSOME_I'); >> >> # Bioperl-style access >> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >> >> my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); >> my $seq = $obj->seq; >> my $subseq = $obj->subseq(4_000_000 => 4_100_000); >> >> Do you already have the offsets? >> >> Brian O. >> >> On 2/12/06 1:46 AM, "Harry Mangalam" wrote: >>> Hi All, >>> >>> After perusing the tutorial and other docs for a an evening, I still >>> can't find the answer to this. Forgive me if I've missed something >>> obvious. >>> >>> This should not be a novel request, but I've not found it >>> answered. If >>> bioperl isn't the best way to do this, I'd be grateful to a >>> pointer to a >>> better way, especially if it includes an illuminating bit of code. >>> >>> The problem is to retrieve genomic sequences plus & minus some >>> offset >>> from a locus determined by HUGO keyword or GeneID. This would be a >>> common followup chore for some extra analysis from a gene expression >>> expt. Or maybe this is in the DBFetch routines, but I've missed the >>> sequence type to specify...? >>> >>> >>> TIA! 
> > -- > Cheers, Harry > Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com > <> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Tue Feb 14 13:40:31 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 14 Feb 2006 12:40:31 -0600 Subject: [Bioperl-l] FW: more on RemoteBlast.pm version 1.2 Message-ID: <000e01c63196$225159d0$15327e82@pyrimidine> Sorry, forgot to add that I didn't see the regex issue that you mentioned. It could be a perl-related issue. Try the fixes I mentioned and see what happens. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: Chris Fields [mailto:cjfields at uiuc.edu] > Sent: Tuesday, February 14, 2006 12:36 PM > To: 'gyang at plantbio.uga.edu' > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > It's a good habit to always add single quotes around words. The perl > interpreter may think a single bare word is a subroutine or perlfunc > called with no args so will try to find a subroutine named blastp(). My > debugger actually gives the error that the bare word blastp may conflict > with a future reserved word. Like you said, 'use strict' will point that > out. > > As for the regex, it should match all the blast programs at NCBI (blastp, > blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing > else passes through. > > So, if you are using the script below, there are several errors. The bare > words for $prog and $db need quotes, and the flags for you @params array > don't have a dash before them. 
I get this after adding quotes but before > adding the dashes to @params: > > C:\Perl\Scripts>test_blast.pl > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: > STACK: Error::throw > STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl- > live/Bio/Root/Root.pm:328 > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325 > STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl- > live/Bio/Tools/Run/RemoteBlast.pm:256 > STACK: C:\Perl\Scripts\test_blast.pl:15 > ----------------------------------------------------------- > > The last line indicates a problem with this line: > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > Changing the @params to this: > > my @params=( -prog=>$prog, > -data=>$db, > -expect=>$e_val, > -readmethod=>'SearchIO'); > > fixes it, and I get output as expected. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > > > -----Original Message----- > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > Sent: Tuesday, February 14, 2006 11:48 AM > > To: Chris Fields; bioperl-l at lists.open-bio.org > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > > > Hi, Chris, > > When I tried with the perldoc script, It did not work either. First it > > says $prog can not be bare word if I "use strict". I added quotes on the > > words, then it says the value for $prog does not match expression > > t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. The > script > > is shown below. Why is the expression "t?blast[pnx]"? 
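
[Editor's sketch] The pattern quoted in the error message can be tried on its own to see which program names it lets through. This is only an illustration in plain Perl: the anchors (^ and $) and the variable names are assumptions, and how RemoteBlast itself applies the pattern is not shown in the thread.

```perl
use strict;
use warnings;

# Assumed, anchored form of the pattern quoted in the error message.
my $valid_prog = qr/^t?blast[pnx]$/;

# Quoted strings, not barewords, as recommended in the reply.
my @candidates = ('blastp', 'blastn', 'blastx', 'tblastn', 'tblastx', 'megablast');
my @accepted   = grep { $_ =~ $valid_prog } @candidates;

print "accepted: @accepted\n";
```

With the anchors assumed here, the five NCBI program names listed in the reply pass and 'megablast' is rejected, which is consistent with the explanation that the check exists to keep anything else from getting through.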
> > > > #!/usr/bin/perl > > > > use Bio::SeqIO; > > use Bio::Seq; > > use Bio::Tools::Run::RemoteBlast; > > use Bio::SearchIO; > > > > > > my $prog=blastp; > > my $db=swissprot; > > my $e_val=1e-10; > > my @params=( prog=>$prog, > > data=>$db, > > expect=>$e_val, > > readmethod=>'SearchIO'); > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > > > my $v = 1; > > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' ); > > > > while (my $input = $str->next_seq()){ > > #Blast a sequence against a database: > > #Alternatively, you could pass in a file with many > > #sequences rather than loop through sequence one at a time > > #Remove the loop starting 'while (my $input = $str->next_seq())' > > #and swap the two lines below for an example of that. > > my $r = $factory->submit_blast($input); > > #my $r = $factory->submit_blast('amino.fa'); > > print STDERR "waiting..." if( $v > 0 ); > > while ( my @rids = $factory->each_rid ) { > > foreach my $rid ( @rids ) { > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) { > > if( $rc < 0 ) { > > $factory->remove_rid($rid); > > } > > print STDERR "." if ( $v > 0 ); > > sleep 5; > > } else { > > my $result = $rc->next_result(); > > #save the output > > my $filename = $result->query_name()."\.out"; > > $factory->save_output($filename); > > $factory->remove_rid($rid); > > print "\nQuery Name: ", $result->query_name(), "\n"; > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > print "\thit name is ", $hit->name, "\n"; > > while( my $hsp = $hit->next_hsp ) { > > print "\t\tscore is ", $hsp->score, "\n"; > > } > > } > > } > > } > > } > > } > > > > Thank you for your help! 
> > > > > > Guojun > > Department of Plant Biology > > University of Georgia > > > > ----- Original Message ----- > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > To: gyang at plantbio.uga.edu > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > Try two things: > > > > 1) Use a much simpler script, like the one in 'perldoc > > > Bio::Tools::Run::RemoteBlast'. If this fixes it, there's something > > wrong > > > with the logic in your subroutine: > > > > my $v = 1; > > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' ); > > > > while (my $input = $str->next_seq()){ > > > #Blast a sequence against a database: > > > #Alternatively, you could pass in a file with many > > > #sequences rather than loop through sequence one at a time > > > #Remove the loop starting 'while (my $input = $str->next_seq())' > > > #and swap the two lines below for an example of that. > > > my $r = $factory->submit_blast($input); > > > #my $r = $factory->submit_blast('amino.fa'); > > > print STDERR "waiting..." if( $v > 0 ); > > > while ( my @rids = $factory->each_rid ) { > > > foreach my $rid ( @rids ) { > > > my $rc = $factory->retrieve_blast($rid); > > > if( !ref($rc) ) { > > > if( $rc < 0 ) { > > > $factory->remove_rid($rid); > > > } > > > print STDERR "." if ( $v > 0 ); > > > sleep 5; > > > } else { > > > my $result = $rc->next_result(); > > > #save the output > > > my $filename = $result->query_name()."\.out"; > > > $factory->save_output($filename); > > > $factory->remove_rid($rid); > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > while ( my $hit = $result->next_hit ) { > > > next unless ( $v > 0); > > > print "\thit name is ", $hit->name, "\n"; > > > while( my $hsp = $hit->next_hsp ) { > > > print "\t\tscore is ", $hsp->score, "\n"; > > > } > > > } > > > } > > > } > > > } > > > } > > > > 2) Try the RemoteBlast from Bugzilla and see if that works. 
It > really > > > shouldn't make that much of a difference, but I noticed that the CVS > > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was > > > released; the Bugzilla version is based off CVS. > > > > Christopher Fields > > > Postdoctoral Researcher - Switzer Lab > > > Dept. of Biochemistry > > > University of Illinois Urbana-Champaign > > > > > -----Original Message----- > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > Sent: Monday, February 13, 2006 3:00 PM > > > > To: bioperl-l at lists.open-bio.org > > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > Thanks, Chris, > > > > I installed version 1.5.1 and replaced the blast.pm file with the > one > > from > > > > your bug report. The running version is 1.5 when I use the command > you > > > > sent me. But when I tried the script, it doesn't change much. My > > > > remoteblast code (portion) is here: > > > > > > sub search { > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN"; > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7; > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000; > > > > local > > > > > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}= > > > > 'no'; > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1'; > > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]", > > > > -id=>"query", > > > > -desc=>"new seq"); > > > > my $len=$query->length(); > > > > @db=('nr','htgs','wgs'); > > > > foreach my $db (@db) { > > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn', > > > > '-data' =>"$db", > > > > '-expect'=>"$E_value"); > > > > > > > > my $blast_report = $factory->submit_blast($query); > > > > > > my @rids = $factory->each_rid(); > > > > foreach my $rid ( @rids ) { > > > > print STDERR "$rid\n"; > > > > } > > > > # RID = Remote Blast ID (e.g: 
1017772174-16400-6638) > > > > print STDERR "waiting..."; > > > > sleep 60; > > > > > > foreach my $rid ( @rids ) { > > > > my $rc = $factory->retrieve_blast($rid); > > > > while (!ref($rc) ) { > > > > if( $rc < 0 ) { > > > > # retrieve_blast returns -1 on error > > > > $factory->remove_rid($rid); > > > > print "Error!\n"; > > > > send_error($email,$function,$seqname,$queryname[$ST]); > > > > die "Can't retrieve $rid"; > > > > } if ($rc==0) { # retrieve_blast returns 0 on 'job not > finished' > > > > sleep 60; > > > > $rc = $factory->retrieve_blast($rid); > > > > } > > > > } > > > > if (ref($rc)) { > > > > print STDERR "Done.\n"; > > > > while( my $result = $rc->next_result) { > > > > while( my $hit = $result->next_hit()) { > > > > $hit_name=$hit->name; > > > > $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/; > > > > $name=$1; > > > > @left_plus_start=(); > > > > @left_plus_end=(); > > > > @left_minus_start=(); > > > > @left_minus_end=(); > > > > @right_plus_start=(); > > > > @right_plus_end=(); > > > > @right_minus_start=(); > > > > @right_minus_end=(); > > > > > > if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) { > > > > while( my $hsp = $hit->next_hsp()) { > > > > ...... > > > > > > It was working quite well before around October laster year, but > > it has > > > > stopped since then, When a submission is sent via a webpage, the cgi > > > > starts to work and use a memory of ~20 Mb. Then it hangs there, > > finally > > > > the expected email is received but without real results although it > > does > > > > contain something from other parts of the script. Apparently the > > search > > > > sub did not return anything (I know there is something should be > > > > returned.). Is it also possible the format of the NCBI output for > each > > > > result has changed? 
> > > > Thank you, > > > > Guojun > > > > > > > > Department of Plant Biology > > > > University of Georgia > > > > > > > > > > ----- Original Message ----- > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > > How do you know two versions are installed (i.e. how are > you > > checking > > > > the > > > > > version)? Do you see have two complete bioperl distributions (in > > two > > > > > separate directories) or are you looking in modules? Here's the > way > > to > > > > > check the version (from the FAQ): > > > > > > perl -MBio::Root::Version -e 'print > > $Bio::Root::Version::VERSION,"\n"' > > > > > > If you have two full bioperl distributions on your computer, > > normally > > > > only > > > > > one will be in use unless you have explicitly set the environment > > > > variable > > > > > PERL5LIB. The PERL5LIB directories will be searched first before > > your > > > > > normal perl directory list (@INC) is searched. You MAY get some > > mixing > > > > > then, but only if perl can't find a particular module in the path > > > > designated > > > > > in PERL5LIB; then it will progress through the directories listed > in > > > > @INC. > > > > > This may happen if a module is unique to a particular release, but > > > > shouldn't > > > > > happen for the majority of modules, including RemoteBlast. You > can > > > > check > > > > > what @INC and PERL5LIB are set to by using 'perl -V'. @INC will > > differ > > > > > depending on your OS, perl build, etc. 
> > > > > > Regardless, if you follow the directions for installing bioperl > > for > > > > your > > > > > system ('perl Makefile.PL', 'make', 'make test', 'make install', > > unless > > > > you > > > > > explicitly change the installation directory when using 'perl > > > > Makefile.PL'), > > > > > then 'uninstalling' Bioperl shouldn't be a problem as it will > > install > > > > the > > > > > Bioperl distribution you downloaded over the old version in @INC. > > See > > > > this > > > > > page: > > > > > > http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL > > > > > > for more details. > > > > > > Christopher Fields > > > > > Postdoctoral Researcher - Switzer Lab > > > > > Dept. of Biochemistry > > > > > University of Illinois Urbana-Champaign > > > > > > > > -----Original Message----- > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > > > Sent: Monday, February 13, 2006 12:32 PM > > > > > > To: bioperl-l at lists.open-bio.org > > > > > > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > Hi, Chris, > > > > > > I do have different versions of bioperl on my Linux machine > (1.4. > > and > > > > > > 1.5.0), this may be the problem. Should I just install bioperl- > > 1.5.1 > > > > or I > > > > > > need to uninstall and remove the previous versions. I could not > > find > > > > any > > > > > > hint on uninstalling bioperl on linux. Could you please give me > > some > > > > > > suggestion? 
> > > > > > Thanks, > > > > > > Guojun > > > > > > > > Department of Plant Biology > > > > > > University of Georgia > > > > > > _____ > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > > > > > Sent: Mon, 13 Feb 2006 11:45:14 -0500 > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > > > version > > > > > > 1.28 > > > > > > > > > > > > If you're using RemoteBlast 1.28, then you've likely > > > > updated from CVS > > > > > > which isn't the latest fix. > > > > > > > > Make sure that you check the following: > > > > > > > > 1) Always post to the mailing list: > > > > > > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance . > > > > > > > > 2) You must have the complete bioperl-1.5.1 or bioperl-live > > (CVS) > > > > > > installed first. Perform a clean installation; do not upgrade > > only > > > > > > Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we > can't > > > > > > guarantee that mixing modules from old and new distributions > (1.4 > > and > > > > > > 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live > > > > > > installation will allow text output from BLAST v.2.2.12 to be > > saved > > > > and > > > > > > parsed; it will not parse the newest BLAST text output from NCBI > > > > (v2.2.13) > > > > > > but it should still save it. I believe as long as next_results() > > isn't > > > > > > called, it will work. > > > > > > > > 3) The bug fixes for the above issue with parsing BLAST > 2.2.13 > > > > text output > > > > > > are NOT in CVS; they haven't been cleared and checked in by > Roger > > Hall > > > > > > (who's now taking care of RemoteBlast) and the powers that be > > (Jason > > > > or > > > > > > whomever is in charge of Bio::SearchIO). 
They can be found in > > > > Bugzilla: > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > > > > > > > The fix in RemoteBlast in Bugzilla (#1935) is to allow the > > option > > > > of > > > > > > saving XML output, so isn't necessary if you don't plan on using > > this > > > > > > option. And, remember, they haven't been committed yet to CVS, > > which > > > > > > means that the final version will change to refle the new > version. > > > > > > > > > > Christopher Fields > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > Dept. of Biochemistry > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > _____ > > > > > > > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > > > > > Sent: Monday, February 13, 2006 9:26 AM > > > > > > To: Chris Fields > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > > > version > > > > > > 1.28 > > > > > > > > > > Hi, Chris > > > > > > > > Thanks for your suggestion, however, it doesn't seem to work > > for > > > > my cgi > > > > > > even after I replace both blast.pm and RemoteBlast.pm. I didn't > > even > > > > get > > > > > > any RID. Is there any suggestion? > > > > > > > > > > > > Guojun > > > > > > > > > > Guojun Yang > > > > > > Department of Plant Biology > > > > > > University of Georgia > > > > > > Tel: 706-542-1857 > > > > > > Fax: 706-542-1805 > > > > > > http://www.arches.uga.edu/~guojun > > > > > > _____ > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org > > > > > > Sent: Fri, 03 Feb 2006 16:07:29 -0500 > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > > > version > > > > > > 1.28 > > > > > > > > I would say give the new code a try, but realize that it > > hasn't > > > > been > > > > > > checked > > > > > > in (like I said below). 
I will try going over the modified > > > > > > Bio::SearchIO::blast again this weekend to see if there is > > anything I > > > > > > might > > > > > > have missed. The changed order in the header of BLAST text > output > > has > > > > me a > > > > > > bit worried that it might not catch everything, but it at least > > > > doesn't > > > > > > hang > > > > > > in the while() loop I described in the bug report below (bug > > #1934) > > > > and > > > > > > seems to process everything fine. > > > > > > > > If you want more stability in the code, you might consider > > > > changing over > > > > > > to > > > > > > XML output and parsing with Bio::SearchIO::blastxml. There are > > some > > > > > > changes > > > > > > in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate > > saving > > > > XML > > > > > > output, but I believe it parses everything regardless. If you > look > > > > back > > > > > > the > > > > > > last month or so there has been a bit of discussion here about > it. > > > > Jason > > > > > > describes a bit on how to set up RemoteBlast for XML: > > > > > > > > http://bioperl.org/news/2005/11/06/getting-blastxml-using- > > > > remoteblast/ > > > > > > > > Christopher Fields > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > Dept. of Biochemistry > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > -----Original Message----- > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > > > > Sent: Friday, February 03, 2006 1:45 PM > > > > > > > To: bioperl-l at bioperl.org > > > > > > > Subject: [Bioperl-l] more question regarding RemoteBlast.pm > > version > > > > 1.28 > > > > > > > > > > > > > > Hi, Everybody, > > > > > > > I see this post and am wondering if this is the reason for the > > > > > > > malfunctionning of my webserver. We set up a webserver named > > MAK, > > > > for > > > > > > MITE > > > > > > > sequence analysis. 
It was working very well until around > > November > > > > 2005, > > > > > > > when it stopped returning any result (the site is fine and > seems > > to > > > > be > > > > > > > doing sth after submission). In the CGI script, I used > > remoteblast > > > > (that > > > > > > > work was done in 2003) to do searches. I currently do not have > > > > access to > > > > > > > the server because I moved. Quite several people sent emails > to > > us > > > > about > > > > > > > its malfunctioning. Is there any suggestion on fixing the > > problem? > > > > > > Should > > > > > > > I simplily ask the remoteblast.pm be replaced with the new > > version? > > > > > > > Thanks a lot, > > > > > > > Guojun > > > > > > > > > > > > > > Department of Plant Biology > > > > > > > University of Georgia > > > > > > > Tel: 706-542-1857 > > > > > > > Fax: 706-542-1805 > > > > > > > http://www.arches.uga.edu/~guojun > > > > > > > _____ > > > > > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang > > Jian' > > > > > > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' > [mailto:bioperl- > > > > > > > l at bioperl.org] > > > > > > > Sent: Fri, 03 Feb 2006 10:45:23 -0500 > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > > > > > > > > > > > Like Nagesh says, try the latest RemoteBlast from bioperl-live > > CVS. > > > > It > > > > > > > will > > > > > > > work for saving text output. However, it will not parse > anything > > > > using > > > > > > > next_result (it will likely hang) and will not save XML > format. > > See > > > > > > these > > > > > > > bugs: > > > > > > > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > > > > > > > > > > > > > for explanations and possible fixes (changes to RemoteBlast > and > > > > > > > Bio::SearchIO::blast). 
Note that these haven't been checked in > > yet > > > > so > > > > > > are > > > > > > > still not included in bioperl-live; they may be further > modified > > > > before > > > > > > > committing to CVS. If you're not worried about XML, you could > > just > > > > try > > > > > > the > > > > > > > first fix, which is a change to SearchIO::blast. > > > > > > > > > > > > > > Nagesh, I remember you posting to the list a month ago using a > > > > script > > > > > > > which > > > > > > > had problems; the script you used saves the output but doesn't > > > > actually > > > > > > > parse it (i.e. you don't use next_result() to go through the > > data). > > > > Is > > > > > > the > > > > > > > version of BLAST in your text output 2.2.12 or 2.2.13? Have > you > > > > tried > > > > > > > parsing the output using "-readmethod => SearchIO" or "- > > readmethod > > > > => > > > > > > > blast" > > > > > > > using your version of RemoteBlast and method next_result()? > Like > > > > below > > > > > > > (from > > > > > > > perldoc): > > > > > > > > > > > > > > while ( my @rids = $factory->each_rid ) { > > > > > > > foreach my $rid ( @rids ) { > > > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > > > if( !ref($rc) ) { > > > > > > > if( $rc < 0 ) { > > > > > > > $factory->remove_rid($rid); > > > > > > > } > > > > > > > print STDERR "." 
if ( $v > 0 ); > > > > > > > sleep 5; > > > > > > > } else { # parsing > > > > > > > starts here > > > > > > > my $result = $rc->next_result(); # it should hang > > > > > > > here > > > > > > > #save the output > > > > > > > my $filename = $result->query_name()."\.out"; > > > > > > > $factory->save_output($filename); > > > > > > > $factory->remove_rid($rid); > > > > > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > > > > > while ( my $hit = $result->next_hit ) { > > > > > > > next unless ( $v > 0); > > > > > > > print "\thit name is ", $hit->name, "\n"; > > > > > > > while( my $hsp = $hit->next_hsp ) { > > > > > > > print "\t\tscore is ", $hsp->score, "\n"; > > > > > > > } > > > > > > > } > > > > > > > } > > > > > > > } > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > > > > > > > > My script hanged if I used next_result() in any way prior to > the > > > > fixes. > > > > > > I > > > > > > > want to see how many others are having the same issues with > > parsing > > > > > > using > > > > > > > the CVS version of bioperl-live. > > > > > > > > > > > > > > Christopher Fields > > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > > Dept. of Biochemistry > > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl- > l- > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > > > > > > > > Sent: Thursday, February 02, 2006 7:24 PM > > > > > > > > To: Huang Jian; bioperl-l > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > > > > > > > > > > > > > Hi Huang, > > > > > > > > Thanks for the message. The older version of RemoteBlast.pm > > works > > > > on > > > > > > the > > > > > > > > logic of checking the temporary file size to determine > whether > > the > > > > > > Blast > > > > > > > > results are ready. 
This condition is not getting satisfied > may > > be > > > > due > > > > > > to > > > > > > > > some changes brought about by NCBI. I had this problem > > recently > > > > and > > > > > > > > figured out that the solution was to use the latest version > > which > > > > has > > > > > > > > this problem fixed (does not use file size logic any more) > > which > > > > is > > > > > > not > > > > > > > > yet included in the BioPerl package. > > > > > > > > Cheers > > > > > > > > Nagesh > > > > > > > > > > > > > > > > Huang Jian wrote: > > > > > > > > > > > > > > > > > Dear Nagesh, > > > > > > > > > > > > > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 > > you > > > > send > > > > > > > > > me. Now it works perfectly!!! > > > > > > > > > > > > > > > > > > Thank you!! > > > > > > > > > > > > > > > > > > Huang > > > > > > > > > > > > > > > > > > ----- Original Message ----- From: "Nagesh Chakka" > > > > > > > > > > > > > > > > > > To: "Huang Jian" ; "bioperl-l" > > > > > > > > > > > > > > > > > > Sent: Friday, February 03, 2006 7:48 AM > > > > > > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the > net, > > so > > > > still > > > > > > > > > via email > > > > > > > > > > > > > > > > > > > > > > > > > > >> Hi Huang, > > > > > > > > >> I see that you are submitting a sequence for a remote > blast > > > > search. > > > > > > > Can > > > > > > > > >> you check if the RemoteBlast.pm being used is v 1.28 > > > > (2005/12/09). > > > > > > If > > > > > > > > >> not I have attached it with this email, try to replace it > > with > > > > the > > > > > > > old > > > > > > > > >> one which has a bug. > > > > > > > > >> Let me know if it works. 
> > > > > > > > >> Nagesh
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Bioperl-l mailing list
> > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l

From sdavis2 at mail.nih.gov Tue Feb 14 15:02:59 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 14 Feb 2006 15:02:59 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
Message-ID:

You can get the upstream regions for genes via the table browser at UCSC. If you want to do it yourself, just download their refGene table (as a tab-delimited text file) that includes the HUGO gene name. Then, use the method given by Brian to look up the locations. The genome just isn't THAT big to download and to store locally.
Note that most of the big sites (like NCBI, for example) impose restrictions on the number and timing of hits, so utilizing them for high-throughput analysis (like for gene expression studies) is not always feasible. I have found that having the data locally is almost always better.

Sean

On 2/14/06 12:15 PM, "Harry Mangalam" wrote:

> Hi Brian,
>
> Thanks very much for the pointers and the speed of your reply and apologies
> for the speed of mine.
>
> This looks good, but what I was looking for was a bioP approach for hooking to
> an API at NCBI or EBI so I could get this info and seqs from them. In this
> case, speed of retrieval is not critical and I'd rather not download the
> entirety of the sequences to a local disk to hack at them.
>
> I've determined a screen-scraping approach to get them and could script that,
> but I thought that bioP had a method for using NCBI's external APIs, tho it
> may be that my memory is faulty or the approach is no longer supported due to
> overload.
>
> Does NCBI make such APIs available anymore? I searched a bit for docs on them
> but couldn't find anything (unless it's buried in the NCBI toolkit, which I
> haven't started to excavate).
>
> Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
>
> Harry
>
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>>
>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
>> from its documentation:
>>
>>   use Bio::DB::Fasta;
>>
>>   # create database from directory of fasta files
>>   my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>
>>   # simple access (for those without Bioperl)
>>   my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>   my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>   my @ids = $db->ids;
>>   my $length = $db->length('CHROMOSOME_I');
>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>   my $header = $db->header('CHROMOSOME_I');
>>
>>   # Bioperl-style access
>>   my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>
>>   my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
>>   my $seq = $obj->seq;
>>   my $subseq = $obj->subseq(4_000_000 => 4_100_000);
>>
>> Do you already have the offsets?
>>
>> Brian O.
>>
>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote:
>>> Hi All,
>>>
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this. Forgive me if I've missed something
>>> obvious.
>>>
>>> This should not be a novel request, but I've not found it answered. If
>>> bioperl isn't the best way to do this, I'd be grateful for a pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>>
>>> The problem is to retrieve genomic sequences plus & minus some offset
>>> from a locus determined by HUGO keyword or GeneID. This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt. Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>>
>>>
>>> TIA!
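
For anyone following the local-data route described in this thread, here is a rough sketch of the coordinate step only: given a gene's strand and transcript start/end (for instance from one row of a downloaded gene table), work out the window of bases upstream of it. The sub name upstream_window, the %gene record and its field names are made up for illustration, and the code assumes 1-based inclusive coordinates, which may not match the table you download, so check before trusting the numbers. For minus-strand genes the window is returned with start > end, on the assumption that Bio::DB::Fasta then hands back the reverse strand, as the $revseq line in the documentation excerpt quoted in this thread suggests.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return (from, to) covering the $offset bases upstream of a transcript.
# Assumptions (not from any real table format): coordinates are 1-based
# and inclusive, and $tx_start <= $tx_end.
# For '-' strand genes the pair comes back as from > to, so that a call
# like $db->seq($chrom, $from => $to) can return the reverse strand.
# $chrom_len is optional; when given, the window is clamped to it.
sub upstream_window {
    my ($strand, $tx_start, $tx_end, $offset, $chrom_len) = @_;

    if ($strand eq '+') {
        return () if $tx_start <= 1;        # nothing upstream of base 1
        my $from = $tx_start - $offset;
        $from = 1 if $from < 1;             # clamp at the chromosome start
        return ($from, $tx_start - 1);
    }

    # Minus strand: "upstream" lies at higher coordinates than $tx_end.
    return () if defined $chrom_len && $tx_end >= $chrom_len;
    my $from = $tx_end + $offset;
    $from = $chrom_len if defined $chrom_len && $from > $chrom_len;
    return ($from, $tx_end + 1);
}

# Hypothetical record, as it might look after parsing one table row.
my %gene = (
    name     => 'GENE_X',
    chrom    => 'chr1',
    strand   => '-',
    tx_start => 5_000,
    tx_end   => 9_000,
);

my ($from, $to) = upstream_window(@gene{qw(strand tx_start tx_end)}, 1_000);
print "$gene{name}: $gene{chrom}:$from-$to\n";

# With a local FASTA index one would then fetch the sequence, e.g.:
#   my $db  = Bio::DB::Fasta->new('/path/to/fasta/files');
#   my $seq = $db->seq($gene{chrom}, $from => $to);
```

The window arithmetic is kept in its own sub so it can be checked without any sequence files; only the commented-out lines at the end need Bioperl and a local copy of the genome.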
From cjfields at uiuc.edu Tue Feb 14 15:32:42 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 14 Feb 2006 14:32:42 -0600 Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki, problems with bioperl-db Message-ID: <001201c631a5$ce7496f0$15327e82@pyrimidine> Hilmar, Good News: I've added a section to the bioperl wiki on installing bioperl-db in Windows: http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl -db Bad News: There's a new problem now. I updated from CVS yesterday; I walked through the steps and ran 'nmake test', with everything passing fine. However, load_seqdatabase.pl is extremely slow; it's loading a sequence every 5 minutes or so. I noticed (when using '-debug') that it is hanging up in Bio::DB::BioSQL::SpeciesAdaptor each time. If I create a database, load the biosql schema, and load sequences w/o loading taxonomy, the problem goes away. Here's the debugging output (I cut it off at the point it hangs up): ---------------------------------------------------------------------------- ------------------------- C:\Perl\src\bioperl\bioperl-db\scripts\biosql>load_seqdatabase.pl -driver mysql -namespace test -dbname biosql -dbuser root -dbpass ********** -format genbank -debug NP_252217.gpt Loading NP_252217.gpt ... 
attempting to load adaptor class for Bio::Seq::RichSeq attempting to load module Bio::DB::BioSQL::RichSeqAdaptor attempting to load adaptor class for Bio::Seq attempting to load module Bio::DB::BioSQL::SeqAdaptor instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor attempting to load adaptor class for Bio::Species attempting to load module Bio::DB::BioSQL::SpeciesAdaptor instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor attempting to load adaptor class for Bio::Annotation::Collection attempting to load module Bio::DB::BioSQL::CollectionAdaptor attempting to load adaptor class for Bio::Root::Root attempting to load module Bio::DB::BioSQL::RootAdaptor attempting to load adaptor class for Bio::Root::RootI attempting to load module Bio::DB::BioSQL::RootIAdaptor attempting to load module Bio::DB::BioSQL::RootAdaptor attempting to load adaptor class for Bio::AnnotationCollectionI attempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor attempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor attempting to load adaptor class for Bio::Annotation::TypeManager attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor no adaptor found for class Bio::Annotation::TypeManager attempting to load adaptor class for Bio::Annotation::SimpleValue attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor attempting to load adaptor class for Bio::Annotation::Reference attempting to load module Bio::DB::BioSQL::ReferenceAdaptor instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor attempting to load adaptor class for Bio::Annotation::Comment attempting to load module Bio::DB::BioSQL::CommentAdaptor instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor attempting to load adaptor class for Bio::Annotation::DBLink attempting to load module Bio::DB::BioSQL::DBLinkAdaptor instantiating adaptor 
class Bio::DB::BioSQL::DBLinkAdaptor attempting to load adaptor class for Bio::PrimarySeq attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor attempting to load adaptor class for Bio::SeqFeature::Generic attempting to load module Bio::DB::BioSQL::GenericAdaptor attempting to load adaptor class for Bio::SeqFeatureI attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor attempting to load adaptor class for Bio::Location::Simple attempting to load module Bio::DB::BioSQL::SimpleAdaptor attempting to load adaptor class for Bio::Location::Atomic attempting to load module Bio::DB::BioSQL::AtomicAdaptor attempting to load adaptor class for Bio::LocationI attempting to load module Bio::DB::BioSQL::LocationIAdaptor attempting to load module Bio::DB::BioSQL::LocationAdaptor instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager attempting to load adaptor class for BioNamespace attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor 
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ? BioNamespaceAdaptor: binding UK column 1 to "test" (namespace) preparing INSERT statement: INSERT INTO biodatabase (name, authority) VALUES (?, ?) BioNamespaceAdaptor::insert: binding column 1 to "test" (namespace) BioNamespaceAdaptor::insert: binding column 2 to "" (authority) attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id = ? SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class) SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid) ---------------------------------------------------------------------------- ------------------------- Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From osborne1 at optonline.net Tue Feb 14 16:32:42 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 14 Feb 2006 16:32:42 -0500 Subject: [Bioperl-l] game xml SeqIO In-Reply-To: <43F1043A.2000205@cornell.edu> Message-ID: Robert, It looks like you're right that this data isn't handled by SeqIO/game. If you'd like to add this then feel free to do it, the modified files or patches can be submitted to bugzilla.bioperl.org. If you take this on then please add a test or 2 to t/game.t as well. Yes, Bio::SeqFeature::Computation sounds right - does it match the data you're trying to parse? 
SeqFeature::Generic is the most commonly used, and it's flexible, but if another type of SeqFeature fits your data more precisely then that's the one you should use.

Brian O.

On 2/13/06 5:12 PM, "Robert Buels" wrote:

> Hi all,
>
> Currently, the SeqIO for doing GAME XML does not seem to support writing
> (or reading?) elements. Am I correct?
>
> If I am, are there any plans to add this functionality? Can I help / do it?
>
> If there are plans to add this, how would one distinguish SeqFeatures
> that should be rendered as from SeqFeatures
> that should be rendered as ? Would we do that with
> Bio::SeqFeature::Computation? I assume that a given Seq can have
> SeqFeatures of different types associated with it (I don't know, I'm a
> bioperl newb).
>
> Rob

From saldroubi at yahoo.com Tue Feb 14 22:54:42 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Tue, 14 Feb 2006 19:54:42 -0800 (PST)
Subject: [Bioperl-l] Error using Bio::Matrix::GenericMatrix
Message-ID: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com>

All,

I am trying to use the Bio::Matrix::GenericMatrix module. I simply put this line in my program:

use Bio::Matrix::GenericMatrix;

but I get the following error:

Can't locate Bio/Matrix/GenericMatrix.pm in @INC (@INC contains: /usr/lib/perl5/5.8.6/i586-linux-thread-multi /usr/lib/perl5/5.8.6 /usr/lib/perl5/site_perl/5.8.6/i586-linux-thread-multi /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.6/i586-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl .) at sf.pl line 18.
BEGIN failed--compilation aborted at sf.pl line 18.

Using find, I located a module called Generic.pm in this directory:

/usr/lib/perl5/site_perl/5.8.6/Bio/Matrix

Could someone tell me why it is not working? I have no trouble including these modules in my file:

use Bio::SeqIO;
use Bio::DB::GenBank;

Thank you.

Sincerely,
Sam Al-Droubi, M.S.
saldroubi at yahoo.com

From jason.stajich at duke.edu Tue Feb 14 23:10:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 14 Feb 2006 23:10:56 -0500
Subject: [Bioperl-l] Error using Bio::Matrix::GenericMatrix
In-Reply-To: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com>
References: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com>
Message-ID:

Try:

use Bio::Matrix::Generic;

Apparently I screwed up the SYNOPSIS; I fixed that just now.

-jason

On Feb 14, 2006, at 10:54 PM, Sam Al-Droubi wrote:

> All,
>
> I am trying to use Bio::Matrix::GenericMatrix module.
> I simply put this line in my program:
> use Bio::Matrix::GenericMatrix;
>
> but I get the following error:
>
> Can't locate Bio/Matrix/GenericMatrix.pm in @INC (@INC contains:
> /usr/lib/perl5/5.8.6/i586-linux-thread-multi /usr/lib/perl5/5.8.6
> /usr/lib/perl5/site_perl/5.8.6/i586-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl
> /usr/lib/perl5/vendor_perl/5.8.6/i586-linux-thread-multi
> /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl .) at sf.pl line 18.
> BEGIN failed--compilation aborted at sf.pl line 18.
>
> I found this module using find which is called Generic.pm in this
> directory
> /usr/lib/perl5/site_perl/5.8.6/Bio/Matrix
>
> Could someone tell me why it is not working. I have no trouble
> including these modules in my file.
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> Thank you.
>
>
>
> Sincerely,
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From daniel.lang at biologie.uni-freiburg.de Wed Feb 15 05:35:40 2006
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Wed, 15 Feb 2006 11:35:40 +0100
Subject: [Bioperl-l] distmat matrix
Message-ID: <43F303FC.9000806@biologie.uni-freiburg.de>

Hi,

I need to go through an uncorrected distmat matrix (EMBOSS, run locally) to filter sequences from an MSA. I had a look around and didn't find an obvious candidate. Before I start writing something of my own...

Is there a bioperl parser for reading distmat matrices, or can I trick the Bio::MapIO parsers for scoring or PHYLIP into doing so?

Of course, if anyone knows a tool to generate an uncorrected distance matrix of protein MSAs that is supported by bioperl, that would also be OK for me :)

I have no experience with the Pise (Bio::Tools::Run::PiseApplication::distmat) stuff, but as I understand it, it only executes the application on a remote web server? Or can I solve my task with Pise?

Thanks in advance!

Daniel

From praveecbt at yahoo.co.in Wed Feb 15 03:57:44 2006
From: praveecbt at yahoo.co.in (Praveen Raj)
Date: Wed, 15 Feb 2006 08:57:44 +0000 (GMT)
Subject: [Bioperl-l] Help
Message-ID: <20060215085744.14911.qmail@web8711.mail.in.yahoo.com>

Dear Peter Schattner Sir,

I have a problem with the profile_align() method of the Clustalw object. My code looks like this:

......
12 @seq_array=($seqobj1,$seqobj2,$seqobj3);
13 $seq_array_ref=\@seq_array;
14 $aln=$factory->align($seq_array_ref);
15 print $out $aln; # this works fine
16 $sen = Bio::Seq->new(-display_id => '>gi|userdata|',
17 -seq => "MTKKPGGPGKNRA....",
18 -format => "fasta");
19 $aln=$factory->profile_align($aln,$sen); #problem here
20 print $out1 $aln;

I get the following error at line 19:

ERROR: Could not open sequence file (-profile)
No. of seqs. read = -1. No alignment!

How can I solve this problem? I hope you can provide a solution.

Thanking you,
Praveen Raj,
Project Student,
NIV, India.

From jason.stajich at duke.edu Wed Feb 15 08:19:41 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 15 Feb 2006 08:19:41 -0500
Subject: [Bioperl-l] distmat matrix
In-Reply-To: <43F303FC.9000806@biologie.uni-freiburg.de>
References: <43F303FC.9000806@biologie.uni-freiburg.de>
Message-ID: <550C115C-1216-4285-8BE5-EC217C3F1BE9@duke.edu>

Bioperl can parse PHYLIP distance matrices; see Bio::Matrix::IO. I didn't write an EMBOSS distmat result parser, but that would be nice to have (first check that EMBOSS doesn't already allow output in PHYLIP format).

There are pure-perl distance matrix calculations for an MSA: Bio::Align::DNAStatistics for DNA sequences and Bio::Align::ProteinStatistics for protein.

There is some initial discussion on the website, but it could certainly use some more detail:

http://bioperl.org/wiki/Phylogenetics
http://bioperl.org/wiki/HOWTO:Trees
http://bioperl.org/wiki/Module:Bio::Align::DNAStatistics

-jason

On Feb 15, 2006, at 5:35 AM, Daniel Lang wrote:

> Hi,
>
> I need to go through a uncorrected distmat matrix (EMBOSS, run
> locally)
> to filter sequences from an MSA.
> I had a look around and didn't find an obvious candidate. Before I
> start
> writing something my own...
> Is there a bioperl parser for reading distmat matrices or can I trick
> the Bio::MapIO parsers for scoring or PHYLIP in doing so?
> If anyone knows of course a tool to generate an uncorrected distance
> matrix of protein MSAs that is supported by bioperl, would be also OK
> for me:)
>
> I have no experience with the Pise
> (Bio::Tools::Run::PiseApplication::distmat) stuff, but as I understand
> it it's only to execute the application on a remote web server? Or
> can I
> solve my task with Pise?
>
> Thanks in advance!
>
> Daniel

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From michael.watson at bbsrc.ac.uk Wed Feb 15 10:06:29 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed, 15 Feb 2006 15:06:29 -0000
Subject: [Bioperl-l] Website issues
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

The links on the left of bioperl.org don't work in konqueror 3.1.1, which is a real b*gger because that's the browser I use on Linux... :-S

Mick

From rmb32 at cornell.edu Wed Feb 15 11:01:07 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Wed, 15 Feb 2006 11:01:07 -0500
Subject: [Bioperl-l] Bio::Tools::GFF parsing error
Message-ID: <43F35043.7070705@cornell.edu>

Hi all,

I'm parsing a GFF2 file with Bio::Tools::GFF (I would be using FeatureIO, except it purports not to support gff 2), and the file looks like:

##gff-version 2
##date 2006-02-13
##sequence-region C01HBa0088L02.seq 1 120525
C01HBa0088L02 RepeatMasker similarity 3537 4267 3.3 - . Target "Motif:bac_end_repeat_family_345" 1 740
C01HBa0088L02 RepeatMasker similarity 4172 4279 2.9 + . Target "Motif:HRSiTERT00100141" 1 104
C01HBa0088L02 RepeatMasker similarity 4267 4323 0.0 - . Target "Motif:k_29" 150 206
C01HBa0088L02 RepeatMasker similarity 4322 4492 26.6 + . Target "Motif:PRSiTERT00300001" 1960 2129
C01HBa0088L02 RepeatMasker similarity 4557 5124 29.5 + . Target "Motif:PRSiTERT00300001" 2142 2711

Notice the score column is padded with spaces.

Bio::Tools::GFF does not like this, and says that ' 3.3' is not a valid score. My question is, who is wrong here, my input file or Bio::Tools::GFF? Should Bio::Tools::GFF be able to read this file?

Rob

--
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY 14853
Tel: 607-255-2360
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From jason.stajich at duke.edu Wed Feb 15 11:12:59 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 15 Feb 2006 11:12:59 -0500
Subject: [Bioperl-l] Website issues
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu>

Okay, I guess someone will have to look into that. Can you browse Wikipedia normally? We're just using their software. Maybe it is a JavaScript problem?

Please send a system bug request to our helpdesk: support at open-bio.org

-jason

On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote:

> Hi
>
> The links on the left of bioperl.org don't work in konqueror 3.1.1,
> which is a real b*gger because that's the browser I use on
> Linux... :-S
>
> Mick

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From Marc.Logghe at DEVGEN.com Wed Feb 15 11:13:16 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 15 Feb 2006 17:13:16 +0100
Subject: [Bioperl-l] Bio::Tools::GFF parsing error
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B2E@ANTARESIA.be.devgen.com>

Hi Rob,

According to the GFF Specifications Document @ http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml :

All of the above described fields should be separated by TAB characters ('\t'). All values of the mandatory fields should not include whitespace (i.e. the strings for , and fields).

Reading that, I am afraid you have to pre-process your gff input file ...

HTH,
Marc

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> Robert Buels
> Sent: Wednesday, February 15, 2006 5:01 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Bio::Tools::GFF parsing error
>
> Hi all,
>
> I'm parsing a GFF2 file with Bio::Tools::GFF (I would be
> using FeatureIO, except it purports not to support gff 2),
> and the file looks
> like:
>
> ##gff-version 2
> ##date 2006-02-13
> ##sequence-region C01HBa0088L02.seq 1 120525
> C01HBa0088L02 RepeatMasker similarity 3537 4267
> 3.3
> - . Target "Motif:bac_end_repeat_family_345" 1 740
> C01HBa0088L02 RepeatMasker similarity 4172 4279
> 2.9
> + . Target "Motif:HRSiTERT00100141" 1 104
> C01HBa0088L02 RepeatMasker similarity 4267 4323
> 0.0
> - . Target "Motif:k_29" 150 206
> C01HBa0088L02 RepeatMasker similarity 4322 4492
> 26.6
> + . Target "Motif:PRSiTERT00300001" 1960 2129
> C01HBa0088L02 RepeatMasker similarity 4557 5124
> 29.5
> + . Target "Motif:PRSiTERT00300001" 2142 2711
>
> Notice the score column is padded with spaces.
> > Bio::Tools::GFF does not like this, and says that ' 3.3' is > not a valid score. My question is, who is wrong here, my > input file or Bio::Tools::GFF? Should Bio::Tools::GFF be > able to read this file? > > Rob > > -- > Robert Buels > SGN Bioinformatics Analyst > 252A Emerson Hall, Cornell University > Ithaca, NY 14853 > Tel: 607-255-2360 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jason.stajich at duke.edu Wed Feb 15 11:29:14 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Wed, 15 Feb 2006 11:29:14 -0500 Subject: [Bioperl-l] Website issues In-Reply-To: <93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu> References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk> <93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu> Message-ID: <82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu> I can replicate it with konqueror 3.1.4-9.legacy RedHat (from KDE 3.1.4-9) But it works fine for me on 3.2.2-8.FC2 .... So I'm going to go with this being a konqueror bug, sorry to say, but feel free to still report the bug to the helpdesk. -jason On Feb 15, 2006, at 11:12 AM, Jason Stajich wrote: > Okay I guess someone will have to look into that. Can you normally > browse on wikipedia, we're just using their software, maybe it is a > javascript problem? > > Please send a system bug request to our helpdesk: > support at open-bio.org > > -jason > On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote: > >> Hi >> >> The links on the left of bioperl.org don't work in konqueror 3.1.1, >> which is a real b*gger because that's the browser I use on >> Linux... 
:-S >> >> Mick >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Wed Feb 15 11:57:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 15 Feb 2006 10:57:13 -0600 Subject: [Bioperl-l] Added 'Installing Bioperl for Unix' to wiki Message-ID: <000301c63250$de506120$15327e82@pyrimidine> I added an Installing Bioperl for Unix page, http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix which is a quick redo of the INSTALL text file in the bioperl distribution. It's in workable shape but needs links revisions etc. Please leave any comments on the discussion pages here. http://www.bioperl.org/wiki/Talk:Getting_BioPerl http://www.bioperl.org/wiki/Talk:Installing_Bioperl_for_Unix Thanks to Brian for helping out with the Windows install doc! Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From khoueiry at ibdm.univ-mrs.fr Wed Feb 15 12:23:21 2006 From: khoueiry at ibdm.univ-mrs.fr (khoueiry) Date: Wed, 15 Feb 2006 18:23:21 +0100 Subject: [Bioperl-l] Website issues In-Reply-To: <82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu> References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk> <93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu> <82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu> Message-ID: <1140024202.2689.45.camel@localhost> An embedded and charset-unspecified text was scrubbed... 
Name: not available Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060215/a69052f0/attachment.ksh From heikki at sanbi.ac.za Wed Feb 15 13:55:07 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 15 Feb 2006 20:55:07 +0200 Subject: [Bioperl-l] Website issues In-Reply-To: <1140024202.2689.45.camel@localhost> References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk> <82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu> <1140024202.2689.45.camel@localhost> Message-ID: <200602152055.07667.heikki@sanbi.ac.za> Konqueror 3.5.1. has no problems, either. Clearly, older konqueror had a bug that has been permanently fixed. Michael, time for you to upgrade. -Heikki On Wednesday 15 February 2006 19:23, khoueiry wrote: > I test it on konqueror 3.4.2 and it works well !!! > > On Wed, 2006-02-15 at 11:29 -0500, Jason Stajich wrote: > > I can replicate it with konqueror 3.1.4-9.legacy RedHat (from KDE > > 3.1.4-9) > > > > But it works fine for me on 3.2.2-8.FC2 .... > > > > So I'm going to go with this being a konqueror bug, sorry to say, but > > feel free to still report the bug to the helpdesk. > > > > -jason > > > > On Feb 15, 2006, at 11:12 AM, Jason Stajich wrote: > > > Okay I guess someone will have to look into that. Can you normally > > > browse on wikipedia, we're just using their software, maybe it is a > > > javascript problem? > > > > > > Please send a system bug request to our helpdesk: > > > support at open-bio.org > > > > > > -jason > > > > > > On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote: > > >> Hi > > >> > > >> The links on the left of bioperl.org don't work in konqueror 3.1.1, > > >> which is a real b*gger because that's the browser I use on > > >> Linux... 
:-S > > >> > > >> Mick > > >> > > >> _______________________________________________ > > >> Bioperl-l mailing list > > >> Bioperl-l at lists.open-bio.org > > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > -- > > > Jason Stajich > > > Duke University > > > http://www.duke.edu/~jes12 > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > > Jason Stajich > > Duke University > > http://www.duke.edu/~jes12 > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From gyang at plantbio.uga.edu Wed Feb 15 14:39:41 2006 From: gyang at plantbio.uga.edu (Guojun Yang) Date: Wed, 15 Feb 2006 14:39:41 -0500 Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm version 1.28 Message-ID: <20060215143941.54e91487@dogwood.plantbio.uga.edu> Hi, Chris, Finally the remoteblast test script works for the amino.fa query. but when I try a nucleic acid sequence (see below), Error occurs: " waiting........ 
------------- EXCEPTION -------------
MSG: no data for midline Features flanking this part of subject sequence:
STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
STACK toplevel remoteblast_test:40
"

The query sequence is:

CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG

The script (basically the same as the remoteblast test; I only changed the database to 'nr', the program to 'blastn', and the filename to 'ost3'):

#!/usr/bin/perl

use Bio::SeqIO;
use Bio::Seq;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use strict;

my $prog='blastn';
my $db='nr';
my $e_val=1e-10;
my @params=( -prog=>$prog,
             -data=>$db,
             -expect=>$e_val,
             -readmethod=>'SearchIO');
my $factory=Bio::Tools::Run::RemoteBlast->new(@params);

my $v = 1;

my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );

while (my $input = $str->next_seq()){
    #Blast a sequence against a database:
    #Alternatively, you could pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.
    my $r = $factory->submit_blast($input);
    #my $r = $factory->submit_blast('amino.fa');
    print STDERR "waiting..." if( $v > 0 );
    while ( my @rids = $factory->each_rid ) {
        foreach my $rid ( @rids ) {
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
                if( $rc < 0 ) {
                    $factory->remove_rid($rid);
                }
                print STDERR "." if ( $v > 0 );
                sleep 5;
            } else {
                my $result = $rc->next_result();
                #save the output
                my $filename = $result->query_name()."\.out";
                $factory->save_output($filename);
                $factory->remove_rid($rid);
                print "\nQuery Name: ", $result->query_name(), "\n";
                while ( my $hit = $result->next_hit ) {
                    next unless ( $v > 0);
                    print "\thit name is ", $hit->name, "\n";
                    while( my $hsp = $hit->next_hsp ) {
                        print "\t\tscore is ", $hsp->score, "\n";
                    }
                }
            }
        }
    }
}

Do you think something in the NCBI output format might still be the cause?

Thank you,
Guojun

Guojun Yang
Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun

----- Original Message -----
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2

> Sorry, forgot to add that I didn't see the regex issue that you mentioned.
> It could be a perl-related issue. Try the fixes I mentioned and see what
> happens.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > Sent: Tuesday, February 14, 2006 12:36 PM
> > To: 'gyang at plantbio.uga.edu'
> > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >
> > It's a good habit to always add single quotes around words. The perl
> > interpreter may think a single bare word is a subroutine or perlfunc
> > called with no args so will try to find a subroutine named blastp(). My
> > debugger actually gives the error that the bare word blastp may conflict
> > with a future reserved word. Like you said, 'use strict' will point that
> > out.
> >
> > As for the regex, it should match all the blast programs at NCBI (blastp,
> > blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
> > else passes through.
> > > > So, if you are using the script below, there are several errors. The bare > > words for $prog and $db need quotes, and the flags for you @params array > > don't have a dash before them. I get this after adding quotes but before > > adding the dashes to @params: > > > > C:\Perl\Scripts>test_blast.pl > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: > > STACK: Error::throw > > STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl- > > live/Bio/Root/Root.pm:328 > > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > > C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325 > > STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl- > > live/Bio/Tools/Run/RemoteBlast.pm:256 > > STACK: C:\Perl\Scripts\test_blast.pl:15 > > ----------------------------------------------------------- > > > > The last line indicates a problem with this line: > > > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > > > Changing the @params to this: > > > > my @params=( -prog=>$prog, > > -data=>$db, > > -expect=>$e_val, > > -readmethod=>'SearchIO'); > > > > fixes it, and I get output as expected. > > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > -----Original Message----- > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > > Sent: Tuesday, February 14, 2006 11:48 AM > > > To: Chris Fields; bioperl-l at lists.open-bio.org > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > > > > > Hi, Chris, > > > When I tried with the perldoc script, It did not work either. First it > > > says $prog can not be bare word if I "use strict". I added quotes on the > > > words, then it says the value for $prog does not match expression > > > t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. The > > script > > > is shown below. Why is the expression "t?blast[pnx]"? 
> > > > > > #!/usr/bin/perl > > > > > > use Bio::SeqIO; > > > use Bio::Seq; > > > use Bio::Tools::Run::RemoteBlast; > > > use Bio::SearchIO; > > > > > > > > > my $prog=blastp; > > > my $db=swissprot; > > > my $e_val=1e-10; > > > my @params=( prog=>$prog, > > > data=>$db, > > > expect=>$e_val, > > > readmethod=>'SearchIO'); > > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > my $v = 1; > > > > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' ); > > > > > > while (my $input = $str->next_seq()){ > > > #Blast a sequence against a database: > > > #Alternatively, you could pass in a file with many > > > #sequences rather than loop through sequence one at a time > > > #Remove the loop starting 'while (my $input = $str->next_seq())' > > > #and swap the two lines below for an example of that. > > > my $r = $factory->submit_blast($input); > > > #my $r = $factory->submit_blast('amino.fa'); > > > print STDERR "waiting..." if( $v > 0 ); > > > while ( my @rids = $factory->each_rid ) { > > > foreach my $rid ( @rids ) { > > > my $rc = $factory->retrieve_blast($rid); > > > if( !ref($rc) ) { > > > if( $rc < 0 ) { > > > $factory->remove_rid($rid); > > > } > > > print STDERR "." if ( $v > 0 ); > > > sleep 5; > > > } else { > > > my $result = $rc->next_result(); > > > #save the output > > > my $filename = $result->query_name()."\.out"; > > > $factory->save_output($filename); > > > $factory->remove_rid($rid); > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > while ( my $hit = $result->next_hit ) { > > > next unless ( $v > 0); > > > print "\thit name is ", $hit->name, "\n"; > > > while( my $hsp = $hit->next_hsp ) { > > > print "\t\tscore is ", $hsp->score, "\n"; > > > } > > > } > > > } > > > } > > > } > > > } > > > > > > Thank you for your help! 
> > > > > > > > > Guojun > > > Department of Plant Biology > > > University of Georgia > > > > > > ----- Original Message ----- > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > To: gyang at plantbio.uga.edu > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > > > Try two things: > > > > > 1) Use a much simpler script, like the one in 'perldoc > > > > Bio::Tools::Run::RemoteBlast'. If this fixes it, there's something > > > wrong > > > > with the logic in your subroutine: > > > > > my $v = 1; > > > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' ); > > > > > while (my $input = $str->next_seq()){ > > > > #Blast a sequence against a database: > > > > #Alternatively, you could pass in a file with many > > > > #sequences rather than loop through sequence one at a time > > > > #Remove the loop starting 'while (my $input = $str->next_seq())' > > > > #and swap the two lines below for an example of that. > > > > my $r = $factory->submit_blast($input); > > > > #my $r = $factory->submit_blast('amino.fa'); > > > > print STDERR "waiting..." if( $v > 0 ); > > > > while ( my @rids = $factory->each_rid ) { > > > > foreach my $rid ( @rids ) { > > > > my $rc = $factory->retrieve_blast($rid); > > > > if( !ref($rc) ) { > > > > if( $rc < 0 ) { > > > > $factory->remove_rid($rid); > > > > } > > > > print STDERR "." 
if ( $v > 0 ); > > > > sleep 5; > > > > } else { > > > > my $result = $rc->next_result(); > > > > #save the output > > > > my $filename = $result->query_name()."\.out"; > > > > $factory->save_output($filename); > > > > $factory->remove_rid($rid); > > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > > while ( my $hit = $result->next_hit ) { > > > > next unless ( $v > 0); > > > > print "\thit name is ", $hit->name, "\n"; > > > > while( my $hsp = $hit->next_hsp ) { > > > > print "\t\tscore is ", $hsp->score, "\n"; > > > > } > > > > } > > > > } > > > > } > > > > } > > > > } > > > > > 2) Try the RemoteBlast from Bugzilla and see if that works. It > > really > > > > shouldn't make that much of a difference, but I noticed that the CVS > > > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was > > > > released; the Bugzilla version is based off CVS. > > > > > Christopher Fields > > > > Postdoctoral Researcher - Switzer Lab > > > > Dept. of Biochemistry > > > > University of Illinois Urbana-Champaign > > > > > > -----Original Message----- > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > > Sent: Monday, February 13, 2006 3:00 PM > > > > > To: bioperl-l at lists.open-bio.org > > > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > Thanks, Chris, > > > > > I installed version 1.5.1 and replaced the blast.pm file with the > > one > > > from > > > > > your bug report. The running version is 1.5 when I use the command > > you > > > > > sent me. But when I tried the script, it doesn't change much. 
My > > > > > remoteblast code (portion) is here: > > > > > > > sub search { > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN"; > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7; > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000; > > > > > local > > > > > > > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}= > > > > > 'no'; > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1'; > > > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]", > > > > > -id=>"query", > > > > > -desc=>"new seq"); > > > > > my $len=$query->length(); > > > > > @db=('nr','htgs','wgs'); > > > > > foreach my $db (@db) { > > > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn', > > > > > '-data' =>"$db", > > > > > > '-expect'=>"$E_value"); > > > > > > > > > my $blast_report = $factory->submit_blast($query); > > > > > > > my @rids = $factory->each_rid(); > > > > > foreach my $rid ( @rids ) { > > > > > print STDERR "$rid\n"; > > > > > } > > > > > # RID = Remote Blast ID (e.g: 1017772174-16400-6638) > > > > > print STDERR "waiting..."; > > > > > sleep 60; > > > > > > > foreach my $rid ( @rids ) { > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > while (!ref($rc) ) { > > > > > if( $rc < 0 ) { > > > > > # retrieve_blast returns -1 on error > > > > > $factory->remove_rid($rid); > > > > > print "Error!\n"; > > > > > send_error($email,$function,$seqname,$queryname[$ST]); > > > > > die "Can't retrieve $rid"; > > > > > } if ($rc==0) { # retrieve_blast returns 0 on 'job not > > finished' > > > > > sleep 60; > > > > > $rc = $factory->retrieve_blast($rid); > > > > > } > > > > > } > > > > > if (ref($rc)) { > > > > > print STDERR "Done.\n"; > > > > > while( my $result = $rc->next_result) { > > > > > while( my $hit = $result->next_hit()) { > > > > > $hit_name=$hit->name; > > > > > $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/; > > > > > $name=$1; > > > > > @left_plus_start=(); > > > > 
> @left_plus_end=(); > > > > > @left_minus_start=(); > > > > > @left_minus_end=(); > > > > > @right_plus_start=(); > > > > > @right_plus_end=(); > > > > > @right_minus_start=(); > > > > > @right_minus_end=(); > > > > > > > if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) { > > > > > while( my $hsp = $hit->next_hsp()) { > > > > > ...... > > > > > > > It was working quite well before around October laster year, but > > > it has > > > > > stopped since then, When a submission is sent via a webpage, the cgi > > > > > starts to work and use a memory of ~20 Mb. Then it hangs there, > > > finally > > > > > the expected email is received but without real results although it > > > does > > > > > contain something from other parts of the script. Apparently the > > > search > > > > > sub did not return anything (I know there is something should be > > > > > returned.). Is it also possible the format of the NCBI output for > > each > > > > > result has changed? > > > > > Thank you, > > > > > Guojun > > > > > > > > > Department of Plant Biology > > > > > University of Georgia > > > > > > > > > > > ----- Original Message ----- > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > > > How do you know two versions are installed (i.e. how are > > you > > > checking > > > > > the > > > > > > version)? Do you see have two complete bioperl distributions (in > > > two > > > > > > separate directories) or are you looking in modules? Here's the > > way > > > to > > > > > > check the version (from the FAQ): > > > > > > > perl -MBio::Root::Version -e 'print > > > $Bio::Root::Version::VERSION,"\n"' > > > > > > > If you have two full bioperl distributions on your computer, > > > normally > > > > > only > > > > > > one will be in use unless you have explicitly set the environment > > > > > variable > > > > > > PERL5LIB. 
The PERL5LIB directories will be searched first before > > > your > > > > > > normal perl directory list (@INC) is searched. You MAY get some > > > mixing > > > > > > then, but only if perl can't find a particular module in the path > > > > > designated > > > > > > in PERL5LIB; then it will progress through the directories listed > > in > > > > > @INC. > > > > > > This may happen if a module is unique to a particular release, but > > > > > shouldn't > > > > > > happen for the majority of modules, including RemoteBlast. You > > can > > > > > check > > > > > > what @INC and PERL5LIB are set to by using 'perl -V'. @INC will > > > differ > > > > > > depending on your OS, perl build, etc. > > > > > > > Regardless, if you follow the directions for installing bioperl > > > for > > > > > your > > > > > > system ('perl Makefile.PL', 'make', 'make test', 'make install', > > > unless > > > > > you > > > > > > explicitly change the installation directory when using 'perl > > > > > Makefile.PL'), > > > > > > then 'uninstalling' Bioperl shouldn't be a problem as it will > > > install > > > > > the > > > > > > Bioperl distribution you downloaded over the old version in @INC. > > > See > > > > > this > > > > > > page: > > > > > > > http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL > > > > > > > for more details. > > > > > > > Christopher Fields > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > Dept. of Biochemistry > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > -----Original Message----- > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > > > > Sent: Monday, February 13, 2006 12:32 PM > > > > > > > To: bioperl-l at lists.open-bio.org > > > > > > > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > > Hi, Chris, > > > > > > > I do have different versions of bioperl on my Linux machine > > (1.4. 
> > > and > > > > > > > 1.5.0), this may be the problem. Should I just install bioperl- > > > 1.5.1 > > > > > or I > > > > > > > need to uninstall and remove the previous versions. I could not > > > find > > > > > any > > > > > > > hint on uninstalling bioperl on linux. Could you please give me > > > some > > > > > > > suggestion? > > > > > > > Thanks, > > > > > > > Guojun > > > > > > > > > Department of Plant Biology > > > > > > > University of Georgia > > > > > > > _____ > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > > > > > > Sent: Mon, 13 Feb 2006 11:45:14 -0500 > > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > > > > version > > > > > > > 1.28 > > > > > > > > > > > > > If you're using RemoteBlast 1.28, then you've likely > > > > > updated from CVS > > > > > > > which isn't the latest fix. > > > > > > > > > Make sure that you check the following: > > > > > > > > > 1) Always post to the mailing list: > > > > > > > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance . > > > > > > > > > 2) You must have the complete bioperl-1.5.1 or bioperl-live > > > (CVS) > > > > > > > installed first. Perform a clean installation; do not upgrade > > > only > > > > > > > Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we > > can't > > > > > > > guarantee that mixing modules from old and new distributions > > (1.4 > > > and > > > > > > > 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live > > > > > > > installation will allow text output from BLAST v.2.2.12 to be > > > saved > > > > > and > > > > > > > parsed; it will not parse the newest BLAST text output from NCBI > > > > > (v2.2.13) > > > > > > > but it should still save it. I believe as long as next_results() > > > isn't > > > > > > > called, it will work. 
> > > > > > > > > 3) The bug fixes for the above issue with parsing BLAST > > 2.2.13 > > > > > text output > > > > > > > are NOT in CVS; they haven't been cleared and checked in by > > Roger > > > Hall > > > > > > > (who's now taking care of RemoteBlast) and the powers that be > > > (Jason > > > > > or > > > > > > > whomever is in charge of Bio::SearchIO). They can be found in > > > > > Bugzilla: > > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > > > > > > > > The fix in RemoteBlast in Bugzilla (#1935) is to allow the > > > option > > > > > of > > > > > > > saving XML output, so isn't necessary if you don't plan on using > > > this > > > > > > > option. And, remember, they haven't been committed yet to CVS, > > > which > > > > > > > means that the final version will change to refle the new > > version. > > > > > > > > > > > Christopher Fields > > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > > Dept. of Biochemistry > > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > > _____ > > > > > > > > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > > > > > > Sent: Monday, February 13, 2006 9:26 AM > > > > > > > To: Chris Fields > > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > > > > version > > > > > > > 1.28 > > > > > > > > > > > Hi, Chris > > > > > > > > > Thanks for your suggestion, however, it doesn't seem to work > > > for > > > > > my cgi > > > > > > > even after I replace both blast.pm and RemoteBlast.pm. I didn't > > > even > > > > > get > > > > > > > any RID. Is there any suggestion? 
> > > > > > > > > > > > > Guojun > > > > > > > > > > > Guojun Yang > > > > > > > Department of Plant Biology > > > > > > > University of Georgia > > > > > > > Tel: 706-542-1857 > > > > > > > Fax: 706-542-1805 > > > > > > > http://www.arches.uga.edu/~guojun > > > > > > > _____ > > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org > > > > > > > Sent: Fri, 03 Feb 2006 16:07:29 -0500 > > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > > > > version > > > > > > > 1.28 > > > > > > > > > I would say give the new code a try, but realize that it > > > hasn't > > > > > been > > > > > > > checked > > > > > > > in (like I said below). I will try going over the modified > > > > > > > Bio::SearchIO::blast again this weekend to see if there is > > > anything I > > > > > > > might > > > > > > > have missed. The changed order in the header of BLAST text > > output > > > has > > > > > me a > > > > > > > bit worried that it might not catch everything, but it at least > > > > > doesn't > > > > > > > hang > > > > > > > in the while() loop I described in the bug report below (bug > > > #1934) > > > > > and > > > > > > > seems to process everything fine. > > > > > > > > > If you want more stability in the code, you might consider > > > > > changing over > > > > > > > to > > > > > > > XML output and parsing with Bio::SearchIO::blastxml. There are > > > some > > > > > > > changes > > > > > > > in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate > > > saving > > > > > XML > > > > > > > output, but I believe it parses everything regardless. If you > > look > > > > > back > > > > > > > the > > > > > > > last month or so there has been a bit of discussion here about > > it. 
> > > > > Jason > > > > > > > describes a bit on how to set up RemoteBlast for XML: > > > > > > > > > http://bioperl.org/news/2005/11/06/getting-blastxml-using- > > > > > remoteblast/ > > > > > > > > > Christopher Fields > > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > > Dept. of Biochemistry > > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > -----Original Message----- > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > > > > > Sent: Friday, February 03, 2006 1:45 PM > > > > > > > > To: bioperl-l at bioperl.org > > > > > > > > Subject: [Bioperl-l] more question regarding RemoteBlast.pm > > > version > > > > > 1.28 > > > > > > > > > > > > > > > > Hi, Everybody, > > > > > > > > I see this post and am wondering if this is the reason for the > > > > > > > > malfunctionning of my webserver. We set up a webserver named > > > MAK, > > > > > for > > > > > > > MITE > > > > > > > > sequence analysis. It was working very well until around > > > November > > > > > 2005, > > > > > > > > when it stopped returning any result (the site is fine and > > seems > > > to > > > > > be > > > > > > > > doing sth after submission). In the CGI script, I used > > > remoteblast > > > > > (that > > > > > > > > work was done in 2003) to do searches. I currently do not have > > > > > access to > > > > > > > > the server because I moved. Quite several people sent emails > > to > > > us > > > > > about > > > > > > > > its malfunctioning. Is there any suggestion on fixing the > > > problem? > > > > > > > Should > > > > > > > > I simplily ask the remoteblast.pm be replaced with the new > > > version? 
> > > > > > > > Thanks a lot, > > > > > > > > Guojun > > > > > > > > > > > > > > > > Department of Plant Biology > > > > > > > > University of Georgia > > > > > > > > Tel: 706-542-1857 > > > > > > > > Fax: 706-542-1805 > > > > > > > > http://www.arches.uga.edu/~guojun > > > > > > > > _____ > > > > > > > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang > > > Jian' > > > > > > > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' > > [mailto:bioperl- > > > > > > > > l at bioperl.org] > > > > > > > > Sent: Fri, 03 Feb 2006 10:45:23 -0500 > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > > > > > > > > > > > > > Like Nagesh says, try the latest RemoteBlast from bioperl-live > > > CVS. > > > > > It > > > > > > > > will > > > > > > > > work for saving text output. However, it will not parse > > anything > > > > > using > > > > > > > > next_result (it will likely hang) and will not save XML > > format. > > > See > > > > > > > these > > > > > > > > bugs: > > > > > > > > > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > > > > > > > > > > > > > > > for explanations and possible fixes (changes to RemoteBlast > > and > > > > > > > > Bio::SearchIO::blast). Note that these haven't been checked in > > > yet > > > > > so > > > > > > > are > > > > > > > > still not included in bioperl-live; they may be further > > modified > > > > > before > > > > > > > > committing to CVS. If you're not worried about XML, you could > > > just > > > > > try > > > > > > > the > > > > > > > > first fix, which is a change to SearchIO::blast. 
> > > > > > > > > > > > > > > > Nagesh, I remember you posting to the list a month ago using a > > > > > script > > > > > > > > which > > > > > > > > had problems; the script you used saves the output but doesn't > > > > > actually > > > > > > > > parse it (i.e. you don't use next_result() to go through the > > > data). > > > > > Is > > > > > > > the > > > > > > > > version of BLAST in your text output 2.2.12 or 2.2.13? Have > > you > > > > > tried > > > > > > > > parsing the output using "-readmethod => SearchIO" or "- > > > readmethod > > > > > => > > > > > > > > blast" > > > > > > > > using your version of RemoteBlast and method next_result()? > > Like > > > > > below > > > > > > > > (from > > > > > > > > perldoc): > > > > > > > > > > > > > > > > while ( my @rids = $factory->each_rid ) { > > > > > > > > foreach my $rid ( @rids ) { > > > > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > > > > if( !ref($rc) ) { > > > > > > > > if( $rc < 0 ) { > > > > > > > > $factory->remove_rid($rid); > > > > > > > > } > > > > > > > > print STDERR "." 
if ( $v > 0 ); > > > > > > > > sleep 5; > > > > > > > > } else { # parsing > > > > > > > > starts here > > > > > > > > my $result = $rc->next_result(); # it should hang > > > > > > > > here > > > > > > > > #save the output > > > > > > > > my $filename = $result->query_name()."\.out"; > > > > > > > > $factory->save_output($filename); > > > > > > > > $factory->remove_rid($rid); > > > > > > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > > > > > > while ( my $hit = $result->next_hit ) { > > > > > > > > next unless ( $v > 0); > > > > > > > > print "\thit name is ", $hit->name, "\n"; > > > > > > > > while( my $hsp = $hit->next_hsp ) { > > > > > > > > print "\t\tscore is ", $hsp->score, "\n"; > > > > > > > > } > > > > > > > > } > > > > > > > > } > > > > > > > > } > > > > > > > > } > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > My script hanged if I used next_result() in any way prior to > > the > > > > > fixes. > > > > > > > I > > > > > > > > want to see how many others are having the same issues with > > > parsing > > > > > > > using > > > > > > > > the CVS version of bioperl-live. > > > > > > > > > > > > > > > > Christopher Fields > > > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > > > Dept. of Biochemistry > > > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl- > > l- > > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > > > > > > > > > Sent: Thursday, February 02, 2006 7:24 PM > > > > > > > > > To: Huang Jian; bioperl-l > > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > > > > > > > > > > > > > > > Hi Huang, > > > > > > > > > Thanks for the message. 
The older version of RemoteBlast.pm > > > works > > > > > on > > > > > > > the > > > > > > > > > logic of checking the temporary file size to determine > > whether > > > the > > > > > > > Blast > > > > > > > > > results are ready. This condition is not getting satisfied > > may > > > be > > > > > due > > > > > > > to > > > > > > > > > some changes brought about by NCBI. I had this problem > > > recently > > > > > and > > > > > > > > > figured out that the solution was to use the latest version > > > which > > > > > has > > > > > > > > > this problem fixed (does not use file size logic any more) > > > which > > > > > is > > > > > > > not > > > > > > > > > yet included in the BioPerl package. > > > > > > > > > Cheers > > > > > > > > > Nagesh > > > > > > > > > > > > > > > > > > Huang Jian wrote: > > > > > > > > > > > > > > > > > > > Dear Nagesh, > > > > > > > > > > > > > > > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 > > > you > > > > > send > > > > > > > > > > me. Now it works perfectly!!! > > > > > > > > > > > > > > > > > > > > Thank you!! > > > > > > > > > > > > > > > > > > > > Huang > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- From: "Nagesh Chakka" > > > > > > > > > > > > > > > > > > > > To: "Huang Jian" ; "bioperl-l" > > > > > > > > > > > > > > > > > > > > Sent: Friday, February 03, 2006 7:48 AM > > > > > > > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the > > net, > > > so > > > > > still > > > > > > > > > > via email > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> Hi Huang, > > > > > > > > > >> I see that you are submitting a sequence for a remote > > blast > > > > > search. > > > > > > > > Can > > > > > > > > > >> you check if the RemoteBlast.pm being used is v 1.28 > > > > > (2005/12/09). > > > > > > > If > > > > > > > > > >> not I have attached it with this email, try to replace it > > > with > > > > > the > > > > > > > > old > > > > > > > > > >> one which has a bug. 
> > > > > > > > > >> Let me know if it works.
> > > > > > > > > >> Nagesh
> > > > > > > > >
> > > > > > > > > _______________________________________________
> > > > > > > > > Bioperl-l mailing list
> > > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Bioperl-l mailing list
> > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Bioperl-l mailing list
> > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Bioperl-l mailing list
> > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > >
> > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >

From cjfields at uiuc.edu Wed Feb 15 15:17:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Feb 2006 14:17:27 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pmversion 1.28
In-Reply-To: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
Message-ID: <000001c6326c$d72dd640$15327e82@pyrimidine>

This looks like a genuine bug and may be something that changed in BLASTN
text output; I'm getting it here, too. Running verbose shows that text
output is returned, so, from that and from the stack trace it looks like
another error in text parsing in Bio::SearchIO::blast.
Bio::SearchIO::blast line 1172 throws a conditional exception. I'm adding
this to bug 1934 in bugzilla (reference to your email and this response)
for now. I'll try messing around with it when I can; I'm really busy this
week. I'll also forward this to Roger Hall.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Wednesday, February 15, 2006 1:40 PM
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pmversion 1.28
>
> Hi, Chris,
> Finally the remoteblast test script works for the amino.fa query, but when
> I try a nucleic acid sequence (see below), Error occurs:
> "
> waiting........
> ------------- EXCEPTION -------------
> MSG: no data for midline Features flanking this part of subject sequence:
> STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> STACK toplevel remoteblast_test:40
> "
> The query sequence is:
> CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
>
> The script (basically same as the remoteblast test, I only changed
> database to 'nr' and program to 'blastn' and filename to 'ost3'):
> #!/usr/bin/perl
>
> use Bio::SeqIO;
> use Bio::Seq;
> use Bio::Tools::Run::RemoteBlast;
> use Bio::SearchIO;
> use strict;
> my $prog='blastn';
> my $db='nr';
> my $e_val=1e-10;
> my @params=( -prog=>$prog,
>              -data=>$db,
>              -expect=>$e_val,
>              -readmethod=>'SearchIO');
> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>
> my $v = 1;
>
> my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
>
> while (my $input = $str->next_seq()){
>   #Blast a sequence against a database:
>   #Alternatively, you could pass in a file with many
>   #sequences rather than loop through sequence one at a time
>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>   #and swap the two lines below for an example of that.
>   my $r = $factory->submit_blast($input);
>   #my $r = $factory->submit_blast('amino.fa');
>   print STDERR "waiting..." if( $v > 0 );
>   while ( my @rids = $factory->each_rid ) {
>     foreach my $rid ( @rids ) {
>       my $rc = $factory->retrieve_blast($rid);
>       if( !ref($rc) ) {
>         if( $rc < 0 ) {
>           $factory->remove_rid($rid);
>         }
>         print STDERR "." if ( $v > 0 );
>         sleep 5;
>       } else {
>         my $result = $rc->next_result();
>         #save the output
>         my $filename = $result->query_name()."\.out";
>         $factory->save_output($filename);
>         $factory->remove_rid($rid);
>         print "\nQuery Name: ", $result->query_name(), "\n";
>         while ( my $hit = $result->next_hit ) {
>           next unless ( $v > 0);
>           print "\thit name is ", $hit->name, "\n";
>           while( my $hsp = $hit->next_hsp ) {
>             print "\t\tscore is ", $hsp->score, "\n";
>           }
>         }
>       }
>     }
>   }
> }
>
> Do you think there might still be something in the NCBI output format?
>
> Thank you,
> Guojun
>
> Guojun Yang
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
>
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
>
> > Sorry, forgot to add that I didn't see the regex issue that you mentioned.
> > It could be a perl-related issue. Try the fixes I mentioned and see what
> > happens.
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > > -----Original Message-----
> > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > Sent: Tuesday, February 14, 2006 12:36 PM
> > > To: 'gyang at plantbio.uga.edu'
> > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >
> > > It's a good habit to always add single quotes around words. The perl
> > > interpreter may think a single bare word is a subroutine or perlfunc
> > > called with no args so will try to find a subroutine named blastp(). My
> > > debugger actually gives the error that the bare word blastp may conflict
> > > with a future reserved word. Like you said, 'use strict' will point that
> > > out.
> > >
> > > As for the regex, it should match all the blast programs at NCBI (blastp,
> > > blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
> > > else passes through.
> > >
> > > So, if you are using the script below, there are several errors. The bare
> > > words for $prog and $db need quotes, and the flags for your @params array
> > > don't have a dash before them.
> > > I get this after adding quotes but before
> > > adding the dashes to @params:
> > >
> > > C:\Perl\Scripts>test_blast.pl
> > >
> > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG:
> > > STACK: Error::throw
> > > STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> > > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > > STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> > > STACK: C:\Perl\Scripts\test_blast.pl:15
> > > -----------------------------------------------------------
> > >
> > > The last line indicates a problem with this line:
> > >
> > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >
> > > Changing the @params to this:
> > >
> > > my @params=( -prog=>$prog,
> > >              -data=>$db,
> > >              -expect=>$e_val,
> > >              -readmethod=>'SearchIO');
> > >
> > > fixes it, and I get output as expected.
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > > > -----Original Message-----
> > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > > Sent: Tuesday, February 14, 2006 11:48 AM
> > > > To: Chris Fields; bioperl-l at lists.open-bio.org
> > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > >
> > > > Hi, Chris,
> > > > When I tried with the perldoc script, It did not work either. First it
> > > > says $prog can not be bare word if I "use strict". I added quotes on the
> > > > words, then it says the value for $prog does not match expression
> > > > t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. The script
> > > > is shown below. Why is the expression "t?blast[pnx]"?
> > > >
> > > > #!/usr/bin/perl
> > > >
> > > > use Bio::SeqIO;
> > > > use Bio::Seq;
> > > > use Bio::Tools::Run::RemoteBlast;
> > > > use Bio::SearchIO;
> > > >
> > > > my $prog=blastp;
> > > > my $db=swissprot;
> > > > my $e_val=1e-10;
> > > > my @params=( prog=>$prog,
> > > >              data=>$db,
> > > >              expect=>$e_val,
> > > >              readmethod=>'SearchIO');
> > > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > >
> > > > my $v = 1;
> > > >
> > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > > >
> > > > while (my $input = $str->next_seq()){
> > > >   #Blast a sequence against a database:
> > > >   #Alternatively, you could pass in a file with many
> > > >   #sequences rather than loop through sequence one at a time
> > > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >   #and swap the two lines below for an example of that.
> > > >   my $r = $factory->submit_blast($input);
> > > >   #my $r = $factory->submit_blast('amino.fa');
> > > >   print STDERR "waiting..." if( $v > 0 );
> > > >   while ( my @rids = $factory->each_rid ) {
> > > >     foreach my $rid ( @rids ) {
> > > >       my $rc = $factory->retrieve_blast($rid);
> > > >       if( !ref($rc) ) {
> > > >         if( $rc < 0 ) {
> > > >           $factory->remove_rid($rid);
> > > >         }
> > > >         print STDERR "." if ( $v > 0 );
> > > >         sleep 5;
> > > >       } else {
> > > >         my $result = $rc->next_result();
> > > >         #save the output
> > > >         my $filename = $result->query_name()."\.out";
> > > >         $factory->save_output($filename);
> > > >         $factory->remove_rid($rid);
> > > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > > >         while ( my $hit = $result->next_hit ) {
> > > >           next unless ( $v > 0);
> > > >           print "\thit name is ", $hit->name, "\n";
> > > >           while( my $hsp = $hit->next_hsp ) {
> > > >             print "\t\tscore is ", $hsp->score, "\n";
> > > >           }
> > > >         }
> > > >       }
> > > >     }
> > > >   }
> > > > }
> > > >
> > > > Thank you for your help!
> > > > > > > > > > > > Guojun > > > > Department of Plant Biology > > > > University of Georgia > > > > > > > > ----- Original Message ----- > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > To: gyang at plantbio.uga.edu > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > > > > > > Try two things: > > > > > > 1) Use a much simpler script, like the one in 'perldoc > > > > > Bio::Tools::Run::RemoteBlast'. If this fixes it, there's > something > > > > wrong > > > > > with the logic in your subroutine: > > > > > > my $v = 1; > > > > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' > ); > > > > > > while (my $input = $str->next_seq()){ > > > > > #Blast a sequence against a database: > > > > > #Alternatively, you could pass in a file with many > > > > > #sequences rather than loop through sequence one at a time > > > > > #Remove the loop starting 'while (my $input = $str->next_seq())' > > > > > #and swap the two lines below for an example of that. > > > > > my $r = $factory->submit_blast($input); > > > > > #my $r = $factory->submit_blast('amino.fa'); > > > > > print STDERR "waiting..." if( $v > 0 ); > > > > > while ( my @rids = $factory->each_rid ) { > > > > > foreach my $rid ( @rids ) { > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > if( !ref($rc) ) { > > > > > if( $rc < 0 ) { > > > > > $factory->remove_rid($rid); > > > > > } > > > > > print STDERR "." 
if ( $v > 0 ); > > > > > sleep 5; > > > > > } else { > > > > > my $result = $rc->next_result(); > > > > > #save the output > > > > > my $filename = $result->query_name()."\.out"; > > > > > $factory->save_output($filename); > > > > > $factory->remove_rid($rid); > > > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > > > while ( my $hit = $result->next_hit ) { > > > > > next unless ( $v > 0); > > > > > print "\thit name is ", $hit->name, "\n"; > > > > > while( my $hsp = $hit->next_hsp ) { > > > > > print "\t\tscore is ", $hsp->score, "\n"; > > > > > } > > > > > } > > > > > } > > > > > } > > > > > } > > > > > } > > > > > > 2) Try the RemoteBlast from Bugzilla and see if that works. It > > > really > > > > > shouldn't make that much of a difference, but I noticed that the > CVS > > > > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 > was > > > > > released; the Bugzilla version is based off CVS. > > > > > > Christopher Fields > > > > > Postdoctoral Researcher - Switzer Lab > > > > > Dept. of Biochemistry > > > > > University of Illinois Urbana-Champaign > > > > > > > -----Original Message----- > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > > > Sent: Monday, February 13, 2006 3:00 PM > > > > > > To: bioperl-l at lists.open-bio.org > > > > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > Thanks, Chris, > > > > > > I installed version 1.5.1 and replaced the blast.pm file with > the > > > one > > > > from > > > > > > your bug report. The running version is 1.5 when I use the > command > > > you > > > > > > sent me. But when I tried the script, it doesn't change much. 
My > > > > > > remoteblast code (portion) is here: > > > > > > > > sub search { > > > > > > local > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN"; > > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7; > > > > > > local > $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000; > > > > > > local > > > > > > > > > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}= > > > > > > 'no'; > > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1'; > > > > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]", > > > > > > -id=>"query", > > > > > > -desc=>"new seq"); > > > > > > my $len=$query->length(); > > > > > > @db=('nr','htgs','wgs'); > > > > > > foreach my $db (@db) { > > > > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' > =>'blastn', > > > > > > '-data' =>"$db", > > > > > > > > '-expect'=>"$E_value"); > > > > > > > > > > my $blast_report = $factory->submit_blast($query); > > > > > > > > my @rids = $factory->each_rid(); > > > > > > foreach my $rid ( @rids ) { > > > > > > print STDERR "$rid\n"; > > > > > > } > > > > > > # RID = Remote Blast ID (e.g: 1017772174-16400-6638) > > > > > > print STDERR "waiting..."; > > > > > > sleep 60; > > > > > > > > foreach my $rid ( @rids ) { > > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > > while (!ref($rc) ) { > > > > > > if( $rc < 0 ) { > > > > > > # retrieve_blast returns -1 on error > > > > > > $factory->remove_rid($rid); > > > > > > print "Error!\n"; > > > > > > send_error($email,$function,$seqname,$queryname[$ST]); > > > > > > die "Can't retrieve $rid"; > > > > > > } if ($rc==0) { # retrieve_blast returns 0 on 'job not > > > finished' > > > > > > sleep 60; > > > > > > $rc = $factory->retrieve_blast($rid); > > > > > > } > > > > > > } > > > > > > if (ref($rc)) { > > > > > > print STDERR "Done.\n"; > > > > > > while( my $result = $rc->next_result) { > > > > > > while( my $hit = $result->next_hit()) { > > > > > > $hit_name=$hit->name; > > > 
> > > $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/; > > > > > > $name=$1; > > > > > > @left_plus_start=(); > > > > > > @left_plus_end=(); > > > > > > @left_minus_start=(); > > > > > > @left_minus_end=(); > > > > > > @right_plus_start=(); > > > > > > @right_plus_end=(); > > > > > > @right_minus_start=(); > > > > > > @right_minus_end=(); > > > > > > > > if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) { > > > > > > while( my $hsp = $hit->next_hsp()) { > > > > > > ...... > > > > > > > > It was working quite well before around October laster year, > but > > > > it has > > > > > > stopped since then, When a submission is sent via a webpage, the > cgi > > > > > > starts to work and use a memory of ~20 Mb. Then it hangs there, > > > > finally > > > > > > the expected email is received but without real results although > it > > > > does > > > > > > contain something from other parts of the script. Apparently the > > > > search > > > > > > sub did not return anything (I know there is something should be > > > > > > returned.). Is it also possible the format of the NCBI output > for > > > each > > > > > > result has changed? > > > > > > Thank you, > > > > > > Guojun > > > > > > > > > > Department of Plant Biology > > > > > > University of Georgia > > > > > > > > > > > > ----- Original Message ----- > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > > > > How do you know two versions are installed (i.e. how > are > > > you > > > > checking > > > > > > the > > > > > > > version)? Do you see have two complete bioperl distributions > (in > > > > two > > > > > > > separate directories) or are you looking in modules? 
Here's > the > > > way > > > > to > > > > > > > check the version (from the FAQ): > > > > > > > > perl -MBio::Root::Version -e 'print > > > > $Bio::Root::Version::VERSION,"\n"' > > > > > > > > If you have two full bioperl distributions on your computer, > > > > normally > > > > > > only > > > > > > > one will be in use unless you have explicitly set the > environment > > > > > > variable > > > > > > > PERL5LIB. The PERL5LIB directories will be searched first > before > > > > your > > > > > > > normal perl directory list (@INC) is searched. You MAY get > some > > > > mixing > > > > > > > then, but only if perl can't find a particular module in the > path > > > > > > designated > > > > > > > in PERL5LIB; then it will progress through the directories > listed > > > in > > > > > > @INC. > > > > > > > This may happen if a module is unique to a particular release, > but > > > > > > shouldn't > > > > > > > happen for the majority of modules, including RemoteBlast. > You > > > can > > > > > > check > > > > > > > what @INC and PERL5LIB are set to by using 'perl -V'. @INC > will > > > > differ > > > > > > > depending on your OS, perl build, etc. > > > > > > > > Regardless, if you follow the directions for installing > bioperl > > > > for > > > > > > your > > > > > > > system ('perl Makefile.PL', 'make', 'make test', 'make > install', > > > > unless > > > > > > you > > > > > > > explicitly change the installation directory when using 'perl > > > > > > Makefile.PL'), > > > > > > > then 'uninstalling' Bioperl shouldn't be a problem as it will > > > > install > > > > > > the > > > > > > > Bioperl distribution you downloaded over the old version in > @INC. > > > > See > > > > > > this > > > > > > > page: > > > > > > > > http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL > > > > > > > > for more details. > > > > > > > > Christopher Fields > > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > > Dept. 
of Biochemistry > > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > -----Original Message----- > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl- > l- > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > > > > > Sent: Monday, February 13, 2006 12:32 PM > > > > > > > > To: bioperl-l at lists.open-bio.org > > > > > > > > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > > > > > > > > Hi, Chris, > > > > > > > > I do have different versions of bioperl on my Linux machine > > > (1.4. > > > > and > > > > > > > > 1.5.0), this may be the problem. Should I just install > bioperl- > > > > 1.5.1 > > > > > > or I > > > > > > > > need to uninstall and remove the previous versions. I could > not > > > > find > > > > > > any > > > > > > > > hint on uninstalling bioperl on linux. Could you please give > me > > > > some > > > > > > > > suggestion? > > > > > > > > Thanks, > > > > > > > > Guojun > > > > > > > > > > Department of Plant Biology > > > > > > > > University of Georgia > > > > > > > > _____ > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > > > > > > > Sent: Mon, 13 Feb 2006 11:45:14 -0500 > > > > > > > > Subject: RE: [Bioperl-l] more question regarding > RemoteBlast.pm > > > > > > version > > > > > > > > 1.28 > > > > > > > > > > > > > > If you're using RemoteBlast 1.28, then you've > likely > > > > > > updated from CVS > > > > > > > > which isn't the latest fix. > > > > > > > > > > Make sure that you check the following: > > > > > > > > > > 1) Always post to the mailing list: > > > > > > > > > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance . > > > > > > > > > > 2) You must have the complete bioperl-1.5.1 or bioperl- > live > > > > (CVS) > > > > > > > > installed first. 
Perform a clean installation; do not > upgrade > > > > only > > > > > > > > Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we > > > can't > > > > > > > > guarantee that mixing modules from old and new distributions > > > (1.4 > > > > and > > > > > > > > 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl- > live > > > > > > > > installation will allow text output from BLAST v.2.2.12 to > be > > > > saved > > > > > > and > > > > > > > > parsed; it will not parse the newest BLAST text output from > NCBI > > > > > > (v2.2.13) > > > > > > > > but it should still save it. I believe as long as > next_results() > > > > isn't > > > > > > > > called, it will work. > > > > > > > > > > 3) The bug fixes for the above issue with parsing BLAST > > > 2.2.13 > > > > > > text output > > > > > > > > are NOT in CVS; they haven't been cleared and checked in by > > > Roger > > > > Hall > > > > > > > > (who's now taking care of RemoteBlast) and the powers that > be > > > > (Jason > > > > > > or > > > > > > > > whomever is in charge of Bio::SearchIO). They can be found > in > > > > > > Bugzilla: > > > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > > > > > > > > > The fix in RemoteBlast in Bugzilla (#1935) is to allow > the > > > > option > > > > > > of > > > > > > > > saving XML output, so isn't necessary if you don't plan on > using > > > > this > > > > > > > > option. And, remember, they haven't been committed yet to > CVS, > > > > which > > > > > > > > means that the final version will change to refle the new > > > version. > > > > > > > > > > > > Christopher Fields > > > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > > > Dept. 
of Biochemistry > > > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > > > _____ > > > > > > > > > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > > > > > > > Sent: Monday, February 13, 2006 9:26 AM > > > > > > > > To: Chris Fields > > > > > > > > Subject: RE: [Bioperl-l] more question regarding > RemoteBlast.pm > > > > > > version > > > > > > > > 1.28 > > > > > > > > > > > > Hi, Chris > > > > > > > > > > Thanks for your suggestion, however, it doesn't seem to > work > > > > for > > > > > > my cgi > > > > > > > > even after I replace both blast.pm and RemoteBlast.pm. I > didn't > > > > even > > > > > > get > > > > > > > > any RID. Is there any suggestion? > > > > > > > > > > > > > > Guojun > > > > > > > > > > > > Guojun Yang > > > > > > > > Department of Plant Biology > > > > > > > > University of Georgia > > > > > > > > Tel: 706-542-1857 > > > > > > > > Fax: 706-542-1805 > > > > > > > > http://www.arches.uga.edu/~guojun > > > > > > > > _____ > > > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org > > > > > > > > Sent: Fri, 03 Feb 2006 16:07:29 -0500 > > > > > > > > Subject: RE: [Bioperl-l] more question regarding > RemoteBlast.pm > > > > > > version > > > > > > > > 1.28 > > > > > > > > > > I would say give the new code a try, but realize that it > > > > hasn't > > > > > > been > > > > > > > > checked > > > > > > > > in (like I said below). I will try going over the modified > > > > > > > > Bio::SearchIO::blast again this weekend to see if there is > > > > anything I > > > > > > > > might > > > > > > > > have missed. 
The changed order in the header of BLAST text > > > output > > > > has > > > > > > me a > > > > > > > > bit worried that it might not catch everything, but it at > least > > > > > > doesn't > > > > > > > > hang > > > > > > > > in the while() loop I described in the bug report below (bug > > > > #1934) > > > > > > and > > > > > > > > seems to process everything fine. > > > > > > > > > > If you want more stability in the code, you might > consider > > > > > > changing over > > > > > > > > to > > > > > > > > XML output and parsing with Bio::SearchIO::blastxml. There > are > > > > some > > > > > > > > changes > > > > > > > > in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate > > > > saving > > > > > > XML > > > > > > > > output, but I believe it parses everything regardless. If > you > > > look > > > > > > back > > > > > > > > the > > > > > > > > last month or so there has been a bit of discussion here > about > > > it. > > > > > > Jason > > > > > > > > describes a bit on how to set up RemoteBlast for XML: > > > > > > > > > > http://bioperl.org/news/2005/11/06/getting-blastxml- > using- > > > > > > remoteblast/ > > > > > > > > > > Christopher Fields > > > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > > > Dept. of Biochemistry > > > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > > -----Original Message----- > > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l- > > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > > > > > > > Sent: Friday, February 03, 2006 1:45 PM > > > > > > > > > To: bioperl-l at bioperl.org > > > > > > > > > Subject: [Bioperl-l] more question regarding > RemoteBlast.pm > > > > version > > > > > > 1.28 > > > > > > > > > > > > > > > > > > Hi, Everybody, > > > > > > > > > I see this post and am wondering if this is the reason for > the > > > > > > > > > malfunctionning of my webserver. 
We set up a webserver > named > > > > MAK, > > > > > > for > > > > > > > > MITE > > > > > > > > > sequence analysis. It was working very well until around > > > > November > > > > > > 2005, > > > > > > > > > when it stopped returning any result (the site is fine and > > > seems > > > > to > > > > > > be > > > > > > > > > doing sth after submission). In the CGI script, I used > > > > remoteblast > > > > > > (that > > > > > > > > > work was done in 2003) to do searches. I currently do not > have > > > > > > access to > > > > > > > > > the server because I moved. Quite several people sent > emails > > > to > > > > us > > > > > > about > > > > > > > > > its malfunctioning. Is there any suggestion on fixing the > > > > problem? > > > > > > > > Should > > > > > > > > > I simplily ask the remoteblast.pm be replaced with the new > > > > version? > > > > > > > > > Thanks a lot, > > > > > > > > > Guojun > > > > > > > > > > > > > > > > > > Department of Plant Biology > > > > > > > > > University of Georgia > > > > > > > > > Tel: 706-542-1857 > > > > > > > > > Fax: 706-542-1805 > > > > > > > > > http://www.arches.uga.edu/~guojun > > > > > > > > > _____ > > > > > > > > > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu] > > > > > > > > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], > 'Huang > > > > Jian' > > > > > > > > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' > > > [mailto:bioperl- > > > > > > > > > l at bioperl.org] > > > > > > > > > Sent: Fri, 03 Feb 2006 10:45:23 -0500 > > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > > > > > > > > > > > > > > > Like Nagesh says, try the latest RemoteBlast from bioperl- > live > > > > CVS. > > > > > > It > > > > > > > > > will > > > > > > > > > work for saving text output. However, it will not parse > > > anything > > > > > > using > > > > > > > > > next_result (it will likely hang) and will not save XML > > > format. 
> > > > See > > > > > > > > these > > > > > > > > > bugs: > > > > > > > > > > > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > > > > > > > > > > > > > > > > > for explanations and possible fixes (changes to > RemoteBlast > > > and > > > > > > > > > Bio::SearchIO::blast). Note that these haven't been > checked in > > > > yet > > > > > > so > > > > > > > > are > > > > > > > > > still not included in bioperl-live; they may be further > > > modified > > > > > > before > > > > > > > > > committing to CVS. If you're not worried about XML, you > could > > > > just > > > > > > try > > > > > > > > the > > > > > > > > > first fix, which is a change to SearchIO::blast. > > > > > > > > > > > > > > > > > > Nagesh, I remember you posting to the list a month ago > using a > > > > > > script > > > > > > > > > which > > > > > > > > > had problems; the script you used saves the output but > doesn't > > > > > > actually > > > > > > > > > parse it (i.e. you don't use next_result() to go through > the > > > > data). > > > > > > Is > > > > > > > > the > > > > > > > > > version of BLAST in your text output 2.2.12 or 2.2.13? > Have > > > you > > > > > > tried > > > > > > > > > parsing the output using "-readmethod => SearchIO" or "- > > > > readmethod > > > > > > => > > > > > > > > > blast" > > > > > > > > > using your version of RemoteBlast and method > next_result()? > > > Like > > > > > > below > > > > > > > > > (from > > > > > > > > > perldoc): > > > > > > > > > > > > > > > > > > while ( my @rids = $factory->each_rid ) { > > > > > > > > > foreach my $rid ( @rids ) { > > > > > > > > > my $rc = $factory->retrieve_blast($rid); > > > > > > > > > if( !ref($rc) ) { > > > > > > > > > if( $rc < 0 ) { > > > > > > > > > $factory->remove_rid($rid); > > > > > > > > > } > > > > > > > > > print STDERR "." 
if ( $v > 0 ); > > > > > > > > > sleep 5; > > > > > > > > > } else { # parsing > > > > > > > > > starts here > > > > > > > > > my $result = $rc->next_result(); # it should hang > > > > > > > > > here > > > > > > > > > #save the output > > > > > > > > > my $filename = $result->query_name()."\.out"; > > > > > > > > > $factory->save_output($filename); > > > > > > > > > $factory->remove_rid($rid); > > > > > > > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > > > > > > > while ( my $hit = $result->next_hit ) { > > > > > > > > > next unless ( $v > 0); > > > > > > > > > print "\thit name is ", $hit->name, "\n"; > > > > > > > > > while( my $hsp = $hit->next_hsp ) { > > > > > > > > > print "\t\tscore is ", $hsp->score, "\n"; > > > > > > > > > } > > > > > > > > > } > > > > > > > > > } > > > > > > > > > } > > > > > > > > > } > > > > > > > > > } > > > > > > > > > > > > > > > > > > > > > > > > > > > My script hanged if I used next_result() in any way prior > to > > > the > > > > > > fixes. > > > > > > > > I > > > > > > > > > want to see how many others are having the same issues > with > > > > parsing > > > > > > > > using > > > > > > > > > the CVS version of bioperl-live. > > > > > > > > > > > > > > > > > > Christopher Fields > > > > > > > > > Postdoctoral Researcher - Switzer Lab > > > > > > > > > Dept. of Biochemistry > > > > > > > > > University of Illinois Urbana-Champaign > > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl- > > > l- > > > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > > > > > > > > > > Sent: Thursday, February 02, 2006 7:24 PM > > > > > > > > > > To: Huang Jian; bioperl-l > > > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > > > > > > > > > > > > > > > > > > Hi Huang, > > > > > > > > > > Thanks for the message. 
The older version of > RemoteBlast.pm > > > > works > > > > > > on > > > > > > > > the > > > > > > > > > > logic of checking the temporary file size to determine > > > whether > > > > the > > > > > > > > Blast > > > > > > > > > > results are ready. This condition is not getting > satisfied > > > may > > > > be > > > > > > due > > > > > > > > to > > > > > > > > > > some changes brought about by NCBI. I had this problem > > > > recently > > > > > > and > > > > > > > > > > figured out that the solution was to use the latest > version > > > > which > > > > > > has > > > > > > > > > > this problem fixed (does not use file size logic any > more) > > > > which > > > > > > is > > > > > > > > not > > > > > > > > > > yet included in the BioPerl package. > > > > > > > > > > Cheers > > > > > > > > > > Nagesh > > > > > > > > > > > > > > > > > > > > Huang Jian wrote: > > > > > > > > > > > > > > > > > > > > > Dear Nagesh, > > > > > > > > > > > > > > > > > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with v > 1.28 > > > > you > > > > > > send > > > > > > > > > > > me. Now it works perfectly!!! > > > > > > > > > > > > > > > > > > > > > > Thank you!! > > > > > > > > > > > > > > > > > > > > > > Huang > > > > > > > > > > > > > > > > > > > > > > ----- Original Message ----- From: "Nagesh Chakka" > > > > > > > > > > > > > > > > > > > > > > To: "Huang Jian" ; > "bioperl-l" > > > > > > > > > > > > > > > > > > > > > > Sent: Friday, February 03, 2006 7:48 AM > > > > > > > > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the > > > net, > > > > so > > > > > > still > > > > > > > > > > > via email > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >> Hi Huang, > > > > > > > > > > >> I see that you are submitting a sequence for a remote > > > blast > > > > > > search. > > > > > > > > > Can > > > > > > > > > > >> you check if the RemoteBlast.pm being used is v 1.28 > > > > > > (2005/12/09). 
> > > > > > > > If > > > > > > > > > > >> not I have attached it with this email, try to replace it with the old one which has a bug. > > > > > > > > > > >> Let me know if it works. > > > > > > > > > > >> Nagesh From sdavis2 at mail.nih.gov Wed Feb 15 19:39:33 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Thu, 16 Feb 2006 00:39:33 -0000 Subject: [Bioperl-l] error running load_seqdatabase.pl
References: Message-ID: <000c01c63291$5de08600$6601a8c0@WATSON> ----- Original Message ----- From: "Angshu Kar" To: "bioperl-l" Sent: Thursday, December 29, 2005 5:50 PM Subject: [Bioperl-l] error running load_seqdatabase.pl > Hi, > > I'm getting the following error while trying to run : > > ./load_seqdatabase.pl -host localhost -dbname USBA -dbuser > postgres -format > genbank NC_003076.gbk > > But I've a postgreSQL db and not a MySQL one...could anyone please guide > me > troubleshoot this? Angshu, I would probably start with: perldoc load_seqdatabase.pl I think that will likely give you your answer. Again, it is best to exhaust the resources at hand and to let the list know that you have done so (like--"I read the perldoc and tried this...."). Sean From cain at cshl.edu Wed Feb 15 11:07:28 2006 From: cain at cshl.edu (Scott Cain) Date: Wed, 15 Feb 2006 11:07:28 -0500 Subject: [Bioperl-l] Bio::Tools::GFF parsing error In-Reply-To: <43F35043.7070705@cornell.edu> References: <43F35043.7070705@cornell.edu> Message-ID: <1140019648.2849.58.camel@localhost.localdomain> Hi Robert, No column should ever be padded with spaces; GFF columns should always be separated by a single tab. Therefore, I don't think Bio::Tools::GFF is at fault here. Scott On Wed, 2006-02-15 at 11:01 -0500, Robert Buels wrote: > Hi all, > > I'm parsing a GFF2 file with Bio::Tools::GFF (I would be using > FeatureIO, except it purports not to support gff 2), and the file looks > like: > > ##gff-version 2 > ##date 2006-02-13 > ##sequence-region C01HBa0088L02.seq 1 120525 > C01HBa0088L02 RepeatMasker similarity 3537 4267 3.3 > - . Target "Motif:bac_end_repeat_family_345" 1 740 > C01HBa0088L02 RepeatMasker similarity 4172 4279 2.9 > + . Target "Motif:HRSiTERT00100141" 1 104 > C01HBa0088L02 RepeatMasker similarity 4267 4323 0.0 > - . Target "Motif:k_29" 150 206 > C01HBa0088L02 RepeatMasker similarity 4322 4492 26.6 > + .
Target "Motif:PRSiTERT00300001" 1960 2129 > C01HBa0088L02 RepeatMasker similarity 4557 5124 29.5 > + . Target "Motif:PRSiTERT00300001" 2142 2711 > > Notice the score column is padded with spaces. > > Bio::Tools::GFF does not like this, and says that ' 3.3' is not a valid > score. My question is, who is wrong here, my input file or > Bio::Tools::GFF? Should Bio::Tools::GFF be able to read this file? > > Rob > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory From hlapp at gmx.net Wed Feb 15 20:54:01 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 15 Feb 2006 17:54:01 -0800 Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki, problems with bioperl-db In-Reply-To: <001201c631a5$ce7496f0$15327e82@pyrimidine> References: <001201c631a5$ce7496f0$15327e82@pyrimidine> Message-ID: On Feb 14, 2006, at 12:32 PM, Chris Fields wrote: > Hilmar, > > Good News: I've added a section to the bioperl wiki on installing > bioperl-db > in Windows: > > http://www.bioperl.org/wiki/ > Installing_Bioperl_on_Windows#Installing_bioperl > -db > > Bad News: There's a new problem now. I updated from CVS yesterday; I > walked > through the steps and ran 'nmake test', with everything passing fine. > However, load_seqdatabase.pl is extremely slow; it's loading a sequence > every 5 minutes or so. I noticed (when using '-debug') that it is > hanging > up in Bio::DB::BioSQL::SpeciesAdaptor each time. If I create a > database, > load the biosql schema, and load sequences w/o loading taxonomy, the > problem > goes away. > > Here's the debugging output (I cut it off at the point it hangs up): > [...] > preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL, > taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE > taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND > ncbi_taxon_id = > ? 
> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class) > SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid) I'm a bit surprised if this is the query where it hangs. Are the indexes all there? There should be a primary key index on taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on taxon_name over (taxon_id,name,name_class). Also, there should be separate indexes on taxon_name.taxon_id and taxon_name.name. Are they all there? If you reinstantiated the schema from the DDL then it seems unlikely that the indexes have somehow vanished, unless you messed with the schema or the DDL. Putting an index on taxon_name.name_class really can't make sense, so let's assume it can't be that. So really I suspect this has something to do with the state of the database and the version of MySQL. In particular, as of some 4.x version of MySQL, under certain circumstances you have to analyze the statistics of the tables in order to get the optimizer to pick up the indexes properly. Are you on MySQL 4.x and if so, have you done that? There's the ANALYZE TABLE command: http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html Note the comment: "This statement works with MyISAM, BDB, and (as of MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher? Also, you can check the execution plan for the query using EXPLAIN. http://dev.mysql.com/doc/refman/4.1/en/explain.html This should show you whether the index would be picked up for the query or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to the db using the mysql shell (mysql). I believe something similarly strange was encountered by someone using DB::GFF (or Chado) under MySQL, and if I recall correctly the solution was to optimize (analyze) the tables. Maybe someone who was in that thread reads this and can comment?
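The two checks suggested above can also be run from Perl through DBI instead of the mysql shell. What follows is only a minimal sketch: it assumes DBD::mysql is installed and that the BioSQL database is on localhost. The script name and its command-line arguments (database, user, password) are hypothetical. The name class and NCBI taxon ID are the two bind values from the debugging output quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the diagnostic statements: refresh the optimizer statistics for
# each table, then ask MySQL for the execution plan of the slow look-up.
sub diagnostic_statements {
    my @tables = @_;
    my @sql = map { "ANALYZE TABLE $_" } @tables;
    push @sql,
        "EXPLAIN SELECT taxon_name.taxon_id, taxon.ncbi_taxon_id, taxon_name.name "
      . "FROM taxon, taxon_name "
      . "WHERE taxon.taxon_id = taxon_name.taxon_id "
      . "AND name_class = 'scientific name' AND ncbi_taxon_id = 208964";
    return @sql;
}

# Connect only when database details are given on the command line, e.g.
#   perl check_taxon_indexes.pl biosql someuser somepassword
# (script name, database name and credentials are placeholders).
if (@ARGV >= 2) {
    require DBI;
    my ($dbname, $user, $pass) = @ARGV;
    my $dbh = DBI->connect("dbi:mysql:database=$dbname;host=localhost",
                           $user, $pass, { RaiseError => 1 });
    for my $sql (diagnostic_statements(qw(taxon taxon_name))) {
        print "$sql\n";
        # Both ANALYZE TABLE and EXPLAIN return ordinary result rows.
        my $rows = $dbh->selectall_arrayref($sql);
        for my $row (@$rows) {
            print "  ", join("\t", map { defined $_ ? $_ : 'NULL' } @$row), "\n";
        }
    }
    $dbh->disconnect;
}
```

In the EXPLAIN rows, the key column shows which index, if any, the optimizer chose for each table; a NULL there for taxon or taxon_name would point at the index problem described above.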
-hilmar > > ----------------------------------------------------------------------- > ----- > ------------------------- > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign > > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From cjfields at uiuc.edu Wed Feb 15 22:56:14 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 15 Feb 2006 21:56:14 -0600 Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki, problems with bioperl-db In-Reply-To: References: <001201c631a5$ce7496f0$15327e82@pyrimidine> Message-ID: <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu> On Feb 15, 2006, at 7:54 PM, Hilmar Lapp wrote: > > On Feb 14, 2006, at 12:32 PM, Chris Fields wrote: > >> Hilmar, >> >> Good News: I've added a section to the bioperl wiki on installing >> bioperl-db in Windows: >> >> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db >> >> Bad News: There's a new problem now. I updated from CVS yesterday; I >> walked >> through the steps and ran 'nmake test', with everything passing fine. >> However, load_seqdatabase.pl is extremely slow; it's loading a >> sequence >> every 5 minutes or so. I noticed (when using '-debug') that it is >> hanging >> up in Bio::DB::BioSQL::SpeciesAdaptor each time. If I create a >> database, >> load the biosql schema, and load sequences w/o loading taxonomy, the >> problem >> goes away. >> >> Here's the debugging output (I cut it off at the point it hangs up): >> [...]
> >> preparing UK select statement: SELECT taxon_name.taxon_id, NULL, >> NULL, >> taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name >> WHERE >> taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND >> ncbi_taxon_id = >> ? >> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class) >> SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid) > > I'm a bit surprised if this is the query where it hangs. Are the > indexes all there? There should be a primary key index on > taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on > taxon_name > over (taxon_id,name,name_class). Also, there should be separate > indexes > on taxon_name.taxon_id and taxon_name.name. Are they all there? If you > reinstantiated the schema from the DDL then it seems unlikely that > somehow the indexes have vanished except if you messed with the schema > or the DDL. I looked in the mailing list archives and Barry mentions something here: http://bioperl.org/pipermail/bioperl-l/2005-January/018093.html He rebuilt the database from scratch and got it working; no reason was given. I wouldn't be surprised if it is something Mysql-related that pops up. The strange thing is that only a few months ago everything ran well with this version of MySQL (v.5); this was with the first test database I installed on it. Another strange thing (I think I mentioned it) is that NOT loading the taxonomy with load_ncbi_taxonomy.pl worked (everything was entered). I'll try rebuilding the database from scratch to see what happens. I am running this on Windows, so this is new territory... > Putting an index on taxon_name.name_class really can't make sense, so > let's assume it can't be that. > > So really I suspect this has something to do with the state of the > database and the version of MySQL. 
In particular, from some 4.x > version > of MySQL under certain circumstances you have to analyze the > statistics > of the tables in order to get the optimizer pick up the indexes > properly. Are you on MySQL 4.x and if so, have you done that? > > There's the ANALYZE TABLE command: > http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html > > Note the comment: "This statement works with MyISAM, BDB, and (as of > MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher? > > Also, you can check the execution plan for the query using EXPLAIN. > http://dev.mysql.com/doc/refman/4.1/en/explain.html > > This should show you whether the index would be picked up for the > query > or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to > the db using the mysql shell (mysql). I'll give these a shot and post what I find in the next few days. > I believe something similarly strange was encountered by someone using > DB::GFF (or Chado) under MySQL, and if I recall correctly the solution > was to optimize (analyze) the tables. Maybe someone who was in that > thread reads this and can comment? > > -hilmar I also wanted to mention that we shouldn't check in the modifications to Bio::Root::Root until I confirm something (I'm at home and currently can't). I tried running a script on an unrelated module using the modified Bio::Root::Root (with the commas added after the 'throw $class' statements). Everything worked for $self->throw(), except that the thrown message wasn't displayed. I'll dig into it a bit more to see what happens. > > >> >> --------------------------------------------------------------------- >> -- >> ----- >> ------------------------- >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept.
of Biochemistry >> University of Illinois Urbana-Champaign >> > -- > ---------------------------------------------------------- > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > ---------------------------------------------------------- Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From osborne1 at optonline.net Thu Feb 16 00:16:04 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 16 Feb 2006 00:16:04 -0500 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: <200602140915.11604.hjm@tacgi.com> Message-ID: Harry, It's not clear to me that NCBI's eutils offers this capability directly. You can probably download Entrez Gene entries and parse them for coordinates but I know of no way to remotely retrieve genomic sequences like this from NCBI (ENSEMBL API perhaps?). What I had in mind uses the local approach that some of us favor and to prove to myself that this is simple to do I wrote a script that I just added to examples/tools, it's called extract_genes.pl and it's based on Bio::DB::Fasta. Download the sequence files for a given species to some dir, download Entrez Gene's gene2accession file, and run. It creates and stores a hash for lookups, it won't read gene2accession each time it runs. Brian O. On 2/14/06 12:15 PM, "Harry Mangalam" wrote: > Hi Brian, > > Thanks very much for the pointers and the speed of your reply and apologies > for the speed of mine.
> > This looks good, but what I was looking for was a bioP approach for hooking to > an API at NCBI or EBI so I could get this info and seqs from them. In this > case, speed of retrieval is not critical and I'd rather not download the > entirety of the sequences to a local disk to hack at them. > > I've determined a screen-scraping approach to get them and could script that, > but I thought that bioP had a method for using NCBI's external API's, tho it > may be that my memory is faulty or the approach is no longer supported due to > overload. > > Does NCBI make such APIs available anymore? I searched a bit for docs on them > but couldn't find anything (unless it's buried in the NCBI tookit, which I > haven't started to excavate). > > Failing that, would SEALS provide such a service? Any PerlPinipeds listening? > > Harry > > > > > > > On Sunday 12 February 2006 08:37, Brian Osborne wrote: >> Harry, >> >> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So, >> from its documentation: >> >> use Bio::DB::Fasta; >> >> # create database from directory of fasta files >> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >> >> # simple access (for those without Bioperl) >> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); >> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); >> my @ids = $db->ids; >> my $length = $db->length('CHROMOSOME_I'); >> my $alphabet = $db->alphabet('CHROMOSOME_I'); >> my $header = $db->header('CHROMOSOME_I'); >> >> # Bioperl-style access >> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >> >> my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); >> my $seq = $obj->seq; >> my $subseq = $obj->subseq(4_000_000 => 4_100_000); >> >> Do you already have the offsets? >> >> Brian O. >> >> On 2/12/06 1:46 AM, "Harry Mangalam" wrote: >>> Hi All, >>> >>> After perusing the tutorial and other docs for a an evening, I still >>> can't find the answer to this. Forgive me if I've missed something >>> obvious. 
>>> >>> This should not be a novel request, but I've not found it answered. If >>> bioperl isn't the best way to do this, I'd be grateful for a pointer to a >>> better way, especially if it includes an illuminating bit of code. >>> >>> The problem is to retrieve genomic sequences plus & minus some offset >>> from a locus determined by HUGO keyword or GeneID. This would be a >>> common followup chore for some extra analysis from a gene expression >>> expt. Or maybe this is in the DBFetch routines, but I've missed the >>> sequence type to specify...? >>> >>> >>> TIA! From hlapp at gmx.net Thu Feb 16 01:31:54 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 15 Feb 2006 22:31:54 -0800 Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki, problems with bioperl-db In-Reply-To: <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu> References: <001201c631a5$ce7496f0$15327e82@pyrimidine> <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu> Message-ID: On Feb 15, 2006, at 7:56 PM, Chris Fields wrote: > [...] > I looked in the mailing list archives and Barry mentions something > here: > > http://bioperl.org/pipermail/bioperl-l/2005-January/018093.html > > He rebuilt the database from scratch and got it working; no reason > was given. I wouldn't be surprised if it is something MySQL-related > that pops up. Note though that he was using PostgreSQL. With Pg you definitely need to 'vacuum,' which is their name for analyzing/optimizing the table(s). > The strange thing is that only a few months ago > everything ran well with this version of MySQL (v.5); this was with > the first test database I installed on it. Another strange thing (I > think I mentioned it) is that NOT loading the taxonomy with > load_ncbi_taxonomy.pl worked (everything was entered). That's not really strange; it is in fact consistent with the query you report as taking a long time.
If you don't pre-load the taxonomy, then the taxon and taxon_name tables are empty or almost empty, and look-ups and joins of empty tables are amazingly fast :-J [...] > I wanted to also mention that we shouldn't check in the modifications > to Bio::Root::Root until I confirm something (I'm at home and > currently can't). OK, we'll hold off. -hilmar -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From michael.watson at bbsrc.ac.uk Thu Feb 16 05:31:54 2006 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Thu, 16 Feb 2006 10:31:54 -0000 Subject: [Bioperl-l] CONTIG sequence files from the NCBI Message-ID: <8975119BCD0AC5419D61A9CF1A923E95030081AC@iahce2ksrv1.iah.bbsrc.ac.uk> Hi I have two questions really. I fetched bacterial genome sequences from the NCBI using Bio::DB::GenBank. Some of these sequence entries are CONTIG sequences, i.e. they just point to other sequences that need to be joined together to form the entire genome. Looking at my downloads, it looks as if bioperl has done all the necessary joining for me - or maybe it was the NCBI that did the joining? OK, so firstly, did bioperl do the joining, and if so, are all the co-ordinates of the features updated to reflect their new location on the new, joined sequence? And secondly, sequence versions... I'm thinking that possibly the sequence version of the CONTIG may be 1 (as it hasn't changed) yet the versions of the sequences it refers to might have changed, so when I ask bioperl if these sequences have been updated, I will be told no because the CONTIG sequence version is 1, but I should be told yes because the underlying sequences have...? Make sense?
Thanks Mick From cjfields at uiuc.edu Thu Feb 16 07:51:50 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 16 Feb 2006 06:51:50 -0600 Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm version 1.28 In-Reply-To: <43F449E1.80605@esat.kuleuven.be> References: <20060215143941.54e91487@dogwood.plantbio.uga.edu> <43F449E1.80605@esat.kuleuven.be> Message-ID: <369C1D1F-DBCB-4161-A24A-7C3E579D337A@uiuc.edu> Yeah, looks like it broke text output nucleotide parsing with that. XML output parsing still works though (as expected). I'll give it a look. Chris On Feb 16, 2006, at 3:46 AM, Pieter Monsieurs wrote: > Hi, > > I have the same problem with the blast.pm-file. > The people of NCBI added some extra info when giving the Blast- > output. (see e.g. "Features flanking this part..." or "Features in > this part ..."), example added. > The blast.pm module starts looking for the hsp-alignment- > information, but it dies when it hits this Feature-information. > > Pieter > > >> gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group) chromosome 12, complete sequence > Length=27492551 > > Features flanking this part of subject sequence: > 3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class > 2655 bp at 3' side: hypothetical protein > > Score = 36.2 bits (18), Expect = 0.22 > Identities = 18/18 (100%), Gaps = 0/18 (0%) > Strand=Plus/Minus > > Query 4 GTACTACTCTACTCTACT 21 > |||||||||||||||||| > Sbjct 19257436 GTACTACTCTACTCTACT 19257419 > > > Features flanking this part of subject sequence: > 2991 bp at 5' side: hypothetical protein
> 1131 bp at 3' side: hypothetical protein > > Score = 36.2 bits (18), Expect = 0.22 > Identities = 18/18 (100%), Gaps = 0/18 (0%) > Strand=Plus/Minus > > Query 2 ATGTACTACTCTACTCTA 19 > |||||||||||||||||| > Sbjct 27006915 ATGTACTACTCTACTCTA 27006898 > > > > Features in this part of subject sequence: > DHHC zinc finger domain, putative > > Score = 34.2 bits (17), Expect = 0.87 > Identities = 17/17 (100%), Gaps = 0/17 (0%) > Strand=Plus/Plus > > Query 5 TACTACTCTACTCTACT 21 > ||||||||||||||||| > Sbjct 17616437 TACTACTCTACTCTACT 17616453 > > > > Features flanking this part of subject sequence: > 102 bp at 5' side: bZIP transcription factor, putative > 3740 bp at 3' side: yeast dcp1, putative > > Score = 32.2 bits (16), Expect = 3.4 > Identities = 16/16 (100%), Gaps = 0/16 (0%) > Strand=Plus/Plus > > Query 7 CTACTCTACTCTACTC 22 > |||||||||||||||| > Sbjct 2775880 CTACTCTACTCTACTC 2775895 > > > Features flanking this part of subject sequence: > 21 bp at 5' side: peptide transporter T17F3.11, putative > 10230 bp at 3' side: transposon protein, putative, unclassified > > Score = 32.2 bits (16), Expect = 3.4 > Identities = 16/16 (100%), Gaps = 0/16 (0%) > Strand=Plus/Minus > > Query 7 CTACTCTACTCTACTC 22 > |||||||||||||||| > Sbjct 27323153 CTACTCTACTCTACTC 27323138 > > > > > Guojun Yang wrote: > >> Hi, Chris, >> Finally the remoteblast test script works for the amino.fa query.
>> But when I try a nucleic acid sequence (see below), an error occurs: " >> waiting........ >> ------------- EXCEPTION ------------- >> MSG: no data for midline Features flanking this part of subject sequence: >> STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172 >> STACK toplevel remoteblast_test:40 >> " >> The query sequence is: >> CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC >> GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA >> AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG >> AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG >> >> The script (basically the same as the remoteblast test; I only changed >> database to 'nr' and program to 'blastn' and filename to 'ost3'): >> #!/usr/bin/perl >> >> use Bio::SeqIO; >> use Bio::Seq; >> use Bio::Tools::Run::RemoteBlast; >> use Bio::SearchIO; >> use strict; >> my $prog='blastn'; >> my $db='nr'; >> my $e_val=1e-10; >> my @params=( -prog=>$prog, >> -data=>$db, >> -expect=>$e_val, >> -readmethod=>'SearchIO'); >> my $factory=Bio::Tools::Run::RemoteBlast->new(@params); >> >> my $v = 1; >> >> my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' ); >> >> while (my $input = $str->next_seq()){ >> #Blast a sequence against a database: >> #Alternatively, you could pass in a file with many >> #sequences rather than loop through sequence one at a time >> #Remove the loop starting 'while (my $input = $str->next_seq())' >> #and swap the two lines below for an example of that. >> my $r = $factory->submit_blast($input); >> #my $r = $factory->submit_blast('amino.fa'); >> print STDERR "waiting..." if( $v > 0 ); >> while ( my @rids = $factory->each_rid ) { >> foreach my $rid ( @rids ) { >> my $rc = $factory->retrieve_blast($rid); >> if( !ref($rc) ) { >> if( $rc < 0 ) { >> $factory->remove_rid($rid); >> } >> print STDERR "."
if ( $v > 0 ); >> sleep 5; >> } else { >> my $result = $rc->next_result(); >> #save the output >> my $filename = $result->query_name()."\.out"; >> $factory->save_output($filename); >> $factory->remove_rid($rid); >> print "\nQuery Name: ", $result->query_name(), "\n"; >> while ( my $hit = $result->next_hit ) { >> next unless ( $v > 0); >> print "\thit name is ", $hit->name, "\n"; >> while( my $hsp = $hit->next_hsp ) { >> print "\t\tscore is ", $hsp->score, "\n"; >> } >> } >> } >> } >> } >> } >> >> >> Do you think there might still be something in the NCBI output >> format? >> >> Thank you, >> Guojun >> >> >> >> >> Guojun Yang >> Department of Plant Biology >> University of Georgia >> Tel: 706-542-1857 >> Fax: 706-542-1805 >> http://www.arches.uga.edu/~guojun >> >> >> >> ----- Original Message ----- >> From: Chris Fields [mailto:cjfields at uiuc.edu] >> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org >> Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2 >> >> >> >>> Sorry, forgot to add that I didn't see the regex issue that you >>> mentioned. >>> It could be a perl-related issue. Try the fixes I mentioned and >>> see what >>> happens. >>> >>>> Christopher Fields >>>> >>> Postdoctoral Researcher - Switzer Lab >>> Dept. of Biochemistry >>> University of Illinois Urbana-Champaign >>>>>> -----Original Message----- >>>>>> >>>> From: Chris Fields [mailto:cjfields at uiuc.edu] >>>> Sent: Tuesday, February 14, 2006 12:36 PM >>>> To: 'gyang at plantbio.uga.edu' >>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 >>>> >>>>>> It's a good habit to always add single quotes around words. >>>>>> The perl >>>>>> >>>> interpreter may think a single bare word is a subroutine or >>>> perlfunc >>>> called with no args so will try to find a subroutine named blastp >>>> (). My >>>> debugger actually gives the error that the bare word blastp may >>>> conflict >>>> with a future reserved word. Like you said, 'use strict' will >>>> point that >>>> out. 
>>>> >>>> As for the regex, it should match all the blast programs at NCBI (blastp, blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing else passes through. >>>> >>>> So, if you are using the script below, there are several errors. The bare words for $prog and $db need quotes, and the flags for your @params array don't have a dash before them. I get this after adding quotes but before adding the dashes to @params: >>>> >>>> C:\Perl\Scripts>test_blast.pl >>>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>>> MSG: >>>> STACK: Error::throw >>>> STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328 >>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325 >>>> STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256 >>>> STACK: C:\Perl\Scripts\test_blast.pl:15 >>>> ----------------------------------------------------------- >>>> >>>> The last line indicates a problem with this line: >>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params); >>>> Changing the @params to this: >>>> my @params=( -prog=>$prog, >>>> -data=>$db, >>>> -expect=>$e_val, >>>> -readmethod=>'SearchIO'); >>>> >>>> fixes it, and I get output as expected. >>>> Christopher Fields >>>> Postdoctoral Researcher - Switzer Lab >>>> Dept. of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>>> -----Original Message----- >>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu] >>>>> Sent: Tuesday, February 14, 2006 11:48 AM >>>>> To: Chris Fields; bioperl-l at lists.open-bio.org >>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 >>>>> >>>>> Hi, Chris, >>>>> When I tried with the perldoc script, it did not work either.
>>>>> First it >>>>> says $prog can not be bare word if I "use strict". I added >>>>> quotes on the >>>>> words, then it says the value for $prog does not match expression >>>>> t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. The >>>>> >>>> script >>>> >>>>> is shown below. Why is the expression "t?blast[pnx]"? >>>>> >>>>> #!/usr/bin/perl >>>>> >>>>> use Bio::SeqIO; >>>>> use Bio::Seq; >>>>> use Bio::Tools::Run::RemoteBlast; >>>>> use Bio::SearchIO; >>>>> >>>>> >>>>> my $prog=blastp; >>>>> my $db=swissprot; >>>>> my $e_val=1e-10; >>>>> my @params=( prog=>$prog, >>>>> data=>$db, >>>>> expect=>$e_val, >>>>> readmethod=>'SearchIO'); >>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params); >>>>> >>>>> my $v = 1; >>>>> >>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => >>>>> 'fasta' ); >>>>> >>>>> while (my $input = $str->next_seq()){ >>>>> #Blast a sequence against a database: >>>>> #Alternatively, you could pass in a file with many >>>>> #sequences rather than loop through sequence one at a time >>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>> #and swap the two lines below for an example of that. >>>>> my $r = $factory->submit_blast($input); >>>>> #my $r = $factory->submit_blast('amino.fa'); >>>>> print STDERR "waiting..." if( $v > 0 ); >>>>> while ( my @rids = $factory->each_rid ) { >>>>> foreach my $rid ( @rids ) { >>>>> my $rc = $factory->retrieve_blast($rid); >>>>> if( !ref($rc) ) { >>>>> if( $rc < 0 ) { >>>>> $factory->remove_rid($rid); >>>>> } >>>>> print STDERR "." 
if ( $v > 0 ); >>>>> sleep 5; >>>>> } else { >>>>> my $result = $rc->next_result(); >>>>> #save the output >>>>> my $filename = $result->query_name()."\.out"; >>>>> $factory->save_output($filename); >>>>> $factory->remove_rid($rid); >>>>> print "\nQuery Name: ", $result->query_name(), "\n"; >>>>> while ( my $hit = $result->next_hit ) { >>>>> next unless ( $v > 0); >>>>> print "\thit name is ", $hit->name, "\n"; >>>>> while( my $hsp = $hit->next_hsp ) { >>>>> print "\t\tscore is ", $hsp->score, "\n"; >>>>> } >>>>> } >>>>> } >>>>> } >>>>> } >>>>> } >>>>> >>>>> Thank you for your help! >>>>> >>>>> >>>>> Guojun >>>>> Department of Plant Biology >>>>> University of Georgia >>>>> >>>>> ----- Original Message ----- >>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] >>>>> To: gyang at plantbio.uga.edu >>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 >>>>> >>>>> >>>>> >>>>>> Try two things: >>>>>> >>>>>>> 1) Use a much simpler script, like the one in 'perldoc >>>>>>> >>>>>> Bio::Tools::Run::RemoteBlast'. If this fixes it, there's >>>>>> something >>>>>> >>>>> wrong >>>>> >>>>>> with the logic in your subroutine: >>>>>> >>>>>>> my $v = 1; >>>>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => >>>>>>> 'fasta' ); >>>>>>> while (my $input = $str->next_seq()){ >>>>>>> >>>>>> #Blast a sequence against a database: >>>>>> #Alternatively, you could pass in a file with many >>>>>> #sequences rather than loop through sequence one at a time >>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>>> #and swap the two lines below for an example of that. >>>>>> my $r = $factory->submit_blast($input); >>>>>> #my $r = $factory->submit_blast('amino.fa'); >>>>>> print STDERR "waiting..." 
if( $v > 0 ); >>>>>> while ( my @rids = $factory->each_rid ) { >>>>>> foreach my $rid ( @rids ) { >>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>> if( !ref($rc) ) { >>>>>> if( $rc < 0 ) { >>>>>> $factory->remove_rid($rid); >>>>>> } >>>>>> print STDERR "." if ( $v > 0 ); >>>>>> sleep 5; >>>>>> } else { >>>>>> my $result = $rc->next_result(); >>>>>> #save the output >>>>>> my $filename = $result->query_name()."\.out"; >>>>>> $factory->save_output($filename); >>>>>> $factory->remove_rid($rid); >>>>>> print "\nQuery Name: ", $result->query_name(), "\n"; >>>>>> while ( my $hit = $result->next_hit ) { >>>>>> next unless ( $v > 0); >>>>>> print "\thit name is ", $hit->name, "\n"; >>>>>> while( my $hsp = $hit->next_hsp ) { >>>>>> print "\t\tscore is ", $hsp->score, "\n"; >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> } >>>>>> >>>>>>> 2) Try the RemoteBlast from Bugzilla and see if that works. It >>>>>>> >>>> really >>>> >>>>>> shouldn't make that much of a difference, but I noticed that >>>>>> the CVS >>>>>> RemoteBlast (1.28) was changed in Dec 2005, after >>>>>> bioperl-1.5.1 was >>>>>> released; the Bugzilla version is based off CVS. >>>>>> >>>>>>> Christopher Fields >>>>>>> >>>>>> Postdoctoral Researcher - Switzer Lab >>>>>> Dept. of Biochemistry >>>>>> University of Illinois Urbana-Champaign >>>>>> >>>>>>>> -----Original Message----- >>>>>>>> >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang >>>>>>> Sent: Monday, February 13, 2006 3:00 PM >>>>>>> To: bioperl-l at lists.open-bio.org >>>>>>> Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28 >>>>>>> >>>>>>>>> Thanks, Chris, >>>>>>>>> >>>>>>> I installed version 1.5.1 and replaced the blast.pm file with >>>>>>> the >>>>>>> >>>> one >>>> >>>>> from >>>>> >>>>>>> your bug report. The running version is 1.5 when I use the >>>>>>> command >>>>>>> >>>> you >>>> >>>>>>> sent me. 
But when I tried the script, it doesn't change much. My >>>>>>> remoteblast code (portion) is here: >>>>>>> >>>>>>>>> sub search { >>>>>>>>> >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} >>>>>>> ="$ORGN"; >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7; >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'} >>>>>>> =5000; >>>>>>> local >>>>>>> >>>>>>> >>>> $Bio::Tools::Run::RemoteBlast::HEADER >>>> {'COMPOSITION_BASED_STATISTICS'}= >>>> >>>>>>> 'no'; >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1'; >>>>>>> my $query = Bio::Seq -> new ( -seq=>"$_[0]", >>>>>>> -id=>"query", >>>>>>> -desc=>"new seq"); >>>>>>> my $len=$query->length(); >>>>>>> @db=('nr','htgs','wgs'); >>>>>>> foreach my $db (@db) { >>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' >>>>>>> =>'blastn', >>>>>>> '-data' =>"$db", >>>>>>> >>>>>>> >>> '-expect'=>"$E_value"); >>> >>>>>>>>>>> my $blast_report = $factory->submit_blast($query); >>>>>>>>>>> >>>>>>>>> my @rids = $factory->each_rid(); >>>>>>>>> >>>>>>> foreach my $rid ( @rids ) { >>>>>>> print STDERR "$rid\n"; >>>>>>> } >>>>>>> # RID = Remote Blast ID (e.g: 1017772174-16400-6638) >>>>>>> print STDERR "waiting..."; >>>>>>> sleep 60; >>>>>>> >>>>>>>>> foreach my $rid ( @rids ) { >>>>>>>>> >>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>> while (!ref($rc) ) { >>>>>>> if( $rc < 0 ) { >>>>>>> # retrieve_blast returns -1 on error >>>>>>> $factory->remove_rid($rid); >>>>>>> print "Error!\n"; >>>>>>> send_error($email,$function,$seqname,$queryname[$ST]); >>>>>>> die "Can't retrieve $rid"; >>>>>>> } if ($rc==0) { # retrieve_blast returns 0 on 'job not >>>>>>> >>>> finished' >>>> >>>>>>> sleep 60; >>>>>>> $rc = $factory->retrieve_blast($rid); >>>>>>> } >>>>>>> } >>>>>>> if (ref($rc)) { >>>>>>> print STDERR "Done.\n"; >>>>>>> while( my $result = $rc->next_result) { >>>>>>> while( my $hit = $result->next_hit()) { >>>>>>> $hit_name=$hit->name; >>>>>>> $hit_name =~ 
/\S+[|](\S+)[.]\d+[|].*/; >>>>>>> $name=$1; >>>>>>> @left_plus_start=(); >>>>>>> @left_plus_end=(); >>>>>>> @left_minus_start=(); >>>>>>> @left_minus_end=(); >>>>>>> @right_plus_start=(); >>>>>>> @right_plus_end=(); >>>>>>> @right_minus_start=(); >>>>>>> @right_minus_end=(); >>>>>>> >>>>>>> if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) { >>>>>>> while( my $hsp = $hit->next_hsp()) { >>>>>>> ...... >>>>>>> >>>>>>> It was working quite well until around October last year, but it has stopped since then. When a submission is sent via a webpage, the cgi starts to work and uses ~20 Mb of memory. Then it hangs there; finally the expected email is received but without real results, although it does contain something from other parts of the script. Apparently the search sub did not return anything (I know there is something that should be returned). Is it also possible the format of the NCBI output for each result has changed? >>>>>>> Thank you, >>>>>>> Guojun >>>>>>> >>>>>>> Department of Plant Biology >>>>>>> University of Georgia >>>>>>> >>>>>>> ----- Original Message ----- >>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] >>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org >>>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 >>>>>>> >>>>>>>> How do you know two versions are installed (i.e. how are you checking the version)? Do you have two complete bioperl distributions (in two separate directories) or are you looking in modules?
Here's the way to check the version (from the FAQ): >>>>>>>> >>>>>>>> perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"' >>>>>>>> >>>>>>>> If you have two full bioperl distributions on your computer, normally only one will be in use unless you have explicitly set the environment variable PERL5LIB. The PERL5LIB directories will be searched first before your normal perl directory list (@INC) is searched. You MAY get some mixing then, but only if perl can't find a particular module in the path designated in PERL5LIB; then it will progress through the directories listed in @INC. This may happen if a module is unique to a particular release, but shouldn't happen for the majority of modules, including RemoteBlast. You can check what @INC and PERL5LIB are set to by using 'perl -V'. @INC will differ depending on your OS, perl build, etc. >>>>>>>> >>>>>>>> Regardless, if you follow the directions for installing bioperl for your system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you explicitly change the installation directory when using 'perl Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a problem as it will install the Bioperl distribution you downloaded over the old version in @INC.
>>>>>>>> See this page: >>>>>>>> >>>>>>>> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL >>>>>>>> for more details. >>>>>>>> Christopher Fields >>>>>>>> Postdoctoral Researcher - Switzer Lab >>>>>>>> Dept. of Biochemistry >>>>>>>> University of Illinois Urbana-Champaign >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang >>>>>>>>> Sent: Monday, February 13, 2006 12:32 PM >>>>>>>>> To: bioperl-l at lists.open-bio.org >>>>>>>>> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 >>>>>>>>> >>>>>>>>> Hi, Chris, >>>>>>>>> I do have different versions of bioperl on my Linux machine (1.4 and 1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or do I need to uninstall and remove the previous versions? I could not find any hint on uninstalling bioperl on Linux. Could you please give me some suggestions? >>>>>>>>> Thanks, >>>>>>>>> Guojun >>>>>>>>> >>>>>>>>> Department of Plant Biology >>>>>>>>> University of Georgia >>>>>>>>> _____ >>>>>>>>> >>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] >>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org >>>>>>>>> Sent: Mon, 13 Feb 2006 11:45:14 -0500 >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 >>>>>>>>> >>>>>>>>> If you're using RemoteBlast 1.28, then you've likely updated from CVS which isn't the latest fix.
>>>>>>>>> >>>>>>>>> Make sure that you check the following: >>>>>>>>> 1) Always post to the mailing list: http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance . >>>>>>>>> >>>>>>>>> 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed first. Perform a clean installation; do not upgrade only Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing modules from old and new distributions (1.4 and 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live installation will allow text output from BLAST v.2.2.12 to be saved and parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13) but it should still save it. I believe as long as next_result() isn't called, it will work. >>>>>>>>> >>>>>>>>> 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output are NOT in CVS; they haven't been cleared and checked in by Roger Hall (who's now taking care of RemoteBlast) and the powers that be (Jason or whoever is in charge of Bio::SearchIO). They can be found in Bugzilla: >>>>>>>>> >>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935 >>>>>>>>> >>>>>>>>> The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of saving XML output, so isn't necessary if you don't plan on using this option.
And, remember, they haven't been committed yet to CVS, which means that the final version will change to reflect the new version. >>>>>>>>> >>>>>>>>> Christopher Fields >>>>>>>>> Postdoctoral Researcher - Switzer Lab >>>>>>>>> Dept. of Biochemistry >>>>>>>>> University of Illinois Urbana-Champaign >>>>>>>>> >>>>>>>>> _____ >>>>>>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu] >>>>>>>>> Sent: Monday, February 13, 2006 9:26 AM >>>>>>>>> To: Chris Fields >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 >>>>>>>>> >>>>>>>>> Hi, Chris >>>>>>>>> Thanks for your suggestion; however, it doesn't seem to work for my cgi even after I replace both blast.pm and RemoteBlast.pm. I didn't even get any RID. Is there any suggestion? >>>>>>>>> >>>>>>>>> Guojun >>>>>>>>> >>>>>>>>> Guojun Yang >>>>>>>>> Department of Plant Biology >>>>>>>>> University of Georgia >>>>>>>>> Tel: 706-542-1857 >>>>>>>>> Fax: 706-542-1805 >>>>>>>>> http://www.arches.uga.edu/~guojun >>>>>>>>> _____ >>>>>>>>> >>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] >>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org >>>>>>>>> Sent: Fri, 03 Feb 2006 16:07:29 -0500 >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 >>>>>>>>> >>>>>>>>> I would say give the new code a try, but realize that it hasn't been checked in (like I said below).
I will try going over the modified Bio::SearchIO::blast again this weekend to see if there is anything I might have missed. The changed order in the header of BLAST text output has me a bit worried that it might not catch everything, but it at least doesn't hang in the while() loop I described in the bug report below (bug #1934) and seems to process everything fine. >>>>>>>>> >>>>>>>>> If you want more stability in the code, you might consider changing over to XML output and parsing with Bio::SearchIO::blastxml. There are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML output, but I believe it parses everything regardless. If you look back over the last month or so there has been a bit of discussion here about it. Jason describes a bit on how to set up RemoteBlast for XML: >>>>>>>>> >>>>>>>>> http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/ >>>>>>>>> >>>>>>>>> Christopher Fields >>>>>>>>> Postdoctoral Researcher - Switzer Lab >>>>>>>>> Dept.
of Biochemistry >>>>>>>>> University of Illinois Urbana-Champaign >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang >>>>>>>>>> Sent: Friday, February 03, 2006 1:45 PM >>>>>>>>>> To: bioperl-l at bioperl.org >>>>>>>>>> Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28 >>>>>>>>>> >>>>>>>>>> Hi, Everybody, >>>>>>>>>> I see this post and am wondering if this is the reason for the malfunctioning of my webserver. We set up a webserver named MAK, for MITE sequence analysis. It was working very well until around November 2005, when it stopped returning any result (the site is fine and seems to be doing something after submission). In the CGI script, I used remoteblast (that work was done in 2003) to do searches. I currently do not have access to the server because I moved. Quite a few people sent emails to us about its malfunctioning. Is there any suggestion on fixing the problem? Should I simply ask that the remoteblast.pm be replaced with the new version?
>>>>> >>>>>>>>>> Thanks a lot, >>>>>>>>>> Guojun >>>>>>>>>> >>>>>>>>>> Department of Plant Biology >>>>>>>>>> University of Georgia >>>>>>>>>> Tel: 706-542-1857 >>>>>>>>>> Fax: 706-542-1805 >>>>>>>>>> http://www.arches.uga.edu/~guojun >>>>>>>>>> _____ >>>>>>>>>> >>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] >>>>>>>>>> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang >>>>>>>>>> >>>>> Jian' >>>>> >>>>>>>>>> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' >>>>>>>>>> >>>> [mailto:bioperl- >>>> >>>>>>>>>> l at bioperl.org] >>>>>>>>>> Sent: Fri, 03 Feb 2006 10:45:23 -0500 >>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 >>>>>>>>>> >>>>>>>>>> Like Nagesh says, try the latest RemoteBlast from bioperl- >>>>>>>>>> live >>>>>>>>>> >>>>> CVS. >>>>> >>>>>>> It >>>>>>> >>>>>>>>>> will >>>>>>>>>> work for saving text output. However, it will not parse >>>>>>>>>> >>>> anything >>>> >>>>>>> using >>>>>>> >>>>>>>>>> next_result (it will likely hang) and will not save XML >>>>>>>>>> >>>> format. >>>> >>>>> See >>>>> >>>>>>>>> these >>>>>>>>> >>>>>>>>>> bugs: >>>>>>>>>> >>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935 >>>>>>>>>> >>>>>>>>>> for explanations and possible fixes (changes to RemoteBlast >>>>>>>>>> >>>> and >>>> >>>>>>>>>> Bio::SearchIO::blast). Note that these haven't been >>>>>>>>>> checked in >>>>>>>>>> >>>>> yet >>>>> >>>>>>> so >>>>>>> >>>>>>>>> are >>>>>>>>> >>>>>>>>>> still not included in bioperl-live; they may be further >>>>>>>>>> >>>> modified >>>> >>>>>>> before >>>>>>> >>>>>>>>>> committing to CVS. If you're not worried about XML, you could >>>>>>>>>> >>>>> just >>>>> >>>>>>> try >>>>>>> >>>>>>>>> the >>>>>>>>> >>>>>>>>>> first fix, which is a change to SearchIO::blast. 
>>>>>>>>>> >>>>>>>>>> Nagesh, I remember you posting to the list a month ago >>>>>>>>>> using a >>>>>>>>>> >>>>>>> script >>>>>>> >>>>>>>>>> which >>>>>>>>>> had problems; the script you used saves the output but >>>>>>>>>> doesn't >>>>>>>>>> >>>>>>> actually >>>>>>> >>>>>>>>>> parse it (i.e. you don't use next_result() to go through the >>>>>>>>>> >>>>> data). >>>>> >>>>>>> Is >>>>>>> >>>>>>>>> the >>>>>>>>> >>>>>>>>>> version of BLAST in your text output 2.2.12 or 2.2.13? Have >>>>>>>>>> >>>> you >>>> >>>>>>> tried >>>>>>> >>>>>>>>>> parsing the output using "-readmethod => SearchIO" or "- >>>>>>>>>> >>>>> readmethod >>>>> >>>>>>> => >>>>>>> >>>>>>>>>> blast" >>>>>>>>>> using your version of RemoteBlast and method next_result()? >>>>>>>>>> >>>> Like >>>> >>>>>>> below >>>>>>> >>>>>>>>>> (from >>>>>>>>>> perldoc): >>>>>>>>>> >>>>>>>>>> while ( my @rids = $factory->each_rid ) { >>>>>>>>>> foreach my $rid ( @rids ) { >>>>>>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>>>>>> if( !ref($rc) ) { >>>>>>>>>> if( $rc < 0 ) { >>>>>>>>>> $factory->remove_rid($rid); >>>>>>>>>> } >>>>>>>>>> print STDERR "." if ( $v > 0 ); >>>>>>>>>> sleep 5; >>>>>>>>>> } else { # parsing >>>>>>>>>> starts here >>>>>>>>>> my $result = $rc->next_result(); # it should hang >>>>>>>>>> here >>>>>>>>>> #save the output >>>>>>>>>> my $filename = $result->query_name()."\.out"; >>>>>>>>>> $factory->save_output($filename); >>>>>>>>>> $factory->remove_rid($rid); >>>>>>>>>> print "\nQuery Name: ", $result->query_name(), "\n"; >>>>>>>>>> while ( my $hit = $result->next_hit ) { >>>>>>>>>> next unless ( $v > 0); >>>>>>>>>> print "\thit name is ", $hit->name, "\n"; >>>>>>>>>> while( my $hsp = $hit->next_hsp ) { >>>>>>>>>> print "\t\tscore is ", $hsp->score, "\n"; >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> My script hanged if I used next_result() in any way prior to >>>>>>>>>> >>>> the >>>> >>>>>>> fixes. 
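The retrieval loop quoted above keeps polling until a result object comes back. As a rough, BioPerl-free sketch of the same pattern with a cap on the number of polls: the stub `make_fake_retrieve` is hypothetical and only mimics the return convention used in this thread (a negative value on error, 0 while the job is pending, a reference once results are ready), and `poll_until_ready` with its parameters is likewise illustrative, not part of RemoteBlast. Note that this only bounds the waiting phase; it would not help with a hang inside next_result().

```perl
use strict;
use warnings;

# Hypothetical stand-in for $factory->retrieve_blast($rid):
# returns 0 ("not finished") for the first few calls, then a reference.
sub make_fake_retrieve {
    my ($pending_polls) = @_;
    my $calls = 0;
    return sub {
        my ($rid) = @_;
        return 0 if $calls++ < $pending_polls;   # job still running
        return { rid => $rid };                  # pretend result object
    };
}

# Poll $retrieve->($rid) until it returns a reference, reports an
# error (< 0), or $max_polls attempts are used up.
# Returns the reference on success, undef otherwise.
sub poll_until_ready {
    my ($retrieve, $rid, $max_polls, $delay) = @_;
    for my $attempt (1 .. $max_polls) {
        my $rc = $retrieve->($rid);
        return $rc if ref $rc;       # results are ready
        return if $rc < 0;           # error reported, stop polling
        sleep $delay if $delay;      # wait before the next poll
    }
    return;                          # gave up: too many polls
}

my $ready = poll_until_ready(make_fake_retrieve(2), 'RID-EXAMPLE', 5, 0);
print defined $ready ? "got result for $ready->{rid}\n" : "no result\n";

my $stuck = poll_until_ready(make_fake_retrieve(10), 'RID-EXAMPLE', 5, 0);
print defined $stuck ? "got result\n" : "gave up after 5 polls\n";
```

In a real script the delay would be a few seconds or more, as in the quoted loop; it is 0 here only so the sketch runs instantly.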
>>>>>>> >>>>>>>>> I >>>>>>>>> >>>>>>>>>> want to see how many others are having the same issues with >>>>>>>>>> >>>>> parsing >>>>> >>>>>>>>> using >>>>>>>>> >>>>>>>>>> the CVS version of bioperl-live. >>>>>>>>>> >>>>>>>>>> Christopher Fields >>>>>>>>>> Postdoctoral Researcher - Switzer Lab >>>>>>>>>> Dept. of Biochemistry >>>>>>>>>> University of Illinois Urbana-Champaign >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> -----Original Message----- >>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl- >>>>>>>>>>> >>>> l- >>>> >>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka >>>>>>>>>>> Sent: Thursday, February 02, 2006 7:24 PM >>>>>>>>>>> To: Huang Jian; bioperl-l >>>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 >>>>>>>>>>> >>>>>>>>>>> Hi Huang, >>>>>>>>>>> Thanks for the message. The older version of RemoteBlast.pm >>>>>>>>>>> >>>>> works >>>>> >>>>>>> on >>>>>>> >>>>>>>>> the >>>>>>>>> >>>>>>>>>>> logic of checking the temporary file size to determine >>>>>>>>>>> >>>> whether >>>> >>>>> the >>>>> >>>>>>>>> Blast >>>>>>>>> >>>>>>>>>>> results are ready. This condition is not getting satisfied >>>>>>>>>>> >>>> may >>>> >>>>> be >>>>> >>>>>>> due >>>>>>> >>>>>>>>> to >>>>>>>>> >>>>>>>>>>> some changes brought about by NCBI. I had this problem >>>>>>>>>>> >>>>> recently >>>>> >>>>>>> and >>>>>>> >>>>>>>>>>> figured out that the solution was to use the latest version >>>>>>>>>>> >>>>> which >>>>> >>>>>>> has >>>>>>> >>>>>>>>>>> this problem fixed (does not use file size logic any more) >>>>>>>>>>> >>>>> which >>>>> >>>>>>> is >>>>>>> >>>>>>>>> not >>>>>>>>> >>>>>>>>>>> yet included in the BioPerl package. >>>>>>>>>>> Cheers >>>>>>>>>>> Nagesh >>>>>>>>>>> >>>>>>>>>>> Huang Jian wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Dear Nagesh, >>>>>>>>>>>> >>>>>>>>>>>> I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 >>>>>>>>>>>> >>>>> you >>>>> >>>>>>> send >>>>>>> >>>>>>>>>>>> me. Now it works perfectly!!! 
>>>>>>>>>>>> >>>>>>>>>>>> Thank you!! >>>>>>>>>>>> >>>>>>>>>>>> Huang >>>>>>>>>>>> >>>>>>>>>>>> ----- Original Message ----- From: "Nagesh Chakka" >>>>>>>>>>>> >>>>>>>>>>>> To: "Huang Jian" ; "bioperl-l" >>>>>>>>>>>> >>>>>>>>>>>> Sent: Friday, February 03, 2006 7:48 AM >>>>>>>>>>>> Subject: Re: [Bioperl-l] Sorry, failure in post on the >>>>>>>>>>>> >>>> net, >>>> >>>>> so >>>>> >>>>>>> still >>>>>>> >>>>>>>>>>>> via email >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Hi Huang, >>>>>>>>>>>>> I see that you are submitting a sequence for a remote >>>>>>>>>>>>> >>>> blast >>>> >>>>>>> search. >>>>>>> >>>>>>>>>> Can >>>>>>>>>> >>>>>>>>>>>>> you check if the RemoteBlast.pm being used is v 1.28 >>>>>>>>>>>>> >>>>>>> (2005/12/09). >>>>>>> >>>>>>>>> If >>>>>>>>> >>>>>>>>>>>>> not I have attached it with this email, try to replace it >>>>>>>>>>>>> >>>>> with >>>>> >>>>>>> the >>>>>>> >>>>>>>>>> old >>>>>>>>>> >>>>>>>>>>>>> one which has a bug. >>>>>>>>>>>>> Let me know if it works. >>>>>>>>>>>>> Nagesh >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Bioperl-l mailing list >>>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list >>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list >>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>> >>>>>>> _______________________________________________ >>>>>>> >>>>>>>>> Bioperl-l mailing list >>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> >>>>>>> Bioperl-l 
mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu  Thu Feb 16 07:52:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 06:52:31 -0600
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs
In-Reply-To:
References:
Message-ID:

I think a method was recently implemented in Bio::DB::GenBank to
retrieve a segment of DNA given start and end coordinates in GenBank
format; that should contain the features you need. I requested it
~Nov-Dec in the mailing list but didn't get a chance to test it.
Would that help?

On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:

> Harry,
>
> It's not clear to me that NCBI's eutils offers this capability
> directly. You can probably download Entrez Gene entries and parse
> them for coordinates but I know of no way to remotely retrieve
> genomic sequences like this from NCBI (ENSEMBL API perhaps?). What I
> had in mind uses the local approach that some of us favor and to
> prove to myself that this is simple to do I wrote a script that I
> just added to examples/tools, it's called extract_genes.pl and it's
> based on Bio::DB::Fasta. Download the sequence files for a given
> species to some dir, download Entrez Gene's gene2accession file, and
> run. It creates and stores a hash for lookups, it won't read
> gene2accession each time it runs.
>
> Brian O.
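For the question that started this thread (a locus plus and minus some offset), the coordinate arithmetic can be sketched separately from any particular sequence source. The helper below is hypothetical, not a BioPerl function; it assumes 1-based inclusive coordinates and simply clamps the padded window to the sequence length. The resulting start and end could then be handed to whatever retrieval method is in use.

```perl
use strict;
use warnings;

# Hypothetical helper: widen [start, end] by $flank on both sides,
# clamped to 1 .. $seq_length (1-based, inclusive coordinates assumed).
sub flank_window {
    my ($start, $end, $flank, $seq_length) = @_;
    ($start, $end) = ($end, $start) if $start > $end;   # normalise order
    my $from = $start - $flank;
    my $to   = $end + $flank;
    $from = 1           if $from < 1;            # do not run off the 5' end
    $to   = $seq_length if $to > $seq_length;    # nor off the 3' end
    return ($from, $to);
}

my ($from, $to) = flank_window(4_000_000, 4_100_000, 5_000, 15_000_000);
print "fetch $from..$to\n";    # fetch 3995000..4105000
```

Strand handling is deliberately left out: the sketch always returns the window in ascending order, so a minus-strand gene would still need to be reverse-complemented after retrieval.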
>
>
> On 2/14/06 12:15 PM, "Harry Mangalam" wrote:
>
>> Hi Brian,
>>
>> Thanks very much for the pointers and the speed of your reply and
>> apologies for the speed of mine.
>>
>> This looks good, but what I was looking for was a bioP approach for
>> hooking to an API at NCBI or EBI so I could get this info and seqs
>> from them. In this case, speed of retrieval is not critical and I'd
>> rather not download the entirety of the sequences to a local disk to
>> hack at them.
>>
>> I've determined a screen-scraping approach to get them and could
>> script that, but I thought that bioP had a method for using NCBI's
>> external APIs, tho it may be that my memory is faulty or the approach
>> is no longer supported due to overload.
>>
>> Does NCBI make such APIs available anymore? I searched a bit for docs
>> on them but couldn't find anything (unless it's buried in the NCBI
>> toolkit, which I haven't started to excavate).
>>
>> Failing that, would SEALS provide such a service? Any PerlPinipeds
>> listening?
>>
>> Harry
>>
>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>> Harry,
>>>
>>> Hope you're doing well. The approach could be based on
>>> Bio::DB::Fasta.
>>> So, from its documentation:
>>>
>>>   use Bio::DB::Fasta;
>>>
>>>   # create database from directory of fasta files
>>>   my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>
>>>   # simple access (for those without Bioperl)
>>>   my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>   my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>   my @ids = $db->ids;
>>>   my $length = $db->length('CHROMOSOME_I');
>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>   my $header = $db->header('CHROMOSOME_I');
>>>
>>>   # Bioperl-style access
>>>   my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>
>>>   my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
>>>   my $seq = $obj->seq;
>>>   my $subseq = $obj->subseq(4_000_000 => 4_100_000);
>>>
>>> Do you already have the offsets?
>>>
>>> Brian O.
>>>
>>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote:
>>>> Hi All,
>>>>
>>>> After perusing the tutorial and other docs for an evening, I still
>>>> can't find the answer to this. Forgive me if I've missed something
>>>> obvious.
>>>>
>>>> This should not be a novel request, but I've not found it answered.
>>>> If bioperl isn't the best way to do this, I'd be grateful for a
>>>> pointer to a better way, especially if it includes an illuminating
>>>> bit of code.
>>>>
>>>> The problem is to retrieve genomic sequences plus & minus some
>>>> offset from a locus determined by HUGO keyword or GeneID. This
>>>> would be a common followup chore for some extra analysis from a
>>>> gene expression expt. Or maybe this is in the DBFetch routines,
>>>> but I've missed the sequence type to specify...?
>>>>
>>>>
>>>> TIA!
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From anst at kvl.dk  Thu Feb 16 04:24:51 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Thu, 16 Feb 2006 10:24:51 +0100
Subject: [Bioperl-l] searchIO bug?
Message-ID: <43F452F30200009B00000EC9@gwia.kvl.dk>

Hi!

I am blasting a protein seq against an identical protein. I am trying
to parse the protein header by using the query_description method in
the SearchIO module. After using the query_description method I use
split / / in order to easily access the different header components.

Here I discover that the query_description method is somehow
introducing a space between comma number 5 and the chromosome position
number that follows it in the exon chromosome position list!? This
truncates the list of exon chromosome positions from 7 to 4, which
later yields a wrong intron count.

Is this a bug?

Attached is:

testblast1.pl: the blast program to run.
Q0045: the seq that is used as both query and database seq. (Q0045 has
to be formatted in order to be used as a database:
formatdb -i Q0045 -p T -o F)

Regards Anders.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastp5.pl
Type: application/octet-stream
Size: 50384 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060216/c1dd1ff5/attachment-0002.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045
Type: application/octet-stream
Size: 873 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060216/c1dd1ff5/attachment-0003.obj

From anst at kvl.dk  Thu Feb 16 05:20:06 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Thu, 16 Feb 2006 11:20:06 +0100
Subject: [Bioperl-l] another searchIO bug?
Message-ID: <43F45FE60200009B00000ED6@gwia.kvl.dk>

Hi!

I am blasting a protein seq (query) against an identical seq with a
deletion of Aa nr 61 (subject).
Then I print out the type of nomatch Aa and its position.

The nomatch for the query seq is Aa G at position 61, which is correct.
The nomatch for the subject seq is V at position 60, which is
definitely not correct!?

Is this a bug?

testblast2.pl is the program to run.
Q0045 is the query seq.
Q0045del61 is the subject seq (it has to be formatted:
formatdb -i Q0045del61 -p T -o F).

Regards Anders.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045
Type: application/octet-stream
Size: 873 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060216/5062b2cb/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testblast2.pl
Type: application/octet-stream
Size: 6109 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060216/5062b2cb/attachment-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045del61
Type: application/octet-stream
Size: 872 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060216/5062b2cb/attachment-0002.obj

From mcoyne at channing.harvard.edu  Wed Feb 15 16:20:17 2006
From: mcoyne at channing.harvard.edu (Michael Coyne)
Date: Wed, 15 Feb 2006 16:20:17 -0500
Subject: [Bioperl-l] Primer maps?
Message-ID: <6.2.0.14.0.20060215155422.01d44a98@localhost>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060215/c777b31d/attachment-0001.html

From Pieter.Monsieurs at esat.kuleuven.be  Thu Feb 16 04:46:09 2006
From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs)
Date: Thu, 16 Feb 2006 10:46:09 +0100
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm version 1.28
In-Reply-To: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
References: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
Message-ID: <43F449E1.80605@esat.kuleuven.be>

Hi,

I have the same problem with the blast.pm file. NCBI has added some
extra information to the BLAST output (see e.g. "Features flanking
this part..." or "Features in this part ..." in the example below).
The blast.pm module starts looking for the HSP alignment information,
but it dies when it hits this feature information.

Pieter

>gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group) chromosome 12, complete sequence
Length=27492551

 Features flanking this part of subject sequence:
   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
   2655 bp at 3' side: hypothetical protein

 Score = 36.2 bits (18), Expect = 0.22
 Identities = 18/18 (100%), Gaps = 0/18 (0%)
 Strand=Plus/Minus

Query  4         GTACTACTCTACTCTACT  21
                 ||||||||||||||||||
Sbjct  19257436  GTACTACTCTACTCTACT  19257419

 Features flanking this part of subject sequence:
   2991 bp at 5' side: hypothetical protein
   1131 bp at 3' side: hypothetical protein

 Score = 36.2 bits (18), Expect = 0.22
 Identities = 18/18 (100%), Gaps = 0/18 (0%)
 Strand=Plus/Minus

Query  2         ATGTACTACTCTACTCTA  19
                 ||||||||||||||||||
Sbjct  27006915  ATGTACTACTCTACTCTA  27006898

 Features in this part of subject sequence:
   DHHC zinc finger domain, putative

 Score = 34.2 bits (17), Expect = 0.87
 Identities = 17/17 (100%), Gaps = 0/17 (0%)
 Strand=Plus/Plus

Query  5         TACTACTCTACTCTACT  21
                 |||||||||||||||||
Sbjct  17616437  TACTACTCTACTCTACT  17616453

 Features flanking this part of subject sequence:
   102 bp at 5' side:
bZIP transcription factor, putative
   3740 bp at 3' side: yeast dcp1, putative

 Score = 32.2 bits (16), Expect = 3.4
 Identities = 16/16 (100%), Gaps = 0/16 (0%)
 Strand=Plus/Plus

Query  7         CTACTCTACTCTACTC  22
                 ||||||||||||||||
Sbjct  2775880   CTACTCTACTCTACTC  2775895

 Features flanking this part of subject sequence:
   21 bp at 5' side: peptide transporter T17F3.11, putative
   10230 bp at 3' side: transposon protein, putative, unclassified

 Score = 32.2 bits (16), Expect = 3.4
 Identities = 16/16 (100%), Gaps = 0/16 (0%)
 Strand=Plus/Minus

Query  7         CTACTCTACTCTACTC  22
                 ||||||||||||||||
Sbjct  27323153  CTACTCTACTCTACTC  27323138

Guojun Yang wrote:

>Hi, Chris,
>Finally the remoteblast test script works for the amino.fa query. But
>when I try a nucleic acid sequence (see below), an error occurs:
>"
>waiting........
>------------- EXCEPTION -------------
>MSG: no data for midline Features flanking this part of subject sequence:
>STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
>STACK toplevel remoteblast_test:40
>"
>The query sequence is:
>CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
>GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
>AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
>AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
>
>The script (basically the same as the remoteblast test; I only changed
>the database to 'nr', the program to 'blastn' and the filename to 'ost3'):
>#!/usr/bin/perl
>
>use Bio::SeqIO;
>use Bio::Seq;
>use Bio::Tools::Run::RemoteBlast;
>use Bio::SearchIO;
>use strict;
>my $prog='blastn';
>my $db='nr';
>my $e_val=1e-10;
>my @params=( -prog=>$prog,
>             -data=>$db,
>             -expect=>$e_val,
>             -readmethod=>'SearchIO');
>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>
>my $v = 1;
>
>my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
>
>while (my $input = $str->next_seq()){
>    #Blast a sequence against a database:
>    #Alternatively, you could pass in a file with many
>    #sequences rather than loop through sequence one at a time
>    #Remove the loop starting 'while (my $input = $str->next_seq())'
>    #and swap the two lines below for an example of that.
>    my $r = $factory->submit_blast($input);
>    #my $r = $factory->submit_blast('amino.fa');
>    print STDERR "waiting..." if( $v > 0 );
>    while ( my @rids = $factory->each_rid ) {
>        foreach my $rid ( @rids ) {
>            my $rc = $factory->retrieve_blast($rid);
>            if( !ref($rc) ) {
>                if( $rc < 0 ) {
>                    $factory->remove_rid($rid);
>                }
>                print STDERR "." if ( $v > 0 );
>                sleep 5;
>            } else {
>                my $result = $rc->next_result();
>                #save the output
>                my $filename = $result->query_name()."\.out";
>                $factory->save_output($filename);
>                $factory->remove_rid($rid);
>                print "\nQuery Name: ", $result->query_name(), "\n";
>                while ( my $hit = $result->next_hit ) {
>                    next unless ( $v > 0);
>                    print "\thit name is ", $hit->name, "\n";
>                    while( my $hsp = $hit->next_hsp ) {
>                        print "\t\tscore is ", $hsp->score, "\n";
>                    }
>                }
>            }
>        }
>    }
>}
>
>
>Do you think there might still be something in the NCBI output format?
>
>Thank you,
>Guojun
>
>Guojun Yang
>Department of Plant Biology
>University of Georgia
>Tel: 706-542-1857
>Fax: 706-542-1805
>http://www.arches.uga.edu/~guojun
>
>----- Original Message -----
>From: Chris Fields [mailto:cjfields at uiuc.edu]
>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
>
>>Sorry, forgot to add that I didn't see the regex issue that you mentioned.
>>It could be a perl-related issue. Try the fixes I mentioned and see what
>>happens.
>>
>>Christopher Fields
>>Postdoctoral Researcher - Switzer Lab
>>Dept.
>>of Biochemistry
>>University of Illinois Urbana-Champaign
>>
>>>-----Original Message-----
>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>Sent: Tuesday, February 14, 2006 12:36 PM
>>>To: 'gyang at plantbio.uga.edu'
>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>>
>>>It's a good habit to always add single quotes around words. The perl
>>>interpreter may think a single bare word is a subroutine or perlfunc
>>>called with no args so will try to find a subroutine named blastp(). My
>>>debugger actually gives the error that the bare word blastp may conflict
>>>with a future reserved word. Like you said, 'use strict' will point that
>>>out.
>>>
>>>As for the regex, it should match all the blast programs at NCBI (blastp,
>>>blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
>>>else passes through.
>>>
>>>So, if you are using the script below, there are several errors. The bare
>>>words for $prog and $db need quotes, and the flags for your @params array
>>>don't have a dash before them.
>>>I get this after adding quotes but before adding the dashes to @params:
>>>
>>>C:\Perl\Scripts>test_blast.pl
>>>------------- EXCEPTION: Bio::Root::Exception -------------
>>>MSG:
>>>STACK: Error::throw
>>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
>>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
>>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
>>>STACK: C:\Perl\Scripts\test_blast.pl:15
>>>-----------------------------------------------------------
>>>
>>>The last line indicates a problem with this line:
>>>
>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>>
>>>Changing the @params to this:
>>>
>>>my @params=( -prog=>$prog,
>>>             -data=>$db,
>>>             -expect=>$e_val,
>>>             -readmethod=>'SearchIO');
>>>
>>>fixes it, and I get output as expected.
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>>-----Original Message-----
>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
>>>>Sent: Tuesday, February 14, 2006 11:48 AM
>>>>To: Chris Fields; bioperl-l at lists.open-bio.org
>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>>>
>>>>Hi, Chris,
>>>>When I tried with the perldoc script, it did not work either. First it
>>>>says $prog can not be a bare word if I "use strict". I added quotes on the
>>>>words, then it says the value for $prog does not match expression
>>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. The script
>>>>is shown below. Why is the expression "t?blast[pnx]"?
>>>>
>>>>#!/usr/bin/perl
>>>>
>>>>use Bio::SeqIO;
>>>>use Bio::Seq;
>>>>use Bio::Tools::Run::RemoteBlast;
>>>>use Bio::SearchIO;
>>>>
>>>>
>>>>my $prog=blastp;
>>>>my $db=swissprot;
>>>>my $e_val=1e-10;
>>>>my @params=( prog=>$prog,
>>>>             data=>$db,
>>>>             expect=>$e_val,
>>>>             readmethod=>'SearchIO');
>>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>>>
>>>>my $v = 1;
>>>>
>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
>>>>
>>>>while (my $input = $str->next_seq()){
>>>>    #Blast a sequence against a database:
>>>>    #Alternatively, you could pass in a file with many
>>>>    #sequences rather than loop through sequence one at a time
>>>>    #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>    #and swap the two lines below for an example of that.
>>>>    my $r = $factory->submit_blast($input);
>>>>    #my $r = $factory->submit_blast('amino.fa');
>>>>    print STDERR "waiting..." if( $v > 0 );
>>>>    while ( my @rids = $factory->each_rid ) {
>>>>        foreach my $rid ( @rids ) {
>>>>            my $rc = $factory->retrieve_blast($rid);
>>>>            if( !ref($rc) ) {
>>>>                if( $rc < 0 ) {
>>>>                    $factory->remove_rid($rid);
>>>>                }
>>>>                print STDERR "." if ( $v > 0 );
>>>>                sleep 5;
>>>>            } else {
>>>>                my $result = $rc->next_result();
>>>>                #save the output
>>>>                my $filename = $result->query_name()."\.out";
>>>>                $factory->save_output($filename);
>>>>                $factory->remove_rid($rid);
>>>>                print "\nQuery Name: ", $result->query_name(), "\n";
>>>>                while ( my $hit = $result->next_hit ) {
>>>>                    next unless ( $v > 0);
>>>>                    print "\thit name is ", $hit->name, "\n";
>>>>                    while( my $hsp = $hit->next_hsp ) {
>>>>                        print "\t\tscore is ", $hsp->score, "\n";
>>>>                    }
>>>>                }
>>>>            }
>>>>        }
>>>>    }
>>>>}
>>>>
>>>>Thank you for your help!
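As a small standalone illustration of the program-name check quoted in the error message: the pattern t?blast[pnx] accepts the five NCBI program names listed in the reply earlier in this thread (blastp, blastn, blastx, tblastn, tblastx). The anchoring with ^ and $ and the helper name `looks_like_blast_program` are assumptions made for this sketch; how RemoteBlast applies the pattern internally is not shown here.

```perl
use strict;
use warnings;

# Illustrative check only; the ^...$ anchors are an assumption of this
# sketch, not taken from the RemoteBlast source.
sub looks_like_blast_program {
    my ($name) = @_;
    return defined $name && $name =~ /^t?blast[pnx]$/ ? 1 : 0;
}

for my $name (qw(blastp blastn blastx tblastn tblastx megablast), '') {
    my $verdict = looks_like_blast_program($name) ? 'accepted' : 'rejected';
    printf "%-12s %s\n", "'$name'", $verdict;
}
```

The pattern is slightly looser than the list of real programs: it would also accept the string 'tblastp'. An empty or undefined value is rejected.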
>>>> >>>> >>>>Guojun >>>>Department of Plant Biology >>>>University of Georgia >>>> >>>>----- Original Message ----- >>>>From: Chris Fields [mailto:cjfields at uiuc.edu] >>>>To: gyang at plantbio.uga.edu >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 >>>> >>>> >>>> >>>> >>>>>Try two things: >>>>> >>>>> >>>>>>1) Use a much simpler script, like the one in 'perldoc >>>>>> >>>>>> >>>>>Bio::Tools::Run::RemoteBlast'. If this fixes it, there's something >>>>> >>>>> >>>>wrong >>>> >>>> >>>>>with the logic in your subroutine: >>>>> >>>>> >>>>>>my $v = 1; >>>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' ); >>>>>>while (my $input = $str->next_seq()){ >>>>>> >>>>>> >>>>> #Blast a sequence against a database: >>>>> #Alternatively, you could pass in a file with many >>>>> #sequences rather than loop through sequence one at a time >>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' >>>>> #and swap the two lines below for an example of that. >>>>> my $r = $factory->submit_blast($input); >>>>> #my $r = $factory->submit_blast('amino.fa'); >>>>> print STDERR "waiting..." if( $v > 0 ); >>>>> while ( my @rids = $factory->each_rid ) { >>>>> foreach my $rid ( @rids ) { >>>>> my $rc = $factory->retrieve_blast($rid); >>>>> if( !ref($rc) ) { >>>>> if( $rc < 0 ) { >>>>> $factory->remove_rid($rid); >>>>> } >>>>> print STDERR "." 
if ( $v > 0 ); >>>>> sleep 5; >>>>> } else { >>>>> my $result = $rc->next_result(); >>>>> #save the output >>>>> my $filename = $result->query_name()."\.out"; >>>>> $factory->save_output($filename); >>>>> $factory->remove_rid($rid); >>>>> print "\nQuery Name: ", $result->query_name(), "\n"; >>>>> while ( my $hit = $result->next_hit ) { >>>>> next unless ( $v > 0); >>>>> print "\thit name is ", $hit->name, "\n"; >>>>> while( my $hsp = $hit->next_hsp ) { >>>>> print "\t\tscore is ", $hsp->score, "\n"; >>>>> } >>>>> } >>>>> } >>>>> } >>>>> } >>>>>} >>>>> >>>>> >>>>>>2) Try the RemoteBlast from Bugzilla and see if that works. It >>>>>> >>>>>> >>>really >>> >>> >>>>>shouldn't make that much of a difference, but I noticed that the CVS >>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was >>>>>released; the Bugzilla version is based off CVS. >>>>> >>>>> >>>>>>Christopher Fields >>>>>> >>>>>> >>>>>Postdoctoral Researcher - Switzer Lab >>>>>Dept. of Biochemistry >>>>>University of Illinois Urbana-Champaign >>>>> >>>>> >>>>>>>-----Original Message----- >>>>>>> >>>>>>> >>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang >>>>>>Sent: Monday, February 13, 2006 3:00 PM >>>>>>To: bioperl-l at lists.open-bio.org >>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28 >>>>>> >>>>>> >>>>>>>>Thanks, Chris, >>>>>>>> >>>>>>>> >>>>>>I installed version 1.5.1 and replaced the blast.pm file with the >>>>>> >>>>>> >>>one >>> >>> >>>>from >>>> >>>> >>>>>>your bug report. The running version is 1.5 when I use the command >>>>>> >>>>>> >>>you >>> >>> >>>>>>sent me. But when I tried the script, it doesn't change much. 
My >>>>>>remoteblast code (portion) is here: >>>>>> >>>>>> >>>>>>>>sub search { >>>>>>>> >>>>>>>> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN"; >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7; >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000; >>>>>>local >>>>>> >>>>>> >>>>>> >>>$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}= >>> >>> >>>>>>'no'; >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1'; >>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]", >>>>>> -id=>"query", >>>>>> -desc=>"new seq"); >>>>>>my $len=$query->length(); >>>>>>@db=('nr','htgs','wgs'); >>>>>>foreach my $db (@db) { >>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn', >>>>>> '-data' =>"$db", >>>>>> >>>>>> >>>>>> >>'-expect'=>"$E_value"); >> >> >>>>>>>>>>my $blast_report = $factory->submit_blast($query); >>>>>>>>>> >>>>>>>>>> >>>>>>>>my @rids = $factory->each_rid(); >>>>>>>> >>>>>>>> >>>>>>foreach my $rid ( @rids ) { >>>>>> print STDERR "$rid\n"; >>>>>>} >>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638) >>>>>>print STDERR "waiting..."; >>>>>>sleep 60; >>>>>> >>>>>> >>>>>>>>foreach my $rid ( @rids ) { >>>>>>>> >>>>>>>> >>>>>> my $rc = $factory->retrieve_blast($rid); >>>>>> while (!ref($rc) ) { >>>>>> if( $rc < 0 ) { >>>>>># retrieve_blast returns -1 on error >>>>>> $factory->remove_rid($rid); >>>>>> print "Error!\n"; >>>>>> send_error($email,$function,$seqname,$queryname[$ST]); >>>>>> die "Can't retrieve $rid"; >>>>>> } if ($rc==0) { # retrieve_blast returns 0 on 'job not >>>>>> >>>>>> >>>finished' >>> >>> >>>>>> sleep 60; >>>>>> $rc = $factory->retrieve_blast($rid); >>>>>> } >>>>>> } >>>>>> if (ref($rc)) { >>>>>> print STDERR "Done.\n"; >>>>>> while( my $result = $rc->next_result) { >>>>>> while( my $hit = $result->next_hit()) { >>>>>> $hit_name=$hit->name; >>>>>> $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/; >>>>>> $name=$1; >>>>>> @left_plus_start=(); >>>>>> 
@left_plus_end=(); >>>>>> @left_minus_start=(); >>>>>> @left_minus_end=(); >>>>>> @right_plus_start=(); >>>>>> @right_plus_end=(); >>>>>> @right_minus_start=(); >>>>>> @right_minus_end=(); >>>>>> >>>>>> >>>>>>>> if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) { >>>>>>>> >>>>>>>> >>>>>> while( my $hsp = $hit->next_hsp()) { >>>>>>...... >>>>>> >>>>>> >>>>>>>>It was working quite well before around October laster year, but >>>>>>>> >>>>>>>> >>>>it has >>>> >>>> >>>>>>stopped since then, When a submission is sent via a webpage, the cgi >>>>>>starts to work and use a memory of ~20 Mb. Then it hangs there, >>>>>> >>>>>> >>>>finally >>>> >>>> >>>>>>the expected email is received but without real results although it >>>>>> >>>>>> >>>>does >>>> >>>> >>>>>>contain something from other parts of the script. Apparently the >>>>>> >>>>>> >>>>search >>>> >>>> >>>>>>sub did not return anything (I know there is something should be >>>>>>returned.). Is it also possible the format of the NCBI output for >>>>>> >>>>>> >>>each >>> >>> >>>>>>result has changed? >>>>>>Thank you, >>>>>>Guojun >>>>>> >>>>>> >>>>>>>>>>Department of Plant Biology >>>>>>>>>> >>>>>>>>>> >>>>>>University of Georgia >>>>>> >>>>>> >>>>>>>>>>>>----- Original Message ----- >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu] >>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org >>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 >>>>>> >>>>>> >>>>>>>>>>>How do you know two versions are installed (i.e. how are >>>>>>>>>>> >>>>>>>>>>> >>>you >>> >>> >>>>checking >>>> >>>> >>>>>>the >>>>>> >>>>>> >>>>>>>version)? Do you see have two complete bioperl distributions (in >>>>>>> >>>>>>> >>>>two >>>> >>>> >>>>>>>separate directories) or are you looking in modules? 
Here's the >>>>>>> >>>>>>> >>>way >>> >>> >>>>to >>>> >>>> >>>>>>>check the version (from the FAQ): >>>>>>> >>>>>>> >>>>>>>>perl -MBio::Root::Version -e 'print >>>>>>>> >>>>>>>> >>>>$Bio::Root::Version::VERSION,"\n"' >>>> >>>> >>>>>>>>If you have two full bioperl distributions on your computer, >>>>>>>> >>>>>>>> >>>>normally >>>> >>>> >>>>>>only >>>>>> >>>>>> >>>>>>>one will be in use unless you have explicitly set the environment >>>>>>> >>>>>>> >>>>>>variable >>>>>> >>>>>> >>>>>>>PERL5LIB. The PERL5LIB directories will be searched first before >>>>>>> >>>>>>> >>>>your >>>> >>>> >>>>>>>normal perl directory list (@INC) is searched. You MAY get some >>>>>>> >>>>>>> >>>>mixing >>>> >>>> >>>>>>>then, but only if perl can't find a particular module in the path >>>>>>> >>>>>>> >>>>>>designated >>>>>> >>>>>> >>>>>>>in PERL5LIB; then it will progress through the directories listed >>>>>>> >>>>>>> >>>in >>> >>> >>>>>>@INC. >>>>>> >>>>>> >>>>>>>This may happen if a module is unique to a particular release, but >>>>>>> >>>>>>> >>>>>>shouldn't >>>>>> >>>>>> >>>>>>>happen for the majority of modules, including RemoteBlast. You >>>>>>> >>>>>>> >>>can >>> >>> >>>>>>check >>>>>> >>>>>> >>>>>>>what @INC and PERL5LIB are set to by using 'perl -V'. @INC will >>>>>>> >>>>>>> >>>>differ >>>> >>>> >>>>>>>depending on your OS, perl build, etc. >>>>>>> >>>>>>> >>>>>>>>Regardless, if you follow the directions for installing bioperl >>>>>>>> >>>>>>>> >>>>for >>>> >>>> >>>>>>your >>>>>> >>>>>> >>>>>>>system ('perl Makefile.PL', 'make', 'make test', 'make install', >>>>>>> >>>>>>> >>>>unless >>>> >>>> >>>>>>you >>>>>> >>>>>> >>>>>>>explicitly change the installation directory when using 'perl >>>>>>> >>>>>>> >>>>>>Makefile.PL'), >>>>>> >>>>>> >>>>>>>then 'uninstalling' Bioperl shouldn't be a problem as it will >>>>>>> >>>>>>> >>>>install >>>> >>>> >>>>>>the >>>>>> >>>>>> >>>>>>>Bioperl distribution you downloaded over the old version in @INC. 
>>>>>>> >>>>>>> >>>>See >>>> >>>> >>>>>>this >>>>>> >>>>>> >>>>>>>page: >>>>>>> >>>>>>> >>>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL >>>>>>>>for more details. >>>>>>>>Christopher Fields >>>>>>>> >>>>>>>> >>>>>>>Postdoctoral Researcher - Switzer Lab >>>>>>>Dept. of Biochemistry >>>>>>>University of Illinois Urbana-Champaign >>>>>>> >>>>>>> >>>>>>>>>>-----Original Message----- >>>>>>>>>> >>>>>>>>>> >>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang >>>>>>>>Sent: Monday, February 13, 2006 12:32 PM >>>>>>>>To: bioperl-l at lists.open-bio.org >>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 >>>>>>>> >>>>>>>> >>>>>>>>>>Hi, Chris, >>>>>>>>>> >>>>>>>>>> >>>>>>>>I do have different versions of bioperl on my Linux machine >>>>>>>> >>>>>>>> >>>(1.4. >>> >>> >>>>and >>>> >>>> >>>>>>>>1.5.0), this may be the problem. Should I just install bioperl- >>>>>>>> >>>>>>>> >>>>1.5.1 >>>> >>>> >>>>>>or I >>>>>> >>>>>> >>>>>>>>need to uninstall and remove the previous versions. I could not >>>>>>>> >>>>>>>> >>>>find >>>> >>>> >>>>>>any >>>>>> >>>>>> >>>>>>>>hint on uninstalling bioperl on linux. Could you please give me >>>>>>>> >>>>>>>> >>>>some >>>> >>>> >>>>>>>>suggestion? >>>>>>>>Thanks, >>>>>>>>Guojun >>>>>>>> >>>>>>>> >>>>>>>>>>Department of Plant Biology >>>>>>>>>> >>>>>>>>>> >>>>>>>>University of Georgia >>>>>>>> _____ >>>>>>>> >>>>>>>> >>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] >>>>>>>>>> >>>>>>>>>> >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org >>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500 >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm >>>>>>>> >>>>>>>> >>>>>>version >>>>>> >>>>>> >>>>>>>>1.28 >>>>>>>> >>>>>>>> >>>>>>>>>>>>>>If you're using RemoteBlast 1.28, then you've likely >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>updated from CVS >>>>>> >>>>>> >>>>>>>>which isn't the latest fix. 
>>>>>>>> >>>>>>>> >>>>>>>>>>Make sure that you check the following: >>>>>>>>>>1) Always post to the mailing list: >>>>>>>>>> >>>>>>>>>> >>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance . >>>>>>>> >>>>>>>> >>>>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live >>>>>>>>>> >>>>>>>>>> >>>>(CVS) >>>> >>>> >>>>>>>>installed first. Perform a clean installation; do not upgrade >>>>>>>> >>>>>>>> >>>>only >>>> >>>> >>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we >>>>>>>> >>>>>>>> >>>can't >>> >>> >>>>>>>>guarantee that mixing modules from old and new distributions >>>>>>>> >>>>>>>> >>>(1.4 >>> >>> >>>>and >>>> >>>> >>>>>>>>1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live >>>>>>>>installation will allow text output from BLAST v.2.2.12 to be >>>>>>>> >>>>>>>> >>>>saved >>>> >>>> >>>>>>and >>>>>> >>>>>> >>>>>>>>parsed; it will not parse the newest BLAST text output from NCBI >>>>>>>> >>>>>>>> >>>>>>(v2.2.13) >>>>>> >>>>>> >>>>>>>>but it should still save it. I believe as long as next_results() >>>>>>>> >>>>>>>> >>>>isn't >>>> >>>> >>>>>>>>called, it will work. >>>>>>>> >>>>>>>> >>>>>>>>>>3) The bug fixes for the above issue with parsing BLAST >>>>>>>>>> >>>>>>>>>> >>>2.2.13 >>> >>> >>>>>>text output >>>>>> >>>>>> >>>>>>>>are NOT in CVS; they haven't been cleared and checked in by >>>>>>>> >>>>>>>> >>>Roger >>> >>> >>>>Hall >>>> >>>> >>>>>>>>(who's now taking care of RemoteBlast) and the powers that be >>>>>>>> >>>>>>>> >>>>(Jason >>>> >>>> >>>>>>or >>>>>> >>>>>> >>>>>>>>whomever is in charge of Bio::SearchIO). 
They can be found in >>>>>>>> >>>>>>>> >>>>>>Bugzilla: >>>>>> >>>>>> >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>>>>>> >>>>>>>>>> >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935 >>>>>>>> >>>>>>>> >>>>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the >>>>>>>>>> >>>>>>>>>> >>>>option >>>> >>>> >>>>>>of >>>>>> >>>>>> >>>>>>>>saving XML output, so isn't necessary if you don't plan on using >>>>>>>> >>>>>>>> >>>>this >>>> >>>> >>>>>>>>option. And, remember, they haven't been committed yet to CVS, >>>>>>>> >>>>>>>> >>>>which >>>> >>>> >>>>>>>>means that the final version will change to refle the new >>>>>>>> >>>>>>>> >>>version. >>> >>> >>>>>>>>>>>>Christopher Fields >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>Postdoctoral Researcher - Switzer Lab >>>>>>>>Dept. of Biochemistry >>>>>>>>University of Illinois Urbana-Champaign >>>>>>>> >>>>>>>> >>>>>>>>>>>> _____ >>>>>>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu] >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>Sent: Monday, February 13, 2006 9:26 AM >>>>>>>>To: Chris Fields >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm >>>>>>>> >>>>>>>> >>>>>>version >>>>>> >>>>>> >>>>>>>>1.28 >>>>>>>> >>>>>>>> >>>>>>>>>>>>Hi, Chris >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>Thanks for your suggestion, however, it doesn't seem to work >>>>>>>>>> >>>>>>>>>> >>>>for >>>> >>>> >>>>>>my cgi >>>>>> >>>>>> >>>>>>>>even after I replace both blast.pm and RemoteBlast.pm. I didn't >>>>>>>> >>>>>>>> >>>>even >>>> >>>> >>>>>>get >>>>>> >>>>>> >>>>>>>>any RID. Is there any suggestion? 
>>>>>>>> >>>>>>>> >>>>>>>>>>>>>>Guojun >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>Guojun Yang >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>Department of Plant Biology >>>>>>>>University of Georgia >>>>>>>>Tel: 706-542-1857 >>>>>>>>Fax: 706-542-1805 >>>>>>>>http://www.arches.uga.edu/~guojun >>>>>>>> _____ >>>>>>>> >>>>>>>> >>>>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu] >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org >>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500 >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm >>>>>>>> >>>>>>>> >>>>>>version >>>>>> >>>>>> >>>>>>>>1.28 >>>>>>>> >>>>>>>> >>>>>>>>>>I would say give the new code a try, but realize that it >>>>>>>>>> >>>>>>>>>> >>>>hasn't >>>> >>>> >>>>>>been >>>>>> >>>>>> >>>>>>>>checked >>>>>>>>in (like I said below). I will try going over the modified >>>>>>>>Bio::SearchIO::blast again this weekend to see if there is >>>>>>>> >>>>>>>> >>>>anything I >>>> >>>> >>>>>>>>might >>>>>>>>have missed. The changed order in the header of BLAST text >>>>>>>> >>>>>>>> >>>output >>> >>> >>>>has >>>> >>>> >>>>>>me a >>>>>> >>>>>> >>>>>>>>bit worried that it might not catch everything, but it at least >>>>>>>> >>>>>>>> >>>>>>doesn't >>>>>> >>>>>> >>>>>>>>hang >>>>>>>>in the while() loop I described in the bug report below (bug >>>>>>>> >>>>>>>> >>>>#1934) >>>> >>>> >>>>>>and >>>>>> >>>>>> >>>>>>>>seems to process everything fine. >>>>>>>> >>>>>>>> >>>>>>>>>>If you want more stability in the code, you might consider >>>>>>>>>> >>>>>>>>>> >>>>>>changing over >>>>>> >>>>>> >>>>>>>>to >>>>>>>>XML output and parsing with Bio::SearchIO::blastxml. There are >>>>>>>> >>>>>>>> >>>>some >>>> >>>> >>>>>>>>changes >>>>>>>>in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate >>>>>>>> >>>>>>>> >>>>saving >>>> >>>> >>>>>>XML >>>>>> >>>>>> >>>>>>>>output, but I believe it parses everything regardless. 
If you >>>>>>>> >>>>>>>> >>>look >>> >>> >>>>>>back >>>>>> >>>>>> >>>>>>>>the >>>>>>>>last month or so there has been a bit of discussion here about >>>>>>>> >>>>>>>> >>>it. >>> >>> >>>>>>Jason >>>>>> >>>>>> >>>>>>>>describes a bit on how to set up RemoteBlast for XML: >>>>>>>> >>>>>>>> >>>>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using- >>>>>>>>>> >>>>>>>>>> >>>>>>remoteblast/ >>>>>> >>>>>> >>>>>>>>>>Christopher Fields >>>>>>>>>> >>>>>>>>>> >>>>>>>>Postdoctoral Researcher - Switzer Lab >>>>>>>>Dept. of Biochemistry >>>>>>>>University of Illinois Urbana-Champaign >>>>>>>> >>>>>>>> >>>>>>>>>>>-----Original Message----- >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang >>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM >>>>>>>>>To: bioperl-l at bioperl.org >>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm >>>>>>>>> >>>>>>>>> >>>>version >>>> >>>> >>>>>>1.28 >>>>>> >>>>>> >>>>>>>>>Hi, Everybody, >>>>>>>>>I see this post and am wondering if this is the reason for the >>>>>>>>>malfunctionning of my webserver. We set up a webserver named >>>>>>>>> >>>>>>>>> >>>>MAK, >>>> >>>> >>>>>>for >>>>>> >>>>>> >>>>>>>>MITE >>>>>>>> >>>>>>>> >>>>>>>>>sequence analysis. It was working very well until around >>>>>>>>> >>>>>>>>> >>>>November >>>> >>>> >>>>>>2005, >>>>>> >>>>>> >>>>>>>>>when it stopped returning any result (the site is fine and >>>>>>>>> >>>>>>>>> >>>seems >>> >>> >>>>to >>>> >>>> >>>>>>be >>>>>> >>>>>> >>>>>>>>>doing sth after submission). In the CGI script, I used >>>>>>>>> >>>>>>>>> >>>>remoteblast >>>> >>>> >>>>>>(that >>>>>> >>>>>> >>>>>>>>>work was done in 2003) to do searches. I currently do not have >>>>>>>>> >>>>>>>>> >>>>>>access to >>>>>> >>>>>> >>>>>>>>>the server because I moved. 
Quite several people sent emails >>>>>>>>> >>>>>>>>> >>>to >>> >>> >>>>us >>>> >>>> >>>>>>about >>>>>> >>>>>> >>>>>>>>>its malfunctioning. Is there any suggestion on fixing the >>>>>>>>> >>>>>>>>> >>>>problem? >>>> >>>> >>>>>>>>Should >>>>>>>> >>>>>>>> >>>>>>>>>I simplily ask the remoteblast.pm be replaced with the new >>>>>>>>> >>>>>>>>> >>>>version? >>>> >>>> >>>>>>>>>Thanks a lot, >>>>>>>>>Guojun >>>>>>>>> >>>>>>>>>Department of Plant Biology >>>>>>>>>University of Georgia >>>>>>>>>Tel: 706-542-1857 >>>>>>>>>Fax: 706-542-1805 >>>>>>>>>http://www.arches.uga.edu/~guojun >>>>>>>>>_____ >>>>>>>>> >>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu] >>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang >>>>>>>>> >>>>>>>>> >>>>Jian' >>>> >>>> >>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' >>>>>>>>> >>>>>>>>> >>>[mailto:bioperl- >>> >>> >>>>>>>>>l at bioperl.org] >>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500 >>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 >>>>>>>>> >>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live >>>>>>>>> >>>>>>>>> >>>>CVS. >>>> >>>> >>>>>>It >>>>>> >>>>>> >>>>>>>>>will >>>>>>>>>work for saving text output. However, it will not parse >>>>>>>>> >>>>>>>>> >>>anything >>> >>> >>>>>>using >>>>>> >>>>>> >>>>>>>>>next_result (it will likely hang) and will not save XML >>>>>>>>> >>>>>>>>> >>>format. >>> >>> >>>>See >>>> >>>> >>>>>>>>these >>>>>>>> >>>>>>>> >>>>>>>>>bugs: >>>>>>>>> >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935 >>>>>>>>> >>>>>>>>>for explanations and possible fixes (changes to RemoteBlast >>>>>>>>> >>>>>>>>> >>>and >>> >>> >>>>>>>>>Bio::SearchIO::blast). 
Note that these haven't been checked in >>>>>>>>> >>>>>>>>> >>>>yet >>>> >>>> >>>>>>so >>>>>> >>>>>> >>>>>>>>are >>>>>>>> >>>>>>>> >>>>>>>>>still not included in bioperl-live; they may be further >>>>>>>>> >>>>>>>>> >>>modified >>> >>> >>>>>>before >>>>>> >>>>>> >>>>>>>>>committing to CVS. If you're not worried about XML, you could >>>>>>>>> >>>>>>>>> >>>>just >>>> >>>> >>>>>>try >>>>>> >>>>>> >>>>>>>>the >>>>>>>> >>>>>>>> >>>>>>>>>first fix, which is a change to SearchIO::blast. >>>>>>>>> >>>>>>>>>Nagesh, I remember you posting to the list a month ago using a >>>>>>>>> >>>>>>>>> >>>>>>script >>>>>> >>>>>> >>>>>>>>>which >>>>>>>>>had problems; the script you used saves the output but doesn't >>>>>>>>> >>>>>>>>> >>>>>>actually >>>>>> >>>>>> >>>>>>>>>parse it (i.e. you don't use next_result() to go through the >>>>>>>>> >>>>>>>>> >>>>data). >>>> >>>> >>>>>>Is >>>>>> >>>>>> >>>>>>>>the >>>>>>>> >>>>>>>> >>>>>>>>>version of BLAST in your text output 2.2.12 or 2.2.13? Have >>>>>>>>> >>>>>>>>> >>>you >>> >>> >>>>>>tried >>>>>> >>>>>> >>>>>>>>>parsing the output using "-readmethod => SearchIO" or "- >>>>>>>>> >>>>>>>>> >>>>readmethod >>>> >>>> >>>>>>=> >>>>>> >>>>>> >>>>>>>>>blast" >>>>>>>>>using your version of RemoteBlast and method next_result()? >>>>>>>>> >>>>>>>>> >>>Like >>> >>> >>>>>>below >>>>>> >>>>>> >>>>>>>>>(from >>>>>>>>>perldoc): >>>>>>>>> >>>>>>>>>while ( my @rids = $factory->each_rid ) { >>>>>>>>>foreach my $rid ( @rids ) { >>>>>>>>>my $rc = $factory->retrieve_blast($rid); >>>>>>>>>if( !ref($rc) ) { >>>>>>>>>if( $rc < 0 ) { >>>>>>>>>$factory->remove_rid($rid); >>>>>>>>>} >>>>>>>>>print STDERR "." 
if ( $v > 0 ); >>>>>>>>>sleep 5; >>>>>>>>>} else { # parsing >>>>>>>>>starts here >>>>>>>>>my $result = $rc->next_result(); # it should hang >>>>>>>>>here >>>>>>>>>#save the output >>>>>>>>>my $filename = $result->query_name()."\.out"; >>>>>>>>>$factory->save_output($filename); >>>>>>>>>$factory->remove_rid($rid); >>>>>>>>>print "\nQuery Name: ", $result->query_name(), "\n"; >>>>>>>>>while ( my $hit = $result->next_hit ) { >>>>>>>>>next unless ( $v > 0); >>>>>>>>>print "\thit name is ", $hit->name, "\n"; >>>>>>>>>while( my $hsp = $hit->next_hsp ) { >>>>>>>>>print "\t\tscore is ", $hsp->score, "\n"; >>>>>>>>>} >>>>>>>>>} >>>>>>>>>} >>>>>>>>>} >>>>>>>>>} >>>>>>>>>} >>>>>>>>> >>>>>>>>> >>>>>>>>>My script hanged if I used next_result() in any way prior to >>>>>>>>> >>>>>>>>> >>>the >>> >>> >>>>>>fixes. >>>>>> >>>>>> >>>>>>>>I >>>>>>>> >>>>>>>> >>>>>>>>>want to see how many others are having the same issues with >>>>>>>>> >>>>>>>>> >>>>parsing >>>> >>>> >>>>>>>>using >>>>>>>> >>>>>>>> >>>>>>>>>the CVS version of bioperl-live. >>>>>>>>> >>>>>>>>>Christopher Fields >>>>>>>>>Postdoctoral Researcher - Switzer Lab >>>>>>>>>Dept. of Biochemistry >>>>>>>>>University of Illinois Urbana-Champaign >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>>-----Original Message----- >>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl- >>>>>>>>>> >>>>>>>>>> >>>l- >>> >>> >>>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka >>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM >>>>>>>>>>To: Huang Jian; bioperl-l >>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 >>>>>>>>>> >>>>>>>>>>Hi Huang, >>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm >>>>>>>>>> >>>>>>>>>> >>>>works >>>> >>>> >>>>>>on >>>>>> >>>>>> >>>>>>>>the >>>>>>>> >>>>>>>> >>>>>>>>>>logic of checking the temporary file size to determine >>>>>>>>>> >>>>>>>>>> >>>whether >>> >>> >>>>the >>>> >>>> >>>>>>>>Blast >>>>>>>> >>>>>>>> >>>>>>>>>>results are ready. 
This condition is not getting satisfied >>>>>>>>>> >>>>>>>>>> >>>may >>> >>> >>>>be >>>> >>>> >>>>>>due >>>>>> >>>>>> >>>>>>>>to >>>>>>>> >>>>>>>> >>>>>>>>>>some changes brought about by NCBI. I had this problem >>>>>>>>>> >>>>>>>>>> >>>>recently >>>> >>>> >>>>>>and >>>>>> >>>>>> >>>>>>>>>>figured out that the solution was to use the latest version >>>>>>>>>> >>>>>>>>>> >>>>which >>>> >>>> >>>>>>has >>>>>> >>>>>> >>>>>>>>>>this problem fixed (does not use file size logic any more) >>>>>>>>>> >>>>>>>>>> >>>>which >>>> >>>> >>>>>>is >>>>>> >>>>>> >>>>>>>>not >>>>>>>> >>>>>>>> >>>>>>>>>>yet included in the BioPerl package. >>>>>>>>>>Cheers >>>>>>>>>>Nagesh >>>>>>>>>> >>>>>>>>>>Huang Jian wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>>Dear Nagesh, >>>>>>>>>>> >>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 >>>>>>>>>>> >>>>>>>>>>> >>>>you >>>> >>>> >>>>>>send >>>>>> >>>>>> >>>>>>>>>>>me. Now it works perfectly!!! >>>>>>>>>>> >>>>>>>>>>>Thank you!! >>>>>>>>>>> >>>>>>>>>>>Huang >>>>>>>>>>> >>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka" >>>>>>>>>>> >>>>>>>>>>>To: "Huang Jian" ; "bioperl-l" >>>>>>>>>>> >>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM >>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the >>>>>>>>>>> >>>>>>>>>>> >>>net, >>> >>> >>>>so >>>> >>>> >>>>>>still >>>>>> >>>>>> >>>>>>>>>>>via email >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>>Hi Huang, >>>>>>>>>>>>I see that you are submitting a sequence for a remote >>>>>>>>>>>> >>>>>>>>>>>> >>>blast >>> >>> >>>>>>search. >>>>>> >>>>>> >>>>>>>>>Can >>>>>>>>> >>>>>>>>> >>>>>>>>>>>>you check if the RemoteBlast.pm being used is v 1.28 >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>(2005/12/09). >>>>>> >>>>>> >>>>>>>>If >>>>>>>> >>>>>>>> >>>>>>>>>>>>not I have attached it with this email, try to replace it >>>>>>>>>>>> >>>>>>>>>>>> >>>>with >>>> >>>> >>>>>>the >>>>>> >>>>>> >>>>>>>>>old >>>>>>>>> >>>>>>>>> >>>>>>>>>>>>one which has a bug. 
>>>>>>>>>>>>Let me know if it works.
>>>>>>>>>>>>Nagesh
>>>>>>>>>>
>>>>>>>>>>_______________________________________________
>>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

From jason.stajich at duke.edu Thu Feb 16 09:00:01 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 16 Feb 2006 09:00:01 -0500
Subject: [Bioperl-l] searchIO bug?
In-Reply-To: <43F452F30200009B00000EC9@gwia.kvl.dk>
References: <43F452F30200009B00000EC9@gwia.kvl.dk>
Message-ID: <11B49C84-9C04-4F43-9278-A3AA09C9B773@duke.edu>

I think it would be more helpful if you posted the actual report rather than the protein, since this may depend on the version of BLAST you are using.

If you used split(/\s+/, $header), it wouldn't matter how many spaces there are.
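
To illustrate the split(/\s+/, $header) suggestion, here is a minimal sketch in plain Perl. The header string is made up for the example (the actual report was not posted to the list), and the variable names are only illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical description line with a doubled space after one comma;
# this is NOT the real header from the report discussed in the thread.
my $header = 'Q0045 chrM 13818,13987,  16435,18954';

# split / / breaks on every single space, so two spaces in a row
# leave an empty field between them.
my @by_single_space = split / /, $header;

# split /\s+/ treats any run of whitespace as one separator.
my @by_whitespace_run = split /\s+/, $header;

printf "split / /    : %d fields\n", scalar @by_single_space;
printf "split /\\s+/  : %d fields\n", scalar @by_whitespace_run;
```

With this input the first split yields an empty field where the two spaces meet, while the second does not. Note that /\s+/ still returns a leading empty field if the string starts with whitespace; split ' ', $header skips leading whitespace as well.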
On Feb 16, 2006, at 4:24 AM, Anders Stegmann wrote:

> Hi!
>
> I am blasting a protein seq against an identical protein.
> I am trying to parse the protein header by using the query_description
> method in the SearchIO module.
> After using the query_description method I use split / / in order
> to easily access the different header components.
> Here I discover that the query_description method is somehow introducing
> a space between comma number 5 and the following chromosome position
> number in the exon chromosome position list!?
> This truncates the list of exon chromosome positions from 7 to 4, later
> yielding a wrong count of introns.
>
> Is this a bug?
>
> Attached is:
>
> testblast1.pl: the BLAST program to run.
>
> Q0045: the seq that is used as both query and database seq.
> (Q0045 has to be formatted in order to be used as a database:
> formatdb -i Q0045 -p T -o F)
>
> Regards Anders.

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From cjfields at uiuc.edu Thu Feb 16 10:50:04 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 09:50:04 -0600
Subject: [Bioperl-l] additional error message
In-Reply-To: <20060216100410.54a1a6d5@dogwood.plantbio.uga.edu>
Message-ID: <002901c63310$a7da1b20$15327e82@pyrimidine>

I don't think the Apache error is related to the main issue here, but you could always try upgrading LWP to see if that fixes it.

The second issue is a text-parsing problem in SearchIO specific to nucleotide BLAST output, which I'm looking into.

Jason has posted a bit on using XML.
Basically, do the following:

my $prog  = 'blastn';
my $db    = 'nr';
my $e_val = 1e-10;
my $v     = 1;
my @params = ( -prog       => $prog,
               -data       => $db,
               -expect     => $e_val,
               -readmethod => 'xml' );
my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
$factory->retrieve_parameter('FORMAT_TYPE', 'XML');

You'll also need to modify the following line:

my $filename = $result->query_name()."\.out";

because the XML tag for this feature is actually part of the RID for some reason, so you'll get a weird output file name. This is a problem with NCBI's XML output, not SearchIO::XML parsing.

XML BLAST files can be really big (~5 MB and up, depending on how much information is returned), so it may take a little time to go through the data. Right now it is the only consistently reliable way to parse the output, as NCBI keeps changing the text output, sending us back into "SearchIO::blast hell," as J.S. puts it.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> Sent: Thursday, February 16, 2006 9:04 AM
> To: Chris Fields; Pieter Monsieurs
> Cc: bioperl-l at lists.open-bio.org
> Subject: additional error message
>
> When I check my apache error_log, there is one line saying:
> "waiting...Parsing of undecoded UTF-8 will give garbage when decoding
> entities at /usr/lib/perl5/site_perl/5.8.3/LWP/Protocol.pm line 137.,"
> I also see an error saying "MSG: no data for midline Features flanking
> this part of subject sequence:, " that is mentioned by Pieter.
> Chris, may I have your suggestion on changing it to XML parsing? I read
> Jason's comments/suggestions about it, but could not make it work.
> Thanks
>
> Guojun
> Department of Plant Biology
> University of Georgia
>
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: Pieter Monsieurs [mailto:Pieter.Monsieurs at esat.kuleuven.be]
> Cc: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
> version 1.28
>
> > Yeah, looks like it broke text output nucleotide parsing with that.
> > XML output parsing still works though (as expected). I'll give it a
> > look.
> >
> > Chris
> >
> > On Feb 16, 2006, at 3:46 AM, Pieter Monsieurs wrote:
> >
> > > Hi,
> > >
> > > I have the same problem with the blast.pm file.
> > > The people at NCBI added some extra info to the BLAST output
> > > (see e.g. "Features flanking this part..." or "Features in
> > > this part ..."); example added.
> > > The blast.pm module starts looking for the HSP alignment
> > > information, but it dies when it hits this feature information.
> > >
> > > Pieter
> > >
> > > >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> > > chromosome 12, complete sequence
> > > Length=27492551
> > >
> > > Features flanking this part of subject sequence:
> > >    3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
> > >    2655 bp at 3' side: hypothetical protein
> > >
> > >  Score = 36.2 bits (18), Expect = 0.22
> > >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  4         GTACTACTCTACTCTACT  21
> > >                  ||||||||||||||||||
> > > Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> > >
> > > Features flanking this part of subject sequence:
> > >    2991 bp at 5' side: hypothetical protein
> > >    1131 bp at 3' side: hypothetical protein
> > >
> > >  Score = 36.2 bits (18), Expect = 0.22
> > >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  2         ATGTACTACTCTACTCTA  19
> > >                  ||||||||||||||||||
> > > Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> > >
> > > Features in this part of subject sequence:
> > >    DHHC zinc finger domain, putative
> > >
> > >  Score = 34.2 bits (17), Expect = 0.87
> > >  Identities = 17/17 (100%), Gaps = 0/17 (0%)
> > >  Strand=Plus/Plus
> > >
> > > Query  5         TACTACTCTACTCTACT  21
> > >                  |||||||||||||||||
> > > Sbjct  17616437  TACTACTCTACTCTACT  17616453
> > >
> > > Features flanking this part of subject sequence:
> > >    102 bp at 5' side: bZIP transcription factor, putative
> > >    3740 bp at 3' side: yeast dcp1, putative
> > >
> > >  Score = 32.2 bits (16), Expect = 3.4
> > >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > >  Strand=Plus/Plus
> > >
> > > Query  7        CTACTCTACTCTACTC  22
> > >                 ||||||||||||||||
> > > Sbjct  2775880  CTACTCTACTCTACTC  2775895
> > >
> > > Features flanking this part of subject sequence:
> > >    21 bp at 5' side: peptide transporter T17F3.11, putative
> > >    10230 bp at 3' side: transposon protein, putative, unclassified
> > >
> > >  Score = 32.2 bits (16), Expect = 3.4
> > >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  7         CTACTCTACTCTACTC  22
> > >                  ||||||||||||||||
> > > Sbjct  27323153  CTACTCTACTCTACTC  27323138
> > >
> > > Guojun Yang wrote:
> > >
> > >> Hi, Chris,
> > >> Finally the remoteblast test script works for the amino.fa query,
> > >> but when I try a nucleic acid sequence (see below), an error occurs: "
> > >> waiting........
> > >> ------------- EXCEPTION ------------- > > >> MSG: no data for midline Features flanking this part of subject > > >> sequence: > > >> STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/ > > >> 5.8.3/Bio/Searc hIO/blast.pm:1172 > > >> STACK toplevel remoteblast_test:40 > > >> " > > >> The query sequence is: > > >> CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC > > >> GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA > > >> AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG > > >> AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG > > >> > > >> The script (basically same as the remoteblast test, I only changed > > >> database to 'nr' and program to 'blastn' and filename to 'ost3'): > > >> #!/usr/bin/perl > > >> > > >> use Bio::SeqIO; > > >> use Bio::Seq; > > >> use Bio::Tools::Run::RemoteBlast; > > >> use Bio::SearchIO; > > >> use strict; > > >> my $prog='blastn'; > > >> my $db='nr'; > > >> my $e_val=1e-10; > > >> my @params=( -prog=>$prog, > > >> -data=>$db, > > >> -expect=>$e_val, > > >> -readmethod=>'SearchIO'); > > >> my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > >> > > >> my $v = 1; > > >> > > >> my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' ); > > >> > > >> while (my $input = $str->next_seq()){ > > >> #Blast a sequence against a database: > > >> #Alternatively, you could pass in a file with many > > >> #sequences rather than loop through sequence one at a time > > >> #Remove the loop starting 'while (my $input = $str->next_seq())' > > >> #and swap the two lines below for an example of that. > > >> my $r = $factory->submit_blast($input); > > >> #my $r = $factory->submit_blast('amino.fa'); > > >> print STDERR "waiting..." 
if( $v > 0 ); > > >> while ( my @rids = $factory->each_rid ) { > > >> foreach my $rid ( @rids ) { > > >> my $rc = $factory->retrieve_blast($rid); > > >> if( !ref($rc) ) { > > >> if( $rc < 0 ) { > > >> $factory->remove_rid($rid); > > >> } > > >> print STDERR "." if ( $v > 0 ); > > >> sleep 5; > > >> } else { > > >> my $result = $rc->next_result(); > > >> #save the output > > >> my $filename = $result->query_name()."\.out"; > > >> $factory->save_output($filename); > > >> $factory->remove_rid($rid); > > >> print "\nQuery Name: ", $result->query_name(), "\n"; > > >> while ( my $hit = $result->next_hit ) { > > >> next unless ( $v > 0); > > >> print "\thit name is ", $hit->name, "\n"; > > >> while( my $hsp = $hit->next_hsp ) { > > >> print "\t\tscore is ", $hsp->score, "\n"; > > >> } > > >> } > > >> } > > >> } > > >> } > > >> } > > >> > > >> > > >> Do you think there might still be something in the NCBI output > > >> format? > > >> > > >> Thank you, > > >> Guojun > > >> > > >> > > >> > > >> > > >> Guojun Yang > > >> Department of Plant Biology > > >> University of Georgia > > >> Tel: 706-542-1857 > > >> Fax: 706-542-1805 > > >> http://www.arches.uga.edu/~guojun > > >> > > >> > > >> > > >> ----- Original Message ----- > > >> From: Chris Fields [mailto:cjfields at uiuc.edu] > > >> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > >> Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > >> > > >> > > >> > > >>> Sorry, forgot to add that I didn't see the regex issue that you > > >>> mentioned. > > >>> It could be a perl-related issue. Try the fixes I mentioned and > > >>> see what > > >>> happens. > > >>> > > >>>> Christopher Fields > > >>>> > > >>> Postdoctoral Researcher - Switzer Lab > > >>> Dept. 
of Biochemistry > > >>> University of Illinois Urbana-Champaign > > >>>>>> -----Original Message----- > > >>>>>> > > >>>> From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>> Sent: Tuesday, February 14, 2006 12:36 PM > > >>>> To: 'gyang at plantbio.uga.edu' > > >>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > >>>> > > >>>>>> It's a good habit to always add single quotes around words. > > >>>>>> The perl > > >>>>>> > > >>>> interpreter may think a single bare word is a subroutine or > > >>>> perlfunc > > >>>> called with no args so will try to find a subroutine named blastp > > >>>> (). My > > >>>> debugger actually gives the error that the bare word blastp may > > >>>> conflict > > >>>> with a future reserved word. Like you said, 'use strict' will > > >>>> point that > > >>>> out. > > >>>> > > >>>>>> As for the regex, it should match all the blast programs at > > >>>>>> NCBI (blastp, > > >>>>>> > > >>>> blastn, blastx, tblastn, tblastx) and is built-in to make sure > > >>>> nothing > > >>>> else passes through. > > >>>> > > >>>>>> So, if you are using the script below, there are several > > >>>>>> errors. The bare > > >>>>>> > > >>>> words for $prog and $db need quotes, and the flags for you > > >>>> @params array > > >>>> don't have a dash before them. 
I get this after adding quotes > > >>>> but before > > >>>> adding the dashes to @params: > > >>>> > > >>>>>> C:\Perl\Scripts>test_blast.pl > > >>>>>> ------------- EXCEPTION: Bio::Root::Exception ------------- > > >>>>>> > > >>>> MSG: > > >>>> STACK: Error::throw > > >>>> STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl- > > >>>> live/Bio/Root/Root.pm:328 > > >>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > > >>>> C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325 > > >>>> STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl > > >>>> \bioperl- > > >>>> live/Bio/Tools/Run/RemoteBlast.pm:256 > > >>>> STACK: C:\Perl\Scripts\test_blast.pl:15 > > >>>> ----------------------------------------------------------- > > >>>> > > >>>>>> The last line indicates a problem with this line: > > >>>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > >>>>>> Changing the @params to this: > > >>>>>> my @params=( -prog=>$prog, > > >>>>>> > > >>>> -data=>$db, > > >>>> -expect=>$e_val, > > >>>> -readmethod=>'SearchIO'); > > >>>> > > >>>>>> fixes it, and I get output as expected. > > >>>>>> Christopher Fields > > >>>>>> > > >>>> Postdoctoral Researcher - Switzer Lab > > >>>> Dept. of Biochemistry > > >>>> University of Illinois Urbana-Champaign > > >>>> > > >>>>>>>>> -----Original Message----- > > >>>>>>>>> > > >>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > >>>>> Sent: Tuesday, February 14, 2006 11:48 AM > > >>>>> To: Chris Fields; bioperl-l at lists.open-bio.org > > >>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > >>>>> > > >>>>> Hi, Chris, > > >>>>> When I tried with the perldoc script, It did not work either. > > >>>>> First it > > >>>>> says $prog can not be bare word if I "use strict". I added > > >>>>> quotes on the > > >>>>> words, then it says the value for $prog does not match expression > > >>>>> t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. 
The > > >>>>> > > >>>> script > > >>>> > > >>>>> is shown below. Why is the expression "t?blast[pnx]"? > > >>>>> > > >>>>> #!/usr/bin/perl > > >>>>> > > >>>>> use Bio::SeqIO; > > >>>>> use Bio::Seq; > > >>>>> use Bio::Tools::Run::RemoteBlast; > > >>>>> use Bio::SearchIO; > > >>>>> > > >>>>> > > >>>>> my $prog=blastp; > > >>>>> my $db=swissprot; > > >>>>> my $e_val=1e-10; > > >>>>> my @params=( prog=>$prog, > > >>>>> data=>$db, > > >>>>> expect=>$e_val, > > >>>>> readmethod=>'SearchIO'); > > >>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > >>>>> > > >>>>> my $v = 1; > > >>>>> > > >>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => > >>>>> > 'fasta' ); > > >>>>> > > >>>>> while (my $input = $str->next_seq()){ > > >>>>> #Blast a sequence against a database: > > >>>>> #Alternatively, you could pass in a file with many > > >>>>> #sequences rather than loop through sequence one at a time > > >>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' > > >>>>> #and swap the two lines below for an example of that. > > >>>>> my $r = $factory->submit_blast($input); > > >>>>> #my $r = $factory->submit_blast('amino.fa'); > > >>>>> print STDERR "waiting..." if( $v > 0 ); > > >>>>> while ( my @rids = $factory->each_rid ) { > > >>>>> foreach my $rid ( @rids ) { > > >>>>> my $rc = $factory->retrieve_blast($rid); > > >>>>> if( !ref($rc) ) { > > >>>>> if( $rc < 0 ) { > > >>>>> $factory->remove_rid($rid); > > >>>>> } > > >>>>> print STDERR "." 
if ( $v > 0 ); > > >>>>> sleep 5; > > >>>>> } else { > > >>>>> my $result = $rc->next_result(); > > >>>>> #save the output > > >>>>> my $filename = $result->query_name()."\.out"; > > >>>>> $factory->save_output($filename); > > >>>>> $factory->remove_rid($rid); > > >>>>> print "\nQuery Name: ", $result->query_name(), "\n"; > > >>>>> while ( my $hit = $result->next_hit ) { > > >>>>> next unless ( $v > 0); > > >>>>> print "\thit name is ", $hit->name, "\n"; > > >>>>> while( my $hsp = $hit->next_hsp ) { > > >>>>> print "\t\tscore is ", $hsp->score, "\n"; > > >>>>> } > > >>>>> } > > >>>>> } > > >>>>> } > > >>>>> } > > >>>>> } > > >>>>> > > >>>>> Thank you for your help! > > >>>>> > > >>>>> > > >>>>> Guojun > > >>>>> Department of Plant Biology > > >>>>> University of Georgia > > >>>>> > > >>>>> ----- Original Message ----- > > >>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>> To: gyang at plantbio.uga.edu > > >>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > >>>>> > > >>>>> > > >>>>> > > >>>>>> Try two things: > > >>>>>> > > >>>>>>> 1) Use a much simpler script, like the one in 'perldoc > > >>>>>>> > > >>>>>> Bio::Tools::Run::RemoteBlast'. If this fixes it, there's > > >>>>>> something > > >>>>>> > > >>>>> wrong > > >>>>> > > >>>>>> with the logic in your subroutine: > > >>>>>> > > >>>>>>> my $v = 1; > > >>>>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => > > >>>>>>> 'fasta' ); > > >>>>>>> while (my $input = $str->next_seq()){ > > >>>>>>> > > >>>>>> #Blast a sequence against a database: > > >>>>>> #Alternatively, you could pass in a file with many > > >>>>>> #sequences rather than loop through sequence one at a time > > >>>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' > > >>>>>> #and swap the two lines below for an example of that. > > >>>>>> my $r = $factory->submit_blast($input); > > >>>>>> #my $r = $factory->submit_blast('amino.fa'); > > >>>>>> print STDERR "waiting..." 
if( $v > 0 ); > > >>>>>> while ( my @rids = $factory->each_rid ) { > > >>>>>> foreach my $rid ( @rids ) { > > >>>>>> my $rc = $factory->retrieve_blast($rid); > > >>>>>> if( !ref($rc) ) { > > >>>>>> if( $rc < 0 ) { > > >>>>>> $factory->remove_rid($rid); > > >>>>>> } > > >>>>>> print STDERR "." if ( $v > 0 ); > > >>>>>> sleep 5; > > >>>>>> } else { > > >>>>>> my $result = $rc->next_result(); > > >>>>>> #save the output > > >>>>>> my $filename = $result->query_name()."\.out"; > > >>>>>> $factory->save_output($filename); > > >>>>>> $factory->remove_rid($rid); > > >>>>>> print "\nQuery Name: ", $result->query_name(), "\n"; > > >>>>>> while ( my $hit = $result->next_hit ) { > > >>>>>> next unless ( $v > 0); > > >>>>>> print "\thit name is ", $hit->name, "\n"; > > >>>>>> while( my $hsp = $hit->next_hsp ) { > > >>>>>> print "\t\tscore is ", $hsp->score, "\n"; > > >>>>>> } > > >>>>>> } > > >>>>>> } > > >>>>>> } > > >>>>>> } > > >>>>>> } > > >>>>>> > > >>>>>>> 2) Try the RemoteBlast from Bugzilla and see if that works. It > > >>>>>>> > > >>>> really > > >>>> > > >>>>>> shouldn't make that much of a difference, but I noticed that > > >>>>>> the CVS > > >>>>>> RemoteBlast (1.28) was changed in Dec 2005, after > > >>>>>> bioperl-1.5.1 was > > >>>>>> released; the Bugzilla version is based off CVS. > > >>>>>> > > >>>>>>> Christopher Fields > > >>>>>>> > > >>>>>> Postdoctoral Researcher - Switzer Lab > > >>>>>> Dept. 
of Biochemistry > > >>>>>> University of Illinois Urbana-Champaign > > >>>>>> > > >>>>>>>> -----Original Message----- > > >>>>>>>> > > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > >>>>>>> Sent: Monday, February 13, 2006 3:00 PM > > >>>>>>> To: bioperl-l at lists.open-bio.org > > >>>>>>> Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > >>>>>>> > > >>>>>>>>> Thanks, Chris, > > >>>>>>>>> > > >>>>>>> I installed version 1.5.1 and replaced the blast.pm file with > > >>>>>>> the > > >>>>>>> > > >>>> one > > >>>> > > >>>>> from > > >>>>> > > >>>>>>> your bug report. The running version is 1.5 when I use the > > >>>>>>> command > > >>>>>>> > > >>>> you > > >>>> > > >>>>>>> sent me. But when I tried the script, it doesn't change much. My > > >>>>>>> remoteblast code (portion) is here: > > >>>>>>> > > >>>>>>>>> sub search { > > >>>>>>>>> > > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} > > >>>>>>> ="$ORGN"; > > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7; > > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'} > > >>>>>>> =5000; > > >>>>>>> local > > >>>>>>> > > >>>>>>> > > >>>> $Bio::Tools::Run::RemoteBlast::HEADER > > >>>> {'COMPOSITION_BASED_STATISTICS'}= > > >>>> > > >>>>>>> 'no'; > > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1'; > > >>>>>>> my $query = Bio::Seq -> new ( -seq=>"$_[0]", > > >>>>>>> -id=>"query", > > >>>>>>> -desc=>"new seq"); > > >>>>>>> my $len=$query->length(); > > >>>>>>> @db=('nr','htgs','wgs'); > > >>>>>>> foreach my $db (@db) { > > >>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' > > >>>>>>> =>'blastn', > > >>>>>>> '-data' =>"$db", > > >>>>>>> > > >>>>>>> > > >>> '-expect'=>"$E_value"); > > >>> > > >>>>>>>>>>> my $blast_report = $factory->submit_blast($query); > > >>>>>>>>>>> > > >>>>>>>>> my @rids = $factory->each_rid(); > > >>>>>>>>> 
> > >>>>>>> foreach my $rid ( @rids ) { > > >>>>>>> print STDERR "$rid\n"; > > >>>>>>> } > > >>>>>>> # RID = Remote Blast ID (e.g: 1017772174-16400-6638) > > >>>>>>> print STDERR "waiting..."; > > >>>>>>> sleep 60; > > >>>>>>> > > >>>>>>>>> foreach my $rid ( @rids ) { > > >>>>>>>>> > > >>>>>>> my $rc = $factory->retrieve_blast($rid); > > >>>>>>> while (!ref($rc) ) { > > >>>>>>> if( $rc < 0 ) { > > >>>>>>> # retrieve_blast returns -1 on error > > >>>>>>> $factory->remove_rid($rid); > > >>>>>>> print "Error!\n"; > > >>>>>>> send_error($email,$function,$seqname,$queryname[$ST]); > > >>>>>>> die "Can't retrieve $rid"; > > >>>>>>> } if ($rc==0) { # retrieve_blast returns 0 on 'job not > > >>>>>>> > > >>>> finished' > > >>>> > > >>>>>>> sleep 60; > > >>>>>>> $rc = $factory->retrieve_blast($rid); > > >>>>>>> } > > >>>>>>> } > > >>>>>>> if (ref($rc)) { > > >>>>>>> print STDERR "Done.\n"; > > >>>>>>> while( my $result = $rc->next_result) { > > >>>>>>> while( my $hit = $result->next_hit()) { > > >>>>>>> $hit_name=$hit->name; > > >>>>>>> $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/; > > >>>>>>> $name=$1; > > >>>>>>> @left_plus_start=(); > > >>>>>>> @left_plus_end=(); > > >>>>>>> @left_minus_start=(); > > >>>>>>> @left_minus_end=(); > > >>>>>>> @right_plus_start=(); > > >>>>>>> @right_plus_end=(); > > >>>>>>> @right_minus_start=(); > > >>>>>>> @right_minus_end=(); > > >>>>>>> > > >>>>>>>>> if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) { > > >>>>>>>>> > > >>>>>>> while( my $hsp = $hit->next_hsp()) { > > >>>>>>> ...... > > >>>>>>> > > >>>>>>>>> It was working quite well before around October laster > > >>>>>>>>> year, but > > >>>>>>>>> > > >>>>> it has > > >>>>> > > >>>>>>> stopped since then, When a submission is sent via a webpage, > > >>>>>>> the cgi > > >>>>>>> starts to work and use a memory of ~20 Mb. 
Then it hangs there, > > >>>>>>> > > >>>>> finally > > >>>>> > > >>>>>>> the expected email is received but without real results > > >>>>>>> although it > > >>>>>>> > > >>>>> does > > >>>>> > > >>>>>>> contain something from other parts of the script. Apparently the > > >>>>>>> > > >>>>> search > > >>>>> > > >>>>>>> sub did not return anything (I know there is something should be > > >>>>>>> returned.). Is it also possible the format of the NCBI output > > >>>>>>> for > > >>>>>>> > > >>>> each > > >>>> > > >>>>>>> result has changed? > > >>>>>>> Thank you, > > >>>>>>> Guojun > > >>>>>>> > > >>>>>>>>>>> Department of Plant Biology > > >>>>>>>>>>> > > >>>>>>> University of Georgia > > >>>>>>> > > >>>>>>>>>>>>> ----- Original Message ----- > > >>>>>>>>>>>>> > > >>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > >>>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > >>>>>>> > > >>>>>>>>>>>> How do you know two versions are installed (i.e. how are > > >>>>>>>>>>>> > > >>>> you > > >>>> > > >>>>> checking > > >>>>> > > >>>>>>> the > > >>>>>>> > > >>>>>>>> version)? Do you see have two complete bioperl > > >>>>>>>> distributions (in > > >>>>>>>> > > >>>>> two > > >>>>> > > >>>>>>>> separate directories) or are you looking in modules? Here's > > >>>>>>>> the > > >>>>>>>> > > >>>> way > > >>>> > > >>>>> to > > >>>>> > > >>>>>>>> check the version (from the FAQ): > > >>>>>>>> > > >>>>>>>>> perl -MBio::Root::Version -e 'print > > >>>>>>>>> > > >>>>> $Bio::Root::Version::VERSION,"\n"' > > >>>>> > > >>>>>>>>> If you have two full bioperl distributions on your computer, > > >>>>>>>>> > > >>>>> normally > > >>>>> > > >>>>>>> only > > >>>>>>> > > >>>>>>>> one will be in use unless you have explicitly set the > > >>>>>>>> environment > > >>>>>>>> > > >>>>>>> variable > > >>>>>>> > > >>>>>>>> PERL5LIB. 
The PERL5LIB directories will be searched first > > >>>>>>>> before > > >>>>>>>> > > >>>>> your > > >>>>> > > >>>>>>>> normal perl directory list (@INC) is searched. You MAY get > > >>>>>>>> some > > >>>>>>>> > > >>>>> mixing > > >>>>> > > >>>>>>>> then, but only if perl can't find a particular module in the > > >>>>>>>> path > > >>>>>>>> > > >>>>>>> designated > > >>>>>>> > > >>>>>>>> in PERL5LIB; then it will progress through the directories > > >>>>>>>> listed > > >>>>>>>> > > >>>> in > > >>>> > > >>>>>>> @INC. > > >>>>>>> > > >>>>>>>> This may happen if a module is unique to a particular > > >>>>>>>> release, but > > >>>>>>>> > > >>>>>>> shouldn't > > >>>>>>> > > >>>>>>>> happen for the majority of modules, including RemoteBlast. You > > >>>>>>>> > > >>>> can > > >>>> > > >>>>>>> check > > >>>>>>> > > >>>>>>>> what @INC and PERL5LIB are set to by using 'perl -V'. @INC > > >>>>>>>> will > > >>>>>>>> > > >>>>> differ > > >>>>> > > >>>>>>>> depending on your OS, perl build, etc. > > >>>>>>>> > > >>>>>>>>> Regardless, if you follow the directions for installing > > >>>>>>>>> bioperl > > >>>>>>>>> > > >>>>> for > > >>>>> > > >>>>>>> your > > >>>>>>> > > >>>>>>>> system ('perl Makefile.PL', 'make', 'make test', 'make > > >>>>>>>> install', > > >>>>>>>> > > >>>>> unless > > >>>>> > > >>>>>>> you > > >>>>>>> > > >>>>>>>> explicitly change the installation directory when using 'perl > > >>>>>>>> > > >>>>>>> Makefile.PL'), > > >>>>>>> > > >>>>>>>> then 'uninstalling' Bioperl shouldn't be a problem as it will > > >>>>>>>> > > >>>>> install > > >>>>> > > >>>>>>> the > > >>>>>>> > > >>>>>>>> Bioperl distribution you downloaded over the old version in > > >>>>>>>> @INC. > > >>>>>>>> > > >>>>> See > > >>>>> > > >>>>>>> this > > >>>>>>> > > >>>>>>>> page: > > >>>>>>>> > > >>>>>>>>> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL > > >>>>>>>>> for more details. 
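A rough sketch of the check described above, assuming nothing beyond core Perl: it reports which file perl actually loads for a given module, which is what matters when two distributions are reachable through PERL5LIB and @INC. The helper name `module_path` is made up for illustration and is not part of BioPerl; `File::Spec` is only a stand-in so that the snippet runs on a machine without BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the file that perl
# loads for a module name, or undef if the module cannot be found.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::Seq -> Bio/Seq.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Directories are searched in this order; PERL5LIB entries are placed
# ahead of the directories compiled into perl.
print "$_\n" for @INC;

# File::Spec is a core module used as a stand-in here; replace it with
# Bio::Tools::Run::RemoteBlast to see which copy is picked up.
my $path = module_path('File::Spec');
print defined $path ? "loaded from $path\n" : "not found\n";
```

If the reported path points into an old bioperl directory, that is the copy being used, whatever version was installed most recently.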
> > >>>>>>>>> Christopher Fields > > >>>>>>>>> > > >>>>>>>> Postdoctoral Researcher - Switzer Lab > > >>>>>>>> Dept. of Biochemistry > > >>>>>>>> University of Illinois Urbana-Champaign > > >>>>>>>> > > >>>>>>>>>>> -----Original Message----- > > >>>>>>>>>>> > > >>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > >>>>>>>>> Sent: Monday, February 13, 2006 12:32 PM > > >>>>>>>>> To: bioperl-l at lists.open-bio.org > > >>>>>>>>> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > >>>>>>>>> > > >>>>>>>>>>> Hi, Chris, > > >>>>>>>>>>> > > >>>>>>>>> I do have different versions of bioperl on my Linux machine > > >>>>>>>>> > > >>>> (1.4. > > >>>> > > >>>>> and > > >>>>> > > >>>>>>>>> 1.5.0), this may be the problem. Should I just install > > >>>>>>>>> bioperl- > > >>>>>>>>> > > >>>>> 1.5.1 > > >>>>> > > >>>>>>> or I > > >>>>>>> > > >>>>>>>>> need to uninstall and remove the previous versions. I could > > >>>>>>>>> not > > >>>>>>>>> > > >>>>> find > > >>>>> > > >>>>>>> any > > >>>>>>> > > >>>>>>>>> hint on uninstalling bioperl on linux. Could you please > > >>>>>>>>> give me > > >>>>>>>>> > > >>>>> some > > >>>>> > > >>>>>>>>> suggestion? > > >>>>>>>>> Thanks, > > >>>>>>>>> Guojun > > >>>>>>>>> > > >>>>>>>>>>> Department of Plant Biology > > >>>>>>>>>>> > > >>>>>>>>> University of Georgia > > >>>>>>>>> _____ > > >>>>>>>>> > > >>>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>>>>>>>> > > >>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > >>>>>>>>> Sent: Mon, 13 Feb 2006 11:45:14 -0500 > > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding > > >>>>>>>>> RemoteBlast.pm > > >>>>>>>>> > > >>>>>>> version > > >>>>>>> > > >>>>>>>>> 1.28 > > >>>>>>>>> > > >>>>>>>>>>>>>>> If you're using RemoteBlast 1.28, then you've likely > > >>>>>>>>>>>>>>> > > >>>>>>> updated from CVS > > >>>>>>> > > >>>>>>>>> which isn't the latest fix. 
> > >>>>>>>>> > > >>>>>>>>>>> Make sure that you check the following: > > >>>>>>>>>>> 1) Always post to the mailing list: > > >>>>>>>>>>> > > >>>>>>>>> http://www.bioperl.org/wiki/ > > >>>>>>>>> HOWTO:Beginners#Getting_Assistance . > > >>>>>>>>> > > >>>>>>>>>>> 2) You must have the complete bioperl-1.5.1 or bioperl-live > > >>>>>>>>>>> > > >>>>> (CVS) > > >>>>> > > >>>>>>>>> installed first. Perform a clean installation; do not upgrade > > >>>>>>>>> > > >>>>> only > > >>>>> > > >>>>>>>>> Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we > > >>>>>>>>> > > >>>> can't > > >>>> > > >>>>>>>>> guarantee that mixing modules from old and new distributions > > >>>>>>>>> > > >>>> (1.4 > > >>>> > > >>>>> and > > >>>>> > > >>>>>>>>> 1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl- > > >>>>>>>>> live > > >>>>>>>>> installation will allow text output from BLAST v.2.2.12 to be > > >>>>>>>>> > > >>>>> saved > > >>>>> > > >>>>>>> and > > >>>>>>> > > >>>>>>>>> parsed; it will not parse the newest BLAST text output from > > >>>>>>>>> NCBI > > >>>>>>>>> > > >>>>>>> (v2.2.13) > > >>>>>>> > > >>>>>>>>> but it should still save it. I believe as long as > > >>>>>>>>> next_results() > > >>>>>>>>> > > >>>>> isn't > > >>>>> > > >>>>>>>>> called, it will work. > > >>>>>>>>> > > >>>>>>>>>>> 3) The bug fixes for the above issue with parsing BLAST > > >>>>>>>>>>> > > >>>> 2.2.13 > > >>>> > > >>>>>>> text output > > >>>>>>> > > >>>>>>>>> are NOT in CVS; they haven't been cleared and checked in by > > >>>>>>>>> > > >>>> Roger > > >>>> > > >>>>> Hall > > >>>>> > > >>>>>>>>> (who's now taking care of RemoteBlast) and the powers that be > > >>>>>>>>> > > >>>>> (Jason > > >>>>> > > >>>>>>> or > > >>>>>>> > > >>>>>>>>> whomever is in charge of Bio::SearchIO). 
They can be found in > > >>>>>>>>> > > >>>>>>> Bugzilla: > > >>>>>>> > > >>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > >>>>>>>>>>> > > >>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > >>>>>>>>> > > >>>>>>>>>>> The fix in RemoteBlast in Bugzilla (#1935) is to allow the > > >>>>>>>>>>> > > >>>>> option > > >>>>> > > >>>>>>> of > > >>>>>>> > > >>>>>>>>> saving XML output, so isn't necessary if you don't plan on > > >>>>>>>>> using > > >>>>>>>>> > > >>>>> this > > >>>>> > > >>>>>>>>> option. And, remember, they haven't been committed yet to > > >>>>>>>>> CVS, > > >>>>>>>>> > > >>>>> which > > >>>>> > > >>>>>>>>> means that the final version will change to reflect the new > > >>>>>>>>> > > >>>> version. > > >>>> > > >>>>>>>>>>>>> Christopher Fields > > >>>>>>>>>>>>> > > >>>>>>>>> Postdoctoral Researcher - Switzer Lab > > >>>>>>>>> Dept. of Biochemistry > > >>>>>>>>> University of Illinois Urbana-Champaign > > >>>>>>>>> > > >>>>>>>>>>>>> _____ > > >>>>>>>>>>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > >>>>>>>>>>>>> > > >>>>>>>>> Sent: Monday, February 13, 2006 9:26 AM > > >>>>>>>>> To: Chris Fields > > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding > > >>>>>>>>> RemoteBlast.pm > > >>>>>>>>> > > >>>>>>> version > > >>>>>>> > > >>>>>>>>> 1.28 > > >>>>>>>>> > > >>>>>>>>>>>>> Hi, Chris > > >>>>>>>>>>>>> > > >>>>>>>>>>> Thanks for your suggestion, however, it doesn't seem to work > > >>>>>>>>>>> > > >>>>> for > > >>>>> > > >>>>>>> my cgi > > >>>>>>> > > >>>>>>>>> even after I replace both blast.pm and RemoteBlast.pm. I > > >>>>>>>>> didn't > > >>>>>>>>> > > >>>>> even > > >>>>> > > >>>>>>> get > > >>>>>>> > > >>>>>>>>> any RID. Is there any suggestion? 
> > >>>>>>>>> > > >>>>>>>>>>>>>>> Guojun > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>> Guojun Yang > > >>>>>>>>>>>>> > > >>>>>>>>> Department of Plant Biology > > >>>>>>>>> University of Georgia > > >>>>>>>>> Tel: 706-542-1857 > > >>>>>>>>> Fax: 706-542-1805 > > >>>>>>>>> http://www.arches.uga.edu/~guojun > > >>>>>>>>> _____ > > >>>>>>>>> > > >>>>>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>>>>>>>>>> > > >>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org > > >>>>>>>>> Sent: Fri, 03 Feb 2006 16:07:29 -0500 > > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding > > >>>>>>>>> RemoteBlast.pm > > >>>>>>>>> > > >>>>>>> version > > >>>>>>> > > >>>>>>>>> 1.28 > > >>>>>>>>> > > >>>>>>>>>>> I would say give the new code a try, but realize that it > > >>>>>>>>>>> > > >>>>> hasn't > > >>>>> > > >>>>>>> been > > >>>>>>> > > >>>>>>>>> checked > > >>>>>>>>> in (like I said below). I will try going over the modified > > >>>>>>>>> Bio::SearchIO::blast again this weekend to see if there is > > >>>>>>>>> > > >>>>> anything I > > >>>>> > > >>>>>>>>> might > > >>>>>>>>> have missed. The changed order in the header of BLAST text > > >>>>>>>>> > > >>>> output > > >>>> > > >>>>> has > > >>>>> > > >>>>>>> me a > > >>>>>>> > > >>>>>>>>> bit worried that it might not catch everything, but it at > > >>>>>>>>> least > > >>>>>>>>> > > >>>>>>> doesn't > > >>>>>>> > > >>>>>>>>> hang > > >>>>>>>>> in the while() loop I described in the bug report below (bug > > >>>>>>>>> > > >>>>> #1934) > > >>>>> > > >>>>>>> and > > >>>>>>> > > >>>>>>>>> seems to process everything fine. > > >>>>>>>>> > > >>>>>>>>>>> If you want more stability in the code, you might consider > > >>>>>>>>>>> > > >>>>>>> changing over > > >>>>>>> > > >>>>>>>>> to > > >>>>>>>>> XML output and parsing with Bio::SearchIO::blastxml. 
There are > > >>>>>>>>> > > >>>>> some > > >>>>> > > >>>>>>>>> changes > > >>>>>>>>> in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate > > >>>>>>>>> > > >>>>> saving > > >>>>> > > >>>>>>> XML > > >>>>>>> > > >>>>>>>>> output, but I believe it parses everything regardless. If you > > >>>>>>>>> > > >>>> look > > >>>> > > >>>>>>> back > > >>>>>>> > > >>>>>>>>> the > > >>>>>>>>> last month or so there has been a bit of discussion here about > > >>>>>>>>> > > >>>> it. > > >>>> > > >>>>>>> Jason > > >>>>>>> > > >>>>>>>>> describes a bit on how to set up RemoteBlast for XML: > > >>>>>>>>> > > >>>>>>>>>>> http://bioperl.org/news/2005/11/06/getting-blastxml-using- > > >>>>>>>>>>> > > >>>>>>> remoteblast/ > > >>>>>>> > > >>>>>>>>>>> Christopher Fields > > >>>>>>>>>>> > > >>>>>>>>> Postdoctoral Researcher - Switzer Lab > > >>>>>>>>> Dept. of Biochemistry > > >>>>>>>>> University of Illinois Urbana-Champaign > > >>>>>>>>> > > >>>>>>>>>>>> -----Original Message----- > > >>>>>>>>>>>> > > >>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > >>>>>>>>>> Sent: Friday, February 03, 2006 1:45 PM > > >>>>>>>>>> To: bioperl-l at bioperl.org > > >>>>>>>>>> Subject: [Bioperl-l] more question regarding RemoteBlast.pm > > >>>>>>>>>> > > >>>>> version > > >>>>> > > >>>>>>> 1.28 > > >>>>>>> > > >>>>>>>>>> Hi, Everybody, > > >>>>>>>>>> I see this post and am wondering if this is the reason for > > >>>>>>>>>> the > > >>>>>>>>>> malfunctionning of my webserver. We set up a webserver named > > >>>>>>>>>> > > >>>>> MAK, > > >>>>> > > >>>>>>> for > > >>>>>>> > > >>>>>>>>> MITE > > >>>>>>>>> > > >>>>>>>>>> sequence analysis. 
It was working very well until around > > >>>>>>>>>> > > >>>>> November > > >>>>> > > >>>>>>> 2005, > > >>>>>>> > > >>>>>>>>>> when it stopped returning any result (the site is fine and > > >>>>>>>>>> > > >>>> seems > > >>>> > > >>>>> to > > >>>>> > > >>>>>>> be > > >>>>>>> > > >>>>>>>>>> doing sth after submission). In the CGI script, I used > > >>>>>>>>>> > > >>>>> remoteblast > > >>>>> > > >>>>>>> (that > > >>>>>>> > > >>>>>>>>>> work was done in 2003) to do searches. I currently do not > > >>>>>>>>>> have > > >>>>>>>>>> > > >>>>>>> access to > > >>>>>>> > > >>>>>>>>>> the server because I moved. Quite several people sent emails > > >>>>>>>>>> > > >>>> to > > >>>> > > >>>>> us > > >>>>> > > >>>>>>> about > > >>>>>>> > > >>>>>>>>>> its malfunctioning. Is there any suggestion on fixing the > > >>>>>>>>>> > > >>>>> problem? > > >>>>> > > >>>>>>>>> Should > > >>>>>>>>> > > >>>>>>>>>> I simplily ask the remoteblast.pm be replaced with the new > > >>>>>>>>>> > > >>>>> version? > > >>>>> > > >>>>>>>>>> Thanks a lot, > > >>>>>>>>>> Guojun > > >>>>>>>>>> > > >>>>>>>>>> Department of Plant Biology > > >>>>>>>>>> University of Georgia > > >>>>>>>>>> Tel: 706-542-1857 > > >>>>>>>>>> Fax: 706-542-1805 > > >>>>>>>>>> http://www.arches.uga.edu/~guojun > > >>>>>>>>>> _____ > > >>>>>>>>>> > > >>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>>>>>>> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang > > >>>>>>>>>> > > >>>>> Jian' > > >>>>> > > >>>>>>>>>> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' > > >>>>>>>>>> > > >>>> [mailto:bioperl- > > >>>> > > >>>>>>>>>> l at bioperl.org] > > >>>>>>>>>> Sent: Fri, 03 Feb 2006 10:45:23 -0500 > > >>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > >>>>>>>>>> > > >>>>>>>>>> Like Nagesh says, try the latest RemoteBlast from bioperl- > > >>>>>>>>>> live > > >>>>>>>>>> > > >>>>> CVS. > > >>>>> > > >>>>>>> It > > >>>>>>> > > >>>>>>>>>> will > > >>>>>>>>>> work for saving text output. 
However, it will not parse > > >>>>>>>>>> > > >>>> anything > > >>>> > > >>>>>>> using > > >>>>>>> > > >>>>>>>>>> next_result (it will likely hang) and will not save XML > > >>>>>>>>>> > > >>>> format. > > >>>> > > >>>>> See > > >>>>> > > >>>>>>>>> these > > >>>>>>>>> > > >>>>>>>>>> bugs: > > >>>>>>>>>> > > >>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > >>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > >>>>>>>>>> > > >>>>>>>>>> for explanations and possible fixes (changes to RemoteBlast > > >>>>>>>>>> > > >>>> and > > >>>> > > >>>>>>>>>> Bio::SearchIO::blast). Note that these haven't been > > >>>>>>>>>> checked in > > >>>>>>>>>> > > >>>>> yet > > >>>>> > > >>>>>>> so > > >>>>>>> > > >>>>>>>>> are > > >>>>>>>>> > > >>>>>>>>>> still not included in bioperl-live; they may be further > > >>>>>>>>>> > > >>>> modified > > >>>> > > >>>>>>> before > > >>>>>>> > > >>>>>>>>>> committing to CVS. If you're not worried about XML, you could > > >>>>>>>>>> > > >>>>> just > > >>>>> > > >>>>>>> try > > >>>>>>> > > >>>>>>>>> the > > >>>>>>>>> > > >>>>>>>>>> first fix, which is a change to SearchIO::blast. > > >>>>>>>>>> > > >>>>>>>>>> Nagesh, I remember you posting to the list a month ago > > >>>>>>>>>> using a > > >>>>>>>>>> > > >>>>>>> script > > >>>>>>> > > >>>>>>>>>> which > > >>>>>>>>>> had problems; the script you used saves the output but > > >>>>>>>>>> doesn't > > >>>>>>>>>> > > >>>>>>> actually > > >>>>>>> > > >>>>>>>>>> parse it (i.e. you don't use next_result() to go through the > > >>>>>>>>>> > > >>>>> data). > > >>>>> > > >>>>>>> Is > > >>>>>>> > > >>>>>>>>> the > > >>>>>>>>> > > >>>>>>>>>> version of BLAST in your text output 2.2.12 or 2.2.13? 
Have > > >>>>>>>>>> > > >>>> you > > >>>> > > >>>>>>> tried > > >>>>>>> > > >>>>>>>>>> parsing the output using "-readmethod => SearchIO" or "- > > >>>>>>>>>> > > >>>>> readmethod > > >>>>> > > >>>>>>> => > > >>>>>>> > > >>>>>>>>>> blast" > > >>>>>>>>>> using your version of RemoteBlast and method next_result()? > > >>>>>>>>>> > > >>>> Like > > >>>> > > >>>>>>> below > > >>>>>>> > > >>>>>>>>>> (from > > >>>>>>>>>> perldoc): > > >>>>>>>>>> > > >>>>>>>>>> while ( my @rids = $factory->each_rid ) { > > >>>>>>>>>> foreach my $rid ( @rids ) { > > >>>>>>>>>> my $rc = $factory->retrieve_blast($rid); > > >>>>>>>>>> if( !ref($rc) ) { > > >>>>>>>>>> if( $rc < 0 ) { > > >>>>>>>>>> $factory->remove_rid($rid); > > >>>>>>>>>> } > > >>>>>>>>>> print STDERR "." if ( $v > 0 ); > > >>>>>>>>>> sleep 5; > > >>>>>>>>>> } else { # parsing > > >>>>>>>>>> starts here > > >>>>>>>>>> my $result = $rc->next_result(); # it should hang > > >>>>>>>>>> here > > >>>>>>>>>> #save the output > > >>>>>>>>>> my $filename = $result->query_name()."\.out"; > > >>>>>>>>>> $factory->save_output($filename); > > >>>>>>>>>> $factory->remove_rid($rid); > > >>>>>>>>>> print "\nQuery Name: ", $result->query_name(), "\n"; > > >>>>>>>>>> while ( my $hit = $result->next_hit ) { > > >>>>>>>>>> next unless ( $v > 0); > > >>>>>>>>>> print "\thit name is ", $hit->name, "\n"; > > >>>>>>>>>> while( my $hsp = $hit->next_hsp ) { > > >>>>>>>>>> print "\t\tscore is ", $hsp->score, "\n"; > > >>>>>>>>>> } > > >>>>>>>>>> } > > >>>>>>>>>> } > > >>>>>>>>>> } > > >>>>>>>>>> } > > >>>>>>>>>> } > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> My script hanged if I used next_result() in any way prior to > > >>>>>>>>>> > > >>>> the > > >>>> > > >>>>>>> fixes. > > >>>>>>> > > >>>>>>>>> I > > >>>>>>>>> > > >>>>>>>>>> want to see how many others are having the same issues with > > >>>>>>>>>> > > >>>>> parsing > > >>>>> > > >>>>>>>>> using > > >>>>>>>>> > > >>>>>>>>>> the CVS version of bioperl-live. 
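The polling loop quoted above waits for as long as the job stays unfinished. A minimal sketch of the same retrieve convention (a reference when results are ready, 0 while the job is still running, a negative value on error, as described in this thread) with an upper bound on attempts might look like the following. `poll_job` and its arguments are hypothetical and not part of RemoteBlast; the code reference stands in for a call such as `$factory->retrieve_blast($rid)`, so the snippet runs without BioPerl or network access.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch until it returns a reference (done),
# a negative value (error), or $max_tries attempts have been used.
sub poll_job {
    my ($fetch, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $fetch->();
        return ('done',  $rc) if ref $rc;     # results are ready
        return ('error', $rc) if $rc < 0;     # remote side reported failure
        sleep $delay if $delay && $try < $max_tries;
    }
    return ('timeout', undef);                # still not finished
}

# Stand-in for the remote call: not finished twice, then a result.
my @replies = (0, 0, { query => 'demo' });
my ($status, $result) = poll_job(sub { shift @replies }, 5, 0);
print "status: $status\n";
```

This bounds only the polling. It would not help if the parser itself hangs inside next_result(), which is the failure reported here.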
From Marc.Logghe at DEVGEN.com  Thu Feb 16 10:47:13 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Thu, 16 Feb 2006 16:47:13 +0100
Subject: [Bioperl-l] Primer maps?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com>

Hi Mike,

Another route you might take is mapping your primers into
Bio::SeqFeature::Generic objects and adding them to the seq object. Then
you dump the object into a rich sequence format like GenBank and pass
that to EMBOSS's showseq application.

Or you might do it completely with showseq. Here the only thing you need
is an annotation file containing the positions of the primers, followed
by any text (e.g. primer name).
Then you do:

showseq -translate - -format 4 -annotation

Have a look at http://emboss.sourceforge.net/apps/showseq.html for more
options.

HTH,
Marc

Marc Logghe, PhD
Expert Scientist Bioinformatics
deVGen NV
Technologiepark 30
B - 9052 Ghent-Zwijnaarde
Tel. +32 9 324 24 83
Fax. +32 9 324 24 25
Web: www.devgen.com

________________________________

From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Michael Coyne
Sent: Wednesday, February 15, 2006 10:20 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Primer maps?

Hello all --

I'm having a devil of a time figuring out how to make restriction maps
using BioPerl. What I'm going for is output similar to GCG's map program,
but instead of using a set of defined restriction enzymes, I'd like to use
a set of primers, to create a primer map rather than a restriction map. I
do not need a table of restriction enzymes that cut or don't cut (or
primers that match or don't match, in this case), but an honest-to-goodness
map, something like:

                                       FKP-5->
                                             |
     CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT
1921 ---------+---------+---------+---------+---------+---------+ 1980
     GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA

a                         M  E  I  V  S  T  F  D  E  L  Q  D  Y  -

I also need translations of orfs, but I can use GenBank files as input to
the program and thus the CDS translations are already there, so I'm
guessing that shouldn't be too hard.... How does one create such a map
using the BioPerl modules?

There are intriguing indications out there that such a thing is possible
(e.g. the Bio::Map::* and Bio::Restriction::* modules), but I can't find
a single example of code that creates such a basic, bread-and-butter thing
as a restriction map with orf translations. The documentation to these
modules is fairly useless to me, consisting mostly of internal methods and
function prototypes. Perhaps my skills as a Perl programmer are to blame,
but a clear example of how a map like this is constructed would be a big
help.

Right now, I'm generating primer maps with system calls to EMBOSS's remap,
pointing it at a file of primer sequences rather than a file of restriction
enzyme sequences, but the results are less than desired. I'm considering
trying to adapt tacg 4.1.0 or sequence extractor 1.1 web-based code to my
needs, but this seems like a lot of work for an operation I suspect is
possible in BioPerl.

Any help greatly appreciated...

Mike

---------------------------------------------------------------------
 //=\   Michael J. Coyne                       phone: (617) 525-7820
 \=//   Channing Laboratory                    FAX:   (617) 264-5193
  //=\  EBRC, Room 617
  \=//  221 Longwood Avenue        email:mcoyne at channing.harvard.edu
   //=\ Boston, MA 02115                 mjcoyne at comcast.net
   \=//
---------------------------------------------------------------------

From sdavis2 at mail.nih.gov  Thu Feb 16 09:43:45 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 16 Feb 2006 09:43:45 -0500
Subject: [Bioperl-l] Primer maps?
In-Reply-To: <6.2.0.14.0.20060215155422.01d44a98@localhost>
Message-ID:

Do you mean that you want to use Bio::Graphics to make a picture, or just
map your primers onto a sequence?

Sean

From osborne1 at optonline.net  Thu Feb 16 11:27:13 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 11:27:13 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
Message-ID:

Harry,

I've long suspected, but never demonstrated, that the easiest way to do
something like this is through ENSEMBL, and Jason hinted at this as well. In
fact your question is something of a FAQ, and my previous responses always
included a plea to some anonymous ENSEMBL API expert, always unheeded. At
any rate, here is an example script I made:

#!/usr/bin/perl

use strict;
use lib "/Users/bosborne/ensembl/modules";
use DBI;
use Getopt::Long;
use Bio::EnsEMBL::DBSQL::DBAdaptor;

# gene name or id to look up, passed with -n
my $name;

GetOptions( "n=s" => \$name );

# connect to the public ENSEMBL core database
my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
    -user   => "anonymous",
    -dbname => "homo_sapiens_core_37_35j",
    -host   => "ensembldb.ensembl.org",
    -pass   => "",
    -driver => 'mysql'
);

my $gene_adaptor  = $db->get_GeneAdaptor;
my $slice_adaptor = $db->get_SliceAdaptor;

# all genes known under this external name
my @genes = @{$gene_adaptor->fetch_all_by_external_name($name)};

for my $gene (@genes) {
    for my $trans (@{$gene->get_all_Transcripts}) {
        # genomic slice spanning the transcript
        my $seq = $slice_adaptor->fetch_by_region("chromosome",
                                                  $trans->seq_region_name,
                                                  $trans->start,
                                                  $trans->end);
        print "\n",$seq->seq,"\n";
    }
}

There are some issues, the largest of which is that though this script
prints out big sequences it's completely untested!
Another is that it makes assumptions about transcripts; you should verify
for yourself that ENSEMBL's definition of transcript fits yours. Finally,
the fetch_all_by_external_name() method does not seem to accept a second
argument, i.e. a namespace. I found this surprising. Anyway, if more than
one gene is retrieved using some name or id you're in a quandary.

For more on this API see:

http://www.ensembl.org/info/software/core/core_tutorial.html

There are tons of modules and methods in this API, I've barely scratched the
surface here.

Brian O.

On 2/14/06 12:15 PM, "Harry Mangalam" wrote:

> Hi Brian,
>
> Thanks very much for the pointers and the speed of your reply and apologies
> for the speed of mine.
>
> This looks good, but what I was looking for was a bioP approach for hooking
> to an API at NCBI or EBI so I could get this info and seqs from them. In
> this case, speed of retrieval is not critical and I'd rather not download
> the entirety of the sequences to a local disk to hack at them.
>
> I've determined a screen-scraping approach to get them and could script
> that, but I thought that bioP had a method for using NCBI's external API's,
> tho it may be that my memory is faulty or the approach is no longer
> supported due to overload.
>
> Does NCBI make such APIs available anymore? I searched a bit for docs on
> them but couldn't find anything (unless it's buried in the NCBI toolkit,
> which I haven't started to excavate).
>
> Failing that, would SEALS provide such a service? Any PerlPinipeds
> listening?
>
> Harry
>
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>>
>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
>> from its documentation:
>>
>> use Bio::DB::Fasta;
>>
>> # create database from directory of fasta files
>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>
>> # simple access (for those without Bioperl)
>> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>> my @ids = $db->ids;
>> my $length = $db->length('CHROMOSOME_I');
>> my $alphabet = $db->alphabet('CHROMOSOME_I');
>> my $header = $db->header('CHROMOSOME_I');
>>
>> # Bioperl-style access
>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
>>
>> my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
>> my $seq = $obj->seq;
>> my $subseq = $obj->subseq(4_000_000 => 4_100_000);
>>
>> Do you already have the offsets?
>>
>> Brian O.
>>
>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote:
>>> Hi All,
>>>
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this. Forgive me if I've missed something
>>> obvious.
>>>
>>> This should not be a novel request, but I've not found it answered. If
>>> bioperl isn't the best way to do this, I'd be grateful for a pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>>
>>> The problem is to retrieve genomic sequences plus & minus some offset
>>> from a locus determined by HUGO keyword or GeneID. This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt. Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>>
>>> TIA!

From heikki at sanbi.ac.za  Thu Feb 16 12:32:51 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 16 Feb 2006 19:32:51 +0200
Subject: [Bioperl-l] Primer maps?
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com>
Message-ID: <200602161932.51552.heikki@sanbi.ac.za>

Mike,

Marc's suggestion is the best I've heard.
We really do not have any kind of pretty-print functionality within
BioPerl. I guess there has not been a pressing need. Bio::Graphics has
filled in the need for sequence display.

I think Bio::Seq::PrettyPrint could be a great way to design prettyprinting
in a very modular way so that it can print out anything mapped to a sequence
location. The EMBOSS showseq would be a great help in there.

A student project? Would anyone be interested?

-Heikki

-- 
Heikki Lehvaslaiho    heikki at_sanbi _ac _za
Associate Professor   skype: heikki_lehvaslaiho
SANBI, South African National Bioinformatics Institute
University of Western Cape, South Africa
Phone: +27 21 959 2096   FAX: +27 21 959 2512

From osborne1 at optonline.net  Thu Feb 16 12:59:37 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 12:59:37 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs
In-Reply-To: <200602160823.03534.hjm@tacgi.com>
Message-ID:

Chris and Harry,

I'm writing a Wiki page on this, it's linked to the FAQ as Wiki is
complaining that the FAQ is getting too big.
I'll fill in the ENSEMBL API and Bio::DB::Fasta approaches, if you would
comment on the BioPerl/eutils approach at some point that would be superb:

http://bioperl.open-bio.org/wiki/Getting_Genomic_Sequences

Brian O.

From hjm at tacgi.com  Thu Feb 16 12:02:07 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Thu, 16 Feb 2006 09:02:07 -0800
Subject: [Bioperl-l] Primer maps?
In-Reply-To: <6.2.0.14.0.20060215155422.01d44a98@localhost>
References: <6.2.0.14.0.20060215155422.01d44a98@localhost>
Message-ID: <200602160902.07383.hjm@tacgi.com>

A bit off the bioperl topic - if you must have bioperl, ignore this (or just
system() wrap the command) - but you can do exactly this mapping and in-line
translation with a thing I wrote called tacg - you make a GCG-formatted file
of primers, i.e. for each pattern you need a line like:

;          Top                           Bottom
;Name      Offset  Recognition Pattern   Offset  ! comments
primer1    0       tcgggywmkkgg          0       ! ...
primer2    0       gcttggctgaggag        0       ! . . .

Obviously the offsets can be set to 0 for non REs. There's no limit to the
number of primer patterns (tho I think there's a compiled-in limit of 30
chars in the pattern - easily changed in header), no limit to amount of seq
searched, handles degeneracies, searches at ~4Mbases/s on a 2G opteron (120
patterns). Also does searching with errors (slowly) and regex's (at pcre
speeds), and matrices. Other neat stuff, too.

The output is sort of as you describe - replace the RE names with your
primer labels and you'll have it.

6 frame xl with 3 letter abbrevs.

              BsrGI    BsrGI AflII                      DraI
              \        \     \                          \
 121 gtgtgtatttgtacactttgtacacttaagacctacacatttcattgtgtttaaattatt 180
3453 cacacataaacatgtgaaacatgtgaattctggatgtgtaaagtaacacaaatttaataa 3512
         ^    *    ^    *    ^    *    ^    *    ^    *    ^    *
1    ValCysIleCysThrLeuCysThrLeuLysThrTyrThrPheHisCysValTerIleIle
2     CysValPheValHisPheValHisLeuArgProThrHisPheIleValPheLysLeuLeu
3      ValTyrLeuTyrThrLeuTyrThrTerAspLeuHisIleSerLeuCysLeuAsnTyrTyr
4      HisIleGlnValSerGlnValSerLeuTerValValAsnTerGlnThrTerIleIleVal
5     ThrTyrLysTyrValLysTyrValTerArgSerCysMetGluAsnHisLysPheTerTer
6    HisThrAsnThrCysLysThrCysLysGlyLeuValCysLysMetThrAsnLeuAsnAsn

or 3 frames with 1 letter abbrevs

              BsrGI    BsrGI AflII                      DraI
              \        \     \                          \
 121 gtgtgtatttgtacactttgtacacttaagacctacacatttcattgtgtttaaattatt 180
3453 cacacataaacatgtgaaacatgtgaattctggatgtgtaaagtaacacaaatttaataa 3512
         ^    *    ^    *    ^    *    ^    *    ^    *    ^    *
1    V  C  I  C  T  L  C  T  L  K  T  Y  T  F  H  C  V  *  I  I
2     C  V  F  V  H  F  V  H  L  R  P  T  H  F  I  V  F  K  L  L
3      V  Y  L  Y  T  L  Y  T  *  D  L  H  I  S  L  C  L  N  Y  Y

read more at tacg.sf.net or reply to me for the latest docs and version -
have to admit the sf site is a bit moldy.

hjm
\=// > --------------------------------------------------------------------- -- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com <> From hjm at tacgi.com Thu Feb 16 11:23:02 2006 From: hjm at tacgi.com (Harry Mangalam) Date: Thu, 16 Feb 2006 08:23:02 -0800 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: References: Message-ID: <200602160823.03534.hjm@tacgi.com> Yes, I'm going to try this 1st. Also the pointer to the NCBI eutils page was helpful. They describe the same thing and I think that API will give me what I need. I'll post back to report. Sorry for the delay in answering - this is a side project and as such is going slow. Many thanks to you guys, especially Brian for the example code - much more than I had a right to expect. Virtual Beers all round and real ones should we ever meet up. Harry On Thursday 16 February 2006 04:52, Chris Fields wrote: > I think a method was recently implemented in Bio::DB::GenBank to > retrieve a segment of DNA given start and end coordinates in GenBank > format; that should contain the features you need. I requested it > ~Nov-Dec in the mailing list but didn't get a chance to test it. > Would that help? > > On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote: > > Harry, > > > > It's not clear to me that NCBI's eutils offers this capability > > directly. You > > can probably download Entrez Gene entries and parse them for > > coordinates but > > I know of no way to remotely retrieve genomic sequences like this > > from NCBI > > (ENSEMBL API perhaps?). What I had in mind uses the local approach > > that some > > of us favor and to prove to myself that this is simple to do I wrote a > > script that I just added to examples/tools, it's called > > extract_genes.pl and > > it's based on Bio::DB::Fasta. Download the sequence files for a given > > species to some dir, download Entrez Gene's gene2accession file, > > and run. 
It > > creates and stores a hash for lookups, it won't read gene2accession > > each > > time it runs. > > > > Brian O. > > > > On 2/14/06 12:15 PM, "Harry Mangalam" wrote: > >> Hi Brian, > >> > >> Thanks very much for the pointers and the speed of your reply and > >> apologies > >> for the speed of mine. > >> > >> This looks good, but what I was looking for was a bioP approach > >> for hooking to > >> an API at NCBI or EBI so I could get this info and seqs from > >> them. In this > >> case, speed of retrieval is not critical and I'd rather not > >> download the > >> entirety of the sequences to a local disk to hack at them. > >> > >> I've determined a screen-scraping approach to get them and could > >> script that, > >> but I thought that bioP had a method for using NCBI's external > >> APIs, tho it > >> may be that my memory is faulty or the approach is no longer > >> supported due to > >> overload. > >> > >> Does NCBI make such APIs available anymore? I searched a bit for > >> docs on them > >> but couldn't find anything (unless it's buried in the NCBI toolkit, > >> which I > >> haven't started to excavate). > >> > >> Failing that, would SEALS provide such a service? Any PerlPinipeds > >> listening? > >> > >> Harry > >> > >> On Sunday 12 February 2006 08:37, Brian Osborne wrote: > >>> Harry, > >>> > >>> Hope you're doing well. The approach could be based on > >>> Bio::DB::Fasta.
So, > >>> from its documentation: > >>> > >>> use Bio::DB::Fasta; > >>> > >>> # create database from directory of fasta files > >>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); > >>> > >>> # simple access (for those without Bioperl) > >>> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); > >>> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); > >>> my @ids = $db->ids; > >>> my $length = $db->length('CHROMOSOME_I'); > >>> my $alphabet = $db->alphabet('CHROMOSOME_I'); > >>> my $header = $db->header('CHROMOSOME_I'); > >>> > >>> # Bioperl-style access > >>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); > >>> > >>> my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); > >>> my $seq = $obj->seq; > >>> my $subseq = $obj->subseq(4_000_000 => 4_100_000); > >>> > >>> Do you already have the offsets? > >>> > >>> Brian O. > >>> > >>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote: > >>>> Hi All, > >>>> > >>>> After perusing the tutorial and other docs for an evening, I > >>>> still > >>>> can't find the answer to this. Forgive me if I've missed something > >>>> obvious. > >>>> > >>>> This should not be a novel request, but I've not found it > >>>> answered. If > >>>> bioperl isn't the best way to do this, I'd be grateful for a > >>>> pointer to a > >>>> better way, especially if it includes an illuminating bit of code. > >>>> > >>>> The problem is to retrieve genomic sequences plus & minus some > >>>> offset > >>>> from a locus determined by HUGO keyword or GeneID. This would be a > >>>> common followup chore for some extra analysis from a gene > >>>> expression > >>>> expt. Or maybe this is in the DBFetch routines, but I've missed > >>>> the > >>>> sequence type to specify...? > >>>> > >>>> > >>>> TIA! > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr.
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign -- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com <> From cjfields at uiuc.edu Thu Feb 16 16:37:25 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 16 Feb 2006 15:37:25 -0600 Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm version 1.28 In-Reply-To: <43F449E1.80605@esat.kuleuven.be> Message-ID: <000301c63341$2e015d50$15327e82@pyrimidine> As an update for those interested, I checked on this today, feeding SearchIO the XML and text output from all of NCBI's BLAST flavors. Basically, all XML parses fine. All text output except blastn and tblastx works fine. The last two have the extra lines starting with 'Features in this part of subject sequence:'. I'll be checking into SearchIO::blast but don't know when I can get around to posting a fix. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs > Sent: Thursday, February 16, 2006 3:46 AM > To: gyang at plantbio.uga.edu > Cc: bioperl-l at lists.open-bio.org; Chris Fields > Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm > version 1.28 > > Hi, > > I have the same problem with the blast.pm-file. > The people of NCBI added some extra info when giving the Blast-output. > (see e.g. "Features flanking this part..." or "Features in this part > ..."), example added. > The blast.pm module starts looking for the hsp-alignment-information, > but it dies when it hits this Feature-information. > > Pieter > > ......
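A minimal sketch of a stopgap for the blastn/tblastx text-report problem described above: strip the extra feature block from the report before handing it to Bio::SearchIO. It assumes (the thread does not spell out the layout) that the block starts with a line reading "Features in this part of subject sequence:" or "Features flanking this part of subject sequence:" and runs up to the next blank line. The name strip_feature_blocks is hypothetical and not part of BioPerl; this is not the fix that went into SearchIO::blast.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): drop every
# "Features in/flanking this part of subject sequence:" block.
# Assumption: a block ends at the first blank line that follows it.
sub strip_feature_blocks {
    my @lines = @_;
    my (@kept, $in_block);
    for my $line (@lines) {
        if ($in_block) {
            # The blank line closing the block is dropped as well,
            # so the report keeps a single blank line before " Score =".
            $in_block = 0 if $line =~ /^\s*$/;
            next;
        }
        if ($line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/) {
            $in_block = 1;
            next;
        }
        push @kept, $line;
    }
    return @kept;
}

# Filter the report files named on the command line and print to STDOUT.
print strip_feature_blocks(<>) if @ARGV;
```

Saved as, say, strip_features.pl, it could be run as `perl strip_features.pl report.blastn > report.clean`, and the cleaned file then parsed with Bio::SearchIO in 'blast' format as usual. Using the XML output instead avoids the issue altogether, since all XML is reported above to parse fine.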
From osborne1 at optonline.net Thu Feb 16 17:19:16 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 16 Feb 2006 17:19:16 -0500 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: Message-ID: Chris, Yes. The question now is where to easily get the coordinates. Brian O. On 2/16/06 7:52 AM, "Chris Fields" wrote: > I think a method was recently implemented in Bio::DB::GenBank to > retrieve a segment of DNA given start and end coordinates in GenBank > format; that should contain the features you need. I requested it > ~Nov-Dec in the mailing list but didn't get a chance to test it. > Would that help?
From cjfields at uiuc.edu Thu Feb 16 17:29:15 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 16 Feb 2006 16:29:15 -0600 Subject: [Bioperl-l] Minimal versions requirements/warnings for SearchIO text parsing? Message-ID: <000001c63348$6b8136d0$15327e82@pyrimidine> I'm floating this to see what people think... I'm beginning to wonder, especially when I'm wading through the regex/parsing nightmare in SearchIO::blast, if we should either require a minimal BLAST version number for parsing to work in SearchIO::blast. I could add a '$self->throw("Requires BLAST v2.x.x")' or at least give a warning if the blast version number is below a minimal version, so at least people will know what the problem is (not us!). The regexes are really piling up, and the latest changes in blastn and tblastx will require adding a few more. I also think that this would help remind everybody running the latest Bioperl that there are also newer versions of BLAST.
My current thought is to get it working for the latest text output from NCBI, check it against the last version of BLAST (v. 2.2.12, which, luckily, blastcl3 generates), and not worry too much about older ones. Any thoughts on this? Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Thu Feb 16 17:45:52 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 16 Feb 2006 16:45:52 -0600 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: Message-ID: <000101c6334a$bd80a900$15327e82@pyrimidine> If I know the start, end, and strand info for a list of features (personal preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew up), couldn't I try pulling out the surrounding region? My thought is this, though I haven't coded it yet: 1) Draw up a list of SeqFeatures, with accession, start, stop coordinates (array of hashes) based off what I get from RNAMotif objects. 2) Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream and downstream, one at a time, using get_Seq_by_id(). I could add a sleep in there somewhere to not tick off the NCBI curators. Reason I'm interested in this is b/c I want to know where the RNA motif is in context to surrounding features. If it is very close to a coding region, then the motif likely indicates translational regulation. Further away may indicate transcriptional termination or another mechanism. The files returned should have the features included as long as they are in the full length GenBank record. I tried it out using the web form but not through Bio::DB::GenBank yet. If I can get it to work I'll add it to the page. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept.
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: Brian Osborne [mailto:osborne1 at optonline.net] > Sent: Thursday, February 16, 2006 4:19 PM > To: Chris Fields > Cc: Harry Mangalam; bioperl-l > Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names or > GeneIDs > > Chris, > > Yes. The question now is where to easily get the coordinates. > > Brian O.
From hjm at tacgi.com Thu Feb 16 18:10:59 2006 From: hjm at tacgi.com (Harry Mangalam) Date: Thu, 16 Feb 2006 15:10:59 -0800 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs In-Reply-To: <000101c6334a$bd80a900$15327e82@pyrimidine> References: <000101c6334a$bd80a900$15327e82@pyrimidine> Message-ID: <200602161510.59679.hjm@tacgi.com> This is essentially what I want to do and my [only in pseudocode] approach is basically what you describe, except that currently I only have HUGO descriptors, not Genbank UIDs. If you know of an index that lists both, that would be the entire shot. I'm also interested in tracking transcriptional control elements and cross-correlating & why I wrote the 'rules' chunk of the recently (self-promoted) tacg. Best Harry On Thursday 16 February 2006 14:45, Chris Fields wrote: > If I know the start, end, and strand info for a list of features (personal > preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew > up), couldn't I try pulling out the surrounding region? My thought is > this, though I haven't coded it yet: > > 1) Draw up a list of Seqfeatures, with accession, start, stop coordinates > (array of hashes) based off what I get from RNAMotif objects. > 2) Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream > and downstream, one at a time, using get_Seq_by_ID(). I could add a sleep > in there somewhere to not tick off the NCBI curators.
> > Reason I'm interested in this is b/c I want to know where the RNA motif is > in context to surrounding features. If it is very close to a coding region, > then the motif likely indicates translational regulation. Further away may > indicate transcriptional termination or another mechanism. > > The files returned should have the features included as long as they are in > the full length GenBank record. I tried it out using the web form but not > through Bio::DB::GenBank yet. If I can get it to work I'll add it to the > page. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. of Biochemistry > University of Illinois Urbana-Champaign -- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com <>
From anst at kvl.dk Fri Feb 17 04:18:18 2006 From: anst at kvl.dk (Anders Stegmann) Date: Fri, 17 Feb 2006 10:18:18 +0100 Subject: [Bioperl-l] another searchIO bug? with blast report In-Reply-To: <43F45FE60200009B00000ED6@gwia.kvl.dk> References: <43F45FE60200009B00000ED6@gwia.kvl.dk> Message-ID: <43F5A2EA0200009B00000F45@gwia.kvl.dk> >>>Anders Stegmann 02/16/06 11:20 am >>> Hi! I am blasting a protein seq (query) against an identical seq with a deletion of Aa nr 61 (subject). Then I print out the type of nomatch Aa and its position. The nomatch for the query seq is Aa G at position 61, which is correct. The nomatch for the subject seq is V at position 60, which is definitely not correct!? Is this a bug? testblast2.pl is the program to run Q0045 is the query seq. Q0045del61 is the subject seq (it has to be formated: formatdb -i Q0045del61 -p T -o F). Regards Anders. -------------- next part -------------- A non-text attachment was scrubbed... Name: Q0045 Type: application/octet-stream Size: 873 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060217/74838520/attachment.obj -------------- next part -------------- A non-text attachment was scrubbed...
Name: Q0045del61 Type: application/octet-stream Size: 872 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060217/74838520/attachment-0001.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: testblast2.pl Type: application/octet-stream Size: 6109 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060217/74838520/attachment-0002.obj -------------- next part -------------- An HTML attachment was scrubbed... URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060217/74838520/attachment.html
From saldroubi at yahoo.com Fri Feb 17 12:49:40 2006 From: saldroubi at yahoo.com (Sam Al-Droubi) Date: Fri, 17 Feb 2006 09:49:40 -0800 (PST) Subject: [Bioperl-l] Count or weight matrix in bioperl? In-Reply-To: <43EAAEEF.3000304@infotech.monash.edu.au> Message-ID: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com> Torsten and all, I don't think this will work for me because it only generates statistics for a single sequence. What I need is a count matrix for each position across a number of DNA sequences. In other words, if I pass three sequences to this function then it returns the count of each nucleotide at each position. For example, if I pass an array of sequences, say ATC, CCC, TTT, then I should get a matrix back that has the counts at positions 1, 2, 3 for each of A, C, T and G, like this:
    1 2 3
  A 1 0 0
  C 1 1 2
  T 1 2 1
  G 0 0 0
Any idea if this is already built somewhere in bioperl? Thank you. Torsten Seemann wrote:> Say I have an array of nucleotide sequences of length N. I want to calculate the count matrix (weight matrix). That is for each position 1..N, I want to know how many As, Cs ,Ts and Gs there are. Is the code to do this already written in bioperl to build this matrix if I pass it those strings? > Please excuse my lack of knowledge as I am a newcomer to bioinformatics. Use the Bio::Tools::SeqStats module.
The PDoc documentation even has an example similar to what you want to do: http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html --Torsten Seemann Sincerely, Sam Al-Droubi, M.S. saldroubi at yahoo.com From muratem at eng.uah.edu Fri Feb 17 12:45:30 2006 From: muratem at eng.uah.edu (Mike Muratet) Date: Fri, 17 Feb 2006 11:45:30 -0600 (CST) Subject: [Bioperl-l] Minimal versions requirements/warnings for SearchIO text parsing? In-Reply-To: <000001c63348$6b8136d0$15327e82@pyrimidine> References: <000001c63348$6b8136d0$15327e82@pyrimidine> Message-ID: On Thu, 16 Feb 2006, Chris Fields wrote: > I'm floating this to see what people think... > > I'm beginning to wonder, especially when I'm wading through the > regex/parsing nightmare in SearchIO::blast, if we should either require a > minimal BLAST version number for parsing to work in SearchIO::blast. I > could add a '$self->throw("Requires BLAST v2.x.x")' or at least give a > warning if the blast version number is below a minimal version, so at least > people will know what the problem is (not us!). > > The regexes are really piling up, and the latest changes in blastn and > tblastx will require adding a few more. I also think that this would help > remind everybody running the latest Bioperl that there are also newer > versions of BLAST. My current thought is to get it working for the latest > text output from NCBI, check it against the last version of BLAST (v. > 2.2.12, which, luckily, blastcl3 generates), and not worry too much about > older ones. > > Any thoughts on this? > Chris I could live with it. I think most of the world runs on NCBI or WUBLAST and it's easy to download/update either of those. Thanks for the effort. I use SearchIO a lot. Mike > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l >
From MEC at stowers-institute.org Fri Feb 17 13:15:53 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 17 Feb 2006 12:15:53 -0600 Subject: [Bioperl-l] Count or weight matrix in bioperl? Message-ID: http://forkhead.cgb.ki.se/TFBS/ provides the ability to generate a position frequency matrix from a list of (presumably aligned) sequences as follows:
  #!/usr/bin/env perl
  use TFBS::PatternGen::SimplePFM;
  my @sequences = <>;
  chomp @sequences;
  print TFBS::PatternGen::SimplePFM->new(-seq_list=>\@sequences)->pattern->rawprint;
  exit 0;
The output when run on your example input shows that the order of the nucleotides is not the same as you expect (it is alphabetical):
  1 0 0
  1 1 2
  0 0 0
  1 2 1
Good luck, TFBS installation requires significant dependencies, including bioperl and PDL. Malcolm Cook
From jason.stajich at duke.edu Fri Feb 17 14:01:45 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Fri, 17 Feb 2006 14:01:45 -0500 Subject: [Bioperl-l] another searchIO bug? with blast report In-Reply-To: <43F5A2EA0200009B00000F45@gwia.kvl.dk> References: <43F45FE60200009B00000ED6@gwia.kvl.dk> <43F5A2EA0200009B00000F45@gwia.kvl.dk> Message-ID: <69783647-DD43-4A20-84FA-88E5F8A2C535@duke.edu> In case people on the list think that by my speaking up about question means they should ignore it... Hopefully someone else can help debug this - I really don't have time I'm afraid. -jason On Feb 17, 2006, at 4:18 AM, Anders Stegmann wrote: > > >>>> Anders Stegmann 02/16/06 11:20 am >>> > Hi! > > I am blasting a protein seq (query) against an identical seq with a > deletion of Aa nr 61 (subject). > Then I print out the type of nomatch Aa and its position. > The nomatch for the query seq is Aa G at position 61, which is > correct. > The nomatch for the subject seq is V at position 60, which is > definitely > not correct!? > > Is this a bug? > > testblast2.pl is the program to run > > Q0045 is the query seq.
> > Q0045del61 is the subject seq (it has to be formated: formatdb -i > Q0045del61 -p T -o F). > > Regards Anders. > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Fri Feb 17 14:17:32 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 17 Feb 2006 13:17:32 -0600 Subject: [Bioperl-l] another searchIO bug? with blast report In-Reply-To: <69783647-DD43-4A20-84FA-88E5F8A2C535@duke.edu> Message-ID: <000001c633f6$cd391740$15327e82@pyrimidine> No, haven't ignored it. Just been busy going through SearchIO::blast again (I've perltidy'd it) since BLASTN and TBLASTX output (v2.2.13) don't work; looks like all others should. Trying to fix one problem at a time. I'll look at this next. Don't worry about it. ;> Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Jason Stajich > Sent: Friday, February 17, 2006 1:02 PM > To: Anders Stegmann > Cc: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] another searchIO bug? with blast report > > In case people on the list think that by my speaking up about > question means they should ignore it... > > Hopefully someone else can help debug this - I really don't have time > I'm afraid. > > -jason > > > On Feb 17, 2006, at 4:18 AM, Anders Stegmann wrote: > > > > > > >>>> Anders Stegmann 02/16/06 11:20 am >>> > > Hi! > > > > I am blasting a protein seq (query) against an identical seq with a > > deletion of Aa nr 61 (subject). > > Then I print out the type of nomatch Aa and its position. > > The nomatch for the query seq is Aa G at position 61, which is > > correct. 
> > The nomatch for the subject seq is V at position 60, which is
> > definitely
> > not correct!?
> >
> > Is this a bug?
> >
> > testblast2.pl is the program to run
> >
> > Q0045 is the query seq.
> >
> > Q0045del61 is the subject seq (it has to be formated: formatdb -i
> > Q0045del61 -p T -o F).
> >
> > Regards Anders.
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From skirov at utk.edu Fri Feb 17 13:09:00 2006
From: skirov at utk.edu (Stefan Kirov)
Date: Fri, 17 Feb 2006 13:09:00 -0500
Subject: [Bioperl-l] Count or weight matrix in bioperl?
In-Reply-To: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com>
References: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com>
Message-ID: <43F6113C.6070501@utk.edu>

If you have bioperl-live, write a file like this:

>seqgroup1
ATC
CCC
TTT

and then read it:

use Bio::Matrix::PSM::IO;
my $mio = Bio::Matrix::PSM::IO->new(-format=>'masta', -file=>$filename);
while (my $matrix = $mio->next_matrix) {
    # Returns a Bio::Matrix::PSM::SiteMatrix object
    # do something with the matrix...
    print $matrix->consensus, "\n";
}

This is not going to give you the raw counts, but it will give you the frequency for each position/letter. See the docs for Bio::Matrix::PSM::SiteMatrix.

Hope this helps
Stefan

Sam Al-Droubi wrote:

>Torsten and all,
>
> I don't think this will work for me for it only generates statistics for a single sequence. What I need is a count matrix for each position for a number of DNA sequences. In other words, if I pass there 3 sequences to this function then it returns the count for each position for each nucleotide.
> > For example if I pass an array of sequences say: ATC,CCC,TTT > then I should get a matrix back that will have count for postion 1,2,3 for each A,C,T, or G like this: > > > 1 2 3 > A 1 0 0 > C 1 1 2 > T 1 2 1 > G 0 0 0 > > Any idea of this is already built somewhere in bioperl? > > Thank you. > > > Torsten Seemann wrote:> Say I have an array of nucleotide sequences of of length N. I want to calculate the count matrix (weight matrix). That is for each position 1..N, I want to know how many As, Cs ,Ts and Gs there are. Is the code to do this already written in bioperl to build this matrix if I pass it those strings? > > >> Please excuse my lack of knowledge as I am a new comer to bioinformatics. >> >> > >Use the Bio::Tools::SeqStats module. The PDoc documentation even has an >example similar to what you want to do: > >http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html > >--Torsten Seemann > > > > >Sincerely, >Sam Al-Droubi, M.S. >saldroubi at yahoo.com >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From cjfields at uiuc.edu Fri Feb 17 18:02:02 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 17 Feb 2006 17:02:02 -0600 Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names orGeneIDs In-Reply-To: <000101c6334a$bd80a900$15327e82@pyrimidine> Message-ID: <000601c63416$2a14aa00$15327e82@pyrimidine> Brian, I added some sample code to the page. See what you think. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Thursday, February 16, 2006 4:46 PM > To: 'Brian Osborne' > Cc: 'Harry Mangalam'; 'bioperl-l' > Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names > orGeneIDs > > If I know the start, end, and strand info for a list of features (personal > preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew > up), couldn't I try pulling out the surrounding region? My thought is > this, > though I haven't coded it yet: > > 1) Draw up a list of Seqfeatures, with accession, start, stop coordinates > (array of hashes) based off what I get from RNAMotif objects. > 2) Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream > and downstream, one at a time, using get_Seq_by_ID(). I could add a sleep > in there somewhere to not tick off the NCBI curators. > > Reason I'm interested in this is b/c I want to know where the RNA motif is > in context to surrounding features. If it is very close to a coding > region, > then the motif likely indicates translational regulation. Further away > may > indicate transcriptional termination or another mechanism. > > The files returned should have the features included as long as they are > in > the full length GenBank record. I tried it out using the web form but not > through Bio::DB::GenBank yet. If I can get it to work I'll add it to the > page. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > > > -----Original Message----- > > From: Brian Osborne [mailto:osborne1 at optonline.net] > > Sent: Thursday, February 16, 2006 4:19 PM > > To: Chris Fields > > Cc: Harry Mangalam; bioperl-l > > Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names > or > > GeneIDs > > > > Chris, > > > > Yes. The question now is where to easily get the coordinates. > > > > Brian O. > > > > > > On 2/16/06 7:52 AM, "Chris Fields" wrote: > > > > > I think a method was recently implemented in Bio::DB::GenBank to > > > retrieve a segment of DNA given start and end coordinates in GenBank > > > format; that should contain the features you need. I requested it > > > ~Nov-Dec in the mailing list but didn't get a chance to test it. > > > Would that help? > > > > > > On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote: > > > > > >> Harry, > > >> > > >> It's not clear to me that NCBI's eutils offers this capability > > >> directly. You > > >> can probably download Entrez Gene entries and parse them for > > >> coordinates but > > >> I know of no way to remotely retrieve genomic sequences like this > > >> from NCBI > > >> (ENSEMBL API perhaps?). What I had in mind uses the local approach > > >> that some > > >> of us favor and to prove to myself that this is simple to do I wrote > a > > >> script that I just added to examples/tools, it's called > > >> extract_genes.pl and > > >> it's based on Bio::DB::Fasta. Download the sequence files for a given > > >> species to some dir, download Entrez Gene's gene2accession file, > > >> and run. It > > >> creates and stores a hash for lookups, it won't read gene2accession > > >> each > > >> time it runs. > > >> > > >> Brian O. > > >> > > >> > > >> On 2/14/06 12:15 PM, "Harry Mangalam" wrote: > > >> > > >>> Hi Brian, > > >>> > > >>> Thanks very much for the pointers and the speed of your reply and > > >>> apologies > > >>> for the speed of mine. 
> > >>> > > >>> This looks good, but what I was looking for was a bioP approach > > >>> for hooking to > > >>> an API at NCBI or EBI so I could get this info and seqs from > > >>> them. In this > > >>> case, speed of retrieval is not critical and I'd rather not > > >>> download the > > >>> entirety of the sequences to a local disk to hack at them. > > >>> > > >>> I've determined a screen-scraping approach to get them and could > > >>> script that, > > >>> but I thought that bioP had a method for using NCBI's external > > >>> API's, tho it > > >>> may be that my memory is faulty or the approach is no longer > > >>> supported due to > > >>> overload. > > >>> > > >>> Does NCBI make such APIs available anymore? I searched a bit for > > >>> docs on them > > >>> but couldn't find anything (unless it's buried in the NCBI tookit, > > >>> which I > > >>> haven't started to excavate). > > >>> > > >>> Failing that, would SEALS provide such a service? Any PerlPinipeds > > >>> listening? > > >>> > > >>> Harry > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> On Sunday 12 February 2006 08:37, Brian Osborne wrote: > > >>>> Harry, > > >>>> > > >>>> Hope you're doing well. The approach could be based on > > >>>> Bio::DB::Fasta. 
So, > > >>>> from its documentation: > > >>>> > > >>>> use Bio::DB::Fasta; > > >>>> > > >>>> # create database from directory of fasta files > > >>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); > > >>>> > > >>>> # simple access (for those without Bioperl) > > >>>> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); > > >>>> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); > > >>>> my @ids = $db->ids; > > >>>> my $length = $db->length('CHROMOSOME_I'); > > >>>> my $alphabet = $db->alphabet('CHROMOSOME_I'); > > >>>> my $header = $db->header('CHROMOSOME_I'); > > >>>> > > >>>> # Bioperl-style access > > >>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); > > >>>> > > >>>> my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); > > >>>> my $seq = $obj->seq; > > >>>> my $subseq = $obj->subseq(4_000_000 => 4_100_000); > > >>>> > > >>>> Do you already have the offsets? > > >>>> > > >>>> Brian O. > > >>>> > > >>>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote: > > >>>>> Hi All, > > >>>>> > > >>>>> After perusing the tutorial and other docs for a an evening, I > > >>>>> still > > >>>>> can't find the answer to this. Forgive me if I've missed > something > > >>>>> obvious. > > >>>>> > > >>>>> This should not be a novel request, but I've not found it > > >>>>> answered. If > > >>>>> bioperl isn't the best way to do this, I'd be grateful to a > > >>>>> pointer to a > > >>>>> better way, especially if it includes an illuminating bit of code. > > >>>>> > > >>>>> The problem is to retrieve genomic sequences plus & minus some > > >>>>> offset > > >>>>> from a locus determined by HUGO keyword or GeneID. This would be > a > > >>>>> common followup chore for some extra analysis from a gene > > >>>>> expression > > >>>>> expt. Or maybe this is in the DBFetch routines, but I've missed > > >>>>> the > > >>>>> sequence type to specify...? > > >>>>> > > >>>>> > > >>>>> TIA! 
> > >>
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher
> > > Lab of Dr. Robert Switzer
> > > Dept of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From osborne1 at optonline.net Fri Feb 17 23:01:14 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 17 Feb 2006 23:01:14 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names orGeneIDs
In-Reply-To: <000601c63416$2a14aa00$15327e82@pyrimidine>
Message-ID:

Chris,

That's nice. Now what I'm puzzling over is how to get the genomic coordinates given an id, like a Gene id. The raw query is something like:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=2&rettype=xml

This is _something_ like the queries used within Bio::DB::Query::GenBank, but not exactly. Now taking a look at how the text returned is transformed into objects...

Brian O.

On 2/17/06 6:02 PM, "Chris Fields" wrote:

> Brian,
>
> I added some sample code to the page. See what you think.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept.
of Biochemistry > University of Illinois Urbana-Champaign > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Chris Fields >> Sent: Thursday, February 16, 2006 4:46 PM >> To: 'Brian Osborne' >> Cc: 'Harry Mangalam'; 'bioperl-l' >> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names >> orGeneIDs >> >> If I know the start, end, and strand info for a list of features (personal >> preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew >> up), couldn't I try pulling out the surrounding region? My thought is >> this, >> though I haven't coded it yet: >> >> 1) Draw up a list of Seqfeatures, with accession, start, stop coordinates >> (array of hashes) based off what I get from RNAMotif objects. >> 2) Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream >> and downstream, one at a time, using get_Seq_by_ID(). I could add a sleep >> in there somewhere to not tick off the NCBI curators. >> >> Reason I'm interested in this is b/c I want to know where the RNA motif is >> in context to surrounding features. If it is very close to a coding >> region, >> then the motif likely indicates translational regulation. Further away >> may >> indicate transcriptional termination or another mechanism. >> >> The files returned should have the features included as long as they are >> in >> the full length GenBank record. I tried it out using the web form but not >> through Bio::DB::GenBank yet. If I can get it to work I'll add it to the >> page. >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. 
of Biochemistry >> University of Illinois Urbana-Champaign >> >> >>> -----Original Message----- >>> From: Brian Osborne [mailto:osborne1 at optonline.net] >>> Sent: Thursday, February 16, 2006 4:19 PM >>> To: Chris Fields >>> Cc: Harry Mangalam; bioperl-l >>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names >> or >>> GeneIDs >>> >>> Chris, >>> >>> Yes. The question now is where to easily get the coordinates. >>> >>> Brian O. >>> >>> >>> On 2/16/06 7:52 AM, "Chris Fields" wrote: >>> >>>> I think a method was recently implemented in Bio::DB::GenBank to >>>> retrieve a segment of DNA given start and end coordinates in GenBank >>>> format; that should contain the features you need. I requested it >>>> ~Nov-Dec in the mailing list but didn't get a chance to test it. >>>> Would that help? >>>> >>>> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote: >>>> >>>>> Harry, >>>>> >>>>> It's not clear to me that NCBI's eutils offers this capability >>>>> directly. You >>>>> can probably download Entrez Gene entries and parse them for >>>>> coordinates but >>>>> I know of no way to remotely retrieve genomic sequences like this >>>>> from NCBI >>>>> (ENSEMBL API perhaps?). What I had in mind uses the local approach >>>>> that some >>>>> of us favor and to prove to myself that this is simple to do I wrote >> a >>>>> script that I just added to examples/tools, it's called >>>>> extract_genes.pl and >>>>> it's based on Bio::DB::Fasta. Download the sequence files for a given >>>>> species to some dir, download Entrez Gene's gene2accession file, >>>>> and run. It >>>>> creates and stores a hash for lookups, it won't read gene2accession >>>>> each >>>>> time it runs. >>>>> >>>>> Brian O. >>>>> >>>>> >>>>> On 2/14/06 12:15 PM, "Harry Mangalam" wrote: >>>>> >>>>>> Hi Brian, >>>>>> >>>>>> Thanks very much for the pointers and the speed of your reply and >>>>>> apologies >>>>>> for the speed of mine. 
>>>>>> >>>>>> This looks good, but what I was looking for was a bioP approach >>>>>> for hooking to >>>>>> an API at NCBI or EBI so I could get this info and seqs from >>>>>> them. In this >>>>>> case, speed of retrieval is not critical and I'd rather not >>>>>> download the >>>>>> entirety of the sequences to a local disk to hack at them. >>>>>> >>>>>> I've determined a screen-scraping approach to get them and could >>>>>> script that, >>>>>> but I thought that bioP had a method for using NCBI's external >>>>>> API's, tho it >>>>>> may be that my memory is faulty or the approach is no longer >>>>>> supported due to >>>>>> overload. >>>>>> >>>>>> Does NCBI make such APIs available anymore? I searched a bit for >>>>>> docs on them >>>>>> but couldn't find anything (unless it's buried in the NCBI tookit, >>>>>> which I >>>>>> haven't started to excavate). >>>>>> >>>>>> Failing that, would SEALS provide such a service? Any PerlPinipeds >>>>>> listening? >>>>>> >>>>>> Harry >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote: >>>>>>> Harry, >>>>>>> >>>>>>> Hope you're doing well. The approach could be based on >>>>>>> Bio::DB::Fasta. 
So, >>>>>>> from its documentation: >>>>>>> >>>>>>> use Bio::DB::Fasta; >>>>>>> >>>>>>> # create database from directory of fasta files >>>>>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >>>>>>> >>>>>>> # simple access (for those without Bioperl) >>>>>>> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); >>>>>>> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); >>>>>>> my @ids = $db->ids; >>>>>>> my $length = $db->length('CHROMOSOME_I'); >>>>>>> my $alphabet = $db->alphabet('CHROMOSOME_I'); >>>>>>> my $header = $db->header('CHROMOSOME_I'); >>>>>>> >>>>>>> # Bioperl-style access >>>>>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >>>>>>> >>>>>>> my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); >>>>>>> my $seq = $obj->seq; >>>>>>> my $subseq = $obj->subseq(4_000_000 => 4_100_000); >>>>>>> >>>>>>> Do you already have the offsets? >>>>>>> >>>>>>> Brian O. >>>>>>> >>>>>>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote: >>>>>>>> Hi All, >>>>>>>> >>>>>>>> After perusing the tutorial and other docs for a an evening, I >>>>>>>> still >>>>>>>> can't find the answer to this. Forgive me if I've missed >> something >>>>>>>> obvious. >>>>>>>> >>>>>>>> This should not be a novel request, but I've not found it >>>>>>>> answered. If >>>>>>>> bioperl isn't the best way to do this, I'd be grateful to a >>>>>>>> pointer to a >>>>>>>> better way, especially if it includes an illuminating bit of code. >>>>>>>> >>>>>>>> The problem is to retrieve genomic sequences plus & minus some >>>>>>>> offset >>>>>>>> from a locus determined by HUGO keyword or GeneID. This would be >> a >>>>>>>> common followup chore for some extra analysis from a gene >>>>>>>> expression >>>>>>>> expt. Or maybe this is in the DBFetch routines, but I've missed >>>>>>>> the >>>>>>>> sequence type to specify...? >>>>>>>> >>>>>>>> >>>>>>>> TIA! 
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From osborne1 at optonline.net Fri Feb 17 23:56:08 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 17 Feb 2006 23:56:08 -0500
Subject: [Bioperl-l] CONTIG sequence files from the NCBI
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95030081AC@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID:

Michael,

Yes, BioPerl has done this for you. Essentially what it does is take all the ids in the CONTIG section and query for each individually, then use the sequences and the location data to create the single large sequence. This sequence is appended to the annotation and feature section of the initial GenBank entry. If you want to study this yourself, take a look at Bio::DB::NCBIHelper::postprocess_data.

OK, to answer your first question with my assumption: what NCBI is doing is simply providing a shorthand rather than an entire large sequence, so no feature coordinates change, whether it's the shorthand (CONTIG) or the longhand (ORIGIN).

Second, my explanation tells you that all the sequences are the very latest versions of each sequence; that's how eutils works by default. However, I don't think I've answered your question, because I'm not sure I understand what you mean by "when I ask bioperl if these sequences have been updated, I will be told no".
All Bioperl does is read the file provided by GenBank and use its stated version, nothing fancy. Brian O. On 2/16/06 5:31 AM, "michael watson (IAH-C)" wrote: > Hi > > I have two questions really. I fetched bacterial genome sequences from > the NCBI using Bio::DB::GenBank. > > Some of these sequence entries are CONTIG sequences, ie they just point > to other sequences that need to be joined together to form the entire > genome. > > Looking at my downloads, it looks as if bioperl has done all the > necessary joining for me - or maybe it was the NCBI that did the > joining? > > OK, so firstly, did bioperl do the joining, and if so, are all the > co-ordinates of the features updated to reflect their new location on > the new, joined sequence? > > And secondly, sequence versions... I'm thinking that possibly the > sequence version of the CONTIG may be 1 (as it hasn't changed) yet the > versions of the sequences it refers to might have changed, so when I ask > bioperl if these sequences have been updated, I will be told no because > the CONTIG sequence version is 1, but I should be told yes because the > underlying sequences have...? > > Make sense? > > Thanks > Mick > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From pedro.fabre at gmail.com Fri Feb 17 13:36:37 2006 From: pedro.fabre at gmail.com (pedro fabre) Date: Fri, 17 Feb 2006 18:36:37 +0000 Subject: [Bioperl-l] Count or weight matrix in bioperl? Message-ID: >Torsten and all, > > I don't think this will work for me for it only generates >statistics for a single sequence. What I need is a count matrix for >each position for a number of DNA sequences. In other words, if I >pass there 3 sequences to this function then it returns the count >for each postion for each nucleotide. 
>
> For example if I pass an array of sequences say: ATC,CCC,TTT
> then I should get a matrix back that will have count for position
>1,2,3 for each A,C,T, or G like this:
>
>
>      1 2 3
>   A  1 0 0
>   C  1 1 2
>   T  1 2 1
>   G  0 0 0
>
> Any idea of this is already built somewhere in bioperl?
>
> Thank you.
>

Sam,

What about this? I worked on something like this some time ago for SNP calculation, and it looks to me as though you are heading the same way.

Say you have aligned sequences like these, one per row:

A C G T C C A - T
C G G T A G T G C
C C C C C G T G C
C G C T C G T G C

Convert each column to numbers, reading down the column: 0 for the first value, 1 for the first modification, 2 for the second modification, and so on. Deletions can be considered as another base if you like.

After that:

0 0 0 0 0 0 0 0 0
1 1 0 0 1 1 1 1 1
1 0 1 1 0 1 1 1 1
1 1 1 0 0 1 1 1 1

Once we have the haplotypes converted to numbers, we have to generate the SNP type information for each column:

SNP code = SUM ( value * multiplicity ^ position )

where:

SUM is the sum over the values for the SNP (one column).
value is the SNP number code (0, generally for the major allele; 1 for the minor allele).
position is the position in the block (the row, counting from 0).

The multiplicity is 2 in this example. The codes are:

0 0 0 0 0 0 0 0 0
1 1 0 0 1 1 1 1 1
1 0 1 1 0 1 1 1 1
1 1 1 0 0 1 1 1 1
------------------------------------------------------------------
14 10 12 4 2 14 14 14 14

14 = 0*2^0 + 1*2^1 + 1*2^2 + 1*2^3
10 = 0*2^0 + 1*2^1 + 0*2^2 + 1*2^3
....

Once the SNPs are classified into families this way, we keep just the distinct SNP codes:

14 10 12 4 2

If you want to look into the code, follow this link:

http://users.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/PopGen/HtSNP.pm?rev=1.4&content-type=text/vnd.viewcvs-markup

HTH
Pedro

> Torsten Seemann wrote:>
>Say I have an array of nucleotide sequences of of length N. I want
>to calculate the count matrix (weight matrix). That is for each
>position 1..N, I want to know how many As, Cs ,Ts and Gs there are.
>Is the code to do this already written in bioperl to build this
>matrix if I pass it those strings?
>> Please excuse my lack of knowledge as I am a new comer to bioinformatics.
>
>Use the Bio::Tools::SeqStats module. The PDoc documentation even has an
>example similar to what you want to do:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html
>
>--Torsten Seemann
>
>
>
>
>Sincerely,
>Sam Al-Droubi, M.S.
>saldroubi at yahoo.com
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Sat Feb 18 18:35:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 18 Feb 2006 17:35:22 -0600
Subject: [Bioperl-l] Bio::SearchIO fix posted in Bugzilla
Message-ID: <97C946BE-8410-4B7F-9FA3-97A01641E20E@uiuc.edu>

Added a fix for the blastn and tblastx problems with Bio::SearchIO text parsing of BLAST 2.2.13 output:

http://bugzilla.open-bio.org/show_bug.cgi?id=1934

The extra lines "Features in this part of subject sequence" and the following descriptive lines are passed over using a loop. See the bug report for specifics.

Cheers,

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From osborne1 at optonline.net Sun Feb 19 00:47:44 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 19 Feb 2006 00:47:44 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names orGeneIDs
In-Reply-To: <000601c63416$2a14aa00$15327e82@pyrimidine>
Message-ID:

Chris and Harry,

OK, I've put the missing link in place. This is Bio::DB::EntrezGene, so you can get NCBI Genes as objects, perfectly analogous to Bio::DB::GenBank and the related modules:

use Bio::DB::EntrezGene;
$db = new Bio::DB::EntrezGene;
$seq = $db->get_Seq_by_id(2);

So starting with just a Gene id, then using Bio::DB::GenBank as Chris showed, you can get the sequence.
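The last step of that chain, going from a gene's coordinates to "the locus plus and minus some offset", might be sketched as follows. The helper names flank_window and fetch_region are hypothetical, the numbers are simply the from/to values of the Evidence Viewer URL quoted in this thread, and the remote fetch assumes a BioPerl version whose Bio::DB::GenBank accepts -seq_start and -seq_stop. The fetch only runs when the FETCH_FROM_NCBI environment variable is set, and, as noted in this thread, NCBIHelper may still refuse NT_ accessions.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: widen a 1-based [from, to] interval by $flank bases
# on each side, never going below position 1. The upper end is not clamped
# because the length of the parent sequence is not known here.
sub flank_window {
    my ( $from, $to, $flank ) = @_;
    ( $from, $to ) = ( $to, $from ) if $from > $to;    # tolerate reversed input
    my $start = $from - $flank;
    $start = 1 if $start < 1;
    return ( $start, $to + $flank );
}

# Sketch of the remote fetch. Assumption: this BioPerl's Bio::DB::GenBank
# supports the -seq_start/-seq_stop options discussed in the thread.
sub fetch_region {
    my ( $acc_version, $start, $stop ) = @_;
    require Bio::DB::GenBank;    # loaded lazily so the arithmetic runs without BioPerl
    my $db = Bio::DB::GenBank->new(
        -format    => 'genbank',
        -seq_start => $start,
        -seq_stop  => $stop,
    );
    return $db->get_Seq_by_version($acc_version);
}

# Example coordinates copied from the URL in this thread, padded by 1 kb.
my ( $start, $stop ) = flank_window( 6657835, 6682559, 1000 );
print "window: $start..$stop\n";

if ( $ENV{FETCH_FROM_NCBI} ) {
    # NT_ contigs may trigger the throw() in NCBIHelper quoted in this thread.
    my $seq = fetch_region( 'NT_079573.2', $start, $stop );
    print $seq->length, " bp retrieved\n";
}
```

Keeping the window arithmetic separate from the network call means the same helper can feed a local Bio::DB::Fasta lookup instead, which is the approach suggested earlier in the thread.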
What's a little odd is how Entrez Gene has stored the positional information and sequence identifier. You might have thought that they'd create a special set of fields for this, but no, it's only available as part of a URL as far as I can tell:

Bio::Annotation::DBLink=HASH()
   '_root_verbose' => 0
   'database' => 'Evidence Viewer'
   'primary_id' => 4693
   'url' => 'http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606&contig=NT_079573.2&gene=NDP&lid=4693&from=6657835&to=6682559'

Question: are NT_* sequences going to be a problem for Bio::DB::GenBank? I see this in NCBIHelper:

# NT contigs can not be retrieved
$self->throw("NT_ contigs are whole chromosome files which are not part of regular".
             "database distributions. Go to ftp://ftp.ncbi.nih.gov/genomes/.")
    if $ids =~ /NT_/;

Perhaps we can modify this so there's no throw() when a seq_start and seq_stop are specified.

Brian O.

On 2/17/06 6:02 PM, "Chris Fields" wrote:

> Brian,
>
> I added some sample code to the page. See what you think.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Thursday, February 16, 2006 4:46 PM
>> To: 'Brian Osborne'
>> Cc: 'Harry Mangalam'; 'bioperl-l'
>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
>> orGeneIDs
>>
>> If I know the start, end, and strand info for a list of features (personal
>> preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
>> up), couldn't I try pulling out the surrounding region? My thought is
>> this,
>> though I haven't coded it yet:
>>
>> 1) Draw up a list of Seqfeatures, with accession, start, stop coordinates
>> (array of hashes) based off what I get from RNAMotif objects.
>> 2) Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream >> and downstream, one at a time, using get_Seq_by_ID(). I could add a sleep >> in there somewhere to not tick off the NCBI curators. >> >> Reason I'm interested in this is b/c I want to know where the RNA motif is >> in context to surrounding features. If it is very close to a coding >> region, >> then the motif likely indicates translational regulation. Further away >> may >> indicate transcriptional termination or another mechanism. >> >> The files returned should have the features included as long as they are >> in >> the full length GenBank record. I tried it out using the web form but not >> through Bio::DB::GenBank yet. If I can get it to work I'll add it to the >> page. >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. of Biochemistry >> University of Illinois Urbana-Champaign >> >> >>> -----Original Message----- >>> From: Brian Osborne [mailto:osborne1 at optonline.net] >>> Sent: Thursday, February 16, 2006 4:19 PM >>> To: Chris Fields >>> Cc: Harry Mangalam; bioperl-l >>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names >> or >>> GeneIDs >>> >>> Chris, >>> >>> Yes. The question now is where to easily get the coordinates. >>> >>> Brian O. >>> >>> >>> On 2/16/06 7:52 AM, "Chris Fields" wrote: >>> >>>> I think a method was recently implemented in Bio::DB::GenBank to >>>> retrieve a segment of DNA given start and end coordinates in GenBank >>>> format; that should contain the features you need. I requested it >>>> ~Nov-Dec in the mailing list but didn't get a chance to test it. >>>> Would that help? >>>> >>>> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote: >>>> >>>>> Harry, >>>>> >>>>> It's not clear to me that NCBI's eutils offers this capability >>>>> directly. 
You >>>>> can probably download Entrez Gene entries and parse them for >>>>> coordinates but >>>>> I know of no way to remotely retrieve genomic sequences like this >>>>> from NCBI >>>>> (ENSEMBL API perhaps?). What I had in mind uses the local approach >>>>> that some >>>>> of us favor and to prove to myself that this is simple to do I wrote >> a >>>>> script that I just added to examples/tools, it's called >>>>> extract_genes.pl and >>>>> it's based on Bio::DB::Fasta. Download the sequence files for a given >>>>> species to some dir, download Entrez Gene's gene2accession file, >>>>> and run. It >>>>> creates and stores a hash for lookups, it won't read gene2accession >>>>> each >>>>> time it runs. >>>>> >>>>> Brian O. >>>>> >>>>> >>>>> On 2/14/06 12:15 PM, "Harry Mangalam" wrote: >>>>> >>>>>> Hi Brian, >>>>>> >>>>>> Thanks very much for the pointers and the speed of your reply and >>>>>> apologies >>>>>> for the speed of mine. >>>>>> >>>>>> This looks good, but what I was looking for was a bioP approach >>>>>> for hooking to >>>>>> an API at NCBI or EBI so I could get this info and seqs from >>>>>> them. In this >>>>>> case, speed of retrieval is not critical and I'd rather not >>>>>> download the >>>>>> entirety of the sequences to a local disk to hack at them. >>>>>> >>>>>> I've determined a screen-scraping approach to get them and could >>>>>> script that, >>>>>> but I thought that bioP had a method for using NCBI's external >>>>>> API's, tho it >>>>>> may be that my memory is faulty or the approach is no longer >>>>>> supported due to >>>>>> overload. >>>>>> >>>>>> Does NCBI make such APIs available anymore? I searched a bit for >>>>>> docs on them >>>>>> but couldn't find anything (unless it's buried in the NCBI tookit, >>>>>> which I >>>>>> haven't started to excavate). >>>>>> >>>>>> Failing that, would SEALS provide such a service? Any PerlPinipeds >>>>>> listening? 
>>>>>> >>>>>> Harry >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote: >>>>>>> Harry, >>>>>>> >>>>>>> Hope you're doing well. The approach could be based on >>>>>>> Bio::DB::Fasta. So, >>>>>>> from its documentation: >>>>>>> >>>>>>> use Bio::DB::Fasta; >>>>>>> >>>>>>> # create database from directory of fasta files >>>>>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >>>>>>> >>>>>>> # simple access (for those without Bioperl) >>>>>>> my $seq = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000); >>>>>>> my $revseq = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000); >>>>>>> my @ids = $db->ids; >>>>>>> my $length = $db->length('CHROMOSOME_I'); >>>>>>> my $alphabet = $db->alphabet('CHROMOSOME_I'); >>>>>>> my $header = $db->header('CHROMOSOME_I'); >>>>>>> >>>>>>> # Bioperl-style access >>>>>>> my $db = Bio::DB::Fasta->new('/path/to/fasta/files'); >>>>>>> >>>>>>> my $obj = $db->get_Seq_by_id('CHROMOSOME_I'); >>>>>>> my $seq = $obj->seq; >>>>>>> my $subseq = $obj->subseq(4_000_000 => 4_100_000); >>>>>>> >>>>>>> Do you already have the offsets? >>>>>>> >>>>>>> Brian O. >>>>>>> >>>>>>> On 2/12/06 1:46 AM, "Harry Mangalam" wrote: >>>>>>>> Hi All, >>>>>>>> >>>>>>>> After perusing the tutorial and other docs for a an evening, I >>>>>>>> still >>>>>>>> can't find the answer to this. Forgive me if I've missed >> something >>>>>>>> obvious. >>>>>>>> >>>>>>>> This should not be a novel request, but I've not found it >>>>>>>> answered. If >>>>>>>> bioperl isn't the best way to do this, I'd be grateful to a >>>>>>>> pointer to a >>>>>>>> better way, especially if it includes an illuminating bit of code. >>>>>>>> >>>>>>>> The problem is to retrieve genomic sequences plus & minus some >>>>>>>> offset >>>>>>>> from a locus determined by HUGO keyword or GeneID. This would be >> a >>>>>>>> common followup chore for some extra analysis from a gene >>>>>>>> expression >>>>>>>> expt. 
Or maybe this is in the DBFetch routines, but I've missed
>>>>>>>> the
>>>>>>>> sequence type to specify...?
>>>>>>>>
>>>>>>>>
>>>>>>>> TIA!
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From maximilianh at gmail.com Sun Feb 19 08:52:37 2006
From: maximilianh at gmail.com (Maximilian Haeussler)
Date: Sun, 19 Feb 2006 14:52:37 +0100
Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <76f031ae0602190552v5f2542dbv@mail.gmail.com>

Hi bio-mailinglists,

does anyone here know of a tool or a library to display two (or more)
sequences at the same time with coloured features? Possibly with lines,
connecting some features from one sequence to the other (synteny-plot) ?
Or to display two multiple alignments, one on top of each other, with
colored features added?

It's not that it would be difficult to write, but programming
visualisation usually takes a lot of time.

Bio::Graphics seems mainly concerned with one main sequence and features
on it. Well, I could copy together two of these gif-images, but then there
would be no connecting lines. Same applies for the graphics in Biojava or
the gff2ps tool or all the multiple alignment viewers that I know
(Bioedit, ClustalX).
There is something called Toucan in Java, which displays at least several
lines of gff-style-features, but no visible sequences and more
importantly, no connecting lines. A recent software, Djinn lite, is using
a similar kind of visualization to compare different spliced genes from
various species, but it's mainly aimed at splicing and written in Visual
Basic. I guess a good compromise might be the 3D viewer Sockeye, but I
haven't seen any synteny-lines in sockeye yet.

I guess I must have missed something here. I cannot be the first one that
would like to compare, say, two gff files, or two multiple alignments?

Thanks a lot for any idea,
Max

From lutfullah at upesh.edu Sun Feb 19 12:01:05 2006
From: lutfullah at upesh.edu (Dr. Lutfullah)
Date: Sun, 19 Feb 2006 22:01:05 +0500
Subject: [Bioperl-l] bioperl in jail
Message-ID: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>

Hello,

I am trying to create a situation where users can ssh login to a chrooted
jailed account with limited functionality.
I created the chroot jail on my Fedora Core 4 installation using a script
available at:
http://www.fuschlberger.net/programs/ssh-scp-chroot-jail/
The script has a line:
======================
APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
/bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
/usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
/usr/libexec/openssh/sftp-server"
=======================
to which I added everything I could get with /bin/perl to make it:

APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
/bin/vi /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
/usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
/usr/libexec/openssh/sftp-server /usr/bin/perl /usr/bin/perl5
/usr/bin/perl5.8.6 /usr/bin/perldoc /usr/bin/perlbug /usr/bin/perlivp
/usr/bin/perlcc /usr/bin/foomatic-perl-data /usr/bin/find2perl"

perl becomes available inside the jail but I cannot use the line "use
Bio::Perl" inside the jail.

The script produces an error on including /usr/lib or /usr/lib/perl5:

Copying necessary library-files to jail (may take some time)
cp: omitting directory `/usr/lib'
ldd: /usr/lib: No such file or directory
Copying files from /etc/pam.d/ to jail
Copying PAM-Modules to jail

In the jailed account the little test program:

use Bio::Perl;
print 2+4;

generated this error:

Can't locate Bio/Perl.pm in @INC (@INC contains:
/usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.4/i386-linux-thread
............................................

Any help would be much appreciated. Thanks in advance.

LK

From boris.steipe at utoronto.ca Sun Feb 19 17:34:52 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Sun, 19 Feb 2006 17:34:52 -0500
Subject: [Bioperl-l] bioperl in jail
In-Reply-To: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>
References: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>
Message-ID:

The path that perl uses internally to search its modules (@INC) is not the
same thing as the path your shell uses. You have to modify @INC either
within running scripts, or by setting the PERL5LIB environment variable
upon login.

e.g. see http://modperlbook.org/html/ch03_09.html

HTH,
B.


On 19 Feb 2006, at 12:01, Dr. Lutfullah wrote:

> Hello,
>
> I am trying to create a situation where users can ssh login to a
> chrooted
> jailed account with limited functionality.
> I created the chroot jail on my Fedora Core 4 installation using a
> script
> available at:
> http://www.fuschlberger.net/programs/ssh-scp-chroot-jail/
> The script has a line:
> ======================
> APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
> /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
> /usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
> /usr/libexec/openssh/sftp-server"
> =======================
> to which I added everything I could get with /bin/perl to make it:
>
> APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
> /bin/vi /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
> /usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
> /usr/libexec/openssh/sftp-server /usr/bin/perl /usr/bin/perl5
> /usr/bin/perl5.8.6 /usr/bin/perldoc /usr/bin/perlbug /usr/bin/perlivp
> /usr/bin/perlcc /usr/bin/foomatic-perl-data /usr/bin/find2perl"
>
> perl becomes available inside the jail but I cannot use the line "use
> Bio::Perl" inside the jail.
>
> The script produces an error on including /usr/lib or /usr/lib/perl5:
>
> Copying necessary library-files to jail (may take some time)
> cp: omitting directory `/usr/lib'
> ldd: /usr/lib: No such file or directory
> Copying files from /etc/pam.d/ to jail
> Copying PAM-Modules to jail
>
> In the jailed account the little test program:
>
> use Bio::Perl;
> print 2+4;
>
> generated this error:
>
> Can't locate Bio/Perl.pm in @INC (@INC contains:
> /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread
> ............................................
>
> Any help would be much appreciated. Thanks in advance.
>
> LK
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From khoueiry at ibdm.univ-mrs.fr Mon Feb 20 04:27:07 2006
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Mon, 20 Feb 2006 10:27:07 +0100
Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence
In-Reply-To: <76f031ae0602190552v5f2542dbv@mail.gmail.com>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> <76f031ae0602190552v5f2542dbv@mail.gmail.com>
Message-ID: <1140427628.10569.10.camel@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060220/fc7e2fc8/attachment.ksh

From shameer at ncbs.res.in Mon Feb 20 01:21:01 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon, 20 Feb 2006 11:51:01 +0530 (IST)
Subject: [Bioperl-l] Matrix Average Code / Module ?
In-Reply-To: <76f031ae0602190552v5f2542dbv@mail.gmail.com>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> <76f031ae0602190552v5f2542dbv@mail.gmail.com>
Message-ID: <59825.192.168.1.176.1140416461.squirrel@192.168.1.176>

Hi all,

Is there any program/module to calculate the average of a BLOSUM/PAM or
any other matrix? I have a matrix and I need to see the average, for
example:

11 22 43 54 50 27 87 74 32 10 66 58 98 78 20 22 23 44 16 34

I have gone through Bio::Matrix::MatrixI and Bio::Matrix::GenericMatrix
and other perl modules like Math::Matrix
http://search.cpan.org/~ulpfr/Math-Matrix-0.4/Matrix.pm and
Math::Cephes::Matrix - but none of them has a provision to do a matrix
average calculation.

Any help ???
thanks in advance,
Happy biocomputing !!!
--
Shameer Khadar
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road
Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241
F - 91-080-23636662/23636675
W - http://www.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
patiently seek divine and scientific truth." MM

From cjfields at uiuc.edu Mon Feb 20 12:01:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 20 Feb 2006 11:01:26 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm version 1.28
In-Reply-To: <43F449E1.80605@esat.kuleuven.be>
Message-ID: <000e01c6363f$494bc5e0$15327e82@pyrimidine>

I have added a preliminary bugfix for the problems seen with nucleotide
blast parsing for BLAST 2.2.13 reports. I passed SearchIO::blast through
perltidy to space out the blocks (really for my own purposes; it's a
pretty complex module). The fix bypasses the extra lines output for blastn
and tblastx and now seems to parse the text output for those reports
correctly.
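
To illustrate the idea of passing over those extra lines, here is a small
standalone sketch. It is not the actual Bio::SearchIO::blast patch (see the
Bugzilla entry for that); the subroutine name strip_feature_blocks is
invented, and it assumes each "Features in/flanking this part of subject
sequence:" block runs until the next HSP header line starting with
"Score =", as in the BLAST 2.2.13 reports posted to the list:

```perl
use strict;
use warnings;

# Hypothetical pre-filter: drop the "Features ..." annotation blocks
# from the lines of a BLAST text report, keeping everything else.
sub strip_feature_blocks {
    my @lines = @_;
    my @kept;
    my $skipping = 0;
    for my $line (@lines) {
        # The header line opens a block that should be passed over.
        if ( $line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/ ) {
            $skipping = 1;
            next;
        }
        # The block ends at the next HSP header ("Score = ...").
        $skipping = 0 if $skipping && $line =~ /^\s*Score\s*=/;
        push @kept, $line unless $skipping;
    }
    return @kept;
}
```

A report read into an array of lines could be cleaned with
strip_feature_blocks(@lines) and written back out before parsing, as a
stopgap for installations that cannot take the patched module.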

I tested it using all NCBI BLAST flavors for the last two versions of BLAST
(2.2.12 and 2.2.13); the fix was simple so it shouldn't break any other
BLAST report parsing, such as WU-BLAST, RPS-BLAST, or Paracel. It has only
been tested on MacOSX at the moment, so I need people out there to test it
out on anything they can to make sure it works before committing. I'll be
trying it on Windows today. Report back to me and I'll post anything on
bugzilla.

Here it is:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> Sent: Thursday, February 16, 2006 3:46 AM
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at lists.open-bio.org; Chris Fields
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
> version 1.28
>
> Hi,
>
> I have the same problem with the blast.pm-file.
> The people of NCBI added some extra info when giving the Blast-output.
> (see e.g. "Features flanking this part..." or "Features in this part
> ..."), example added.
> The blast.pm module starts looking for the hsp-alignment-information,
> but it dies when it hits this Feature-information.
> > Pieter > > > >gi|77552765|gb|DP000011.1| > list_uids=77552765&dopt=GenBank> Oryza sativa (japonica cultivar-group) > chromosome 12, complete > > sequence > Length=27492551 > > Features flanking this part of subject sequence: > > 3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class > &from=19251479&to=19253693&view=gbwithparts> > > 2655 bp at 3' side: hypothetical protein > &from=19260091&to=19260600&view=gbwithparts> > > Score = 36.2 bits (18), Expect = 0.22 > Identities = 18/18 (100%), Gaps = 0/18 (0%) > Strand=Plus/Minus > > Query 4 GTACTACTCTACTCTACT 21 > |||||||||||||||||| > > Sbjct 19257436 GTACTACTCTACTCTACT 19257419 > > > Features flanking this part of subject sequence: > > 2991 bp at 5' side: hypothetical protein > &from=27003164&to=27003907&view=gbwithparts> > 1131 bp at 3' side: hypothetical protein > > &from=27008046&to=27010752&view=gbwithparts> > > Score = 36.2 bits (18), Expect = 0.22 > Identities = 18/18 (100%), Gaps = 0/18 (0%) > Strand=Plus/Minus > > Query 2 ATGTACTACTCTACTCTA 19 > |||||||||||||||||| > Sbjct 27006915 ATGTACTACTCTACTCTA 27006898 > > > > Features in this part of subject sequence: > DHHC zinc finger domain, putative > > &from=17614825&to=17618687&view=gbwithparts> > > Score = 34.2 bits (17), Expect = 0.87 > Identities = 17/17 (100%), Gaps = 0/17 (0%) > Strand=Plus/Plus > > Query 5 TACTACTCTACTCTACT 21 > ||||||||||||||||| > Sbjct 17616437 TACTACTCTACTCTACT 17616453 > > > > Features flanking this part of subject sequence: > 102 bp at 5' side: bZIP transcription factor, putative > > &from=2774964&to=2775778&view=gbwithparts> > 3740 bp at 3' side: yeast dcp1, putative > &from=2779635&to=2782508&view=gbwithparts> > > Score = 32.2 bits (16), Expect = > 3.4 > Identities = 16/16 (100%), Gaps = 0/16 (0%) > Strand=Plus/Plus > > Query 7 CTACTCTACTCTACTC 22 > |||||||||||||||| > Sbjct 2775880 CTACTCTACTCTACTC 2775895 > > > Features flanking this part of subject sequence: > > 21 bp at 5' side: peptide transporter T17F3.11, 
putative > &from=27321354&to=27323117&view=gbwithparts> > > 10230 bp at 3' side: transposon protein, putative, unclassified > &from=27333383&to=27334285&view=gbwithparts> > > Score = 32.2 bits (16), Expect = 3.4 > Identities = 16/16 (100%), Gaps = 0/16 (0%) > Strand=Plus/Minus > > Query 7 CTACTCTACTCTACTC 22 > > |||||||||||||||| > Sbjct 27323153 CTACTCTACTCTACTC 27323138 > > > > > Guojun Yang wrote: > > >Hi, Chris, > >Finally the remoteblast test script works for the amino.fa query. but > when I try a nucleic acid sequence (see below), Error occurs: > >" > >waiting........ > >------------- EXCEPTION ------------- > >MSG: no data for midline Features flanking this part of subject > sequence: > >STACK Bio::SearchIO::blast::next_result > /usr/lib/perl5/site_perl/5.8.3/Bio/Searc > hIO/blast.pm:1172 > >STACK toplevel remoteblast_test:40 > >" > >The query sequence is: > >CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC > >GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA > >AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG > >AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG > > > >The script (basically same as the remoteblast test, I only changed > database to 'nr' and program to 'blastn' and filename to 'ost3'): > >#!/usr/bin/perl > > > >use Bio::SeqIO; > >use Bio::Seq; > >use Bio::Tools::Run::RemoteBlast; > >use Bio::SearchIO; > >use strict; > >my $prog='blastn'; > >my $db='nr'; > >my $e_val=1e-10; > >my @params=( -prog=>$prog, > > -data=>$db, > > -expect=>$e_val, > > -readmethod=>'SearchIO'); > >my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > > >my $v = 1; > > > >my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' ); > > > >while (my $input = $str->next_seq()){ > > #Blast a sequence against a database: > > #Alternatively, you could pass in a file with many > > #sequences rather than loop through sequence one at a time > > #Remove the loop starting 'while (my $input = $str->next_seq())' > > #and swap the 
two lines below for an example of that. > > my $r = $factory->submit_blast($input); > > #my $r = $factory->submit_blast('amino.fa'); > > print STDERR "waiting..." if( $v > 0 ); > > while ( my @rids = $factory->each_rid ) { > > foreach my $rid ( @rids ) { > > my $rc = $factory->retrieve_blast($rid); > > if( !ref($rc) ) { > > if( $rc < 0 ) { > > $factory->remove_rid($rid); > > } > > print STDERR "." if ( $v > 0 ); > > sleep 5; > > } else { > > my $result = $rc->next_result(); > > #save the output > > my $filename = $result->query_name()."\.out"; > > $factory->save_output($filename); > > $factory->remove_rid($rid); > > print "\nQuery Name: ", $result->query_name(), "\n"; > > while ( my $hit = $result->next_hit ) { > > next unless ( $v > 0); > > print "\thit name is ", $hit->name, "\n"; > > while( my $hsp = $hit->next_hsp ) { > > print "\t\tscore is ", $hsp->score, "\n"; > > } > > } > > } > > } > > } > >} > > > > > >Do you think there might still be something in the NCBI output format? > > > >Thank you, > >Guojun > > > > > > > > > >Guojun Yang > >Department of Plant Biology > >University of Georgia > >Tel: 706-542-1857 > >Fax: 706-542-1805 > >http://www.arches.uga.edu/~guojun > > > > > > > >----- Original Message ----- > >From: Chris Fields [mailto:cjfields at uiuc.edu] > >To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > >Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > > > > > > > > >>Sorry, forgot to add that I didn't see the regex issue that you > mentioned. > >>It could be a perl-related issue. Try the fixes I mentioned and see > what > >>happens. > >> > >> > >>>Christopher Fields > >>> > >>> > >>Postdoctoral Researcher - Switzer Lab > >>Dept. 
of Biochemistry > >>University of Illinois Urbana-Champaign > >> > >> > >>>>>-----Original Message----- > >>>>> > >>>>> > >>>From: Chris Fields [mailto:cjfields at uiuc.edu] > >>>Sent: Tuesday, February 14, 2006 12:36 PM > >>>To: 'gyang at plantbio.uga.edu' > >>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > >>> > >>> > >>>>>It's a good habit to always add single quotes around words. The perl > >>>>> > >>>>> > >>>interpreter may think a single bare word is a subroutine or perlfunc > >>>called with no args so will try to find a subroutine named blastp(). > My > >>>debugger actually gives the error that the bare word blastp may > conflict > >>>with a future reserved word. Like you said, 'use strict' will point > that > >>>out. > >>> > >>> > >>>>>As for the regex, it should match all the blast programs at NCBI > (blastp, > >>>>> > >>>>> > >>>blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing > >>>else passes through. > >>> > >>> > >>>>>So, if you are using the script below, there are several errors. The > bare > >>>>> > >>>>> > >>>words for $prog and $db need quotes, and the flags for you @params > array > >>>don't have a dash before them. 
I get this after adding quotes but > before > >>>adding the dashes to @params: > >>> > >>> > >>>>>C:\Perl\Scripts>test_blast.pl > >>>>>------------- EXCEPTION: Bio::Root::Exception ------------- > >>>>> > >>>>> > >>>MSG: > >>>STACK: Error::throw > >>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl- > >>>live/Bio/Root/Root.pm:328 > >>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > >>>C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325 > >>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl- > >>>live/Bio/Tools/Run/RemoteBlast.pm:256 > >>>STACK: C:\Perl\Scripts\test_blast.pl:15 > >>>----------------------------------------------------------- > >>> > >>> > >>>>>The last line indicates a problem with this line: > >>>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > >>>>>Changing the @params to this: > >>>>>my @params=( -prog=>$prog, > >>>>> > >>>>> > >>> -data=>$db, > >>> -expect=>$e_val, > >>> -readmethod=>'SearchIO'); > >>> > >>> > >>>>>fixes it, and I get output as expected. > >>>>>Christopher Fields > >>>>> > >>>>> > >>>Postdoctoral Researcher - Switzer Lab > >>>Dept. of Biochemistry > >>>University of Illinois Urbana-Champaign > >>> > >>> > >>>>>>>>-----Original Message----- > >>>>>>>> > >>>>>>>> > >>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > >>>>Sent: Tuesday, February 14, 2006 11:48 AM > >>>>To: Chris Fields; bioperl-l at lists.open-bio.org > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > >>>> > >>>>Hi, Chris, > >>>>When I tried with the perldoc script, It did not work either. First it > >>>>says $prog can not be bare word if I "use strict". I added quotes on > the > >>>>words, then it says the value for $prog does not match expression > >>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. The > >>>> > >>>> > >>>script > >>> > >>> > >>>>is shown below. Why is the expression "t?blast[pnx]"? 
> >>>> > >>>>#!/usr/bin/perl > >>>> > >>>>use Bio::SeqIO; > >>>>use Bio::Seq; > >>>>use Bio::Tools::Run::RemoteBlast; > >>>>use Bio::SearchIO; > >>>> > >>>> > >>>>my $prog=blastp; > >>>>my $db=swissprot; > >>>>my $e_val=1e-10; > >>>>my @params=( prog=>$prog, > >>>> data=>$db, > >>>> expect=>$e_val, > >>>> readmethod=>'SearchIO'); > >>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > >>>> > >>>>my $v = 1; > >>>> > >>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' ); > >>>> > >>>>while (my $input = $str->next_seq()){ > >>>> #Blast a sequence against a database: > >>>> #Alternatively, you could pass in a file with many > >>>> #sequences rather than loop through sequence one at a time > >>>> #Remove the loop starting 'while (my $input = $str->next_seq())' > >>>> #and swap the two lines below for an example of that. > >>>> my $r = $factory->submit_blast($input); > >>>> #my $r = $factory->submit_blast('amino.fa'); > >>>> print STDERR "waiting..." if( $v > 0 ); > >>>> while ( my @rids = $factory->each_rid ) { > >>>> foreach my $rid ( @rids ) { > >>>> my $rc = $factory->retrieve_blast($rid); > >>>> if( !ref($rc) ) { > >>>> if( $rc < 0 ) { > >>>> $factory->remove_rid($rid); > >>>> } > >>>> print STDERR "." if ( $v > 0 ); > >>>> sleep 5; > >>>> } else { > >>>> my $result = $rc->next_result(); > >>>> #save the output > >>>> my $filename = $result->query_name()."\.out"; > >>>> $factory->save_output($filename); > >>>> $factory->remove_rid($rid); > >>>> print "\nQuery Name: ", $result->query_name(), "\n"; > >>>> while ( my $hit = $result->next_hit ) { > >>>> next unless ( $v > 0); > >>>> print "\thit name is ", $hit->name, "\n"; > >>>> while( my $hsp = $hit->next_hsp ) { > >>>> print "\t\tscore is ", $hsp->score, "\n"; > >>>> } > >>>> } > >>>> } > >>>> } > >>>> } > >>>>} > >>>> > >>>>Thank you for your help! 
> >>>> > >>>> > >>>>Guojun > >>>>Department of Plant Biology > >>>>University of Georgia > >>>> > >>>>----- Original Message ----- > >>>>From: Chris Fields [mailto:cjfields at uiuc.edu] > >>>>To: gyang at plantbio.uga.edu > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > >>>> > >>>> > >>>> > >>>> > >>>>>Try two things: > >>>>> > >>>>> > >>>>>>1) Use a much simpler script, like the one in 'perldoc > >>>>>> > >>>>>> > >>>>>Bio::Tools::Run::RemoteBlast'. If this fixes it, there's something > >>>>> > >>>>> > >>>>wrong > >>>> > >>>> > >>>>>with the logic in your subroutine: > >>>>> > >>>>> > >>>>>>my $v = 1; > >>>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' ); > >>>>>>while (my $input = $str->next_seq()){ > >>>>>> > >>>>>> > >>>>> #Blast a sequence against a database: > >>>>> #Alternatively, you could pass in a file with many > >>>>> #sequences rather than loop through sequence one at a time > >>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' > >>>>> #and swap the two lines below for an example of that. > >>>>> my $r = $factory->submit_blast($input); > >>>>> #my $r = $factory->submit_blast('amino.fa'); > >>>>> print STDERR "waiting..." if( $v > 0 ); > >>>>> while ( my @rids = $factory->each_rid ) { > >>>>> foreach my $rid ( @rids ) { > >>>>> my $rc = $factory->retrieve_blast($rid); > >>>>> if( !ref($rc) ) { > >>>>> if( $rc < 0 ) { > >>>>> $factory->remove_rid($rid); > >>>>> } > >>>>> print STDERR "." 
if ( $v > 0 ); > >>>>> sleep 5; > >>>>> } else { > >>>>> my $result = $rc->next_result(); > >>>>> #save the output > >>>>> my $filename = $result->query_name()."\.out"; > >>>>> $factory->save_output($filename); > >>>>> $factory->remove_rid($rid); > >>>>> print "\nQuery Name: ", $result->query_name(), "\n"; > >>>>> while ( my $hit = $result->next_hit ) { > >>>>> next unless ( $v > 0); > >>>>> print "\thit name is ", $hit->name, "\n"; > >>>>> while( my $hsp = $hit->next_hsp ) { > >>>>> print "\t\tscore is ", $hsp->score, "\n"; > >>>>> } > >>>>> } > >>>>> } > >>>>> } > >>>>> } > >>>>>} > >>>>> > >>>>> > >>>>>>2) Try the RemoteBlast from Bugzilla and see if that works. It > >>>>>> > >>>>>> > >>>really > >>> > >>> > >>>>>shouldn't make that much of a difference, but I noticed that the CVS > >>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was > >>>>>released; the Bugzilla version is based off CVS. > >>>>> > >>>>> > >>>>>>Christopher Fields > >>>>>> > >>>>>> > >>>>>Postdoctoral Researcher - Switzer Lab > >>>>>Dept. of Biochemistry > >>>>>University of Illinois Urbana-Champaign > >>>>> > >>>>> > >>>>>>>-----Original Message----- > >>>>>>> > >>>>>>> > >>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang > >>>>>>Sent: Monday, February 13, 2006 3:00 PM > >>>>>>To: bioperl-l at lists.open-bio.org > >>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28 > >>>>>> > >>>>>> > >>>>>>>>Thanks, Chris, > >>>>>>>> > >>>>>>>> > >>>>>>I installed version 1.5.1 and replaced the blast.pm file with the > >>>>>> > >>>>>> > >>>one > >>> > >>> > >>>>from > >>>> > >>>> > >>>>>>your bug report. The running version is 1.5 when I use the command > >>>>>> > >>>>>> > >>>you > >>> > >>> > >>>>>>sent me. But when I tried the script, it doesn't change much. 
My > >>>>>>remoteblast code (portion) is here: > >>>>>> > >>>>>> > >>>>>>>>sub search { > >>>>>>>> > >>>>>>>> > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN"; > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7; > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000; > >>>>>>local > >>>>>> > >>>>>> > >>>>>> > >>>$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}= > >>> > >>> > >>>>>>'no'; > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1'; > >>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]", > >>>>>> -id=>"query", > >>>>>> -desc=>"new seq"); > >>>>>>my $len=$query->length(); > >>>>>>@db=('nr','htgs','wgs'); > >>>>>>foreach my $db (@db) { > >>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn', > >>>>>> '-data' =>"$db", > >>>>>> > >>>>>> > >>>>>> > >>'-expect'=>"$E_value"); > >> > >> > >>>>>>>>>>my $blast_report = $factory->submit_blast($query); > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>my @rids = $factory->each_rid(); > >>>>>>>> > >>>>>>>> > >>>>>>foreach my $rid ( @rids ) { > >>>>>> print STDERR "$rid\n"; > >>>>>>} > >>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638) > >>>>>>print STDERR "waiting..."; > >>>>>>sleep 60; > >>>>>> > >>>>>> > >>>>>>>>foreach my $rid ( @rids ) { > >>>>>>>> > >>>>>>>> > >>>>>> my $rc = $factory->retrieve_blast($rid); > >>>>>> while (!ref($rc) ) { > >>>>>> if( $rc < 0 ) { > >>>>>># retrieve_blast returns -1 on error > >>>>>> $factory->remove_rid($rid); > >>>>>> print "Error!\n"; > >>>>>> send_error($email,$function,$seqname,$queryname[$ST]); > >>>>>> die "Can't retrieve $rid"; > >>>>>> } if ($rc==0) { # retrieve_blast returns 0 on 'job not > >>>>>> > >>>>>> > >>>finished' > >>> > >>> > >>>>>> sleep 60; > >>>>>> $rc = $factory->retrieve_blast($rid); > >>>>>> } > >>>>>> } > >>>>>> if (ref($rc)) { > >>>>>> print STDERR "Done.\n"; > >>>>>> while( my $result = $rc->next_result) { > >>>>>> while( my $hit = 
$result->next_hit()) { > >>>>>> $hit_name=$hit->name; > >>>>>> $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/; > >>>>>> $name=$1; > >>>>>> @left_plus_start=(); > >>>>>> @left_plus_end=(); > >>>>>> @left_minus_start=(); > >>>>>> @left_minus_end=(); > >>>>>> @right_plus_start=(); > >>>>>> @right_plus_end=(); > >>>>>> @right_minus_start=(); > >>>>>> @right_minus_end=(); > >>>>>> > >>>>>> > >>>>>>>> if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) { > >>>>>>>> > >>>>>>>> > >>>>>> while( my $hsp = $hit->next_hsp()) { > >>>>>>...... > >>>>>> > >>>>>> > >>>>>>>>It was working quite well before around October laster year, but > >>>>>>>> > >>>>>>>> > >>>>it has > >>>> > >>>> > >>>>>>stopped since then, When a submission is sent via a webpage, the cgi > >>>>>>starts to work and use a memory of ~20 Mb. Then it hangs there, > >>>>>> > >>>>>> > >>>>finally > >>>> > >>>> > >>>>>>the expected email is received but without real results although it > >>>>>> > >>>>>> > >>>>does > >>>> > >>>> > >>>>>>contain something from other parts of the script. Apparently the > >>>>>> > >>>>>> > >>>>search > >>>> > >>>> > >>>>>>sub did not return anything (I know there is something should be > >>>>>>returned.). Is it also possible the format of the NCBI output for > >>>>>> > >>>>>> > >>>each > >>> > >>> > >>>>>>result has changed? > >>>>>>Thank you, > >>>>>>Guojun > >>>>>> > >>>>>> > >>>>>>>>>>Department of Plant Biology > >>>>>>>>>> > >>>>>>>>>> > >>>>>>University of Georgia > >>>>>> > >>>>>> > >>>>>>>>>>>>----- Original Message ----- > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu] > >>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > >>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > >>>>>> > >>>>>> > >>>>>>>>>>>How do you know two versions are installed (i.e. how are > >>>>>>>>>>> > >>>>>>>>>>> > >>>you > >>> > >>> > >>>>checking > >>>> > >>>> > >>>>>>the > >>>>>> > >>>>>> > >>>>>>>version)? 
> >>>>>>>Do you have two complete bioperl distributions (in two separate
> >>>>>>>directories) or are you looking in modules?  Here's the way to
> >>>>>>>check the version (from the FAQ):
> >>>>>>>
> >>>>>>>perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> >>>>>>>
> >>>>>>>If you have two full bioperl distributions on your computer, normally
> >>>>>>>only one will be in use unless you have explicitly set the environment
> >>>>>>>variable PERL5LIB.  The PERL5LIB directories will be searched first
> >>>>>>>before your normal perl directory list (@INC) is searched.  You MAY
> >>>>>>>get some mixing then, but only if perl can't find a particular module
> >>>>>>>in the path designated in PERL5LIB; then it will progress through the
> >>>>>>>directories listed in @INC.  This may happen if a module is unique to
> >>>>>>>a particular release, but shouldn't happen for the majority of
> >>>>>>>modules, including RemoteBlast.  You can check what @INC and PERL5LIB
> >>>>>>>are set to by using 'perl -V'.  @INC will differ depending on your OS,
> >>>>>>>perl build, etc.
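As a rough illustration of the lookup order described above, the sketch below walks @INC and reports every directory that holds a copy of a given module, in the order perl searches them. PERL5LIB entries are already at the front of @INC by the time a script runs, so the first match is the copy that would be loaded. The helper name find_module_copies is made up for this example and is not part of BioPerl; only core Perl is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): list every copy of a module
# that perl could find, in the order the directories are searched.
sub find_module_copies {
    my ($module) = @_;

    # Bio::Root::Version -> Bio/Root/Version.pm
    ( my $relpath = "$module.pm" ) =~ s{::}{/}g;

    my ( @found, %seen );
    for my $dir (@INC) {
        next if ref $dir;                 # skip code-ref hooks in @INC
        my $candidate = "$dir/$relpath";
        next if $seen{$candidate}++;      # @INC may list a directory twice
        push @found, $candidate if -f $candidate;
    }
    return @found;
}

my $module = shift(@ARGV) || 'Bio::Root::Version';
my @copies = find_module_copies($module);

if (@copies) {
    print "perl will load: $copies[0]\n";
    print "shadowed copy:  $_\n" for @copies[ 1 .. $#copies ];
}
else {
    print "$module not found in \@INC\n";
}
```

If more than one path is printed, two installations overlap for that module, which is the mixing situation the message warns about.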
> >>>>>>> > >>>>>>> > >>>>>>>>Regardless, if you follow the directions for installing bioperl > >>>>>>>> > >>>>>>>> > >>>>for > >>>> > >>>> > >>>>>>your > >>>>>> > >>>>>> > >>>>>>>system ('perl Makefile.PL', 'make', 'make test', 'make install', > >>>>>>> > >>>>>>> > >>>>unless > >>>> > >>>> > >>>>>>you > >>>>>> > >>>>>> > >>>>>>>explicitly change the installation directory when using 'perl > >>>>>>> > >>>>>>> > >>>>>>Makefile.PL'), > >>>>>> > >>>>>> > >>>>>>>then 'uninstalling' Bioperl shouldn't be a problem as it will > >>>>>>> > >>>>>>> > >>>>install > >>>> > >>>> > >>>>>>the > >>>>>> > >>>>>> > >>>>>>>Bioperl distribution you downloaded over the old version in @INC. > >>>>>>> > >>>>>>> > >>>>See > >>>> > >>>> > >>>>>>this > >>>>>> > >>>>>> > >>>>>>>page: > >>>>>>> > >>>>>>> > >>>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL > >>>>>>>>for more details. > >>>>>>>>Christopher Fields > >>>>>>>> > >>>>>>>> > >>>>>>>Postdoctoral Researcher - Switzer Lab > >>>>>>>Dept. of Biochemistry > >>>>>>>University of Illinois Urbana-Champaign > >>>>>>> > >>>>>>> > >>>>>>>>>>-----Original Message----- > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang > >>>>>>>>Sent: Monday, February 13, 2006 12:32 PM > >>>>>>>>To: bioperl-l at lists.open-bio.org > >>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 > >>>>>>>> > >>>>>>>> > >>>>>>>>>>Hi, Chris, > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>I do have different versions of bioperl on my Linux machine > >>>>>>>> > >>>>>>>> > >>>(1.4. > >>> > >>> > >>>>and > >>>> > >>>> > >>>>>>>>1.5.0), this may be the problem. Should I just install bioperl- > >>>>>>>> > >>>>>>>> > >>>>1.5.1 > >>>> > >>>> > >>>>>>or I > >>>>>> > >>>>>> > >>>>>>>>need to uninstall and remove the previous versions. 
I could not > >>>>>>>> > >>>>>>>> > >>>>find > >>>> > >>>> > >>>>>>any > >>>>>> > >>>>>> > >>>>>>>>hint on uninstalling bioperl on linux. Could you please give me > >>>>>>>> > >>>>>>>> > >>>>some > >>>> > >>>> > >>>>>>>>suggestion? > >>>>>>>>Thanks, > >>>>>>>>Guojun > >>>>>>>> > >>>>>>>> > >>>>>>>>>>Department of Plant Biology > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>University of Georgia > >>>>>>>> _____ > >>>>>>>> > >>>>>>>> > >>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > >>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500 > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > >>>>>>>> > >>>>>>>> > >>>>>>version > >>>>>> > >>>>>> > >>>>>>>>1.28 > >>>>>>>> > >>>>>>>> > >>>>>>>>>>>>>>If you're using RemoteBlast 1.28, then you've likely > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>updated from CVS > >>>>>> > >>>>>> > >>>>>>>>which isn't the latest fix. > >>>>>>>> > >>>>>>>> > >>>>>>>>>>Make sure that you check the following: > >>>>>>>>>>1) Always post to the mailing list: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance . > >>>>>>>> > >>>>>>>> > >>>>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live > >>>>>>>>>> > >>>>>>>>>> > >>>>(CVS) > >>>> > >>>> > >>>>>>>>installed first. Perform a clean installation; do not upgrade > >>>>>>>> > >>>>>>>> > >>>>only > >>>> > >>>> > >>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we > >>>>>>>> > >>>>>>>> > >>>can't > >>> > >>> > >>>>>>>>guarantee that mixing modules from old and new distributions > >>>>>>>> > >>>>>>>> > >>>(1.4 > >>> > >>> > >>>>and > >>>> > >>>> > >>>>>>>>1.5.1, for instance) will work. 
A bioperl-1.5.1 or bioperl-live > >>>>>>>>installation will allow text output from BLAST v.2.2.12 to be > >>>>>>>> > >>>>>>>> > >>>>saved > >>>> > >>>> > >>>>>>and > >>>>>> > >>>>>> > >>>>>>>>parsed; it will not parse the newest BLAST text output from NCBI > >>>>>>>> > >>>>>>>> > >>>>>>(v2.2.13) > >>>>>> > >>>>>> > >>>>>>>>but it should still save it. I believe as long as next_results() > >>>>>>>> > >>>>>>>> > >>>>isn't > >>>> > >>>> > >>>>>>>>called, it will work. > >>>>>>>> > >>>>>>>> > >>>>>>>>>>3) The bug fixes for the above issue with parsing BLAST > >>>>>>>>>> > >>>>>>>>>> > >>>2.2.13 > >>> > >>> > >>>>>>text output > >>>>>> > >>>>>> > >>>>>>>>are NOT in CVS; they haven't been cleared and checked in by > >>>>>>>> > >>>>>>>> > >>>Roger > >>> > >>> > >>>>Hall > >>>> > >>>> > >>>>>>>>(who's now taking care of RemoteBlast) and the powers that be > >>>>>>>> > >>>>>>>> > >>>>(Jason > >>>> > >>>> > >>>>>>or > >>>>>> > >>>>>> > >>>>>>>>whomever is in charge of Bio::SearchIO). They can be found in > >>>>>>>> > >>>>>>>> > >>>>>>Bugzilla: > >>>>>> > >>>>>> > >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > >>>>>>>> > >>>>>>>> > >>>>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the > >>>>>>>>>> > >>>>>>>>>> > >>>>option > >>>> > >>>> > >>>>>>of > >>>>>> > >>>>>> > >>>>>>>>saving XML output, so isn't necessary if you don't plan on using > >>>>>>>> > >>>>>>>> > >>>>this > >>>> > >>>> > >>>>>>>>option. And, remember, they haven't been committed yet to CVS, > >>>>>>>> > >>>>>>>> > >>>>which > >>>> > >>>> > >>>>>>>>means that the final version will change to refle the new > >>>>>>>> > >>>>>>>> > >>>version. > >>> > >>> > >>>>>>>>>>>>Christopher Fields > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>Postdoctoral Researcher - Switzer Lab > >>>>>>>>Dept. 
of Biochemistry > >>>>>>>>University of Illinois Urbana-Champaign > >>>>>>>> > >>>>>>>> > >>>>>>>>>>>> _____ > >>>>>>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>Sent: Monday, February 13, 2006 9:26 AM > >>>>>>>>To: Chris Fields > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > >>>>>>>> > >>>>>>>> > >>>>>>version > >>>>>> > >>>>>> > >>>>>>>>1.28 > >>>>>>>> > >>>>>>>> > >>>>>>>>>>>>Hi, Chris > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>Thanks for your suggestion, however, it doesn't seem to work > >>>>>>>>>> > >>>>>>>>>> > >>>>for > >>>> > >>>> > >>>>>>my cgi > >>>>>> > >>>>>> > >>>>>>>>even after I replace both blast.pm and RemoteBlast.pm. I didn't > >>>>>>>> > >>>>>>>> > >>>>even > >>>> > >>>> > >>>>>>get > >>>>>> > >>>>>> > >>>>>>>>any RID. Is there any suggestion? > >>>>>>>> > >>>>>>>> > >>>>>>>>>>>>>>Guojun > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> > >>>>>>>>>>>>Guojun Yang > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>Department of Plant Biology > >>>>>>>>University of Georgia > >>>>>>>>Tel: 706-542-1857 > >>>>>>>>Fax: 706-542-1805 > >>>>>>>>http://www.arches.uga.edu/~guojun > >>>>>>>> _____ > >>>>>>>> > >>>>>>>> > >>>>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu] > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org > >>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500 > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > >>>>>>>> > >>>>>>>> > >>>>>>version > >>>>>> > >>>>>> > >>>>>>>>1.28 > >>>>>>>> > >>>>>>>> > >>>>>>>>>>I would say give the new code a try, but realize that it > >>>>>>>>>> > >>>>>>>>>> > >>>>hasn't > >>>> > >>>> > >>>>>>been > >>>>>> > >>>>>> > >>>>>>>>checked > >>>>>>>>in (like I said below). I will try going over the modified > >>>>>>>>Bio::SearchIO::blast again this weekend to see if there is > >>>>>>>> > >>>>>>>> > >>>>anything I > >>>> > >>>> > >>>>>>>>might > >>>>>>>>have missed. 
The changed order in the header of BLAST text > >>>>>>>> > >>>>>>>> > >>>output > >>> > >>> > >>>>has > >>>> > >>>> > >>>>>>me a > >>>>>> > >>>>>> > >>>>>>>>bit worried that it might not catch everything, but it at least > >>>>>>>> > >>>>>>>> > >>>>>>doesn't > >>>>>> > >>>>>> > >>>>>>>>hang > >>>>>>>>in the while() loop I described in the bug report below (bug > >>>>>>>> > >>>>>>>> > >>>>#1934) > >>>> > >>>> > >>>>>>and > >>>>>> > >>>>>> > >>>>>>>>seems to process everything fine. > >>>>>>>> > >>>>>>>> > >>>>>>>>>>If you want more stability in the code, you might consider > >>>>>>>>>> > >>>>>>>>>> > >>>>>>changing over > >>>>>> > >>>>>> > >>>>>>>>to > >>>>>>>>XML output and parsing with Bio::SearchIO::blastxml. There are > >>>>>>>> > >>>>>>>> > >>>>some > >>>> > >>>> > >>>>>>>>changes > >>>>>>>>in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate > >>>>>>>> > >>>>>>>> > >>>>saving > >>>> > >>>> > >>>>>>XML > >>>>>> > >>>>>> > >>>>>>>>output, but I believe it parses everything regardless. If you > >>>>>>>> > >>>>>>>> > >>>look > >>> > >>> > >>>>>>back > >>>>>> > >>>>>> > >>>>>>>>the > >>>>>>>>last month or so there has been a bit of discussion here about > >>>>>>>> > >>>>>>>> > >>>it. > >>> > >>> > >>>>>>Jason > >>>>>> > >>>>>> > >>>>>>>>describes a bit on how to set up RemoteBlast for XML: > >>>>>>>> > >>>>>>>> > >>>>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using- > >>>>>>>>>> > >>>>>>>>>> > >>>>>>remoteblast/ > >>>>>> > >>>>>> > >>>>>>>>>>Christopher Fields > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>Postdoctoral Researcher - Switzer Lab > >>>>>>>>Dept. 
of Biochemistry > >>>>>>>>University of Illinois Urbana-Champaign > >>>>>>>> > >>>>>>>> > >>>>>>>>>>>-----Original Message----- > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > >>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang > >>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM > >>>>>>>>>To: bioperl-l at bioperl.org > >>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm > >>>>>>>>> > >>>>>>>>> > >>>>version > >>>> > >>>> > >>>>>>1.28 > >>>>>> > >>>>>> > >>>>>>>>>Hi, Everybody, > >>>>>>>>>I see this post and am wondering if this is the reason for the > >>>>>>>>>malfunctionning of my webserver. We set up a webserver named > >>>>>>>>> > >>>>>>>>> > >>>>MAK, > >>>> > >>>> > >>>>>>for > >>>>>> > >>>>>> > >>>>>>>>MITE > >>>>>>>> > >>>>>>>> > >>>>>>>>>sequence analysis. It was working very well until around > >>>>>>>>> > >>>>>>>>> > >>>>November > >>>> > >>>> > >>>>>>2005, > >>>>>> > >>>>>> > >>>>>>>>>when it stopped returning any result (the site is fine and > >>>>>>>>> > >>>>>>>>> > >>>seems > >>> > >>> > >>>>to > >>>> > >>>> > >>>>>>be > >>>>>> > >>>>>> > >>>>>>>>>doing sth after submission). In the CGI script, I used > >>>>>>>>> > >>>>>>>>> > >>>>remoteblast > >>>> > >>>> > >>>>>>(that > >>>>>> > >>>>>> > >>>>>>>>>work was done in 2003) to do searches. I currently do not have > >>>>>>>>> > >>>>>>>>> > >>>>>>access to > >>>>>> > >>>>>> > >>>>>>>>>the server because I moved. Quite several people sent emails > >>>>>>>>> > >>>>>>>>> > >>>to > >>> > >>> > >>>>us > >>>> > >>>> > >>>>>>about > >>>>>> > >>>>>> > >>>>>>>>>its malfunctioning. Is there any suggestion on fixing the > >>>>>>>>> > >>>>>>>>> > >>>>problem? > >>>> > >>>> > >>>>>>>>Should > >>>>>>>> > >>>>>>>> > >>>>>>>>>I simplily ask the remoteblast.pm be replaced with the new > >>>>>>>>> > >>>>>>>>> > >>>>version? 
> >>>> > >>>> > >>>>>>>>>Thanks a lot, > >>>>>>>>>Guojun > >>>>>>>>> > >>>>>>>>>Department of Plant Biology > >>>>>>>>>University of Georgia > >>>>>>>>>Tel: 706-542-1857 > >>>>>>>>>Fax: 706-542-1805 > >>>>>>>>>http://www.arches.uga.edu/~guojun > >>>>>>>>>_____ > >>>>>>>>> > >>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu] > >>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang > >>>>>>>>> > >>>>>>>>> > >>>>Jian' > >>>> > >>>> > >>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' > >>>>>>>>> > >>>>>>>>> > >>>[mailto:bioperl- > >>> > >>> > >>>>>>>>>l at bioperl.org] > >>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500 > >>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > >>>>>>>>> > >>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live > >>>>>>>>> > >>>>>>>>> > >>>>CVS. > >>>> > >>>> > >>>>>>It > >>>>>> > >>>>>> > >>>>>>>>>will > >>>>>>>>>work for saving text output. However, it will not parse > >>>>>>>>> > >>>>>>>>> > >>>anything > >>> > >>> > >>>>>>using > >>>>>> > >>>>>> > >>>>>>>>>next_result (it will likely hang) and will not save XML > >>>>>>>>> > >>>>>>>>> > >>>format. > >>> > >>> > >>>>See > >>>> > >>>> > >>>>>>>>these > >>>>>>>> > >>>>>>>> > >>>>>>>>>bugs: > >>>>>>>>> > >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > >>>>>>>>> > >>>>>>>>>for explanations and possible fixes (changes to RemoteBlast > >>>>>>>>> > >>>>>>>>> > >>>and > >>> > >>> > >>>>>>>>>Bio::SearchIO::blast). Note that these haven't been checked in > >>>>>>>>> > >>>>>>>>> > >>>>yet > >>>> > >>>> > >>>>>>so > >>>>>> > >>>>>> > >>>>>>>>are > >>>>>>>> > >>>>>>>> > >>>>>>>>>still not included in bioperl-live; they may be further > >>>>>>>>> > >>>>>>>>> > >>>modified > >>> > >>> > >>>>>>before > >>>>>> > >>>>>> > >>>>>>>>>committing to CVS. 
If you're not worried about XML, you could > >>>>>>>>> > >>>>>>>>> > >>>>just > >>>> > >>>> > >>>>>>try > >>>>>> > >>>>>> > >>>>>>>>the > >>>>>>>> > >>>>>>>> > >>>>>>>>>first fix, which is a change to SearchIO::blast. > >>>>>>>>> > >>>>>>>>>Nagesh, I remember you posting to the list a month ago using a > >>>>>>>>> > >>>>>>>>> > >>>>>>script > >>>>>> > >>>>>> > >>>>>>>>>which > >>>>>>>>>had problems; the script you used saves the output but doesn't > >>>>>>>>> > >>>>>>>>> > >>>>>>actually > >>>>>> > >>>>>> > >>>>>>>>>parse it (i.e. you don't use next_result() to go through the > >>>>>>>>> > >>>>>>>>> > >>>>data). > >>>> > >>>> > >>>>>>Is > >>>>>> > >>>>>> > >>>>>>>>the > >>>>>>>> > >>>>>>>> > >>>>>>>>>version of BLAST in your text output 2.2.12 or 2.2.13? Have > >>>>>>>>> > >>>>>>>>> > >>>you > >>> > >>> > >>>>>>tried > >>>>>> > >>>>>> > >>>>>>>>>parsing the output using "-readmethod => SearchIO" or "- > >>>>>>>>> > >>>>>>>>> > >>>>readmethod > >>>> > >>>> > >>>>>>=> > >>>>>> > >>>>>> > >>>>>>>>>blast" > >>>>>>>>>using your version of RemoteBlast and method next_result()? > >>>>>>>>> > >>>>>>>>> > >>>Like > >>> > >>> > >>>>>>below > >>>>>> > >>>>>> > >>>>>>>>>(from > >>>>>>>>>perldoc): > >>>>>>>>> > >>>>>>>>>while ( my @rids = $factory->each_rid ) { > >>>>>>>>>foreach my $rid ( @rids ) { > >>>>>>>>>my $rc = $factory->retrieve_blast($rid); > >>>>>>>>>if( !ref($rc) ) { > >>>>>>>>>if( $rc < 0 ) { > >>>>>>>>>$factory->remove_rid($rid); > >>>>>>>>>} > >>>>>>>>>print STDERR "." 
if ( $v > 0 ); > >>>>>>>>>sleep 5; > >>>>>>>>>} else { # parsing > >>>>>>>>>starts here > >>>>>>>>>my $result = $rc->next_result(); # it should hang > >>>>>>>>>here > >>>>>>>>>#save the output > >>>>>>>>>my $filename = $result->query_name()."\.out"; > >>>>>>>>>$factory->save_output($filename); > >>>>>>>>>$factory->remove_rid($rid); > >>>>>>>>>print "\nQuery Name: ", $result->query_name(), "\n"; > >>>>>>>>>while ( my $hit = $result->next_hit ) { > >>>>>>>>>next unless ( $v > 0); > >>>>>>>>>print "\thit name is ", $hit->name, "\n"; > >>>>>>>>>while( my $hsp = $hit->next_hsp ) { > >>>>>>>>>print "\t\tscore is ", $hsp->score, "\n"; > >>>>>>>>>} > >>>>>>>>>} > >>>>>>>>>} > >>>>>>>>>} > >>>>>>>>>} > >>>>>>>>>} > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>My script hanged if I used next_result() in any way prior to > >>>>>>>>> > >>>>>>>>> > >>>the > >>> > >>> > >>>>>>fixes. > >>>>>> > >>>>>> > >>>>>>>>I > >>>>>>>> > >>>>>>>> > >>>>>>>>>want to see how many others are having the same issues with > >>>>>>>>> > >>>>>>>>> > >>>>parsing > >>>> > >>>> > >>>>>>>>using > >>>>>>>> > >>>>>>>> > >>>>>>>>>the CVS version of bioperl-live. > >>>>>>>>> > >>>>>>>>>Christopher Fields > >>>>>>>>>Postdoctoral Researcher - Switzer Lab > >>>>>>>>>Dept. of Biochemistry > >>>>>>>>>University of Illinois Urbana-Champaign > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>>-----Original Message----- > >>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl- > >>>>>>>>>> > >>>>>>>>>> > >>>l- > >>> > >>> > >>>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > >>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM > >>>>>>>>>>To: Huang Jian; bioperl-l > >>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > >>>>>>>>>> > >>>>>>>>>>Hi Huang, > >>>>>>>>>>Thanks for the message. 
The older version of RemoteBlast.pm > >>>>>>>>>> > >>>>>>>>>> > >>>>works > >>>> > >>>> > >>>>>>on > >>>>>> > >>>>>> > >>>>>>>>the > >>>>>>>> > >>>>>>>> > >>>>>>>>>>logic of checking the temporary file size to determine > >>>>>>>>>> > >>>>>>>>>> > >>>whether > >>> > >>> > >>>>the > >>>> > >>>> > >>>>>>>>Blast > >>>>>>>> > >>>>>>>> > >>>>>>>>>>results are ready. This condition is not getting satisfied > >>>>>>>>>> > >>>>>>>>>> > >>>may > >>> > >>> > >>>>be > >>>> > >>>> > >>>>>>due > >>>>>> > >>>>>> > >>>>>>>>to > >>>>>>>> > >>>>>>>> > >>>>>>>>>>some changes brought about by NCBI. I had this problem > >>>>>>>>>> > >>>>>>>>>> > >>>>recently > >>>> > >>>> > >>>>>>and > >>>>>> > >>>>>> > >>>>>>>>>>figured out that the solution was to use the latest version > >>>>>>>>>> > >>>>>>>>>> > >>>>which > >>>> > >>>> > >>>>>>has > >>>>>> > >>>>>> > >>>>>>>>>>this problem fixed (does not use file size logic any more) > >>>>>>>>>> > >>>>>>>>>> > >>>>which > >>>> > >>>> > >>>>>>is > >>>>>> > >>>>>> > >>>>>>>>not > >>>>>>>> > >>>>>>>> > >>>>>>>>>>yet included in the BioPerl package. > >>>>>>>>>>Cheers > >>>>>>>>>>Nagesh > >>>>>>>>>> > >>>>>>>>>>Huang Jian wrote: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>>Dear Nagesh, > >>>>>>>>>>> > >>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 > >>>>>>>>>>> > >>>>>>>>>>> > >>>>you > >>>> > >>>> > >>>>>>send > >>>>>> > >>>>>> > >>>>>>>>>>>me. Now it works perfectly!!! > >>>>>>>>>>> > >>>>>>>>>>>Thank you!! 
> >>>>>>>>>>> > >>>>>>>>>>>Huang > >>>>>>>>>>> > >>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka" > >>>>>>>>>>> > >>>>>>>>>>>To: "Huang Jian" ; "bioperl-l" > >>>>>>>>>>> > >>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM > >>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the > >>>>>>>>>>> > >>>>>>>>>>> > >>>net, > >>> > >>> > >>>>so > >>>> > >>>> > >>>>>>still > >>>>>> > >>>>>> > >>>>>>>>>>>via email > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>>>Hi Huang, > >>>>>>>>>>>>I see that you are submitting a sequence for a remote > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>blast > >>> > >>> > >>>>>>search. > >>>>>> > >>>>>> > >>>>>>>>>Can > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>>>>you check if the RemoteBlast.pm being used is v 1.28 > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>(2005/12/09). > >>>>>> > >>>>>> > >>>>>>>>If > >>>>>>>> > >>>>>>>> > >>>>>>>>>>>>not I have attached it with this email, try to replace it > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>with > >>>> > >>>> > >>>>>>the > >>>>>> > >>>>>> > >>>>>>>>>old > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>>>>one which has a bug. > >>>>>>>>>>>>Let me know if it works. 
> >>>>>>>>>>>>Nagesh
> >
> > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From valiente at lsi.upc.edu Mon Feb 20 13:51:35 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Mon, 20 Feb 2006 19:51:35 +0100
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <43FA0FB7.6060904@lsi.upc.edu>

The local flat file implementation of Bio::DB::Taxonomy seems to be fine:

use Bio::DB::Taxonomy;
my $nodesfile = "nodes.dmp";
my $namesfile = "names.dmp";
my $db = new Bio::DB::Taxonomy(-source    => 'flatfile',
                               -nodesfile => $nodesfile,
                               -namesfile => $namesfile);
my $taxonid = $db->get_taxonid('Homo sapiens');

Here, $taxonid is 9606. However,

my $species = $db->get_Taxonomy_Node(-taxonid => $taxonid);

raises:

-------------------- WARNING ---------------------
MSG: can't create a species object for Homo sapiens (human) because it
isn't a species but is a '' instead
---------------------------------------------------

Thanks,

Gabriel

From boris.steipe at utoronto.ca Mon Feb 20 13:40:19 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Mon, 20 Feb 2006 13:40:19 -0500
Subject: [Bioperl-l] Matrix Average Code / Module ?
In-Reply-To: <59825.192.168.1.176.1140416461.squirrel@192.168.1.176>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1> <76f031ae0602190552v5f2542dbv@mail.gmail.com> <59825.192.168.1.176.1140416461.squirrel@192.168.1.176>
Message-ID: <92CF0104-0524-4BA3-B039-3CEECF68E20B@utoronto.ca>

Assuming you mean the arithmetic average of all elements in a matrix,
you could do the following (using your numbers):

#!/usr/bin/perl -w
use strict;

my @matrix;

push(@matrix, [(11,22,43,54,50)]);  # [(...)] : a list passed as an anonymous array
push(@matrix, [(27,87,74,32,10)]);
push(@matrix, [(66,58,98,78,20)]);
push(@matrix, [(22,23,44,16,34)]);

my $sum = 0;
my $number = 0;

foreach my $row (@matrix) {
    foreach my $element (@{$row}) {
        $sum += $element;
        $number++;
    }
}

print "Average of $number elements = ", $sum/$number, "\n";

exit;

HTH,
B.

On 20 Feb 2006, at 01:21, Shameer Khadar wrote:

> Hi all,
> Is there any program/module to calculate the average of a blosum/pam any
> matrix ?
>
> I have a matrix and I need to see the average
>
> for example
>
> 11 22 43 54 50
> 27 87 74 32 10
> 66 58 98 78 20
> 22 23 44 16 34
>
> I have gone through Bio::Matrix::MatrixI and Bio::Matrix::GenericMatrix
> and other perl modules like Math::Matrix
> http://search.cpan.org/~ulpfr/Math-Matrix-0.4/Matrix.pm
> and Math::Cephes::Matrix - but none of them have a provision to do
> matrix average calculation.
>
> Any help ???
> thanks in advance,
> Happy biocomputing !!!
>
> --
> Shameer Khadar
> National Centre for Biological Sciences (TIFR)
> UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
> T - 91-080-23636420-32 EXT 4241
> F - 91-080-23636662/23636675
> W - http://www.ncbs.res.in
> --------------------------------------------------
> "Refrain from illusions, insist on work and not words,
> patiently seek divine and scientific truth."
> MM

From cjfields at uiuc.edu Mon Feb 20 17:01:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 20 Feb 2006 16:01:15 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pmversion 1.28
In-Reply-To: <000e01c6363f$494bc5e0$15327e82@pyrimidine>
Message-ID: <000001c63669$2bf06a80$15327e82@pyrimidine>

Guojun Yang pointed out that his BLAST output was still not parsed
correctly, so I posted another change:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

The direct link for the module is:

http://bugzilla.bioperl.org/attachment.cgi?id=289&action=view

Note that all caveats (can't sue if computer blows up, this is a very
preliminary bugfix, etc.) apply.

Apparently, NCBI has changed blastn and tblastx output to show features in
the region for each HSP, starting with either one of the following lines:

Features in this part of subject sequence:
Features flanking this part of subject sequence:

If you're using Bio::SearchIO::blast previous to Dec. 2005, BLAST 2.2.13,
most blastn or tblastx report parsing seems to choke on these lines, unless
you are pretty lucky.  This extra little feature was introduced a while back
for large contigs and chromosomes (~BLAST 2.2.10) but was not set by default
and hadn't started affecting web output until this last fall.  The first fix
I posted caught only the first version but not the second.  The fix included
a loop with debugging output to bypass this for now.  If you use SearchIO
directly for parsing (not through RemoteBlast) you can see the bypassed lines
by setting the '-verbose' flag to 1.

Thanks to Guojun Yang for pointing this out.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Monday, February 20, 2006 11:01 AM
> To: 'Pieter Monsieurs'; gyang at plantbio.uga.edu
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on
> RemoteBlast.pmversion 1.28
>
> I have added a preliminary bugfix for the problems seen with nucleotide
> blast parsing for BLAST 2.2.13 reports.  I passed SearchIO::blast through
> perltidy to space out the blocks (really for my own purposes; it's a
> pretty complex module).  The fix bypasses the extra lines output for
> blastn and tblastx and now seems to parse the text output for those
> reports correctly.
> I tested it using all NCBI BLAST flavors for the last two versions of
> BLAST (2.2.12 and 2.2.13); the fix was simple so shouldn't break any other
> BLAST report parsing, such as WU-BLAST, RPS-BLAST, or Paracel.  It has
> only been tested on MacOSX at the moment, so I need people out there to
> test it out on anything they can to make sure it works before committing.
> I'll be trying it on Windows today.  Report back to me and I'll post
> anything on bugzilla.
>
> Here it is:
>
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> > Sent: Thursday, February 16, 2006 3:46 AM
> > To: gyang at plantbio.uga.edu
> > Cc: bioperl-l at lists.open-bio.org; Chris Fields
> > Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
> > version 1.28
> >
> > Hi,
> >
> > I have the same problem with the blast.pm file.
> > The people of NCBI added some extra info when giving the BLAST output
> > (see e.g. "Features flanking this part..." or "Features in this part
> > ..."), example added.
> > The blast.pm module starts looking for the HSP alignment information,
> > but it dies when it hits this feature information.
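
To make the failure mode concrete, here is a minimal, self-contained sketch of the kind of bypass discussed in this thread: drop the "Features in/flanking this part of subject sequence:" header and its annotation lines, and resume at the "Score =" line that opens the HSP. This is not the code from Bio::SearchIO::blast or from the Bugzilla attachment; the function name skip_feature_lines is made up for illustration, and the sample lines are taken from the report quoted in this message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove the feature annotation block that newer NCBI
# blastn/tblastx text reports place in front of an HSP.
sub skip_feature_lines {
    my (@lines) = @_;
    my @kept;
    my $in_features = 0;

    for my $line (@lines) {
        # Either header variant starts a block to be skipped.
        if ( $line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/ ) {
            $in_features = 1;
            next;
        }
        if ($in_features) {
            # The block ends at the HSP score line, which is kept.
            if ( $line =~ /^\s*Score\s*=/ ) {
                $in_features = 0;
            }
            else {
                next;
            }
        }
        push @kept, $line;
    }
    return @kept;
}

my @report = (
    " Features flanking this part of subject sequence:",
    "   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class",
    "   2655 bp at 3' side: hypothetical protein",
    "",
    " Score = 36.2 bits (18),  Expect = 0.22",
    " Identities = 18/18 (100%), Gaps = 0/18 (0%)",
    " Strand=Plus/Minus",
);

print "$_\n" for skip_feature_lines(@report);
```

With the annotation lines filtered out, a parser that expects the score line directly after the hit header sees the same layout as in a BLAST 2.2.12 report.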
> >
> > Pieter
> >
> >
> > >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> > chromosome 12, complete sequence
> >           Length=27492551
> >
> >  Features flanking this part of subject sequence:
> >    3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
> >    2655 bp at 3' side: hypothetical protein
> >
> >  Score = 36.2 bits (18),  Expect = 0.22
> >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> >  Strand=Plus/Minus
> >
> > Query  4         GTACTACTCTACTCTACT  21
> >                  ||||||||||||||||||
> > Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> >
> >
> >  Features flanking this part of subject sequence:
> >    2991 bp at 5' side: hypothetical protein
> >    1131 bp at 3' side: hypothetical protein
> >
> >  Score = 36.2 bits (18),  Expect = 0.22
> >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> >  Strand=Plus/Minus
> >
> > Query  2         ATGTACTACTCTACTCTA  19
> >                  ||||||||||||||||||
> > Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> >
> >
> >  Features in this part of subject sequence:
> >    DHHC zinc finger domain, putative
> >
> >  Score = 34.2 bits (17),  Expect = 0.87
> >  Identities = 17/17 (100%), Gaps = 0/17 (0%)
> >  Strand=Plus/Plus
> >
> > Query  5         TACTACTCTACTCTACT  21
> >                  |||||||||||||||||
> > Sbjct  17616437  TACTACTCTACTCTACT  17616453
> >
> >
> >  Features flanking this part of subject sequence:
> >    102 bp at 5' side: bZIP transcription factor, putative
> >    3740 bp at 3' side: yeast dcp1, putative
> >
> >  Score = 32.2 bits (16),  Expect = 3.4
> >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> >  Strand=Plus/Plus
> >
> > Query  7        CTACTCTACTCTACTC  22
> >                 ||||||||||||||||
> > Sbjct  2775880  CTACTCTACTCTACTC  2775895
> >
> >
> >  Features flanking this part of subject sequence:
> >    21 bp at 5' side: peptide transporter T17F3.11, putative
> >    10230 bp at 3' side: transposon protein, putative, unclassified
> >
> >  Score = 32.2 bits (16),  Expect = 3.4
> >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> >  Strand=Plus/Minus
> >
> > Query  7         CTACTCTACTCTACTC  22
> >                  ||||||||||||||||
> > Sbjct  27323153  CTACTCTACTCTACTC  27323138
> >
> >
> > Guojun Yang wrote:
> >
> > >Hi, Chris,
> > >Finally the remoteblast test script works for the amino.fa query, but
> > >when I try a nucleic acid sequence (see below), an error occurs:
> > >"
> > >waiting........
> > >------------- EXCEPTION -------------
> > >MSG: no data for midline Features flanking this part of subject sequence:
> > >STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> > >STACK toplevel remoteblast_test:40
> > >"
> > >The query sequence is:
> > >CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> > >GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> > >AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> > >AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> > >
> > >The script (basically the same as the remoteblast test; I only changed
> > >the database to 'nr', the program to 'blastn' and the filename to 'ost3'):
> > >#!/usr/bin/perl
> > >
> > >use Bio::SeqIO;
> > >use Bio::Seq;
> > >use Bio::Tools::Run::RemoteBlast;
> > >use Bio::SearchIO;
> > >use strict;
> > >my $prog='blastn';
> > >my $db='nr';
> > >my $e_val=1e-10;
> > >my @params=( -prog=>$prog,
> > >             -data=>$db,
> > >             -expect=>$e_val,
> > >             -readmethod=>'SearchIO');
> > >my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >
> > >my $v = 1;
> > >
> > >my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> > >
> > >while (my $input = $str->next_seq()){
> > >    #Blast a sequence against a database:
> > >    #Alternatively, you could pass in a file with many
> > >    #sequences rather than loop through sequence one at a time
> > >    #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >    #and swap the two lines below for an example of that.
> > >    my $r = $factory->submit_blast($input);
> > >    #my $r = $factory->submit_blast('amino.fa');
> > >    print STDERR "waiting..." if( $v > 0 );
> > >    while ( my @rids = $factory->each_rid ) {
> > >        foreach my $rid ( @rids ) {
> > >            my $rc = $factory->retrieve_blast($rid);
> > >            if( !ref($rc) ) {
> > >                if( $rc < 0 ) {
> > >                    $factory->remove_rid($rid);
> > >                }
> > >                print STDERR "." if ( $v > 0 );
> > >                sleep 5;
> > >            } else {
> > >                my $result = $rc->next_result();
> > >                #save the output
> > >                my $filename = $result->query_name()."\.out";
> > >                $factory->save_output($filename);
> > >                $factory->remove_rid($rid);
> > >                print "\nQuery Name: ", $result->query_name(), "\n";
> > >                while ( my $hit = $result->next_hit ) {
> > >                    next unless ( $v > 0);
> > >                    print "\thit name is ", $hit->name, "\n";
> > >                    while( my $hsp = $hit->next_hsp ) {
> > >                        print "\t\tscore is ", $hsp->score, "\n";
> > >                    }
> > >                }
> > >            }
> > >        }
> > >    }
> > >}
> > >
> > >
> > >Do you think there might still be something in the NCBI output format?
> > >
> > >Thank you,
> > >Guojun
> > >
> > >
> > >Guojun Yang
> > >Department of Plant Biology
> > >University of Georgia
> > >Tel: 706-542-1857
> > >Fax: 706-542-1805
> > >http://www.arches.uga.edu/~guojun
> > >
> > >
> > >----- Original Message -----
> > >From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >
> > >
> > >>Sorry, forgot to add that I didn't see the regex issue that you mentioned.
> > >>It could be a perl-related issue. Try the fixes I mentioned and see > > what > > >>happens. > > >> > > >> > > >>>Christopher Fields > > >>> > > >>> > > >>Postdoctoral Researcher - Switzer Lab > > >>Dept. of Biochemistry > > >>University of Illinois Urbana-Champaign > > >> > > >> > > >>>>>-----Original Message----- > > >>>>> > > >>>>> > > >>>From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>Sent: Tuesday, February 14, 2006 12:36 PM > > >>>To: 'gyang at plantbio.uga.edu' > > >>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > >>> > > >>> > > >>>>>It's a good habit to always add single quotes around words. The > perl > > >>>>> > > >>>>> > > >>>interpreter may think a single bare word is a subroutine or perlfunc > > >>>called with no args so will try to find a subroutine named blastp(). > > My > > >>>debugger actually gives the error that the bare word blastp may > > conflict > > >>>with a future reserved word. Like you said, 'use strict' will point > > that > > >>>out. > > >>> > > >>> > > >>>>>As for the regex, it should match all the blast programs at NCBI > > (blastp, > > >>>>> > > >>>>> > > >>>blastn, blastx, tblastn, tblastx) and is built-in to make sure > nothing > > >>>else passes through. > > >>> > > >>> > > >>>>>So, if you are using the script below, there are several errors. > The > > bare > > >>>>> > > >>>>> > > >>>words for $prog and $db need quotes, and the flags for you @params > > array > > >>>don't have a dash before them. 
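As a small illustration of the program-name check, the sketch below tests a few names against the pattern text quoted in the error message (t?blast[pnx]). Anchoring the pattern with ^ and $ and the helper name is_valid_program are assumptions made here for the example; the real check inside RemoteBlast.pm may be written differently.

```perl
use strict;
use warnings;

# Pattern text taken from the error message in this thread;
# the ^...$ anchors are an assumption for this sketch.
my $program_pattern = qr/^t?blast[pnx]$/;

# Hypothetical helper: true if the name looks like an NCBI BLAST program.
sub is_valid_program {
    my ($name) = @_;
    return defined $name && $name =~ $program_pattern ? 1 : 0;
}

for my $name (qw(blastp blastn blastx tblastn tblastx megablast)) {
    printf "%-10s %s\n", $name, is_valid_program($name) ? 'accepted' : 'rejected';
}
```

With a quoted value such as 'blastp' the check passes, which is consistent with the advice above to quote the program and database names.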
I get this after adding quotes but > > before > > >>>adding the dashes to @params: > > >>> > > >>> > > >>>>>C:\Perl\Scripts>test_blast.pl > > >>>>>------------- EXCEPTION: Bio::Root::Exception ------------- > > >>>>> > > >>>>> > > >>>MSG: > > >>>STACK: Error::throw > > >>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl- > > >>>live/Bio/Root/Root.pm:328 > > >>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > > >>>C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325 > > >>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl- > > >>>live/Bio/Tools/Run/RemoteBlast.pm:256 > > >>>STACK: C:\Perl\Scripts\test_blast.pl:15 > > >>>----------------------------------------------------------- > > >>> > > >>> > > >>>>>The last line indicates a problem with this line: > > >>>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > >>>>>Changing the @params to this: > > >>>>>my @params=( -prog=>$prog, > > >>>>> > > >>>>> > > >>> -data=>$db, > > >>> -expect=>$e_val, > > >>> -readmethod=>'SearchIO'); > > >>> > > >>> > > >>>>>fixes it, and I get output as expected. > > >>>>>Christopher Fields > > >>>>> > > >>>>> > > >>>Postdoctoral Researcher - Switzer Lab > > >>>Dept. of Biochemistry > > >>>University of Illinois Urbana-Champaign > > >>> > > >>> > > >>>>>>>>-----Original Message----- > > >>>>>>>> > > >>>>>>>> > > >>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > >>>>Sent: Tuesday, February 14, 2006 11:48 AM > > >>>>To: Chris Fields; bioperl-l at lists.open-bio.org > > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > >>>> > > >>>>Hi, Chris, > > >>>>When I tried with the perldoc script, It did not work either. First > it > > >>>>says $prog can not be bare word if I "use strict". I added quotes on > > the > > >>>>words, then it says the value for $prog does not match expression > > >>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. 
The > > >>>> > > >>>> > > >>>script > > >>> > > >>> > > >>>>is shown below. Why is the expression "t?blast[pnx]"? > > >>>> > > >>>>#!/usr/bin/perl > > >>>> > > >>>>use Bio::SeqIO; > > >>>>use Bio::Seq; > > >>>>use Bio::Tools::Run::RemoteBlast; > > >>>>use Bio::SearchIO; > > >>>> > > >>>> > > >>>>my $prog=blastp; > > >>>>my $db=swissprot; > > >>>>my $e_val=1e-10; > > >>>>my @params=( prog=>$prog, > > >>>> data=>$db, > > >>>> expect=>$e_val, > > >>>> readmethod=>'SearchIO'); > > >>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > >>>> > > >>>>my $v = 1; > > >>>> > > >>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' ); > > >>>> > > >>>>while (my $input = $str->next_seq()){ > > >>>> #Blast a sequence against a database: > > >>>> #Alternatively, you could pass in a file with many > > >>>> #sequences rather than loop through sequence one at a time > > >>>> #Remove the loop starting 'while (my $input = $str->next_seq())' > > >>>> #and swap the two lines below for an example of that. > > >>>> my $r = $factory->submit_blast($input); > > >>>> #my $r = $factory->submit_blast('amino.fa'); > > >>>> print STDERR "waiting..." if( $v > 0 ); > > >>>> while ( my @rids = $factory->each_rid ) { > > >>>> foreach my $rid ( @rids ) { > > >>>> my $rc = $factory->retrieve_blast($rid); > > >>>> if( !ref($rc) ) { > > >>>> if( $rc < 0 ) { > > >>>> $factory->remove_rid($rid); > > >>>> } > > >>>> print STDERR "." 
if ( $v > 0 ); > > >>>> sleep 5; > > >>>> } else { > > >>>> my $result = $rc->next_result(); > > >>>> #save the output > > >>>> my $filename = $result->query_name()."\.out"; > > >>>> $factory->save_output($filename); > > >>>> $factory->remove_rid($rid); > > >>>> print "\nQuery Name: ", $result->query_name(), "\n"; > > >>>> while ( my $hit = $result->next_hit ) { > > >>>> next unless ( $v > 0); > > >>>> print "\thit name is ", $hit->name, "\n"; > > >>>> while( my $hsp = $hit->next_hsp ) { > > >>>> print "\t\tscore is ", $hsp->score, "\n"; > > >>>> } > > >>>> } > > >>>> } > > >>>> } > > >>>> } > > >>>>} > > >>>> > > >>>>Thank you for your help! > > >>>> > > >>>> > > >>>>Guojun > > >>>>Department of Plant Biology > > >>>>University of Georgia > > >>>> > > >>>>----- Original Message ----- > > >>>>From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>To: gyang at plantbio.uga.edu > > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > >>>> > > >>>> > > >>>> > > >>>> > > >>>>>Try two things: > > >>>>> > > >>>>> > > >>>>>>1) Use a much simpler script, like the one in 'perldoc > > >>>>>> > > >>>>>> > > >>>>>Bio::Tools::Run::RemoteBlast'. If this fixes it, there's something > > >>>>> > > >>>>> > > >>>>wrong > > >>>> > > >>>> > > >>>>>with the logic in your subroutine: > > >>>>> > > >>>>> > > >>>>>>my $v = 1; > > >>>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' > ); > > >>>>>>while (my $input = $str->next_seq()){ > > >>>>>> > > >>>>>> > > >>>>> #Blast a sequence against a database: > > >>>>> #Alternatively, you could pass in a file with many > > >>>>> #sequences rather than loop through sequence one at a time > > >>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' > > >>>>> #and swap the two lines below for an example of that. > > >>>>> my $r = $factory->submit_blast($input); > > >>>>> #my $r = $factory->submit_blast('amino.fa'); > > >>>>> print STDERR "waiting..." 
if( $v > 0 ); > > >>>>> while ( my @rids = $factory->each_rid ) { > > >>>>> foreach my $rid ( @rids ) { > > >>>>> my $rc = $factory->retrieve_blast($rid); > > >>>>> if( !ref($rc) ) { > > >>>>> if( $rc < 0 ) { > > >>>>> $factory->remove_rid($rid); > > >>>>> } > > >>>>> print STDERR "." if ( $v > 0 ); > > >>>>> sleep 5; > > >>>>> } else { > > >>>>> my $result = $rc->next_result(); > > >>>>> #save the output > > >>>>> my $filename = $result->query_name()."\.out"; > > >>>>> $factory->save_output($filename); > > >>>>> $factory->remove_rid($rid); > > >>>>> print "\nQuery Name: ", $result->query_name(), "\n"; > > >>>>> while ( my $hit = $result->next_hit ) { > > >>>>> next unless ( $v > 0); > > >>>>> print "\thit name is ", $hit->name, "\n"; > > >>>>> while( my $hsp = $hit->next_hsp ) { > > >>>>> print "\t\tscore is ", $hsp->score, "\n"; > > >>>>> } > > >>>>> } > > >>>>> } > > >>>>> } > > >>>>> } > > >>>>>} > > >>>>> > > >>>>> > > >>>>>>2) Try the RemoteBlast from Bugzilla and see if that works. It > > >>>>>> > > >>>>>> > > >>>really > > >>> > > >>> > > >>>>>shouldn't make that much of a difference, but I noticed that the > CVS > > >>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was > > >>>>>released; the Bugzilla version is based off CVS. > > >>>>> > > >>>>> > > >>>>>>Christopher Fields > > >>>>>> > > >>>>>> > > >>>>>Postdoctoral Researcher - Switzer Lab > > >>>>>Dept. 
of Biochemistry > > >>>>>University of Illinois Urbana-Champaign > > >>>>> > > >>>>> > > >>>>>>>-----Original Message----- > > >>>>>>> > > >>>>>>> > > >>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > >>>>>>Sent: Monday, February 13, 2006 3:00 PM > > >>>>>>To: bioperl-l at lists.open-bio.org > > >>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > >>>>>> > > >>>>>> > > >>>>>>>>Thanks, Chris, > > >>>>>>>> > > >>>>>>>> > > >>>>>>I installed version 1.5.1 and replaced the blast.pm file with the > > >>>>>> > > >>>>>> > > >>>one > > >>> > > >>> > > >>>>from > > >>>> > > >>>> > > >>>>>>your bug report. The running version is 1.5 when I use the command > > >>>>>> > > >>>>>> > > >>>you > > >>> > > >>> > > >>>>>>sent me. But when I tried the script, it doesn't change much. My > > >>>>>>remoteblast code (portion) is here: > > >>>>>> > > >>>>>> > > >>>>>>>>sub search { > > >>>>>>>> > > >>>>>>>> > > >>>>>>local > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN"; > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7; > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000; > > >>>>>>local > > >>>>>> > > >>>>>> > > >>>>>> > > > >>>$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}= > > >>> > > >>> > > >>>>>>'no'; > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1'; > > >>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]", > > >>>>>> -id=>"query", > > >>>>>> -desc=>"new seq"); > > >>>>>>my $len=$query->length(); > > >>>>>>@db=('nr','htgs','wgs'); > > >>>>>>foreach my $db (@db) { > > >>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' > =>'blastn', > > >>>>>> '-data' =>"$db", > > >>>>>> > > >>>>>> > > >>>>>> > > >>'-expect'=>"$E_value"); > > >> > > >> > > >>>>>>>>>>my $blast_report = $factory->submit_blast($query); > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>my @rids = 
$factory->each_rid(); > > >>>>>>>> > > >>>>>>>> > > >>>>>>foreach my $rid ( @rids ) { > > >>>>>> print STDERR "$rid\n"; > > >>>>>>} > > >>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638) > > >>>>>>print STDERR "waiting..."; > > >>>>>>sleep 60; > > >>>>>> > > >>>>>> > > >>>>>>>>foreach my $rid ( @rids ) { > > >>>>>>>> > > >>>>>>>> > > >>>>>> my $rc = $factory->retrieve_blast($rid); > > >>>>>> while (!ref($rc) ) { > > >>>>>> if( $rc < 0 ) { > > >>>>>># retrieve_blast returns -1 on error > > >>>>>> $factory->remove_rid($rid); > > >>>>>> print "Error!\n"; > > >>>>>> send_error($email,$function,$seqname,$queryname[$ST]); > > >>>>>> die "Can't retrieve $rid"; > > >>>>>> } if ($rc==0) { # retrieve_blast returns 0 on 'job not > > >>>>>> > > >>>>>> > > >>>finished' > > >>> > > >>> > > >>>>>> sleep 60; > > >>>>>> $rc = $factory->retrieve_blast($rid); > > >>>>>> } > > >>>>>> } > > >>>>>> if (ref($rc)) { > > >>>>>> print STDERR "Done.\n"; > > >>>>>> while( my $result = $rc->next_result) { > > >>>>>> while( my $hit = $result->next_hit()) { > > >>>>>> $hit_name=$hit->name; > > >>>>>> $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/; > > >>>>>> $name=$1; > > >>>>>> @left_plus_start=(); > > >>>>>> @left_plus_end=(); > > >>>>>> @left_minus_start=(); > > >>>>>> @left_minus_end=(); > > >>>>>> @right_plus_start=(); > > >>>>>> @right_plus_end=(); > > >>>>>> @right_minus_start=(); > > >>>>>> @right_minus_end=(); > > >>>>>> > > >>>>>> > > >>>>>>>> if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) { > > >>>>>>>> > > >>>>>>>> > > >>>>>> while( my $hsp = $hit->next_hsp()) { > > >>>>>>...... > > >>>>>> > > >>>>>> > > >>>>>>>>It was working quite well before around October laster year, but > > >>>>>>>> > > >>>>>>>> > > >>>>it has > > >>>> > > >>>> > > >>>>>>stopped since then, When a submission is sent via a webpage, the > cgi > > >>>>>>starts to work and use a memory of ~20 Mb. 
Then it hangs there, > > >>>>>> > > >>>>>> > > >>>>finally > > >>>> > > >>>> > > >>>>>>the expected email is received but without real results although > it > > >>>>>> > > >>>>>> > > >>>>does > > >>>> > > >>>> > > >>>>>>contain something from other parts of the script. Apparently the > > >>>>>> > > >>>>>> > > >>>>search > > >>>> > > >>>> > > >>>>>>sub did not return anything (I know there is something should be > > >>>>>>returned.). Is it also possible the format of the NCBI output for > > >>>>>> > > >>>>>> > > >>>each > > >>> > > >>> > > >>>>>>result has changed? > > >>>>>>Thank you, > > >>>>>>Guojun > > >>>>>> > > >>>>>> > > >>>>>>>>>>Department of Plant Biology > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>University of Georgia > > >>>>>> > > >>>>>> > > >>>>>>>>>>>>----- Original Message ----- > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > >>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > >>>>>> > > >>>>>> > > >>>>>>>>>>>How do you know two versions are installed (i.e. how are > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>you > > >>> > > >>> > > >>>>checking > > >>>> > > >>>> > > >>>>>>the > > >>>>>> > > >>>>>> > > >>>>>>>version)? Do you see have two complete bioperl distributions (in > > >>>>>>> > > >>>>>>> > > >>>>two > > >>>> > > >>>> > > >>>>>>>separate directories) or are you looking in modules? 
Here's the > > >>>>>>> > > >>>>>>> > > >>>way > > >>> > > >>> > > >>>>to > > >>>> > > >>>> > > >>>>>>>check the version (from the FAQ): > > >>>>>>> > > >>>>>>> > > >>>>>>>>perl -MBio::Root::Version -e 'print > > >>>>>>>> > > >>>>>>>> > > >>>>$Bio::Root::Version::VERSION,"\n"' > > >>>> > > >>>> > > >>>>>>>>If you have two full bioperl distributions on your computer, > > >>>>>>>> > > >>>>>>>> > > >>>>normally > > >>>> > > >>>> > > >>>>>>only > > >>>>>> > > >>>>>> > > >>>>>>>one will be in use unless you have explicitly set the environment > > >>>>>>> > > >>>>>>> > > >>>>>>variable > > >>>>>> > > >>>>>> > > >>>>>>>PERL5LIB. The PERL5LIB directories will be searched first > before > > >>>>>>> > > >>>>>>> > > >>>>your > > >>>> > > >>>> > > >>>>>>>normal perl directory list (@INC) is searched. You MAY get some > > >>>>>>> > > >>>>>>> > > >>>>mixing > > >>>> > > >>>> > > >>>>>>>then, but only if perl can't find a particular module in the path > > >>>>>>> > > >>>>>>> > > >>>>>>designated > > >>>>>> > > >>>>>> > > >>>>>>>in PERL5LIB; then it will progress through the directories listed > > >>>>>>> > > >>>>>>> > > >>>in > > >>> > > >>> > > >>>>>>@INC. > > >>>>>> > > >>>>>> > > >>>>>>>This may happen if a module is unique to a particular release, > but > > >>>>>>> > > >>>>>>> > > >>>>>>shouldn't > > >>>>>> > > >>>>>> > > >>>>>>>happen for the majority of modules, including RemoteBlast. You > > >>>>>>> > > >>>>>>> > > >>>can > > >>> > > >>> > > >>>>>>check > > >>>>>> > > >>>>>> > > >>>>>>>what @INC and PERL5LIB are set to by using 'perl -V'. @INC will > > >>>>>>> > > >>>>>>> > > >>>>differ > > >>>> > > >>>> > > >>>>>>>depending on your OS, perl build, etc. 
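To see which copy of a module Perl actually picks up when more than one distribution is installed, something along these lines may help. It is a minimal sketch using only core Perl; module_location is a hypothetical helper name, and the Bio:: modules listed are simply the ones discussed in this thread (they are reported as 'not found' if BioPerl is not installed).

```perl
use strict;
use warnings;

# Hypothetical helper: return the file a module is loaded from,
# or undef if it cannot be loaded from PERL5LIB/@INC.
sub module_location {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Bio::Seq -> Bio/Seq.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

print "PERL5LIB: ", ( defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(not set)' ), "\n";
print "\@INC:\n";
print "  $_\n" for @INC;

for my $module ( 'Bio::Root::Version', 'Bio::Tools::Run::RemoteBlast', 'Bio::SearchIO::blast' ) {
    my $path = module_location($module);
    print "$module => ", ( defined $path ? $path : 'not found' ), "\n";
}
```

If the paths printed for RemoteBlast.pm and blast.pm point into different directory trees, modules from two distributions are being mixed.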
> > >>>>>>> > > >>>>>>> > > >>>>>>>>Regardless, if you follow the directions for installing bioperl > > >>>>>>>> > > >>>>>>>> > > >>>>for > > >>>> > > >>>> > > >>>>>>your > > >>>>>> > > >>>>>> > > >>>>>>>system ('perl Makefile.PL', 'make', 'make test', 'make install', > > >>>>>>> > > >>>>>>> > > >>>>unless > > >>>> > > >>>> > > >>>>>>you > > >>>>>> > > >>>>>> > > >>>>>>>explicitly change the installation directory when using 'perl > > >>>>>>> > > >>>>>>> > > >>>>>>Makefile.PL'), > > >>>>>> > > >>>>>> > > >>>>>>>then 'uninstalling' Bioperl shouldn't be a problem as it will > > >>>>>>> > > >>>>>>> > > >>>>install > > >>>> > > >>>> > > >>>>>>the > > >>>>>> > > >>>>>> > > >>>>>>>Bioperl distribution you downloaded over the old version in @INC. > > >>>>>>> > > >>>>>>> > > >>>>See > > >>>> > > >>>> > > >>>>>>this > > >>>>>> > > >>>>>> > > >>>>>>>page: > > >>>>>>> > > >>>>>>> > > >>>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL > > >>>>>>>>for more details. > > >>>>>>>>Christopher Fields > > >>>>>>>> > > >>>>>>>> > > >>>>>>>Postdoctoral Researcher - Switzer Lab > > >>>>>>>Dept. of Biochemistry > > >>>>>>>University of Illinois Urbana-Champaign > > >>>>>>> > > >>>>>>> > > >>>>>>>>>>-----Original Message----- > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > >>>>>>>>Sent: Monday, February 13, 2006 12:32 PM > > >>>>>>>>To: bioperl-l at lists.open-bio.org > > >>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>Hi, Chris, > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>I do have different versions of bioperl on my Linux machine > > >>>>>>>> > > >>>>>>>> > > >>>(1.4. > > >>> > > >>> > > >>>>and > > >>>> > > >>>> > > >>>>>>>>1.5.0), this may be the problem. 
Should I just install bioperl- > > >>>>>>>> > > >>>>>>>> > > >>>>1.5.1 > > >>>> > > >>>> > > >>>>>>or I > > >>>>>> > > >>>>>> > > >>>>>>>>need to uninstall and remove the previous versions. I could not > > >>>>>>>> > > >>>>>>>> > > >>>>find > > >>>> > > >>>> > > >>>>>>any > > >>>>>> > > >>>>>> > > >>>>>>>>hint on uninstalling bioperl on linux. Could you please give me > > >>>>>>>> > > >>>>>>>> > > >>>>some > > >>>> > > >>>> > > >>>>>>>>suggestion? > > >>>>>>>>Thanks, > > >>>>>>>>Guojun > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>Department of Plant Biology > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>University of Georgia > > >>>>>>>> _____ > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > >>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500 > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > >>>>>>>> > > >>>>>>>> > > >>>>>>version > > >>>>>> > > >>>>>> > > >>>>>>>>1.28 > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>>>>>If you're using RemoteBlast 1.28, then you've likely > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>updated from CVS > > >>>>>> > > >>>>>> > > >>>>>>>>which isn't the latest fix. > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>Make sure that you check the following: > > >>>>>>>>>>1) Always post to the mailing list: > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance . > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>(CVS) > > >>>> > > >>>> > > >>>>>>>>installed first. 
Perform a clean installation; do not upgrade > > >>>>>>>> > > >>>>>>>> > > >>>>only > > >>>> > > >>>> > > >>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we > > >>>>>>>> > > >>>>>>>> > > >>>can't > > >>> > > >>> > > >>>>>>>>guarantee that mixing modules from old and new distributions > > >>>>>>>> > > >>>>>>>> > > >>>(1.4 > > >>> > > >>> > > >>>>and > > >>>> > > >>>> > > >>>>>>>>1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live > > >>>>>>>>installation will allow text output from BLAST v.2.2.12 to be > > >>>>>>>> > > >>>>>>>> > > >>>>saved > > >>>> > > >>>> > > >>>>>>and > > >>>>>> > > >>>>>> > > >>>>>>>>parsed; it will not parse the newest BLAST text output from NCBI > > >>>>>>>> > > >>>>>>>> > > >>>>>>(v2.2.13) > > >>>>>> > > >>>>>> > > >>>>>>>>but it should still save it. I believe as long as next_results() > > >>>>>>>> > > >>>>>>>> > > >>>>isn't > > >>>> > > >>>> > > >>>>>>>>called, it will work. > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>3) The bug fixes for the above issue with parsing BLAST > > >>>>>>>>>> > > >>>>>>>>>> > > >>>2.2.13 > > >>> > > >>> > > >>>>>>text output > > >>>>>> > > >>>>>> > > >>>>>>>>are NOT in CVS; they haven't been cleared and checked in by > > >>>>>>>> > > >>>>>>>> > > >>>Roger > > >>> > > >>> > > >>>>Hall > > >>>> > > >>>> > > >>>>>>>>(who's now taking care of RemoteBlast) and the powers that be > > >>>>>>>> > > >>>>>>>> > > >>>>(Jason > > >>>> > > >>>> > > >>>>>>or > > >>>>>> > > >>>>>> > > >>>>>>>>whomever is in charge of Bio::SearchIO). 
They can be found in > > >>>>>>>> > > >>>>>>>> > > >>>>>>Bugzilla: > > >>>>>> > > >>>>>> > > >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>option > > >>>> > > >>>> > > >>>>>>of > > >>>>>> > > >>>>>> > > >>>>>>>>saving XML output, so isn't necessary if you don't plan on using > > >>>>>>>> > > >>>>>>>> > > >>>>this > > >>>> > > >>>> > > >>>>>>>>option. And, remember, they haven't been committed yet to CVS, > > >>>>>>>> > > >>>>>>>> > > >>>>which > > >>>> > > >>>> > > >>>>>>>>means that the final version will change to reflect the new > > >>>>>>>> > > >>>>>>>> > > >>>version. > > >>> > > >>> > > >>>>>>>>>>>>Christopher Fields > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>Postdoctoral Researcher - Switzer Lab > > >>>>>>>>Dept. of Biochemistry > > >>>>>>>>University of Illinois Urbana-Champaign > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>>> _____ > > >>>>>>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>Sent: Monday, February 13, 2006 9:26 AM > > >>>>>>>>To: Chris Fields > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > >>>>>>>> > > >>>>>>>> > > >>>>>>version > > >>>>>> > > >>>>>> > > >>>>>>>>1.28 > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>>>Hi, Chris > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>Thanks for your suggestion, however, it doesn't seem to work > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>for > > >>>> > > >>>> > > >>>>>>my cgi > > >>>>>> > > >>>>>> > > >>>>>>>>even after I replace both blast.pm and RemoteBlast.pm. I didn't > > >>>>>>>> > > >>>>>>>> > > >>>>even > > >>>> > > >>>> > > >>>>>>get > > >>>>>> > > >>>>>> > > >>>>>>>>any RID. Is there any suggestion?
> > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>>>>>Guojun > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>Guojun Yang > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>Department of Plant Biology > > >>>>>>>>University of Georgia > > >>>>>>>>Tel: 706-542-1857 > > >>>>>>>>Fax: 706-542-1805 > > >>>>>>>>http://www.arches.uga.edu/~guojun > > >>>>>>>> _____ > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org > > >>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500 > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > >>>>>>>> > > >>>>>>>> > > >>>>>>version > > >>>>>> > > >>>>>> > > >>>>>>>>1.28 > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>I would say give the new code a try, but realize that it > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>hasn't > > >>>> > > >>>> > > >>>>>>been > > >>>>>> > > >>>>>> > > >>>>>>>>checked > > >>>>>>>>in (like I said below). I will try going over the modified > > >>>>>>>>Bio::SearchIO::blast again this weekend to see if there is > > >>>>>>>> > > >>>>>>>> > > >>>>anything I > > >>>> > > >>>> > > >>>>>>>>might > > >>>>>>>>have missed. The changed order in the header of BLAST text > > >>>>>>>> > > >>>>>>>> > > >>>output > > >>> > > >>> > > >>>>has > > >>>> > > >>>> > > >>>>>>me a > > >>>>>> > > >>>>>> > > >>>>>>>>bit worried that it might not catch everything, but it at least > > >>>>>>>> > > >>>>>>>> > > >>>>>>doesn't > > >>>>>> > > >>>>>> > > >>>>>>>>hang > > >>>>>>>>in the while() loop I described in the bug report below (bug > > >>>>>>>> > > >>>>>>>> > > >>>>#1934) > > >>>> > > >>>> > > >>>>>>and > > >>>>>> > > >>>>>> > > >>>>>>>>seems to process everything fine. 
> > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>If you want more stability in the code, you might consider > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>changing over > > >>>>>> > > >>>>>> > > >>>>>>>>to > > >>>>>>>>XML output and parsing with Bio::SearchIO::blastxml. There are > > >>>>>>>> > > >>>>>>>> > > >>>>some > > >>>> > > >>>> > > >>>>>>>>changes > > >>>>>>>>in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate > > >>>>>>>> > > >>>>>>>> > > >>>>saving > > >>>> > > >>>> > > >>>>>>XML > > >>>>>> > > >>>>>> > > >>>>>>>>output, but I believe it parses everything regardless. If you > > >>>>>>>> > > >>>>>>>> > > >>>look > > >>> > > >>> > > >>>>>>back > > >>>>>> > > >>>>>> > > >>>>>>>>the > > >>>>>>>>last month or so there has been a bit of discussion here about > > >>>>>>>> > > >>>>>>>> > > >>>it. > > >>> > > >>> > > >>>>>>Jason > > >>>>>> > > >>>>>> > > >>>>>>>>describes a bit on how to set up RemoteBlast for XML: > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using- > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>remoteblast/ > > >>>>>> > > >>>>>> > > >>>>>>>>>>Christopher Fields > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>Postdoctoral Researcher - Switzer Lab > > >>>>>>>>Dept. of Biochemistry > > >>>>>>>>University of Illinois Urbana-Champaign > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>>-----Original Message----- > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > >>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > >>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM > > >>>>>>>>>To: bioperl-l at bioperl.org > > >>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm > > >>>>>>>>> > > >>>>>>>>> > > >>>>version > > >>>> > > >>>> > > >>>>>>1.28 > > >>>>>> > > >>>>>> > > >>>>>>>>>Hi, Everybody, > > >>>>>>>>>I see this post and am wondering if this is the reason for the > > >>>>>>>>>malfunctionning of my webserver. 
We set up a webserver named > > >>>>>>>>> > > >>>>>>>>> > > >>>>MAK, > > >>>> > > >>>> > > >>>>>>for > > >>>>>> > > >>>>>> > > >>>>>>>>MITE > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>sequence analysis. It was working very well until around > > >>>>>>>>> > > >>>>>>>>> > > >>>>November > > >>>> > > >>>> > > >>>>>>2005, > > >>>>>> > > >>>>>> > > >>>>>>>>>when it stopped returning any result (the site is fine and > > >>>>>>>>> > > >>>>>>>>> > > >>>seems > > >>> > > >>> > > >>>>to > > >>>> > > >>>> > > >>>>>>be > > >>>>>> > > >>>>>> > > >>>>>>>>>doing sth after submission). In the CGI script, I used > > >>>>>>>>> > > >>>>>>>>> > > >>>>remoteblast > > >>>> > > >>>> > > >>>>>>(that > > >>>>>> > > >>>>>> > > >>>>>>>>>work was done in 2003) to do searches. I currently do not have > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>access to > > >>>>>> > > >>>>>> > > >>>>>>>>>the server because I moved. Quite several people sent emails > > >>>>>>>>> > > >>>>>>>>> > > >>>to > > >>> > > >>> > > >>>>us > > >>>> > > >>>> > > >>>>>>about > > >>>>>> > > >>>>>> > > >>>>>>>>>its malfunctioning. Is there any suggestion on fixing the > > >>>>>>>>> > > >>>>>>>>> > > >>>>problem? > > >>>> > > >>>> > > >>>>>>>>Should > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>I simplily ask the remoteblast.pm be replaced with the new > > >>>>>>>>> > > >>>>>>>>> > > >>>>version? 
> > >>>> > > >>>> > > >>>>>>>>>Thanks a lot, > > >>>>>>>>>Guojun > > >>>>>>>>> > > >>>>>>>>>Department of Plant Biology > > >>>>>>>>>University of Georgia > > >>>>>>>>>Tel: 706-542-1857 > > >>>>>>>>>Fax: 706-542-1805 > > >>>>>>>>>http://www.arches.uga.edu/~guojun > > >>>>>>>>>_____ > > >>>>>>>>> > > >>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu] > > >>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang > > >>>>>>>>> > > >>>>>>>>> > > >>>>Jian' > > >>>> > > >>>> > > >>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' > > >>>>>>>>> > > >>>>>>>>> > > >>>[mailto:bioperl- > > >>> > > >>> > > >>>>>>>>>l at bioperl.org] > > >>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500 > > >>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > >>>>>>>>> > > >>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live > > >>>>>>>>> > > >>>>>>>>> > > >>>>CVS. > > >>>> > > >>>> > > >>>>>>It > > >>>>>> > > >>>>>> > > >>>>>>>>>will > > >>>>>>>>>work for saving text output. However, it will not parse > > >>>>>>>>> > > >>>>>>>>> > > >>>anything > > >>> > > >>> > > >>>>>>using > > >>>>>> > > >>>>>> > > >>>>>>>>>next_result (it will likely hang) and will not save XML > > >>>>>>>>> > > >>>>>>>>> > > >>>format. > > >>> > > >>> > > >>>>See > > >>>> > > >>>> > > >>>>>>>>these > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>bugs: > > >>>>>>>>> > > >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > >>>>>>>>> > > >>>>>>>>>for explanations and possible fixes (changes to RemoteBlast > > >>>>>>>>> > > >>>>>>>>> > > >>>and > > >>> > > >>> > > >>>>>>>>>Bio::SearchIO::blast). 
Note that these haven't been checked in > > >>>>>>>>> > > >>>>>>>>> > > >>>>yet > > >>>> > > >>>> > > >>>>>>so > > >>>>>> > > >>>>>> > > >>>>>>>>are > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>still not included in bioperl-live; they may be further > > >>>>>>>>> > > >>>>>>>>> > > >>>modified > > >>> > > >>> > > >>>>>>before > > >>>>>> > > >>>>>> > > >>>>>>>>>committing to CVS. If you're not worried about XML, you could > > >>>>>>>>> > > >>>>>>>>> > > >>>>just > > >>>> > > >>>> > > >>>>>>try > > >>>>>> > > >>>>>> > > >>>>>>>>the > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>first fix, which is a change to SearchIO::blast. > > >>>>>>>>> > > >>>>>>>>>Nagesh, I remember you posting to the list a month ago using a > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>script > > >>>>>> > > >>>>>> > > >>>>>>>>>which > > >>>>>>>>>had problems; the script you used saves the output but doesn't > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>actually > > >>>>>> > > >>>>>> > > >>>>>>>>>parse it (i.e. you don't use next_result() to go through the > > >>>>>>>>> > > >>>>>>>>> > > >>>>data). > > >>>> > > >>>> > > >>>>>>Is > > >>>>>> > > >>>>>> > > >>>>>>>>the > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>version of BLAST in your text output 2.2.12 or 2.2.13? Have > > >>>>>>>>> > > >>>>>>>>> > > >>>you > > >>> > > >>> > > >>>>>>tried > > >>>>>> > > >>>>>> > > >>>>>>>>>parsing the output using "-readmethod => SearchIO" or "- > > >>>>>>>>> > > >>>>>>>>> > > >>>>readmethod > > >>>> > > >>>> > > >>>>>>=> > > >>>>>> > > >>>>>> > > >>>>>>>>>blast" > > >>>>>>>>>using your version of RemoteBlast and method next_result()? 
> > >>>>>>>>> > > >>>>>>>>> > > >>>Like > > >>> > > >>> > > >>>>>>below > > >>>>>> > > >>>>>> > > >>>>>>>>>(from > > >>>>>>>>>perldoc): > > >>>>>>>>> > > >>>>>>>>>while ( my @rids = $factory->each_rid ) { > > >>>>>>>>>foreach my $rid ( @rids ) { > > >>>>>>>>>my $rc = $factory->retrieve_blast($rid); > > >>>>>>>>>if( !ref($rc) ) { > > >>>>>>>>>if( $rc < 0 ) { > > >>>>>>>>>$factory->remove_rid($rid); > > >>>>>>>>>} > > >>>>>>>>>print STDERR "." if ( $v > 0 ); > > >>>>>>>>>sleep 5; > > >>>>>>>>>} else { # parsing > > >>>>>>>>>starts here > > >>>>>>>>>my $result = $rc->next_result(); # it should hang > > >>>>>>>>>here > > >>>>>>>>>#save the output > > >>>>>>>>>my $filename = $result->query_name()."\.out"; > > >>>>>>>>>$factory->save_output($filename); > > >>>>>>>>>$factory->remove_rid($rid); > > >>>>>>>>>print "\nQuery Name: ", $result->query_name(), "\n"; > > >>>>>>>>>while ( my $hit = $result->next_hit ) { > > >>>>>>>>>next unless ( $v > 0); > > >>>>>>>>>print "\thit name is ", $hit->name, "\n"; > > >>>>>>>>>while( my $hsp = $hit->next_hsp ) { > > >>>>>>>>>print "\t\tscore is ", $hsp->score, "\n"; > > >>>>>>>>>} > > >>>>>>>>>} > > >>>>>>>>>} > > >>>>>>>>>} > > >>>>>>>>>} > > >>>>>>>>>} > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>>My script hanged if I used next_result() in any way prior to > > >>>>>>>>> > > >>>>>>>>> > > >>>the > > >>> > > >>> > > >>>>>>fixes. > > >>>>>> > > >>>>>> > > >>>>>>>>I > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>want to see how many others are having the same issues with > > >>>>>>>>> > > >>>>>>>>> > > >>>>parsing > > >>>> > > >>>> > > >>>>>>>>using > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>the CVS version of bioperl-live. > > >>>>>>>>> > > >>>>>>>>>Christopher Fields > > >>>>>>>>>Postdoctoral Researcher - Switzer Lab > > >>>>>>>>>Dept. 
of Biochemistry > > >>>>>>>>>University of Illinois Urbana-Champaign > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>>>-----Original Message----- > > >>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl- > > >>>>>>>>>> > > >>>>>>>>>> > > >>>l- > > >>> > > >>> > > >>>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > > >>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM > > >>>>>>>>>>To: Huang Jian; bioperl-l > > >>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > >>>>>>>>>> > > >>>>>>>>>>Hi Huang, > > >>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>works > > >>>> > > >>>> > > >>>>>>on > > >>>>>> > > >>>>>> > > >>>>>>>>the > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>logic of checking the temporary file size to determine > > >>>>>>>>>> > > >>>>>>>>>> > > >>>whether > > >>> > > >>> > > >>>>the > > >>>> > > >>>> > > >>>>>>>>Blast > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>results are ready. This condition is not getting satisfied > > >>>>>>>>>> > > >>>>>>>>>> > > >>>may > > >>> > > >>> > > >>>>be > > >>>> > > >>>> > > >>>>>>due > > >>>>>> > > >>>>>> > > >>>>>>>>to > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>some changes brought about by NCBI. I had this problem > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>recently > > >>>> > > >>>> > > >>>>>>and > > >>>>>> > > >>>>>> > > >>>>>>>>>>figured out that the solution was to use the latest version > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>which > > >>>> > > >>>> > > >>>>>>has > > >>>>>> > > >>>>>> > > >>>>>>>>>>this problem fixed (does not use file size logic any more) > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>which > > >>>> > > >>>> > > >>>>>>is > > >>>>>> > > >>>>>> > > >>>>>>>>not > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>yet included in the BioPerl package. 
> > >>>>>>>>>>Cheers > > >>>>>>>>>>Nagesh > > >>>>>>>>>> > > >>>>>>>>>>Huang Jian wrote: > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>>>Dear Nagesh, > > >>>>>>>>>>> > > >>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>you > > >>>> > > >>>> > > >>>>>>send > > >>>>>> > > >>>>>> > > >>>>>>>>>>>me. Now it works perfectly!!! > > >>>>>>>>>>> > > >>>>>>>>>>>Thank you!! > > >>>>>>>>>>> > > >>>>>>>>>>>Huang > > >>>>>>>>>>> > > >>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka" > > >>>>>>>>>>> > > >>>>>>>>>>>To: "Huang Jian" ; "bioperl-l" > > >>>>>>>>>>> > > >>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM > > >>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>net, > > >>> > > >>> > > >>>>so > > >>>> > > >>>> > > >>>>>>still > > >>>>>> > > >>>>>> > > >>>>>>>>>>>via email > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>>>Hi Huang, > > >>>>>>>>>>>>I see that you are submitting a sequence for a remote > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>blast > > >>> > > >>> > > >>>>>>search. > > >>>>>> > > >>>>>> > > >>>>>>>>>Can > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>>>>>you check if the RemoteBlast.pm being used is v 1.28 > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>(2005/12/09). > > >>>>>> > > >>>>>> > > >>>>>>>>If > > >>>>>>>> > > >>>>>>>> > > >>>>>>>>>>>>not I have attached it with this email, try to replace it > > >>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>with > > >>>> > > >>>> > > >>>>>>the > > >>>>>> > > >>>>>> > > >>>>>>>>>old > > >>>>>>>>> > > >>>>>>>>> > > >>>>>>>>>>>>one which has a bug. > > >>>>>>>>>>>>Let me know if it works. 
> > >>>>>>>>>>>>Nagesh
> > >>>>>>>>>>
> > >>>>>>>>>>_______________________________________________
> > >>>>>>>>>>Bioperl-l mailing list
> > >>>>>>>>>>Bioperl-l at lists.open-bio.org
> > >>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l

From gyang at plantbio.uga.edu Mon Feb 20 17:22:28 2006
From: gyang at
plantbio.uga.edu (Guojun Yang)
Date: Mon, 20 Feb 2006 17:22:28 -0500
Subject: [Bioperl-l] Tested-OK
Message-ID: <20060220172228.f7d22947@dogwood.plantbio.uga.edu>

Chris,
I tested the latest fix for blast.pm on my Linux machine with blastn. It
worked very well. My CGI script is still not returning what I need, but I
don't think that is related to the parsing of BLAST results.
Thanks for your great efforts.
Guojun

----- Original Message -----
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: 'Chris Fields' [mailto:cjfields at uiuc.edu], 'Pieter Monsieurs' [mailto:Pieter.Monsieurs at esat.kuleuven.be], gyang at plantbio.uga.edu
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm version 1.28

> Guojun Yang pointed out that his BLAST output was still not parsed
> correctly, so I posted another change:
>
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>
> The direct link for the module is:
>
> http://bugzilla.bioperl.org/attachment.cgi?id=289&action=view
>
> Note that all caveats (can't sue if computer blows up, this is a very
> preliminary bugfix, etc.) apply.
>
> Apparently, NCBI has changed blastn and tblastx output to show features in
> the region for each HSP, starting with either one of the following lines:
>
> Features in this part of subject sequence:
> Features flanking this part of subject sequence:
>
> If you're using a Bio::SearchIO::blast from before Dec. 2005 with BLAST
> 2.2.13 output, most blastn or tblastx report parsing seems to choke on
> these lines, unless you are pretty lucky. This extra little feature was
> introduced a while back for large contigs and chromosomes (~BLAST 2.2.10),
> but it was not set by default and hadn't started affecting web output until
> this last fall. The first fix I posted caught only the first version of the
> line but not the second.
>
> The fix included a loop with debugging output to bypass this for now. If
> you use SearchIO directly for parsing (not through RemoteBlast) you can see
> the bypassed lines by setting the '-verbose' flag to 1.
>
> Thanks to Guojun Yang for pointing this out.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Chris Fields
> > Sent: Monday, February 20, 2006 11:01 AM
> > To: 'Pieter Monsieurs'; gyang at plantbio.uga.edu
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on
> > RemoteBlast.pm version 1.28
> >
> > I have added a preliminary bugfix for the problems seen with nucleotide
> > BLAST parsing for BLAST 2.2.13 reports. I passed SearchIO::blast through
> > perltidy to space out the blocks (really for my own purposes; it's a
> > pretty complex module). The fix bypasses the extra lines output for
> > blastn and tblastx and now seems to parse the text output for those
> > reports correctly. I tested it using all NCBI BLAST flavors for the last
> > two versions of BLAST (2.2.12 and 2.2.13); the fix was simple, so it
> > shouldn't break any other BLAST report parsing, such as WU-BLAST,
> > RPS-BLAST, or Paracel. It has only been tested on MacOSX at the moment,
> > so I need people out there to test it on anything they can to make sure
> > it works before committing. I'll be trying it on Windows today. Report
> > back to me and I'll post anything on bugzilla.
> >
> > Here it is:
> >
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept.
of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > -----Original Message----- > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs > > > Sent: Thursday, February 16, 2006 3:46 AM > > > To: gyang at plantbio.uga.edu > > > Cc: bioperl-l at lists.open-bio.org; Chris Fields > > > Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on > > RemoteBlast.pm > > > version 1.28 > > > > > > Hi, > > > > > > I have the same problem with the blast.pm-file. > > > The people of NCBI added some extra info when giving the Blast-output. > > > (see e.g. "Features flanking this part..." or "Features in this part > > > ..."), example added. > > > The blast.pm module starts looking for the hsp-alignement-information, > > > but it dies when it hits this Feature-information. > > > > > > Pieter > > > > > > > > > >gi|77552765|gb|DP000011.1| > > > > > > > list_uids=77552765&dopt=GenBank> Oryza sativa (japonica cultivar-group) > > > chromosome 12, complete > > > > > > sequence > > > Length=27492551 > > > > > > Features flanking this part of subject sequence: > > > > > > 3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub- > > class > > > > > > > &from=19251479&to=19253693&view=gbwithparts> > > > > > > 2655 bp at 3' side: hypothetical protein > > > > > > > &from=19260091&to=19260600&view=gbwithparts> > > > > > > Score = 36.2 bits (18), Expect = 0.22 > > > Identities = 18/18 (100%), Gaps = 0/18 (0%) > > > Strand=Plus/Minus > > > > > > Query 4 GTACTACTCTACTCTACT 21 > > > |||||||||||||||||| > > > > > > Sbjct 19257436 GTACTACTCTACTCTACT 19257419 > > > > > > > > > Features flanking this part of subject sequence: > > > > > > 2991 bp at 5' side: hypothetical protein > > > > > > > &from=27003164&to=27003907&view=gbwithparts> > > > 1131 bp at 3' side: hypothetical protein > > > > > > > > > > &from=27008046&to=27010752&view=gbwithparts> > > > > > > Score = 36.2 bits (18), Expect 
= 0.22 > > > Identities = 18/18 (100%), Gaps = 0/18 (0%) > > > Strand=Plus/Minus > > > > > > Query 2 ATGTACTACTCTACTCTA 19 > > > |||||||||||||||||| > > > Sbjct 27006915 ATGTACTACTCTACTCTA 27006898 > > > > > > > > > > > > Features in this part of subject sequence: > > > DHHC zinc finger domain, putative > > > > > > > > > > &from=17614825&to=17618687&view=gbwithparts> > > > > > > Score = 34.2 bits (17), Expect = 0.87 > > > Identities = 17/17 (100%), Gaps = 0/17 (0%) > > > Strand=Plus/Plus > > > > > > Query 5 TACTACTCTACTCTACT 21 > > > ||||||||||||||||| > > > Sbjct 17616437 TACTACTCTACTCTACT 17616453 > > > > > > > > > > > > Features flanking this part of subject sequence: > > > 102 bp at 5' side: bZIP transcription factor, putative > > > > > > > > > > &from=2774964&to=2775778&view=gbwithparts> > > > 3740 bp at 3' side: yeast dcp1, putative > > > > > > > &from=2779635&to=2782508&view=gbwithparts> > > > > > > Score = 32.2 bits (16), Expect = > > > 3.4 > > > Identities = 16/16 (100%), Gaps = 0/16 (0%) > > > Strand=Plus/Plus > > > > > > Query 7 CTACTCTACTCTACTC 22 > > > |||||||||||||||| > > > Sbjct 2775880 CTACTCTACTCTACTC 2775895 > > > > > > > > > Features flanking this part of subject sequence: > > > > > > 21 bp at 5' side: peptide transporter T17F3.11, putative > > > > > > > &from=27321354&to=27323117&view=gbwithparts> > > > > > > 10230 bp at 3' side: transposon protein, putative, unclassified > > > > > > > &from=27333383&to=27334285&view=gbwithparts> > > > > > > Score = 32.2 bits (16), Expect = 3.4 > > > Identities = 16/16 (100%), Gaps = 0/16 (0%) > > > Strand=Plus/Minus > > > > > > Query 7 CTACTCTACTCTACTC 22 > > > > > > |||||||||||||||| > > > Sbjct 27323153 CTACTCTACTCTACTC 27323138 > > > > > > > > > > > > > > > Guojun Yang wrote: > > > > > > >Hi, Chris, > > > >Finally the remoteblast test script works for the amino.fa query. but > > > when I try a nucleic acid sequence (see below), Error occurs: > > > >" > > > >waiting........ 
> > > >------------- EXCEPTION ------------- > > > >MSG: no data for midline Features flanking this part of subject > > > sequence: > > > >STACK Bio::SearchIO::blast::next_result > > > /usr/lib/perl5/site_perl/5.8.3/Bio/Searc > > > hIO/blast.pm:1172 > > > >STACK toplevel remoteblast_test:40 > > > >" > > > >The query sequence is: > > > >CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC > > > >GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA > > > >AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG > > > >AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG > > > > > > > >The script (basically same as the remoteblast test, I only changed > > > database to 'nr' and program to 'blastn' and filename to 'ost3'): > > > >#!/usr/bin/perl > > > > > > > >use Bio::SeqIO; > > > >use Bio::Seq; > > > >use Bio::Tools::Run::RemoteBlast; > > > >use Bio::SearchIO; > > > >use strict; > > > >my $prog='blastn'; > > > >my $db='nr'; > > > >my $e_val=1e-10; > > > >my @params=( -prog=>$prog, > > > > -data=>$db, > > > > -expect=>$e_val, > > > > -readmethod=>'SearchIO'); > > > >my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > > > > > > >my $v = 1; > > > > > > > >my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' ); > > > > > > > >while (my $input = $str->next_seq()){ > > > > #Blast a sequence against a database: > > > > #Alternatively, you could pass in a file with many > > > > #sequences rather than loop through sequence one at a time > > > > #Remove the loop starting 'while (my $input = $str->next_seq())' > > > > #and swap the two lines below for an example of that. > > > > my $r = $factory->submit_blast($input); > > > > #my $r = $factory->submit_blast('amino.fa'); > > > > print STDERR "waiting..." 
if( $v > 0 ); > > > > while ( my @rids = $factory->each_rid ) { > > > > foreach my $rid ( @rids ) { > > > > my $rc = $factory->retrieve_blast($rid); > > > > if( !ref($rc) ) { > > > > if( $rc < 0 ) { > > > > $factory->remove_rid($rid); > > > > } > > > > print STDERR "." if ( $v > 0 ); > > > > sleep 5; > > > > } else { > > > > my $result = $rc->next_result(); > > > > #save the output > > > > my $filename = $result->query_name()."\.out"; > > > > $factory->save_output($filename); > > > > $factory->remove_rid($rid); > > > > print "\nQuery Name: ", $result->query_name(), "\n"; > > > > while ( my $hit = $result->next_hit ) { > > > > next unless ( $v > 0); > > > > print "\thit name is ", $hit->name, "\n"; > > > > while( my $hsp = $hit->next_hsp ) { > > > > print "\t\tscore is ", $hsp->score, "\n"; > > > > } > > > > } > > > > } > > > > } > > > > } > > > >} > > > > > > > > > > > >Do you think there might still be something in the NCBI output format? > > > > > > > >Thank you, > > > >Guojun > > > > > > > > > > > > > > > > > > > >Guojun Yang > > > >Department of Plant Biology > > > >University of Georgia > > > >Tel: 706-542-1857 > > > >Fax: 706-542-1805 > > > >http://www.arches.uga.edu/~guojun > > > > > > > > > > > > > > > >----- Original Message ----- > > > >From: Chris Fields [mailto:cjfields at uiuc.edu] > > > >To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > > >Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > > > > > > > > > > > > > > > > > > >>Sorry, forgot to add that I didn't see the regex issue that you > > > mentioned. > > > >>It could be a perl-related issue. Try the fixes I mentioned and see > > > what > > > >>happens. > > > >> > > > >> > > > >>>Christopher Fields > > > >>> > > > >>> > > > >>Postdoctoral Researcher - Switzer Lab > > > >>Dept. 
of Biochemistry > > > >>University of Illinois Urbana-Champaign > > > >> > > > >> > > > >>>>>-----Original Message----- > > > >>>>> > > > >>>>> > > > >>>From: Chris Fields [mailto:cjfields at uiuc.edu] > > > >>>Sent: Tuesday, February 14, 2006 12:36 PM > > > >>>To: 'gyang at plantbio.uga.edu' > > > >>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > > >>> > > > >>> > > > >>>>>It's a good habit to always add single quotes around words. The > > perl > > > >>>>> > > > >>>>> > > > >>>interpreter may think a single bare word is a subroutine or perlfunc > > > >>>called with no args so will try to find a subroutine named blastp(). > > > My > > > >>>debugger actually gives the error that the bare word blastp may > > > conflict > > > >>>with a future reserved word. Like you said, 'use strict' will point > > > that > > > >>>out. > > > >>> > > > >>> > > > >>>>>As for the regex, it should match all the blast programs at NCBI > > > (blastp, > > > >>>>> > > > >>>>> > > > >>>blastn, blastx, tblastn, tblastx) and is built-in to make sure > > nothing > > > >>>else passes through. > > > >>> > > > >>> > > > >>>>>So, if you are using the script below, there are several errors. > > The > > > bare > > > >>>>> > > > >>>>> > > > >>>words for $prog and $db need quotes, and the flags for you @params > > > array > > > >>>don't have a dash before them. 
I get this after adding quotes but > > > before > > > >>>adding the dashes to @params: > > > >>> > > > >>> > > > >>>>>C:\Perl\Scripts>test_blast.pl > > > >>>>>------------- EXCEPTION: Bio::Root::Exception ------------- > > > >>>>> > > > >>>>> > > > >>>MSG: > > > >>>STACK: Error::throw > > > >>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl- > > > >>>live/Bio/Root/Root.pm:328 > > > >>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter > > > >>>C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325 > > > >>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl- > > > >>>live/Bio/Tools/Run/RemoteBlast.pm:256 > > > >>>STACK: C:\Perl\Scripts\test_blast.pl:15 > > > >>>----------------------------------------------------------- > > > >>> > > > >>> > > > >>>>>The last line indicates a problem with this line: > > > >>>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > > >>>>>Changing the @params to this: > > > >>>>>my @params=( -prog=>$prog, > > > >>>>> > > > >>>>> > > > >>> -data=>$db, > > > >>> -expect=>$e_val, > > > >>> -readmethod=>'SearchIO'); > > > >>> > > > >>> > > > >>>>>fixes it, and I get output as expected. > > > >>>>>Christopher Fields > > > >>>>> > > > >>>>> > > > >>>Postdoctoral Researcher - Switzer Lab > > > >>>Dept. of Biochemistry > > > >>>University of Illinois Urbana-Champaign > > > >>> > > > >>> > > > >>>>>>>>-----Original Message----- > > > >>>>>>>> > > > >>>>>>>> > > > >>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > > >>>>Sent: Tuesday, February 14, 2006 11:48 AM > > > >>>>To: Chris Fields; bioperl-l at lists.open-bio.org > > > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2 > > > >>>> > > > >>>>Hi, Chris, > > > >>>>When I tried with the perldoc script, It did not work either. First > > it > > > >>>>says $prog can not be bare word if I "use strict". 
I added quotes on > > > the > > > >>>>words, then it says the value for $prog does not match expression > > > >>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. The > > > >>>> > > > >>>> > > > >>>script > > > >>> > > > >>> > > > >>>>is shown below. Why is the expression "t?blast[pnx]"? > > > >>>> > > > >>>>#!/usr/bin/perl > > > >>>> > > > >>>>use Bio::SeqIO; > > > >>>>use Bio::Seq; > > > >>>>use Bio::Tools::Run::RemoteBlast; > > > >>>>use Bio::SearchIO; > > > >>>> > > > >>>> > > > >>>>my $prog=blastp; > > > >>>>my $db=swissprot; > > > >>>>my $e_val=1e-10; > > > >>>>my @params=( prog=>$prog, > > > >>>> data=>$db, > > > >>>> expect=>$e_val, > > > >>>> readmethod=>'SearchIO'); > > > >>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params); > > > >>>> > > > >>>>my $v = 1; > > > >>>> > > > >>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' ); > > > >>>> > > > >>>>while (my $input = $str->next_seq()){ > > > >>>> #Blast a sequence against a database: > > > >>>> #Alternatively, you could pass in a file with many > > > >>>> #sequences rather than loop through sequence one at a time > > > >>>> #Remove the loop starting 'while (my $input = $str->next_seq())' > > > >>>> #and swap the two lines below for an example of that. > > > >>>> my $r = $factory->submit_blast($input); > > > >>>> #my $r = $factory->submit_blast('amino.fa'); > > > >>>> print STDERR "waiting..." if( $v > 0 ); > > > >>>> while ( my @rids = $factory->each_rid ) { > > > >>>> foreach my $rid ( @rids ) { > > > >>>> my $rc = $factory->retrieve_blast($rid); > > > >>>> if( !ref($rc) ) { > > > >>>> if( $rc < 0 ) { > > > >>>> $factory->remove_rid($rid); > > > >>>> } > > > >>>> print STDERR "." 
if ( $v > 0 ); > > > >>>> sleep 5; > > > >>>> } else { > > > >>>> my $result = $rc->next_result(); > > > >>>> #save the output > > > >>>> my $filename = $result->query_name()."\.out"; > > > >>>> $factory->save_output($filename); > > > >>>> $factory->remove_rid($rid); > > > >>>> print "\nQuery Name: ", $result->query_name(), "\n"; > > > >>>> while ( my $hit = $result->next_hit ) { > > > >>>> next unless ( $v > 0); > > > >>>> print "\thit name is ", $hit->name, "\n"; > > > >>>> while( my $hsp = $hit->next_hsp ) { > > > >>>> print "\t\tscore is ", $hsp->score, "\n"; > > > >>>> } > > > >>>> } > > > >>>> } > > > >>>> } > > > >>>> } > > > >>>>} > > > >>>> > > > >>>>Thank you for your help! > > > >>>> > > > >>>> > > > >>>>Guojun > > > >>>>Department of Plant Biology > > > >>>>University of Georgia > > > >>>> > > > >>>>----- Original Message ----- > > > >>>>From: Chris Fields [mailto:cjfields at uiuc.edu] > > > >>>>To: gyang at plantbio.uga.edu > > > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > >>>> > > > >>>> > > > >>>> > > > >>>> > > > >>>>>Try two things: > > > >>>>> > > > >>>>> > > > >>>>>>1) Use a much simpler script, like the one in 'perldoc > > > >>>>>> > > > >>>>>> > > > >>>>>Bio::Tools::Run::RemoteBlast'. If this fixes it, there's something > > > >>>>> > > > >>>>> > > > >>>>wrong > > > >>>> > > > >>>> > > > >>>>>with the logic in your subroutine: > > > >>>>> > > > >>>>> > > > >>>>>>my $v = 1; > > > >>>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' > > ); > > > >>>>>>while (my $input = $str->next_seq()){ > > > >>>>>> > > > >>>>>> > > > >>>>> #Blast a sequence against a database: > > > >>>>> #Alternatively, you could pass in a file with many > > > >>>>> #sequences rather than loop through sequence one at a time > > > >>>>> #Remove the loop starting 'while (my $input = $str->next_seq())' > > > >>>>> #and swap the two lines below for an example of that. 
> > > >>>>> my $r = $factory->submit_blast($input); > > > >>>>> #my $r = $factory->submit_blast('amino.fa'); > > > >>>>> print STDERR "waiting..." if( $v > 0 ); > > > >>>>> while ( my @rids = $factory->each_rid ) { > > > >>>>> foreach my $rid ( @rids ) { > > > >>>>> my $rc = $factory->retrieve_blast($rid); > > > >>>>> if( !ref($rc) ) { > > > >>>>> if( $rc < 0 ) { > > > >>>>> $factory->remove_rid($rid); > > > >>>>> } > > > >>>>> print STDERR "." if ( $v > 0 ); > > > >>>>> sleep 5; > > > >>>>> } else { > > > >>>>> my $result = $rc->next_result(); > > > >>>>> #save the output > > > >>>>> my $filename = $result->query_name()."\.out"; > > > >>>>> $factory->save_output($filename); > > > >>>>> $factory->remove_rid($rid); > > > >>>>> print "\nQuery Name: ", $result->query_name(), "\n"; > > > >>>>> while ( my $hit = $result->next_hit ) { > > > >>>>> next unless ( $v > 0); > > > >>>>> print "\thit name is ", $hit->name, "\n"; > > > >>>>> while( my $hsp = $hit->next_hsp ) { > > > >>>>> print "\t\tscore is ", $hsp->score, "\n"; > > > >>>>> } > > > >>>>> } > > > >>>>> } > > > >>>>> } > > > >>>>> } > > > >>>>>} > > > >>>>> > > > >>>>> > > > >>>>>>2) Try the RemoteBlast from Bugzilla and see if that works. It > > > >>>>>> > > > >>>>>> > > > >>>really > > > >>> > > > >>> > > > >>>>>shouldn't make that much of a difference, but I noticed that the > > CVS > > > >>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was > > > >>>>>released; the Bugzilla version is based off CVS. > > > >>>>> > > > >>>>> > > > >>>>>>Christopher Fields > > > >>>>>> > > > >>>>>> > > > >>>>>Postdoctoral Researcher - Switzer Lab > > > >>>>>Dept. 
of Biochemistry > > > >>>>>University of Illinois Urbana-Champaign > > > >>>>> > > > >>>>> > > > >>>>>>>-----Original Message----- > > > >>>>>>> > > > >>>>>>> > > > >>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > >>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > >>>>>>Sent: Monday, February 13, 2006 3:00 PM > > > >>>>>>To: bioperl-l at lists.open-bio.org > > > >>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > >>>>>> > > > >>>>>> > > > >>>>>>>>Thanks, Chris, > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>I installed version 1.5.1 and replaced the blast.pm file with the > > > >>>>>> > > > >>>>>> > > > >>>one > > > >>> > > > >>> > > > >>>>from > > > >>>> > > > >>>> > > > >>>>>>your bug report. The running version is 1.5 when I use the command > > > >>>>>> > > > >>>>>> > > > >>>you > > > >>> > > > >>> > > > >>>>>>sent me. But when I tried the script, it doesn't change much. My > > > >>>>>>remoteblast code (portion) is here: > > > >>>>>> > > > >>>>>> > > > >>>>>>>>sub search { > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>local > > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN"; > > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7; > > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000; > > > >>>>>>local > > > >>>>>> > > > >>>>>> > > > >>>>>> > > > > > >>>$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}= > > > >>> > > > >>> > > > >>>>>>'no'; > > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1'; > > > >>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]", > > > >>>>>> -id=>"query", > > > >>>>>> -desc=>"new seq"); > > > >>>>>>my $len=$query->length(); > > > >>>>>>@db=('nr','htgs','wgs'); > > > >>>>>>foreach my $db (@db) { > > > >>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' > > =>'blastn', > > > >>>>>> '-data' =>"$db", > > > >>>>>> > > > >>>>>> > > > >>>>>> > > > >>'-expect'=>"$E_value"); > > > >> > > > >> 
> > > >>>>>>>>>>my $blast_report = $factory->submit_blast($query); > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>my @rids = $factory->each_rid(); > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>foreach my $rid ( @rids ) { > > > >>>>>> print STDERR "$rid\n"; > > > >>>>>>} > > > >>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638) > > > >>>>>>print STDERR "waiting..."; > > > >>>>>>sleep 60; > > > >>>>>> > > > >>>>>> > > > >>>>>>>>foreach my $rid ( @rids ) { > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>> my $rc = $factory->retrieve_blast($rid); > > > >>>>>> while (!ref($rc) ) { > > > >>>>>> if( $rc < 0 ) { > > > >>>>>># retrieve_blast returns -1 on error > > > >>>>>> $factory->remove_rid($rid); > > > >>>>>> print "Error!\n"; > > > >>>>>> send_error($email,$function,$seqname,$queryname[$ST]); > > > >>>>>> die "Can't retrieve $rid"; > > > >>>>>> } if ($rc==0) { # retrieve_blast returns 0 on 'job not > > > >>>>>> > > > >>>>>> > > > >>>finished' > > > >>> > > > >>> > > > >>>>>> sleep 60; > > > >>>>>> $rc = $factory->retrieve_blast($rid); > > > >>>>>> } > > > >>>>>> } > > > >>>>>> if (ref($rc)) { > > > >>>>>> print STDERR "Done.\n"; > > > >>>>>> while( my $result = $rc->next_result) { > > > >>>>>> while( my $hit = $result->next_hit()) { > > > >>>>>> $hit_name=$hit->name; > > > >>>>>> $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/; > > > >>>>>> $name=$1; > > > >>>>>> @left_plus_start=(); > > > >>>>>> @left_plus_end=(); > > > >>>>>> @left_minus_start=(); > > > >>>>>> @left_minus_end=(); > > > >>>>>> @right_plus_start=(); > > > >>>>>> @right_plus_end=(); > > > >>>>>> @right_minus_start=(); > > > >>>>>> @right_minus_end=(); > > > >>>>>> > > > >>>>>> > > > >>>>>>>> if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) { > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>> while( my $hsp = $hit->next_hsp()) { > > > >>>>>>...... 
> > > >>>>>> > > > >>>>>> > > > >>>>>>>>It was working quite well before around October laster year, but > > > >>>>>>>> > > > >>>>>>>> > > > >>>>it has > > > >>>> > > > >>>> > > > >>>>>>stopped since then, When a submission is sent via a webpage, the > > cgi > > > >>>>>>starts to work and use a memory of ~20 Mb. Then it hangs there, > > > >>>>>> > > > >>>>>> > > > >>>>finally > > > >>>> > > > >>>> > > > >>>>>>the expected email is received but without real results although > > it > > > >>>>>> > > > >>>>>> > > > >>>>does > > > >>>> > > > >>>> > > > >>>>>>contain something from other parts of the script. Apparently the > > > >>>>>> > > > >>>>>> > > > >>>>search > > > >>>> > > > >>>> > > > >>>>>>sub did not return anything (I know there is something should be > > > >>>>>>returned.). Is it also possible the format of the NCBI output for > > > >>>>>> > > > >>>>>> > > > >>>each > > > >>> > > > >>> > > > >>>>>>result has changed? > > > >>>>>>Thank you, > > > >>>>>>Guojun > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>>Department of Plant Biology > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>University of Georgia > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>>>>----- Original Message ----- > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu] > > > >>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > > >>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>>>How do you know two versions are installed (i.e. how are > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>you > > > >>> > > > >>> > > > >>>>checking > > > >>>> > > > >>>> > > > >>>>>>the > > > >>>>>> > > > >>>>>> > > > >>>>>>>version)? Do you see have two complete bioperl distributions (in > > > >>>>>>> > > > >>>>>>> > > > >>>>two > > > >>>> > > > >>>> > > > >>>>>>>separate directories) or are you looking in modules? 
Here's the > > > >>>>>>> > > > >>>>>>> > > > >>>way > > > >>> > > > >>> > > > >>>>to > > > >>>> > > > >>>> > > > >>>>>>>check the version (from the FAQ): > > > >>>>>>> > > > >>>>>>> > > > >>>>>>>>perl -MBio::Root::Version -e 'print > > > >>>>>>>> > > > >>>>>>>> > > > >>>>$Bio::Root::Version::VERSION,"\n"' > > > >>>> > > > >>>> > > > >>>>>>>>If you have two full bioperl distributions on your computer, > > > >>>>>>>> > > > >>>>>>>> > > > >>>>normally > > > >>>> > > > >>>> > > > >>>>>>only > > > >>>>>> > > > >>>>>> > > > >>>>>>>one will be in use unless you have explicitly set the environment > > > >>>>>>> > > > >>>>>>> > > > >>>>>>variable > > > >>>>>> > > > >>>>>> > > > >>>>>>>PERL5LIB. The PERL5LIB directories will be searched first > > before > > > >>>>>>> > > > >>>>>>> > > > >>>>your > > > >>>> > > > >>>> > > > >>>>>>>normal perl directory list (@INC) is searched. You MAY get some > > > >>>>>>> > > > >>>>>>> > > > >>>>mixing > > > >>>> > > > >>>> > > > >>>>>>>then, but only if perl can't find a particular module in the path > > > >>>>>>> > > > >>>>>>> > > > >>>>>>designated > > > >>>>>> > > > >>>>>> > > > >>>>>>>in PERL5LIB; then it will progress through the directories listed > > > >>>>>>> > > > >>>>>>> > > > >>>in > > > >>> > > > >>> > > > >>>>>>@INC. > > > >>>>>> > > > >>>>>> > > > >>>>>>>This may happen if a module is unique to a particular release, > > but > > > >>>>>>> > > > >>>>>>> > > > >>>>>>shouldn't > > > >>>>>> > > > >>>>>> > > > >>>>>>>happen for the majority of modules, including RemoteBlast. You > > > >>>>>>> > > > >>>>>>> > > > >>>can > > > >>> > > > >>> > > > >>>>>>check > > > >>>>>> > > > >>>>>> > > > >>>>>>>what @INC and PERL5LIB are set to by using 'perl -V'. @INC will > > > >>>>>>> > > > >>>>>>> > > > >>>>differ > > > >>>> > > > >>>> > > > >>>>>>>depending on your OS, perl build, etc. 
> > > >>>>>>> > > > >>>>>>> > > > >>>>>>>>Regardless, if you follow the directions for installing bioperl > > > >>>>>>>> > > > >>>>>>>> > > > >>>>for > > > >>>> > > > >>>> > > > >>>>>>your > > > >>>>>> > > > >>>>>> > > > >>>>>>>system ('perl Makefile.PL', 'make', 'make test', 'make install', > > > >>>>>>> > > > >>>>>>> > > > >>>>unless > > > >>>> > > > >>>> > > > >>>>>>you > > > >>>>>> > > > >>>>>> > > > >>>>>>>explicitly change the installation directory when using 'perl > > > >>>>>>> > > > >>>>>>> > > > >>>>>>Makefile.PL'), > > > >>>>>> > > > >>>>>> > > > >>>>>>>then 'uninstalling' Bioperl shouldn't be a problem as it will > > > >>>>>>> > > > >>>>>>> > > > >>>>install > > > >>>> > > > >>>> > > > >>>>>>the > > > >>>>>> > > > >>>>>> > > > >>>>>>>Bioperl distribution you downloaded over the old version in @INC. > > > >>>>>>> > > > >>>>>>> > > > >>>>See > > > >>>> > > > >>>> > > > >>>>>>this > > > >>>>>> > > > >>>>>> > > > >>>>>>>page: > > > >>>>>>> > > > >>>>>>> > > > >>>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL > > > >>>>>>>>for more details. > > > >>>>>>>>Christopher Fields > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>Postdoctoral Researcher - Switzer Lab > > > >>>>>>>Dept. of Biochemistry > > > >>>>>>>University of Illinois Urbana-Champaign > > > >>>>>>> > > > >>>>>>> > > > >>>>>>>>>>-----Original Message----- > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > >>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > >>>>>>>>Sent: Monday, February 13, 2006 12:32 PM > > > >>>>>>>>To: bioperl-l at lists.open-bio.org > > > >>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28 > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>Hi, Chris, > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>I do have different versions of bioperl on my Linux machine > > > >>>>>>>> > > > >>>>>>>> > > > >>>(1.4. 
> > > >>> > > > >>> > > > >>>>and > > > >>>> > > > >>>> > > > >>>>>>>>1.5.0), this may be the problem. Should I just install bioperl- > > > >>>>>>>> > > > >>>>>>>> > > > >>>>1.5.1 > > > >>>> > > > >>>> > > > >>>>>>or I > > > >>>>>> > > > >>>>>> > > > >>>>>>>>need to uninstall and remove the previous versions. I could not > > > >>>>>>>> > > > >>>>>>>> > > > >>>>find > > > >>>> > > > >>>> > > > >>>>>>any > > > >>>>>> > > > >>>>>> > > > >>>>>>>>hint on uninstalling bioperl on linux. Could you please give me > > > >>>>>>>> > > > >>>>>>>> > > > >>>>some > > > >>>> > > > >>>> > > > >>>>>>>>suggestion? > > > >>>>>>>>Thanks, > > > >>>>>>>>Guojun > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>Department of Plant Biology > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>University of Georgia > > > >>>>>>>> _____ > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu] > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org > > > >>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500 > > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>version > > > >>>>>> > > > >>>>>> > > > >>>>>>>>1.28 > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>>>>>If you're using RemoteBlast 1.28, then you've likely > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>updated from CVS > > > >>>>>> > > > >>>>>> > > > >>>>>>>>which isn't the latest fix. > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>Make sure that you check the following: > > > >>>>>>>>>>1) Always post to the mailing list: > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance . > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>(CVS) > > > >>>> > > > >>>> > > > >>>>>>>>installed first. 
Perform a clean installation; do not upgrade > > > >>>>>>>> > > > >>>>>>>> > > > >>>>only > > > >>>> > > > >>>> > > > >>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we > > > >>>>>>>> > > > >>>>>>>> > > > >>>can't > > > >>> > > > >>> > > > >>>>>>>>guarantee that mixing modules from old and new distributions > > > >>>>>>>> > > > >>>>>>>> > > > >>>(1.4 > > > >>> > > > >>> > > > >>>>and > > > >>>> > > > >>>> > > > >>>>>>>>1.5.1, for instance) will work. A bioperl-1.5.1 or bioperl-live > > > >>>>>>>>installation will allow text output from BLAST v.2.2.12 to be > > > >>>>>>>> > > > >>>>>>>> > > > >>>>saved > > > >>>> > > > >>>> > > > >>>>>>and > > > >>>>>> > > > >>>>>> > > > >>>>>>>>parsed; it will not parse the newest BLAST text output from NCBI > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>(v2.2.13) > > > >>>>>> > > > >>>>>> > > > >>>>>>>>but it should still save it. I believe as long as next_results() > > > >>>>>>>> > > > >>>>>>>> > > > >>>>isn't > > > >>>> > > > >>>> > > > >>>>>>>>called, it will work. > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>3) The bug fixes for the above issue with parsing BLAST > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>2.2.13 > > > >>> > > > >>> > > > >>>>>>text output > > > >>>>>> > > > >>>>>> > > > >>>>>>>>are NOT in CVS; they haven't been cleared and checked in by > > > >>>>>>>> > > > >>>>>>>> > > > >>>Roger > > > >>> > > > >>> > > > >>>>Hall > > > >>>> > > > >>>> > > > >>>>>>>>(who's now taking care of RemoteBlast) and the powers that be > > > >>>>>>>> > > > >>>>>>>> > > > >>>>(Jason > > > >>>> > > > >>>> > > > >>>>>>or > > > >>>>>> > > > >>>>>> > > > >>>>>>>>whomever is in charge of Bio::SearchIO). 
They can be found in > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>Bugzilla: > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>option > > > >>>> > > > >>>> > > > >>>>>>of > > > >>>>>> > > > >>>>>> > > > >>>>>>>>saving XML output, so isn't necessary if you don't plan on using > > > >>>>>>>> > > > >>>>>>>> > > > >>>>this > > > >>>> > > > >>>> > > > >>>>>>>>option. And, remember, they haven't been committed yet to CVS, > > > >>>>>>>> > > > >>>>>>>> > > > >>>>which > > > >>>> > > > >>>> > > > >>>>>>>>means that the final version will change to refle the new > > > >>>>>>>> > > > >>>>>>>> > > > >>>version. > > > >>> > > > >>> > > > >>>>>>>>>>>>Christopher Fields > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>Postdoctoral Researcher - Switzer Lab > > > >>>>>>>>Dept. of Biochemistry > > > >>>>>>>>University of Illinois Urbana-Champaign > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>>> _____ > > > >>>>>>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu] > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>Sent: Monday, February 13, 2006 9:26 AM > > > >>>>>>>>To: Chris Fields > > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>version > > > >>>>>> > > > >>>>>> > > > >>>>>>>>1.28 > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>>>Hi, Chris > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>Thanks for your suggestion, however, it doesn't seem to work > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>for > > > >>>> > > > >>>> > > > >>>>>>my cgi > > > >>>>>> > > > >>>>>> > > > >>>>>>>>even after I replace both blast.pm and RemoteBlast.pm. 
I didn't > > > >>>>>>>> > > > >>>>>>>> > > > >>>>even > > > >>>> > > > >>>> > > > >>>>>>get > > > >>>>>> > > > >>>>>> > > > >>>>>>>>any RID. Is there any suggestion? > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>>>>>Guojun > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>>> > > > >>>>>>>>>>>>Guojun Yang > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>Department of Plant Biology > > > >>>>>>>>University of Georgia > > > >>>>>>>>Tel: 706-542-1857 > > > >>>>>>>>Fax: 706-542-1805 > > > >>>>>>>>http://www.arches.uga.edu/~guojun > > > >>>>>>>> _____ > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu] > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org > > > >>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500 > > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>version > > > >>>>>> > > > >>>>>> > > > >>>>>>>>1.28 > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>I would say give the new code a try, but realize that it > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>hasn't > > > >>>> > > > >>>> > > > >>>>>>been > > > >>>>>> > > > >>>>>> > > > >>>>>>>>checked > > > >>>>>>>>in (like I said below). I will try going over the modified > > > >>>>>>>>Bio::SearchIO::blast again this weekend to see if there is > > > >>>>>>>> > > > >>>>>>>> > > > >>>>anything I > > > >>>> > > > >>>> > > > >>>>>>>>might > > > >>>>>>>>have missed. 
The changed order in the header of BLAST text > > > >>>>>>>> > > > >>>>>>>> > > > >>>output > > > >>> > > > >>> > > > >>>>has > > > >>>> > > > >>>> > > > >>>>>>me a > > > >>>>>> > > > >>>>>> > > > >>>>>>>>bit worried that it might not catch everything, but it at least > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>doesn't > > > >>>>>> > > > >>>>>> > > > >>>>>>>>hang > > > >>>>>>>>in the while() loop I described in the bug report below (bug > > > >>>>>>>> > > > >>>>>>>> > > > >>>>#1934) > > > >>>> > > > >>>> > > > >>>>>>and > > > >>>>>> > > > >>>>>> > > > >>>>>>>>seems to process everything fine. > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>If you want more stability in the code, you might consider > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>changing over > > > >>>>>> > > > >>>>>> > > > >>>>>>>>to > > > >>>>>>>>XML output and parsing with Bio::SearchIO::blastxml. There are > > > >>>>>>>> > > > >>>>>>>> > > > >>>>some > > > >>>> > > > >>>> > > > >>>>>>>>changes > > > >>>>>>>>in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate > > > >>>>>>>> > > > >>>>>>>> > > > >>>>saving > > > >>>> > > > >>>> > > > >>>>>>XML > > > >>>>>> > > > >>>>>> > > > >>>>>>>>output, but I believe it parses everything regardless. If you > > > >>>>>>>> > > > >>>>>>>> > > > >>>look > > > >>> > > > >>> > > > >>>>>>back > > > >>>>>> > > > >>>>>> > > > >>>>>>>>the > > > >>>>>>>>last month or so there has been a bit of discussion here about > > > >>>>>>>> > > > >>>>>>>> > > > >>>it. > > > >>> > > > >>> > > > >>>>>>Jason > > > >>>>>> > > > >>>>>> > > > >>>>>>>>describes a bit on how to set up RemoteBlast for XML: > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using- > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>remoteblast/ > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>>Christopher Fields > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>Postdoctoral Researcher - Switzer Lab > > > >>>>>>>>Dept. 
of Biochemistry > > > >>>>>>>>University of Illinois Urbana-Champaign > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>>-----Original Message----- > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > >>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang > > > >>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM > > > >>>>>>>>>To: bioperl-l at bioperl.org > > > >>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>version > > > >>>> > > > >>>> > > > >>>>>>1.28 > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>Hi, Everybody, > > > >>>>>>>>>I see this post and am wondering if this is the reason for the > > > >>>>>>>>>malfunctionning of my webserver. We set up a webserver named > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>MAK, > > > >>>> > > > >>>> > > > >>>>>>for > > > >>>>>> > > > >>>>>> > > > >>>>>>>>MITE > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>sequence analysis. It was working very well until around > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>November > > > >>>> > > > >>>> > > > >>>>>>2005, > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>when it stopped returning any result (the site is fine and > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>seems > > > >>> > > > >>> > > > >>>>to > > > >>>> > > > >>>> > > > >>>>>>be > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>doing sth after submission). In the CGI script, I used > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>remoteblast > > > >>>> > > > >>>> > > > >>>>>>(that > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>work was done in 2003) to do searches. I currently do not have > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>access to > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>the server because I moved. Quite several people sent emails > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>to > > > >>> > > > >>> > > > >>>>us > > > >>>> > > > >>>> > > > >>>>>>about > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>its malfunctioning. 
Is there any suggestion on fixing the > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>problem? > > > >>>> > > > >>>> > > > >>>>>>>>Should > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>I simplily ask the remoteblast.pm be replaced with the new > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>version? > > > >>>> > > > >>>> > > > >>>>>>>>>Thanks a lot, > > > >>>>>>>>>Guojun > > > >>>>>>>>> > > > >>>>>>>>>Department of Plant Biology > > > >>>>>>>>>University of Georgia > > > >>>>>>>>>Tel: 706-542-1857 > > > >>>>>>>>>Fax: 706-542-1805 > > > >>>>>>>>>http://www.arches.uga.edu/~guojun > > > >>>>>>>>>_____ > > > >>>>>>>>> > > > >>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu] > > > >>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>Jian' > > > >>>> > > > >>>> > > > >>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>[mailto:bioperl- > > > >>> > > > >>> > > > >>>>>>>>>l at bioperl.org] > > > >>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500 > > > >>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > >>>>>>>>> > > > >>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>CVS. > > > >>>> > > > >>>> > > > >>>>>>It > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>will > > > >>>>>>>>>work for saving text output. However, it will not parse > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>anything > > > >>> > > > >>> > > > >>>>>>using > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>next_result (it will likely hang) and will not save XML > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>format. 
> > > >>> > > > >>> > > > >>>>See > > > >>>> > > > >>>> > > > >>>>>>>>these > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>bugs: > > > >>>>>>>>> > > > >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934 > > > >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935 > > > >>>>>>>>> > > > >>>>>>>>>for explanations and possible fixes (changes to RemoteBlast > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>and > > > >>> > > > >>> > > > >>>>>>>>>Bio::SearchIO::blast). Note that these haven't been checked in > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>yet > > > >>>> > > > >>>> > > > >>>>>>so > > > >>>>>> > > > >>>>>> > > > >>>>>>>>are > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>still not included in bioperl-live; they may be further > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>modified > > > >>> > > > >>> > > > >>>>>>before > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>committing to CVS. If you're not worried about XML, you could > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>just > > > >>>> > > > >>>> > > > >>>>>>try > > > >>>>>> > > > >>>>>> > > > >>>>>>>>the > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>first fix, which is a change to SearchIO::blast. > > > >>>>>>>>> > > > >>>>>>>>>Nagesh, I remember you posting to the list a month ago using a > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>script > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>which > > > >>>>>>>>>had problems; the script you used saves the output but doesn't > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>actually > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>parse it (i.e. you don't use next_result() to go through the > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>data). > > > >>>> > > > >>>> > > > >>>>>>Is > > > >>>>>> > > > >>>>>> > > > >>>>>>>>the > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>version of BLAST in your text output 2.2.12 or 2.2.13? 
Have > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>you > > > >>> > > > >>> > > > >>>>>>tried > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>parsing the output using "-readmethod => SearchIO" or "- > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>readmethod > > > >>>> > > > >>>> > > > >>>>>>=> > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>blast" > > > >>>>>>>>>using your version of RemoteBlast and method next_result()? > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>Like > > > >>> > > > >>> > > > >>>>>>below > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>(from > > > >>>>>>>>>perldoc): > > > >>>>>>>>> > > > >>>>>>>>>while ( my @rids = $factory->each_rid ) { > > > >>>>>>>>>foreach my $rid ( @rids ) { > > > >>>>>>>>>my $rc = $factory->retrieve_blast($rid); > > > >>>>>>>>>if( !ref($rc) ) { > > > >>>>>>>>>if( $rc < 0 ) { > > > >>>>>>>>>$factory->remove_rid($rid); > > > >>>>>>>>>} > > > >>>>>>>>>print STDERR "." if ( $v > 0 ); > > > >>>>>>>>>sleep 5; > > > >>>>>>>>>} else { # parsing > > > >>>>>>>>>starts here > > > >>>>>>>>>my $result = $rc->next_result(); # it should hang > > > >>>>>>>>>here > > > >>>>>>>>>#save the output > > > >>>>>>>>>my $filename = $result->query_name()."\.out"; > > > >>>>>>>>>$factory->save_output($filename); > > > >>>>>>>>>$factory->remove_rid($rid); > > > >>>>>>>>>print "\nQuery Name: ", $result->query_name(), "\n"; > > > >>>>>>>>>while ( my $hit = $result->next_hit ) { > > > >>>>>>>>>next unless ( $v > 0); > > > >>>>>>>>>print "\thit name is ", $hit->name, "\n"; > > > >>>>>>>>>while( my $hsp = $hit->next_hsp ) { > > > >>>>>>>>>print "\t\tscore is ", $hsp->score, "\n"; > > > >>>>>>>>>} > > > >>>>>>>>>} > > > >>>>>>>>>} > > > >>>>>>>>>} > > > >>>>>>>>>} > > > >>>>>>>>>} > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>>My script hanged if I used next_result() in any way prior to > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>the > > > >>> > > > >>> > > > >>>>>>fixes. 
> > > >>>>>> > > > >>>>>> > > > >>>>>>>>I > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>want to see how many others are having the same issues with > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>parsing > > > >>>> > > > >>>> > > > >>>>>>>>using > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>the CVS version of bioperl-live. > > > >>>>>>>>> > > > >>>>>>>>>Christopher Fields > > > >>>>>>>>>Postdoctoral Researcher - Switzer Lab > > > >>>>>>>>>Dept. of Biochemistry > > > >>>>>>>>>University of Illinois Urbana-Champaign > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>>>-----Original Message----- > > > >>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl- > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>l- > > > >>> > > > >>> > > > >>>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka > > > >>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM > > > >>>>>>>>>>To: Huang Jian; bioperl-l > > > >>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28 > > > >>>>>>>>>> > > > >>>>>>>>>>Hi Huang, > > > >>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>works > > > >>>> > > > >>>> > > > >>>>>>on > > > >>>>>> > > > >>>>>> > > > >>>>>>>>the > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>logic of checking the temporary file size to determine > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>whether > > > >>> > > > >>> > > > >>>>the > > > >>>> > > > >>>> > > > >>>>>>>>Blast > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>results are ready. This condition is not getting satisfied > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>may > > > >>> > > > >>> > > > >>>>be > > > >>>> > > > >>>> > > > >>>>>>due > > > >>>>>> > > > >>>>>> > > > >>>>>>>>to > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>some changes brought about by NCBI. 
I had this problem > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>recently > > > >>>> > > > >>>> > > > >>>>>>and > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>>figured out that the solution was to use the latest version > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>which > > > >>>> > > > >>>> > > > >>>>>>has > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>>this problem fixed (does not use file size logic any more) > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>which > > > >>>> > > > >>>> > > > >>>>>>is > > > >>>>>> > > > >>>>>> > > > >>>>>>>>not > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>yet included in the BioPerl package. > > > >>>>>>>>>>Cheers > > > >>>>>>>>>>Nagesh > > > >>>>>>>>>> > > > >>>>>>>>>>Huang Jian wrote: > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>>>Dear Nagesh, > > > >>>>>>>>>>> > > > >>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>you > > > >>>> > > > >>>> > > > >>>>>>send > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>>>me. Now it works perfectly!!! > > > >>>>>>>>>>> > > > >>>>>>>>>>>Thank you!! > > > >>>>>>>>>>> > > > >>>>>>>>>>>Huang > > > >>>>>>>>>>> > > > >>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka" > > > >>>>>>>>>>> > > > >>>>>>>>>>>To: "Huang Jian" ; "bioperl-l" > > > >>>>>>>>>>> > > > >>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM > > > >>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>net, > > > >>> > > > >>> > > > >>>>so > > > >>>> > > > >>>> > > > >>>>>>still > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>>>via email > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>>>Hi Huang, > > > >>>>>>>>>>>>I see that you are submitting a sequence for a remote > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>blast > > > >>> > > > >>> > > > >>>>>>search. 
> > > >>>>>> > > > >>>>>> > > > >>>>>>>>>Can > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>>>>>you check if the RemoteBlast.pm being used is v 1.28 > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>(2005/12/09). > > > >>>>>> > > > >>>>>> > > > >>>>>>>>If > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>>>>>>not I have attached it with this email, try to replace it > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>with > > > >>>> > > > >>>> > > > >>>>>>the > > > >>>>>> > > > >>>>>> > > > >>>>>>>>>old > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>>>>>one which has a bug. > > > >>>>>>>>>>>>Let me know if it works. > > > >>>>>>>>>>>>Nagesh > > > >>>>>>>>>>>> > > > >>>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>> > > > >>>>>>>>>>_______________________________________________ > > > >>>>>>>>>>Bioperl-l mailing list > > > >>>>>>>>>>Bioperl-l at lists.open-bio.org > > > >>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > >>>>>>>>>> > > > >>>>>>>>>> > > > >>>>>>>>>_______________________________________________ > > > >>>>>>>>>Bioperl-l mailing list > > > >>>>>>>>>Bioperl-l at lists.open-bio.org > > > >>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>>>>_______________________________________________ > > > >>>>>>>>>Bioperl-l mailing list > > > >>>>>>>>>Bioperl-l at lists.open-bio.org > > > >>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > >>>>>>>>> > > > >>>>>>>>> > > > >>>>>>_______________________________________________ > > > >>>>>> > > > >>>>>> > > > >>>>>>>>Bioperl-l mailing list > > > >>>>>>>>Bioperl-l at lists.open-bio.org > > > >>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > >>>>>>>> > > > >>>>>>>>_______________________________________________ > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>Bioperl-l mailing list > > > >>>>>>Bioperl-l at lists.open-bio.org > > > >>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > >>>>>> > > > >>>>>> > > > 
From cjm at fruitfly.org Mon Feb 20 20:48:57 2006
From: cjm at fruitfly.org (chris mungall)
Date: Mon, 20 Feb 2006 17:48:57 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <3666b00b7322d2bfe4d82129b047e5ce@gmx.net>
References: <000001c62e9a$4f82eee0$c2987ca5@pc13> <3666b00b7322d2bfe4d82129b047e5ce@gmx.net>
Message-ID: <930b0083193357df7d43cc7a3111c938@fruitfly.org>

I like the idea of using an ontology to describe the ontology.

Note that the proposed structure:

OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI

will lead to cycles in the object graph when the metadata ontology
describes itself.

Actually, I think the ontology module already has object reference
cycles:

TermI->OntologyI->TermI

When I brought this up originally, people didn't seem to care much. So
long as you're only parsing GO it's not a big issue: people have enough
memory that they won't notice a big chunk of it that refuses to be
garbage collected long after it was last used. Of course, if you want to
use bioperl to cycle through all of OBO + SnoMed + UMLS then it's a
different story.

I think it's best if Sohel concentrates on getting obo.pm working; then
we can start thinking as a group about the best way to capture ontology
metadata. This includes metadata on the whole ontology, and metadata on
the terms (e.g. synonyms).
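
[Editorial sketch, not part of the original message.] The reference
cycle described above (TermI->OntologyI->TermI) is the kind of cycle
that Perl's reference counting cannot reclaim by itself. One common way
to break such a cycle is to make the back-reference weak with
Scalar::Util::weaken. The classes My::Ontology and My::Term below are
hypothetical stand-ins invented for this illustration; they are not the
Bio::Ontology classes, and nothing here claims to show how BioPerl does
or should implement this.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util ();

# Hypothetical stand-in for an ontology that owns its terms.
package My::Ontology;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, terms => [] }, $class;
}

sub add_term {
    my ($self, $term) = @_;
    push @{ $self->{terms} }, $term;   # strong reference: ontology -> term
    $term->ontology($self);            # back-reference: term -> ontology
    return $term;
}

# Count destructions so the effect of weakening is observable.
sub DESTROY { $main::destroyed++ }

# Hypothetical stand-in for a term that points back at its ontology.
package My::Term;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name} }, $class;
}

sub ontology {
    my ($self, $ontology) = @_;
    if (defined $ontology) {
        $self->{ontology} = $ontology;
        # Weaken the back-reference so it does not keep the ontology alive.
        Scalar::Util::weaken($self->{ontology});
    }
    return $self->{ontology};
}

package main;

our $destroyed = 0;
{
    my $ontology = My::Ontology->new(name => 'demo');
    $ontology->add_term(My::Term->new(name => 'term1'));
}   # $ontology goes out of scope here

print "ontologies destroyed after scope exit: $destroyed\n";
```

With the weaken() call in place the ontology is destroyed as soon as the
enclosing block ends. If that call is removed, the term keeps the
ontology alive and the counter stays at 0 until the interpreter exits,
which is the behaviour complained about above. The trade-off is that a
term must not outlive its ontology, since its back-reference would then
become undef.
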

To what extent are the current modules already in use? I think the
object cycle is a serious flaw. Will it be possible to fix it without a
major overhaul?

On Feb 11, 2006, at 9:10 PM, Hilmar Lapp wrote:

> Sohel, please do keep the discussion on the list, in your own interest
> as there's a multitude of people who can respond to you.
>
> SimpleValue would probably be what I'd use too. As Heikki hinted you
> might even create an ontology for annotating ontologies, which would
> allow you to use Annotation::OntologyTerm for annotation, but then
> there's no qualifier value ...
>
> Bioperl 1.5.1 was released last year, please check the website.
>
> -hilmar
>
> On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote:
>
>> Hi Hilmar,
>> I really like your suggestion of implementing the Bio::AnnotatableI
>> interface in the Bio::Ontology::Ontology class. I am going to
>> implement this and play around a little with it. I am planning to use
>> Bio::Annotation::SimpleValue for annotating the header as it provides
>> a good way of specifying the tag/value pair. What are your thoughts
>> on using this?
>>
>> Also, I was wondering if you have any idea about the scheduled date
>> for the Bioperl 1.51 release. I would like to contribute some stuff in
>> the next release.
>>
>> Thanks,
>> Sohel.
>>
>> -----Original Message-----
>> From: Hilmar Lapp [mailto:hlapp at gmx.net]
>> Sent: Friday, February 10, 2006 3:40 PM
>> To: Sohel Merchant
>> Cc: Bioperl
>> Subject: Re: Bio::Ontology::Ontology
>>
>> Sohel,
>>
>> please allow me to copy the list in my response. There are many good
>> and insightful people on the list who may have something to add or
>> different ideas.
>>
>> I've come across that problem myself, for instance with InterPro. What
>> I've done so far is simply to stick it unstructured into the
>> definition slot, which is not helpful if your purpose goes further
>> than just displaying it in an unstructured fashion.
>>
>> I'm not sure you would want to create another class for this (like
>> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
>> implementation, probably not the interface) annotatable (i.e.,
>> implement Bio::AnnotatableI), which supposedly would be simple to do
>> (AnnotationCollection is already implemented, you'd just return an
>> instance of it).
>>
>> Even though tag/value pairs sound like a quick and fast way to go, I'm
>> leaning against it; in essence we're moving away from that elsewhere
>> (SeqFeatureI) and hence I don't think we should restart it here.
>>
>> I'm not giving a definitive answer here, just my (initial) thoughts.
>> Hope that helps nonetheless. Can you fancy yourself trying the
>> Annotatable approach and let us know how it goes?
>>
>> -hilmar
>>
>>
>> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
>>
>>> Hi Hilmar,
>>> How are you doing? I am Sohel Merchant, a programmer at dictyBase,
>>> Northwestern University. I am working on a parser for an ontology
>>> file. I really like the ontology object model which you have
>>> contributed to Bioperl. I think it's just awesome!! One of the things
>>> which I thought would be great to capture is the ontology headers.
>>> Right now one can specify only the name and authority information. I
>>> was wondering if there is any way I could also capture other ontology
>>> file headers, like the version of the file or the date when that
>>> ontology file was made. I was thinking of making a header class, or
>>> alternatively it could go as a hash of values in the
>>> Bio::Ontology::Ontology class itself. I wanted to know what your
>>> thoughts are on this.
>>>
>>> Thanks,
>>> Sohel Merchant
>>> dictyBase
>>>
>> --
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
> --
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------

From osborne1 at optonline.net Mon Feb 20 23:42:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 20 Feb 2006 23:42:18 -0500
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
In-Reply-To: <43FA0FB7.6060904@lsi.upc.edu>
Message-ID:

Gabriel,

You had a couple of little errors in your script but once fixed it
worked fine:

#!/usr/bin/perl -w

use strict;
use lib "/Users/bosborne/bioperl-live";
use Bio::DB::Taxonomy;

my $nodesfile = "nodes.dmp";
my $namefile = "names.dmp";
my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
                               -nodesfile => $nodesfile,
                               -namesfile => $namefile);
my $taxonid = $db->get_taxonid('Homo sapiens');

# Here, $taxonid is 9606. However,

my $species = $db->get_Taxonomy_Node(-taxonid => $taxonid);
print $species->common_name;

This is using bioperl-live on Mac OSX, Perl 5.8. Are you on Windows? If
so then do "-directory => C:/temp", see what happens.

Brian O.

On 2/20/06 1:51 PM, "Gabriel Valiente" wrote:

> use Bio::DB::Taxonomy;
> my $nodesfile = "nodes.dmp";
> my $namesfile = "names.dmp";
> my $db = new Bio::DB::Taxonomy(-source => 'flatfile'
> -nodesfile => $nodesfile,
> -namesfile => $namefile);
> my $taxonid = $db->get_taxonid('Homo sapiens');
>
> Here, $taxonid is 9606. However,
>
> my $species = $db->get_Taxonomy_Node(-taxonid => $taxonid);

From valiente at lsi.upc.edu Tue Feb 21 07:19:04 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Tue, 21 Feb 2006 13:19:04 +0100 (MET)
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <1125313334valiente@lsi.upc.es>

Thanks. There's still a problem with Bio::DB::Taxonomy:

use strict;
use Bio::DB::Taxonomy;
my $nodesfile = "nodes.dmp";
my $namesfile = "names.dmp";
my $db = new Bio::DB::Taxonomy(-source => 'flatfile'
-nodesfile => $nodesfile,
-namesfile => $namesfile);
my $taxonid = $db->get_taxonid('Homo sapiens');
my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid);

So far so good. Now, access to the parent node via

my $parent = $node->get_Parent_Node;

is alright, but access to the children nodes via

my @childrenids = $db->get_Children_Taxids($taxonid);

raises:

------------- EXCEPTION -------------
MSG: Abstract method "Bio::DB::Taxonomy::get_Children_Taxids" is not implemented by package Bio::DB::Taxonomy::entrez.
This is not your fault - author of Bio::DB::Taxonomy::entrez should be blamed!
STACK Bio::Root::RootI::throw_not_implemented /home/valiente/bioperl-live/Bio/Root/RootI.pm:523
STACK Bio::DB::Taxonomy::get_Children_Taxids /home/valiente/bioperl-live/Bio/DB/Taxonomy.pm:162
STACK toplevel fetch.pl:17

Perhaps there could be a $node->get_Children_Nodes() method in
Bio::DB::Taxonomy, instead of relying on Bio::DB::Taxonomy::entrez. You
know, efficient access to the children of a node is quite an important
method for almost any interesting use of the NCBI Taxonomy.

Gabriel

From dhoworth at mrc-lmb.cam.ac.uk Tue Feb 21 05:47:41 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Tue, 21 Feb 2006 10:47:41 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
Message-ID: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>

I'm drawing a simple graphic and seeing something I didn't expect.
I'm not sure whether I've misunderstood the docs or found a bug. If I run a
program containing:

    my $name = 'O68601';
    my $length = 44;
    my $panel = Bio::Graphics::Panel->new(
        -length    => $length,
        -width     => 800,
        -pad_left  => 10,
        -pad_right => 10,
        -key_style => 'between',
    );

    my $feature = new Bio::SeqFeature::Generic(
        -start        => 1,
        -end          => $length,
        -display_name => $name . " ($length)",
    );

    $panel->add_track($feature,
        -glyph   => 'arrow',
        -tick    => 1,
        -fgcolor => 'black',
        -double  => 1,
        -label   => 1,
    );

Then I see a tick strip labelled at its left end with '1' and at its
right end with '45'. I expected to see '44'. Should I be looking for a
bug in Bio::Graphics or fixing my program?

Thanks, Dave

From gbazykin at Princeton.EDU Tue Feb 21 09:37:32 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Tue, 21 Feb 2006 09:37:32 -0500
Subject: [Bioperl-l] planning sequence mutating modules
Message-ID: <922343764.20060221093732@princeton.edu>

Heikki:

Let me explain what I need more clearly, and perhaps you guys can tell me how this can be done best in Bioperl.

I'd like to marry the trees and the sequences, so that I could get a sequence corresponding to each of the nodes (including internal nodes) on the tree. The sequences of the nodes can be either generated by some evolution process, or loaded; PAUP, for example, can reconstruct the sequences of the internal nodes. I am dealing with coding sequence, and for my purposes, I need to look at individual codons rather than nucleotides.

Then I answer questions such as this:
- for this codon (position), when (before which nodes of the tree) did all (synonymous or non-synonymous) mutations occur?
- for this node and for this codon, when (before which node) did the preceding (synonymous or non-synonymous) mutation occur? Preceding means that it occurred in the line of direct ancestors, i.e. between some two sequences on the path from this node to the root.
- infer position-specific "substitution matrix" from the tree, i.e.
in this position, what fraction of nucleotides A that were present at the beginning of each branch, turned into nucleotide "C" by the end of the branch, possibly weighting with branch lengths.

Further, I need to simulate sequence evolution along the tree, e.g., like this:
- mutate specified codon along the tree, perhaps with given substitution matrix (and, possibly, with given non-synonymous/synonymous substitutions rate). In the process, the codons for all nodes will be generated.

I need to do all this for large trees (with hundreds of leaves) and long sequences. So far, I have been using a huge hash to store all my sequences for each of the nodes:

    my $node = (some tree::node object)
    my $posit = 0;
    $codons{$posit}->{$node} = 'AAA';

etc. But there should be a better way to do it? How can I integrate all this into Bioperl? (I am new to object-oriented programming).

I'll be thankful for any feedback.

Yegor

------------------------------

Tuesday, February 14, 2006, 11:09:27 AM, you wrote:

> Yegor,
> Like you said, there are examples how it is done.. It should be possible to
> evolve sequences based on a rooted tree. You just walk the tree and evolve
> each sequence from its parent. If there is an agreement how the branch
> lengths get translated to mutations, even that could be done. Do you have
> any suggestions?
> -Heikki
> On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
>> Hi,
>>
>> Just a thought: I really think that in perspective, it would be nice
>> to be able to evolve the sequence along a tree of given shape. I think
>> PAML's "evolver" has this functionality. I've already been doing this
>> in my scripts, but I am not sure how to couple the tree and the
>> sequence data properly.
>> >> Yegor (George) Bazykin >> >> >> ------------------------------ >> >> Tuesday, February 14, 2006, 1:59:29 AM, you wrote: >> > I've committed an interim solution to the sequence evolution problem: >> > >> > $newseq = Bio::SeqUtils-> evolve >> > ($seq, $similarity, $transition_transversion_rate); >> > >> > I will go on to transform this code to fully OO, extensible solution. >> > >> > -Heikki >> > >> > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote: >> >> Ryan Golhar's mail got me thinking that we should have a simple >> >> framework for mutating sequences to a desired level. The model can then >> >> be extended to necessary complexity when needed by subclassing. >> >> >> >> To start with, I have been planning: >> >> >> >> >> >> Bio::SeqEvolution::EvolutionI - interface file >> >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate >> >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class, >> >> (defaults to Bio::PrimarySeq) >> >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses >> >> Bio::SeqEvolution::EvolutionI::each_seqs($count) >> >> - returns an array of $count seqs >> >> Bio::SeqEvolution::EvolutionI::_generate_seq() >> >> Bio::SeqEvolution::EvolutionI::matrix # Bio::Matrix::Scoring >> >> converteed to probabilites of change internally >> >> >> >> various methods to define the extent of divergence: >> >> only one to start with: >> >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation >> >> (= 100% - identity) >> >> >> >> Bio::SeqEvolution::Factory - core class to call, >> >> instantiates subclasses, Bio::SeqEvolution::DNASimple for >> >> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model, >> >> defaults to Bio::SeqEvolution::DNASimple for nucleotides >> >> >> >> >> >> Bio::SeqEvolution::DNASimple - default for nucleotides >> >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer, >> >> e.g. 
5 => 5:1, defaults to 1:1 >> >> simple alternative to a scoring matrix >> >> >> >> >> >> I am soliciting usual comments and suggestions about naming and minimal >> >> functionality. >> >> >> >> >> >> -Heikki >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l From osborne1 at optonline.net Tue Feb 21 09:46:56 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 21 Feb 2006 09:46:56 -0500 Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy In-Reply-To: <1125313334valiente@lsi.upc.es> Message-ID: Gabriel, I don't think so, this works: #!/usr/bin/perl -w use strict; use lib "/Users/bosborne/bioperl-live"; use Bio::DB::Taxonomy; my $nodesfile = "nodes.dmp"; my $namefile = "names.dmp"; my $db = new Bio::DB::Taxonomy(-source => 'flatfile', -nodesfile => $nodesfile, -namesfile => $namefile); my $taxonid = $db->get_taxonid('Homo sapiens'); my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid); # Here, $taxonid is 9606. However, my $parent = $node->get_Parent_Node; # is alright, but access to the children nodes via my @childrenids = $db->get_Children_Taxids($taxonid); print "@childrenids"; What Bioperl version are you using? Brian O. 
On 2/21/06 7:19 AM, "Gabriel Valiente" wrote:

> my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid);

From gbazykin at Princeton.EDU Mon Feb 20 18:21:03 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Mon, 20 Feb 2006 18:21:03 -0500
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602141809.28057.heikki@sanbi.ac.za>
References: <200602100906.11885.heikki@sanbi.ac.za> <200602140859.30136.heikki@sanbi.ac.za> <214316262.20060214093454@princeton.edu> <200602141809.28057.heikki@sanbi.ac.za>
Message-ID: <158747055.20060220182103@princeton.edu>

Heikki:

Let me explain what I need more clearly, and perhaps you guys can tell me how this can be done best in Bioperl.

I'd like to marry the trees and the sequences, so that I could get a sequence corresponding to each of the nodes (including internal nodes) on the tree. The sequences of the nodes can be either generated by some evolution process, or loaded; PAUP, for example, can reconstruct the sequences of the internal nodes. I am dealing with coding sequence, and for my purposes, I need to look at individual codons rather than nucleotides.

Then I answer questions such as this:
- for this codon (position), when (before which nodes of the tree) did all (synonymous or non-synonymous) mutations occur?
- for this node and for this codon, when (before which node) did the preceding (synonymous or non-synonymous) mutation occur? Preceding means that it occurred in the line of direct ancestors, i.e. between some two sequences on the path from this node to the root.
- infer position-specific "substitution matrix" from the tree, i.e. in this position, what fraction of nucleotides A that were present at the beginning of each branch, turned into nucleotide "C" by the end of the branch, possibly weighting with branch lengths.
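
The second question in the list above (on which branch towards the root did the codon last change, and was the change synonymous) can be sketched with plain hashes. This is only an illustration under stated assumptions: the tree, the node ids, the `%parent` and `%aa` tables and the `preceding_change` routine are all hypothetical and are not part of Bioperl. One point it tries to show is keying the codon table by a stable string id; using a node object directly as a hash key only stores its stringified address.

```perl
use strict;
use warnings;

# Hypothetical toy tree: child id => parent id (the root has no entry).
my %parent = (
    A  => 'n1',
    B  => 'n1',
    n1 => 'root',
    C  => 'root',
);

# Codons per position and per node id (observed or reconstructed).
# Keyed by a plain string id rather than by the node object itself.
my %codons = (
    0 => { root => 'AAA', n1 => 'AAG', A => 'AAG', B => 'AGG', C => 'AAA' },
);

# Minimal codon table covering only the codons used in this example.
my %aa = ( AAA => 'K', AAG => 'K', AGG => 'R' );

# Walk from $node towards the root and return the first branch on which
# the codon at $pos differs between child and parent, as
# [child_id, parent_id, kind]; return undef if nothing changed.
sub preceding_change {
    my ($node, $pos) = @_;
    while ( defined( my $up = $parent{$node} ) ) {
        my ($here, $above) = ( $codons{$pos}{$node}, $codons{$pos}{$up} );
        if ( $here ne $above ) {
            my $kind = $aa{$here} eq $aa{$above}
                     ? 'synonymous'
                     : 'non-synonymous';
            return [ $node, $up, $kind ];
        }
        $node = $up;
    }
    return;    # no change anywhere on the path to the root
}
```

Calling `preceding_change('A', 0)` walks A, n1, root and reports the branch between n1 and root. With real tree objects the same walk would presumably follow each node's ancestor link instead of the `%parent` hash.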

Further, I need to simulate sequence evolution along the tree, e.g., like this:
- mutate specified codon along the tree, perhaps with given substitution matrix (and, possibly, with given non-synonymous/synonymous substitutions rate). In the process, the codons for all nodes will be generated.

I need to do all this for large trees (with hundreds of leaves) and long sequences. So far, I have been using a huge hash to store all my sequences for each of the nodes:

    my $node = (some tree::node object)
    my $posit = 0;
    $codons{$posit}->{$node} = 'AAA';

etc. But there should be a better way to do it? How can I integrate all this into Bioperl? (I am new to object-oriented programming).

I'll be thankful for any feedback.

Yegor

------------------------------

Tuesday, February 14, 2006, 11:09:27 AM, you wrote:

> Yegor,
> Like you said, there are examples how it is done.. It should be possible to
> evolve sequences based on a rooted tree. You just walk the tree and evolve
> each sequence from its parent. If there is an agreement how the branch
> lengths get translated to mutations, even that could be done. Do you have
> any suggestions?
> -Heikki
> On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
>> Hi,
>>
>> Just a thought: I really think that in perspective, it would be nice
>> to be able to evolve the sequence along a tree of given shape. I think
>> PAML's "evolver" has this functionality. I've already been doing this
>> in my scripts, but I am not sure how to couple the tree and the
>> sequence data properly.
>>
>> Yegor (George) Bazykin
>>
>>
>> ------------------------------
>>
>> Tuesday, February 14, 2006, 1:59:29 AM, you wrote:
>> > I've committed an interim solution to the sequence evolution problem:
>> >
>> > $newseq = Bio::SeqUtils-> evolve
>> > ($seq, $similarity, $transition_transversion_rate);
>> >
>> > I will go on to transform this code to fully OO, extensible solution.
>> > >> > -Heikki >> > >> > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote: >> >> Ryan Golhar's mail got me thinking that we should have a simple >> >> framework for mutating sequences to a desired level. The model can then >> >> be extended to necessary complexity when needed by subclassing. >> >> >> >> To start with, I have been planning: >> >> >> >> >> >> Bio::SeqEvolution::EvolutionI - interface file >> >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate >> >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class, >> >> (defaults to Bio::PrimarySeq) >> >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses >> >> Bio::SeqEvolution::EvolutionI::each_seqs($count) >> >> - returns an array of $count seqs >> >> Bio::SeqEvolution::EvolutionI::_generate_seq() >> >> Bio::SeqEvolution::EvolutionI::matrix # Bio::Matrix::Scoring >> >> converteed to probabilites of change internally >> >> >> >> various methods to define the extent of divergence: >> >> only one to start with: >> >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation >> >> (= 100% - identity) >> >> >> >> Bio::SeqEvolution::Factory - core class to call, >> >> instantiates subclasses, Bio::SeqEvolution::DNASimple for >> >> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model, >> >> defaults to Bio::SeqEvolution::DNASimple for nucleotides >> >> >> >> >> >> Bio::SeqEvolution::DNASimple - default for nucleotides >> >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer, >> >> e.g. 5 => 5:1, defaults to 1:1 >> >> simple alternative to a scoring matrix >> >> >> >> >> >> I am soliciting usual comments and suggestions about naming and minimal >> >> functionality. 
>> >> >> >> >> >> -Heikki >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason.stajich at duke.edu Tue Feb 21 09:51:39 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue, 21 Feb 2006 09:51:39 -0500 Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy In-Reply-To: <1125313334valiente@lsi.upc.es> References: <1125313334valiente@lsi.upc.es> Message-ID: <16B69355-A7EC-4FA6-B0F3-A473C705B921@duke.edu> of course it should, and it does support this. Children query definitely exists for the flatfile implementation I don't understand why are you getting entrez errors when you are requesting the flatfile handle? I can't investigate but it definitely worked for me to get children nodes. Did you actually try running the script that already should work - scripts/taxa/local_taxonomdb_query ? You definitely can't request children nodes via the entrez implementation because NCBI doesn't (or didn't when this was written I don't know about now) provide children id access so it is pretty useful for that - although the eutils support may have expanded I'm not sure. If someone has the itch, please scratch it and work on this. I think you need to pass in $parent instead of $taxonid to get_Children_Taxids -- although I guess I wrote the method to accept either. -jason On Feb 21, 2006, at 7:19 AM, Gabriel Valiente wrote: > Thanks. There's still a problem with Bio::DB::Taxonomy: > > use strict; > use Bio::DB::Taxonomy; > > my $nodesfile = "nodes.dmp"; > my $namesfile = "names.dmp"; > my $db = new Bio::DB::Taxonomy(-source => 'flatfile' > -nodesfile => $nodesfile, > -namesfile => $namesfile); > > my $taxonid = $db->get_taxonid('Homo sapiens'); > my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid); > > So far so good. 
Now, access to the parent node via > > my $parent = $node->get_Parent_Node; > > is alright, but access to the children nodes via > > my @childrenids = $db->get_Children_Taxids($taxonid); > > raises: > > ------------- EXCEPTION ------------- > MSG: Abstract method "Bio::DB::Taxonomy::get_Children_Taxids" is not > implemented by package Bio::DB::Taxonomy::entrez. > This is not your fault - author of Bio::DB::Taxonomy::entrez should be > blamed! > > STACK Bio::Root::RootI::throw_not_implemented > /home/valiente/bioperl-live/Bio/Root/RootI.pm:523 > STACK Bio::DB::Taxonomy::get_Children_Taxids > /home/valiente/bioperl-live/Bio/DB/Taxonomy.pm:162 > STACK toplevel fetch.pl:17 > > Perhaps there could be a $node->get_Children_Nodes() method in > Bio::DB::Taxonomy, instead ofg relying on Bio::DB::Taxonomy::entrez. > You, know, efficient access to the children of a node is a quite > important method for almost any interesting use of the NCBI Taxonomy. > > Gabriel > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From hlapp at gmx.net Mon Feb 20 21:52:34 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 20 Feb 2006 18:52:34 -0800 Subject: [Bioperl-l] Bio::Ontology::Ontology In-Reply-To: <930b0083193357df7d43cc7a3111c938@fruitfly.org> References: <000001c62e9a$4f82eee0$c2987ca5@pc13> <3666b00b7322d2bfe4d82129b047e5ce@gmx.net> <930b0083193357df7d43cc7a3111c938@fruitfly.org> Message-ID: On 2/20/06, chris mungall wrote: > > I like the idea of using an ontology to describe the ontology. > > Note that the proposed structure: > OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI > > will lead to cycles in the object graph when the metadata ontology > describes itself. Yes I know, that's why I didn't want to be too vocal about it ... 
> > actually, I think the ontology module already has object reference > cycles. TermI->OntologyI->TermI > > When I brought this up originally people didn't seem to care much - so > long as you're only parsing GO then it's not a big issue, people have > enough memory they won't notice a big chunk of memory that refuses to > be garbage collected way after it's used. There is a method that destroys the cycle: $ontology->close() (this is also an interface method) Essentially, the cycle is not in OntologyI itself but in OntologyI HAS-A OntologyEngineI; i.e., the latter holds (may hold) references to terms which (may) hold a reference to an OntologyI which holds a reference to the OntologyEngineI. I say 'may' in parentheses because an implementation may use tricks like late instantiation, stringified references (handles), and weak references. It's possible to avoid the cycle altogether using such tricks but it remains questionable how much this then affects performance, and how ugly and incomprehensible the code would become. Since there is the close() method I haven't bothered yet trying a fully de-cycled implementation. > Of course, if you want to use > bioperl to cycle though all of OBO + SnoMed + UMLS then it's a > different story. Well if you want to keep all three in memory for some kind of cross-reasoning then yes you are in trouble. But if you do one ontology after another, you'd just have make sure to call close() on an ontology once you're done with it. > > I think it's best of Sohel concentrates on getting obo.pm working, then > we can start thinking as a group about the best way to capture ontology > metadata. This includes metadata on the whole ontology, and metadata on > the terms (eg synonyms). > > To what extent are the current modules already in use? I don't know about others but I use them often. > I think the object cycle is a serious flaw, will it be possible to fix this without > a major overhaul? 
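
The weak-reference option mentioned earlier in this message can be illustrated with a small, self-contained sketch. It is not how Bio::Ontology is implemented: `ToyOntology`, `link_term` and the `$freed` counter are hypothetical names used only for the example, and the only library call is `weaken` from the core Scalar::Util module. The idea is that the term's back-reference to its ontology is weakened, so the ontology is freed as soon as the last strong reference goes away, without an explicit close() call.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Counts how many toy ontology objects have been destroyed.
my $freed = 0;

{
    # Hypothetical stand-in for an ontology; not a Bio::Ontology class.
    package ToyOntology;
    sub new {
        my ($class, %args) = @_;
        return bless { %args, terms => [] }, $class;
    }
    sub DESTROY { $freed++ }
}

# Create a term that points back at its ontology and register it.
sub link_term {
    my ($ontology, $name) = @_;
    my $term = { name => $name, ontology => $ontology };
    weaken( $term->{ontology} );          # back-reference no longer counts
    push @{ $ontology->{terms} }, $term;  # forward reference stays strong
    return $term;
}

my $ontology = ToyOntology->new( name => 'toy' );
my $term     = link_term( $ontology, 'term1' );

# Dropping the last strong reference destroys the ontology even though
# the term still exists; the weakened slot in the term becomes undef.
undef $ontology;
```

The trade-off is the one already raised in the thread: every access through the weakened slot has to cope with the ontology having gone away, which can make the calling code harder to follow.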
If I recall correctly the way go-perl circumvents this is by having the ontology of a term as a flat attribute. This also means that when having a term alone, you cannot ask for its connected terms. It's been a while, so Chris set me straight where this is not true. It should be possible to come up with an implementation of OntologyI that for all intents and purposes behaves like a flat scalar giving the name until you call one of its graph traversal methods. At that point it would instantiate the engine from persistent storage (file, or a database connection), or retrieve one from a 'store'. The latter is I believe what Allen started with the OntologyStore, but again I would need to check the details. -hilmar > > > On Feb 11, 2006, at 9:10 PM, Hilmar Lapp wrote: > > > Sohel, please do keep the discussion on the list, in your own interest > > as there's a multitude of people who can respond to you. > > > > SimpleValue would probably be what I'd use too. As Heikki hinted you > > might even create an ontology for annotating ontologies, which would > > allow you to use Annotation::OntologyTerm for annotation, but then > > there's no qualifier value ... > > > > Bioperl 1.5.1 has been released last year, please check the website. > > > > -hilmar > > > > On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote: > > > >> Hi Hilmar, > >> I really like your suggestion of implementing the Bio::AnnotatableI > >> interface in the Bio::Ontology::Ontology class. I am going to > >> implement > >> this and play around a little with it. I am planning to use > >> Bio::Annotation::SimpleValue for annotating the header as it provides > >> a > >> good way of specifying the Tag/value pair. What are your thoughts on > >> using this? > >> > >> Also, I was wondering if you have any idea about the scheduled date > >> for the Bioperl 1.51 release. I would like to contribute some stuff in > >> the next release. > >> > >> Thanks, > >> Sohel. 
> >> > >> -----Original Message----- > >> From: Hilmar Lapp [mailto:hlapp at gmx.net] > >> Sent: Friday, February 10, 2006 3:40 PM > >> To: Sohel Merchant > >> Cc: Bioperl > >> Subject: Re: Bio::Ontology::Ontology > >> > >> Sohel, > >> > >> please allow me to copy the list in my response. There's many good and > >> insightful people on the list who may have something to add or > >> different ideas. > >> > >> I've come across that problem myself, for instance with InterPro. What > >> I've done so far simply is to stick it unstructured into the > >> definition > >> slot, which is not helpful if your purpose goes further than just > >> displaying it in an unstructured fashion. > >> > >> I'm not sure you would want to create another class for this (like > >> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the > >> implementation, probably not the interface) annotatable (i.e., > >> implement Bio::Annotatable), which supposedly would be simple to do > >> (AnnotationCollection is already implemented, you'd just return an > >> instance of it). > >> > >> Even though tag/value pairs sound like quick&fast way to go I'm > >> leaning > >> against it; in essence we're moving away from that elsewhere > >> (SeqFeatureI) and hence I don't think we should restart it here. > >> > >> I'm not giving a definitive answer here, just my (initial) thoughts. > >> Hope that helps nonetheless. Can you fancy yourself trying the > >> Annotatable approach and let us know how it goes? > >> > >> -hilmar > >> > >> > >> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote: > >> > >>> Hi Hilmar, > >>> How are you doing? I am Sohel Merchant, a programmer at dictyBase, > >>> Northwestern University. I am working on a parser for an ontology > >>> file. I really like the ontology object model which you have > >>> contributed to Bioperl. I think its just Awesome!! One of things > >>> which > >> > >>> I thought would be great to capture is the ontology headers. 
Right > >>> now > >> > >>> one can specify only the name, authority information. I was wondering > >>> if there is any way, I could also capture other ontology file headers > >>> like version of the file, date when that ontology file was made. I > >>> was > >> > >>> thinking of making a header class or alternatively it could go as > >>> Hash > >> > >>> of values in the Bio::Ontology::Ontology class itself. I wanted to > >>> know whets your thoughts about on this. > >>> > >>> Thanks, > >>> Sohel Merchant > >>> dictyBase > >>> > >> -- > >> ------------------------------------------------------------- > >> Hilmar Lapp email: lapp at gnf.org > >> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > >> ------------------------------------------------------------- > >> > >> > >> > >> > > -- > > ------------------------------------------------------------- > > Hilmar Lapp email: lapp at gnf.org > > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > > ------------------------------------------------------------- > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From valiente at lsi.upc.edu Tue Feb 21 11:10:05 2006 From: valiente at lsi.upc.edu (Gabriel Valiente) Date: Tue, 21 Feb 2006 17:10:05 +0100 (MET) Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy Message-ID: <1783551242valiente@lsi.upc.es> It works now, with the #!/usr/bin/perl -w switch. Sorry about that. I'd like to contribute a couple of additional methods to Bio::DB::Taxonomy. The first one returns a reference to an array with the full lineage of a given node. 
sub lineage { my $node = shift; my @PATH; while ($node->node_name ne "root") { $node = $node->get_Parent_Node; unshift @PATH, $node; } return \@PATH; } The second one uses the lineage method to return the most recent common ancestor of two given nodes. sub LCA { my $node1 = shift; my $node2 = shift; my @PATH1 = @{lineage($node1)}; my @PATH2 = @{lineage($node2)}; my $root1 = shift @PATH1; my $root2 = shift @PATH2; while ($root1->node_name eq $root2->node_name) { $root1 = shift @PATH1; $root2 = shift @PATH2; } return $root1; } Jason, shall I include them myself in Bio::DB::Taxonomy or can you take care of this? I think, the right place for these methods might be Bio::Taxonomy or Bio::Taxonomy::Node rather than Bio::DB::Taxonomy. Thanks, Gabriel From lstein at cshl.edu Tue Feb 21 10:55:30 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 21 Feb 2006 10:55:30 -0500 Subject: [Bioperl-l] Bio::Graphics off by one? In-Reply-To: <43FAEFCD.70709@mrc-lmb.cam.ac.uk> References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk> Message-ID: <200602211055.31221.lstein@cshl.edu> Hi, When you are looking at the resolution of individual bases, a base pair at position one occupies the half-open interval from 1->2, meaning that it comes up to, but doesn't quite touch, the 2. For the purposes of display, Bio::Graphics draws the end of the half-open interval. Lincoln On Tuesday 21 February 2006 05:47, Dave Howorth wrote: > I'm drawing a simple graphic and seeing something I didn't expect. I'm > not sure whether I've misunderstood the docs or found a bug. If I run a > program containing: > > my $name = 'O68601'; > my $length = 44; > my $panel = Bio::Graphics::Panel->new( > -length => $length, > -width => 800, > -pad_left => 10, > -pad_right => 10, > -key_style => 'between', > ); > > my $feature = new Bio::SeqFeature::Generic( > -start => 1, > -end => $length, > -display_name => $name . 
" ($length)", > ); > > $panel->add_track($feature, > -glyph => 'arrow', > -tick => 1, > -fgcolor => 'black', > -double => 1, > -label => 1, > ); > > Then I see a tick strip labelled at its left end with '1' and at its > right end with '45'. I expected to see '44'. Should I be looking for a > bug in Bio::Graphics or fixing my program? > > Thanks, Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From jason.stajich at duke.edu Tue Feb 21 11:28:22 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Tue, 21 Feb 2006 11:28:22 -0500 Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy In-Reply-To: <1783551242valiente@lsi.upc.es> References: <1783551242valiente@lsi.upc.es> Message-ID: <1C38DDCF-9312-42D3-923F-C0DD4CE7E9AA@duke.edu> you'll have to do it - I don't have time, I thought there was something like this already, but I guess not, so please put it in. I must do this when we initialize the classification array when building a node, On Feb 21, 2006, at 11:10 AM, Gabriel Valiente wrote: > It works now, with the #!/usr/bin/perl -w switch. Sorry about that. > > I'd like to contribute a couple of additional methods to > Bio::DB::Taxonomy. The first one returns a reference to an array with > the full lineage of a given node. > > sub lineage { > my $node = shift; > my @PATH; > while ($node->node_name ne "root") { > $node = $node->get_Parent_Node; > unshift @PATH, $node; > } > return \@PATH; > } > > The second one uses the lineage method to return the most recent > common > ancestor of two given nodes. 
> > sub LCA { > my $node1 = shift; > my $node2 = shift; > my @PATH1 = @{lineage($node1)}; > my @PATH2 = @{lineage($node2)}; > my $root1 = shift @PATH1; > my $root2 = shift @PATH2; > while ($root1->node_name eq $root2->node_name) { > $root1 = shift @PATH1; > $root2 = shift @PATH2; > } > return $root1; > } > > Jason, shall I include them myself in Bio::DB::Taxonomy or can you > take > care of this? I think, the right place for these methods might be > Bio::Taxonomy or Bio::Taxonomy::Node rather than Bio::DB::Taxonomy. > > Thanks, > > Gabriel > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From dhoworth at mrc-lmb.cam.ac.uk Tue Feb 21 11:50:37 2006 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Tue, 21 Feb 2006 16:50:37 +0000 Subject: [Bioperl-l] Bio::Graphics off by one? In-Reply-To: <200602211055.31221.lstein@cshl.edu> References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk> <200602211055.31221.lstein@cshl.edu> Message-ID: <43FB44DD.4090504@mrc-lmb.cam.ac.uk> Lincoln Stein wrote: > When you are looking at the resolution of individual bases, a base pair at > position one occupies the half-open interval from 1->2, meaning that it comes > up to, but doesn't quite touch, the 2. For the purposes of display, > Bio::Graphics draws the end of the half-open interval. I think I understand the description of what it's doing but I don't understand why. What is the purpose of labelling the [44,45) interval 45, when that interval is representing the 44th discrete mer? I'm working with proteins and domains, so I'm always at the level of individual residues and people frequently care about the exact residue boundaries, especially when the regions are short. So I need to make pictures that match the data. 
The displayed track seems more consistent with an interpretation that the residues are represented by the discrete integer points along the line but I don't know if I'm buying myself trouble later if I try to adopt that interpretation. Alternatively, is there some way to get a track with 44 intervals, labelled 1 to 44? Or will I need to patch my copy of bioperl to achieve that? Thanks, Dave From cjfields at uiuc.edu Tue Feb 21 12:30:58 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 21 Feb 2006 11:30:58 -0600 Subject: [Bioperl-l] another searchIO bug? with blast report In-Reply-To: <43F5A2EA0200009B00000F45@gwia.kvl.dk> Message-ID: <000301c6370c$93b07c70$15327e82@pyrimidine> Anders, I think you should look through the mail list archives for an answer, specifically: http://portal.open-bio.org/pipermail/bioperl-l/2004-November/017285.html Look up the other methods in Bio::Search::HSP::BlastHSP as well. They may be more helpful. I can't help but think there is something wrong with the logic in your subroutines since they don't call other methods built in to HSP objects. It may be an off-by-one error. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Anders Stegmann > Sent: Friday, February 17, 2006 3:18 AM > To: bioperl-l at bioperl.org > Subject: Re: [Bioperl-l] another searchIO bug? with blast report > > > > >>>Anders Stegmann 02/16/06 11:20 am >>> > Hi! > > I am blasting a protein seq (query) against an identical seq with a > deletion of Aa nr 61 (subject). > Then I print out the type of nomatch Aa and its position. > The nomatch for the query seq is Aa G at position 61, which is correct. > The nomatch for the subject seq is V at position 60, which is definitely > not correct!? > > Is this a bug? 
> > testblast2.pl is the program to run > > Q0045 is the query seq. > > Q0045del61 is the subject seq (it has to be formated: formatdb -i > Q0045del61 -p T -o F). > > Regards Anders. > From staffa at niehs.nih.gov Tue Feb 21 12:24:39 2006 From: staffa at niehs.nih.gov (staffa) Date: Tue, 21 Feb 2006 12:24:39 -0500 Subject: [Bioperl-l] Pattern Density Message-ID: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov> Good Friends, I have an important client who wants a histogram display of the density of "ccgg" along any chromosome of the mouse genome in 1000 bp windows. I'm thinking that maybe there is a bio-perl module that could help with this. That'd probably beat having to write something from scratch. Any help that you give would be greatly appreciated. I am more concerned about the reading and analysis of the sequence than actual plotting of the histogram, but anything you can offer will be appreciated. Thank you. Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 1167 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060221/49e81c8b/attachment.bin From lstein at cshl.edu Tue Feb 21 13:25:59 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Tue, 21 Feb 2006 13:25:59 -0500 Subject: [Bioperl-l] Bio::Graphics off by one? 
In-Reply-To: <43FB44DD.4090504@mrc-lmb.cam.ac.uk> References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk> <200602211055.31221.lstein@cshl.edu> <43FB44DD.4090504@mrc-lmb.cam.ac.uk> Message-ID: <200602211326.00021.lstein@cshl.edu> Hi Dave, Well, when you are using 1-based coordinates, a line that contains 44 intervals will have 45 ticks. If you move to 0-based coordinates, then the first tick will be labeled 0 and the last tick will be labeled 44. An alternative is to make each base dimensionless, but that becomes a problem when dealing with single base features, such as SNPs. These issues are why I have long advocated for interbase coordinates in which you number the positions between bases rather than the bases themselves. Draw me the picture of what you expect to see. I think of it this way:

1 2 3 4 5 6
A>G>C>T>A>

Lincoln On Tuesday 21 February 2006 11:50, Dave Howorth wrote: > Lincoln Stein wrote: > > When you are looking at the resolution of individual bases, a base pair > > at position one occupies the half-open interval from 1->2, meaning that > > it comes up to, but doesn't quite touch, the 2. For the purposes of > > display, Bio::Graphics draws the end of the half-open interval. > > I think I understand the description of what it's doing but I don't > understand why. What is the purpose of labelling the [44,45) interval > 45, when that interval is representing the 44th discrete mer? > > I'm working with proteins and domains, so I'm always at the level of > individual residues and people frequently care about the exact residue > boundaries, especially when the regions are short. So I need to make > pictures that match the data. > > The displayed track seems more consistent with an interpretation that > the residues are represented by the discrete integer points along the > line but I don't know if I'm buying myself trouble later if I try to > adopt that interpretation. > > Alternatively, is there some way to get a track with 44 intervals, > labelled 1 to 44?
> > Or will I need to patch my copy of bioperl to achieve that? > > Thanks, Dave -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From osborne1 at optonline.net Tue Feb 21 13:25:35 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 21 Feb 2006 13:25:35 -0500 Subject: [Bioperl-l] Pattern Density In-Reply-To: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov> Message-ID: Nick, Right, BioPerl really can't help you with the histogram itself but there are probably multiple solutions to the problem of iterating over the sequence. Here's one idea, untested, it assumes your sequence is in fasta format:

use strict;
use Bio::DB::Fasta;
use Bio::Seq;
use Bio::Tools::SeqWords;

my $db = Bio::DB::Fasta->new('/path/to/fasta/files');
my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
my $start = 1;            # BioPerl sequence coordinates are 1-based
my $windowsize = 1000;
my $str = 'ccgg';
my $len = $obj->length;
my $overlap = 250;        # step between window starts

while (1) {
  my $end = $start + $windowsize - 1;   # inclusive end of the window
  last if ( $end > $len );
  my $subseq = $obj->subseq($start,$end);
  my $count = get_count($str,$subseq);
  $start += $overlap;
}

sub get_count {
  my ($str,$subseq) = @_;
  my $seqobj = Bio::Seq->new(-seq => $subseq);
  my $seq_word = Bio::Tools::SeqWords->new(-seq => $seqobj);
  my $ref = $seq_word->count_overlap_words(length($str));
  # SeqWords appears to upper-case the sequence before counting,
  # so look the word up in upper case (worth checking)
  $ref->{uc $str} || 0;
}

Note this skips the very last window, debugging needed. Brian O. On 2/21/06 12:24 PM, "staffa" wrote: > I am more concerned about the reading and analysis of the sequence than actual > plotting of the histogram, but anything you can offer will be appreciated.
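For comparison, the window counting asked about in this thread can also be sketched in plain Perl, without any BioPerl modules, once the chromosome sequence is held in a string. This is only an illustration under stated assumptions: the function name window_counts and the toy sequence are hypothetical, the windows are consecutive and non-overlapping (as in the original 1000 bp request), and the final short window is reported rather than skipped. Motif occurrences that straddle a window boundary are not counted in either window.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count a motif in consecutive, non-overlapping windows.
# Returns a list of [window_start, window_end, count] with 1-based, inclusive
# coordinates. Matching is case-insensitive.
sub window_counts {
    my ($seq, $motif, $window) = @_;
    my @rows;
    my $len    = length($seq);
    my $needle = lc($motif);
    for (my $offset = 0; $offset < $len; $offset += $window) {
        # substr quietly returns a shorter chunk for the last window
        my $chunk = lc(substr($seq, $offset, $window));
        my $count = 0;
        my $pos   = 0;
        while (($pos = index($chunk, $needle, $pos)) != -1) {
            $count++;
            $pos++;    # advance one base so overlapping matches are also found
        }
        push @rows, [ $offset + 1, $offset + length($chunk), $count ];
    }
    return @rows;
}

# Toy 24-base sequence with a window of 10 (use 1000 for the real request)
for my $row (window_counts('ccggaaccggttttttccggccgg', 'ccgg', 10)) {
    print join("\t", @$row), "\n";
}
```

The three printed columns (start, end, count) are the kind of table a plotting tool would need. Since the reverse complement of CCGG is CCGG itself, counting on one strand is enough for this particular motif.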
From gyang at plantbio.uga.edu Tue Feb 21 13:45:50 2006 From: gyang at plantbio.uga.edu (Guojun Yang) Date: Tue, 21 Feb 2006 13:45:50 -0500 Subject: [Bioperl-l] full chromosome accesscion number mess In-Reply-To: <000001c63669$2bf06a80$15327e82@pyrimidine> Message-ID: <20060221184550.6557851b@dogwood.plantbio.uga.edu> Hi, everybody, In the process of repairing my CGI script after the NCBI BLAST output format change, I noticed that the accession numbers for the rice pseudochromosomes are very confusing and cause trouble for sequence retrieval. My script uses RemoteBlast to search for similar sequences, and then retrieves the hit sequence with a bit of flanking region from GenBank. The rice pseudochromosomes have accession numbers similar to those of the individual clones, like AP00XXX. I do not want the sequence retrieval to involve these accessions because it takes forever. Can anybody give some suggestions on how to deal with it? Thanks, Guojun Yang Department of Plant Biology University of Georgia From valiente at lsi.upc.edu Tue Feb 21 13:46:10 2006 From: valiente at lsi.upc.edu (Gabriel Valiente) Date: Tue, 21 Feb 2006 19:46:10 +0100 (MET) Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy Message-ID: <3193394449valiente@lsi.upc.es> > you'll have to do it - I don't have time, I thought there was > something like this already, but I guess not, so please put it in. Done. I've added methods get_Lineage_Nodes and get_LCA_Node to Bio::Taxonomy::Node. > Uhm, does that return the LCA or one of the first divergent ancestors? > And what does it do if lineage($node1) is the same as lineage($node2)? Thanks, I've already taken this into account.
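The two questions quoted above (first divergent ancestor versus true LCA, and identical lineages) can be illustrated with a small self-contained sketch. This is not the code that was committed to Bio::Taxonomy::Node; ToyNode is a hypothetical stand-in class that only provides the two methods used in the thread (node_name and get_Parent_Node), and the lower-case names lineage and lca are chosen here for illustration. The sketch assumes node names are unique within the tree and that the root is the node with no parent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a taxonomy node (not a BioPerl class).
package ToyNode;
sub new {
    my ($class, $name, $parent) = @_;
    return bless { name => $name, parent => $parent }, $class;
}
sub node_name       { $_[0]{name} }
sub get_Parent_Node { $_[0]{parent} }

package main;

# Path from the root down to and including the node itself.
sub lineage {
    my ($node) = @_;
    my @path;
    while (defined $node) {
        unshift @path, $node;
        $node = $node->get_Parent_Node;
    }
    return \@path;
}

# Last node shared by both root-to-node paths, or undef if the
# two nodes do not share a root.
sub lca {
    my ($node1, $node2) = @_;
    my @path1 = @{ lineage($node1) };
    my @path2 = @{ lineage($node2) };
    my $lca;
    # Remember the last matching node instead of returning the first
    # mismatch, and stop as soon as either path is exhausted.
    while (@path1 && @path2
           && $path1[0]->node_name eq $path2[0]->node_name) {
        $lca = shift @path1;
        shift @path2;
    }
    return $lca;
}

# Toy tree: root -> Eukaryota -> Metazoa -> (Homo, Mus); root -> Bacteria
my $root = ToyNode->new('root');
my $euk  = ToyNode->new('Eukaryota', $root);
my $met  = ToyNode->new('Metazoa',   $euk);
my $homo = ToyNode->new('Homo',      $met);
my $mus  = ToyNode->new('Mus',       $met);
my $bact = ToyNode->new('Bacteria',  $root);

print lca($homo, $mus)->node_name,  "\n";   # siblings
print lca($homo, $bact)->node_name, "\n";   # only the root is shared
print lca($homo, $homo)->node_name, "\n";   # identical lineages
```

Because the lineage here includes the node itself, the LCA of a node with itself is that node, and the LCA of a node and one of its descendants is the ancestor node. Whether that convention is the desired one is a design choice.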
Cheers Gabriel From s-merchant at northwestern.edu Tue Feb 21 13:47:54 2006 From: s-merchant at northwestern.edu (Sohel Merchant) Date: Tue, 21 Feb 2006 12:47:54 -0600 Subject: [Bioperl-l] Bio::Ontology::Ontology In-Reply-To: Message-ID: <000001c63717$5314ded0$c2987ca5@pc13> Hi Hilmar and Chris, I have played around a bit using Bio::Annotation::Collection to capture the headers of an ontology file. It behaves pretty well and avoids the cycle issue which might arise by using an ontology to describe the ontology. I have an initial version of a working parser for the OBO flat file format. Chris, I was able to model any kind of relationship by using some of the functionality in the Bio::Ontology::SimpleGoEngine, which I had initially overlooked. I would like to commit this code to the Bioperl CVS, but I don't believe I have write access to it. Can I send the stuff to either of you guys? Hilmar, I would like your feedback on the code base and would be happy to make any changes required before we commit it to the CVS. Thanks, Sohel Merchant. dictyBase -----Original Message----- From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar Lapp Sent: Monday, February 20, 2006 8:53 PM To: chris mungall Cc: Bioperl; Sohel Merchant Subject: Re: [Bioperl-l] Bio::Ontology::Ontology On 2/20/06, chris mungall wrote: > > I like the idea of using an ontology to describe the ontology. > > Note that the proposed structure: > OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI > > will lead to cycles in the object graph when the metadata ontology > describes itself. Yes I know, that's why I didn't want to be too vocal about it ... > > actually, I think the ontology module already has object reference > cycles.
TermI->OntologyI->TermI > > When I brought this up originally people didn't seem to care much - so > long as you're only parsing GO then it's not a big issue, people have > enough memory they won't notice a big chunk of memory that refuses to > be garbage collected way after it's used. There is a method that destroys the cycle: $ontology->close() (this is also an interface method) Essentially, the cycle is not in OntologyI itself but in OntologyI HAS-A OntologyEngineI; i.e., the latter holds (may hold) references to terms which (may) hold a reference to an OntologyI which holds a reference to the OntologyEngineI. I say 'may' in parentheses because an implementation may use tricks like late instantiation, stringified references (handles), and weak references. It's possible to avoid the cycle altogether using such tricks but it remains questionable how much this then affects performance, and how ugly and incomprehensible the code would become. Since there is the close() method I haven't bothered yet trying a fully de-cycled implementation. > Of course, if you want to use > bioperl to cycle though all of OBO + SnoMed + UMLS then it's a > different story. Well if you want to keep all three in memory for some kind of cross-reasoning then yes you are in trouble. But if you do one ontology after another, you'd just have make sure to call close() on an ontology once you're done with it. > > I think it's best of Sohel concentrates on getting obo.pm working, then > we can start thinking as a group about the best way to capture ontology > metadata. This includes metadata on the whole ontology, and metadata on > the terms (eg synonyms). > > To what extent are the current modules already in use? I don't know about others but I use them often. > I think the object cycle is a serious flaw, will it be possible to fix this without > a major overhaul? If I recall correctly the way go-perl circumvents this is by having the ontology of a term as a flat attribute. 
This also means that when having a term alone, you cannot ask for its connected terms. It's been a while, so Chris set me straight where this is not true. It should be possible to come up with an implementation of OntologyI that for all intents and purposes behaves like a flat scalar giving the name until you call one of its graph traversal methods. At that point it would instantiate the engine from persistent storage (file, or a database connection), or retrieve one from a 'store'. The latter is I believe what Allen started with the OntologyStore, but again I would need to check the details. -hilmar > > > On Feb 11, 2006, at 9:10 PM, Hilmar Lapp wrote: > > > Sohel, please do keep the discussion on the list, in your own interest > > as there's a multitude of people who can respond to you. > > > > SimpleValue would probably be what I'd use too. As Heikki hinted you > > might even create an ontology for annotating ontologies, which would > > allow you to use Annotation::OntologyTerm for annotation, but then > > there's no qualifier value ... > > > > Bioperl 1.5.1 has been released last year, please check the website. > > > > -hilmar > > > > On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote: > > > >> Hi Hilmar, > >> I really like your suggestion of implementing the Bio::AnnotatableI > >> interface in the Bio::Ontology::Ontology class. I am going to > >> implement > >> this and play around a little with it. I am planning to use > >> Bio::Annotation::SimpleValue for annotating the header as it provides > >> a > >> good way of specifying the Tag/value pair. What are your thoughts on > >> using this? > >> > >> Also, I was wondering if you have any idea about the scheduled date > >> for the Bioperl 1.51 release. I would like to contribute some stuff in > >> the next release. > >> > >> Thanks, > >> Sohel. 
> >> > >> -----Original Message----- > >> From: Hilmar Lapp [mailto:hlapp at gmx.net] > >> Sent: Friday, February 10, 2006 3:40 PM > >> To: Sohel Merchant > >> Cc: Bioperl > >> Subject: Re: Bio::Ontology::Ontology > >> > >> Sohel, > >> > >> please allow me to copy the list in my response. There's many good and > >> insightful people on the list who may have something to add or > >> different ideas. > >> > >> I've come across that problem myself, for instance with InterPro. What > >> I've done so far simply is to stick it unstructured into the > >> definition > >> slot, which is not helpful if your purpose goes further than just > >> displaying it in an unstructured fashion. > >> > >> I'm not sure you would want to create another class for this (like > >> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the > >> implementation, probably not the interface) annotatable (i.e., > >> implement Bio::Annotatable), which supposedly would be simple to do > >> (AnnotationCollection is already implemented, you'd just return an > >> instance of it). > >> > >> Even though tag/value pairs sound like quick&fast way to go I'm > >> leaning > >> against it; in essence we're moving away from that elsewhere > >> (SeqFeatureI) and hence I don't think we should restart it here. > >> > >> I'm not giving a definitive answer here, just my (initial) thoughts. > >> Hope that helps nonetheless. Can you fancy yourself trying the > >> Annotatable approach and let us know how it goes? > >> > >> -hilmar > >> > >> > >> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote: > >> > >>> Hi Hilmar, > >>> How are you doing? I am Sohel Merchant, a programmer at dictyBase, > >>> Northwestern University. I am working on a parser for an ontology > >>> file. I really like the ontology object model which you have > >>> contributed to Bioperl. I think its just Awesome!! One of things > >>> which > >> > >>> I thought would be great to capture is the ontology headers. 
Right > >>> now > >> > >>> one can specify only the name, authority information. I was wondering > >>> if there is any way, I could also capture other ontology file headers > >>> like version of the file, date when that ontology file was made. I > >>> was > >> > >>> thinking of making a header class or alternatively it could go as > >>> Hash > >> > >>> of values in the Bio::Ontology::Ontology class itself. I wanted to > >>> know whets your thoughts about on this. > >>> > >>> Thanks, > >>> Sohel Merchant > >>> dictyBase > >>> > >> -- > >> ------------------------------------------------------------- > >> Hilmar Lapp email: lapp at gnf.org > >> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > >> ------------------------------------------------------------- > >> > >> > >> > >> > > -- > > ------------------------------------------------------------- > > Hilmar Lapp email: lapp at gnf.org > > GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > > ------------------------------------------------------------- > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From cjfields at uiuc.edu Tue Feb 21 14:25:02 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 21 Feb 2006 13:25:02 -0600 Subject: [Bioperl-l] full chromosome accesscion number mess In-Reply-To: <20060221184550.6557851b@dogwood.plantbio.uga.edu> Message-ID: <000001c6371c$83bf92a0$15327e82@pyrimidine> What is the accession you're having problems with? Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. 
of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Guojun Yang > Sent: Tuesday, February 21, 2006 12:46 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] full chromosome accesscion number mess > > Hi, everybody, > In the process of reparing my CGI script after NCBI blast output format > change, I noticed that the accession number for rice pseudochromosome is > very confusing and cause trouble for sequence retrieving. My script use > remoteblast to search for similar sequences,and then retrieve the hit > sequence with a bit flanking region from GenBank. The rice > pseudochromosomes have accession numbers similar to that of the individual > clones like AP00XXX. I do not want the sequence retrieving to involve > these accessions because it takes forever. Can anybody give some > suggestion on how to deal with it? > Thanks, > > > Guojun Yang > Department of Plant Biology > University of Georgia > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Tue Feb 21 14:31:31 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 21 Feb 2006 11:31:31 -0800 Subject: [Bioperl-l] Bio::Ontology::Ontology In-Reply-To: <000001c63717$5314ded0$c2987ca5@pc13> References: <000001c63717$5314ded0$c2987ca5@pc13> Message-ID: Send it to me. I'll review and check it in if appropriate. You should also write a test (and include it in what you send to me; see t/*.t for examples for how to write a test). 
(and obviously the test should succeed) Chris, I suppose this is the time to object - I would conceptually like the ontology-based annotation too but now we are up against a (hopefully) working implementation which can only be beaten by another working implementation, and frankly I don't have time to attempt one now. -hilmar On 2/21/06, Sohel Merchant wrote: > Hi Hilmar and Chris, > I have played around a bit using Bio::Annotation::Collection to > capture the headers of an ontology file. It behaves pretty well and > avoids the cycle issue which might arise by suing ontology to describe > the ontology. I have an initial version of a working parser for obo flat > file format. > > Chris, I was able to model any kind of relationship by using some of the > functionality in the Bio::Ontology::SimpleGoEngine which, I had > initially overlooked. > > I would like to commit this code to the Bioperl CVS, but I don't have > write access to it I believe. Can I send the stuff to either of you > guys? > > Hilmar, I would like your feedback on the code base and would be happy > to make any changes required before we commit it to the CVS. > > Thanks, > Sohel Merchant. > dictyBase > > -----Original Message----- > From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar > Lapp > Sent: Monday, February 20, 2006 8:53 PM > To: chris mungall > Cc: Bioperl; Sohel Merchant > Subject: Re: [Bioperl-l] Bio::Ontology::Ontology > > On 2/20/06, chris mungall wrote: > > > > I like the idea of using an ontology to describe the ontology. > > > > Note that the proposed structure: > > OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI > > > > will lead to cycles in the object graph when the metadata ontology > > describes itself. > > Yes I know, that's why I didn't want to be too vocal about it ... > > > > > actually, I think the ontology module already has object reference > > cycles. 
TermI->OntologyI->TermI > > > > When I brought this up originally people didn't seem to care much - so > > long as you're only parsing GO then it's not a big issue, people have > > enough memory they won't notice a big chunk of memory that refuses to > > be garbage collected way after it's used. > > There is a method that destroys the cycle: $ontology->close() > (this is also an interface method) > > Essentially, the cycle is not in OntologyI itself but in OntologyI > HAS-A OntologyEngineI; i.e., the latter holds (may hold) references to > terms which (may) hold a reference to an OntologyI which holds a > reference to the OntologyEngineI. > > I say 'may' in parentheses because an implementation may use tricks > like late instantiation, stringified references (handles), and weak > references. It's possible to avoid the cycle altogether using such > tricks but it remains questionable how much this then affects > performance, and how ugly and incomprehensible the code would become. > Since there is the close() method I haven't bothered yet trying a > fully de-cycled implementation. > > > Of course, if you want to use > > bioperl to cycle though all of OBO + SnoMed + UMLS then it's a > > different story. > > Well if you want to keep all three in memory for some kind of > cross-reasoning then yes you are in trouble. But if you do one > ontology after another, you'd just have make sure to call close() on > an ontology once you're done with it. > > > > > I think it's best of Sohel concentrates on getting obo.pm working, > then > > we can start thinking as a group about the best way to capture > ontology > > metadata. This includes metadata on the whole ontology, and metadata > on > > the terms (eg synonyms). > > > > To what extent are the current modules already in use? > > I don't know about others but I use them often. > > > I think the object cycle is a serious flaw, will it be possible to fix > this without > > a major overhaul? 
> > If I recall correctly the way go-perl circumvents this is by having > the ontology of a term as a flat attribute. This also means that when > having a term alone, you cannot ask for its connected terms. It's been > a while, so Chris set me straight where this is not true. > > It should be possible to come up with an implementation of OntologyI > that for all intents and purposes behaves like a flat scalar giving > the name until you call one of its graph traversal methods. At that > point it would instantiate the engine from persistent storage (file, > or a database connection), or retrieve one from a 'store'. The latter > is I believe what Allen started with the OntologyStore, but again I > would need to check the details. > > -hilmar > > > > > > > On Feb 11, 2006, at 9:10 PM, Hilmar Lapp wrote: > > > > > Sohel, please do keep the discussion on the list, in your own > interest > > > as there's a multitude of people who can respond to you. > > > > > > SimpleValue would probably be what I'd use too. As Heikki hinted you > > > might even create an ontology for annotating ontologies, which would > > > allow you to use Annotation::OntologyTerm for annotation, but then > > > there's no qualifier value ... > > > > > > Bioperl 1.5.1 has been released last year, please check the website. > > > > > > -hilmar > > > > > > On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote: > > > > > >> Hi Hilmar, > > >> I really like your suggestion of implementing the > Bio::AnnotatableI > > >> interface in the Bio::Ontology::Ontology class. I am going to > > >> implement > > >> this and play around a little with it. I am planning to use > > >> Bio::Annotation::SimpleValue for annotating the header as it > provides > > >> a > > >> good way of specifying the Tag/value pair. What are your thoughts > on > > >> using this? > > >> > > >> Also, I was wondering if you have any idea about the scheduled > date > > >> for the Bioperl 1.51 release. 
I would like to contribute some stuff > in > > >> the next release. > > >> > > >> Thanks, > > >> Sohel. > > >> > > >> -----Original Message----- > > >> From: Hilmar Lapp [mailto:hlapp at gmx.net] > > >> Sent: Friday, February 10, 2006 3:40 PM > > >> To: Sohel Merchant > > >> Cc: Bioperl > > >> Subject: Re: Bio::Ontology::Ontology > > >> > > >> Sohel, > > >> > > >> please allow me to copy the list in my response. There's many good > and > > >> insightful people on the list who may have something to add or > > >> different ideas. > > >> > > >> I've come across that problem myself, for instance with InterPro. > What > > >> I've done so far simply is to stick it unstructured into the > > >> definition > > >> slot, which is not helpful if your purpose goes further than just > > >> displaying it in an unstructured fashion. > > >> > > >> I'm not sure you would want to create another class for this (like > > >> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., > the > > >> implementation, probably not the interface) annotatable (i.e., > > >> implement Bio::Annotatable), which supposedly would be simple to do > > >> (AnnotationCollection is already implemented, you'd just return an > > >> instance of it). > > >> > > >> Even though tag/value pairs sound like quick&fast way to go I'm > > >> leaning > > >> against it; in essence we're moving away from that elsewhere > > >> (SeqFeatureI) and hence I don't think we should restart it here. > > >> > > >> I'm not giving a definitive answer here, just my (initial) > thoughts. > > >> Hope that helps nonetheless. Can you fancy yourself trying the > > >> Annotatable approach and let us know how it goes? > > >> > > >> -hilmar > > >> > > >> > > >> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote: > > >> > > >>> Hi Hilmar, > > >>> How are you doing? I am Sohel Merchant, a programmer at dictyBase, > > >>> Northwestern University. I am working on a parser for an ontology > > >>> file. 
I really like the ontology object model which you have > > >>> contributed to Bioperl. I think its just Awesome!! One of things > > >>> which > > >> > > >>> I thought would be great to capture is the ontology headers. Right > > >>> now > > >> > > >>> one can specify only the name, authority information. I was > wondering > > >>> if there is any way, I could also capture other ontology file > headers > > >>> like version of the file, date when that ontology file was made. I > > >>> was > > >> > > >>> thinking of making a header class or alternatively it could go as > > >>> Hash > > >> > > >>> of values in the Bio::Ontology::Ontology class itself. I wanted to > > >>> know whets your thoughts about on this. > > >>> > > >>> Thanks, > > >>> Sohel Merchant > > >>> dictyBase > > >>> > > >> -- > > >> ------------------------------------------------------------- > > >> Hilmar Lapp email: lapp at gnf.org > > >> GNF, San Diego, Ca. 92121 phone: +1-858-812-1757 > > >> ------------------------------------------------------------- > > >> > > >> > > >> > > >> > > > -- > > > ------------------------------------------------------------- > > > Hilmar Lapp email: lapp at gnf.org > > > GNF, San Diego, Ca. 
92121 phone: +1-858-812-1757 > > > ------------------------------------------------------------- > > > > > > > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > ---------------------------------------------------------- > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > ---------------------------------------------------------- > > > -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From MEC at stowers-institute.org Tue Feb 21 15:38:55 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Tue, 21 Feb 2006 14:38:55 -0600 Subject: [Bioperl-l] Pattern Density Message-ID: You might consider displaying ccgg content as a track in mouse genome browser at http://gbrowse.informatics.jax.org/cgi-bin/gbrowse/mouse_build_34 For example, the following track causes it to display 3 proportionally sized red boxes in the first 3K of mouse Chr1 [MotifContent] glyph = xyplot graph_type = boxes fgcolor = black bgcolor = red height=100 min_score=0 max_score=100 label=1 key="Motif Content" reference=Chr1 MotifContent CCGG 1..1000 score=20 MotifContent CCGG 1001..2000 score=50 MotifContent CCGG 2001..3000 score=30 There are many ways for computing the score. I myself would begin with: #!/usr/bin/env perl use strict; use Bio::SeqIO; # for reading sequence to scan use TFBS::Word::Consensus; # for the pattern matching. cf. 
http://forkhead.cgb.ki.se/TFBS/ use PDL::Basic; # if you have it installed, for the histogram binning statistics ________________________________ From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of staffa Sent: Tuesday, February 21, 2006 11:25 AM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Pattern Density Good Friends, I have an important client who wants a histogram display of the density of "ccgg" along any chromosome of the mouse genome in 1000 bp windows. I'm thinking that maybe there is a bio-perl module that could help with this. That'd probably beat having to write something from scratch. Any help that you give would be greatly appreciated. I am more concerned about the reading and analysis of the sequence than actual plotting of the histogram, but anything you can offer will be appreciated. Thank you. Nick Staffa Telephone: 919-316-4569 (NIEHS: 6-4569) Scientific Computing Support Group NIEHS Information Technology Support Services Contract (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) National Institute of Environmental Health Sciences National Institutes of Health Research Triangle Park, North Carolina From cjfields at uiuc.edu Tue Feb 21 16:15:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 21 Feb 2006 15:15:18 -0600 Subject: [Bioperl-l] bioperl maillist searches not updated Message-ID: <000801c6372b$eae00870$15327e82@pyrimidine> Seems that using Google to search through the mailing list will only get mail up to the beginning of August 2005. I went back to look up Hilmar's email on bioperl-db recently and can't find it. So I tried anything in 2006: http://www.google.com/search?hl=en&lr=&safe=off&as_qdr=all&q=site%3Abioperl.org+inurl%3Apipermail+inurl%3Abioperl-l+2006&btnG=Search And got nothin'! The Open-Bio form has some mail from 2006, but only up to 1-24-2006.
Luckily, the mailing list archives seem to be fine: Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From osborne1 at optonline.net Tue Feb 21 16:13:44 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 21 Feb 2006 16:13:44 -0500 Subject: [Bioperl-l] Pattern Density In-Reply-To: Message-ID: Nick, I was mistaken previously when I hinted that you couldn't create histograms using Bioperl: http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Graphics/Glyph/xyplot. html This could do exactly what you want. Brian O. On 2/21/06 3:38 PM, "Cook, Malcolm" wrote: > > You might consider displaying ccgg content as a track in mouse genome > browser at > http://gbrowse.informatics.jax.org/cgi-bin/gbrowse/mouse_build_34 > > For example, the following track causes it to display 3 proportionally > sized red boxes in the first 3K of mouse Chr1 > > [MotifContent] > glyph = xyplot > graph_type = boxes > fgcolor = black > bgcolor = red > height=100 > min_score=0 > max_score=100 > label=1 > key="Motif Content" > > reference=Chr1 > MotifContent CCGG 1..1000 score=20 > MotifContent CCGG 1001..2000 score=50 > MotifContent CCGG 2001..3000 score=30 > > > There are many ways for computing the score. I myself would begin with: > > #!/usr/bin/env perl > use strict; > > use Bio::SeqIO; # for reading sequence to scan > use TFBS::Word::Consensus; # for the pattern matching. cf. 
> http://forkhead.cgb.ki.se/TFBS/ > use PDL::Basic; # if you have it installed, for the histogram binning > statistics > > > > > > > ________________________________ > > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of staffa > Sent: Tuesday, February 21, 2006 11:25 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Pattern Density > > > Good Friends, > I have an important client who wants a histogram display of the > density of "ccgg" along any chromosome of the mouse genome in 1000 bp > windows. > > I'm thinking that maybe there is a bio-perl module that could > help with this. > That'd probably beat having to write something from scratch. > Any help that you give would be greatly appreciated. > I am more concerned about the reading and analysis of the > sequence than actual plotting of the histogram, but anything you can > offer will be appreciated. > > Thank you. > > Nick Staffa > Telephone: 919-316-4569 (NIEHS: 6-4569) > Scientific Computing Support Group > NIEHS Information Technology Support Services Contract > (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) > National Institute of Environmental Health Sciences > National Institutes of Health > Research Triangle Park, North Carolina > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Tue Feb 21 16:58:07 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 21 Feb 2006 15:58:07 -0600 Subject: [Bioperl-l] bioperl-db issues Message-ID: <000d01c63731$e61be1f0$15327e82@pyrimidine> Sorry about the huge delay in this response, got caught up with other things. > > Bad News: There's a new problem now. I updated from CVS yesterday; I > > walked > > through the steps and ran 'nmake test', with everything passing fine. 
> > However, load_seqdatabase.pl is extremely slow; it's loading a sequence > > every 5 minutes or so. I noticed (when using '-debug') that it is > > hanging > > up in Bio::DB::BioSQL::SpeciesAdaptor each time. If I create a > > database, > > load the biosql schema, and load sequences w/o loading taxonomy, the > > problem > > goes away. > > > > Here's the debugging output (I cut it off at the point it hangs up): > > [...] > > > preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL, > > taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE > > taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND > > ncbi_taxon_id = > > ? > > SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class) > > SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid) > > I'm a bit surprised if this is the query where it hangs. Are the > indexes all there? There should be a primary key index on > taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on taxon_name > over (taxon_id,name,name_class). Also, there should be separate indexes > on taxon_name.taxon_id and taxon_name.name. Are they all there? If you > reinstantiated the schema from the DDL then it seems unlikely that > somehow the indexes have vanished except if you messed with the schema > or the DDL. So far everything looks like you mentioned (see below for the ANALYZE stuff). The only thing that I wasn't sure about was that taxon_name indexes were all primary keys. That's really it. > Putting an index on taxon_name.name_class really can't make sense, so > let's assume it can't be that. > > So really I suspect this has something to do with the state of the > database and the version of MySQL. In particular, from some 4.x version > of MySQL under certain circumstances you have to analyze the statistics > of the tables in order to get the optimizer pick up the indexes > properly. Are you on MySQL 4.x and if so, have you done that? 
> > There's the ANALYZE TABLE command: > http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html > > Note the comment: "This statement works with MyISAM, BDB, and (as of > MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher? > > Also, you can check the execution plan for the query using EXPLAIN. > http://dev.mysql.com/doc/refman/4.1/en/explain.html > > This should show you whether the index would be picked up for the query > or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to > the db using the mysql shell (mysql). > > I believe something similarly strange was encountered by someone using > DB::GFF (or Chado) under MySQL, and if I recall correctly the solution > was to optimize (analyze) the tables. Maybe someone who was in that > thread reads this and can comment? I find it odd that it worked well back in December and doesn't work now. I updated bioperl and bioperl-db from CVS since then, so have there been any changes that may have caused this? I noticed a few changes here and there. Here's what I have tried thus far: 1) I reinstalled MySQL. I thought it might be that I had my database on a partitioned drive, so I reinstalled on the main drive. 2) I rebuilt the database from scratch, loading taxonomy fresh, loaded the schema, and got the same error when loading (hanging on SpeciesAdaptor. 
Tried ANALYZE: ------------------------------------ mysql> ANALYZE TABLE taxon; +----------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +----------------+---------+----------+----------+ | bioseqdb.taxon | analyze | status | OK | +----------------+---------+----------+----------+ 1 row in set (0.42 sec) mysql> ANALYZE TABLE taxon_name; +---------------------+---------+----------+----------+ | Table | Op | Msg_type | Msg_text | +---------------------+---------+----------+----------+ | bioseqdb.taxon_name | analyze | status | OK | +---------------------+---------+----------+----------+ 1 row in set (0.36 sec) mysql> ------------------------------------ so that's fine. 3) Using EXPLAIN table: ------------------------------------ mysql> EXPLAIN taxon; +-------------------+---------------------+------+-----+---------+---------- ------+ | Field | Type | Null | Key | Default | Extra | +-------------------+---------------------+------+-----+---------+---------- ------+ | taxon_id | int(10) unsigned | NO | PRI | NULL | auto_increment | | ncbi_taxon_id | int(10) | YES | UNI | NULL | | | parent_taxon_id | int(10) unsigned | YES | MUL | NULL | | | node_rank | varchar(32) | YES | | NULL | | | genetic_code | tinyint(3) unsigned | YES | | NULL | | | mito_genetic_code | tinyint(3) unsigned | YES | | NULL | | | left_value | int(10) unsigned | YES | UNI | NULL | | | right_value | int(10) unsigned | YES | UNI | NULL | | +-------------------+---------------------+------+-----+---------+---------- ------+ 8 rows in set (0.02 sec) mysql> EXPLAIN taxon_name; +------------+------------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +------------+------------------+------+-----+---------+-------+ | taxon_id | int(10) unsigned | NO | PRI | | | | name | varchar(255) | NO | PRI | | | | name_class | varchar(32) | NO | PRI | | | +------------+------------------+------+-----+---------+-------+ 3 rows in set (0.00 sec) 
------------------------------------ Does taxon_name need three primary keys? 4) So I tried reloading the sequences: ------------------------------------ C:\Perl\src\bioperl\bioperl-db\scripts\biosql>load_seqdatabase.pl -format genbank -dbname bioseqdb -dbuser root -dbpass ********** -testonly -safe -debug NP_249092.gpt And got this: Loading NP_249092.gpt ... attempting to load adaptor class for Bio::Seq::RichSeq attempting to load module Bio::DB::BioSQL::RichSeqAdaptor ...... SimpleValueAdaptor::add_assoc: binding column 4 to "1" (rank) SimpleValueAdaptor::add_assoc: binding column 1 to "21" (FK to Bio::SeqFeature::Generic) SimpleValueAdaptor::add_assoc: binding column 2 to "34" (FK to Bio::Annotation::SimpleValue) SimpleValueAdaptor::add_assoc: binding column 3 to "11" (value) SimpleValueAdaptor::add_assoc: binding column 4 to "1" (rank) no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager no adaptor found for class Bio::Annotation::TypeManager BioNamespaceAdaptor: binding UK column 1 to "bioperl" (namespace) SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class) SpeciesAdaptor: binding UK column 2 to "208963" (ncbi_taxid) ------------------------------------ Which is where it hangs, as before, usually about 2 minutes for each sequence. It seems there's a timeout happening in there somewhere... It definitely has something to do with the lookup, but like I said it did run much faster last Nov-Dec. 
So I'm a bit lost now. Any ideas? I may try re-optimizing tables to see if it helps any. I'm also really thinking of giving postgresql a shot but I have used mysql for a while now; I'd like to stay with it if I can. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Tue Feb 21 23:09:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 21 Feb 2006 22:09:18 -0600 Subject: [Bioperl-l] bioperl-db issues In-Reply-To: Message-ID: <000001c63765$c0472370$15327e82@pyrimidine> I got it worked out. The Windows installer had picked out lower memory settings (key buffer 10M, for instance) when I reinstalled, which drastically slowed everything down. I reset the settings for a server environment and it's fine now. Well, as fine as it will likely get since I'm running this on a 1.8 GHz P4 with 756 MB RAM, so I'm not expecting it to actually fly. It's loading at about two sequences/second. I'll have to see if I get a speed improvement when optimizing tables. I'll add this to the wiki for installing bioperl-db under Windows. Are there optimal settings for using bioperl-db, such as key buffer and sort buffer size, buffer pool size, etc? Or do you think I'm likely to run into a processor speed limit? Just trying to get a fix on how much memory I could push towards getting a smaller sequence database loaded, nothing like swissprot. I saw something in the mail list about setting max_allowed_packet and a few other settings but that was about four years ago. Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign > -----Original Message----- > From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar > Lapp > Sent: Tuesday, February 21, 2006 6:44 PM > To: Chris Fields > Cc: bioperl-l at lists.open-bio.org > Subject: Re: bioperl-db issues > > On 2/21/06, Chris Fields wrote: > > [...] 
> > I find it odd that it worked well back in December and doesn't work now. > I > > updated bioperl and bioperl-db from CVS since then, so have there been > any > > changes that may have caused this? I noticed a few changes here and > there. > > The changes were fixes to retrieve the rank on persistent annotation > objects (it was only stored before, but never retrieved). Neither the > SpeciesAdaptor nor any of the taxonomy queries was affected by this. > > > > > Here's what I have tried thus far: > > > > 1) I reinstalled MySQL. I thought it might be that I had my database on > a > > partitioned drive, so I reinstalled on the main drive. > > > > 2) I rebuilt the database from scratch, loading taxonomy fresh, loaded > the > > schema, and got the same error when loading (hanging on SpeciesAdaptor. > > Tried ANALYZE: > > ------------------------------------ > > mysql> ANALYZE TABLE taxon; > > +----------------+---------+----------+----------+ > > | Table | Op | Msg_type | Msg_text | > > +----------------+---------+----------+----------+ > > | bioseqdb.taxon | analyze | status | OK | > > +----------------+---------+----------+----------+ > > 1 row in set (0.42 sec) > > > > mysql> ANALYZE TABLE taxon_name; > > +---------------------+---------+----------+----------+ > > | Table | Op | Msg_type | Msg_text | > > +---------------------+---------+----------+----------+ > > | bioseqdb.taxon_name | analyze | status | OK | > > +---------------------+---------+----------+----------+ > > 1 row in set (0.36 sec) > > I'm not sure but you may have to analyze all tables. > > > > > mysql> > > ------------------------------------ > > so that's fine. > > > > 3) Using EXPLAIN table: > > ------------------------------------ > > mysql> EXPLAIN taxon; > > Note that you wouldn't use EXPLAIN on a table but on a query instead. > I.e., copy&paste the offending query into the mysql editor, prefix it > with EXPLAIN and then see what the results are. 
It should show whether > the indexes are being used properly. > > Most likely it doesn't use one of the idnexes that it should be using > but does a full table scan instead. The explain plan should pinpoint > that. > > BTW you can also use this to reconfirm the command line observation > about the query being slow - it should 'hang' in the mysql shell as > well. If it doesn't then there is something else going on. (if the > placeholders pose a problem replace them with the actual values as > given in the log) > > > [..] > > SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class) > > SpeciesAdaptor: binding UK column 2 to "208963" (ncbi_taxid) > > ------------------------------------ > > Which is where it hangs, as before, usually about 2 minutes for each > > sequence. > > Do you also see a SELECT CLASSIFICATION query succeeding the one above > (e.g., if you wait)? I'm asking because I'm surprised that that isn't > the one you're seeing as taking too long, because it has been reported > earlier to cause such problems with mysql. Alex Zelensky posted what > he found worked as a fix. > > -hilmar > -- > ---------------------------------------------------------- > : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : > ---------------------------------------------------------- From hlapp at gmx.net Tue Feb 21 19:43:42 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 21 Feb 2006 16:43:42 -0800 Subject: [Bioperl-l] bioperl-db issues In-Reply-To: <000d01c63731$e61be1f0$15327e82@pyrimidine> References: <000d01c63731$e61be1f0$15327e82@pyrimidine> Message-ID: On 2/21/06, Chris Fields wrote: > [...] > I find it odd that it worked well back in December and doesn't work now. I > updated bioperl and bioperl-db from CVS since then, so have there been any > changes that may have caused this? I noticed a few changes here and there. The changes were fixes to retrieve the rank on persistent annotation objects (it was only stored before, but never retrieved). 
Neither the SpeciesAdaptor nor any of the taxonomy queries was affected by this.

> Here's what I have tried thus far:
>
> 1) I reinstalled MySQL. I thought it might be that I had my database on a
> partitioned drive, so I reinstalled on the main drive.
>
> 2) I rebuilt the database from scratch, loading taxonomy fresh, loaded the
> schema, and got the same error when loading (hanging on SpeciesAdaptor).
> Tried ANALYZE:
> ------------------------------------
> mysql> ANALYZE TABLE taxon;
> +----------------+---------+----------+----------+
> | Table          | Op      | Msg_type | Msg_text |
> +----------------+---------+----------+----------+
> | bioseqdb.taxon | analyze | status   | OK       |
> +----------------+---------+----------+----------+
> 1 row in set (0.42 sec)
>
> mysql> ANALYZE TABLE taxon_name;
> +---------------------+---------+----------+----------+
> | Table               | Op      | Msg_type | Msg_text |
> +---------------------+---------+----------+----------+
> | bioseqdb.taxon_name | analyze | status   | OK       |
> +---------------------+---------+----------+----------+
> 1 row in set (0.36 sec)

I'm not sure, but you may have to analyze all tables.

>
> mysql>
> ------------------------------------
> so that's fine.
>
> 3) Using EXPLAIN table:
> ------------------------------------
> mysql> EXPLAIN taxon;

Note that you wouldn't use EXPLAIN on a table but on a query instead.
I.e., copy and paste the offending query into the mysql shell, prefix it
with EXPLAIN and then see what the results are. It should show whether
the indexes are being used properly.

Most likely it doesn't use one of the indexes that it should be using
but does a full table scan instead. The explain plan should pinpoint
that.

BTW you can also use this to reconfirm the command line observation
about the query being slow - it should 'hang' in the mysql shell as
well. If it doesn't then there is something else going on. (If the
placeholders pose a problem, replace them with the actual values as
given in the log.)

> [..]
> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class) > SpeciesAdaptor: binding UK column 2 to "208963" (ncbi_taxid) > ------------------------------------ > Which is where it hangs, as before, usually about 2 minutes for each > sequence. Do you also see a SELECT CLASSIFICATION query succeeding the one above (e.g., if you wait)? I'm asking because I'm surprised that that isn't the one you're seeing as taking too long, because it has been reported earlier to cause such problems with mysql. Alex Zelensky posted what he found worked as a fix. -hilmar -- ---------------------------------------------------------- : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net : ---------------------------------------------------------- From cjfields at uiuc.edu Wed Feb 22 00:13:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 21 Feb 2006 23:13:18 -0600 Subject: [Bioperl-l] removing sequences from a database? Message-ID: <000001c6376e$b113c170$15327e82@pyrimidine> I think this has been posed once but I couldn't find a straight answer on the mailing list; is there a way to remove sequences in a BioSQL database using bioperl-db? This is the last I heard about it: http://portal.open-bio.org/pipermail/bioperl-l/2001-November/006570.html Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Wed Feb 22 00:20:05 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 21 Feb 2006 21:20:05 -0800 Subject: [Bioperl-l] removing sequences from a database? In-Reply-To: <000001c6376e$b113c170$15327e82@pyrimidine> References: <000001c6376e$b113c170$15327e82@pyrimidine> Message-ID: This is a pretty old posting :-) Sure you can remove sequences. In fact you can remove any persistent object by calling $pobj->remove(). 
I.e., for a persistent sequence (which is what you get from the adaptors):

    $pseq->remove()

Do not forget to call commit() on the persistence adaptor or on the
persistent object itself; otherwise the operation is rolled back when
you disconnect.

BTW there are examples for objects other than the sequence object
itself (say you want to remove only the features) in the
scripts/biosql directory; some of the --mergeobjs closure examples do
this.

-hilmar

On 2/21/06, Chris Fields wrote:
> I think this has been posed once but I couldn't find a straight answer on
> the mailing list; is there a way to remove sequences in a BioSQL database
> using bioperl-db? This is the last I heard about it:
>
> http://portal.open-bio.org/pipermail/bioperl-l/2001-November/006570.html
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

--
----------------------------------------------------------
: Hilmar Lapp  -:-  San Diego, CA  -:-  hlapp at gmx dot net :
----------------------------------------------------------

From dhoworth at mrc-lmb.cam.ac.uk Wed Feb 22 05:20:10 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 22 Feb 2006 10:20:10 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <200602211326.00021.lstein@cshl.edu>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
 <200602211055.31221.lstein@cshl.edu>
 <43FB44DD.4090504@mrc-lmb.cam.ac.uk>
 <200602211326.00021.lstein@cshl.edu>
Message-ID: <43FC3ADA.4090203@mrc-lmb.cam.ac.uk>

Lincoln Stein wrote:
> Hi Dave,
>
> Well, when you are using 1-based coordinates, a line that contains 44
> intervals will have 45 ticks. If you move to 0-based coordinates, then the
> first tick will be labeled 0 and the last tick will be labeled 44. An
> alternative is to make each base dimensionless, but that becomes a problem
> when dealing with single base features, such as SNPs.
>
> These issues are why I have long advocated for interbase coordinates
> in which you number the positions between bases rather than the bases
> themselves.

I see your point but I need to work with the coordinates that the users
expect and are familiar with. (Things get much worse with PDB residue
numbering :)

> Draw me the picture of what you expect to see. I think of it this way:
>
> 1 2 3 4 5 6
> A>G>C>T>A>

I guess something went wrong with your ASCII art :(

OK, consider a 44-residue entry from SwissProt (P12239):

TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR

The first T is numbered 1 and the last R is numbered 44.

So I expect to see a line with 44 positions indicated somehow (whether
these are half-open intervals or points on the line), with the number 1
at the left end and the number 44 at the right end.

An important point is that if I then place other tracks below this one
that start at say 27 and go to say 42, representing VPTIFFLGAIAAMQFI,
they should align properly (according to whatever convention is used to
represent a residue).

For a short sequence like this it would be possible to use letters to
represent the residue but I'd like to use the same convention for longer
sequences as well and have everything be consistent.

I'm hoping Bio::Graphics will make this easy.

Thanks, Dave

From khoueiry at ibdm.univ-mrs.fr Wed Feb 22 04:12:20 2006
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Wed, 22 Feb 2006 10:12:20 +0100
Subject: [Bioperl-l] [Fwd: Re: Pattern Density]
Message-ID: <1140599541.19981.26.camel@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: not available
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060222/af675b05/attachment.pl
-------------- next part --------------
An embedded message was scrubbed...
From: khoueiry Subject: Re: [Bioperl-l] Pattern Density Date: Tue, 21 Feb 2006 19:47:54 +0100 Size: 3812 Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060222/af675b05/attachment.mht From dhoworth at mrc-lmb.cam.ac.uk Wed Feb 22 10:13:10 2006 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Wed, 22 Feb 2006 15:13:10 +0000 Subject: [Bioperl-l] Bio::Graphics off by one? In-Reply-To: <1140619014.3142.81.camel@localhost.localdomain> References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk> <200602211055.31221.lstein@cshl.edu> <43FB44DD.4090504@mrc-lmb.cam.ac.uk> <200602211326.00021.lstein@cshl.edu> <43FC3ADA.4090203@mrc-lmb.cam.ac.uk> <1140619014.3142.81.camel@localhost.localdomain> Message-ID: <43FC7F86.6060901@mrc-lmb.cam.ac.uk> Scott Cain wrote: > I don't know if this helps at all, but you could think of that 45 tick > mark as the termination, since the space between the 44th and the 45th > tick mark corresponds to your 44th residue. Yes, that's the way I do think of it and that's the way I expect everybody else to think of it. But the numbers need to match the residues in any case. ie. the numbers need to match the spaces not the tick marks, if the spaces match the residues. > I suppose it is a matter of correctly training your users :-) The important thing is to have a consistent model, then it's easy to explain to users. Cheers, Dave From lstein at cshl.edu Wed Feb 22 11:22:02 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 22 Feb 2006 11:22:02 -0500 Subject: [Bioperl-l] Bio::Graphics off by one? In-Reply-To: <43FC3ADA.4090203@mrc-lmb.cam.ac.uk> References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk> <200602211326.00021.lstein@cshl.edu> <43FC3ADA.4090203@mrc-lmb.cam.ac.uk> Message-ID: <200602221122.02707.lstein@cshl.edu> The base starts at the tickmark and extends to (but doesn't touch) the next one. 
If you are down at the resolution at which you see residue letters, then lines drawn underneath the letters will line up like this: 1 2 3 4 5 6 7 8 9 10 ticks T S N T P N Q E P residues ========= =========== domains Right? Lincoln On Wednesday 22 February 2006 05:20, Dave Howorth wrote: > Lincoln Stein wrote: > > Hi Dave, > > > > Well, when you are using 1-based coordinates, an line that contains 44 > > intervals will have 45 ticks. If you move to 0-based coordinates, then > > the first tick will be labeled 0 and the last tick will be labeled 44. An > > alternative is to make each base dimensionless, but that becomes a > > problem when dealing with single base features, such as SNPs. > > > > These issues are why I have long advocated for interbase coordinates > > in which you number the positions between bases rather than the bases > > themselves. > > I see your point but I need to work with the coordinates that the users > expect and are familiar with. (Things get much worse with PDB residue > numbering :) > > > Draw me the picture of what you expect to see. I think of it this way: > > > > 1 2 3 4 5 6 > > A>G>C>T>A> > > I guess something went wrong with your ASCII art :( > > OK, consider a 44-residue entry from SwissProt (P12239): > > TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR > > The first T is numbered 1 and the last R is numbered 44. > > So I expect to see a line with 44 positions indicated somehow (whether > these are half-open intervals or points on the line), with the number 1 > at the left end and the number 44 at the right end. > > An important point is that if I then place other tracks below this one > that start at say 27 and go to say 42, representing VPTIFFLGAIAAMQFI, > they should align properly (according to whatever convention is used to > represent a residue). 
> > For a short sequence like this it would be possible to use letters to > represent the residue but I'd like to use the same convention for longer > sequences as well and have everything be consistent. > > I'm hoping Bio:Graphics will make this easy. > > Thanks, Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu From dhoworth at mrc-lmb.cam.ac.uk Wed Feb 22 11:34:08 2006 From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth) Date: Wed, 22 Feb 2006 16:34:08 +0000 Subject: [Bioperl-l] Bio::Graphics off by one? In-Reply-To: <200602221122.02707.lstein@cshl.edu> References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk> <200602211326.00021.lstein@cshl.edu> <43FC3ADA.4090203@mrc-lmb.cam.ac.uk> <200602221122.02707.lstein@cshl.edu> Message-ID: <43FC9280.1020008@mrc-lmb.cam.ac.uk> Lincoln Stein wrote: > The base starts at the tickmark and extends to (but doesn't touch) the next > one. If you are down at the resolution at which you see residue letters, then > lines drawn underneath the letters will line up like this: > > 1 2 3 4 5 6 7 8 9 10 ticks > T S N T P N Q E P residues > ========= =========== domains > > Right? Yes. What's your point? Dave From cain at cshl.edu Wed Feb 22 11:29:21 2006 From: cain at cshl.edu (Scott Cain) Date: Wed, 22 Feb 2006 11:29:21 -0500 Subject: [Bioperl-l] Bio::Graphics off by one? 
In-Reply-To: <43FC7F86.6060901@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
 <200602211055.31221.lstein@cshl.edu>
 <43FB44DD.4090504@mrc-lmb.cam.ac.uk>
 <200602211326.00021.lstein@cshl.edu>
 <43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
 <1140619014.3142.81.camel@localhost.localdomain>
 <43FC7F86.6060901@mrc-lmb.cam.ac.uk>
Message-ID: <1140625762.3142.107.camel@localhost.localdomain>

Hi Dave,

I took the example code you posted a few days ago and added a few
motifs, one that goes from 1 to 10, and one that goes from 35 to 44 (the
last residue), which results in the attached graphic.

As Lincoln pointed out, the features are drawn from the beginning (1 and
35), and through the last residue (up to but not touching 11 and 45).
So the space between 35 and 36 corresponds to residue 35.

That's the way it works.

Scott

On Wed, 2006-02-22 at 15:13 +0000, Dave Howorth wrote:
> Scott Cain wrote:
> > I don't know if this helps at all, but you could think of that 45 tick
> > mark as the termination, since the space between the 44th and the 45th
> > tick mark corresponds to your 44th residue.
>
> Yes, that's the way I do think of it and that's the way I expect
> everybody else to think of it.
>
> But the numbers need to match the residues in any case. ie. the numbers
> need to match the spaces not the tick marks, if the spaces match the
> residues.
>
> > I suppose it is a matter of correctly training your users :-)
>
> The important thing is to have a consistent model, then it's easy to
> explain to users.
>
> Cheers, Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: motifs.png
Type: image/png
Size: 1879 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060222/f3263327/attachment.png

From dhoworth at mrc-lmb.cam.ac.uk Wed Feb 22 11:45:00 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 22 Feb 2006 16:45:00 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <1140625762.3142.107.camel@localhost.localdomain>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
 <200602211055.31221.lstein@cshl.edu>
 <43FB44DD.4090504@mrc-lmb.cam.ac.uk>
 <200602211326.00021.lstein@cshl.edu>
 <43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
 <1140619014.3142.81.camel@localhost.localdomain>
 <43FC7F86.6060901@mrc-lmb.cam.ac.uk>
 <1140625762.3142.107.camel@localhost.localdomain>
Message-ID: <43FC950C.7080007@mrc-lmb.cam.ac.uk>

Scott Cain wrote:
> Hi Dave,
>
> I took the example code you posted a few days ago and added a few
> motifs, one that goes from 1 to 10, and one that goes from 35 to 44 (the
> last residue), which results in the attached graphic.

Yes, that's the same sort of graphic I'm getting.

> As Lincoln pointed out, the features are drawn from the beginning (1 and
> 35), and through the last residue (up to but not touching 11 and 45).
> So the space between 35 and 36 corresponds to residue 35.

But there is no residue 45! So there should be no number 45 anywhere on
the picture. I think the problem is that the tick strip is displaying
numbers for the ticks instead of the intervals. The intervals are what
corresponds to users' models of physical reality and my graphics need to
match that.

> That's the way it works.

I guess I'll have to experiment and patch until it does what I want
then, if nobody knows how to do it.
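
To pin down the convention I mean, here is a minimal sketch in plain
Perl. It does not call Bio::Graphics at all, and the helper names
ruler_line and feature_line are made up for illustration; I'm assuming
one text column per residue and 1-based numbering. The numbers label
residues (intervals), not tick marks, so a 44-residue sequence never
shows a 45:

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helpers, NOT part of Bio::Graphics. Assumption: one text
# column per residue, 1-based, numbers label residues rather than ticks.

# Ruler for residues 1..$len: '.' per residue, '|' on every 10th residue,
# with "1" written over the first column and $len right-aligned at the end.
sub ruler_line {
    my ($len) = @_;
    my $line = join '', map { $_ % 10 == 0 ? '|' : '.' } 1 .. $len;
    substr($line, 0, 1) = '1';
    my $last = "$len";
    substr($line, $len - length($last), length($last)) = $last;
    return $line;
}

# Feature covering residues $start..$end inclusive, padded to $len columns.
sub feature_line {
    my ($len, $start, $end) = @_;
    my $left  = ' ' x ($start - 1);
    my $bar   = '=' x ($end - $start + 1);
    my $right = ' ' x ($len - $end);
    return $left . $bar . $right;
}

my $seq = 'TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR';   # P12239, 44 residues
my $len = length $seq;

print "$seq\n";
print ruler_line($len), "\n";
print feature_line($len, 27, 42), "\n";    # track for residues 27..42

# The residues under the feature, using the same 1-based inclusive range.
print substr($seq, 27 - 1, 42 - 27 + 1), "\n";
```

With this layout a track from 27 to 42 sits directly under
VPTIFFLGAIAAMQFI, which is the alignment property I need from the real
graphics.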
Cheers, Dave From iamvela at yahoo.com Wed Feb 22 12:21:59 2006 From: iamvela at yahoo.com (Raghunath Verabelli) Date: Wed, 22 Feb 2006 09:21:59 -0800 (PST) Subject: [Bioperl-l] Blast returns result, but does not return hits Message-ID: <20060222172159.73370.qmail@web34404.mail.mud.yahoo.com> Hi All: I am new to Perl/BioPerl world. I am debugging a program that used to work fine before. Blast works fine and returns results, but I am unale to get any hits from the results. Here is the relevant code: $blastObj = new Bio::SearchIO (-file=>$resultsFile, -format=>'blast'); while (my $result = $blastObj->next_result()) { while (my $bioPerlHit = $result->next_hit()) { ....... The first while condition returns true, but the second while condition returns false. So looks like there is some result, but it is unable to identify the hits in the result. I printed the $result (pasted below). Any ideas/comments to resolve this? Thanks in advance. I am using Perl 5.8.7, BioPerl 1.2.3, Apache 1.3.34 on Windows XP platform. Like I said before, this application was running fine on a different windows machine with similar environment,so looks like there is some change in the products/versions that is causing the problem. thanks again, Raghu Blast result (i can send complete result if you need it):

BLASTP 2.2.13 [Nov-27-2005]
Reference: Altschul, Stephen F., Thomas L. Madden,
Alejandro A. Schäffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman 
(1997), "Gapped BLAST and PSI-BLAST: a new generation
of 
protein database search programs", Nucleic Acids Res.
25:3389-3402.

RID: 1140573059-19990-140117828872.BLASTQ1


Database: All non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF excluding
environmental samples
           3,297,000 sequences; 1,129,354,045 total
letters
Query=  
Length=360


                                                      
            Score     E
Sequences producing significant alignments:           
            (Bits)  Value

ref|XP_534770.2|  PREDICTED: similar to
Mitogen-activated prot...   739    0.0   
gb|AAX36107.1|  mitogen-activated protein kinase 1
[synthetic con   739    0.0   
pdb|1WZY|A  Chain A, Crystal Structure Of Human Erk2
Complexed...   739    0.0   
pdb|1TVO|A  Chain A, The Structure Of Erk2 In Complex
With A S...   739    0.0   
ref|NP_786987.1|  mitogen-activated protein kinase 1
[Bos taur...   739    0.0   
emb|CAA77752.1|  41kD protein kinase [Homo sapiens]
>prf||1813...   738    0.0   
gb|AAQ02541.1|  mitogen-activated protein kinase 1
[synthetic con   736    0.0   
gb|AAH99905.1|  Mitogen-activated protein kinase 1
[Homo sapiens]   735    0.0   
emb|CAI29602.1|  hypothetical protein [Pongo pygmaeus]
             734    0.0   
gb|AAH58258.1|  Mitogen activated protein kinase 1
[Mus muscul...   731    0.0   
pdb|4ERK|   The Complex Structure Of The Map Kinase
Erk2OLOMOU...   731    0.0   
pdb|1GOL|   Coordinates Of Rat Map Kinase Erk2 With An
Arginin...   730    0.0   
ref|XP_860750.1|  PREDICTED: similar to
Mitogen-activated prot...   729    0.0   
gb|AAK56503.1|  extracellular signal-regulated kinase
2 [Gallu...   726    0.0   
ref|XP_860716.1|  PREDICTED: similar to
Mitogen-activated prot...   726    0.0   
pdb|2ERK|   Phosphorylated Map Kinase Erk2            
             726    0.0   
pdb|1PME|   Structure Of Penta Mutant Human Erk2 Map
Kinase Co...   725    0.0   
ref|XP_860682.1|  PREDICTED: similar to
Mitogen-activated prot...   720    0.0   
ref|XP_860651.1|  PREDICTED: similar to
Mitogen-activated prot...   720    0.0   
emb|CAA77753.1|  40kDa protein kinase [Homo sapiens]
>prf||181...   717    0.0   
ref|NP_001017127.1|  mitogen-activated protein kinase
1 [Xenopus    715    0.0   
dbj|BAE28679.1|  unnamed protein product [Mus
musculus]             713    0.0   
emb|CAA42482.1|  MAP kinase [Xenopus laevis]
>gb|AAH60748.1| M...   711    0.0   
sp|P26696|MK01_XENLA  Mitogen-activated protein kinase
1 (Myel...   711    0.0   
gb|AAH76730.1|  Xp42 protein [Xenopus laevis]         
             706    0.0   
gb|AAH65868.1|  Mitogen-activated protein kinase 1
[Danio rerio]    696    0.0   
dbj|BAD23843.1|  extracellular signal regulated
protein kinase...   694    0.0   
ref|NP_878308.2|  mitogen-activated protein kinase 1
[Danio re...   694    0.0   
emb|CAG07778.1|  unnamed protein product [Tetraodon
nigroviridis]   692    0.0   
dbj|BAB11813.1|  ERK2 [Danio rerio]                   
             689    0.0   
gb|AAY57805.1|  extracellular signal-regulated kinase
2 [Danio re   687    0.0   
gb|AAH45505.1|  Mitogen-activated protein kinase 3
[Danio reri...   654    0.0   
dbj|BAB11812.1|  ERK1 [Danio rerio]                   
             654    0.0   
ref|XP_609884.2|  PREDICTED: similar to mitogen
activated prot...   653    0.0   
dbj|BAD23842.1|  extracellular signal regulated
protein kinase...   650    0.0   
gb|AAH29712.1|  Mitogen activated protein kinase 3
[Mus muscul...   644    0.0   
ref|XP_885698.1|  PREDICTED: similar to mitogen
activated prot...   644    0.0   
gb|AAA20009.1|  microtubule-associated protein-2
kinase             643    0.0   
emb|CAA46318.1|  MAP kinase [Rattus norvegicus]
>ref|NP_059043...   641    0.0   
gb|AAH13992.1|  Mitogen-activated protein kinase 3
[Homo sapie...   641    0.0   
gb|AAQ02422.1|  mitogen-activated protein kinase 3
[synthetic ...   641    0.0   
gb|AAA41123.1|  extracellular signal-regulated kinase
1             640    0.0   
ref|XP_854045.1|  PREDICTED: similar to mitogen
activated prot...   640    0.0   
gb|AAA63486.1|  extracellular-signal-regulated kinase
1 [Rattus n   640    0.0   
emb|CAG02655.1|  unnamed protein product [Tetraodon
nigroviridis]   640    0.0   
emb|CAA42744.1|  protein serine/threonine kinase [Homo
sapiens...   639    0.0   
gb|AAA36142.1|  kinase 1                              
             639    0.0   
emb|CAA77754.1|  44kDa protein kinase [Homo sapiens]
>prf||181...   639    0.0   
ref|XP_885840.1|  PREDICTED: similar to mitogen
activated prot...   632    5e-180
ref|XP_885818.1|  PREDICTED: similar to mitogen
activated prot...   630    3e-179
ref|XP_860621.1|  PREDICTED: similar to
Mitogen-activated prot...   627    2e-178
gb|AAF71666.1|  extracellular signal-regulated kinase
1b [Rattus    627    2e-178
ref|XP_393029.1|  PREDICTED: similar to MAP kinase
[Apis mellifer   621    1e-176
gb|AAA83210.1|  MAP kinase                            
             619    4e-176
dbj|BAE46741.1|  Extracellular regulated MAP kinase
[Bombyx mori]   618    1e-175
gb|AAH13754.1|  Mapk3 protein [Mus musculus]          
             612    9e-174
dbj|BAE06412.1|  mitogen-activated protein kinase
[Ciona intestin   607    2e-172
dbj|BAE33167.1|  unnamed protein product [Mus
musculus]             600    3e-170
gb|AAN46679.1|  MAP kinase [Strongylocentrotus
purpuratus] >re...   598    1e-169
dbj|BAC02940.1|  mitogen-activated protein kinase
[Halocynthia ro   592    6e-168
gb|AAL48618.1|  RE08694p [Drosophila melanogaster]
>gb|EAA4631...   590    2e-167
emb|CAD97888.1|  hypothetical protein [Homo sapiens]  
             589    5e-167
emb|CAD60453.1|  extracellular signal-regulated
protein kinase...   589    5e-167
emb|CAD56894.1|  mitogen-activated protein kinase 1
[Meloidogyne    589    6e-167
ref|XP_536917.2|  PREDICTED: similar to mitogen
activated prot...   588    1e-166
gb|AAN40736.1|  mitogen-activated protein kinase
[Paralichthys ol   586    4e-166
emb|CAE73725.1|  Hypothetical protein CBG21247
[Caenorhabditis br   583    3e-165
emb|CAA87057.1|  Hypothetical protein F43C1.2a
[Caenorhabditis...   581    2e-164
gb|AAA18956.1|  Sur-1 MAP kinase                      
             581    2e-164
emb|CAB60996.1|  Hypothetical protein F43C1.2b
[Caenorhabditis...   581    2e-164
gb|AAK52329.1|  extracellular signal-related kinase 1b
[Homo sapi   580    4e-164
ref|XP_885794.1|  PREDICTED: similar to mitogen
activated prot...   553    4e-156
ref|XP_868146.1|  PREDICTED: similar to mitogen
activated prot...   548    2e-154
gb|AAK52330.1|  extracellular signal-related kinase 1c
[Homo sapi   546    4e-154
dbj|BAA22620.1|  ERK2 [Mus musculus]                  
             544    2e-153
ref|XP_510921.1|  PREDICTED: mitogen-activated protein
kinase 3 [   529    8e-149
gb|AAT02418.1|  MAP kinase [Schistosoma japonicum]    
             496    7e-139
emb|CAJ44437.1|  MAP kinase [Echinococcus
multilocularis]           491    1e-137
ref|XP_885774.1|  PREDICTED: similar to mitogen
activated prot...   444    3e-123
gb|EAA14714.3|  ENSANGP00000016639 [Anopheles gambiae
str. PES...   431    2e-119
gb|AAZ38881.1|  extracellular regulated kinase
[Littorina littore   431    2e-119
emb|CAD60723.1|  unnamed protein product [Podospora
anserina]       411    2e-113
gb|AAK25816.1|  MAP kinase [Neurospora crassa]
>ref|XP_959713....   411    2e-113
gb|EAL89122.1|  MAP kinase (FUS3/KSS1), putative
[Aspergillus ...   409    1e-112
gb|EAA74589.1|  hypothetical protein FG06385.1
[Gibberella zea...   409    1e-112
ref|XP_504312.1|  hypothetical protein [Yarrowia
lipolytica] >...   408    2e-112
gb|AAG01162.1|  mitogen-activated protein kinase
[Fusarium oxy...   408    2e-112
gb|AAS20192.1|  AMK1 [Alternaria brassicicola]
>gb|AAK52840.1|...   408    2e-112
dbj|BAE57584.1|  unnamed protein product [Aspergillus
oryzae]       408    2e-112
dbj|BAD42855.1|  mitogen-activated protein kinase
[Bipolaris oryz   407    3e-112
gb|AAD50496.1|  mitogen activated protein kinase
[Colletotrichum    407    3e-112
gb|AAF05913.1|  mitogen-activated protein kinase
[Cochliobolus he   407    3e-112
gb|AAM89501.1|  mitogen-activated protein kinase
[Leptosphaeria m   407    3e-112
dbj|BAB21569.1|  mitogen-activated protein kinase
[Glomerella cin   407    3e-112
gb|AAB72017.1|  mitogen-activated protein kinase
[Nectria haem...   407    3e-112
emb|CAC36428.1|  mitogen activated protein kinase
[Gibberella fuj   406    6e-112
ref|XP_364720.1|  hypothetical protein MG09565.4
[Magnaporthe gri   406    6e-112
gb|AAG23132.1|  MAP kinase [Botryotinia fuckeliana]   
             406    6e-112
gb|AAO63561.1|  mitogen activated protein kinase
[Verticillium fu   406    8e-112
dbj|BAE53432.1|  MAP kinase Pmk1 [Hypocrea lixii]     
             405    1e-111

ALIGNMENTS
>ref|XP_534770.2| PREDICTED: similar to
Mitogen-activated protein kinase 1 (Extracellular 
signal-regulated kinase 2) (ERK-2) (Mitogen-activated 
protein kinase 2) (MAP kinase 2) (MAPK 2) (p42-MAPK)
(ERT1) 
isoform 1 [Canis familiaris]
 ref|NP_620407.1| mitogen-activated protein kinase 1
[Homo sapiens]
 ref|NP_002736.3| mitogen-activated protein kinase 1
[Homo sapiens]
 gb|AAH17832.1| Mitogen-activated protein kinase 1
[Homo sapiens]
 sp|P28482|MK01_HUMAN Mitogen-activated protein kinase
1 (Extracellular signal-regulated 
kinase 2) (ERK-2) (Mitogen-activated protein kinase 2)

(MAP kinase 2) (MAPK 2) (p42-MAPK) (ERT1)
 gb|AAA58459.1| protein kinase 2
Length=360

 Score =  739 bits (1909),  Expect = 0.0
 Identities = 360/360 (100%), Positives = 360/360
(100%), Gaps = 0/360 (0%)

Query  1   
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
 60
           
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
Sbjct  1   
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
 60

Query  61  
HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
 120
           
HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
Sbjct  61  
HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
 120

Query  121 
LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
 180
           
LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
Sbjct  121 
LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
 180

Query  181 
TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
 240
           
TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
Sbjct  181 
TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
 240

Query  241 
LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
 300
           
LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
Sbjct  241 
LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
 300

Query  301 
RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
 360
           
RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
Sbjct  301 
RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
 360


>gb|AAX36107.1| mitogen-activated protein kinase 1
[synthetic construct]
Length=361

 Score =  739 bits (1909),  Expect = 0.0
 Identities = 360/360 (100%), Positives = 360/360
(100%), Gaps = 0/360 (0%)

Query  1   
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
 60
           
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
Sbjct  1   
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
 60

Query  61  
HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
 120
           
HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
Sbjct  61  
HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
 120

Query  121 
LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
 180
           
LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
Sbjct  121 
LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
 180

Query  181 
TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
 240
           
TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
Sbjct  181 
TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
 240

Query  241 
LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
 300
           
LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
Sbjct  241 
LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
 300

Query  301 
RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
 360
           
RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
Sbjct  301 
RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
 360


>pdb|1WZY|A Chain A, Crystal Structure Of Human Erk2
Complexed With A Pyrazolopyridazine 
Derivative
Length=368

 Score =  739 bits (1909),  Expect = 0.0
 Identities = 360/360 (100%), Positives = 360/360
(100%), Gaps = 0/360 (0%)

Query  1   
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
 60
           
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
Sbjct  9   
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
 68

Query  61  
HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
 120
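
One way to confirm that a saved report really lists hits, independently of any parser, is to count the hit headers by hand. The following is only a minimal sketch: `count_alignment_hits` is a hypothetical helper, not part of BioPerl, and it assumes (as in the report pasted above) that each hit in the ALIGNMENTS section starts with `>` in the first column.

```perl
use strict;
use warnings;

# Hypothetical helper, independent of BioPerl: count hit headers in the
# ALIGNMENTS section of a plain-text BLAST report.
sub count_alignment_hits {
    my ($report_text) = @_;
    my $in_alignments = 0;
    my $hits          = 0;
    for my $line (split /\n/, $report_text) {
        if ($line =~ /^ALIGNMENTS/) {
            $in_alignments = 1;
            next;
        }
        # Wrapped summary lines may also start with '>' (e.g. ">prf||..."),
        # so only count '>' lines once the ALIGNMENTS section has begun.
        $hits++ if $in_alignments && $line =~ /^>/;
    }
    return $hits;
}

# Tiny made-up excerpt in the same layout as the report above.
my $sample = join "\n",
    'Sequences producing significant alignments:',
    'emb|CAA77752.1|  41kD protein kinase [Homo sapiens]',
    '>prf||1813...   738    0.0',
    '',
    'ALIGNMENTS',
    '>ref|XP_534770.2| PREDICTED: similar to',
    'Length=360',
    '',
    '>gb|AAX36107.1| mitogen-activated protein kinase 1',
    'Length=361';

print count_alignment_hits($sample), "\n";    # prints 2
```

If this count is greater than zero while next_hit() returns nothing, the report itself is fine and the parser is the more likely suspect.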






From lstein at cshl.edu  Wed Feb 22 13:23:09 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 22 Feb 2006 13:23:09 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC950C.7080007@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<1140625762.3142.107.camel@localhost.localdomain>
	<43FC950C.7080007@mrc-lmb.cam.ac.uk>
Message-ID: <200602221323.09872.lstein@cshl.edu>

Hi Dave,

If you want to adjust the way that the arrow.pm module draws the ticks, please 
make it a user-configurable option with the default being the current method. 
It should be easy enough to do this -- you just offset the position of the 
labels by 0.5 interval and inhibit drawing of the last one.
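
The label arithmetic can be sketched roughly as follows. This is not the arrow.pm code: `tick_labels` and its arguments are hypothetical names chosen for illustration, and it assumes evenly spaced ticks running from the first residue to one past the last.

```perl
use strict;
use warnings;

# Hypothetical helper (not from arrow.pm): return [label, x_position] pairs
# for ticks at $start, $start + $step, ..., $stop.
sub tick_labels {
    my ($start, $stop, $step, $label_intervals) = @_;
    my @labels;
    for (my $tick = $start; $tick <= $stop; $tick += $step) {
        if ($label_intervals) {
            # No interval follows the last tick, so it gets no label.
            last if $tick + $step > $stop;
            # Shift the label half an interval so it sits over the space.
            push @labels, [$tick, $tick + 0.5 * $step];
        }
        else {
            # Current behaviour: the label sits on the tick itself.
            push @labels, [$tick, $tick];
        }
    }
    return @labels;
}

# A 44-residue sequence has ticks at 1..45.
my @on_ticks     = tick_labels(1, 45, 1, 0);
my @on_intervals = tick_labels(1, 45, 1, 1);

printf "ticks:     %d labels, last is %s at x=%s\n",
    scalar @on_ticks, @{ $on_ticks[-1] };
printf "intervals: %d labels, last is %s at x=%s\n",
    scalar @on_intervals, @{ $on_intervals[-1] };
```

In interval mode the last label is 44, centred at 44.5, and no 45 appears, which is the behaviour Dave asked for.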

Lincoln

On Wednesday 22 February 2006 11:45, Dave Howorth wrote:
> Scott Cain wrote:
> > Hi Dave,
> >
> > I took the example code you posted a few days ago and added a few
> > motifs, one that goes from 1 to 10, and one that goes from 35 to 44 (the
> > last residue), which results in the attached graphic.
>
> Yes, that's the same sort of graphic I'm getting.
>
> > As Lincoln pointed out, the features are drawn from the beginning (1 and
> > 35), and through the last residue (up to but not touching 11 and 45).
> > So the space between 35 and 36 corresponds to residue 35.
>
> But there is no residue 45!  So there should be no number 45 anywhere on
> the picture.
>
> I think the problem is that the tick strip is displaying numbers for the
> ticks instead of the intervals. The intervals are what corresponds to
> users' models of physical reality and my graphics need to match that.
>
>  > That's the way it works.
>
> I guess I'll have to experiment and patch until it does what I want
> then, if nobody knows how to do it.
>
> Cheers, Dave

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From lstein at cshl.edu  Wed Feb 22 13:40:27 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 22 Feb 2006 13:40:27 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC950C.7080007@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<1140625762.3142.107.camel@localhost.localdomain>
	<43FC950C.7080007@mrc-lmb.cam.ac.uk>
Message-ID: <200602221340.28573.lstein@cshl.edu>

I have just committed a version of the arrow.pm glyph that has a 
-label_intervals flag.

Lincoln

On Wednesday 22 February 2006 11:45, Dave Howorth wrote:
> Scott Cain wrote:
> > Hi Dave,
> >
> > I took the example code you posted a few days ago and added a few
> > motifs, one that goes from 1 to 10, and one that goes from 35 to 44 (the
> > last residue), which results in the attached graphic.
>
> Yes, that's the same sort of graphic I'm getting.
>
> > As Lincoln pointed out, the features are drawn from the beginning (1 and
> > 35), and through the last residue (up to but not touching 11 and 45).
> > So the space between 35 and 36 corresponds to residue 35.
>
> But there is no residue 45!  So there should be no number 45 anywhere on
> the picture.
>
> I think the problem is that the tick strip is displaying numbers for the
> ticks instead of the intervals. The intervals are what corresponds to
> users' models of physical reality and my graphics need to match that.
>
>  > That's the way it works.
>
> I guess I'll have to experiment and patch until it does what I want
> then, if nobody knows how to do it.
>
> Cheers, Dave

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From cjfields at uiuc.edu  Wed Feb 22 14:45:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 22 Feb 2006 13:45:54 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060222172159.73370.qmail@web34404.mail.mud.yahoo.com>
Message-ID: <000c01c637e8$980c6f90$15327e82@pyrimidine>

Upgrade bioperl from CVS using nmake. 

Installation instructions for using nmake:

http://www.bioperl.org/wiki/INSTALL.WIN#Beyond_the_Core

You can download a tarball using anonymous CVS (link at bottom):

http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/

or use CVS directly:

http://www.bioperl.org/wiki/Using_CVS

Then make sure to grab the latest SearchIO::blast bugfix, which is not in CVS
yet:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

Replace the blast.pm in \site\lib\Bio\SearchIO in your Perl directory.
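
With more than one Perl or BioPerl install on a machine, it may help to check which copy of a module Perl actually loads before and after replacing the file. A minimal sketch follows; `module_location` is a hypothetical helper, and the default of File::Spec is only there so the script runs without BioPerl installed.

```perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and return the file Perl used,
# or undef if the module cannot be loaded.
sub module_location {
    my ($module) = @_;
    # Bio::SearchIO::blast -> Bio/SearchIO/blast.pm (the key used in %INC)
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Check the modules named on the command line, or a core module by default.
for my $module (@ARGV ? @ARGV : ('File::Spec')) {
    my $path = module_location($module);
    print "$module => ", (defined $path ? $path : 'not found'), "\n";
}
```

Saved under any name and run with Bio::SearchIO::blast as its argument, it prints the path of the blast.pm that would need replacing.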

Does that fix it?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Raghunath Verabelli
> Sent: Wednesday, February 22, 2006 11:22 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Blast returns result, but does not return hits
> 
> Hi All:
> 
> I am new to Perl/BioPerl world.
> 
> I am debugging a program that used to work fine
> before.
> Blast works fine and returns results, but I am unable
> to get any hits from the results.
> 
> Here is the relevant code:
> 
> $blastObj = new Bio::SearchIO (-file=>$resultsFile,
> -format=>'blast');
>   while (my $result = $blastObj->next_result()) {
>      while (my $bioPerlHit = $result->next_hit()) {
>          .......
> 
> 
> The first while condition returns true, but the second
> while condition returns false. So looks like there is
> some result, but it is unable to identify the hits in
> the result. I printed the $result (pasted below).
> 
> Any ideas/comments to resolve this? Thanks in advance.
> 
> I am using Perl 5.8.7, BioPerl 1.2.3, Apache 1.3.34 on
> Windows XP platform.
> 
> Like I said before, this application was running fine
> on a different windows machine with similar
> environment,so looks like there is some change in the
> products/versions that is causing the problem.
> 
> thanks again,
> Raghu
> 
> 
> 
> 
> Blast result (i can send complete result if you need
> it):
> 
> 

> BLASTP 2.2.13 [Nov-27-2005]
> Reference: Altschul, Stephen F., Thomas L. Madden,
> Alejandro A. Schäffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> Lipman
> (1997), "Gapped BLAST and PSI-BLAST: a new generation
> of
> protein database search programs", Nucleic Acids Res.
> 25:3389-3402.
> 
> RID: 1140573059-19990-140117828872.BLASTQ1
> 
> 
> Database: All non-redundant GenBank CDS
> translations+PDB+SwissProt+PIR+PRF excluding
> environmental samples
>            3,297,000 sequences; 1,129,354,045 total
> letters
> Query=
> Length=360
> 
> 
> 
>             Score     E
> Sequences producing significant alignments:
>             (Bits)  Value
> 
> ref|XP_534770.2|  PREDICTED: similar to
> Mitogen-activated prot...   739    0.0
> gb|AAX36107.1|  mitogen-activated protein kinase 1
> [synthetic con   739    0.0
> pdb|1WZY|A  Chain A, Crystal Structure Of Human Erk2
> Complexed...   739    0.0
> pdb|1TVO|A  Chain A, The Structure Of Erk2 In Complex
> With A S...   739    0.0
> ref|NP_786987.1|  mitogen-activated protein kinase 1
> [Bos taur...   739    0.0
> emb|CAA77752.1|  41kD protein kinase [Homo sapiens]
> >prf||1813...   738    0.0
> gb|AAQ02541.1|  mitogen-activated protein kinase 1
> [synthetic con   736    0.0
> gb|AAH99905.1|  Mitogen-activated protein kinase 1
> [Homo sapiens]   735    0.0
> emb|CAI29602.1|  hypothetical protein [Pongo pygmaeus]
>              734    0.0
> gb|AAH58258.1|  Mitogen activated protein kinase 1
> [Mus muscul...   731    0.0
> pdb|4ERK|   The Complex Structure Of The Map Kinase
> Erk2OLOMOU...   731    0.0
> pdb|1GOL|   Coordinates Of Rat Map Kinase Erk2 With An
> Arginin...   730    0.0
> ref|XP_860750.1|  PREDICTED: similar to
> Mitogen-activated prot...   729    0.0
> gb|AAK56503.1|  extracellular signal-regulated kinase
> 2 [Gallu...   726    0.0
> ref|XP_860716.1|  PREDICTED: similar to
> Mitogen-activated prot...   726    0.0
> pdb|2ERK|   Phosphorylated Map Kinase Erk2
>              726    0.0
> pdb|1PME|   Structure Of Penta Mutant Human Erk2 Map
> Kinase Co...   725    0.0
> ref|XP_860682.1|  PREDICTED: similar to
> Mitogen-activated prot...   720    0.0
> ref|XP_860651.1|  PREDICTED: similar to
> Mitogen-activated prot...   720    0.0
> emb|CAA77753.1|  40kDa protein kinase [Homo sapiens]
> >prf||181...   717    0.0
> ref|NP_001017127.1|  mitogen-activated protein kinase
> 1 [Xenopus    715    0.0
> dbj|BAE28679.1|  unnamed protein product [Mus
> musculus]             713    0.0
> emb|CAA42482.1|  MAP kinase [Xenopus laevis]
> >gb|AAH60748.1| M...   711    0.0
> sp|P26696|MK01_XENLA  Mitogen-activated protein kinase
> 1 (Myel...   711    0.0
> gb|AAH76730.1|  Xp42 protein [Xenopus laevis]
>              706    0.0
> gb|AAH65868.1|  Mitogen-activated protein kinase 1
> [Danio rerio]    696    0.0
> dbj|BAD23843.1|  extracellular signal regulated
> protein kinase...   694    0.0
> ref|NP_878308.2|  mitogen-activated protein kinase 1
> [Danio re...   694    0.0
> emb|CAG07778.1|  unnamed protein product [Tetraodon
> nigroviridis]   692    0.0
> dbj|BAB11813.1|  ERK2 [Danio rerio]
>              689    0.0
> gb|AAY57805.1|  extracellular signal-regulated kinase
> 2 [Danio re   687    0.0
> gb|AAH45505.1|  Mitogen-activated protein kinase 3
> [Danio reri...   654    0.0
> dbj|BAB11812.1|  ERK1 [Danio rerio]
>              654    0.0
> ref|XP_609884.2|  PREDICTED: similar to mitogen
> activated prot...   653    0.0
> dbj|BAD23842.1|  extracellular signal regulated
> protein kinase...   650    0.0
> gb|AAH29712.1|  Mitogen activated protein kinase 3
> [Mus muscul...   644    0.0
> ref|XP_885698.1|  PREDICTED: similar to mitogen
> activated prot...   644    0.0
> gb|AAA20009.1|  microtubule-associated protein-2
> kinase             643    0.0
> emb|CAA46318.1|  MAP kinase [Rattus norvegicus]
> >ref|NP_059043...   641    0.0
> gb|AAH13992.1|  Mitogen-activated protein kinase 3
> [Homo sapie...   641    0.0
> gb|AAQ02422.1|  mitogen-activated protein kinase 3
> [synthetic ...   641    0.0
> gb|AAA41123.1|  extracellular signal-regulated kinase
> 1             640    0.0
> ref|XP_854045.1|  PREDICTED: similar to mitogen
> activated prot...   640    0.0
> gb|AAA63486.1|  extracellular-signal-regulated kinase
> 1 [Rattus n   640    0.0
> emb|CAG02655.1|  unnamed protein product [Tetraodon
> nigroviridis]   640    0.0
> emb|CAA42744.1|  protein serine/threonine kinase [Homo
> sapiens...   639    0.0
> gb|AAA36142.1|  kinase 1
>              639    0.0
> emb|CAA77754.1|  44kDa protein kinase [Homo sapiens]
> >prf||181...   639    0.0
> ref|XP_885840.1|  PREDICTED: similar to mitogen
> activated prot...   632    5e-180
> ref|XP_885818.1|  PREDICTED: similar to mitogen
> activated prot...   630    3e-179
> ref|XP_860621.1|  PREDICTED: similar to
> Mitogen-activated prot...   627    2e-178
> gb|AAF71666.1|  extracellular signal-regulated kinase
> 1b [Rattus    627    2e-178
> ref|XP_393029.1|  PREDICTED: similar to MAP kinase
> [Apis mellifer   621    1e-176
> gb|AAA83210.1|  MAP kinase
>              619    4e-176
> dbj|BAE46741.1|  Extracellular regulated MAP kinase
> [Bombyx mori]   618    1e-175
> gb|AAH13754.1|  Mapk3 protein [Mus musculus]
>              612    9e-174
> dbj|BAE06412.1|  mitogen-activated protein kinase
> [Ciona intestin   607    2e-172
> dbj|BAE33167.1|  unnamed protein product [Mus
> musculus]             600    3e-170
> gb|AAN46679.1|  MAP kinase [Strongylocentrotus
> purpuratus] >re...   598    1e-169
> dbj|BAC02940.1|  mitogen-activated protein kinase
> [Halocynthia ro   592    6e-168
> gb|AAL48618.1|  RE08694p [Drosophila melanogaster]
> >gb|EAA4631...   590    2e-167
> emb|CAD97888.1|  hypothetical protein [Homo sapiens]
>              589    5e-167
> emb|CAD60453.1|  extracellular signal-regulated
> protein kinase...   589    5e-167
> emb|CAD56894.1|  mitogen-activated protein kinase 1
> [Meloidogyne    589    6e-167
> ref|XP_536917.2|  PREDICTED: similar to mitogen
> activated prot...   588    1e-166
> gb|AAN40736.1|  mitogen-activated protein kinase
> [Paralichthys ol   586    4e-166
> emb|CAE73725.1|  Hypothetical protein CBG21247
> [Caenorhabditis br   583    3e-165
> emb|CAA87057.1|  Hypothetical protein F43C1.2a
> [Caenorhabditis...   581    2e-164
> gb|AAA18956.1|  Sur-1 MAP kinase
>              581    2e-164
> emb|CAB60996.1|  Hypothetical protein F43C1.2b
> [Caenorhabditis...   581    2e-164
> gb|AAK52329.1|  extracellular signal-related kinase 1b
> [Homo sapi   580    4e-164
> ref|XP_885794.1|  PREDICTED: similar to mitogen
> activated prot...   553    4e-156
> ref|XP_868146.1|  PREDICTED: similar to mitogen
> activated prot...   548    2e-154
> gb|AAK52330.1|  extracellular signal-related kinase 1c
> [Homo sapi   546    4e-154
> dbj|BAA22620.1|  ERK2 [Mus musculus]
>              544    2e-153
> ref|XP_510921.1|  PREDICTED: mitogen-activated protein
> kinase 3 [   529    8e-149
> gb|AAT02418.1|  MAP kinase [Schistosoma japonicum]
>              496    7e-139
> emb|CAJ44437.1|  MAP kinase [Echinococcus
> multilocularis]           491    1e-137
> ref|XP_885774.1|  PREDICTED: similar to mitogen
> activated prot...   444    3e-123
> gb|EAA14714.3|  ENSANGP00000016639 [Anopheles gambiae
> str. PES...   431    2e-119
> gb|AAZ38881.1|  extracellular regulated kinase
> [Littorina littore   431    2e-119
> emb|CAD60723.1|  unnamed protein product [Podospora
> anserina]       411    2e-113
> gb|AAK25816.1|  MAP kinase [Neurospora crassa]
> >ref|XP_959713....   411    2e-113
> gb|EAL89122.1|  MAP kinase (FUS3/KSS1), putative
> [Aspergillus ...   409    1e-112
> gb|EAA74589.1|  hypothetical protein FG06385.1
> [Gibberella zea...   409    1e-112
> ref|XP_504312.1|  hypothetical protein [Yarrowia
> lipolytica] >...   408    2e-112
> gb|AAG01162.1|  mitogen-activated protein kinase
> [Fusarium oxy...   408    2e-112
> gb|AAS20192.1|  AMK1 [Alternaria brassicicola]
> >gb|AAK52840.1|...   408    2e-112
> dbj|BAE57584.1|  unnamed protein product [Aspergillus
> oryzae]       408    2e-112
> dbj|BAD42855.1|  mitogen-activated protein kinase
> [Bipolaris oryz   407    3e-112
> gb|AAD50496.1|  mitogen activated protein kinase
> [Colletotrichum    407    3e-112
> gb|AAF05913.1|  mitogen-activated protein kinase
> [Cochliobolus he   407    3e-112
> gb|AAM89501.1|  mitogen-activated protein kinase
> [Leptosphaeria m   407    3e-112
> dbj|BAB21569.1|  mitogen-activated protein kinase
> [Glomerella cin   407    3e-112
> gb|AAB72017.1|  mitogen-activated protein kinase
> [Nectria haem...   407    3e-112
> emb|CAC36428.1|  mitogen activated protein kinase
> [Gibberella fuj   406    6e-112
> ref|XP_364720.1|  hypothetical protein MG09565.4
> [Magnaporthe gri   406    6e-112
> gb|AAG23132.1|  MAP kinase [Botryotinia fuckeliana]
>              406    6e-112
> gb|AAO63561.1|  mitogen activated protein kinase
> [Verticillium fu   406    8e-112
> dbj|BAE53432.1|  MAP kinase Pmk1 [Hypocrea lixii]
>              405    1e-111
> 
> ALIGNMENTS
> >ref|XP_534770.2| PREDICTED: similar to
> Mitogen-activated protein kinase 1 (Extracellular
> signal-regulated kinase 2) (ERK-2) (Mitogen-activated
> protein kinase 2) (MAP kinase 2) (MAPK 2) (p42-MAPK)
> (ERT1)
> isoform 1 [Canis familiaris]
>  ref|NP_620407.1| mitogen-activated protein kinase 1
> [Homo sapiens]
>  ref|NP_002736.3| mitogen-activated protein kinase 1
> [Homo sapiens]
>  gb|AAH17832.1| Mitogen-activated protein kinase 1
> [Homo sapiens]
>  sp|P28482|MK01_HUMAN Mitogen-activated protein kinase
> 1 (Extracellular signal-regulated
> kinase 2) (ERK-2) (Mitogen-activated protein kinase 2)
> 
> (MAP kinase 2) (MAPK 2) (p42-MAPK) (ERT1)
>  gb|AAA58459.1| protein kinase 2
> Length=360
> 
>  Score =  739 bits (1909),  Expect = 0.0
>  Identities = 360/360 (100%), Positives = 360/360
> (100%), Gaps = 0/360 (0%)
> 
> Query  1
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
>  60
> 
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
> Sbjct  1
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
>  60
> 
> Query  61
> HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
>  120
> 
> HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
> Sbjct  61
> HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
>  120
> 
> Query  121
> LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
>  180
> 
> LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
> Sbjct  121
> LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
>  180
> 
> Query  181
> TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
>  240
> 
> TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
> Sbjct  181
> TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
>  240
> 
> Query  241
> LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
>  300
> 
> LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
> Sbjct  241
> LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
>  300
> 
> Query  301
> RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
>  360
> 
> RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
> Sbjct  301
> RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
>  360
> 
> 
> >gb|AAX36107.1| mitogen-activated protein kinase 1
> [synthetic construct]
> Length=361
> 
>  Score =  739 bits (1909),  Expect = 0.0
>  Identities = 360/360 (100%), Positives = 360/360
> (100%), Gaps = 0/360 (0%)
> 
> Query  1
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
>  60
> 
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
> Sbjct  1
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
>  60
> 
> Query  61
> HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
>  120
> 
> HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
> Sbjct  61
> HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
>  120
> 
> Query  121
> LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
>  180
> 
> LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
> Sbjct  121
> LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
>  180
> 
> Query  181
> TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
>  240
> 
> TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
> Sbjct  181
> TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
>  240
> 
> Query  241
> LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
>  300
> 
> LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
> Sbjct  241
> LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
>  300
> 
> Query  301
> RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
>  360
> 
> RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
> Sbjct  301
> RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
>  360
> 
> 
> >pdb|1WZY|A Chain A, Crystal Structure Of Human Erk2
> Complexed With A Pyrazolopyridazine
> Derivative
> Length=368
> 
>  Score =  739 bits (1909),  Expect = 0.0
>  Identities = 360/360 (100%), Positives = 360/360
> (100%), Gaps = 0/360 (0%)
> 
> Query  1
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
>  60
> 
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
> Sbjct  9
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
>  68
> 
> Query  61
> HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
>  120
> 
> 
> 
> 
> 



From iamvela at yahoo.com  Wed Feb 22 16:06:54 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 13:06:54 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000c01c637e8$980c6f90$15327e82@pyrimidine>
Message-ID: <20060222210654.53827.qmail@web34404.mail.mud.yahoo.com>

Thanks Chris. I am getting the errors listed below with
nmake.

As suggested, I downloaded the nmake utility from the
Microsoft website and the bioperl-live tarball.

After untarring, I replaced the blast.pm file (under
bioperl-live\Bio\SearchIO) with the blast.pm (86 KB
size) attached to bug report 1934.

I then did the following to install packages using
nmake:

1) perl Makefile.pl was successful without any errors.


2) 'c:\nmake' results in the following errors:

        pl2bat.bat blib\script\bp_unflatten_seq.pl
        C:\mod_perl\Perl\bin\perl.exe
-MExtUtils::Command -e cp ./scripts_temp/b
p_taxid4species.pl blib\script\bp_taxid4species.pl
        pl2bat.bat blib\script\bp_taxid4species.pl
        C:\mod_perl\Perl\bin\perl.exe
-MExtUtils::Command -e cp ./scripts_temp/b
p_seqret.pl blib\script\bp_seqret.pl
        pl2bat.bat blib\script\bp_seqret.pl
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
bioscripts.pod
Can't open bioscripts.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
biodatabases.pod
Can't open biodatabases.pod: No such file or
directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
biodesign.pod
Can't open biodesign.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
bioperl.pod
Can't open bioperl.pod: No such file or directory.


3) 'c:\nmake test' fails with the following errors:

NMAKE : fatal error U1095: expanded command line
'C:\mod_perl\Perl\bin\perl.exe
"-MExtUtils::Command::MM" "-e" "test_harness(0,
'blib\lib', 'blib\arch')" t\AACh
ange.t t\AAReverseMutate.t t\abi.t t\ace.t t\AlignIO.t
t\AlignStats.t t\AlignUti
l.t t\alignUtilities.t t\Allele.t t\Alphabet.t
t\Annotation.t t\AnnotationAdapto
r.t t\asciitree.t t\Assembly.t t\Biblio.t
t\Biblio_biofetch.t t\Biblio_eutils.t
t\BiblioReferences.t t\BioDBGFF.t t\BioFetch_DB.t
t\BioGraphics.t t\BlastIndex.t
 t\BPbl2seq.t t\BPlite.t t\BPpsilite.t t\bsml_sax.t
t\Chain.t t\chaosxml.t t\cig
arstring.t t\ClusterIO.t t\Coalescent.t t\CodonTable.t
t\Compatible.t t\consed.t
 t\CoordinateGraph.t t\CoordinateMapper.t
t\Correlate.t t\ctf.t t\CytoMap.t t\DB
.t t\DBCUTG.t t\DBFasta.t t\DNAMutation.t t\Domcut.t
t\ECnumber.t t\ELM.t t\embl
.t t\EMBL_DB.t t\EMBOSS_Tools.t t\EncodedSeq.t
t\entrezgene.t t\ePCR.t t\ESEfind
er.t t\est2genome.t t\Exception.t t\Exonerate.t
t\exp.t t\fasta.t t\FeatureIO.t
t\flat.t t\FootPrinter.t t\game.t t\GbrowseGFF.t
t\gcg.t t\GDB.t t\Gel.t t\genba
nk.t t\GeneCoordinateMapper.t t\Geneid.t t\Genewise.t
t\Genomewise.t t\Genpred.t
 t\GFF.t t\GOR4.t t\GOterm.t t\GraphAdaptor.t
t\GuessSeqFormat.t t\hmmer.t t\HNN
.t t\HtSNP.t t\Index.t t\InstanceSite.t t\interpro.t
t\InterProParser.t t\IUPAC.
t t\kegg.t t\largefasta.t t\LargeLocatableSeq.t
t\largepseq.t t\LinkageMap.t t\L
iveSeq.t t\LocatableSeq.t t\Location.t
t\LocationFactory.t t\LocusLink.t t\lucy.
t t\Map.t t\MapIO.t t\masta.t t\Matrix.t t\Measure.t
t\MeSH.t t\metafasta.t t\Me
taSeq.t t\MicrosatelliteMarker.t t\MiniMIMentry.t
t\MitoProt.t t\Molphy.t t\Mult
iFile.t t\multiple_fasta.t t\Mutation.t t\Mutator.t
t\NetPhos.t t\Node.t t\OddCo
des.t t\OMIMentry.t t\OMIMentryAllelicVariant.t
t\OMIMparser.t t\Ontology.t t\On
tologyEngine.t t\OntologyStore.t t\PAML.t t\Perl.t
t\phd.t t\Phenotype.t t\Phyli
pDist.t t\PhysicalMap.t t\pICalculator.t t\Pictogram.t
t\pir.t t\pln.t t\PopGen.
t t\PopGenSims.t t\primaryqual.t t\PrimarySeq.t
t\primedseq.t t\Primer.t t\prime
r3.t t\Promoterwise.t t\ProtDist.t t\protgraph.t
t\ProtMatrix.t t\ProtPsm.t t\Ps
eudowise.t t\psm.t t\QRNA.t t\qual.t
t\RandDistFunctions.t t\RandomTreeFactory.t
 t\Range.t t\RangeI.t t\raw.t t\RefSeq.t t\Registry.t
t\Relationship.t t\Relatio
nshipType.t t\RemoteBlast.t t\RepeatMasker.t
t\RestrictionAnalysis.t t\Restricti
onEnzyme.t t\RestrictionIO.t t\RNAChange.t
t\Root-Utilities.t t\RootI.t t\RootIO
.t t\RootStorable.t t\Scansite.t t\scf.t
t\SearchDist.t t\SearchIO.t t\Seq.t t\s
eq_quality.t t\SeqAnalysisParser.t t\SeqBuilder.t
t\SeqDiff.t t\SeqFeatCollectio
n.t t\SeqFeature.t t\seqfeaturePrimer.t
t\SeqHound_DB.t t\SeqIO.t t\SeqPattern.t
 t\seqread_fail.t t\SeqStats.t t\SequenceFamily.t
t\sequencetrace.t t\SeqUtils.t
 t\SeqVersion.t t\seqwithquality.t t\SeqWords.t
t\Sigcleave.t t\Sim4.t t\Similar
ityPair.t t\SimpleAlign.t t\simpleGOparser.t
t\singlet.t t\sirna.t t\SiteMatrix.
t t\SNP.t t\Sopma.t t\Species.t t\Spidey.t
t\splicedseq.t t\StandAloneBlast.t t\
StructIO.t t\Structure.t t\swiss.t t\Symbol.t t\tab.t
t\TagHaplotype.t t\Taxonom
y.t t\TaxonTree.t t\Tempfile.t t\Term.t t\tigrxml.t
t\tinyseq.t t\Tools.t t\Tree
.t t\TreeBuild.t t\TreeIO.t t\trim.t t\tRNAscanSE.t
t\tutorial.t t\UCSCParsers.t
 t\Unflattener.t t\Unflattener2.t t\UniGene.t
t\Variation_IO.t t\WABA.t t\XEMBL_
DB.t t\ztr.t' too long
Stop.

C:\bioperl-live\bioperl-live>



4) 'c:\nmake install' results in the following errors:

        C:\mod_perl\Perl\bin\perl.exe
-MExtUtils::Command -e cp ./scripts_temp/b
p_taxid4species.pl blib\script\bp_taxid4species.pl
        pl2bat.bat blib\script\bp_taxid4species.pl
        C:\mod_perl\Perl\bin\perl.exe
-MExtUtils::Command -e cp ./scripts_temp/b
p_seqret.pl blib\script\bp_seqret.pl
        pl2bat.bat blib\script\bp_seqret.pl
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
bioscripts.pod
Can't open bioscripts.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
biodatabases.pod
Can't open biodatabases.pod: No such file or
directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
biodesign.pod
Can't open biodesign.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
bioperl.pod
Can't open bioperl.pod: No such file or directory.
Appending installation info to
C:\mod_perl\Perl\lib/perllocal.pod
NMAKE : fatal error U1095: expanded command line '@
C:\mod_perl\Perl\bin\perl.ex
e "-MExtUtils::Command::MM" -e perllocal_install 
"Module" "Bio"  "installed int
o" "C:\mod_perl\Perl\site\lib"  LINKTYPE "dynamic" 
VERSION "1.5"  EXE_FILES "./
scripts_temp/bp_biblio.pl
./scripts_temp/bp_genbank2gff.pl ./scripts_temp/bp_bul
k_load_gff.pl ./scripts_temp/bp_fast_load_gff.pl
./scripts_temp/bp_genbank2gff3.
pl ./scripts_temp/bp_generate_histogram.pl
./scripts_temp/bp_load_gff.pl ./scrip
ts_temp/bp_meta_gff.pl
./scripts_temp/bp_process_gadfly.pl
./scripts_temp/bp_pro
cess_sgd.pl ./scripts_temp/bp_process_wormbase.pl
./scripts_temp/bp_embl2picture
.pl ./scripts_temp/bp_glyphs1-demo.pl
./scripts_temp/bp_glyphs2-demo.pl ./script
s_temp/bp_biofetch_genbank_proxy.pl
./scripts_temp/bp_bioflat_index.pl ./scripts
_temp/bp_biogetseq.pl ./scripts_temp/bp_flanks.pl
./scripts_temp/bp_contig_draw.
pl ./scripts_temp/bp_feature_draw.pl
./scripts_temp/bp_frend.pl ./scripts_temp/b
p_search_overview.pl ./scripts_temp/bp_fetch.pl
./scripts_temp/bp_index.pl ./scr
ipts_temp/bp_seqret.pl
./scripts_temp/bp_composite_LD.pl
./scripts_temp/bp_heter
ogeneity_test.pl ./scripts_temp/bp_fastam9_to_table.pl
./scripts_temp/bp_filter_
search.pl ./scripts_temp/bp_hmmer_to_table.pl
./scripts_temp/bp_search2table.pl
./scripts_temp/bp_extract_feature_seq.pl
./scripts_temp/bp_make_mrna_protein.pl
./scripts_temp/bp_seqconvert.pl
./scripts_temp/bp_split_seq.pl ./scripts_temp/bp
_translate_seq.pl ./scripts_temp/bp_unflatten_seq.pl
./scripts_temp/bp_aacomp.pl
 ./scripts_temp/bp_chaos_plot.pl
./scripts_temp/bp_gccalc.pl ./scripts_temp/bp_o
ligo_count.pl
./scripts_temp/bp_classify_hits_kingdom.pl
./scripts_temp/bp_local
_taxonomydb_query.pl
./scripts_temp/bp_query_entrez_taxa.pl
./scripts_temp/bp_ta
xid4species.pl ./scripts_temp/bp_blast2tree.pl
./scripts_temp/bp_nexus2nh.pl ./s
cripts_temp/bp_tree2pag.pl
./scripts_temp/bp_mrtrans.pl ./scripts_temp/bp_nrdb.p
l ./scripts_temp/bp_sreformat.pl
./scripts_temp/bp_dbsplit.pl ./scripts_temp/bp_
mask_by_search.pl ./scripts_temp/bp_mutate.pl
./scripts_temp/bp_pairwise_kaks.pl
 ./scripts_temp/bp_remote_blast.pl
./scripts_temp/bp_search2alnblocks.pl ./scrip
ts_temp/bp_search2BSML.pl
./scripts_temp/bp_search2gff.pl ./scripts_temp/bp_sear
ch2tribe.pl ./scripts_temp/bp_seq_length.pl"  >>
C:\mod_perl\Perl\lib\perllocal.
pod' too long
Stop.

C:\bioperl-live\bioperl-live>

--- Chris Fields  wrote:

> Upgrade bioperl from CVS using nmake. 
> 
> Installation instructions for using nmake:
> 
>
http://www.bioperl.org/wiki/INSTALL.WIN#Beyond_the_Core
> 
> You can download a tarball using anonymous CVS (link
> at bottom):
> 
>
http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/
> 
> or use CVS directly:
> 
> http://www.bioperl.org/wiki/Using_CVS
> 
> Then make sure to grab the latest SearchIO::blast
> bugfix, which is not in CVS
> yet:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> 
> Replace the blast.pm in \site\lib\Bio\SearchIO in
> your Perl directory.
> 
> Does that fix it?
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Raghunath
> Verabelli
> > Sent: Wednesday, February 22, 2006 11:22 AM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Blast returns result, but
> does not return hits
> > 
> > Hi All:
> > 
> > I am new to Perl/BioPerl world.
> > 
> > I am debugging a program that used to work fine
> > before.
> > Blast works fine and returns results, but I am
> unable
> > to get any hits from the results.
> > 
> > Here is the relevant code:
> > 
> > $blastObj = new Bio::SearchIO(-file=>$resultsFile,
> >                               -format=>'blast');
> >   while (my $result = $blastObj->next_result()) {
> >      while (my $bioPerlHit = $result->next_hit()) {
> >          .......
> > 
> > 
> > The first while condition returns true, but the
> second
> > while condition returns false. So looks like there
> is
> > some result, but it is unable to identify the hits
> in
> > the result. I printed the $result (pasted below).
> > 
> > Any ideas/comments to resolve this? Thanks in
> advance.
> > 
> > I am using Perl 5.8.7, BioPerl 1.2.3, Apache
> 1.3.34 on
> > Windows XP platform.
> > 
> > Like I said before, this application was running
> fine
> > on a different windows machine with similar
> > environment,so looks like there is some change in
> the
> > products/versions that is causing the problem.
> > 
> > thanks again,
> > Raghu
> > 
> > 
> > 
> > 
> > Blast result (i can send complete result if you
> need
> > it):
> > 
> > 

> > BLASTP 2.2.13 [Nov-27-2005]
> > Reference: Altschul, Stephen F., Thomas L. Madden,
> > Alejandro A. Schäffer,
> > Jinghui Zhang, Zheng Zhang, Webb Miller, and David
> J.
> > Lipman
> > (1997), "Gapped BLAST and PSI-BLAST: a new
> generation
> > of
> > protein database search programs", Nucleic Acids
> Res.
> > 25:3389-3402.
> > 
> > RID: 1140573059-19990-140117828872.BLASTQ1
> > 
> > 
> > Database: All non-redundant GenBank CDS
> > translations+PDB+SwissProt+PIR+PRF excluding
> > environmental samples
> >            3,297,000 sequences; 1,129,354,045
> total
> > letters
> > Query=
> > Length=360
> > 
> > 
> > 
> >             Score     E
> > Sequences producing significant alignments:
> >             (Bits)  Value
> > 
> > ref|XP_534770.2|  PREDICTED: similar to
> > Mitogen-activated prot...   739    0.0
> > gb|AAX36107.1|  mitogen-activated protein kinase 1
> > [synthetic con   739    0.0
> > pdb|1WZY|A  Chain A, Crystal Structure Of Human
> Erk2
> > Complexed...   739    0.0
> > pdb|1TVO|A  Chain A, The Structure Of Erk2 In
> Complex
> > With A S...   739    0.0
> > ref|NP_786987.1|  mitogen-activated protein kinase
> 1
> > [Bos taur...   739    0.0
> > emb|CAA77752.1|  41kD protein kinase [Homo
> sapiens]
> > >prf||1813...   738    0.0
> > gb|AAQ02541.1|  mitogen-activated protein kinase 1
> > [synthetic con   736    0.0
> > gb|AAH99905.1|  Mitogen-activated protein kinase 1
> > [Homo sapiens]   735    0.0
> > emb|CAI29602.1|  hypothetical protein [Pongo
> pygmaeus]
> >              734    0.0
> > gb|AAH58258.1|  Mitogen activated protein kinase 1
> > [Mus muscul...   731    0.0
> > pdb|4ERK|   The Complex Structure Of The Map
> Kinase
> > Erk2OLOMOU...   731    0.0
> > pdb|1GOL|   Coordinates Of Rat Map Kinase Erk2
> With An
> > Arginin...   730    0.0
> > ref|XP_860750.1|  PREDICTED: similar to
> > Mitogen-activated prot...   729    0.0
> > gb|AAK56503.1|  extracellular signal-regulated
> kinase
> > 2 [Gallu...   726    0.0
> > ref|XP_860716.1|  PREDICTED: similar to
> > Mitogen-activated prot...   726    0.0
> > pdb|2ERK|   Phosphorylated Map Kinase Erk2
> >              726    0.0
> > pdb|1PME|   Structure Of Penta Mutant Human Erk2
> Map
> > Kinase Co...   725    0.0
> > ref|XP_860682.1|  PREDICTED: similar to
> > Mitogen-activated prot...   720    0.0
> > ref|XP_860651.1|  PREDICTED: similar to
> > Mitogen-activated prot...   720    0.0
> > emb|CAA77753.1|  40kDa protein kinase [Homo
> sapiens]
> > >prf||181...   717    0.0
> > ref|NP_001017127.1|  mitogen-activated protein
> kinase
> > 1 [Xenopus    715    0.0
> > dbj|BAE28679.1|  unnamed protein product [Mus
> > musculus]             713    0.0
> > emb|CAA42482.1|  MAP kinase [Xenopus laevis]
> > >gb|AAH60748.1| M...   711    0.0
> > sp|P26696|MK01_XENLA  Mitogen-activated protein
> kinase
> > 1 (Myel...   711    0.0
> > gb|AAH76730.1|  Xp42 protein [Xenopus laevis]
> >              706    0.0
> > gb|AAH65868.1|  Mitogen-activated protein kinase 1
> > [Danio rerio]    696    0.0
> > dbj|BAD23843.1|  extracellular signal regulated
> > protein kinase...   694    0.0
> > ref|NP_878308.2|  mitogen-activated protein kinase
> 1
> > [Danio re...   694    0.0
> > emb|CAG07778.1|  unnamed protein product
> [Tetraodon
> 
=== message truncated ===



From cjfields at uiuc.edu  Wed Feb 22 16:55:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 22 Feb 2006 15:55:34 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060222210654.53827.qmail@web34404.mail.mud.yahoo.com>
Message-ID: <001701c637fa$b5110120$15327e82@pyrimidine>

You know, I assumed you were using ActivePerl b/c of the older version of
Bioperl (and since it's the most commonly used Perl for Windows build).  My
goof.  It looks like you're using Apache/mod_perl/perl, right?  The only
Perl/Apache/mod_perl combos for Windows I know of are listed here:

http://perl.apache.org/docs/2.0/os/win32/install.html

The only Perl for Windows we have actively supported is ActivePerl AFAIK,
but maybe we can walk through this.  Anything learned here can be added to
the installation instructions in case this comes up again.

To start, what mod_perl/Perl version are you using, and from what
distributor (IndigoStar, Apache, etc)?  Each distribution should have some
documentation for installing CPAN modules or prebuilt/pretested packages,
like ActiveState's PPM or IndigoStar's GPM.  I think Apache's Perl build is
from ActiveState's source code so should come with PPM.

Next: you obviously have installed Bioperl before (v1.2.3); did you use
'make' or 'nmake', or was it from a repository (like IndigoPerl's GPM)?
AFAIK, you would install it like you would any other perl module; there
should be no problem with 'make/nmake', though 'make/nmake test' will not
pass completely (it should pass most tests, though, otherwise something is
seriously wrong).
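
If nmake itself is what trips (the U1095 'too long' error is about the
length of the expanded command line, not about the tests), one possible
workaround is to hand the test scripts to Test::Harness from a small Perl
script instead. This is only a sketch, not something from the installation
notes: the helper name test_scripts is made up for illustration, and it
assumes it is run from the top of the unpacked bioperl-live directory after
'nmake' has populated blib\.

```perl
use strict;
use warnings;
use File::Spec;
use Test::Harness qw(runtests);

# Hypothetical helper: list the *.t scripts in a directory, sorted
# case-insensitively. Returns an empty list if the directory is unreadable.
sub test_scripts {
    my ($dir) = @_;
    opendir(my $dh, $dir) or return;
    my @names = grep { /\.t\z/ } readdir $dh;
    closedir $dh;
    return map { File::Spec->catfile($dir, $_) }
           sort { lc $a cmp lc $b } @names;
}

my @tests = test_scripts('t');
if (@tests) {
    # Put the freshly built modules first in @INC so the test scripts
    # load them rather than an older installed copy.
    unshift @INC, map { File::Spec->rel2abs($_) } 'blib/lib', 'blib/arch';
    runtests(@tests);    # dies if any test script fails
}
else {
    print "No test scripts found under t/\n";
}
```

Because the file list is built inside Perl, nothing long is ever passed on
the command line, which is the point of the exercise.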

The other option, though not as nice, is setting the PERL5LIB variable to
include the bioperl-live directory; it works for me while I'm developing.  I
don't know how this may affect other mod_perl-related functions, though.
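
To confirm which copy of the modules Perl actually picks up (the old 1.2.3
install or the bioperl-live checkout), something along these lines may
help. It is a sketch only: module_path is a made-up helper, and the
directory is taken from the prompt shown in the nmake output, so adjust it
to the real location. 'use lib' is used here as a per-script alternative to
setting PERL5LIB.

```perl
use strict;
use warnings;

# Assumed checkout location (from the C:\bioperl-live\bioperl-live> prompt);
# change this to wherever bioperl-live was unpacked.
use lib 'C:/bioperl-live/bioperl-live';

# Hypothetical helper: return the file a module was loaded from,
# or undef if the module cannot be loaded at all.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

for my $module (qw(Bio::SearchIO Bio::SearchIO::blast)) {
    my $path = module_path($module);
    print "$module => ", (defined $path ? $path : 'not found'), "\n";
}
```

If the printed paths point into site\lib, the checkout directory is
probably wrong and the older install is still the one being used.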

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: Raghunath Verabelli [mailto:iamvela at yahoo.com]
> Sent: Wednesday, February 22, 2006 3:07 PM
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> 
> Thanks Chris. I am getting the errors listed below with
> nmake.
> 
> As suggested, I downloaded the nmake utility from the
> Microsoft website and the bioperl-live tarball.
> 
> After untarring, I replaced the blast.pm file (under
> bioperl-live\Bio\SearchIO) with the blast.pm (86 KB
> size) attached to bug report 1934.
> 
> I then did the following to install packages using
> nmake:
> 
> 1) perl Makefile.pl was successful without any errors.
> 
> 
> 2) 'c:\nmake' results in following errors
> 
>         pl2bat.bat blib\script\bp_unflatten_seq.pl
>         C:\mod_perl\Perl\bin\perl.exe
> -MExtUtils::Command -e cp ./scripts_temp/b
> p_taxid4species.pl blib\script\bp_taxid4species.pl
>         pl2bat.bat blib\script\bp_taxid4species.pl
>         C:\mod_perl\Perl\bin\perl.exe
> -MExtUtils::Command -e cp ./scripts_temp/b
> p_seqret.pl blib\script\bp_seqret.pl
>         pl2bat.bat blib\script\bp_seqret.pl
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> bioscripts.pod
> Can't open bioscripts.pod: No such file or directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> biodatabases.pod
> Can't open biodatabases.pod: No such file or
> directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> biodesign.pod
> Can't open biodesign.pod: No such file or directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> bioperl.pod
> Can't open bioperl.pod: No such file or directory.
> 
> 
> 3) 'c:\nmake test' fails with following errors:
> 
> NMAKE : fatal error U1095: expanded command line
> 'C:\mod_perl\Perl\bin\perl.exe
> "-MExtUtils::Command::MM" "-e" "test_harness(0,
> 'blib\lib', 'blib\arch')" t\AACh
> ange.t t\AAReverseMutate.t t\abi.t t\ace.t t\AlignIO.t
> t\AlignStats.t t\AlignUti
> l.t t\alignUtilities.t t\Allele.t t\Alphabet.t
> t\Annotation.t t\AnnotationAdapto
> r.t t\asciitree.t t\Assembly.t t\Biblio.t
> t\Biblio_biofetch.t t\Biblio_eutils.t
> t\BiblioReferences.t t\BioDBGFF.t t\BioFetch_DB.t
> t\BioGraphics.t t\BlastIndex.t
>  t\BPbl2seq.t t\BPlite.t t\BPpsilite.t t\bsml_sax.t
> t\Chain.t t\chaosxml.t t\cig
> arstring.t t\ClusterIO.t t\Coalescent.t t\CodonTable.t
> t\Compatible.t t\consed.t
>  t\CoordinateGraph.t t\CoordinateMapper.t
> t\Correlate.t t\ctf.t t\CytoMap.t t\DB
> .t t\DBCUTG.t t\DBFasta.t t\DNAMutation.t t\Domcut.t
> t\ECnumber.t t\ELM.t t\embl
> .t t\EMBL_DB.t t\EMBOSS_Tools.t t\EncodedSeq.t
> t\entrezgene.t t\ePCR.t t\ESEfind
> er.t t\est2genome.t t\Exception.t t\Exonerate.t
> t\exp.t t\fasta.t t\FeatureIO.t
> t\flat.t t\FootPrinter.t t\game.t t\GbrowseGFF.t
> t\gcg.t t\GDB.t t\Gel.t t\genba
> nk.t t\GeneCoordinateMapper.t t\Geneid.t t\Genewise.t
> t\Genomewise.t t\Genpred.t
>  t\GFF.t t\GOR4.t t\GOterm.t t\GraphAdaptor.t
> t\GuessSeqFormat.t t\hmmer.t t\HNN
> .t t\HtSNP.t t\Index.t t\InstanceSite.t t\interpro.t
> t\InterProParser.t t\IUPAC.
> t t\kegg.t t\largefasta.t t\LargeLocatableSeq.t
> t\largepseq.t t\LinkageMap.t t\L
> iveSeq.t t\LocatableSeq.t t\Location.t
> t\LocationFactory.t t\LocusLink.t t\lucy.
> t t\Map.t t\MapIO.t t\masta.t t\Matrix.t t\Measure.t
> t\MeSH.t t\metafasta.t t\Me
> taSeq.t t\MicrosatelliteMarker.t t\MiniMIMentry.t
> t\MitoProt.t t\Molphy.t t\Mult
> iFile.t t\multiple_fasta.t t\Mutation.t t\Mutator.t
> t\NetPhos.t t\Node.t t\OddCo
> des.t t\OMIMentry.t t\OMIMentryAllelicVariant.t
> t\OMIMparser.t t\Ontology.t t\On
> tologyEngine.t t\OntologyStore.t t\PAML.t t\Perl.t
> t\phd.t t\Phenotype.t t\Phyli
> pDist.t t\PhysicalMap.t t\pICalculator.t t\Pictogram.t
> t\pir.t t\pln.t t\PopGen.
> t t\PopGenSims.t t\primaryqual.t t\PrimarySeq.t
> t\primedseq.t t\Primer.t t\prime
> r3.t t\Promoterwise.t t\ProtDist.t t\protgraph.t
> t\ProtMatrix.t t\ProtPsm.t t\Ps
> eudowise.t t\psm.t t\QRNA.t t\qual.t
> t\RandDistFunctions.t t\RandomTreeFactory.t
>  t\Range.t t\RangeI.t t\raw.t t\RefSeq.t t\Registry.t
> t\Relationship.t t\Relatio
> nshipType.t t\RemoteBlast.t t\RepeatMasker.t
> t\RestrictionAnalysis.t t\Restricti
> onEnzyme.t t\RestrictionIO.t t\RNAChange.t
> t\Root-Utilities.t t\RootI.t t\RootIO
> .t t\RootStorable.t t\Scansite.t t\scf.t
> t\SearchDist.t t\SearchIO.t t\Seq.t t\s
> eq_quality.t t\SeqAnalysisParser.t t\SeqBuilder.t
> t\SeqDiff.t t\SeqFeatCollectio
> n.t t\SeqFeature.t t\seqfeaturePrimer.t
> t\SeqHound_DB.t t\SeqIO.t t\SeqPattern.t
>  t\seqread_fail.t t\SeqStats.t t\SequenceFamily.t
> t\sequencetrace.t t\SeqUtils.t
>  t\SeqVersion.t t\seqwithquality.t t\SeqWords.t
> t\Sigcleave.t t\Sim4.t t\Similar
> ityPair.t t\SimpleAlign.t t\simpleGOparser.t
> t\singlet.t t\sirna.t t\SiteMatrix.
> t t\SNP.t t\Sopma.t t\Species.t t\Spidey.t
> t\splicedseq.t t\StandAloneBlast.t t\
> StructIO.t t\Structure.t t\swiss.t t\Symbol.t t\tab.t
> t\TagHaplotype.t t\Taxonom
> y.t t\TaxonTree.t t\Tempfile.t t\Term.t t\tigrxml.t
> t\tinyseq.t t\Tools.t t\Tree
> .t t\TreeBuild.t t\TreeIO.t t\trim.t t\tRNAscanSE.t
> t\tutorial.t t\UCSCParsers.t
>  t\Unflattener.t t\Unflattener2.t t\UniGene.t
> t\Variation_IO.t t\WABA.t t\XEMBL_
> DB.t t\ztr.t' too long
> Stop.
> 
> C:\bioperl-live\bioperl-live>
> 
> 
> 
> 4) 'c:\nmake install' results in following errors:
> 
>         C:\mod_perl\Perl\bin\perl.exe
> -MExtUtils::Command -e cp ./scripts_temp/b
> p_taxid4species.pl blib\script\bp_taxid4species.pl
>         pl2bat.bat blib\script\bp_taxid4species.pl
>         C:\mod_perl\Perl\bin\perl.exe
> -MExtUtils::Command -e cp ./scripts_temp/b
> p_seqret.pl blib\script\bp_seqret.pl
>         pl2bat.bat blib\script\bp_seqret.pl
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> bioscripts.pod
> Can't open bioscripts.pod: No such file or directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> biodatabases.pod
> Can't open biodatabases.pod: No such file or
> directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> biodesign.pod
> Can't open biodesign.pod: No such file or directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> bioperl.pod
> Can't open bioperl.pod: No such file or directory.
> Appending installation info to
> C:\mod_perl\Perl\lib/perllocal.pod
> NMAKE : fatal error U1095: expanded command line '@
> C:\mod_perl\Perl\bin\perl.ex
> e "-MExtUtils::Command::MM" -e perllocal_install
> "Module" "Bio"  "installed int
> o" "C:\mod_perl\Perl\site\lib"  LINKTYPE "dynamic"
> VERSION "1.5"  EXE_FILES "./
> scripts_temp/bp_biblio.pl
> ./scripts_temp/bp_genbank2gff.pl ./scripts_temp/bp_bul
> k_load_gff.pl ./scripts_temp/bp_fast_load_gff.pl
> ./scripts_temp/bp_genbank2gff3.
> pl ./scripts_temp/bp_generate_histogram.pl
> ./scripts_temp/bp_load_gff.pl ./scrip
> ts_temp/bp_meta_gff.pl
> ./scripts_temp/bp_process_gadfly.pl
> ./scripts_temp/bp_pro
> cess_sgd.pl ./scripts_temp/bp_process_wormbase.pl
> ./scripts_temp/bp_embl2picture
> .pl ./scripts_temp/bp_glyphs1-demo.pl
> ./scripts_temp/bp_glyphs2-demo.pl ./script
> s_temp/bp_biofetch_genbank_proxy.pl
> ./scripts_temp/bp_bioflat_index.pl ./scripts
> _temp/bp_biogetseq.pl ./scripts_temp/bp_flanks.pl
> ./scripts_temp/bp_contig_draw.
> pl ./scripts_temp/bp_feature_draw.pl
> ./scripts_temp/bp_frend.pl ./scripts_temp/b
> p_search_overview.pl ./scripts_temp/bp_fetch.pl
> ./scripts_temp/bp_index.pl ./scr
> ipts_temp/bp_seqret.pl
> ./scripts_temp/bp_composite_LD.pl
> ./scripts_temp/bp_heter
> ogeneity_test.pl ./scripts_temp/bp_fastam9_to_table.pl
> ./scripts_temp/bp_filter_
> search.pl ./scripts_temp/bp_hmmer_to_table.pl
> ./scripts_temp/bp_search2table.pl
> ./scripts_temp/bp_extract_feature_seq.pl
> ./scripts_temp/bp_make_mrna_protein.pl
> ./scripts_temp/bp_seqconvert.pl
> ./scripts_temp/bp_split_seq.pl ./scripts_temp/bp
> _translate_seq.pl ./scripts_temp/bp_unflatten_seq.pl
> ./scripts_temp/bp_aacomp.pl
>  ./scripts_temp/bp_chaos_plot.pl
> ./scripts_temp/bp_gccalc.pl ./scripts_temp/bp_o
> ligo_count.pl
> ./scripts_temp/bp_classify_hits_kingdom.pl
> ./scripts_temp/bp_local
> _taxonomydb_query.pl
> ./scripts_temp/bp_query_entrez_taxa.pl
> ./scripts_temp/bp_ta
> xid4species.pl ./scripts_temp/bp_blast2tree.pl
> ./scripts_temp/bp_nexus2nh.pl ./s
> cripts_temp/bp_tree2pag.pl
> ./scripts_temp/bp_mrtrans.pl ./scripts_temp/bp_nrdb.p
> l ./scripts_temp/bp_sreformat.pl
> ./scripts_temp/bp_dbsplit.pl ./scripts_temp/bp_
> mask_by_search.pl ./scripts_temp/bp_mutate.pl
> ./scripts_temp/bp_pairwise_kaks.pl
>  ./scripts_temp/bp_remote_blast.pl
> ./scripts_temp/bp_search2alnblocks.pl ./scrip
> ts_temp/bp_search2BSML.pl
> ./scripts_temp/bp_search2gff.pl ./scripts_temp/bp_sear
> ch2tribe.pl ./scripts_temp/bp_seq_length.pl"  >>
> C:\mod_perl\Perl\lib\perllocal.
> pod' too long
> Stop.
> 
> C:\bioperl-live\bioperl-live>
> 
> --- Chris Fields  wrote:
> 
> > Upgrade bioperl from CVS using nmake.
> >
> > Installation instructions for using nmake:
> >
> >
> http://www.bioperl.org/wiki/INSTALL.WIN#Beyond_the_Core
> >
> > You can download a tarball using anonymous CVS (link
> > at bottom):
> >
> >
> http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/
> >
> > or use CVS directly:
> >
> > http://www.bioperl.org/wiki/Using_CVS
> >
> > Then make sure to grab the latest SearchIO::blast
> > bugfix, which is not in CVS
> > yet:
> >
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >
> > Replace the blast.pm in \site\lib\Bio\SearchIO in
> > your Perl directory.
> >
> > Does that fix it?
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Raghunath
> > Verabelli
> > > Sent: Wednesday, February 22, 2006 11:22 AM
> > > To: bioperl-l at lists.open-bio.org
> > > Subject: [Bioperl-l] Blast returns result, but
> > does not return hits
> > >
> > > Hi All:
> > >
> > > I am new to Perl/BioPerl world.
> > >
> > > I am debugging a program that used to work fine
> > > before.
> > > Blast works fine and returns results, but I am
> > unable
> > > to get any hits from the results.
> > >
> > > Here is the relevant code:
> > >
> > > $blastObj = new Bio::SearchIO(-file=>$resultsFile,
> > >                               -format=>'blast');
> > >   while (my $result = $blastObj->next_result()) {
> > >      while (my $bioPerlHit = $result->next_hit()) {
> > >          .......
> > >
> > >
> > > The first while condition returns true, but the
> > second
> > > while condition returns false. So looks like there
> > is
> > > some result, but it is unable to identify the hits
> > in
> > > the result. I printed the $result (pasted below).
> > >
> > > Any ideas/comments to resolve this? Thanks in
> > advance.
> > >
> > > I am using Perl 5.8.7, BioPerl 1.2.3, Apache
> > 1.3.34 on
> > > Windows XP platform.
> > >
> > > Like I said before, this application was running
> > fine
> > > on a different windows machine with similar
> > > environment, so looks like there is some change in
> > the
> > > products/versions that is causing the problem.
> > >
> > > thanks again,
> > > Raghu
> > >
> > >
> > >
> > >
> > > Blast result (i can send complete result if you
> > need
> > > it):
> > >
> > > 

> > > BLASTP 2.2.13 [Nov-27-2005]
> > > Reference: Altschul, Stephen F., Thomas L. Madden,
> > > Alejandro A. Schäffer,
> > > Jinghui Zhang, Zheng Zhang, Webb Miller, and David
> > J.
> > > Lipman
> > > (1997), "Gapped BLAST and PSI-BLAST: a new
> > generation
> > > of
> > > protein database search programs", Nucleic Acids
> > Res.
> > > 25:3389-3402.
> > >
> > > RID: 1140573059-19990-140117828872.BLASTQ1
> > >
> > >
> > > Database: All non-redundant GenBank CDS
> > > translations+PDB+SwissProt+PIR+PRF excluding
> > > environmental samples
> > >            3,297,000 sequences; 1,129,354,045
> > total
> > > letters
> > > Query=
> > > Length=360
> > >
> > >
> > >
> > >             Score     E
> > > Sequences producing significant alignments:
> > >             (Bits)  Value
> > >
> > > ref|XP_534770.2|  PREDICTED: similar to
> > > Mitogen-activated prot...   739    0.0
> > > gb|AAX36107.1|  mitogen-activated protein kinase 1
> > > [synthetic con   739    0.0
> > > pdb|1WZY|A  Chain A, Crystal Structure Of Human
> > Erk2
> > > Complexed...   739    0.0
> > > pdb|1TVO|A  Chain A, The Structure Of Erk2 In
> > Complex
> > > With A S...   739    0.0
> > > ref|NP_786987.1|  mitogen-activated protein kinase
> > 1
> > > [Bos taur...   739    0.0
> > > emb|CAA77752.1|  41kD protein kinase [Homo
> > sapiens]
> > > >prf||1813...   738    0.0
> > > gb|AAQ02541.1|  mitogen-activated protein kinase 1
> > > [synthetic con   736    0.0
> > > gb|AAH99905.1|  Mitogen-activated protein kinase 1
> > > [Homo sapiens]   735    0.0
> > > emb|CAI29602.1|  hypothetical protein [Pongo
> > pygmaeus]
> > >              734    0.0
> > > gb|AAH58258.1|  Mitogen activated protein kinase 1
> > > [Mus muscul...   731    0.0
> > > pdb|4ERK|   The Complex Structure Of The Map
> > Kinase
> > > Erk2OLOMOU...   731    0.0
> > > pdb|1GOL|   Coordinates Of Rat Map Kinase Erk2
> > With An
> > > Arginin...   730    0.0
> > > ref|XP_860750.1|  PREDICTED: similar to
> > > Mitogen-activated prot...   729    0.0
> > > gb|AAK56503.1|  extracellular signal-regulated
> > kinase
> > > 2 [Gallu...   726    0.0
> > > ref|XP_860716.1|  PREDICTED: similar to
> > > Mitogen-activated prot...   726    0.0
> > > pdb|2ERK|   Phosphorylated Map Kinase Erk2
> > >              726    0.0
> > > pdb|1PME|   Structure Of Penta Mutant Human Erk2
> > Map
> > > Kinase Co...   725    0.0
> > > ref|XP_860682.1|  PREDICTED: similar to
> > > Mitogen-activated prot...   720    0.0
> > > ref|XP_860651.1|  PREDICTED: similar to
> > > Mitogen-activated prot...   720    0.0
> > > emb|CAA77753.1|  40kDa protein kinase [Homo
> > sapiens]
> > > >prf||181...   717    0.0
> > > ref|NP_001017127.1|  mitogen-activated protein
> > kinase
> > > 1 [Xenopus    715    0.0
> > > dbj|BAE28679.1|  unnamed protein product [Mus
> > > musculus]             713    0.0
> > > emb|CAA42482.1|  MAP kinase [Xenopus laevis]
> > > >gb|AAH60748.1| M...   711    0.0
> > > sp|P26696|MK01_XENLA  Mitogen-activated protein
> > kinase
> > > 1 (Myel...   711    0.0
> > > gb|AAH76730.1|  Xp42 protein [Xenopus laevis]
> > >              706    0.0
> > > gb|AAH65868.1|  Mitogen-activated protein kinase 1
> > > [Danio rerio]    696    0.0
> > > dbj|BAD23843.1|  extracellular signal regulated
> > > protein kinase...   694    0.0
> > > ref|NP_878308.2|  mitogen-activated protein kinase
> > 1
> > > [Danio re...   694    0.0
> > > emb|CAG07778.1|  unnamed protein product
> > [Tetraodon
> >
> === message truncated ===
> 
> 
> __________________________________________________
> Do You Yahoo!?
> Tired of spam?  Yahoo! Mail has the best spam protection around
> http://mail.yahoo.com



From iamvela at yahoo.com  Wed Feb 22 17:32:08 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 14:32:08 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <001701c637fa$b5110120$15327e82@pyrimidine>
Message-ID: <20060222223208.45715.qmail@web34409.mail.mud.yahoo.com>

Chris,

Please see my response below.

--- Chris Fields  wrote:

> You know, I assumed you were using ActivePerl b/c of
> the older version of
> Bioperl (and since it's the most commonly used Perl
> for Windows build).  My
> goof.  It looks like you're using
> Apache/mod_perl/perl, right?  The only
> Perl/Apache/mod_perl combos for Windows I know of
> are listed here:


I am using ActivePerl 5.8.7 downloaded from
activeperl.com. I just happened to install it under
c:\mod_perl\Perl directory (application has hardcoded
dependencies for this directory). I am not using
apache/mod_perl/perl.

Please see below version string returned by perl
executable.

 
C:\bioperl-live\bioperl-live>perl -version

This is perl, v5.8.7 built for
MSWin32-x86-multi-thread
(with 14 registered patches, see perl -V for more
detail)

Copyright 1987-2005, Larry Wall

Binary build 815 [211909] provided by ActiveState
http://www.ActiveState.com
ActiveState is a division of Sophos.
Built Nov  2 2005 08:44:52


> 
> http://perl.apache.org/docs/2.0/os/win32/install.html
> 
> The only Perl for Windows we have actively supported
> is ActivePerl AFAIK,
> but maybe we can walk through this.  Anything
> learned here can be added to
> the installation instructions in case this comes up
> again.
> 
> To start, what mod_perl/Perl version are you using,
> and from what
> distributor (IndigoStar, Apache, etc)?  Each
> distribution should have some
> documentation for installing CPAN modules or
> prebuilt/pretested packages,
> like ActiveState's PPM or IndigoStar's GPM.  I think
> Apache's Perl build is
> from ActiveState's source code so should come with
> PPM.
> 



I used 'ppm' to install packages (DBI, Oracle-DBD,
bioperl etc) before, so this is the first time I tried
to install it using 'nmake' utility.

After downloading the latest bioperl tar ball and
replacing the blast.pm file, can I just do ppm install
bioperl instead of doing nmake?


> Next: you obviously have installed Bioperl before
> (v1.2.3); did you use
> 'make' or 'nmake', or was it from a repository (like
> IndigoPerl's GPM)?
> AFAIK, you would install it like you would any other
> perl module; there
> should be no problem with 'make/nmake', though
> 'make/nmake test' will not
> pass completely (it should pass most tests, though,
> otherwise something is
> seriously wrong).
> 
> The other option, though not as nice, is setting the
> PERL5LIB variable to
> include the bioperl-live directory; it works for me
> while I'm developing. 

I tried setting PERL5LIB, but it did not make any
difference. I am still getting the same errors.


I wanted to do a clean install, so I tried 'nmake clean',
but it looks like there is no 'rm' utility installed on
my machine.

thanks for all your help,
Raghu

> I
> don't know how this may affect other
> mod_perl-related functions, though.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> 
> > -----Original Message-----
> > From: Raghunath Verabelli
> [mailto:iamvela at yahoo.com]
> > Sent: Wednesday, February 22, 2006 3:07 PM
> > To: Chris Fields; bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Blast returns result, but
> does not return hits
> > 
> > Thanks Chris. I am getting below mentioned errors
> with
> > nmake.
> > 
> > As suggested, I downloaded the nmake utility from
> > Microsoft website and the bioperl-live tarball.
> > 
> > After untaring, I replaced the blast.pm file
> (under
> > bioperl-live\Bio\SearchIO) with the blast.pm (86
> KB
> > size) attached to the bug report 1934.
> > 
> > I then did the following to install packages using
> > nmake:
> > 
> > 1) perl Makefile.pl was successful without any
> errors.
> > 
> > 
> > 2) 'c:\nmake' results in following errors
> > 
> >         pl2bat.bat blib\script\bp_unflatten_seq.pl
> >         C:\mod_perl\Perl\bin\perl.exe
> > -MExtUtils::Command -e cp ./scripts_temp/b
> > p_taxid4species.pl blib\script\bp_taxid4species.pl
> >         pl2bat.bat blib\script\bp_taxid4species.pl
> >         C:\mod_perl\Perl\bin\perl.exe
> > -MExtUtils::Command -e cp ./scripts_temp/b
> > p_seqret.pl blib\script\bp_seqret.pl
> >         pl2bat.bat blib\script\bp_seqret.pl
> >         C:\mod_perl\Perl\bin\perl.exe
> "-Iblib\arch"
> > "-Iblib\lib" doc/makedoc.PL
> > bioscripts.pod
> > Can't open bioscripts.pod: No such file or
> directory.
> >         C:\mod_perl\Perl\bin\perl.exe
> "-Iblib\arch"
> > "-Iblib\lib" doc/makedoc.PL
> > biodatabases.pod
> > Can't open biodatabases.pod: No such file or
> > directory.
> >         C:\mod_perl\Perl\bin\perl.exe
> "-Iblib\arch"
> > "-Iblib\lib" doc/makedoc.PL
> > biodesign.pod
> > Can't open biodesign.pod: No such file or
> directory.
> >         C:\mod_perl\Perl\bin\perl.exe
> "-Iblib\arch"
> > "-Iblib\lib" doc/makedoc.PL
> > bioperl.pod
> > Can't open bioperl.pod: No such file or directory.
> > 
> > 
> > 3) 'c:\nmake test' fails with following errors:
> > 
> > NMAKE : fatal error U1095: expanded command line
> > 'C:\mod_perl\Perl\bin\perl.exe
> > "-MExtUtils::Command::MM" "-e" "test_harness(0,
> > 'blib\lib', 'blib\arch')" t\AACh
> > ange.t t\AAReverseMutate.t t\abi.t t\ace.t
> t\AlignIO.t
> > t\AlignStats.t t\AlignUti
> > l.t t\alignUtilities.t t\Allele.t t\Alphabet.t
> > t\Annotation.t t\AnnotationAdapto
> > r.t t\asciitree.t t\Assembly.t t\Biblio.t
> > t\Biblio_biofetch.t t\Biblio_eutils.t
> > t\BiblioReferences.t t\BioDBGFF.t t\BioFetch_DB.t
> > t\BioGraphics.t t\BlastIndex.t
> >  t\BPbl2seq.t t\BPlite.t t\BPpsilite.t
> t\bsml_sax.t
> > t\Chain.t t\chaosxml.t t\cig
> > arstring.t t\ClusterIO.t t\Coalescent.t
> t\CodonTable.t
> > t\Compatible.t t\consed.t
> >  t\CoordinateGraph.t t\CoordinateMapper.t
> > t\Correlate.t t\ctf.t t\CytoMap.t t\DB
> > .t t\DBCUTG.t t\DBFasta.t t\DNAMutation.t
> t\Domcut.t
> > t\ECnumber.t t\ELM.t t\embl
> > .t t\EMBL_DB.t t\EMBOSS_Tools.t t\EncodedSeq.t
> > t\entrezgene.t t\ePCR.t t\ESEfind
> > er.t t\est2genome.t t\Exception.t t\Exonerate.t
> > t\exp.t t\fasta.t t\FeatureIO.t
> > t\flat.t t\FootPrinter.t t\game.t t\GbrowseGFF.t
> > t\gcg.t t\GDB.t t\Gel.t t\genba
> > nk.t t\GeneCoordinateMapper.t t\Geneid.t
> t\Genewise.t
> > t\Genomewise.t t\Genpred.t
> >  t\GFF.t t\GOR4.t t\GOterm.t t\GraphAdaptor.t
> > t\GuessSeqFormat.t t\hmmer.t t\HNN
> > .t t\HtSNP.t t\Index.t t\InstanceSite.t
> t\interpro.t
> > t\InterProParser.t t\IUPAC.
> > t t\kegg.t t\largefasta.t t\LargeLocatableSeq.t
> > t\largepseq.t t\LinkageMap.t t\L
> > iveSeq.t t\LocatableSeq.t t\Location.t
> > t\LocationFactory.t t\LocusLink.t t\lucy.
> > t t\Map.t t\MapIO.t t\masta.t t\Matrix.t
> t\Measure.t
> > t\MeSH.t t\metafasta.t t\Me
> > taSeq.t t\MicrosatelliteMarker.t t\MiniMIMentry.t
> > t\MitoProt.t t\Molphy.t t\Mult
> > iFile.t t\multiple_fasta.t t\Mutation.t
> t\Mutator.t
> > t\NetPhos.t t\Node.t t\OddCo
> > des.t t\OMIMentry.t t\OMIMentryAllelicVariant.t
> > t\OMIMparser.t t\Ontology.t t\On
> > tologyEngine.t t\OntologyStore.t t\PAML.t t\Perl.t
> > t\phd.t t\Phenotype.t t\Phyli
> > pDist.t t\PhysicalMap.t t\pICalculator.t
> t\Pictogram.t
> > t\pir.t t\pln.t t\PopGen.
> > t t\PopGenSims.t t\primaryqual.t t\PrimarySeq.t
> > t\primedseq.t t\Primer.t t\prime
> > r3.t t\Promoterwise.t t\ProtDist.t t\protgraph.t
> > t\ProtMatrix.t t\ProtPsm.t t\Ps
> > eudowise.t t\psm.t t\QRNA.t t\qual.t
> > t\RandDistFunctions.t t\RandomTreeFactory.t
> >  t\Range.t t\RangeI.t t\raw.t t\RefSeq.t
> t\Registry.t
> > t\Relationship.t t\Relatio
> > nshipType.t t\RemoteBlast.t t\RepeatMasker.t
> > t\RestrictionAnalysis.t t\Restricti
> > onEnzyme.t t\RestrictionIO.t t\RNAChange.t
> > t\Root-Utilities.t t\RootI.t t\RootIO
> > .t t\RootStorable.t t\Scansite.t t\scf.t
> > t\SearchDist.t t\SearchIO.t t\Seq.t t\s
> > eq_quality.t t\SeqAnalysisParser.t t\SeqBuilder.t
> > t\SeqDiff.t t\SeqFeatCollectio
> > n.t t\SeqFeature.t t\seqfeaturePrimer.t
> > t\SeqHound_DB.t t\SeqIO.t t\SeqPattern.t
> >  t\seqread_fail.t t\SeqStats.t t\SequenceFamily.t
> > t\sequencetrace.t t\SeqUtils.t
> >  t\SeqVersion.t t\seqwithquality.t t\SeqWords.t
> > t\Sigcleave.t t\Sim4.t t\Similar
> > ityPair.t t\SimpleAlign.t t\simpleGOparser.t
> 
=== message truncated ===



From cjfields at uiuc.edu  Wed Feb 22 19:02:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 22 Feb 2006 18:02:38 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060222223208.45715.qmail@web34409.mail.mud.yahoo.com>
Message-ID: <002101c6380c$75910880$15327e82@pyrimidine>

> 
> I am using ActivePerl 5.8.7 downloaded from
> activeperl.com. I just happened to install it under
> c:\mod_perl\Perl directory (application has hardcoded
> dependencies for this directory). I am not using
> apache/mod_perl/perl.
> 
> Please see below version string returned by perl
> executable.
> 
> 
> C:\bioperl-live\bioperl-live>perl -version
> 
> This is perl, v5.8.7 built for
> MSWin32-x86-multi-thread
> (with 14 registered patches, see perl -V for more
> detail)
> 
> Copyright 1987-2005, Larry Wall
> 
> Binary build 815 [211909] provided by ActiveState
> http://www.ActiveState.com
> ActiveState is a division of Sophos.
> Built Nov  2 2005 08:44:52
 
When you type 'perl -V' what do you see (make sure it is a capital 'V', not
lower case).

> http://perl.apache.org/docs/2.0/os/win32/install.html
> >
> > The only Perl for Windows we have actively supported
> > is ActivePerl AFAIK,
> > but maybe we can walk through this.  Anything
> > learned here can be added to
> > the installation instructions in case this comes up
> > again.
> >
> I used 'ppm' to install packages (DBI, Oracle-DBD,
> bioperl etc) before, so this is the first time I tried
> to install it using 'nmake' utility.
>
> After downloading the latest bioperl tar ball and
> replacing the blast.pm file, can I just do ppm install
> bioperl instead of doing nmake?

Okay, so I know you're using PPM now.  No, you can't do that.  I'm adding a
section to this page:

http://bioperl.open-bio.org/wiki/Making_a_BioPerl_release

about building your own PPM; it will explain everything.  It isn't up yet
but should be up tonight or tomorrow.  BTW, you'll still need a working nmake
for this.  Again, make sure nmake is in your PATH env variable, or at least
in the same directory where you plan to run 'nmake' and 'nmake install'.
Although nmake is buggy I haven't had a problem with it yet.
 
> > Next: you obviously have installed Bioperl before
> > (v1.2.3); did you use
> > 'make' or 'nmake', or was it from a repository (like
> > IndigoPerl's GPM)?
> > AFAIK, you would install it like you would any other
> > perl module; there
> > should be no problem with 'make/nmake', though
> > 'make/nmake test' will not
> > pass completely (it should pass most tests, though,
> > otherwise something is
> > seriously wrong).
> >
> > The other option, though not as nice, is setting the
> > PERL5LIB variable to
> > include the bioperl-live directory; it works for me
> > while I'm developing.
> 
> I tried setting PERL5LIB, but it did not make any
> difference. I am still getting the same errors.
 
Do you mean the errors from nmake or errors from your scripts?  If PERL5LIB
is set properly then perl should search those directories for modules before
it checks the rest of @INC (i.e. you will not need to make and install them
using nmake).

The reason I don't recommend this is that unpacking the entire Bioperl
distribution into a folder and pointing PERL5LIB at it is not the best habit
to get into, but some are forced to do it this way, so it's there if you need
it.  A direct installation is recommended if possible.

The PERL5LIB I use below only contains modules I'm working on or
modifications of current modules (like SearchIO::blast, RemoteBlast, etc).
Bioperl from CVS is installed via PPM (custom-built PPM, BTW, using the
instructions I mentioned).  

The following shows what my PERL5LIB is set to.  Note that the output also
tells you what @INC is set to:

C:\Perl\src\bioperl\bioperl-live>perl -V
Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=MSWin32, osvers=5.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef useithreads=define 



  Compiled at Nov  2 2005 08:44:52
  %ENV:
    PERL5LIB="C:/Perl/src/bioperl/bioperl-live;C:/Perl/src/bioperl/bioperl-db"
  @INC:
    C:/Perl/src/bioperl/bioperl-live
    C:/Perl/src/bioperl/bioperl-db
    C:/Perl/lib
    C:/Perl/site/lib
    .
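
[Editor's sketch] The search order described above can be checked with a few
lines of core Perl: perl walks @INC from the top, and PERL5LIB entries sit at
the front, so the first directory that holds a given .pm file is the one that
gets loaded. The helper name locate_module and the module name used in the
example are illustrative assumptions only; nothing here is part of BioPerl.

```perl
use strict;
use warnings;
use File::Spec;

# Return every copy of a module found in @dirs, in search order.
# The first entry is the copy that "use" or "require" would pick up.
sub locate_module {
    my ($module, @dirs) = @_;
    my @parts = split /::/, $module;      # Bio::SearchIO::blast -> Bio, SearchIO, blast
    $parts[-1] .= '.pm';
    my @found;
    for my $dir (@dirs) {
        next if ref $dir;                 # skip code-ref hooks that may sit in @INC
        my $path = File::Spec->catfile($dir, @parts);
        push @found, $path if -f $path;
    }
    return @found;
}

# Illustrative check: which copy of the blast parser wins on this machine?
my $module = 'Bio::SearchIO::blast';
my @copies = locate_module($module, @INC);
if (@copies) {
    print "perl would load: $copies[0]\n";
    print "shadowed copy:   $_\n" for @copies[1 .. $#copies];
} else {
    print "$module not found in \@INC\n";
}
```

If an old ppm-installed copy is listed first, the patched file in the
PERL5LIB directory is never reached, which would match the "same errors"
symptom.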


From iamvela at yahoo.com  Wed Feb 22 21:25:02 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 18:25:02 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <002101c6380c$75910880$15327e82@pyrimidine>
Message-ID: <20060223022502.31234.qmail@web34406.mail.mud.yahoo.com>


Thanks very much Chris for your time.
Please see below output that you requested (the only
difference i saw between your output and mine is @INC
value. I have only 2 directories c:\mod_perl\perl
where i installed activeperl. I see two additional
directories in your @INC path).

>  
> When you type 'perl -V' what do you see (make sure
> it is a capital 'V', not
> lower case).

C:\Documents and Settings\Administrator>perl  -V
Summary of my perl5 (revision 5 version 8 subversion
7) configuration:
  Platform:
    osname=MSWin32, osvers=5.0,
archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef
useithreads=define usemultiplicity=de
fine
    useperlio=define d_sfio=undef uselargefiles=define
usesocks=undef
    use64bitint=undef use64bitall=undef
uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -Gf -W3 -MD -Zi
-DNDEBUG -O1 -DWIN32 -D_CONSOLE -
DNO_STRICT -DHAVE_DES_FCRYPT -DNO_HASH_SEED
-DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_
CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO
-DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='12.00.8804', gccversion='',
gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8,
byteorder=1234
    d_longlong=undef, longlongsize=8,
d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double',
nvsize=8, Off_t='__int64', lseeksi
ze=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug
-opt:ref,icf  -libpath:"C:
\mod_perl\Perl\lib\CORE"  -machine:x86'
    libpth=\lib
    libs=  oldnames.lib kernel32.lib user32.lib
gdi32.lib winspool.lib  comdlg32
.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib 
netapi32.lib uuid.lib ws2_
32.lib mpr.lib winmm.lib  version.lib odbc32.lib
odbccp32.lib msvcrt.lib
    perllibs=  oldnames.lib kernel32.lib user32.lib
gdi32.lib winspool.lib  comd
lg32.lib advapi32.lib shell32.lib ole32.lib
oleaut32.lib  netapi32.lib uuid.lib
ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib
odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes,
libperl=perl58.lib
    gnulibc_version='undef'
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef,
ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo
-nodefaultlib -debug -opt:ref,icf  -
libpath:"C:\mod_perl\Perl\lib\CORE"  -machine:x86'


Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY USE_ITHREADS
USE_LARGE_FILES
                        USE_SITECUSTOMIZE
PERL_IMPLICIT_CONTEXT
                        PERL_IMPLICIT_SYS
  Locally applied patches:
        ActivePerl Build 815 [211909]
        Iin_load_module moved for compatibility with
build 806
        PerlEx support in CGI::Carp
        Less verbose ExtUtils::Install and Pod::Find
        instmodsh upgraded from
ExtUtils-MakeMaker-6.25
        Patch for CAN-2005-0448 from Debian with
modifications
        Upgrade to Time-HiRes-1.76
        25774 Keys of %INC always use forward slashes
        25747 Accidental interpolation of $@ in
Pod::Html
        25362 File::Path::mkpath resets errno
        25181 Incorrect (X)HTML generated by Pod::Html
        24999 Avoid redefinition warning for MinGW
        24699 ICMP_UNREACHABLE handling in Net::Ping
        21540 Fix backward-compatibility issues in
if.pm
  Built under MSWin32
  Compiled at Nov  2 2005 08:44:52
  %ENV:
    PERL5LIB="c:\bioperl-live"
  @INC:
    c:\bioperl-live
    C:/mod_perl/Perl/lib
    C:/mod_perl/Perl/site/lib
    .




From michael.watson at bbsrc.ac.uk  Thu Feb 23 05:17:39 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 23 Feb 2006 10:17:39 -0000
Subject: [Bioperl-l] CONTIG sequence files from the NCBI
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503008306@iahce2ksrv1.iah.bbsrc.ac.uk>

What I mean is, you have accession1, which is a contig file referring to
n other sequence files.  Accession1 has a version number.  Is that
version number increased when one of the sequences that constitute it is
updated? 

-----Original Message-----
From: Brian Osborne [mailto:osborne1 at optonline.net] 
Sent: 18 February 2006 04:56
To: michael watson (IAH-C); bioperl-l
Subject: Re: [Bioperl-l] CONTIG sequence files from the NCBI

Michael,

Yes, BioPerl has done this for you. Essentially what it does is take all
the ids in the CONTIG section and query for each individually, then use
the sequences and the location data to create the single large sequence.
This sequence is appended to the annotation and feature section of the
initial Genbank entry. If you want to study this yourself take a look at
Bio::DB::NCBIHelper::postprocess_data.
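
[Editor's sketch] To make the joining step concrete, here is a toy version of
the idea in plain Perl. It is not the code in
Bio::DB::NCBIHelper::postprocess_data; it only assumes the usual GenBank
CONTIG syntax of join(ACC.V:start..end, gap(N), complement(...)) and works on
sequences already held in a hash. The accessions A.1 and B.1, the sequences,
and the function name assemble_contig are made up for illustration.

```perl
use strict;
use warnings;

# Build one sequence from a CONTIG-style join() string.
# $seqs maps "accession.version" to sequence text (toy data below).
sub assemble_contig {
    my ($join, $seqs) = @_;
    $join =~ s/^\s*join\((.*)\)\s*$/$1/s or die "not a join(): $join";
    my $assembled = '';
    for my $part (split /,/, $join) {
        if ($part =~ /^gap\((\d+)\)$/) {
            $assembled .= 'N' x $1;                      # gaps become runs of N
        }
        elsif ($part =~ /^(complement\()?([\w.]+):(\d+)\.\.(\d+)\)?$/) {
            my ($rev, $id, $start, $end) = ($1, $2, $3, $4);
            my $src = $seqs->{$id};
            die "no sequence for $id" unless defined $src;
            my $piece = substr($src, $start - 1, $end - $start + 1);
            if ($rev) {                                  # reverse-complement the piece
                $piece = scalar reverse $piece;
                $piece =~ tr/ACGTacgt/TGCAtgca/;
            }
            $assembled .= $piece;
        }
        else {
            die "unrecognised CONTIG element: $part";    # e.g. gap(unk100) is not handled
        }
    }
    return $assembled;
}

my %toy = ('A.1' => 'AAAACCCC', 'B.1' => 'GGGTTTAA');
print assemble_contig('join(A.1:1..4,gap(3),complement(B.1:2..5))', \%toy), "\n";
# prints AAAANNNAACC
```

The version number on each element (the .1 here) is what pins a CONTIG record
to specific versions of its component sequences.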

OK, to answer your first question with my assumption: what NCBI is doing
is simply providing a shorthand rather than an entire large sequence,
therefore no feature coordinates change, whether it's shorthand, CONTIG,
or longhand, ORIGIN. Second, my explanation tells you that all the
sequences are the very latest versions of each sequence, that's how
eutils works by default.
However, I don't think I've answered your question because I'm not sure
I understand what you mean by "when I ask bioperl if these sequences
have been updated, I will be told no". All Bioperl does is read the file
provided by GenBank and use its stated version, nothing fancy.

Brian O.


On 2/16/06 5:31 AM, "michael watson (IAH-C)" wrote:

> Hi
> 
> I have two questions really.  I fetched bacterial genome sequences 
> from the NCBI using Bio::DB::GenBank.
> 
> Some of these sequence entries are CONTIG sequences, ie they just 
> point to other sequences that need to be joined together to form the 
> entire genome.
> 
> Looking at my downloads, it looks as if bioperl has done all the 
> necessary joining for me - or maybe it was the NCBI that did the 
> joining?
> 
> OK, so firstly, did bioperl do the joining, and if so, are all the 
> co-ordinates of the features updated to reflect their new location on 
> the new, joined sequence?
> 
> And secondly, sequence versions... I'm thinking that possibly the 
> sequence version of the CONTIG may be 1 (as it hasn't changed) yet the
> versions of the sequences it refers to might have changed, so when I 
> ask bioperl if these sequences have been updated, I will be told no 
> because the CONTIG sequence version is 1, but I should be told yes 
> because the underlying sequences have...?
> 
> Make sense?
> 
> Thanks
> Mick




From neetisomaiya at gmail.com  Thu Feb 23 05:26:23 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 23 Feb 2006 15:56:23 +0530
Subject: [Bioperl-l] urgent help required - syntax for using paramaters
	different from default in standalone blast
In-Reply-To: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>
References: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>
Message-ID: <764978cf0602230226vb907821x5407599bf9accf44@mail.gmail.com>

Hi,

I am running standalone blast and I wanna use a particular e value, gap open
and extension cost and matrix. Is the following the correct syntax for the
same :

    my $Seq_in = Bio::SeqIO->new (-file => $file, -format => 'fasta');
    my $query = $Seq_in->next_seq();
    my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastn',
                                                        'database' => 'human.rna.fna',
                                                        _READMETHOD => "Blast"
                                                        );
    $factory->e(0.0001);
    $factory->G(-11);
    $factory->E(-1);
    $factory->M('BLOSUM80');

    my $blast_report = $factory->blastall($query);
    my $result = $blast_report->next_result;

--
-Neeti
Even my blood says, B positive


From neetisomaiya at gmail.com  Thu Feb 23 05:45:19 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 23 Feb 2006 16:15:19 +0530
Subject: [Bioperl-l] using parameters other than default in standalone blast
Message-ID: <764978cf0602230245m45747fexbb42074a98515177@mail.gmail.com>

Hi,

I am running standalone blast and I wanna use a particular e value, gap open
and extension cost and matrix. Is the following the correct syntax for the
same :

    my $Seq_in = Bio::SeqIO->new (-file => $file, -format => 'fasta');
    my $query = $Seq_in->next_seq();
    my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastn',
                                                        'database' => 'human.rna.fna',
                                                        _READMETHOD => "Blast"
                                                        );
    $factory->e(0.0001);
    $factory->G(-11);
    $factory->E(-1);
    $factory->M('BLOSUM80');

    my $blast_report = $factory->blastall($query);
    my $result = $blast_report->next_result;

--
-Neeti
Even my blood says, B positive


From neetisomaiya at gmail.com  Thu Feb 23 05:14:46 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 23 Feb 2006 15:44:46 +0530
Subject: [Bioperl-l] urgent help required - syntax for using paramaters
	different from default in standalone blast
Message-ID: <764978cf0602230214r4b2a5efcl69ac207789379416@mail.gmail.com>

Hi,

I am running standalone blast and I wanna use a particular e value, gap open
and extension cost and matrix. Is the following the correct syntax for the
same :

    my $Seq_in = Bio::SeqIO->new (-file => $file, -format => 'fasta');
    my $query = $Seq_in->next_seq();
    my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastn',
                                                        'database' => 'human.rna.fna',
                                                        _READMETHOD => "Blast"
                                                        );
    $factory->e(0.0001);
    $factory->G(-11);
    $factory->E(-1);
    $factory->M('BLOSUM80');

    my $blast_report = $factory->blastall($query);
    my $result = $blast_report->next_result;

--
-Neeti
Even my blood says, B positive



From neetisomaiya at gmail.com  Thu Feb 23 05:13:10 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 23 Feb 2006 15:43:10 +0530
Subject: [Bioperl-l] urgent help required - syntax for using paramaters
	different from default in standalone blast
Message-ID: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>

Hi,

I am running standalone blast and I wanna use a particular e value, gap open
and extension cost and matrix. Is the following the correct syntax for the
same :

    my $Seq_in = Bio::SeqIO->new (-file => $file, -format => 'fasta');
    my $query = $Seq_in->next_seq();
    my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastn',
                                                        'database' => 'human.rna.fna',
                                                        _READMETHOD => "Blast"
                                                        );
    $factory->e(0.0001);
    $factory->G(-11);
    $factory->E(-1);
    $factory->M('BLOSUM80');

    my $blast_report = $factory->blastall($query);
    my $result = $blast_report->next_result;

--
-Neeti
Even my blood says, B positive



From cjfields at uiuc.edu  Thu Feb 23 09:39:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 08:39:40 -0600
Subject: [Bioperl-l] urgent help required - syntax for using
	paramatersdifferent from default in standalone blast
In-Reply-To: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>
Message-ID: <000301c63886$fa95eb20$15327e82@pyrimidine>

Have you tried this to see if it works?  The blast report itself should tell
you if everything is set correctly.  Use 'perldoc
Bio::Tools::Run::StandAloneBlast', which explains everything.  I don't
know if the example script works but the test script StandAloneBlast.t (in
/t) should; that will give you plenty of examples for setting parameters.
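
[Editor's sketch] One way to sanity-check a parameter set is to write out the
blastall command line it corresponds to, since the single-letter settings in
the question (e, G, E, M) are named after blastall's own flags. The helper
below is plain Perl and does not call BioPerl or run BLAST; the function name
blastall_args, the query file name, and the gap values are assumptions for
illustration. As far as I know blastall expects gap costs as positive
integers, and -M names a protein scoring matrix, so it is left out of this
blastn example.

```perl
use strict;
use warnings;

# Turn a hash of single-letter blastall options into an argument list.
# Flags are sorted so the output is stable from run to run.
sub blastall_args {
    my (%opt) = @_;
    my @args = ('blastall');
    for my $flag (sort keys %opt) {
        push @args, "-$flag", $opt{$flag};
    }
    return @args;
}

my @cmd = blastall_args(
    p => 'blastn',            # program
    d => 'human.rna.fna',     # database, as in the question
    i => 'query.fasta',       # hypothetical query file
    e => '0.0001',            # expectation value cutoff
    G => 11,                  # gap open cost (assumed intent: 11)
    E => 1,                   # gap extension cost (assumed intent: 1)
);
print join(' ', @cmd), "\n";
# prints blastall -E 1 -G 11 -d human.rna.fna -e 0.0001 -i query.fasta -p blastn
```

Running the printed command by hand and comparing its report with the one
produced through the factory is a quick way to see whether the settings took
effect.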

And please, don't spam the bioperl-l list with repeated emails (four at last
count over 2 1/2 hours).
 
Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of neeti somaiya
> Sent: Thursday, February 23, 2006 4:13 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] urgent help required - syntax for using
> paramatersdifferent from default in standalone blast
> 
> Hi,
> 
> I am running standalone blast and I wanna use a particular e value, gap
> open
> and extension cost and matrix. Is the following the correct syntax for the
> same :
> 
>     my $Seq_in = Bio::SeqIO->new (-file => $file, -format => 'fasta');
>     my $query = $Seq_in->next_seq();
>     my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastn',
>                                                         'database' => 'human.rna.fna',
>                                                         _READMETHOD => "Blast"
>                                                         );
>     $factory->e(0.0001);
>     $factory->G(-11);
>     $factory->E(-1);
>     $factory->M('BLOSUM80');
> 
>     my $blast_report = $factory->blastall($query);
>     my $result = $blast_report->next_result;
> 
> --
> -Neeti
> Even my blood says, B positive


From cjfields at uiuc.edu  Thu Feb 23 10:23:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 09:23:53 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060223022502.31234.qmail@web34406.mail.mud.yahoo.com>
Message-ID: <000a01c6388d$281ed010$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Raghunath Verabelli
> Sent: Wednesday, February 22, 2006 8:25 PM
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> 
> 
> Thanks very much Chris for your time.
> Please see below output that you requested (the only
> difference i saw between your output and mine is @INC
> value. I have only 2 directories c:\mod_perl\perl
> where i installed activeperl. I see two additional
> directories in your @INC path).
> 
> >
> > When you type 'perl -V' what do you see (make sure
> > it is a capital 'V', not
> > lower case).
> 
> C:\Documents and Settings\Administrator>perl  -V
> Summary of my perl5 (revision 5 version 8 subversion
> 7) configuration:
>   Platform:
>     osname=MSWin32, osvers=5.0,
> archname=MSWin32-x86-multi-thread

[....]

> if.pm
>   Built under MSWin32
>   Compiled at Nov  2 2005 08:44:52
>   %ENV:
>     PERL5LIB="c:\bioperl-live"
>   @INC:
>     c:\bioperl-live
>     C:/mod_perl/Perl/lib
>     C:/mod_perl/Perl/site/lib
>     .

Personally I wouldn't place the bioperl-live folder in the root
directory; this shouldn't make a difference, but you can try moving it to
the perl directory in a separate folder to see if that helps.  Can't see why
it would make a difference, but it is Windows... Main reason I'm switching
over to Mac OS X!

Make sure that the Bio directory is in the bioperl-live directory,
regardless (i.e. if PERL5LIB is set to
C:\mod_perl\Perl\bioperl\bioperl-live, then there should be a directory like
C:\mod_perl\Perl\bioperl\bioperl-live\Bio).  Otherwise it won't work.
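
One way to check the layout described above, as a rough sketch using core
Perl only: list every directory in @INC that actually contains a Bio
subdirectory. The helper name dirs_with_bio is made up for this example and
is not part of BioPerl.

```perl
use strict;
use warnings;
use File::Spec;

# Return the directories from the given list that contain a Bio/ subdirectory.
# (dirs_with_bio is a hypothetical helper, not a BioPerl function.)
sub dirs_with_bio {
    my @dirs = grep { !ref } @_;    # skip code-ref hooks that can live in @INC
    return grep { -d File::Spec->catdir($_, 'Bio') } @dirs;
}

my @found = dirs_with_bio(@INC);
if (@found) {
    print "Bio/ found under: $_\n" for @found;
}
else {
    print "No Bio/ directory found in \@INC; check PERL5LIB\n";
}
```

If nothing is listed, PERL5LIB is probably pointing one level too high or
too low in the bioperl-live tree.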

What do you get with this?

perl -MBio::Root::Version -e "print $Bio::Root::Version::VERSION"

If everything is working (PERL5LIB, etc) then it should be 1.5 for CVS
bioperl; otherwise it will either find the old version (1.2.3) or fail
completely.
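
A slightly longer form of the same check, as a sketch: besides the version
it prints the file the module was loaded from, which shows whether the
PERL5LIB copy or an older installed copy was picked up. The helper name
module_info is an assumption for this example; only core Perl is used.

```perl
use strict;
use warnings;

# Try to load a module by name. Returns (version, path-loaded-from) on
# success and an empty list on failure. (module_info is a hypothetical helper.)
sub module_info {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::Root::Version -> Bio/Root/Version.pm
    return unless eval { require $file; 1 };
    my $version = $module->VERSION;            # the package's $VERSION, if it has one
    return (defined $version ? $version : 'unknown', $INC{$file});
}

my @info = module_info('Bio::Root::Version');
if (@info) {
    print "Bio::Root::Version $info[0] loaded from $info[1]\n";
}
else {
    print "Bio::Root::Version could not be loaded; check PERL5LIB\n";
}
```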

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 



From iamvela at yahoo.com  Thu Feb 23 11:23:56 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Thu, 23 Feb 2006 08:23:56 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000a01c6388d$281ed010$15327e82@pyrimidine>
Message-ID: <20060223162356.97964.qmail@web34405.mail.mud.yahoo.com>

Thanks Chris for all your help.

The patch for blast.pm worked. I was able to parse the
hits from the raw file. I uninstalled previous
versions of bioperl using ppm and then I installed
bioperl 1.4.x using nmake, and applied your fix. I am
getting hits the way I wanted.

However, I noticed that the p-value for each hit
doesn't seem to be parsed
correctly. It sets it to 0 for all hits. Not sure if
this is a known issue. Any suggestions/comments,
please let me know.

Thanks,
Raghu




From cjfields at uiuc.edu  Thu Feb 23 12:41:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 11:41:07 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060223162356.97964.qmail@web34405.mail.mud.yahoo.com>
Message-ID: <000301c638a0$53eb9a30$15327e82@pyrimidine>

Yes, that's a potential issue.  I'll try to replicate that here; please send
a code example so I can see how you're retrieving the p-value.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 



From cjfields at uiuc.edu  Thu Feb 23 13:06:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 12:06:37 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000301c638a0$53eb9a30$15327e82@pyrimidine>
Message-ID: <000401c638a3$e37fb520$15327e82@pyrimidine>

Hold up a second.  Do you mean e-value, or p-value?  A run-of-the-mill NCBI
BLAST report these days gives e-values (expectation values), NOT p-values.  I
think they changed over to using only e-values with BLAST v2.  Make sure you
didn't mix these up; look at the text output to make sure that P values are
present.  If they aren't, that would explain why you're getting 0: there is
nothing for the parser to find.

From the BLAST tutorial:

The BLAST programs report E-value rather than P-values because it is easier
to understand the difference between, for example, E-value of 5 and 10 than
P-values of 0.993 and 0.99995. However, when E < 0.01, P-values and E-value
are nearly identical.
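
The numbers in that quote follow from the relationship between the two
statistics, which to my understanding is P = 1 - exp(-E). A small sketch in
plain Perl (evalue_to_pvalue is a name made up for this example, not a
BioPerl method) shows why the two values converge for small E:

```perl
use strict;
use warnings;

# P = 1 - exp(-E): the probability of seeing at least one chance hit with
# a score this good, given an expected count of E such hits.
# (evalue_to_pvalue is a hypothetical helper, not part of BioPerl.)
sub evalue_to_pvalue {
    my ($e) = @_;
    return 1 - exp(-$e);
}

# Print a few E-values next to the corresponding P-values.
printf "E = %-6g  P = %.6f\n", $_, evalue_to_pvalue($_) for (0.001, 0.01, 5, 10);
```

For E well below 0.01 the two columns are almost the same number, while E
values of 5 and 10 both map to P values very close to 1.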

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From jason.stajich at duke.edu  Thu Feb 23 13:29:57 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 23 Feb 2006 13:29:57 -0500
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000401c638a3$e37fb520$15327e82@pyrimidine>
References: <000401c638a3$e37fb520$15327e82@pyrimidine>
Message-ID: <698C1C9B-BB22-44C5-A277-4CE9930C1BCD@duke.edu>

P-values do show up in WU-BLAST reports, which is why we have a p-value
function.



--
Jason Stajich
Duke University
http://www.duke.edu/~jes12



From cjfields at uiuc.edu  Thu Feb 23 13:34:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 12:34:19 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <698C1C9B-BB22-44C5-A277-4CE9930C1BCD@duke.edu>
Message-ID: <000501c638a7$c2802630$15327e82@pyrimidine>

I think Raghu's running NCBI BLAST, though.  Am I right? 

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From iamvela at yahoo.com  Thu Feb 23 14:33:50 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Thu, 23 Feb 2006 11:33:50 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000501c638a7$c2802630$15327e82@pyrimidine>
Message-ID: <20060223193350.1917.qmail@web34410.mail.mud.yahoo.com>

Chris, you are right. I am using NCBI BLAST.

Here is my HTTP query:

my $urltext =
"http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=$seq&DATABASE=nr&PROGRAM=blastp";
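
A URL built by interpolating $seq directly works only while the sequence
contains nothing that needs escaping; a FASTA header, spaces or newlines
would break it. A minimal sketch of building the same query string with
percent-encoding, using core Perl only (url_encode and blast_put_url are
hypothetical helper names, and the sequence below is invented sample input):

```perl
use strict;
use warnings;

# Percent-encode every byte except the RFC 3986 "unreserved" characters.
# Assumes the input is a plain byte string (ASCII sequence data).
sub url_encode {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Join a base URL and a parameter hash into one request URL.
# Keys are sorted so the output is stable from run to run.
sub blast_put_url {
    my ($base, %param) = @_;
    my $query = join '&',
        map { "$_=" . url_encode($param{$_}) } sort keys %param;
    return "$base?$query";
}

my $seq = ">query one\nMKVLAAGIVG";    # made-up FASTA input for illustration
print blast_put_url(
    'http://www.ncbi.nlm.nih.gov/blast/Blast.cgi',
    CMD      => 'Put',
    QUERY    => $seq,
    DATABASE => 'nr',
    PROGRAM  => 'blastp',
), "\n";
```

The parameter names are the ones from the URL above; nothing else about
the NCBI interface is assumed here.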

This is my code for populating p-value:

my $pValue = $bioPerlHit->significance;


I looked at the text output and could not find any P
value column; the only 'value' column in the output is
'E value'. I will try that.

Thanks,
Raghu
 
--- Chris Fields  wrote:

> I think Raghu's running NCBI BLAST, though.  Am I
> right? 
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> 
> > -----Original Message-----
> > From: Jason Stajich
> [mailto:jason.stajich at duke.edu]
> > Sent: Thursday, February 23, 2006 12:30 PM
> > To: Chris Fields
> > Cc: 'Raghunath Verabelli';
> bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Blast returns result, but
> does not return hits
> > 
> > p-values do show up in WU-BLAST reports so that is
> why we have a p-
> > value function.
> > 
> > 
> > On Feb 23, 2006, at 1:06 PM, Chris Fields wrote:
> > 
> > > Hold up a second.  Do you mean e-value, or
> p-value?  A run-of-the-
> > > mill NCBI
> > > blast report these days gives e-values
> (expectation value), NOT p-
> > > values.  I
> > > think they changed over to using only e-values
> with BLAST v2.  Make
> > > sure you
> > > didn't mix these up; look out the text output to
> make sure that P
> > > values are
> > > present.  That would explain why you're getting
> 0, since they don't
> > > exist.
> > >
> > >> From the BLAST tutorial:
> > >
> > > The BLAST programs report E-value rather than
> P-values because it
> > > is easier
> > > to understand the difference between, for
> example, E-value of 5 and
> > > 10 than
> > > P-values of 0.993 and 0.99995. However, when E <
> 0.01, P-values and
> > > E-value
> > > are nearly identical.
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-
> > >> bounces at lists.open-bio.org] On Behalf Of Chris
> Fields
> > >> Sent: Thursday, February 23, 2006 11:41 AM
> > >> To: 'Raghunath Verabelli';
> bioperl-l at lists.open-bio.org
> > >> Subject: Re: [Bioperl-l] Blast returns result,
> but does not return
> > >> hits
> > >>
> > >> Yes that's a potential issue.  I'll try to
> replicate that here;
> > >> please
> > >> send
> > >> a code example so I can see how you're calling
> for the p-value.
> > >>
> > >> Christopher Fields
> > >> Postdoctoral Researcher - Switzer Lab
> > >> Dept. of Biochemistry
> > >> University of Illinois Urbana-Champaign
> > >>
> > >>> -----Original Message-----
> > >>> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-
> > >>> bounces at lists.open-bio.org] On Behalf Of
> Raghunath Verabelli
> > >>> Sent: Thursday, February 23, 2006 10:24 AM
> > >>> To: Chris Fields; bioperl-l at lists.open-bio.org
> > >>> Subject: Re: [Bioperl-l] Blast returns result,
> but does not
> > >>> return hits
> > >>>
> > >>> Thanks Chris for all your help.
> > >>>
> > >>> The patch for blast.pm worked. I was able to
> parse the
> > >>> hits from the raw file. I uninstalled previous
> > >>> versions of bioperl using ppm and then I
> installed
> > >>> bioperl 1.4.x using nmake, and applied your
> fix. I am
> > >>> getting hits the way I wanted.
> > >>>
> > >>> However, I noticed that the p-value for each
> hit
> > >>> doesn't seem to be parsed
> > >>> correctly. It sets it to 0 for all hits. Not
> sure if
> > >>> this is a known issue. Any
> suggestions/comments,
> > >>> please let me know.
> > >>>
> > >>> Thanks,
> > >>> Raghu
> > >>>
> > >>> --- Chris Fields  wrote:
> > >>>
> > >>>>> -----Original Message-----
> > >>>>> From: bioperl-l-bounces at lists.open-bio.org
> > >>>> [mailto:bioperl-l-
> > >>>>> bounces at lists.open-bio.org] On Behalf Of
> Raghunath
> > >>>> Verabelli
> > >>>>> Sent: Wednesday, February 22, 2006 8:25 PM
> > >>>>> To: Chris Fields;
> bioperl-l at lists.open-bio.org
> > >>>>> Subject: Re: [Bioperl-l] Blast returns
> result, but
> > >>>> does not return hits
> > >>>>>
> > >>>>>
> > >>>>> Thanks very much Chris for your time.
> > >>>>> Please see below output that you requested
> (the
> > >>>> only
> > >>>>> difference i saw between your output and
> mine is
> > >>>> @INC
> > >>>>> value. I have only 2 directories
> c:\mod_perl\perl
> > >>>>> where i installed activeperl. I see two
> additional
> > >>>>> directories in your @INC path).
> > >>>>>
> > >>>>>>
> > >>>>>> When you type 'perl -V' what do you see
> (make
> > >>>> sure
> > >>>>>> it is a capital 'V', not
> > >>>>>> lower case).
> > >>>>>
> > >>>>> C:\Documents and Settings\Administrator>perl
>  -V
> > >>>>> Summary of my perl5 (revision 5 version 8
> > >>>> subversion
> > >>>>> 7) configuration:
> > >>>>>   Platform:
> > >>>>>     osname=MSWin32, osvers=5.0,
> > >>>>> archname=MSWin32-x86-multi-thread
> > >>>>
> > >>>> [....]
> > >>>>
> > >>>>> if.pm
> > >>>>>   Built under MSWin32
> > >>>>>   Compiled at Nov  2 2005 08:44:52
> > >>>>>   %ENV:
> > >>>>>     PERL5LIB="c:\bioperl-live"
> > >>>>>   @INC:
> > >>>>>     c:\bioperl-live
> > >>>>>     C:/mod_perl/Perl/lib
> > >>>>>     C:/mod_perl/Perl/site/lib
> > >>>>>     .
> > >>>>
> > >>>> Personally I wouldn't place the
> bioperl-live
> > >>>> folder in the root
> > >>>> directory; this shouldn't make a difference,
> but you
> > >>>> can try moving it to
> > >>>> the perl directory in a separate folder to
> see if
> > >>>> that helps.  Can't see why
> > >>>> it would make a difference, but it is
> Windows...
> > >>>> Main reason I'll switching
> > >>>> over to Mac OS X!
> > >>>>
> > >>>> Make sure that the Bio directory is in the
> > >>>> bioperl-live directory,
> > >>>> regardless (i.e. if PERL5LIB is set to
> > >>>> C:\mod_perl\Perl\bioperl\bioperl-live, then
> there
> > >>>> should be a directory like
> > >>>> C:\Perl\bioperl\bioperl-live\Bio).  Otherwise
> it
> > >>>> won't work.
> > >>>>
> 
=== message truncated ===



From cjfields at uiuc.edu  Thu Feb 23 16:11:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 15:11:38 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060223193350.1917.qmail@web34410.mail.mud.yahoo.com>
Message-ID: <000301c638bd$bc9eb590$15327e82@pyrimidine>

I think you want $hit->expect (for hits) or $hsp->evalue (for HSPs).
$hit->significance (for NCBI blast) gives the values from the descriptions
(the score and expect) for each hit.
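
As a side note on the E-value/P-value distinction discussed in this thread:
the two are commonly related by P = 1 - exp(-E), which is why they nearly
coincide for small E. The following is only a minimal sketch in plain Perl
(no BioPerl needed); the helper name evalue_to_pvalue is made up for
illustration and is not part of any BioPerl module.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a BLAST E-value to the corresponding
# P-value, assuming the usual Poisson relation P = 1 - exp(-E).
sub evalue_to_pvalue {
    my ($e) = @_;
    return 1 - exp(-$e);
}

# Print a small table so the two scales can be compared side by side.
for my $e (0.001, 0.01, 5, 10) {
    printf "E = %-6g  P = %.5f\n", $e, evalue_to_pvalue($e);
}
```

For E-values of 5 and 10 this gives roughly 0.993 and 0.99995, the figures
used in the BLAST tutorial passage quoted in this thread, while for E = 0.01
the two values differ only in the fifth decimal place.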

If you want to see what methods are available for any given object (in this
case Bio::Search::Hit::BlastHit or Bio::Search::HSP::BlastHSP), use the
script below from the bioperl FAQ (use PPM to install Class::Inspector
first) and pass the module name on the command line.  Be careful, as
many of these methods are combined getters/setters (so don't pass any args).
----------------------------------
#!perl
use Class::Inspector;
$class = shift || die "Usage: methods perl_class_name\n";
eval "require $class";
print join("\n", sort @{Class::Inspector->methods($class,'full','public')}),
"\n";
----------------------------------
You should get something like this:

C:\Perl\Scripts>methods.pl Bio::Search::Hit::BlastHit
Bio::Root::Root::DESTROY
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented
Bio::Search::Hit::BlastHit::expect
Bio::Search::Hit::BlastHit::found_again
Bio::Search::Hit::BlastHit::iteration
Bio::Search::Hit::BlastHit::new
Bio::Search::Hit::GenericHit::accession
Bio::Search::Hit::GenericHit::add_hsp
Bio::Search::Hit::GenericHit::algorithm
Bio::Search::Hit::GenericHit::ambiguous_aln
Bio::Search::Hit::GenericHit::bits
Bio::Search::Hit::GenericHit::description
Bio::Search::Hit::GenericHit::each_accession_number
Bio::Search::Hit::GenericHit::end
Bio::Search::Hit::GenericHit::frac_aligned_hit
Bio::Search::Hit::GenericHit::frac_aligned_query
Bio::Search::Hit::GenericHit::frac_conserved
Bio::Search::Hit::GenericHit::frac_identical
Bio::Search::Hit::GenericHit::frame
Bio::Search::Hit::GenericHit::gaps
Bio::Search::Hit::GenericHit::hsp
Bio::Search::Hit::GenericHit::hsps
Bio::Search::Hit::GenericHit::length
Bio::Search::Hit::GenericHit::length_aln
Bio::Search::Hit::GenericHit::locus
Bio::Search::Hit::GenericHit::logical_length
Bio::Search::Hit::GenericHit::matches
Bio::Search::Hit::GenericHit::n
Bio::Search::Hit::GenericHit::name
Bio::Search::Hit::GenericHit::next_hsp
Bio::Search::Hit::GenericHit::num_hsps
Bio::Search::Hit::GenericHit::num_unaligned_hit
Bio::Search::Hit::GenericHit::num_unaligned_query
Bio::Search::Hit::GenericHit::num_unaligned_sbjct
Bio::Search::Hit::GenericHit::overlap
Bio::Search::Hit::GenericHit::p
Bio::Search::Hit::GenericHit::query_length
Bio::Search::Hit::GenericHit::range
Bio::Search::Hit::GenericHit::rank
Bio::Search::Hit::GenericHit::raw_score
Bio::Search::Hit::GenericHit::rewind
Bio::Search::Hit::GenericHit::score
Bio::Search::Hit::GenericHit::seq_inds
Bio::Search::Hit::GenericHit::significance
Bio::Search::Hit::GenericHit::start
Bio::Search::Hit::GenericHit::strand
Bio::Search::Hit::GenericHit::tiled_hsps
Bio::Search::Hit::HitI::hit_description
Bio::Search::Hit::HitI::hit_length

Nice, huh?
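
If installing Class::Inspector is not an option, a rough equivalent can be
sketched with core Perl only, by walking the class hierarchy and looking at
each package's symbol table. This is an illustration under assumptions, not
the FAQ script: the function name public_methods and the My::Base/My::Derived
demo classes are invented here, and the listing will also include any
functions a package has imported.

```perl
use strict;
use warnings;
use mro;    # core module; provides mro::get_linear_isa

# Hypothetical helper: list fully qualified public methods of a class,
# searching the class itself first and then its parents in MRO order.
sub public_methods {
    my ($class) = @_;
    my %seen;
    for my $pkg (@{ mro::get_linear_isa($class) }) {
        no strict 'refs';
        for my $name (keys %{"${pkg}::"}) {
            next if $name =~ /^_/;                        # skip private names
            next unless defined &{"${pkg}::${name}"};     # keep subs only
            $seen{$name} //= "${pkg}::${name}";           # first definition wins
        }
    }
    return sort values %seen;
}

# Demo classes standing in for a real module such as a BioPerl hit class.
{ package My::Base;    sub new { bless {}, shift } sub name { 'base' } sub _private { 1 } }
{ package My::Derived; our @ISA = ('My::Base');    sub expect { 0 } }

print "$_\n" for public_methods('My::Derived');
```

For a real module you would first load it (for example with require) and
then pass its name to public_methods.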

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: Raghunath Verabelli [mailto:iamvela at yahoo.com]
> Sent: Thursday, February 23, 2006 1:34 PM
> To: Chris Fields; 'Jason Stajich'
> Cc: bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Blast returns result, but does not return hits
> 
> Chris, you are right. I am using NCBI BLAST.
> 
> Here is my http query:
> 
> my $urltext =
> "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=$seq&DATABASE=n
> r&PROGRAM=blastp";
> 
> This is my code for populating p-value:
> 
> my $pValue = $bioPerlHit->significance;
> 
> 
> I looked at the text output, could not find any p
> value column, the only 'value' column in the output is
> 'E value'. I will try that.
> 
> Thanks,
> Raghu
> 
> --- Chris Fields  wrote:
> 
> > I think Raghu's running NCBI BLAST, though.  Am I
> > right?
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> > > -----Original Message-----
> > > From: Jason Stajich
> > [mailto:jason.stajich at duke.edu]
> > > Sent: Thursday, February 23, 2006 12:30 PM
> > > To: Chris Fields
> > > Cc: 'Raghunath Verabelli';
> > bioperl-l at lists.open-bio.org
> > > Subject: Re: [Bioperl-l] Blast returns result, but
> > does not return hits
> > >
> > > p-values do show up in WU-BLAST reports so that is
> > why we have a p-
> > > value function.
> > >
> > >
> > > On Feb 23, 2006, at 1:06 PM, Chris Fields wrote:
> > >
> > > > Hold up a second.  Do you mean e-value, or
> > p-value?  A run-of-the-
> > > > mill NCBI
> > > > blast report these days gives e-values
> > (expectation value), NOT p-
> > > > values.  I
> > > > think they changed over to using only e-values
> > with BLAST v2.  Make
> > > > sure you
> > > > didn't mix these up; look out the text output to
> > make sure that P
> > > > values are
> > > > present.  That would explain why you're getting
> > 0, since they don't
> > > > exist.
> > > >
> > > >> From the BLAST tutorial:
> > > >
> > > > The BLAST programs report E-value rather than
> > P-values because it
> > > > is easier
> > > > to understand the difference between, for
> > example, E-value of 5 and
> > > > 10 than
> > > > P-values of 0.993 and 0.99995. However, when E <
> > 0.01, P-values and
> > > > E-value
> > > > are nearly identical.
> > > >
> > > > Christopher Fields
> > > > Postdoctoral Researcher - Switzer Lab
> > > > Dept. of Biochemistry
> > > > University of Illinois Urbana-Champaign
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-
> > > >> bounces at lists.open-bio.org] On Behalf Of Chris
> > Fields
> > > >> Sent: Thursday, February 23, 2006 11:41 AM
> > > >> To: 'Raghunath Verabelli';
> > bioperl-l at lists.open-bio.org
> > > >> Subject: Re: [Bioperl-l] Blast returns result,
> > but does not return
> > > >> hits
> > > >>
> > > >> Yes that's a potential issue.  I'll try to
> > replicate that here;
> > > >> please
> > > >> send
> > > >> a code example so I can see how you're calling
> > for the p-value.
> > > >>
> > > >> Christopher Fields
> > > >> Postdoctoral Researcher - Switzer Lab
> > > >> Dept. of Biochemistry
> > > >> University of Illinois Urbana-Champaign
> > > >>
> > > >>> -----Original Message-----
> > > >>> From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-
> > > >>> bounces at lists.open-bio.org] On Behalf Of
> > Raghunath Verabelli
> > > >>> Sent: Thursday, February 23, 2006 10:24 AM
> > > >>> To: Chris Fields; bioperl-l at lists.open-bio.org
> > > >>> Subject: Re: [Bioperl-l] Blast returns result,
> > but does not
> > > >>> return hits
> > > >>>
> > > >>> Thanks Chris for all your help.
> > > >>>
> > > >>> The patch for blast.pm worked. I was able to
> > parse the
> > > >>> hits from the raw file. I uninstalled previous
> > > >>> versions of bioperl using ppm and then I
> > installed
> > > >>> bioperl 1.4.x using nmake, and applied your
> > fix. I am
> > > >>> getting hits the way I wanted.
> > > >>>
> > > >>> However, I noticed that the p-value for each
> > hit
> > > >>> doesn't seem to be parsed
> > > >>> correctly. It sets it to 0 for all hits. Not
> > sure if
> > > >>> this is a known issue. Any
> > suggestions/comments,
> > > >>> please let me know.
> > > >>>
> > > >>> Thanks,
> > > >>> Raghu
> > > >>>
> > > >>> --- Chris Fields  wrote:
> > > >>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: bioperl-l-bounces at lists.open-bio.org
> > > >>>> [mailto:bioperl-l-
> > > >>>>> bounces at lists.open-bio.org] On Behalf Of
> > Raghunath
> > > >>>> Verabelli
> > > >>>>> Sent: Wednesday, February 22, 2006 8:25 PM
> > > >>>>> To: Chris Fields;
> > bioperl-l at lists.open-bio.org
> > > >>>>> Subject: Re: [Bioperl-l] Blast returns
> > result, but
> > > >>>> does not return hits
> > > >>>>>
> > > >>>>>
> > > >>>>> Thanks very much Chris for your time.
> > > >>>>> Please see below output that you requested
> > (the
> > > >>>> only
> > > >>>>> difference i saw between your output and
> > mine is
> > > >>>> @INC
> > > >>>>> value. I have only 2 directories
> > c:\mod_perl\perl
> > > >>>>> where i installed activeperl. I see two
> > additional
> > > >>>>> directories in your @INC path).
> > > >>>>>
> > > >>>>>>
> > > >>>>>> When you type 'perl -V' what do you see
> > (make
> > > >>>> sure
> > > >>>>>> it is a capital 'V', not
> > > >>>>>> lower case).
> > > >>>>>
> > > >>>>> C:\Documents and Settings\Administrator>perl
> >  -V
> > > >>>>> Summary of my perl5 (revision 5 version 8
> > > >>>> subversion
> > > >>>>> 7) configuration:
> > > >>>>>   Platform:
> > > >>>>>     osname=MSWin32, osvers=5.0,
> > > >>>>> archname=MSWin32-x86-multi-thread
> > > >>>>
> > > >>>> [....]
> > > >>>>
> > > >>>>> if.pm
> > > >>>>>   Built under MSWin32
> > > >>>>>   Compiled at Nov  2 2005 08:44:52
> > > >>>>>   %ENV:
> > > >>>>>     PERL5LIB="c:\bioperl-live"
> > > >>>>>   @INC:
> > > >>>>>     c:\bioperl-live
> > > >>>>>     C:/mod_perl/Perl/lib
> > > >>>>>     C:/mod_perl/Perl/site/lib
> > > >>>>>     .
> > > >>>>
> > > >>>> Personally I wouldn't place the
> > bioperl-live
> > > >>>> folder in the root
> > > >>>> directory; this shouldn't make a difference,
> > but you
> > > >>>> can try moving it to
> > > >>>> the perl directory in a separate folder to
> > see if
> > > >>>> that helps.  Can't see why
> > > >>>> it would make a difference, but it is
> > Windows...
> > > >>>> Main reason I'll switching
> > > >>>> over to Mac OS X!
> > > >>>>
> > > >>>> Make sure that the Bio directory is in the
> > > >>>> bioperl-live directory,
> > > >>>> regardless (i.e. if PERL5LIB is set to
> > > >>>> C:\mod_perl\Perl\bioperl\bioperl-live, then
> > there
> > > >>>> should be a directory like
> > > >>>> C:\Perl\bioperl\bioperl-live\Bio).  Otherwise
> > it
> > > >>>> won't work.
> > > >>>>
> >
> === message truncated ===
> 
> 


From cain at cshl.edu  Wed Feb 22 09:36:54 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 22 Feb 2006 09:36:54 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
Message-ID: <1140619014.3142.81.camel@localhost.localdomain>

Hi Dave,

I don't know if this helps at all, but you could think of that 45th tick
mark as the termination, since the space between the 44th and the 45th
tick mark corresponds to your 44th residue.  I suppose it is a matter of
correctly training your users :-)
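
The tick-mark arithmetic can be checked with a small plain-Perl sketch. It
uses the 44-residue sequence and the 27..42 sub-range quoted below in this
thread; the helper names to_interbase and subseq_1based are made up for
illustration and are not Bio::Graphics functions.

```perl
use strict;
use warnings;

# The 44-residue SwissProt entry P12239 quoted in this thread.
my $protein = 'TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR';

# Convert a 1-based inclusive range to interbase coordinates
# (0-based, half-open: positions lie between residues).
sub to_interbase {
    my ($start, $end) = @_;
    return ($start - 1, $end);
}

# Extract a subsequence given 1-based inclusive coordinates.
sub subseq_1based {
    my ($seq, $start, $end) = @_;
    my ($ib_start, $ib_end) = to_interbase($start, $end);
    return substr($seq, $ib_start, $ib_end - $ib_start);
}

my ($s, $e) = to_interbase(1, length $protein);
printf "%d residues span interbase positions %d..%d (%d tick marks)\n",
    length $protein, $s, $e, $e - $s + 1;
print subseq_1based($protein, 27, 42), "\n";    # the 27..42 track
```

This prints that 44 residues span interbase positions 0..44, i.e. 45 tick
marks, and that residues 27..42 are VPTIFFLGAIAAMQFI.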

Scott


On Wed, 2006-02-22 at 10:20 +0000, Dave Howorth wrote:
> Lincoln Stein wrote:
> > Hi Dave,
> > 
> > Well, when you are using 1-based coordinates, a line that contains 44 
> > intervals will have 45 ticks. If you move to 0-based coordinates, then the 
> > first tick will be labeled 0 and the last tick will be labeled 44. An 
> > alternative is to make each base dimensionless, but that becomes a problem 
> > when dealing with single base features, such as SNPs.
>  >
> > These issues are why I have long advocated for interbase coordinates
> > in which you number the positions between bases rather than the bases
> > themselves.
> 
> I see your point but I need to work with the coordinates that the users 
> expect and are familiar with. (Things get much worse with PDB residue 
> numbering :)
> 
> > Draw me the picture of what you expect to see. I think of it this way:
> > 
> > 	1    2  3  4   5   6
> >          A>G>C>T>A>
> 
> I guess something went wrong with your ASCII art :(
> 
> OK, consider a 44-residue entry from SwissProt (P12239):
> 
>    TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR
> 
> The first T is numbered 1 and the last R is numbered 44.
> 
> So I expect to see a line with 44 positions indicated somehow (whether 
> these are half-open intervals or points on the line), with the number 1 
> at the left end and the number 44 at the right end.
> 
> An important point is that if I then place other tracks below this one 
> that start at say 27 and go to say 42, representing VPTIFFLGAIAAMQFI, 
> they should align properly (according to whatever convention is used to 
> represent a residue).
> 
> For a short sequence like this it would be possible to use letters to 
> represent the residue but I'd like to use the same convention for longer 
> sequences as well and have everything be consistent.
> 
> I'm hoping Bio::Graphics will make this easy.
> 
> Thanks, Dave
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


From hlapp at gnf.org  Thu Feb 23 21:10:13 2006
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu, 23 Feb 2006 18:10:13 -0800
Subject: [Bioperl-l] [BioSQL-l] Load seqfeature from biosql database
	with perl
In-Reply-To: <1140744561.2888.19.camel@alien>
Message-ID: 

Yes, kudos to you for figuring this out yourself, and you actually figured
out the more difficult way. I apologize for my delay in responding, I was
tied up this morning and last night.

You got the first key step right, namely obtaining the right persistence
adaptor. This step determines which object you get back.

Your query will work, and in fact will be just as fast as the simple
solution (which is simple only because it is simpler to code, not because
the internally executed query is simpler). The simple solution is that every
object implementing Bio::DB::PersistenceAdaptorI (i.e., any object you get
back from $db->get_object_adaptor(..)) has a method
$adp->find_by_primary_key(). So, using that method:

    $feature = $adaptor->find_by_primary_key($seqfeature_id);

You can also control the type of object to be created (so long as it is a
Bio::SeqFeatureI) by additionally passing in an object factory.

As an aside, the finder method will also consult the object cache first
if the cache is enabled. That doesn't matter for seq features, though:
because of the potentially large number of objects, the cache is not
enabled by default for this adaptor.

    -hilmar  

On 2/23/06 5:29 PM, "Michael Cipriano"  wrote:

> Ah, I think I figured it out.
> 
> my $seqfeature_id = '401138';
> my $adaptor = $seqdb_obj->get_object_adaptor("Bio::SeqFeatureI");
> 
> my $query = Bio::DB::Query::BioQuery->new(
>         -datacollections => ["Bio::SeqFeatureI t1"],
>         -where           => ["t1.Bio::SeqFeatureI = ?"]);
> 
> my $qres = $adaptor->find_by_query($query,
>         -name   => 'FIND FEATURE BY SEQ',
>         -values => [$seqfeature_id]);
> 
> while(my $loc = $qres->next_object())
> {
>         my $obj = $loc;
> 
>         print $obj->primary_key() . "\n";
>         print 'location:' . $obj->location->to_FTstring() . "\n";
>         $obj->add_tag_value("test", "moretest");
>         foreach my $tag ($obj->get_all_tags())
>         {
>                 print " Values for tag $tag: ";
>                 print join(' ',$obj->get_tag_values($tag));
>                 print "\n";
>         }
>         print "------------------\n";
> 
> }
> 
> 
> 
> This seems to work
> On Wed, 2006-02-22 at 16:58 -0800, Michael Cipriano wrote:
>> Hello BioSQLers,
>> 
>> I have a simple question (I hope), Can I easily load a seqfeature from a
>> biosql database into a perl Bio::SeqFeatureI object?  I have the
>> database value for the  seqfeature.seqfeature_id and would like to load
>> it using this alone.
>> 
>> I do not want to have to load the whole bioentry object then search for
>> the feature, I just want the feature object since the bioentry is a
>> whole genome and loading that will take more time than necessary.
>> 
>> I have searched the documentation and have even tried looking through
>> the code for the modules, but could not find an easy fast method.
>> 
>> Please reply directly to me as well as the list as I am not a list
>> member.
>> 
>> Thanks for your help,
>> 
>> 
>> Michael Cipriano
>> 

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



From praveecbt at yahoo.co.in  Fri Feb 24 00:57:22 2006
From: praveecbt at yahoo.co.in (Praveen Raj)
Date: Fri, 24 Feb 2006 05:57:22 +0000 (GMT)
Subject: [Bioperl-l] Problem in BioPerl. Help!
Message-ID: <20060224055722.13351.qmail@web8701.mail.in.yahoo.com>

Dear sir,

I have a problem using the BioPerl module 'Clustalw.pm'.
Clustalw creates a SimpleAlign object as output, doesn't it?
I can successfully convert the object into 'clustal' and 'phylip' format using a
file handle.
I want to produce Newick format (for a phylogenetic tree) from the object itself.
I know that standalone ClustalW creates a Newick file (.dnd extension) as output along with
the .aln file.
When I created 'clustal' format output and printed it into a web page, it looked like this:
   
  CLUSTAL W (1.83) Multiple Sequence Alignments
Sequence format is Pearson
Sequence 1: >gi|dengue2|           13 aa
Sequence 2: >gi|yellowfever|       13 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score:  15
Guide tree        file created:   [\tXGgJDIuZZ\jmIerlkHz7.dnd]
Start of Multiple Alignment
There are 1 groups
...............
   
I don't know where the .dnd file (which is in Newick format) is created.
It's not in the current directory.
Is there any method to specify the path for the .dnd file?
I have gone through all the documentation provided with BioPerl and ClustalW.

How can I create 'newick' output (a .dnd file) from a SimpleAlign object created by Clustalw.pm?

It would be a great benefit to me if you could provide a solution.
I can't move forward without one.
So, please reply...
   
                                    Thanking you,
                                                   Praveen Raj(student).
                                                   National Institute of Virology,   
                                                   Pune. India

				

From roy at colibase.bham.ac.uk  Fri Feb 24 10:51:46 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Fri, 24 Feb 2006 15:51:46 +0000
Subject: [Bioperl-l] Problem in BioPerl. Help!
In-Reply-To: <20060224055722.13351.qmail@web8701.mail.in.yahoo.com>
References: <20060224055722.13351.qmail@web8701.mail.in.yahoo.com>
Message-ID: <43FF2B92.9090801@colibase.bham.ac.uk>

Praveen Raj wrote:
> Sir, I want to make a newick format( for phylogenetic tree ) from the
> object itself. But I know that Standalone Clustalw creates a newick
> file(.dnd extension) as an output along with the .aln file.

Be careful with this. The .dnd files produced by ClustalW contain a 
Newick format guide tree, produced from pairwise-aligned sequences to 
guide the multiple alignment process. This should not be confused with a 
phylogenetic analysis, and the .dnd file is usually best ignored.

ClustalW can be used to produce a true phylogenetic tree from the 
alignment using the Neighbor-joining method (see the menus and 
documentation for details). This method produces files with a .ph or 
.phb extension (.phb if the tree is bootstrapped). I'm not sure if this 
process can be done using BioPerl, but it is possible to do using 
ClustalW's command line flags, so if you need to automate the process 
you could use Perl's system command. If you want to use BioPerl you can 
use the Phylip program neighbor to generate your tree directly from a 
SimpleAlign object, using the module 
Bio::Tools::Run::Phylo::Phylip::Neighbor.
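
A minimal sketch of the "Perl's system command" route follows. The flag
names (-INFILE, -TREE, -BOOTSTRAP, -OUTPUTTREE) are the ones I understand
the ClustalW 1.8x command-line help to list, so please check them against
your own build's help output; Windows builds may expect "/" rather than "-"
as the option prefix. The helper name clustalw_tree_args, the CLUSTALW_BIN
environment variable and the file name my_alignment.aln are assumptions
made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for a ClustalW
# neighbor-joining tree run from an existing alignment file.
# A true $bootstraps requests a bootstrapped tree (.phb output),
# otherwise a plain NJ tree (.ph output) is requested.
sub clustalw_tree_args {
    my ($alignment_file, $bootstraps) = @_;
    my @args = ("-INFILE=$alignment_file");
    push @args, $bootstraps ? "-BOOTSTRAP=$bootstraps" : "-TREE";
    push @args, "-OUTPUTTREE=phylip";
    return @args;
}

my $clustalw  = $ENV{CLUSTALW_BIN} || 'clustalw';   # assumed executable location
my $alignment = 'my_alignment.aln';                  # hypothetical input file
my @cmd = ($clustalw, clustalw_tree_args($alignment, 0));

print "Command: @cmd\n";
if (-e $alignment) {
    # List form of system() avoids shell quoting problems with paths.
    system(@cmd) == 0 or die "ClustalW failed (exit status $?)\n";
}
```

The script only runs ClustalW when the alignment file actually exists, so it
can be tried safely before pointing it at real data.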

Cheers.
Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk



From perlmails at gmail.com  Sun Feb 26 06:51:37 2006
From: perlmails at gmail.com (perlmails at gmail.com)
Date: Sun, 26 Feb 2006 17:21:37 +0530
Subject: [Bioperl-l] extract ncDNA
Message-ID: <235f7dbe0602260351j7d195d73r2a8801b29e105098@mail.gmail.com>

Dear Bioperl group,

I have been working on extracting non-coding DNA (ncDNA) sequences
from an organism.

I tried extracting the intergenic sequences from the sense strand
after filtering out the features (CDS, gene, mRNA, tRNA, rRNA etc.) from
the EMBL feature table entries using BioPerl and the additional
script shown below.

Now I realise that there is a problem extracting the ncDNA sequences
from the negative strand. Any ideas?

To extract the ncDNAs from the negative strand, I thought of converting
the negative-strand coordinates to sense-strand coordinates and
adding these to the sense-strand coordinates, then filtering out all the
features (selecting the ncDNAs after discarding the features from the
EMBL FT) to get all the ncDNAs.

Is there anything in the BioPerl toolkit that I am missing?

##<<>
use strict;

# feature co-ordinates: start \t end
my $EMBL_cord_file = "Organism.feature.cords";
my $RAW_file = "Organism.raw";
my $ncDNA_file = "Organism.ncDNA";

open(EMBLCORD, $EMBL_cord_file) or die "Cannot open EMBL_cord_file";
open(RAW, $RAW_file) or die "Cannot open RAW_file";
open(OUT, ">$ncDNA_file") or die "Cannot open ncDNA_file";

# read the raw sequence and flatten it into one string
my @dna = <RAW>;
my $dna = join('', @dna);

while($dna){
	$dna=~s/\s//g;
	# replace each feature with a header line naming its co-ordinates
	while(<EMBLCORD>){
		chomp;
		my @cords = split /\t/;
		my	$start = $cords[0];
		my	$end = $cords[1];
		my $replaceString = "\n>$start..$end";
		substr($dna, $start-1, $end-$start+1, $replaceString);
}
	print OUT $dna,"\n";
	exit;
}
##<<>

Another thing is, since I am reading the whole file in a scalar the
script does not complete the extraction of all ncDNAs from the
sense-strand. Obviously, the features are parsed first before the
flattening of the 266,000 nt sequence into a single string.
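
One possible cause of the incomplete extraction (an assumption, not
something confirmed in this thread) is that every substr() replacement
changes the length of the string, so all later coordinates point at the
wrong place. Since EMBL feature locations, including complement(...) ones,
are given in forward-strand coordinates, features from both strands can be
pooled and the uncovered gaps collected without modifying the sequence. The
sketch below is plain Perl with made-up helper names (intergenic_regions,
revcomp) and a toy sequence.

```perl
use strict;
use warnings;

# Return the regions (1-based, inclusive) not covered by any feature.
# Each feature is [start, end] in forward-strand coordinates; features
# from either strand may overlap and may be given in any order.
sub intergenic_regions {
    my ($seq_length, @features) = @_;
    my @sorted = sort { $a->[0] <=> $b->[0] } @features;
    my @gaps;
    my $next_free = 1;                  # first position not yet covered
    for my $f (@sorted) {
        my ($start, $end) = @$f;
        push @gaps, [$next_free, $start - 1] if $start > $next_free;
        $next_free = $end + 1 if $end + 1 > $next_free;
    }
    push @gaps, [$next_free, $seq_length] if $next_free <= $seq_length;
    return @gaps;
}

# Reverse-complement, for reading a gap from the negative strand.
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $dna = 'AAAACCCCGGGGTTTTAAAACCCC';            # toy 24 nt sequence
my @features = ([5, 8], [7, 12], [17, 20]);      # toy feature coordinates
for my $gap (intergenic_regions(length $dna, @features)) {
    my ($s, $e) = @$gap;
    print ">$s..$e\n", substr($dna, $s - 1, $e - $s + 1), "\n";
}
```

With the toy data the gaps are 1..4, 13..16 and 21..24. The same regions
serve for both strands; revcomp() is only needed if the negative-strand
reading of a gap is wanted.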

Any help would be appreciated.

-PO


From cjfields at uiuc.edu  Sun Feb 26 09:12:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 26 Feb 2006 08:12:57 -0600
Subject: [Bioperl-l] extract ncDNA
In-Reply-To: <235f7dbe0602260351j7d195d73r2a8801b29e105098@mail.gmail.com>
References: <235f7dbe0602260351j7d195d73r2a8801b29e105098@mail.gmail.com>
Message-ID: 

You're not using bioperl.  See:

http://www.bioperl.org/wiki/HOWTO:Beginners

then go to:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation

Chris


On Feb 26, 2006, at 5:51 AM, perlmails at gmail.com wrote:

> Dear Bioperl group,
>
> I have been working on extracting non-coding DNA (ncDNA) sequences
> from an organism.
>
> I tried extracting the intergenic sequences from the sense-strand
> after filtering the features (CDS, gene, mRNA, tRNA, rRNA etc) from
> the EMBL feature table entries using the Bioperl and the additional
> script (mentioned below).
>
> Now, I realised that there is a problem to extract the ncDNA sequences
> from the negative-strand, Any ideas?
>
> To extract the ncDNAs from negative-strand, I thought of converting
> the negative-strand co-ordinates to sense-strand co-ordinates and
> adding these to the sense-strand cords. Then filter all the features
> (select the ncDNAs after discarding the features from EMBL FT) to get
> all the ncDNAs.
>
> Is there anything I am missing for using from the bioperl kit?
>
> ##<<>
> use strict;
>
> # feature co-ordinates: start \t end
> my $EMBL_cord_file = "Organism.feature.cords";
> my $RAW_file = "Organism.raw";
> my $ncDNA_file = "Organism.ncDNA";
>
> open(EMBLCORD, $EMBL_cord_file) or die "Cannot open EMBL_cord_file";
> open(RAW, $RAW_file) or die "Cannot open RAW_file";
> open(OUT, ">$ncDNA_file") or die;
>
> my @dna = <RAW>;
> my $dna = join('', @dna);
>
> while($dna){
> 	$dna=~s/\s//g;
> 	while(<EMBLCORD>){
> 		my @cords = split /\t/;
> 		my	$start = $cords[0];
> 		my	$end = $cords[1];
> 		my $replaceString = "\n>$start..$end";
> 		substr($dna, $start-1, $end-$start+1, $replaceString);
> }
> 	print OUT $dna,"\n";
> 	exit;
> }
> ##<<>
>
> Another thing is, since I am reading the whole file in a scalar the
> script does not complete the extraction of all ncDNAs from the
> sense-strand. Obviously, the features are parsed first before the
> flattening of the 266,000 nt sequence into a single string.
>
> Any help would be appreciated.
>
> -PO
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From saldroubi at yahoo.com  Sun Feb 26 15:15:14 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Sun, 26 Feb 2006 12:15:14 -0800 (PST)
Subject: [Bioperl-l] Is it worth it?
Message-ID: <20060226201514.14549.qmail@web34312.mail.mud.yahoo.com>

Hello everyone,
   
  Please forgive me for posting my questions on this list, since they are not directly related to bioperl, but since most of you are doing bioinformatics, I thought I could ask for some advice.  Also, please point me to other lists or websites if more appropriate.
   
  Basically, I am wondering whether it is worth getting a Master's or PhD degree in bioinformatics with funding.  I already have an MS degree in Software Engineering, I've taken a few bioinformatics courses, and I like the field.  Additionally, I am almost 40 years old and have a stable job.  If I were to get a PhD in 3 to 4 years, what job opportunities would be out there for me?  Is it better to work in academia or the private sector?  What is the average salary like?
   
  Thank you very much, and please respond to me directly instead of to the list, since my questions are off topic.
   
   


Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com

From joel at macresearcher.com  Sun Feb 26 22:12:12 2006
From: joel at macresearcher.com (Joel Dudley)
Date: Sun, 26 Feb 2006 20:12:12 -0700
Subject: [Bioperl-l] Is it worth it?
In-Reply-To: <20060226201514.14549.qmail@web34312.mail.mud.yahoo.com>
References: <20060226201514.14549.qmail@web34312.mail.mud.yahoo.com>
Message-ID: 

It seems to me that your mind is already made up. By asking such a  
question I think it's safe to say a PhD program in Bioinformatics  
would not be your cup of tea. This is not to be negative. If you like  
bioinformatics, do bioinformatics. Join an open-source project, or  
start one of your own. If you live in a town with a University, find  
a lab that needs bioinformatics work and volunteer your time. If you  
really have a passion for bioinformatics, just do bioinformatics and  
your path will become clear, opportunities will arise, your salary  
will be what you need. Just my two shekels of course.

- Joel

On Feb 26, 2006, at 1:15 PM, Sam Al-Droubi wrote:

> Hello everyone,
>
>   Please forgive me for posting my questions on this list, since
> they are not directly related to bioperl, but since most of you are
> doing bioinformatics, I thought I could ask for some advice.  Also,
> please point me to other lists or websites if more appropriate.
>
>   Basically, I am wondering whether it is worth getting a Master's or
> PhD degree in bioinformatics with funding.  I already have an MS
> degree in Software Engineering, I've taken a few bioinformatics
> courses, and I like the field.  Additionally, I am almost 40 years
> old and have a stable job.  If I were to get a PhD in 3 to 4 years,
> what job opportunities would be out there for me?  Is it better
> to work in academia or the private sector?  What is the average salary
> like?
>
>   Thank you very much, and please respond to me directly instead of
> to the list, since my questions are off topic.
>
>
>
>
> Sincerely,
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From sdavis2 at mail.nih.gov  Mon Feb 27 06:39:27 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 27 Feb 2006 06:39:27 -0500
Subject: [Bioperl-l] Is it worth it?
In-Reply-To: 
Message-ID: 




On 2/26/06 10:12 PM, "Joel Dudley"  wrote:

> It seems to me that your mind is already made up. By asking such a
> question I think it's safe to say a PhD program in Bioinformatics
> would not be your cup of tea. This is not to be negative. If you like
> bioinformatics, do bioinformatics. Join an open-source project, or
> start one of your own. If you live in a town with a University, find
> a lab that needs bioinformatics work and volunteer your time. If you
> really have a passion for bioinformatics, just do bioinformatics and
> your path will become clear, opportunities will arise, your salary
> will be what you need. Just my two shekels of course.

I would second this sentiment.  Most of the folks that I know that are doing
bioinformatics are doing it WITHOUT a degree in it.  The trick is to have
both computational skills AND domain-specific knowledge.  Just find a
project that will require you to gain some domain-specific knowledge (which
can actually happen pretty quickly) and go for it.  As Joel said, there are
dozens of open source projects that would love a helping hand.  If you need
more face-time, do as Joel suggests and work with a local university (or
even high school) to design some web-based tools or something like that to
do things that would be either educational or novel.

Sean



From dhoworth at mrc-lmb.cam.ac.uk  Mon Feb 27 05:40:19 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Mon, 27 Feb 2006 10:40:19 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <200602221340.28573.lstein@cshl.edu>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>	<1140625762.3142.107.camel@localhost.localdomain>	<43FC950C.7080007@mrc-lmb.cam.ac.uk>
	<200602221340.28573.lstein@cshl.edu>
Message-ID: <4402D713.2050007@mrc-lmb.cam.ac.uk>

Lincoln Stein wrote:
> I have just committed a version of the arrow.pm glyph that has a 
> -label_intervals flag.

Thanks Lincoln,

I've edited your new version so it displays the tick labels pretty much 
as I need. My changes were to the first and last label and to move the 
position of the others a little. I hope that it behaves exactly like 
your version unless label_intervals is set. I've attached my edited version.

There's still an oddity with the number of minor ticks at the start and 
end of the line (I've seen 7, 8, and 9 minor intervals at the start of 
the line as well as 10) but I'll probably ignore that for now.

Thanks, Dave
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arrow.pm
Type: application/x-perl
Size: 16357 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060227/3cfcff11/attachment.bin 

From boris.steipe at utoronto.ca  Mon Feb 27 10:42:54 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Mon, 27 Feb 2006 10:42:54 -0500
Subject: [Bioperl-l] Is it worth it?
In-Reply-To: 
References: 
Message-ID: <56C842D6-18AD-40B0-AE9A-47A29AE83F1D@utoronto.ca>

I'd put a slightly different emphasis on this: obviously most of  
those in the field can't have a degree in bioinformatics because such  
degree programs haven't been around for all that long. One shouldn't  
conclude that graduate programs are therefore somehow less relevant.  
To successfully apply for a paid job, you need credentials for your  
ability to be productive.

Credentials can come from open source projects IF you can document  
the scope and quality of your contributions.

Credentials can come from a graduate degree IF your thesis appears  
relevant, original and well executed.

Credentials can come from peer-reviewed publications.

Credentials can come from personal references of collaborators.



Regards,
B.

On 27 Feb 2006, at 06:39, Sean Davis wrote:

>
>
>
> On 2/26/06 10:12 PM, "Joel Dudley"  wrote:
>
>> It seems to me that your mind is already made up. By asking such a
>> question I think it's safe to say a PhD program in Bioinformatics
>> would not be your cup of tea. This is not to be negative. If you like
>> bioinformatics, do bioinformatics. Join an open-source project, or
>> start one of your own. If you live in a town with a University, find
>> a lab that needs bioinformatics work and volunteer your time. If you
>> really have a passion for bioinformatics, just do bioinformatics and
>> your path will become clear, opportunities will arise, your salary
>> will be what you need. Just my two shekels of course.
>
> I would second this sentiment.  Most of the folks that I know that  
> are doing
> bioinformatics are doing it WITHOUT a degree in it.  The trick is  
> to have
> both computational skills AND domain-specific knowledge.  Just find a
> project that will require you to gain some domain-specific  
> knowledge (which
> can actually happen pretty quickly) and go for it.  As Joel said,  
> there are
> dozens of open source projects that would love a helping hand.  If  
> you need
> more face-time, do as Joel suggests and work with a local  
> university (or
> even high school) to design some web-based tools or something like  
> that to
> do things that would be either educational or novel.
>
> Sean
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From slenk at emich.edu  Mon Feb 27 16:07:38 2006
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Mon, 27 Feb 2006 16:07:38 -0500
Subject: [Bioperl-l] Is it worth it?
Message-ID: <556d070556f727.556f727556d070@emich.edu>

Gee golly ollie, this is good advice. I face the same issues, but am much older (53). I am taking a Sloan MS in 
Bioinformatics while working full time at the car parts company. I bring what I have newly learned at school to 
work (Perl especially, in which I build and share tools even as far away as exotic India (smile)). I take what I have 
from work (discipline, experience, work ethic) and apply it to open source and shared school projects. The 
world has given me a lot; I enjoy giving back. Why not take an MS in Biology/Bioinformatics at your pace and 
see where it leads. I have no idea if I will EVER have a JOB in Bioinformatics, so I just live it day by day. Plug 
follows - see MCPrimers at CPAN for PCR primer design for molecular cloning with site-directed mutagenesis. I 
did this as an outgrowth of a Rectech class I took. 



----- Original Message -----
From: Sean Davis 
Date: Monday, February 27, 2006 6:39 am
Subject: Re: [Bioperl-l] Is it worth it?

> 
> 
> 
> On 2/26/06 10:12 PM, "Joel Dudley"  wrote:
> 
> > It seems to me that your mind is already made up. By asking such a
> > question I think it's safe to say a PhD program in Bioinformatics
> > would not be your cup of tea. This is not to be negative. If you like
> > bioinformatics, do bioinformatics. Join an open-source project, or
> > start one of your own. If you live in a town with a University, find
> > a lab that needs bioinformatics work and volunteer your time. If you
> > really have a passion for bioinformatics, just do bioinformatics and
> > your path will become clear, opportunities will arise, your salary
> > will be what you need. Just my two shekels of course.
> 
> I would second this sentiment.  Most of the folks that I know that 
> are doing
> bioinformatics are doing it WITHOUT a degree in it.  The trick is 
> to have
> both computational skills AND domain-specific knowledge.  Just find a
> project that will require you to gain some domain-specific 
> knowledge (which
> can actually happen pretty quickly) and go for it.  As Joel said, 
> there are
> dozens of open source projects that would love a helping hand.  If 
> you need
> more face-time, do as Joel suggests and work with a local 
> university (or
> even high school) to design some web-based tools or something like 
> that to
> do things that would be either educational or novel.
> 
> Sean
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

From joel at macresearcher.com  Mon Feb 27 20:56:13 2006
From: joel at macresearcher.com (Joel Dudley)
Date: Mon, 27 Feb 2006 18:56:13 -0700
Subject: [Bioperl-l] BioPerlers Represent!
Message-ID: 

Hey list,
	The contest to fill the script repository at MacResearch.org is  
ending very soon. Thus far we've only received a paltry three  
submissions with PERL scripts. The contest take home prize is a black  
iPod nano (2GB) so if you've got anything lying around that you'd  
like to share I'd suggest zipping it up and adding it to the script  
repository. Full contest details can be viewed here:

http://www.macresearch.org/ipod_contest

Now before you get ready to smack me with your anti-spam cudgel, or shake  
your fist in my general direction, please note that MacResearch.org  
is completely non-profit, existing only to aid and foster community  
for scientists using OS X. I gain nothing personally by attracting  
BioPerl scripts to the repository but I'd love to see Perl well  
represented. Thanks for understanding.

- Joel

From jforment at ibmcp.upv.es  Tue Feb 28 07:17:59 2006
From: jforment at ibmcp.upv.es (Javier Forment)
Date: Tue, 28 Feb 2006 13:17:59 +0100
Subject: [Bioperl-l] Are frac_identical and frac_conserved methods for hit
 or for hsp objects?
Message-ID: <44043F77.1010901@ibmcp.upv.es>

Hi bioperlers... I have some questions when parsing BLAST results.

As far as I know, bioperl documentation for Bio::SearchIO states that 
frac_identical and frac_conserved are methods for hsp objects (e.g., 
$hsp->frac_identical). I have found that it is also possible to use 
these methods for hit objects (e.g., $hit->frac_identical), since it 
does not give an error, but in this case they don't work properly (I 
think that they work fine with blastn, but not with blastx). So my 
questions are:

1.- is it right to use $hit->frac_identical and $hit->frac_conserved?
2.- if so, how do they get the frac_identical for a hit when it has more 
than one HSP (maybe by averaging the values for all the hsps)?
3.- if so, why do they sometimes not work properly, for example, with blastx?
4.- if not, is there any method to get the fraction of identical or 
conserved residues for a hit, other than averaging the corresponding 
values for all the hsps of this hit?

Thanks a lot in advance,

Javier.

-- 
Javier Forment Millet
Unidad de Bioinformatica del Laboratorio de Genomica
Instituto de Biologia Molecular y Celular de Plantas
Universidad Politecnica de Valencia
Avenida de los Naranjos, s/n
46022 Valencia (Spain)
Tlf.(1): +34-963877885
Tlf.(2): 685142553
FAX: +34-963877859
e-mail: jforment at ibmcp.upv.es

From jason.stajich at duke.edu  Tue Feb 28 08:31:00 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 28 Feb 2006 08:31:00 -0500
Subject: [Bioperl-l] Are frac_identical and frac_conserved methods for
	hit or for hsp objects?
In-Reply-To: <44043F77.1010901@ibmcp.upv.es>
References: <44043F77.1010901@ibmcp.upv.es>
Message-ID: 

Personally, I only use these values from HSPs - the Hit methods  
require HSPs to be tiled to summarize the bases and I'm not convinced  
the method works for all situations.

If you want it summarized to a single value for a query/hit pair I  
would use FASTA or, if you must use BLAST, use WU-BLAST to get the  
links path out and summarize it over a set of HSP paths.
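
A minimal sketch of the per-HSP route, for anyone who wants one pooled
number per hit without relying on the Hit-level tiling. It assumes a
standard BLAST text report; pooled_identity and hit_identity_from_report
are made-up names for illustration. Identical positions are pooled over
all HSPs of a hit and divided by the pooled alignment length, so this is
not a plain average of the per-HSP fractions, and overlapping HSPs are
counted twice.

```perl
use strict;
use warnings;

# Pool [num_identical, alignment_length] pairs into a single fraction.
# Returns undef when there is nothing to pool.
sub pooled_identity {
    my @pairs = @_;
    my ($identical, $length) = (0, 0);
    for my $pair (@pairs) {
        $identical += $pair->[0];
        $length    += $pair->[1];
    }
    return $length ? $identical / $length : undef;
}

# Hypothetical wrapper: one pooled identity value per query/hit pair.
sub hit_identity_from_report {
    my ($file) = @_;
    require Bio::SearchIO;    # loaded here so the helper above has no BioPerl dependency
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $file);
    my %fraction;
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            my @pairs;
            while (my $hsp = $hit->next_hsp) {
                push @pairs, [ $hsp->num_identical, $hsp->length('total') ];
            }
            my $key = $result->query_name . "\t" . $hit->name;
            $fraction{$key} = pooled_identity(@pairs);
        }
    }
    return \%fraction;
}
```

The same loop should work for conserved positions by swapping in
num_conserved, but I have not checked how that behaves for blastx.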

-jason
On Feb 28, 2006, at 7:17 AM, Javier Forment wrote:

> Hi bioperlers... I have some questions when parsing BLAST results.
>
> As far as I know, bioperl documentation for Bio::SearchIO states that
> frac_identical and frac_conserved are methods for hsp objects (e.g.,
> $hsp->frac_identical). I have found that it is also possible to use
> these methods for hit objects (e.g., $hit->frac_identical), since it
> does not give an error, but in this case they don't work properly (I
> think that they work fine with blastn, but not with blastx). So my
> questions are:
>
> 1.- is it right to use $hit->frac_identical and $hit->frac_conserved?
> 2.- if so, how they get the frac_identical for a hit when it has more
> than one HSP (maybe getting the average value for all the hsps)?
> 3.- if so, why they don't work fine sometimes, for example, with  
> blastx?
> 4.- if not, is there any method to get the fraction of identical or
> conserved residues for a hit, other than averaging the corresponding
> values for all the hsps of this hit?
>
> Thanks a lot in advance,
>
> Javier.
>
> -- 
> Javier Forment Millet
> Unidad de Bioinformatica del Laboratorio de Genomica
> Instituto de Biologia Molecular y Celular de Plantas
> Universidad Politecnica de Valencia
> Avenida de los Naranjos, s/n
> 46022 Valencia (Spain)
> Tlf.(1): +34-963877885
> Tlf.(2): 685142553
> FAX: +34-963877859
> e-mail: jforment at ibmcp.upv.es
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12



From julioallen at hotmail.com  Tue Feb 28 08:22:14 2006
From: julioallen at hotmail.com (James Allen)
Date: Tue, 28 Feb 2006 13:22:14 +0000
Subject: [Bioperl-l] GFF feature start and end points on reverse strand
Message-ID: 

Hello,
I'm retrieving data using the 'features' method of Bio::DB::GFF, and when 
the feature is on the reverse strand (i.e. strand = -1) the start and end 
points are flipped, so that 'feature->end' is the smaller number (i.e. what 
I consider the start point) and 'feature->start' is the larger number.
Is there any way to prevent this behaviour, so that the start value of my 
feature is the same as the start value in my database, regardless of the 
strand?

Thanks,
Julio



From ewijaya at singnet.com.sg  Tue Feb 28 05:01:23 2006
From: ewijaya at singnet.com.sg (Edward WIJAYA)
Date: Tue, 28 Feb 2006 18:01:23 +0800
Subject: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence file (Fasta)
	into Array
Message-ID: 

Hi,

Does Bio::SeqIO have a method specially designed for
reading all the sequences from a fasta file into an array?

What I have currently is this subroutine, which seems to me
__very inefficient__. I was wondering whether
there is a better way to achieve it.


sub get_sequence_from_fasta {
     my $file = shift;
     my @seqs= ();

     open INFILE, "<$file" or die "$0:  Can't open file $file: $!";
     my $in = Bio::SeqIO->new(-format => 'fasta',
                              -noclose => 1 ,
                              -fh => \*INFILE);

     while ( my $seq = $in->next_seq() ) {
        push @seqs, $seq->seq();
     }
     return @seqs;
}


BTW, I also have tried to do this. I thought
this might be a better way to do the above job.
but it doesn't work.

sub get_sequence_from_fasta_that_doesnot_work {
     my $file = shift;
      open my fh, "<$file" or die "$0:  Can't open file $file: $!";
     my $in = Bio::SeqIO->newFh( -format => 'fasta', -fh => $fh );
     return <$in>;
}

Hope to hear from you again.

--
Regards,
Edward WIJAYA
SINGAPORE

From lstein at cshl.edu  Tue Feb 28 10:08:27 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 28 Feb 2006 10:08:27 -0500
Subject: [Bioperl-l] GFF feature start and end points on reverse strand
In-Reply-To: 
References: 
Message-ID: <200602281008.28373.lstein@cshl.edu>

Call the absolute(1) method, which turns off relative addressing.
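
If switching the coordinate mode is not convenient, the two ends can
also be put in order after the fact. This is plain Perl, not a
Bio::DB::GFF feature: ordered_bounds is a hypothetical helper, and the
two numbers are assumed to come from $feature->start and $feature->end.

```perl
use strict;
use warnings;

# Return (low, high) regardless of which end was reported first.
sub ordered_bounds {
    my ($start, $end) = @_;
    return $start <= $end ? ($start, $end) : ($end, $start);
}

# With a real feature this would be:
#   my ($low, $high) = ordered_bounds($feature->start, $feature->end);
my ($low, $high) = ordered_bounds(5200, 4100);
print "$low..$high\n";    # prints 4100..5200
```

The strand is still available from the feature itself, so nothing is
lost by ordering the coordinates this way.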

Lincoln

On Tuesday 28 February 2006 08:22, James Allen wrote:
> Hello,
> I'm retrieving data using the 'features' method of Bio::DB::GFF, and when
> the feature is on the reverse strand (ie = -1) the start and end points are
> flipped, so that 'feature->end' is the smaller number (ie what I consider
> the start point) and 'feature->start' is the larger number.
> Is there anyway to prevent this behaviour, so that the start value of my
> feature is the same as the start value in my database, regardless of the
> strand?
>
> Thanks,
> Julio
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu

From jason.stajich at duke.edu  Tue Feb 28 12:36:34 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 28 Feb 2006 12:36:34 -0500
Subject: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence file
	(Fasta) into Array
In-Reply-To: 
References: 
Message-ID: <1207BA95-F0E8-4049-8AAE-4B84964D147B@duke.edu>


On Feb 28, 2006, at 5:01 AM, Edward WIJAYA wrote:

> Hi,
>
> Does Bio::SeqIO has a method  specially designed for
> reading all the sequences from a fasta file into array.
>
no but feel free to contribute one.
> What I have currently is this subroutine, it seems to me
> __very inefficient__. I was wondering
> is there a better way to achieve it.
>
Do you have a reason to think this is the slow part of your algorithm  
or are you just going on a gut reaction?  There is certainly overhead  
in calling a method but I am pretty sure that it isn't that  
significant, depends on how many sequences you are reading in I guess.

Just write a next_seq_array method and have it put the seqs onto an  
array within the method and do a benchmark test to show that it is  
faster.

-jason
>
> sub get_sequence_from_fasta {
>      my $file = shift;
>      my @seqs= ();
>
>      open INFILE, "<$file" or die "$0:  Can't open file $file: $!";
>      my $in = Bio::SeqIO->new(-format => 'fasta',
>                               -noclose => 1 ,
>                               -fh => \*INFILE);
>
>      while ( my $seq = $in->next_seq() ) {
>         push @seqs, $seq->seq();
>      }
>      return @seqs;
> }
>
>
> BTW, I also have tried to do this. I thought
> this might be a better way to do the above job.
> but it doesn't work.
>
> sub get_sequence_from_fasta_that_doesnot_work {
>      my $file = shift;
>       open my fh, "<$file" or die "$0:  Can't open file $file: $!";
>      my $in = Bio::SeqIO->newFh( -format => 'fasta', -fh => $fh );
>      return <$in>;
> }
>
> Hope to hear from you again.
>
> --
> Regards,
> Edward WIJAYA
> SINGAPORE
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12



From cjfields at uiuc.edu  Tue Feb 28 13:50:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 28 Feb 2006 12:50:50 -0600
Subject: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence file(Fasta)
	into Array
In-Reply-To: <1207BA95-F0E8-4049-8AAE-4B84964D147B@duke.edu>
Message-ID: <002001c63c97$e57f20c0$15327e82@pyrimidine>

Is there any particular reason why you aren't opening the file directly with
Bio::SeqIO?  

 sub get_sequence_from_fasta {
      my $file = shift;
      my @seqs= ();
      my $in = Bio::SeqIO->new(-format => 'fasta',
                               -file => "<$file");
      while ( my $seq = $in->next_seq() ) {
         push @seqs, $seq->seq();
      }
      return @seqs;
 }

I'm not completely sure of your intent here, but I think if you want to use
a globbed filehandle this way you need to open the file before entering the
sub then pass the filehandle to the sub.  I'm not sure why you pass the file
name, open the file, attach the file handle, parse the seqs, then return an
array?  Or am I missing something here?

Also, read:

http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples

which explains that loading arrays can be memory-intensive if the seqs are
big.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Tuesday, February 28, 2006 11:37 AM
> To: Edward WIJAYA
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence
> file(Fasta) into Array
> 
> 
> On Feb 28, 2006, at 5:01 AM, Edward WIJAYA wrote:
> 
> > Hi,
> >
> > Does Bio::SeqIO has a method  specially designed for
> > reading all the sequences from a fasta file into array.
> >
> no but feel free to contribute one.
> > What I have currently is this subroutine, it seems to me
> > __very inefficient__. I was wondering
> > is there a better way to achieve it.
> >
> Do you have a reason to think this is the slow part of your algorithm
> or are you just going on a gut reaction?  There is certainly overhead
> in calling a method but I am pretty sure that it isn't that
> significant, depends on how many sequences you are reading in I guess.
> 
> Just write a next_seq_array method and have it put the seqs onto an
> array within the method and do a benchmark test to show that it is
> faster.
> 
> -jason
> >
> > sub get_sequence_from_fasta {
> >      my $file = shift;
> >      my @seqs= ();
> >
> >      open INFILE, "<$file" or die "$0:  Can't open file $file: $!";
> >      my $in = Bio::SeqIO->new(-format => 'fasta',
> >                               -noclose => 1 ,
> >                               -fh => \*INFILE);
> >
> >      while ( my $seq = $in->next_seq() ) {
> >         push @seqs, $seq->seq();
> >      }
> >      return @seqs;
> > }
> >
> >
> > BTW, I also have tried to do this. I thought
> > this might be a better way to do the above job.
> > but it doesn't work.
> >
> > sub get_sequence_from_fasta_that_doesnot_work {
> >      my $file = shift;
> >       open my fh, "<$file" or die "$0:  Can't open file $file: $!";
> >      my $in = Bio::SeqIO->newFh( -format => 'fasta', -fh => $fh );
> >      return <$in>;
> > }
> >
> > Hope to hear from you again.
> >
> > --
> > Regards,
> > Edward WIJAYA
> > SINGAPORE
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From pterry2 at unlnotes.unl.edu  Tue Feb 28 13:53:11 2006
From: pterry2 at unlnotes.unl.edu (Philip M Terry)
Date: Tue, 28 Feb 2006 12:53:11 -0600
Subject: [Bioperl-l] Bioperl use question
Message-ID: 


Hello,

Is this an appropriate mailing list for this question?

I am trying Test 4 from the Tisdale book, p-299, "Mastering Perl for
Bioinformatics".

Comparing screen output from p-303 of the Tisdale book for bp1.pl with
mine:

philip-terrys-power-mac-g5:~/perl_prac/pgms/source mterry$ ./bp1.pl
Sequence name is AI129902
Sequence acc  is AI129902
First 5 bases is CTCCG

-------------------- WARNING ---------------------
MSG: acc (gb|3598416) does not exist
---------------------------------------------------
Submitted Blast for [ROA1_HUMAN]
philip-terrys-power-mac-g5:~/perl_prac/pgms/source mterry$

Two questions:
i. why the warning message in my screen output?
ii. my Blast fails, that is,
--I don't see "dots" on the output line on screen following "Submitted
Blast for [ROA1_HUMAN]"?
--my output file, blast.out has 0 KB in it?

My computer system:
Power Mac G5, OS X 10.4.5, installed "core" bioperl, that is,
sudo perl -MCPAN -e shell;
cpan> force install B/BI/BIRNEY/bioperl-1.4.tar.gz

Can you comment?

Thanks,
Philip M. Terry, Ph.D.
University of Nebraska-Lincoln


From staffa at niehs.nih.gov  Tue Feb 28 15:01:42 2006
From: staffa at niehs.nih.gov (staffa)
Date: Tue, 28 Feb 2006 15:01:42 -0500
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
References: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
Message-ID: 

Hello,
Does anyone know if Bio::Tools::SeqWords
count_words
or
count_overlap_words
will do DNA pattern searches and honor ambiguity symbols
like those in some restriction enzyme pattern definitions,
e.g. GGnnCC?


Thank you.

Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field (field1 at niehs.nih.gov))
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina

From torsten.seemann at infotech.monash.edu.au  Tue Feb 28 16:45:16 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 01 Mar 2006 08:45:16 +1100
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: 
References: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
	
Message-ID: <4404C46C.4010005@infotech.monash.edu.au>

Nick

> Does anyone know if Bio::Tools::SeqWords
> *count_words
> or
> count_overlap_words
> will do DNA pattern searches and honor ambiguity symbols
> like exist in some restriction enzyme pattern definitions,
> e.g. GGnnCC*

Examination of the code

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqWords.html#CODE4

suggests that all it does is count N-mers of any set of letters,
and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
the same N-mer.

So no it does not handle ambiguity symbols in any special manner.

What would you like it to do?
If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
it could be?
If it has 2 "N"s in it, does it count toward all 16 possible 
non-ambiguous N-mers?
And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010

From torsten.seemann at infotech.monash.edu.au  Tue Feb 28 17:01:38 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 01 Mar 2006 09:01:38 +1100
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D08507@NIHCESMLBX6.nih.gov>
References: <7930EE6CD7CA354D93B444D0433C061101D08507@NIHCESMLBX6.nih.gov>
Message-ID: <4404C842.2050608@infotech.monash.edu.au>

Staffa, Nick (NIH/NIEHS) [C] wrote:
> Yes 
> N matches any of the four bases.

It's still not clear to me what you want.

For simplicity, let's say we are counting words of length 1,
(which means overlapping and non-overlapping are the same)
and our sequence is "AGTN" (ie. 4 letters long)

The module would return the following
{ A=>1, G=>1, T=>1, N=>1 }    # sum of counts = 4

But you want it to return this?
{ A=>2, G=>2, T=>2, C=>1 }    # sum of counts = 7
ie. the N contributes 1 A, 1 G, 1 T and 1 C (and 0 N)

And correspondingly for all the possible ambiguity codes?

And if the word length was 2, then if we encountered a "NN"
it would add 16 to the total count ie. 1 AA, 1 AT, 1 AC etc?

>>Does anyone know if Bio::Tools::SeqWords
>>*count_words
>>or
>>count_overlap_words
>>will do DNA pattern searches and honor ambiguity symbols
>>like exist in some restriction enzyme pattern definitions,
>>e.g. GGnnCC*

> suggests that all it does is count N-mers of any set of letters,
> and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
> the same N-mer.
> So no it does not handle ambiguity symbols in any special manner.
> What would you like it to do?
> If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
> it could be?
> If it has 2 "N"s in it, does it count toward all 16 possible 
> non-ambiguous N-mers?
> And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010

From staffa at niehs.nih.gov  Tue Feb 28 16:46:30 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Tue, 28 Feb 2006 16:46:30 -0500
Subject: [Bioperl-l] seq_word and pattern counts
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D08507@NIHCESMLBX6.nih.gov>

Yes 
N matches any of the four bases.

Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field (field1 at niehs.nih.gov ))
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina




-----Original Message-----
From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
Sent: Tuesday, February 28, 2006 4:45 PM
To: Staffa, Nick (NIH/NIEHS) [C]
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] seq_word and pattern counts


Nick

> Does anyone know if Bio::Tools::SeqWords
> *count_words
> or
> count_overlap_words
> will do DNA pattern searches and honor ambiguity symbols
> like exist in some restriction enzyme pattern definitions,
> e.g. GGnnCC*

Examination of the code

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqWords.html#CODE4

suggests that all it does is count N-mers of any set of letters,
and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
the same N-mer.

So no it does not handle ambiguity symbols in any special manner.

What would you like it to do?
If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
it could be?
If it has 2 "N"s in it, does it count toward all 16 possible 
non-ambiguous N-mers?
And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010

From staffa at niehs.nih.gov  Tue Feb 28 17:08:40 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Tue, 28 Feb 2006 17:08:40 -0500
Subject: [Bioperl-l] seq_word and pattern counts
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D08508@NIHCESMLBX6.nih.gov>

The real problem is this:
We want to count sites in a long sequence where a restriction enzyme would cut.
The restriction enzyme in the example I gave recognizes GGnnCC,
that is, two Gs, then any two bases, then two Cs.

The GCG program findpatterns will do this, but bioperl makes certain statistics easy.
I'm sure there is some module somewhere for this purpose. 
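
For the counting itself, a plain regular expression may already be
enough. A rough sketch, assuming the ambiguity codes occur only in the
pattern and the sequence itself is plain A/C/G/T; iupac_to_regex and
find_sites are made-up names, not BioPerl functions. A lookahead is used
so that overlapping sites are all reported. GGnnCC reads the same on
both strands, so one pass is enough here; a non-palindromic site would
also need a search with its reverse complement.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to regex character classes.
my %IUPAC = (
    A => 'A',     C => 'C',     G => 'G',     T => 'T',
    R => '[AG]',  Y => '[CT]',  S => '[CG]',  W => '[AT]',
    K => '[GT]',  M => '[AC]',
    B => '[CGT]', D => '[AGT]', H => '[ACT]', V => '[ACG]',
    N => '[ACGT]',
);

# Turn a pattern such as "GGnnCC" into a case-insensitive regex.
sub iupac_to_regex {
    my ($pattern) = @_;
    my $re = '';
    for my $code (split //, uc $pattern) {
        die "Unknown IUPAC code '$code' in '$pattern'\n"
            unless exists $IUPAC{$code};
        $re .= $IUPAC{$code};
    }
    return qr/$re/i;
}

# Return the 1-based start positions of all sites, overlaps included.
sub find_sites {
    my ($seq, $pattern) = @_;
    my $re = iupac_to_regex($pattern);
    my @starts;
    while ($seq =~ /(?=$re)/g) {
        push @starts, pos($seq) + 1;    # pos() is 0-based at a zero-width match
    }
    return @starts;
}

my @sites = find_sites('GGATCCGGCCCC', 'GGnnCC');
print scalar(@sites), " sites at: @sites\n";    # prints "2 sites at: 1 7"
```

With a Bio::Seq object the first argument would simply be $seq_obj->seq.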





Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field (field1 at niehs.nih.gov ))
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina




-----Original Message-----
From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
Sent: Tuesday, February 28, 2006 5:02 PM
To: Staffa, Nick (NIH/NIEHS) [C]
Cc: bioperl-l
Subject: Re: [Bioperl-l] seq_word and pattern counts


Staffa, Nick (NIH/NIEHS) [C] wrote:
> Yes 
> N matches any of the four bases.

It's still not clear what you want to me.

For simplicity, let's say we are counting words of length 1,
(which means overlapping and non-overlapping are the same)
and our sequence is "AGTN" (ie. 4 letters long)

The module would return the following
{ A=>1, G=>1, T=>1, N=>1 }    # sum of counts = 4

But you want it to return this?
{ A=>2, G=>2, T=>2, C=>1 }    # sum of counts = 7
ie. the N contributes 1 A, 1 G, 1 T and 1 C (and 0 N)

And correspondingly for all the possible ambiguity codes?

And if the word length was 2, then if we encountered an "NN"
it would add 16 to the total count, i.e. 1 AA, 1 AT, 1 AC etc?
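
To make the second interpretation concrete, here is a minimal sketch in plain
Perl (no BioPerl needed). It is not how Bio::Tools::SeqWords behaves; the
names expand_word and count_expanded_words are made up for illustration, and
I am assuming every ambiguity code should credit each base it stands for
exactly once.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to the concrete bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',  B => 'CGT', D => 'AGT',
    H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Expand one word into every unambiguous word it could represent
# (hypothetical helper, not a BioPerl method).
sub expand_word {
    my ($word) = @_;
    my @expanded = ('');
    for my $code (split //, uc $word) {
        # Unknown symbols are passed through unchanged.
        my @bases = split //, ($IUPAC{$code} // $code);
        my @next;
        for my $prefix (@expanded) {
            push @next, $prefix . $_ for @bases;
        }
        @expanded = @next;
    }
    return @expanded;
}

# Count overlapping words of length $k, crediting each expansion once.
sub count_expanded_words {
    my ($seq, $k) = @_;
    my %count;
    for my $i (0 .. length($seq) - $k) {
        $count{$_}++ for expand_word(substr($seq, $i, $k));
    }
    return \%count;
}

my $counts = count_expanded_words('AGTN', 1);
print join(', ', map { "$_=>$counts->{$_}" } sort keys %$counts), "\n";
# prints: A=>2, C=>1, G=>2, T=>2
```

With word length 2, a lone "NN" expands to 16 distinct words, which matches
the count described above.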

>>Does anyone know if Bio::Tools::SeqWords
>>*count_words
>>or
>>count_overlap_words
>>will do DNA pattern searches and honor ambiguity symbols
>>like exist in some restriction enzyme pattern definitions,
>>e.g. GGnnCC*

> suggests that all it does is count N-mers of any set of letters,
> and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
> the same N-mer.
> So no it does not handle ambiguity symbols in any special manner.
> What would you like it to do?
> If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
> it could be?
> If it has 2 "N"s in it, does it count toward all 16 possible 
> non-ambiguous N-mers?
> And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010

From torsten.seemann at infotech.monash.edu.au  Tue Feb 28 17:47:01 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 01 Mar 2006 09:47:01 +1100
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D08508@NIHCESMLBX6.nih.gov>
References: <7930EE6CD7CA354D93B444D0433C061101D08508@NIHCESMLBX6.nih.gov>
Message-ID: <4404D2E5.4090405@infotech.monash.edu.au>

Staffa, Nick (NIH/NIEHS) [C] wrote:
> The real problem is this:
> We want to count sites in a long sequence where a restriction enzyme would cut.
> This restriction enzyme, in the example I gave will recognize GGnnCC,
> that is two G separated by two of any bases followed by two C.
> The GCG program findpatterns will do this, but bioperl makes certain statistics easy.
> I'm sure there is some module somewhere for this purpose. 

(Nick - please respond to me AND the bioperl-l at bioperl.org mailing list 
ie. "Reply All", so others can benefit from the Q&A - I've re-sent your 
past responses already).

Perhaps this module?

http://doc.bioperl.org/bioperl-live/Bio/Tools/RestrictionEnzyme.html

With this code?

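use Bio::Tools::RestrictionEnzyme;

my $enz = "GGNNCC";
my $re = new Bio::Tools::RestrictionEnzyme(-NAME =>"NicksResEnz--$enz",
                                           -MAKE =>'custom');
# $seqobj is the sequence object you already have;
# cut_seq() returns the fragments produced by the digest
my @fragments = $re->cut_seq($seqobj);
print "$enz cuts ", $seqobj->display_id, " into ", scalar(@fragments), " fragments.\n";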
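
If you only need the number of sites, a plain Perl regex may be enough as a
cross-check. This is a rough sketch, not a BioPerl method: site_to_regex and
count_sites are names I made up, and it only scans the strand you give it
(fine for GGNNCC, which is its own reverse complement).

```perl
use strict;
use warnings;

# IUPAC codes written as regex character classes.
my %IUPAC_CLASS = (
    A => 'A',     C => 'C',     G => 'G',     T => 'T',
    R => '[AG]',  Y => '[CT]',  S => '[CG]',  W => '[AT]',
    K => '[GT]',  M => '[AC]',  B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Turn a recognition site such as GGnnCC into a compiled regex
# (hypothetical helper).
sub site_to_regex {
    my ($site) = @_;
    my $pattern = '';
    for my $code (split //, uc $site) {
        die "Unknown IUPAC code '$code'\n" unless exists $IUPAC_CLASS{$code};
        $pattern .= $IUPAC_CLASS{$code};
    }
    return qr/$pattern/i;
}

# Count every position where the site matches, overlapping sites included.
sub count_sites {
    my ($seq, $site) = @_;
    my $re = site_to_regex($site);
    my $count = 0;
    # The zero-width lookahead lets overlapping sites be counted too.
    $count++ while $seq =~ /(?=$re)/g;
    return $count;
}

print count_sites('AAGGATCCTTGGCCCC', 'GGnnCC'), "\n";   # prints 2
```

The lookahead matters for patterns with ambiguity codes, because two
recognition sites can overlap (e.g. GGGGCCCC contains three GGNNCC sites).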

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010

From cjfields at uiuc.edu  Tue Feb 28 21:41:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 28 Feb 2006 20:41:08 -0600
Subject: [Bioperl-l] WGS sequences through Bio::DB::GenBank
Message-ID: <000001c63cd9$98988520$15327e82@pyrimidine>

I know that a recent post showed that you could retrieve CONTIG sequences
from GenBank files fairly easily:

http://bioperl.org/pipermail/bioperl-l/2006-February/020891.html

I'm driving myself a bit buggy looking for this, and I may be blind to it,
but can the same be done with WGS files?  I've tried Bio::DB::GenBank and a
few other Bio::DB* modules to see if it's been implemented but haven't had
any luck yet.  I may try getting around it using Bio::DB::Query::GenBank,
but just trying to find a more direct route.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 



From chandan.kr.singh at gmail.com  Thu Feb  2 02:26:09 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Thu, 2 Feb 2006 12:56:09 +0530
Subject: [Bioperl-l] Sorry, failure in post on the net,
	so still via email
In-Reply-To: <001001c62793$bef08f70$93656785@zhur>
References: <001001c62793$bef08f70$93656785@zhur>
Message-ID: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>

Hi
It seems that its not a proxy problem. I tried today and faced the same
problem. It has been months since my last try and therefore something might
have changed.
Try reading more on this problem.
I myself will try to do it.
Regards
Chandan

On 2/2/06, Huang Jian  wrote:
>
> I tried  some "Quick getting started scripts" in bptutorial.
>
> use Bio::Perl;
>   $seq = get_sequence('swiss',"ROA1_HUMAN");
>   # uses the default database - nr in this case
>   $blast_result = blast_sequence($seq);
>   write_blast(">roa1.blast",$blast_result);
>
> It returns "Submitted Blast for [ROA1_HUMAN] "
> It does not return me any error after I run the script.  However, it does
> not
> return me any result either.  The file "roa1.blast" is created but is
> always
> empty.
>
> I found the return is like the code below in function "blast_sequence"
>  if( $verbose ) {
>  print STDERR "Submitted Blast for [".$seq->id."] ";
>     }
>     sleep 5;
> ....
> I have tested "( env_proxy => 1 )" ...The problem remains the same...
>
> Help! By the way, could you send me an invitation letter of gmail, I want
> to have a gmail account too... :-)
>
> Best Regards!
> Jian Huang
>
>



From osborne1 at optonline.net  Thu Feb  2 17:06:25 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 02 Feb 2006 17:06:25 -0500
Subject: [Bioperl-l] Sorry, failure in post on the net,
	so still via email
In-Reply-To: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
Message-ID: 

Chandan,

I'd be interested in what you find. This is not a new problem, this same
code snippet has been mentioned many times, but for many others, like me,
the code always works.

Brian O.


On 2/2/06 2:26 AM, "CHANDAN SINGH"  wrote:

> Hi
> It seems that its not a proxy problem. I tried today and faced the same
> problem. It has been months since my last try and therefore something might
> have changed.
> Try reading more on this problem.
> I myself will try to do it.
> Regards
> Chandan
> 
> On 2/2/06, Huang Jian  wrote:
>> 
>> I tried  some "Quick getting started scripts" in bptutorial.
>> 
>> use Bio::Perl;
>>   $seq = get_sequence('swiss',"ROA1_HUMAN");
>>   # uses the default database - nr in this case
>>   $blast_result = blast_sequence($seq);
>>   write_blast(">roa1.blast",$blast_result);
>> 
>> It returns "Submitted Blast for [ROA1_HUMAN] "
>> It does not return me any error after I run the script.  However, it does
>> not
>> return me any result either.  The file "roa1.blast" is created but is
>> always
>> empty.
>> 
>> I found the return is like the code below in function "blast_sequence"
>>  if( $verbose ) {
>>  print STDERR "Submitted Blast for [".$seq->id."] ";
>>     }
>>     sleep 5;
>> ....
>> I have tested "( env_proxy => 1 )" ...The problem remains the same...
>> 
>> Help! By the way, could you send me an invitation letter of gmail, I want
>> to have a gmail account too... :-)
>> 
>> Best Regards!
>> Jian Huang
>> 
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From nagesh.chakka at anu.edu.au  Thu Feb  2 20:23:50 2006
From: nagesh.chakka at anu.edu.au (Nagesh Chakka)
Date: Fri, 03 Feb 2006 12:23:50 +1100
Subject: [Bioperl-l] RemoteBlast.pm version 1.28
In-Reply-To: <003901c6285e$d1b36670$93656785@zhur>
References: 
	<43E28C39.2060308@anu.edu.au> <003901c6285e$d1b36670$93656785@zhur>
Message-ID: <43E2B0A6.7000307@anu.edu.au>

Hi Huang,
Thanks for the message. The older version of RemoteBlast.pm checks the 
size of the temporary file to determine whether the BLAST results are 
ready. That condition is no longer being satisfied, possibly because of 
changes made at NCBI. I had this problem recently and found that the 
solution was to use the latest version, which fixes the problem (it no 
longer uses the file-size logic) but is not yet included in the BioPerl 
package.
Cheers
Nagesh

Huang Jian wrote:

> Dear Nagesh,
>
> I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send 
> me. Now it works perfectly!!!
>
> Thank you!!
>
> Huang
>
> ----- Original Message ----- From: "Nagesh Chakka" 
> 
> To: "Huang Jian" ; "bioperl-l" 
> 
> Sent: Friday, February 03, 2006 7:48 AM
> Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still 
> via email
>
>
>> Hi Huang,
>> I see that you are submitting a sequence for a remote blast search. Can
>> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If
>> not I have attached it with this email, try to replace it with the old
>> one which has a bug.
>> Let me know if it works.
>> Nagesh
>
>
>
   


From cjfields at uiuc.edu  Fri Feb  3 10:45:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 3 Feb 2006 09:45:23 -0600
Subject: [Bioperl-l] RemoteBlast.pm version 1.28
In-Reply-To: <43E2B0A6.7000307@anu.edu.au>
Message-ID: <001501c628d8$d91cd430$15327e82@pyrimidine>

Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS.  It will
work for saving text output.  However, it will not parse anything using
next_result (it will likely hang) and will not save XML format.  See these
bugs:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

for explanations and possible fixes (changes to RemoteBlast and
Bio::SearchIO::blast).  Note that these haven't been checked in yet so are
still not included in bioperl-live; they may be further modified before
committing to CVS.  If you're not worried about XML, you could just try the
first fix, which is a change to SearchIO::blast.

Nagesh, I remember you posting to the list a month ago using a script which
had problems; the script you used saves the output but doesn't actually
parse it (i.e. you don't use next_result() to go through the data).  Is the
version of BLAST in your text output 2.2.12 or 2.2.13?  Have you tried
parsing the output using "-readmethod => SearchIO" or "-readmethod => blast"
using your version of RemoteBlast and method next_result()? Like below (from
perldoc):  

        while ( my @rids = $factory->each_rid ) {
          foreach my $rid ( @rids ) {
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
              if( $rc < 0 ) {
                $factory->remove_rid($rid);
              }
              print STDERR "." if ( $v > 0 );
              sleep 5;
            } else {                            # parsing starts here
              my $result = $rc->next_result();  # it should hang here
              #save the output
              my $filename = $result->query_name()."\.out";
              $factory->save_output($filename);
              $factory->remove_rid($rid);
              print "\nQuery Name: ", $result->query_name(), "\n";
              while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0);
                print "\thit name is ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                  print "\t\tscore is ", $hsp->score, "\n";
                }
              }
            }
          }
        }


My script hung if I used next_result() in any way prior to the fixes.  I
want to see how many others are having the same issues with parsing using
the CVS version of bioperl-live.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> Sent: Thursday, February 02, 2006 7:24 PM
> To: Huang Jian; bioperl-l
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> 
> Hi Huang,
> Thanks for the message. The older version of RemoteBlast.pm works on the
> logic of checking the temporary file size to determine whether the Blast
> results are ready. This condition is not getting satisfied may be due to
> some changes brought about by NCBI. I had this problem recently and
> figured out that the solution was to use the latest version which has
> this problem fixed (does not use file size logic any more) which is not
> yet included in the BioPerl package.
> Cheers
> Nagesh
> 
> Huang Jian wrote:
> 
> > Dear Nagesh,
> >
> > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send
> > me. Now it works perfectly!!!
> >
> > Thank you!!
> >
> > Huang
> >
> > ----- Original Message ----- From: "Nagesh Chakka"
> > 
> > To: "Huang Jian" ; "bioperl-l"
> > 
> > Sent: Friday, February 03, 2006 7:48 AM
> > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > via email
> >
> >
> >> Hi Huang,
> >> I see that you are submitting a sequence for a remote blast search. Can
> >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If
> >> not I have attached it with this email, try to replace it with the old
> >> one which has a bug.
> >> Let me know if it works.
> >> Nagesh
> >
> >
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From osborne1 at optonline.net  Fri Feb  3 13:05:44 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 03 Feb 2006 13:05:44 -0500
Subject: [Bioperl-l] Documentation in the Bioperl package
Message-ID: 

bioperl-l,

The recent work on the Bioperl Wiki moved much of the Bioperl documentation
online. Since we cannot maintain 2 locations for all of this we'll be
removing a number of files from the package, specifically:

biodatabases.pod   
biodesign.pod    
bioperl.pod   
bioscripts.pod
doc/howto/*
doc/faq/*
FAQ

Rest assured that all of these files have been gone over in detail to make
sure that no important information was lost during the migration. All of
this will be replaced by a single file, such as "README.docs", that explains
where all the documentation is. It's not entirely clear what will happen to
bptutorial.pl. Moving its content to different online locations is possible
but in this case we lose its functionality as a script.

Are there any comments or questions or concerns?

Brian O.




From saldroubi at yahoo.com  Fri Feb  3 13:38:26 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Fri, 3 Feb 2006 10:38:26 -0800 (PST)
Subject: [Bioperl-l] Gibbs sampling algorithm?
Message-ID: <20060203183826.41696.qmail@web34306.mail.mud.yahoo.com>

Hi everyone,

I am wondering if anyone has implemented the Gibbs sampling algorithm for finding motifs, in BioPerl or otherwise.  I saw Bio::Tools::Run::PiseApplication::gibbs, which I believe calls a Gibbs program that is not free and open source, I think.  I would prefer not to write my own Gibbs sampling algorithm if one is already out there.  Any comments are appreciated.

Thank you

Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From cjfields at uiuc.edu  Fri Feb  3 14:34:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 3 Feb 2006 13:34:27 -0600
Subject: [Bioperl-l] Gibbs sampling algorithm?
In-Reply-To: <20060203183826.41696.qmail@web34306.mail.mud.yahoo.com>
Message-ID: <001901c628f8$d89917b0$15327e82@pyrimidine>

Do you mean this Gibbs program?

ftp://ncbi.nlm.nih.gov/pub/neuwald/ 

You can also request a license from the Gibbs Motif Sampler homepage, which
is more up to date:

http://bayesweb.wadsworth.org/gibbs/gibbs.html.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sam Al-Droubi
> Sent: Friday, February 03, 2006 12:38 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Gibbs sampling algorithm?
> 
> Hi everyone,
> 
> I am wondering if anyone has implemented the Gibbs sampling algorithm in
> BioPerl or otherwise for finding motifs.  I saw
> Bio::Tools::Run::PiseApplication::gibbs which I believe calls a Gibbs
> program which is not free open source, I think.   I prefer not to write my
> one Gibbs sampling algorithm if it is already out there.  Any comments are
> appreciated.
> 
> Thank you
> 
> Sincerely,
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From gyang at plantbio.uga.edu  Fri Feb  3 14:44:50 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Fri, 03 Feb 2006 14:44:50 -0500
Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
In-Reply-To: <001501c628d8$d91cd430$15327e82@pyrimidine>
Message-ID: <20060203194450.792e8d4e@dogwood.plantbio.uga.edu>

Hi, Everybody,  
I see this post and am wondering if this is the reason my web server is malfunctioning. We set up a web server named MAK, for MITE sequence analysis. It was working very well until around November 2005, when it stopped returning any results (the site is up and seems to be doing something after submission).  In the CGI script, I used RemoteBlast (that work was done in 2003) to do searches. I currently do not have access to the server because I moved. Quite a few people have sent us emails about its malfunctioning. Is there any suggestion for fixing the problem?  Should I simply ask for RemoteBlast.pm to be replaced with the new version?  
Thanks a lot,  
Guojun

Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun
      _____  

  From: Chris Fields [mailto:cjfields at uiuc.edu]
To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-l at bioperl.org]
Sent: Fri, 03 Feb 2006 10:45:23 -0500
Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28

Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It will
work for saving text output. However, it will not parse anything using
next_result (it will likely hang) and will not save XML format. See these
bugs:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

for explanations and possible fixes (changes to RemoteBlast and
Bio::SearchIO::blast). Note that these haven't been checked in yet so are
still not included in bioperl-live; they may be further modified before
committing to CVS. If you're not worried about XML, you could just try the
first fix, which is a change to SearchIO::blast.

Nagesh, I remember you posting to the list a month ago using a script which
had problems; the script you used saves the output but doesn't actually
parse it (i.e. you don't use next_result() to go through the data). Is the
version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
parsing the output using "-readmethod => SearchIO" or "-readmethod => blast"
using your version of RemoteBlast and method next_result()? Like below (from
perldoc): 

while ( my @rids = $factory->each_rid ) {
  foreach my $rid ( @rids ) {
    my $rc = $factory->retrieve_blast($rid);
    if( !ref($rc) ) {
      if( $rc < 0 ) {
        $factory->remove_rid($rid);
      }
      print STDERR "." if ( $v > 0 );
      sleep 5;
    } else { # parsing starts here
      my $result = $rc->next_result(); # it should hang here
      #save the output
      my $filename = $result->query_name()."\.out";
      $factory->save_output($filename);
      $factory->remove_rid($rid);
      print "\nQuery Name: ", $result->query_name(), "\n";
      while ( my $hit = $result->next_hit ) {
        next unless ( $v > 0);
        print "\thit name is ", $hit->name, "\n";
        while( my $hsp = $hit->next_hsp ) {
          print "\t\tscore is ", $hsp->score, "\n";
        }
      }
    }
  }
}
}


My script hanged if I used next_result() in any way prior to the fixes. I
want to see how many others are having the same issues with parsing using
the CVS version of bioperl-live.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> Sent: Thursday, February 02, 2006 7:24 PM
> To: Huang Jian; bioperl-l
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> 
> Hi Huang,
> Thanks for the message. The older version of RemoteBlast.pm works on the
> logic of checking the temporary file size to determine whether the Blast
> results are ready. This condition is not getting satisfied may be due to
> some changes brought about by NCBI. I had this problem recently and
> figured out that the solution was to use the latest version which has
> this problem fixed (does not use file size logic any more) which is not
> yet included in the BioPerl package.
> Cheers
> Nagesh
> 
> Huang Jian wrote:
> 
> > Dear Nagesh,
> >
> > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send
> > me. Now it works perfectly!!!
> >
> > Thank you!!
> >
> > Huang
> >
> > ----- Original Message ----- From: "Nagesh Chakka"
> > 
> > To: "Huang Jian" ; "bioperl-l"
> > 
> > Sent: Friday, February 03, 2006 7:48 AM
> > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > via email
> >
> >
> >> Hi Huang,
> >> I see that you are submitting a sequence for a remote blast search. Can
> >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If
> >> not I have attached it with this email, try to replace it with the old
> >> one which has a bug.
> >> Let me know if it works.
> >> Nagesh
> >
> >
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l
      
   
 


From gbazykin at Princeton.EDU  Fri Feb  3 15:38:04 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Fri, 3 Feb 2006 15:38:04 -0500
Subject: [Bioperl-l] proposed additions to Tree and cladogram
In-Reply-To: <148174979677.20051026172707@princeton.edu>
References: <148174979677.20051026172707@princeton.edu>
Message-ID: <8010525745.20060203153804@princeton.edu>

Hi all,

a while ago, I mailed to bioperl-l some proposed additions to
phylogeny-related modules (see below). I am doing a project on HIV
phylogeny now, and rely on these additions heavily. They expand on
what was already present in the corresponding modules. I expected them
to be also of general usage (at least the first one).

However, I never got any answer, so I assumed that these additions
were considered superfluous by most.

I am now working on an addition to Tree::Draw::Cladogram module. For
my project, I need to color individual tree edges (including internal ones)
on a scale from red to blue (according to the nonsynonymous/synonymous
ratios of these branches). This should be technically easy (I guess I
will add -Rcolor, -Gcolor and -Bcolor tags to nodes and use them in
Cladogram to color preceding edges), but I have two questions:

    - will this add-on be of general interest - should I try to do it
    "the right way", updating the pods etc.;
    
    - in general, are there any guidelines about how specific an issue
    a method should address to be included in bioperl distribution?
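
For what it's worth, the color mapping itself could be as small as the
sketch below. Everything here is an assumption for illustration: the
function name ratio_to_rgb, the choice that high ratios are red and low
ratios are blue, and the linear scale between a caller-supplied minimum and
maximum. It does not touch Tree::Draw::Cladogram at all.

```perl
use strict;
use warnings;

# Map a nonsynonymous/synonymous ratio onto a blue-to-red gradient.
# Returns (R, G, B) with each component in 0..255.
sub ratio_to_rgb {
    my ($ratio, $min, $max) = @_;
    return (0, 0, 255) if $max <= $min;        # degenerate range: all blue
    my $t = ($ratio - $min) / ($max - $min);   # position within the range
    $t = 0 if $t < 0;                          # clamp values below $min
    $t = 1 if $t > 1;                          # clamp values above $max
    my $red  = int(255 * $t + 0.5);            # round to nearest integer
    my $blue = 255 - $red;
    return ($red, 0, $blue);
}

printf "%d %d %d\n", ratio_to_rgb(1, 0, 2);   # prints 128 0 127
```

The three values could then be stored on each node under whatever tag names
are chosen and read back when the preceding edge is drawn.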

Thanks,
Yegor Bazykin



This is a forwarded message
From: Georgii Bazykin 
To: bioperl-l at bioperl.org
Date: Wednesday, October 26, 2005, 4:27:07 PM
Subject: suggestions for additions to Tree

===8<==============Original message text===============
Hi,

here are some tree-related methods I needed and added to my bioperl.
Hope someone else finds any of them useful as well.

Yegor Bazykin



=============================================
To NodeI:


# modified from total_branch_length in the Bio::Tree::Tree module
# gets the sum of branch lengths in the subtree below the given node

=head2 children_branch_length

 Title   : children_branch_length
 Usage   : my $size = $node->children_branch_length
 Function: Returns the sum of the length of all branches of the subtree which starts at given node
 Returns : integer
 Args    : none

=cut

sub children_branch_length {
   my ($self) = @_;
   
   return 0 if($self -> is_Leaf) ;

   my $sum = 0;

   for ($self -> get_all_Descendents) {
       $sum += $_->branch_length || 0;
   }

   return $sum;
}


-----------------------------------

=head2 height_nodes

 Title   : height_nodes
 Usage   : my $len = $node->height_nodes
 Function: Returns the height of the tree starting at this
           node.  Height is the maximum number of nodes traversed to reach a tip.
 Returns : The longest path to a leaf, counted in nodes (0 for a leaf)
 Args    : none

=cut

sub height_nodes{
   my ($self) = @_;
   
   return 0 if( $self->is_Leaf );

   my $max = 0;
   foreach my $subnode ( $self->each_Descendent ) { 
       my $s = $subnode->height_nodes + 1;
       if( $s > $max ) { $max = $s; }
   }
   return $max;
}



----------------------------------

=head2 get_all_Descendent_Leaves

 Title   : get_all_Descendent_Leaves($sortby)
 Usage   : my @nodes = $node->get_all_Descendent_Leaves;
 Function: Recursively fetch all descendents of this node, keeping only the leaves
           *NOTE* This is different from each_Descendent
 Returns : Array of Bio::Tree::NodeI objects
 Args    : $sortby [optional] "height", "creation" or coderef to be used
           to sort the order of children nodes.

=cut

sub get_all_Descendent_Leaves{
   my ($self, $sortby) = @_;
   $sortby ||= 'height';   
   my @nodes;
   foreach my $node ( $self->each_Descendent($sortby) ) {
       if ($node->is_Leaf) {
           push @nodes, $node;
       }
       else {
           # recurse with the leaf-only method so internal nodes are skipped
           push @nodes, ($node->get_all_Descendent_Leaves($sortby));
       }
   }
   return @nodes;
} 

=====================================================
To Tree:

=head2 total_internal_branch_length

 Title   : total_internal_branch_length
 Usage   : my $size = $tree->total_internal_branch_length
 Function: Returns the sum of the length of all branches, excluding branches leading to leaves
 Returns : integer
 Args    : none

=cut

sub total_internal_branch_length {
   my ($self) = @_;
   my $sum = 0;
   if( defined $self->get_root_node ) {
       for ( $self->get_root_node->get_Descendents() ) {
           unless ($_->is_Leaf) {       # YB: THIS IS ALL I ADDED
               $sum += $_->branch_length || 0;
           }
       }
   }
   return $sum;
} 


=================================================

To TreeFunctionsI:

=head2 distance_nodes

 Title   : distance_nodes
 Usage   : distance_nodes(-nodes => \@nodes )
 Function: returns the distance between two given nodes in numbers of nodes
 Returns : numerical distance
 Args    : -nodes => arrayref of nodes to test

=cut


# YB: distance_nodes is very similar to distance method in TreeFunctionsI except that 
# it estimates distances between nodes in numbers of nodes (e.g., 1 between mother and 
# daughter, 2 between two sisters, etc.)


sub distance_nodes {
    my ($self,@args) = @_;
    my ($nodes) = $self->_rearrange([qw(NODES)],@args);
    if( ! defined $nodes ) {
        $self->warn("Must supply -nodes parameter to distance_nodes() method");
        return undef;
    }
    my ($node1,$node2) = $self->_check_two_nodes($nodes);
    # algorithm:

    # Find lca: Start with first node, find and save every node from it
    # to root, saving cumulative distance. Then start with second node;
    # for it and each of its ancestor nodes, check to see if it's in
    # the first node's ancestor list - if so it is the lca. Return sum
    # of (cumul. distance from node1 to lca) and (cumul. distance from
    # node2 to lca)

    # find and save every ancestor of node1 (including itself)

    my %node1_ancestors;        # keys are internal ids, values are objects
    my %node1_cumul_dist;       # keys are internal ids, values 
    # are cumulative distance from node1 to given node
    my $place = $node1;         # start at node1
    my $cumul_dist = 0;

    while ( $place ){
        $node1_ancestors{$place->internal_id} = $place;
        $node1_cumul_dist{$place->internal_id} = $cumul_dist;
        $cumul_dist++;                                                # YB
#YB     if ($place->branch_length) {
#YB         $cumul_dist += $place->branch_length; # include current branch
#YB                                               # length in next iteration
#YB     }
        $place = $place->ancestor;
    }

    # now climb up node2, for each node checking whether 
    # it's in node1_ancestors
    $place = $node2;  # start at node2
    $cumul_dist = 0;
    while ( $place ){
        foreach my $key ( keys %node1_ancestors ){ # ugh
            if ( $place->internal_id == $key){ # we're at lca
                return $node1_cumul_dist{$key} + $cumul_dist;
            }
        }
        # include current branch length in next iteration
#YB     $cumul_dist += $place->branch_length || 0; 
        $cumul_dist++;                                                 # YB
        $place = $place->ancestor;
    }
    $self->warn("Could not find distance!"); # should never execute, 
    # if so, there's a problem
    return undef;
}
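
For anyone who wants to try the idea without a Bio::Tree object, here is a
self-contained sketch of the same algorithm on a toy tree. The tree is a
plain child => parent hash with made-up labels, and distance_in_edges is a
hypothetical name. It also looks the ancestor up directly in the hash
instead of looping over all keys, which is one way around the "ugh" loop
above.

```perl
use strict;
use warnings;

# Toy tree as a child => parent map (hypothetical labels).
# The root has no entry of its own.
my %parent = (A => 'n1', B => 'n1', n1 => 'root', C => 'root');

# Distance between two nodes counted in edges: 1 between mother and
# daughter, 2 between two sisters, and so on.
sub distance_in_edges {
    my ($parent, $node1, $node2) = @_;

    # Record how many steps each ancestor of node1 (including itself)
    # is away from node1.
    my %steps_from_node1;
    my $steps = 0;
    for (my $place = $node1; defined $place; $place = $parent->{$place}) {
        $steps_from_node1{$place} = $steps++;
    }

    # Climb from node2; the first node already seen is the lca.
    $steps = 0;
    for (my $place = $node2; defined $place; $place = $parent->{$place}) {
        return $steps_from_node1{$place} + $steps
            if exists $steps_from_node1{$place};
        $steps++;
    }
    return;   # no common ancestor: the nodes are in different trees
}

print distance_in_edges(\%parent, 'A', 'B'), "\n";   # prints 2
```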
===8<===========End of original message text===========





From cjfields at uiuc.edu  Fri Feb  3 16:07:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 3 Feb 2006 15:07:29 -0600
Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
In-Reply-To: <20060203194450.792e8d4e@dogwood.plantbio.uga.edu>
Message-ID: <001a01c62905$d7ef0920$15327e82@pyrimidine>

I would say give the new code a try, but realize that it hasn't been checked
in (like I said below).  I will try going over the modified
Bio::SearchIO::blast again this weekend to see if there is anything I might
have missed.  The changed order in the header of BLAST text output has me a
bit worried that it might not catch everything, but it at least doesn't hang
in the while() loop I described in the bug report below (bug #1934) and
seems to process everything fine.

If you want more stability in the code, you might consider changing over to
XML output and parsing with Bio::SearchIO::blastxml.  There are some changes
in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML
output, but I believe it parses everything regardless.  If you look back the
last month or so there has been a bit of discussion here about it.  Jason
describes a bit on how to set up RemoteBlast for XML:

http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Friday, February 03, 2006 1:45 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> 
> Hi, Everybody,
> I see this post and am wondering if this is the reason for the
> malfunctionning of my webserver. We set up a webserver named MAK, for MITE
> sequence analysis. It was working very well until around November 2005,
> when it stopped returning any result (the site is fine and seems to be
> doing sth after submission).  In the CGI script, I used remoteblast (that
> work was done in 2003) to do searches. I currently do not have access to
> the server because I moved. Quite several people sent emails to us about
> its malfunctioning. Is there any suggestion on fixing the problem?  Should
> I simplily ask the remoteblast.pm be replaced with the new version?
> Thanks a lot,
> Guojun
> 
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
>       _____
> 
>   From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-
> l at bioperl.org]
> Sent: Fri, 03 Feb 2006 10:45:23 -0500
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> 
> Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> will
> work for saving text output. However, it will not parse anything using
> next_result (it will likely hang) and will not save XML format. See these
> bugs:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> 
> for explanations and possible fixes (changes to RemoteBlast and
> Bio::SearchIO::blast). Note that these haven't been checked in yet so are
> still not included in bioperl-live; they may be further modified before
> committing to CVS. If you're not worried about XML, you could just try the
> first fix, which is a change to SearchIO::blast.
> 
> Nagesh, I remember you posting to the list a month ago using a script
> which
> had problems; the script you used saves the output but doesn't actually
> parse it (i.e. you don't use next_result() to go through the data). Is the
> version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
> parsing the output using "-readmethod => SearchIO" or "-readmethod =>
> blast"
> using your version of RemoteBlast and method next_result()? Like below
> (from
> perldoc):
> 
> while ( my @rids = $factory->each_rid ) {
>   foreach my $rid ( @rids ) {
>     my $rc = $factory->retrieve_blast($rid);
>     if( !ref($rc) ) {
>       # not an object yet: negative means error, otherwise keep waiting
>       if( $rc < 0 ) {
>         $factory->remove_rid($rid);
>       }
>       print STDERR "." if ( $v > 0 );
>       sleep 5;
>     } else { # parsing starts here
>       my $result = $rc->next_result(); # it should hang here
>       # save the output
>       my $filename = $result->query_name()."\.out";
>       $factory->save_output($filename);
>       $factory->remove_rid($rid);
>       print "\nQuery Name: ", $result->query_name(), "\n";
>       while ( my $hit = $result->next_hit ) {
>         next unless ( $v > 0);
>         print "\thit name is ", $hit->name, "\n";
>         while( my $hsp = $hit->next_hsp ) {
>           print "\t\tscore is ", $hsp->score, "\n";
>         }
>       }
>     }
>   }
> }
> 
> 
> My script hung if I used next_result() in any way prior to the fixes. I
> want to see how many others are having the same issues with parsing using
> the CVS version of bioperl-live.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > Sent: Thursday, February 02, 2006 7:24 PM
> > To: Huang Jian; bioperl-l
> > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> >
> > Hi Huang,
> > Thanks for the message. The older version of RemoteBlast.pm works on the
> > logic of checking the temporary file size to determine whether the Blast
> > results are ready. This condition is no longer satisfied, possibly due to
> > some changes made by NCBI. I had this problem recently and
> > figured out that the solution was to use the latest version, which has
> > this problem fixed (it no longer uses the file size logic) but is not
> > yet included in the BioPerl package.
> > Cheers
> > Nagesh
> >
> > Huang Jian wrote:
> >
> > > Dear Nagesh,
> > >
> > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send
> > > me. Now it works perfectly!!!
> > >
> > > Thank you!!
> > >
> > > Huang
> > >
> > > ----- Original Message ----- From: "Nagesh Chakka"
> > > 
> > > To: "Huang Jian" ; "bioperl-l"
> > > 
> > > Sent: Friday, February 03, 2006 7:48 AM
> > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > > via email
> > >
> > >
> > >> Hi Huang,
> > >> I see that you are submitting a sequence for a remote blast search.
> > >> Can you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09)?
> > >> If not, I have attached it to this email; try replacing the old one,
> > >> which has a bug, with it.
> > >> Let me know if it works.
> > >> Nagesh
> > >
> > >
> > >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 



From hlapp at gmx.net  Fri Feb  3 18:11:03 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 3 Feb 2006 15:11:03 -0800
Subject: [Bioperl-l] Documentation in the Bioperl package
In-Reply-To: 
References: 
Message-ID: 

Just to be sure, the wiki will be able to handle versions (releases)?
(documentation and APIs may change between releases and hence a more
recent doc page may not apply to an earlier release)

  -hilmar

On 2/3/06, Brian Osborne  wrote:
> bioperl-l,
>
> The recent work on the Bioperl Wiki moved much of the Bioperl documentation
> online. Since we cannot maintain 2 locations for all of this we'll be
> removing a number of files from the package, specifically:
>
> biodatabases.pod
> biodesign.pod
> bioperl.pod
> bioscripts.pod
> doc/howto/*
> doc/faq/*
> FAQ
>
> Rest assured that all of these files have been gone over in detail to make
> sure that no important information was lost during the migration. All of
> this will be replaced by a single file, such as "README.docs", that explains
> where all the documentation is. It's not entirely clear what will happen to
> bptutorial.pl. Moving its content to different online locations is possible
> but in this case we lose its functionality as a script.
>
> Are there any comments or questions or concerns?
>
> Brian O.
>
>
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From hubert.prielinger at gmx.at  Fri Feb  3 17:47:37 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 03 Feb 2006 16:47:37 -0600
Subject: [Bioperl-l] standalone blast composition based statistics parameter
Message-ID: <43E3DD89.7080903@gmx.at>

Hi,
Does anybody know whether it is possible to run a database search with
standalone blast with the composition-based statistics parameter turned on,
and what the abbreviation for that parameter is?

thanks
Hubert


From osborne1 at optonline.net  Fri Feb  3 22:32:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 03 Feb 2006 22:32:18 -0500
Subject: [Bioperl-l] Documentation in the Bioperl package
In-Reply-To: 
Message-ID: 

Hilmar,

MediaWiki supports such things as rollback based on date but it is not CVS
where an entire set of pages are tagged by version. It is also scriptable so
it may be possible to emulate this type of tagging by script, but I'm not
entirely sure (see WWW::Mediawiki::Client, Jason pointed this out to me).

So the simple answer is probably "no". But let's be honest: synchrony
between code and documentation wasn't achieved using the previous approach,
CVS, either. 

What Jason, Torsten, and I appreciated when adding content to this new site
was that it was relatively easy; our hope is that this approach will get
more people involved. The assumption is that more involvement will lead to
better documentation - Jason made this assumption when electing to move the
site to MediaWiki and I have to say that I completely agree with this
assumption.

Jason, any thoughts on this question? An interesting one...

Brian O.



On 2/3/06 6:11 PM, "Hilmar Lapp"  wrote:

> Just to be sure, the wiki will be able to handle versions (releases)?
> (documentation and APIs may change between releases and hence a more
> recent doc page may not apply to an earlier release)
> 
>   -hilmar
> 
> On 2/3/06, Brian Osborne  wrote:
>> bioperl-l,
>> 
>> The recent work on the Bioperl Wiki moved much of the Bioperl documentation
>> online. Since we cannot maintain 2 locations for all of this we'll be
>> removing a number of files from the package, specifically:
>> 
>> biodatabases.pod
>> biodesign.pod
>> bioperl.pod
>> bioscripts.pod
>> doc/howto/*
>> doc/faq/*
>> FAQ
>> 
>> Rest assured that all of these files have been gone over in detail to make
>> sure that no important information was lost during the migration. All of
>> this will be replaced by a single file, such as "README.docs", that explains
>> where all the documentation is. It's not entirely clear what will happen to
>> bptutorial.pl. Moving its content to different online locations is possible
>> but in this case we lose its functionality as a script.
>> 
>> Are there any comments or questions or concerns?
>> 
>> Brian O.
>> 
>> 
>> 
>> 
> 
> 
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
> 





From shameer at ncbs.res.in  Sat Feb  4 05:15:33 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Sat, 4 Feb 2006 15:45:33 +0530 (IST)
Subject: [Bioperl-l] Calpha to Co-ordinates Program
In-Reply-To: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
References: <001001c62793$bef08f70$93656785@zhur>
	<2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
Message-ID: <47205.192.168.1.176.1139048133.squirrel@192.168.1.176>

Dear All,

Is anyone aware of a Perl script / BioPerl module that can be used to
construct full atomic coordinates of a protein from a given C(alpha) trace
and optimize side chain geometry?

I tried the original program MaxSprout from the Holm group, but it is not
giving me proper results (I am getting errors like segmentation fault -
backbonchain failed etc.)

Since I need to use it as part of a web server, I would appreciate it if
anyone could let me know about a Perl script for the same.

Thanks and cheers in advance,
-- 
Mr. Shameer Khadar (JRF)
Dr. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241
F - 91-080-23636662/23636675
W - http://www.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."
MM




From torsten.seemann at infotech.monash.edu.au  Sat Feb  4 22:34:35 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 05 Feb 2006 14:34:35 +1100
Subject: [Bioperl-l] standalone blast composition based statistics
	parameter
In-Reply-To: <43E3DD89.7080903@gmx.at>
References: <43E3DD89.7080903@gmx.at>
Message-ID: <43E5724B.5070007@infotech.monash.edu.au>

Hubert,

> Does anybody know whether it is possible to perform a with the 
> standalone blast a database search where the composition based 
> statistics parameter is on
> and what's the abbreviation for the parameter

The StandAloneBlast only runs the "blastall" binary on your system. It 
accepts all the command line options (like "-d" etc.) that "blastall" 
does but just passes them as-is; it doesn't do anything special.

On a Unix system, type "blastall -" to list all the options that your 
BLAST binary supports.

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From fernan at iib.unsam.edu.ar  Sat Feb  4 23:34:27 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Sun, 5 Feb 2006 01:34:27 -0300
Subject: [Bioperl-l] standalone blast composition based statistics
	parameter
In-Reply-To: <43E3DD89.7080903@gmx.at>
References: <43E3DD89.7080903@gmx.at>
Message-ID: <20060205043427.GB39264@iib.unsam.edu.ar>

+----[ Hubert Prielinger  (03.Feb.2006 21:06):
|
| Hi,
| Does anybody know whether it is possible to perform a with the 
| standalone blast a database search where the composition based 
| statistics parameter is on
| and what's the abbreviation for the parameter
| 
| thanks
| Hubert
|
+----]

only for tblastn.

As Torsten said, 'blastall' with no arguments would have
revealed it: 

[ ... ]
  -C  Use composition-based statistics for tblastn:
      D or d: default (equivalent to F)
      0 or F or f: no composition-based statistics
      1 or T or t: Composition-based statistics as in NAR 29:2994-3005, 2001
      2: Composition-based score adjustment as in Bioinformatics 21:902-911,
          2005, conditioned on sequence properties
      3: Composition-based score adjustment as in Bioinformatics 21:902-911,
          2005, unconditionally
      For programs other than tblastn, must either be absent or be D, F or 0.
      [String]
    default = D
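
If you assemble the blastall command line from Perl, the rule in that help text is easy to check before launching the binary. The following is only a sketch under stated assumptions: the sub name and its named arguments are hypothetical, and the validation simply mirrors the -C description quoted above (BLAST 2.2.13). The -p, -d and -i switches are the usual blastall program, database and query options.

```perl
use strict;
use warnings;

# Hypothetical helper: build a blastall argument list and reject -C values
# that the quoted help text says are not accepted for the chosen program.
sub blastall_args {
    my (%o) = @_;
    my @args = ( '-p', $o{program}, '-d', $o{database}, '-i', $o{query} );

    my $c = $o{comp_stats};
    if ( defined $c ) {
        # Values listed in the help text: D/d, 0/F/f, 1/T/t, 2, 3
        die "unknown -C value '$c'\n"
            unless $c =~ /^(?:[DdFfTt]|[0-3])$/;

        # "For programs other than tblastn, must either be absent
        #  or be D, F or 0."
        die "-C $c is only accepted for tblastn\n"
            if $o{program} ne 'tblastn' && $c !~ /^(?:D|F|0)$/;

        push @args, '-C', $c;
    }
    return @args;
}

# Usage (file and database names are assumptions):
#   my @args = blastall_args( program => 'tblastn', database => 'nt',
#                             query => 'query.fa', comp_stats => 'T' );
#   system( 'blastall', @args ) == 0 or die "blastall failed: $?\n";
```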

Fernan

PS: this is using the latest BLAST 2.2.13 (ncbi-toolkit 20051206)


From hubert.prielinger at gmx.at  Sun Feb  5 21:56:07 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sun, 05 Feb 2006 20:56:07 -0600
Subject: [Bioperl-l] standalone blast composition based
	statistics	parameter
In-Reply-To: <20060205043427.GB39264@iib.unsam.edu.ar>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar>
Message-ID: <43E6BAC7.5050707@gmx.at>

Hi,
thank you very much, If I use the tblastn instead of blastp, I get the 
following error message

[blastall] WARNING: : Unable to open nr.00.nin

I looked up in the folder, but I don't have that file, and if I download 
the database and extract the file, it isn't there either...

thanks

Hubert

Fernan Aguero wrote:

>+----[ Hubert Prielinger  (03.Feb.2006 21:06):
>|
>| Hi,
>| Does anybody know whether it is possible to perform a with the 
>| standalone blast a database search where the composition based 
>| statistics parameter is on
>| and what's the abbreviation for the parameter
>| 
>| thanks
>| Hubert
>|
>+----]
>
>only for tblastn.
>
>As Torsten said, 'blastall' with no arguments would have
>revealed it: 
>
>[ ... ]
>  -C  Use composition-based statistics for tblastn:
>      D or d: default (equivalent to F)
>      0 or F or f: no composition-based statistics
>      1 or T or t: Composition-based statistics as in NAR 29:2994-3005, 2001
>      2: Composition-based score adjustment as in Bioinformatics 21:902-911,
>          2005, conditioned on sequence properties
>      3: Composition-based score adjustment as in Bioinformatics 21:902-911,
>          2005, unconditionally
>      For programs other than tblastn, must either be absent or be D, F or 0.
>      [String]
>    default = D
>
>Fernan
>
>PS: this is using the latest BLAST 2.2.13 (ncbi-toolkit 20051206)



From torsten.seemann at infotech.monash.edu.au  Sun Feb  5 23:29:11 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 06 Feb 2006 15:29:11 +1100
Subject: [Bioperl-l] standalone blast composition
	based	statistics	parameter
In-Reply-To: <43E6BAC7.5050707@gmx.at>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar>
	<43E6BAC7.5050707@gmx.at>
Message-ID: <43E6D097.7080304@infotech.monash.edu.au>

Hubert

> thank you very much, If I use the tblastn instead of blastp, I get the 
> following error message
> [blastall] WARNING: : Unable to open nr.00.nin
> I looked up in the folder, but I don't have that file, and if I download 
> the database and extract the file, it isn't there either...

"tblastn" requires a NUCLEOTIDE database to search. It appears that you 
have specified a PROTEIN database with "-d nr" ("nr" is protein). You 
probably want to install the "nt" blast database and use that instead.
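
The ".nin" in the warning is the clue: formatdb writes ".nin" index files for nucleotide databases and ".pin" for protein ones, so blastall was looking for a nucleotide index under the name "nr". A small sketch of that mapping, with a hypothetical helper name, in case it helps to check a script before it calls blastall:

```perl
use strict;
use warnings;

# Type of database each blastall program searches against.
my %DB_TYPE = (
    blastn  => 'nucleotide',
    blastp  => 'protein',
    blastx  => 'protein',       # translated nucleotide query vs protein db
    tblastn => 'nucleotide',    # protein query vs translated nucleotide db
    tblastx => 'nucleotide',
);

# Index file extension formatdb writes for each database type.
my %INDEX_EXT = ( nucleotide => 'nin', protein => 'pin' );

# Hypothetical helper: which index extension will this program look for?
sub index_ext_for {
    my ($program) = @_;
    my $type = $DB_TYPE{$program}
        or die "unknown program '$program'\n";
    return $INDEX_EXT{$type};
}

# Usage:
#   index_ext_for('tblastn')   # 'nin', so "-d nr" cannot work
#   index_ext_for('blastp')    # 'pin'
```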

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/


From hubert.prielinger at gmx.at  Sun Feb  5 23:12:27 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sun, 05 Feb 2006 22:12:27 -0600
Subject: [Bioperl-l] standalone blast
	composition	based	statistics	parameter
In-Reply-To: <43E6D097.7080304@infotech.monash.edu.au>
References: <43E3DD89.7080903@gmx.at>
	<20060205043427.GB39264@iib.unsam.edu.ar>	<43E6BAC7.5050707@gmx.at>
	<43E6D097.7080304@infotech.monash.edu.au>
Message-ID: <43E6CCAB.2060107@gmx.at>

dear torsten,
thanks for your quick reply, I have looked up at the ftp server and 
there are nt.00 to nt.04. Do I have to download all of them, are there 
differences?

thanks
Hubert


Torsten Seemann wrote:

>Hubert
>
>  
>
>>thank you very much, If I use the tblastn instead of blastp, I get the 
>>following error message
>>[blastall] WARNING: : Unable to open nr.00.nin
>>I looked up in the folder, but I don't have that file, and if I download 
>>the database and extract the file, it isn't there either...
>>    
>>
>
>"tblastn" requires a NUCLEOTIDE database to search. It appears that you 
>have specified a PROTEIN database with "-d nr" ("nr" is protein). You 
>probably want to install the "nt" blast database and use that instead.
>
>  
>



From torsten.seemann at infotech.monash.edu.au  Mon Feb  6 00:22:09 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 06 Feb 2006 16:22:09 +1100
Subject: [Bioperl-l] standalone blast
	composition	based	statistics	parameter
In-Reply-To: <43E6CCAB.2060107@gmx.at>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar>
	<43E6BAC7.5050707@gmx.at> <43E6D097.7080304@infotech.monash.edu.au>
	<43E6CCAB.2060107@gmx.at>
Message-ID: <43E6DD01.2010600@infotech.monash.edu.au>

Hubert

> thanks for your quick reply, I have looked up at the ftp server and 
> there are nt.00 to nt.04. Do I have to download all of them, are there 
> differences?

You have to download them all. The "nt" database (actually the index 
files) is very big, and it is split up into gigabyte (?) parts. Although 
they are called "nt.00" "nt.01" etc, you still pass "-d nt" to 
"blastall", because together these parts are one "nt" database. The 
"blastall" program will automatically use the separate parts; you do not 
have to join them.
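
To confirm that all the parts have arrived after unpacking, something like the sketch below can list the volumes found in the database directory. The sub name is hypothetical and the pattern assumes the "name.NN.ext" volume naming seen in this thread (for example "nt.00.nin"); it only lists files and does not validate their contents.

```perl
use strict;
use warnings;

# Hypothetical helper: list the volume index files of a multi-part BLAST
# database, e.g. nt.00.nin, nt.01.nin, ... for db 'nt' and extension 'nin'.
sub db_volumes {
    my ( $dir, $db, $ext ) = @_;
    opendir( my $dh, $dir ) or die "cannot open $dir: $!\n";
    my @vols = sort grep { /^\Q$db\E\.\d+\.\Q$ext\E$/ } readdir $dh;
    closedir $dh;
    return @vols;
}

# Usage (the directory is an assumption):
#   my @vols = db_volumes( '/data/blast/db', 'nt', 'nin' );
#   print scalar(@vols), " volumes found: @vols\n";
```

Whatever the count, the search itself still uses "-d nt".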

You should read http://www.ncbi.nlm.nih.gov/BLAST/ to make sure you are 
using the correct BLAST search for your problem.

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From shameer at ncbs.res.in  Mon Feb  6 03:27:50 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon, 6 Feb 2006 13:57:50 +0530 (IST)
Subject: [Bioperl-l] Need a  slogan for OBF
In-Reply-To: <47205.192.168.1.176.1139048133.squirrel@192.168.1.176>
References: <001001c62793$bef08f70$93656785@zhur>
	<2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
	<47205.192.168.1.176.1139048133.squirrel@192.168.1.176>
Message-ID: <2888.192.168.4.38.1139214470.squirrel@192.168.4.38>

Dear All,

As we are moving to the all-new wiki-style web site, why don't we think
about a unique logo + slogan that can express our spirit and excitement?

For example, we can have a logo with O|B|F, its full form, and the slogan.
If anybody is interested, I would be happy to design logos once we are
done with the logo.

I have a couple of suggestions. I hope all OBF members can send much more
powerful slogans than mine:

'Let's Code for Life'
'Let's Decode Life'
'Let's Recode Life'
'Code your Life '

Happy O|B|!!!
-- 
Mr. Shameer Khadar (JRF)
Dr. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241
F - 91-080-23636662/23636675
W - http://www.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."
MM




From olsonbr2 at msu.edu  Fri Feb  3 15:54:22 2006
From: olsonbr2 at msu.edu (Bradley J. S. C. Olson)
Date: Fri, 3 Feb 2006 15:54:22 -0500
Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter the
	method?
Message-ID: <005e01c62904$02b2ad30$db4c0a23@dihedral>

I have been working with the RemoteBlast.pm module and have found that it is
a bit clunky to use loops to keep checking to see if your RID has finished.

 

For example, every time you write a script, you need to add a code block
(see example in the documentation) in order to keep checking if @rid is
finished.

 

Would it be better to maybe write this in as a method in the RemoteBlast
module?  It seems like it would be better for remoteblast to have a method
we could call say retrieve_when_done that would return the blast report when
the value of retrieve_blast is no longer 0.
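
To make the idea concrete, here is a rough sketch of what such a wrapper could look like. It is not part of RemoteBlast: retrieve_when_done and its interval/max_tries options are hypothetical, and it takes a code ref so that it can wrap $factory->retrieve_blast($rid) without touching the module. It relies only on the convention the documented loop already uses: a reference when the report is ready, 0 while the job is running, a negative value on error.

```perl
use strict;
use warnings;

# Hypothetical helper: poll until a report object comes back.
#   $fetch      code ref; returns a reference when done, 0 while waiting,
#               a negative number on error
#   interval    seconds to sleep between attempts (default 5)
#   max_tries   attempts before giving up (default 120)
sub retrieve_when_done {
    my ( $fetch, %opt ) = @_;
    my $interval  = defined $opt{interval}  ? $opt{interval}  : 5;
    my $max_tries = defined $opt{max_tries} ? $opt{max_tries} : 120;

    for my $try ( 1 .. $max_tries ) {
        my $rc = $fetch->();
        return $rc if ref $rc;                       # report is ready
        die "retrieval failed (code $rc)\n" if $rc < 0;
        sleep $interval if $interval > 0 && $try < $max_tries;
    }
    die "gave up after $max_tries attempts\n";
}

# Usage with RemoteBlast (assumes $factory and $rid already exist):
#   my $report = retrieve_when_done( sub { $factory->retrieve_blast($rid) } );
#   my $result = $report->next_result;
```

The attempt limit is there so that a job that never finishes cannot hang the calling script.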

 

The only issue may be report parsing, but I wonder if it might be better to
separate out submittal/retrieval of BLAST requests from the parsing step and
make these more discrete processes?  Since NCBI seems not to be supporting
text results as a standard, maybe the module should work exclusively with
XML and we could move report handling away from the headaches of text
processing and just let Bio::SearchIO or blastxml handle the task of turning
BLAST reports into different forms (such as HTML, text etc).

 

This would definitely simplify coding with the RemoteBlast.pm module, as
then you could treat the report retrieval process as an object and just wait
for the object to return its value, instead of coding in a bunch of test
loops to see if it is done.  This may also help keep bugs out of the module,
make the module longer lasting, and not require module users to rewrite
their code every time NCBI makes changes.

 

Any thoughts or ideas?

 

Is anyone working on this?

 

Thanks

 

Brad Olson

 

 


-- 
No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.1.375 / Virus Database: 267.15.0/249 - Release Date: 2/2/2006
 


From cjfields at uiuc.edu  Mon Feb  6 12:27:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 6 Feb 2006 11:27:56 -0600
Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter
	themethod?
In-Reply-To: <005e01c62904$02b2ad30$db4c0a23@dihedral>
Message-ID: <002c01c62b42$ab7671a0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Bradley J. S. C. Olson
> Sent: Friday, February 03, 2006 2:54 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter
> themethod?
> 
> I have been working with the RemoteBlast.pm module and have found that it
> is
> a bit clunky to use loops to keep checking to see if you RID has finished.
> 
> 
> 
> For example, every time you write a script, you need to add a code block
> (see example in the documentation) in order to keep checking if @rid is
> finished.
> 
> Would it be better to maybe write this in as a method in the RemoteBlast
> module?  It seems like it would be better for remoteblast to have a method
> we could call say retrieve_when_done that would return the blast report
> when
> the value of retrieve_blast is no longer 0.

Sounds reasonable, though I'm not sure how easy it would be to implement.
Why not drop by Bugzilla (http://bugzilla.bioperl.org/) and submit this as
an enhancement?

> The only issue may be report parsing, but I wonder if it might be better
> to
> separate out submittal/retrieval of BLAST requests from the parsing step
> and
> make these more discrete processes?  Since NCBI seems to be not supporting
> text results as a standard, maybe the module should work exclusively with
> XML and we could change report handling away from the headaches of text
> processing and just allow Bio::SeqIO or blastxml handle the task of making
> a
> blast reports into different forms (such as HTML, text etc).

They are separated.  RemoteBlast executes BLAST remotely (via HTTP).
Results are parsed via various Bio::SearchIO modules depending on what you
set '-readmethod' to.  This is from perldoc:

From Bio::Tools::Run::RemoteBlast
________________________________________________________

DESCRIPTION
    Class for remote execution of the NCBI Blast via HTTP.

    For a description of the many CGI parameters see:
    http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html

    Various additional options and input formats are available.

________________________________________________________

From Bio::SearchIO
________________________________________________________

DESCRIPTION
    This is a driver for instantiating a parser for report files from
    sequence database searches. This object serves as a wrapper for the
    format parsers in Bio::SearchIO::* - you should not need to ever use
    those format parsers directly. (For people used to the SeqIO system it,
    we are deliberately using the same pattern).

    Once you get a SearchIO object, calling next_result() gives you back a
    Bio::Search::Result::ResultI compliant object, which is an object that
    represents one Blast/Fasta/HMMER whatever report.

    A list of module names and formats is below:

      blast      BLAST (WUBLAST, NCBIBLAST,bl2seq)
      fasta      FASTA -m9 and -m0
      blasttable BLAST -m9 or -m8 output (NCBI not WUBLAST tabular)
      megablast  MEGABLAST
      psl        UCSC PSL format
      waba       WABA output
      axt        AXT format
      sim4       Sim4
      hmmer      HMMER hmmpfam and hmmsearch
      exonerate  Exonerate CIGAR and VULGAR format
      blastxml   NCBI BLAST XML
      wise       Genewise -genesf format

    See the SearchIO HOWTO linked from http://bioperl.org/HOWTOs/

________________________________________________________

This is also in the wiki online now:

http://www.bioperl.org/wiki/Module:Bio::SearchIO 
http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast

I think the current line of thought is to make XML the default, but I also
know you would irritate a LOT of people out there by cutting off text output
parsing completely.  Roger Hall or Jason pointed out that doing so will
break many scripts out there.  

Furthermore, the problems with text output parsing are usually minimal.  For
instance, the last one was a small change which broke a regex, causing an
infinite loop; the actual bug was in Bio::SearchIO::blast and not in
RemoteBlast.  A simple addition to the regex fixed it.  The only change to
RemoteBlast was to implement the option of saving XML formatted BLAST
output.

I do like the idea of using XML output to build custom (bioperl-specific)
BLAST reports, but that also requires more work, likely a lot more work.
Again, maybe add that as an enhancement in Bugzilla or, better yet, submit
some sample code maybe as an example.  

> This would definitely simplifying coding using the RemoteBlast.pm module
> as
> then you could treat the report retrieval process as an object and just
> wait
> for the object to return its value, instead of coding in a bunch of test
> loops to see if it is done.  This may also help keep bugs out of the
> module
> and make the module longer lasting and not require module users to rewrite
> their code every time NCBI makes changes.

I think the most stable way of submitting jobs is by using the netblast
client (blastcl3) and parsing the results from that.  No CGI, no HTML, just
saving to a temp file and parsing through SearchIO.

RemoteBlast was designed, I believe, with the idea of letting researchers
with some basic knowledge of perl use an interface familiar to them (i.e.
the BLAST interface at NCBI) and retrieve results on a regular basis.  The
results are parsed via SearchIO::blast/blastxml/blasttable.  The problem is,
though convenient, RemoteBlast is also reliant on the powers that be at NCBI
not changing anything dramatically.  It is possible that NCBI could modify
the HTML code from the BLAST retrieval process, thus breaking RemoteBlast.
Text output could change again, even more dramatically, thus severely
breaking Bio::SearchIO::blast.  Thus, we adapt to those changes by modifying
the broken modules.  It's evolution at its finest.  It's also a fact of life
that code breaks and needs to be fixed every once in a while to stay
current.

Okay, I'm waxing philosophical now so I know I've definitely had too much
coffee.  Must get back to work...

> 
> 
> 
> Any thoughts or ideas?
> 
> 
> 
> Is anyone working on this?
> 
> 
> 
> Thanks
> 
> 
> 
> Brad Olson
> 
> 
> 
> 
> 
> 


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign



From roger at iosea.com  Mon Feb  6 13:14:11 2006
From: roger at iosea.com (Roger Hall)
Date: Mon, 6 Feb 2006 12:14:11 -0600
Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter
	the	method?
In-Reply-To: <005e01c62904$02b2ad30$db4c0a23@dihedral>
Message-ID: <000f01c62b49$25732d30$4301a8c0@LIBERAL>

Brad,

I decided to fix this module about ten days ago, and then was out all of
last week with Strep plus a virus or two - it's one of the advantages of
having young kids.

I see that there have been quite a few messages about this module in just
the last week. I am sitting down now to read through them.

I'll get back to you (and the list) ASAP.

If you have any other questions or suggestions about RemoteBlast, feel free
to bug me with 'em. 

Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bradley J. S. C.
Olson
Sent: Friday, February 03, 2006 2:54 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter the
method?

I have been working with the RemoteBlast.pm module and have found that it is
a bit clunky to use loops to keep checking to see if you RID has finished.

 

For example, every time you write a script, you need to add a code block
(see example in the documentation) in order to keep checking if @rid is
finished.

 

Would it be better to maybe write this in as a method in the RemoteBlast
module?  It seems like it would be better for remoteblast to have a method
we could call say retrieve_when_done that would return the blast report when
the value of retrieve_blast is no longer 0.

 

The only issue may be report parsing, but I wonder if it might be better to
separate out submittal/retrieval of BLAST requests from the parsing step and
make these more discrete processes?  Since NCBI seems to be not supporting
text results as a standard, maybe the module should work exclusively with
XML and we could change report handling away from the headaches of text
processing and just allow Bio::SeqIO or blastxml handle the task of making a
blast reports into different forms (such as HTML, text etc).

 

This would definitely simplify coding with the RemoteBlast.pm module, as
you could then treat the report retrieval process as an object and just wait
for the object to return its value, instead of coding in a bunch of test
loops to see if it is done.  This may also help keep bugs out of the module,
make the module longer lasting, and not require module users to rewrite
their code every time NCBI makes changes.

 

Any thoughts or ideas?

 

Is anyone working on this?

 

Thanks

 

Brad Olson

 

 


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l




From barry.m.dancis at gsk.com  Mon Feb  6 12:17:13 2006
From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com)
Date: Mon, 6 Feb 2006 12:17:13 -0500
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: <003701c625c4$5527d790$2f01a8c0@GOLHARMOBILE1>
Message-ID: 

Hi --

        Are there any classes for manipulating miRNA's with functions such 
as parsing the name, storing and interlinking pri/pre/mat sequences, etc?

Thanks,

Barry


From hubert.prielinger at gmx.at  Mon Feb  6 18:16:01 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 06 Feb 2006 17:16:01 -0600
Subject: [Bioperl-l] no results with standalone tblastn
In-Reply-To: <43E6DD01.2010600@infotech.monash.edu.au>
References: <43E3DD89.7080903@gmx.at>
	<20060205043427.GB39264@iib.unsam.edu.ar>	<43E6BAC7.5050707@gmx.at>
	<43E6D097.7080304@infotech.monash.edu.au>	<43E6CCAB.2060107@gmx.at>
	<43E6DD01.2010600@infotech.monash.edu.au>
Message-ID: <43E7D8B1.5030307@gmx.at>

Dear Torsten,
I have downloaded all the databases, as you recommended, and it is
working, but I don't get any results. If I try it online it works fine.
My result file looks like this:

TBLASTN 2.2.13 [Nov-27-2005]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query=
         (8 letters)

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           3,749,503 sequences; 16,556,997,203 total letters

Searching..................................................done

                                                                
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value



The program code for it looks like this:

#!/usr/local/bin/perl -w
BEGIN
{
      $ENV{BLASTDIR}= "/home/Hubert/blast/blast-2.2.13/bin";
    $ENV{BLASTDATADIR}= "/home/Hubert/blast/blast-2.2.13/data"; 
}

use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;
use Bio::SeqIO;
use strict;

print "Please insert matrix:\t";
my $matrix_STD = <STDIN>;
chomp $matrix_STD;

print "Please insert count:\t";
my $count_STD = <STDIN>;
chomp $count_STD;



# parameters
my $expect_value = 20000;
#my $filter_query_sequence = 'T';
my $one_line_description = 1000;
my $alignments = 1000;
#my $matrix = 'BLOSUM80';
my $gapcost = 10;
my $gapextend = 1;
my $wordsize = 2;
#my $compbasedStat = '1';
#my $count = 1;
# my $strands = 1;

my @params = ('program' => 'tblastn','database' => 'nt');
#my $progress_interval = 100;


my $seqio_obj = Bio::SeqIO->new(
  -file   => "aloneblosum62.txt",
  -format => "raw",
);

# create factory object and set parameters

my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
print "submitted parameters successfully \n";

$factory->e($expect_value);
#$factory->F($filter_query_sequence);
$factory->v($one_line_description);
$factory->b($alignments);
$factory->M($matrix_STD);
$factory->G($gapcost);
$factory->E($gapextend);
$factory->W($wordsize);
#$factory->C($compbasedStat);
#$factory->S($strands);

print "changed parameters successfully \n";
print "\n";


# get query

while ( my $query = $seqio_obj->next_seq) {
      print "entered while loop \n";
      my $blast_report = $factory->blastall($query);
#      print "$blast_report\n";
      $factory->outfile("nucleo80$count_STD.txt");
      $count_STD++;
      print $query->seq;
      print "\n";
     
}



thanks
Hubert



Torsten Seemann wrote:

>Hubert
>
>  
>
>>thanks for your quick reply, I have looked up at the ftp server and 
>>there are nt.00 to nt.04. Do I have to download all of them, are there 
>>differences?
>>    
>>
>
>You have to download them all. The "nt" database (actually the index 
>files) is very big, and it is split up into gigabyte (?) parts. Although 
>they are called "nt.00" "nt.01" etc, you still pass "-d nt" to 
>"blastall", because together these parts are one "nt" database. The 
>"blastall" program will automatically use the separate parts; you do not 
>have to join them.
>
>You should read http://www.ncbi.nlm.nih.gov/BLAST/ to make sure you are 
>using the correct BLAST search for your problem.
>
>  
>



From torsten.seemann at infotech.monash.edu.au  Mon Feb  6 21:17:40 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 07 Feb 2006 13:17:40 +1100
Subject: [Bioperl-l] no results with standalone tblastn
In-Reply-To: <43E7D8B1.5030307@gmx.at>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar>
	<43E6BAC7.5050707@gmx.at> <43E6D097.7080304@infotech.monash.edu.au>
	<43E6CCAB.2060107@gmx.at> <43E6DD01.2010600@infotech.monash.edu.au>
	<43E7D8B1.5030307@gmx.at>
Message-ID: <43E80344.5090207@infotech.monash.edu.au>


> I have downloaded all the databases, as you recommended me. And it is 
> working, but I don't get any results, if I try it online it works fine.
> my result file looks like that:
> 
> TBLASTN 2.2.13 [Nov-27-2005]
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
> Query=
>          (8 letters)
> Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
> GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
>            3,749,503 sequences; 16,556,997,203 total letters
> Searching..................................................done
>                                                                  Score    E
> Sequences producing significant alignments:                      (bits) Value

Is your query only 8 amino acids long?

This report looks like it did have alignments that were not displayed, 
otherwise it would print "**** No hits ****".

This mailing list is not here to solve your BLAST problems unless it is 
a problem with the Perl module running BLAST.

You first need to try and get your problem working on the command line 
*without* Perl. eg.

/home/Hubert/blast/blast-2.2.13/bin/blastall -p tblastn -d nt -i 
YOUR_FASTA_FILE_WITH_SEQUENCE_IN_IT -o OUTPUT_FILE.txt -e 0.001
...

where "..." is the rest of the options you are setting in your Perl 
script. If it doesn't work that way, it will never work in Perl.
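
One way to get the full command line for that test is to build it from
the same values the script sets.  The sketch below only assembles and
prints the command; the matrix name and the query and output file names
are placeholders, and blastall_command is a hypothetical helper, not a
BioPerl function.

```perl
use strict;
use warnings;

# Values copied from the script quoted above.  The matrix is a
# placeholder, because the script reads it from STDIN.
my %opt = (
    p => 'tblastn',    # program
    d => 'nt',         # database
    e => 20000,        # expect value
    v => 1000,         # number of one-line descriptions
    b => 1000,         # number of alignments
    M => 'BLOSUM62',   # placeholder matrix name
    G => 10,           # gap opening cost
    E => 1,            # gap extension cost
    W => 2,            # word size
);

# Hypothetical helper: turn an option hash into a blastall argument list.
sub blastall_command {
    my ($blastall, $query_file, $out_file, %flags) = @_;
    my @cmd = ($blastall);
    push @cmd, "-$_", $flags{$_} for sort keys %flags;   # one "-flag value" pair per option
    push @cmd, '-i', $query_file, '-o', $out_file;
    return @cmd;
}

my @cmd = blastall_command(
    '/home/Hubert/blast/blast-2.2.13/bin/blastall',
    'query.fasta', 'out.txt', %opt,
);
print join(' ', @cmd), "\n";
```

Pasting the printed line into a shell runs the same search without Perl
in the way, which separates BLAST problems from module problems.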

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/


From rahall2 at ualr.edu  Mon Feb  6 21:46:44 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Mon, 6 Feb 2006 20:46:44 -0600
Subject: [Bioperl-l] RemoteBlast users - potentially major changes - please
	reply
Message-ID: <002001c62b90$bb9dbe00$4301a8c0@LIBERAL>

To everyone who uses RemoteBlast.pm:

 

Would anyone object to RemoteBlast being rewritten in a way that requires
NCBI's blastcl3 executable?

 

Binary downloads of blastcl3 (column "netblast") are available for numerous
platforms at: http://ncbi.nih.gov/BLAST/download.shtml

 

Does anyone require or desire a "pure perl" implementation? If so, please
explain the advantage you see with such an implementation.

 

Thanks!

 

Roger Hall

Technical Director

MidSouth Bioinformatics Center

University of Arkansas at Little Rock

(501) 569-8074

 



From osborne1 at optonline.net  Tue Feb  7 12:05:56 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 07 Feb 2006 12:05:56 -0500
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: 

Barry,

If the sequence information is in one of the formats that Bioperl
understands (Genbank, Swissprot flat, and so on) then the answer is yes.
This assumes that the details on sequence that you mentioned are found in
some sequence feature section in the file. But it looks to me like there's
no specialized parser for miRNA sequence per se, I'll be corrected if I'm
wrong.

Brian O.


On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com"  wrote:

> Hi --
> 
>         Are there any classes for manipulating miRNA's with functions such
> as parsing the name, storing and interlinking pri/pre/mat sequences, etc?
> 
> Thanks,
> 
> Barry
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From barry.m.dancis at gsk.com  Tue Feb  7 15:26:27 2006
From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com)
Date: Tue, 7 Feb 2006 15:26:27 -0500
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: 

It's the parser in particular that I need




"Brian Osborne"  
Sent by: bioperl-l-bounces at lists.open-bio.org
07-Feb-2006 12:05
 
To
barry.m.dancis at gsk.com, "bioperl-l" , 
bioperl-l-bounces at lists.open-bio.org
cc

Subject
Re: [Bioperl-l] Handling miRNA's






Barry,

If the sequence information is in one of the formats that Bioperl
understands (Genbank, Swissprot flat, and so on) then the answer is yes.
This assumes that the details on sequence that you mentioned are found in
some sequence feature section in the file. But it looks to me like there's
no specialized parser for miRNA sequence per se, I'll be corrected if I'm
wrong.

Brian O.


On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com"  
wrote:

> Hi --
> 
>         Are there any classes for manipulating miRNA's with functions 
such
> as parsing the name, storing and interlinking pri/pre/mat sequences, 
etc?
> 
> Thanks,
> 
> Barry




From deep.raman at gmail.com  Tue Feb  7 15:16:48 2006
From: deep.raman at gmail.com (Raman Deep Singh)
Date: Wed, 8 Feb 2006 01:46:48 +0530
Subject: [Bioperl-l] Needed help
Message-ID: 

Hi all
     I have a huge task of retrieving a number of sequences from the
Swiss-Prot database based on some fixed criteria. For that I want to index
the Swiss-Prot database on my local disk. I have downloaded the whole
Swiss-Prot database to my local disk (the January 2006 release).

  I am currently using BioPerl on a Linux machine. I am using the
code listed below


=======================

    use Bio::Index::Swissprot;

    my $Index_File_Name = shift;
    my $inx = Bio::Index::Swissprot->new('-filename' => $Index_File_Name,
 '-write_flag' => 'WRITE');
    $inx->make_index(@ARGV);
-----------------------------------------
    # Print out several sequences present in the index
    # in gcg format
    use Bio::Index::Swissprot;
    use Bio::SeqIO;

    my $out = Bio::SeqIO->new( '-format' => 'gcg', '-fh' => \*STDOUT );
    my $Index_File_Name = shift;
    my $inx = Bio::Index::Swissprot->new('-filename' => $Index_File_Name);

    foreach my $id (@ARGV) {
        my $seq = $inx->fetch($id); # Returns Bio::Seq object
        $out->write_seq($seq);
    }

    # alternatively

    my $seq1 = $inx->get_Seq_by_id($id);
    my $seq2 = $inx->get_Seq_by_acc($acc);


-- -------------------------------
I am running the script as

 perl getseqfromid.pl sample.dat

from the shell

and I am getting this error repeatedly

------------- EXCEPTION  -------------
MSG: Can't open 'DB_File' dbm file 'swiss100.dat' : No such file or directory
STACK Bio::Index::Abstract::open_dbm
/usr/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:389
STACK Bio::Index::Abstract::new
/usr/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150
STACK Bio::Index::AbstractSeq::new
/usr/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91
STACK toplevel i.pl:6


--------------------------
Somewhere online I also found a document saying that some variables
need to be exported. I did that as well but still got the same errors.

kindly  help
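
One possible reading of the trace is that the fetch code opened an index
file that had never been built, since Bio::Index::Swissprot->new without
the WRITE flag expects the dbm file to exist already.  The sketch below
checks for that case before opening anything.  The helper names
index_plan and open_index are hypothetical; the calls to new, make_index
and the -filename / -write_flag arguments are the ones used in the code
above.

```perl
use strict;
use warnings;

# Decide what to do before touching the index.  A missing index file
# with no readable flat file to build it from is reported as 'missing'.
sub index_plan {
    my ($index_file, @flat_files) = @_;
    return 'fetch' if -e $index_file;
    my @unreadable = grep { !-r $_ } @flat_files;
    return 'build' if @flat_files && !@unreadable;
    return 'missing';
}

# Open the index, building it first when it does not exist yet.
sub open_index {
    my ($index_file, @flat_files) = @_;
    my $plan = index_plan($index_file, @flat_files);
    die "index '$index_file' does not exist and no readable flat file was given\n"
        if $plan eq 'missing';

    require Bio::Index::Swissprot;   # loaded lazily; index_plan() itself needs no BioPerl
    if ($plan eq 'build') {
        my $inx = Bio::Index::Swissprot->new(
            -filename   => $index_file,
            -write_flag => 'WRITE',
        );
        $inx->make_index(@flat_files);
        return $inx;
    }
    return Bio::Index::Swissprot->new(-filename => $index_file);
}
```

With this shape, the first argument is always the index file and the
remaining arguments are the Swiss-Prot flat files, so the index file and
the data file cannot be confused on the command line.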




Ramandeep Singh



From cjfields at uiuc.edu  Tue Feb  7 17:40:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Feb 2006 16:40:15 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: <007701c62c37$7914af60$15327e82@pyrimidine>

Are you talking about sequences or text output from a specific program?  If
you are talking about sequences in a particular format, then listen to
Brian.  If you are talking about output, then we need to know which program
you're using, as a parser may exist or could be built.  

There are a few modules in Bio::Tools that handle RNA (like QRNA,
tRNAscan-SE), so check those out first.  I'm currently finishing up a
Bio::Tools module for RNAMotif and have plans for making an ERPIN parser.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> barry.m.dancis at gsk.com
> Sent: Tuesday, February 07, 2006 2:26 PM
> To: bioperl-l; bioperl-l-bounces at lists.open-bio.org
> Subject: Re: [Bioperl-l] Handling miRNA's
> 
> It's the parser in particular that I need
> 
> 
> 
> 
> "Brian Osborne"  Sent by: 
> bioperl-l-bounces at lists.open-bio.org
> 07-Feb-2006 12:05
>  
> To
> barry.m.dancis at gsk.com, "bioperl-l" , 
> bioperl-l-bounces at lists.open-bio.org
> cc
> 
> Subject
> Re: [Bioperl-l] Handling miRNA's
> 
> 
> 
> 
> 
> 
> Barry,
> 
> If the sequence information is in one of the formats that 
> Bioperl understands (Genbank, Swissprot flat, and so on) then 
> the answer is yes.
> This assumes that the details on sequence that you mentioned 
> are found in some sequence feature section in the file. But 
> it looks to me like there's no specialized parser for miRNA 
> sequence per se, I'll be corrected if I'm wrong.
> 
> Brian O.
> 
> 
> On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" 
> wrote:
> 
> > Hi --
> > 
> >         Are there any classes for manipulating miRNA's with 
> functions
> such
> > as parsing the name, storing and interlinking pri/pre/mat sequences,
> etc?
> > 
> > Thanks,
> > 
> > Barry



From cjfields at uiuc.edu  Tue Feb  7 18:06:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Feb 2006 17:06:21 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: <000001c62c3b$1c6017b0$15327e82@pyrimidine>

Sorry if this gets posted twice.

Are you talking about sequences or text output from a specific program?  If
you are talking about sequences in a particular format, then Brian's right.
If you are talking about output, then we need to know which program you're
using, as a parser may exist, or probably could be built from an existing
one.

There are a few modules in Bio::Tools that handle RNA (like QRNA,
tRNAscan-SE), so check those out first.  I'm currently finishing up a
Bio::Tools module for RNAMotif and have plans for making an ERPIN parser.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  




From paul.boutros at utoronto.ca  Tue Feb  7 20:38:42 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Tue,  7 Feb 2006 20:38:42 -0500
Subject: [Bioperl-l] (no subject)
Message-ID: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>

Hi Roger,

I would definitely prefer a fully Perl-based implementation.  For starters, I have not 
been successful in compiling the Toolkit that contains netblast for some platforms (e.g. 
AIX 5.2 w/gcc 4.0).

I haven't been following the discussion: is there some compelling reason to prefer a 
netblast-based system that's come up recently?  I'm guessing that adding a new non-perl 
dependency would only be done if there was considerable justification for this type of 
change, but I'm not clear from your message what that justification is.

Paul



------------------------------ 

Message: 12 
Date: Mon, 6 Feb 2006 20:46:44 -0600 
From: "Roger Hall"  
Subject: [Bioperl-l] RemoteBlast users - potentially major changes - 
        please        reply 
To:  
Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL> 
Content-Type: text/plain;        charset="us-ascii" 

To everyone who uses RemoteBlast.pm: 

Would anyone object to RemoteBlast being rewritten in a way that requires 
NCBI's blastcl3 executable? 

Binary downloads of blastcl3 (column "netblast") are available for numerous 
platforms at: http://ncbi.nih.gov/BLAST/download.shtml 

Does anyone require or desire a "pure perl" implementation? If so, please 
explain the advantage you see with such an implementation. 

Thanks! 
 

Roger Hall 

Technical Director 

MidSouth Bioinformatics Center 

University of Arkansas at Little Rock 

(501) 569-8074 

  





From cjfields at uiuc.edu  Tue Feb  7 23:52:36 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Feb 2006 22:52:36 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
Message-ID: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>

I want to submit a module for parsing RNAMotif output  
(Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning  
output and returning Bio::SeqFeature::Generic objects with added tags  
for descriptors/sequences/file info.  I'm in the process of writing  
up tests and going through biodesign to make sure everything's  
kosher, but the module itself is essentially ready-to-go.  What  
should I do next?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From rahall2 at ualr.edu  Wed Feb  8 00:16:44 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Tue, 7 Feb 2006 23:16:44 -0600
Subject: [Bioperl-l] RemoteBlast  [was: (no subject)]
In-Reply-To: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>
Message-ID: <004401c62c6e$da906a40$4301a8c0@LIBERAL>

Paul,

I think that most core Bioperl folks have long since moved away from
RemoteBlast and are using the functionality in StandAloneBlast to run their
own local servers. More importantly, they are, in general, researchers who
are coming to Bioinformatics from the life sciences side, and are
particularly tired of dealing with the technical issues that RemoteBlast
consistently generates due to changes in the text-formatted BLAST reports. 

They aren't code-for-code-sake geeks like me. ;}

When RemoteBlast was written, XML was barely on the technology radar, and
XML-formatted BLAST reports weren't even available. It seems that everyone
recognizes that the XML reports now generated by NCBI's blast server are the
wave of the future, but I think there is still some concern that not every
flavor of BLAST produces XML yet. Even so, the XML parser is considered to
be very strong, and only helps hasten the end of text-formatted support,
since parsing text-formatted reports is the primary source of pain. 

In discussing the shift from old to new, I think the idea of relying on
NCBI's application (and NCBI's issue system and NCBI's developers) entered
the realm of possibility, so as the guy who just showed up to adopt
RemoteBlast, I am trying to air all options and beg for all requirements. 

Personally, I am okay with the idea of maintaining text-formatted report
parsing, but like I said, I'm pound foolish about code sometimes. Additional
foolishness arises from the fact that the first money I earned in
Bioinformatics was on a contract gig where I relied on RemoteBlast (and the
related text parsers).

For my money, I just needed anyone, anywhere, to say they desired a pure
perl implementation to meet my personal threshold. So far, you're the
second. ;}

I do, however, see the advantage in shifting to XML-formatted reporting and
parsing *only* as soon as every BLAST flavor supports it, if not before.
(Anyone - is this still an issue? Please educate me.)

At the moment, I'm leaning towards adding an option to RemoteBlast. The
default (no option) would use a "pure perl" implementation, and the
enhancement (with explicit option) would merely wrap the NCBI executable.
However, there are other issues (queuing, batches) that I don't fully
understand in context, so I haven't zeroed in on a complete recommendation
yet. Additionally, the end of text-formatted reports, while drawing near, is
not yet agreed, although it is pretty clear that the only way text support
will be continued is if I insist on it and then deliver the support myself.
:}
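
A rough sketch of how such an option could be dispatched.  The option
name backend, the submit function, and both placeholder backends are
hypothetical and only describe what they would do; nothing here reflects
the current RemoteBlast interface.

```perl
use strict;
use warnings;

# Placeholder backends: each returns a description instead of running anything.
my %backend = (
    perl     => sub { my ($query) = @_; return "pure-perl HTTP submission of $query" },
    blastcl3 => sub { my ($query) = @_; return "blastcl3 -p blastp -d nr -i $query" },
);

# Hypothetical option handling: pure Perl unless the caller asks otherwise.
sub submit {
    my ($query, %opt) = @_;
    my $name = defined $opt{backend} ? $opt{backend} : 'perl';
    my $code = $backend{$name} or die "unknown backend '$name'\n";
    return $code->($query);
}

print submit('query.fasta'), "\n";
print submit('query.fasta', backend => 'blastcl3'), "\n";
```

Keeping the pure-perl path as the default means existing scripts keep
working unchanged, and the executable is only needed by users who ask
for it explicitly.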

In any case, I am very interested in a pure perl implementation for exactly
the two reasons stated thus far: it's one less thing for a newbie to worry
about, and it will run on every platform that runs perl. 

Thanks much for the input!

Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock
(501) 569-8074




-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Paul Boutros
Sent: Tuesday, February 07, 2006 7:39 PM
To: BioPerl Mailing List
Cc: Roger Hall
Subject: [Bioperl-l] (no subject)

Hi Roger,

I would definitely prefer a fully Perl-based implementation.  For starters,
I have not been successful in compiling the Toolkit that contains netblast
for some platforms (e.g. AIX 5.2 w/gcc 4.0).

I haven't been following the discussion: is there some compelling reason to
prefer a netblast-based system that's come up recently?  I'm guessing that
adding a new non-perl dependency would only be done if there was considerable
justification for this type of change, but I'm not clear from your message
what that justification is.

Paul







From heikki at sanbi.ac.za  Wed Feb  8 01:53:58 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 8 Feb 2006 08:53:58 +0200
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
Message-ID: <200602080853.58889.heikki@sanbi.ac.za>

Chris,

Post your files to bugzilla (ticket type enhancement, add files to ticket 
after creation)  and someone with commit ability will add them to CVS once 
the code is in satisfactory condition. 

Thanks,

	-Heikki

On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> I want to submit a module for parsing RNAMotif output
> (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning
> output and returning Bio::SeqFeature::Generic objects with added tags
> for descriptors/sequences/file info.  I'm in the process of writing
> up tests and going through biodesign to make sure everything's
> kosher, but the module itself is essentially ready-to-go.  What
> should I do next?
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From hlapp at gmx.net  Wed Feb  8 00:48:40 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 7 Feb 2006 21:48:40 -0800
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
Message-ID: 

I presume you don't have a cvs write account yet - if you do just add
and commit the module and test. Otherwise could you post the POD to
the list please; either somebody with an account will hopefully
volunteer or Jason or I or Heikki or Aaron will assume mentorship and
commit the code with feedback to you. Unless you completely refuse to
heed any and all advice ;) that person will then soon try to absolve
him/herself of having to do this again for you and support you for
receiving a cvs write account of your own.

   -hilmar

On 2/7/06, Chris Fields  wrote:
> I want to submit a module for parsing RNAMotif output
> (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning
> output and returning Bio::SeqFeature::Generic objects with added tags
> for descriptors/sequences/file info.  I'm in the process of writing
> up tests and going through biodesign to make sure everything's
> kosher, but the module itself is essentially ready-to-go.  What
> should I do next?
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From cjfields at uiuc.edu  Wed Feb  8 07:57:46 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 06:57:46 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: 
References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
	
Message-ID: 

I'll probably go with Heikki's advice and post the module (with
POD, tests, and test file) to bugzilla as an enhancement.  That way
it can be looked through before committing.  I will likely have a few
more modules for ERPIN and maybe Infernal in the next few months (if
I can get it up and running).

Also, completely off-topic, I'll post what I have written up for
installing bioperl-db on WinXP here soon.  I think it should probably
be included in the wiki in some way, maybe as a link from the
bioperl-db wiki page.

Thanks Hilmar, Heikki!

Chris


On Feb 7, 2006, at 11:48 PM, Hilmar Lapp wrote:

> I presume you don't have a cvs write account yet - if you do just add
> and commit the module and test. Otherwise could you post the POD to
> the list please; either somebody with an account will hopefully
> volunteer or Jason or I or Heikki or Aaron will assume mentorship and
> commit the code with feedback to you. Unless you completely refuse to
> heed any and all advice ;) that person will then soon try to absolve
> him/herself of having to do this again for you and support you for
> receiving a cvs write account of your own.
>
>    -hilmar
>
> On 2/7/06, Chris Fields  wrote:
>> I want to submit a module for parsing RNAMotif output
>> (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning
>> output and returning Bio::SeqFeature::Generic objects with added tags
>> for descriptors/sequences/file info.  I'm in the process of writing
>> up tests and going through biodesign to make sure everything's
>> kosher, but the module itself is essentially ready-to-go.  What
>> should I do next?
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From cjfields at uiuc.edu  Wed Feb  8 10:32:25 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 09:32:25 -0600
Subject: [Bioperl-l] RemoteBlast  [was: (no subject)]
In-Reply-To: <004401c62c6e$da906a40$4301a8c0@LIBERAL>
Message-ID: <000401c62cc4$de0cc9b0$15327e82@pyrimidine>

Roger, 

It might be better to build a wrapper for the blastcl3 and make it a
separate Bio::Tools::Run module, maybe branch it off from RemoteBlast or,
better yet, StandAloneBlast.  All the put/get parameters in the BEGIN{}
block for RemoteBlast look like they are configured for NCBI's HTTP
submission via CGI; I don't think you can use these for blastcl3.  Ergo,
you'll have to create a whole new set of hashes or parameter arrays inside
RemoteBlast just for blastcl3 since everything is passed via command-line
flags, like so (from http://www.ncbi.nlm.nih.gov/blast/docs/netblast.html):

blastcl3 -p blastp -d nr -i MY_QUERY -o MY_QUERY.out

However, StandAloneBlast looks like it has all the parameters mapped out in
the BEGIN{} block.  And it looks like the command line options support just
about everything you get via the web version.  It probably wouldn't take
much modification from StandAloneBlast to get it to run blastcl3.
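
As a rough illustration of what such a wrapper would have to do, here is a
minimal sketch that turns named options into a blastcl3 argument list. The
helper name blastcl3_command and the option names (program, database,
infile, outfile) are made up for this example; only the four flags from the
netblast command line above are mapped, and nothing here is taken from
StandAloneBlast itself.

```perl
use strict;
use warnings;

# Hypothetical mapping from option names to the blastcl3 flags shown in
# the netblast example (-p, -d, -i, -o). A real wrapper would need more.
my %FLAG_FOR = (
    program  => '-p',
    database => '-d',
    infile   => '-i',
    outfile  => '-o',
);

# Hypothetical helper: build the command as a list, so it can be handed
# to system() without going through a shell.
sub blastcl3_command {
    my (%opt) = @_;
    my @cmd = ('blastcl3');
    for my $name (qw(program database infile outfile)) {
        next unless defined $opt{$name};
        push @cmd, $FLAG_FOR{$name}, $opt{$name};
    }
    return @cmd;
}

my @cmd = blastcl3_command(
    program  => 'blastp',
    database => 'nr',
    infile   => 'MY_QUERY',
    outfile  => 'MY_QUERY.out',
);
print join(' ', @cmd), "\n";

# To actually run it (requires blastcl3 on the PATH):
# system(@cmd) == 0 or die "blastcl3 failed: $?";
```

Passing the command to system() as a list avoids shell quoting problems
with file names; the call is left commented out so the sketch runs without
the NCBI binary installed.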

As for queueing, I don't think it's supported, though you can send in a
FASTA file with multiple sequences for multiple BLAST queries (I tried this
and it works).  You could also create a queue using a sequence factory,
sending them to the netblast client one at a time, though I'd suggest
putting a delay in between cycles in that case so as not to make the guys at
NCBI cranky.
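
The one-at-a-time queue with a delay could look something like the sketch
below. The helper name run_queue and the callback interface are assumptions
for illustration; the callback stands in for whatever actually sends one
sequence to the netblast client.

```perl
use strict;
use warnings;

# Hypothetical helper: submit queries one at a time, pausing between
# submissions. $submit is any code reference that handles one query.
sub run_queue {
    my ($queries, $submit, $delay_seconds) = @_;
    my @results;
    for my $i (0 .. $#{$queries}) {
        push @results, $submit->($queries->[$i]);
        # Pause between cycles, but not after the last query.
        sleep $delay_seconds if $delay_seconds && $i < $#{$queries};
    }
    return @results;
}

# Stand-in for a real submission: just report what would be sent.
my @ids  = qw(seq1 seq2 seq3);
my @done = run_queue(\@ids, sub { return "submitted $_[0]" }, 1);
print "$_\n" for @done;
```

The delay is a parameter rather than a constant so it can be raised for
large batches; a suitable value is a judgment call and is not specified
anywhere in this thread.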

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Roger Hall
> Sent: Tuesday, February 07, 2006 11:17 PM
> To: Paul.Boutros at utoronto.ca; 'BioPerl Mailing List'
> Subject: Re: [Bioperl-l] RemoteBlast [was: (no subject)]
> 
> Paul,
> 
> I think that most core Bioperl folks have long since moved 
> away from RemoteBlast and are using the functionality in 
> StandAloneBlast to run their own local servers. More 
> importantly, they are, in general, researchers who are coming 
> to Bioinformatics from the life sciences side, and are 
> particularly tired of dealing with the technical issues that 
> RemoteBlast consistently generates due to changes in the 
> text-formatted BLAST reports. 
> 
> They aren't code-for-code-sake geeks like me. ;}
> 
> When RemoteBlast was written, XML was barely on the 
> technology radar, and XML-formatted BLAST reports weren't 
> even available. It seems that everyone recognizes that the 
> XML reports now generated by NCBI's blast server are the wave 
> of the future, but I think there is still some concern that 
> not every flavor of BLAST produces XML yet. Even so, the XML 
> parser is considered to be very strong, and only helps hasten 
> the end of text-formatted support, since parsing 
> text-formatted reports is the primary source of pain. 
> 
> In discussing the shift from old to new, I think the idea of 
> relying on NCBI's application (and NCBI's issue system and 
> NCBI's developers) entered the realm of possibility, so as 
> the guy who just showed up to adopt RemoteBlast, I am trying 
> to air all options and beg for all requirements. 
> 
> Personally, I am okay with the idea of maintaining 
> text-formatted report parsing, but like I said, I'm pound 
> foolish about code sometimes. Additional foolishness arises 
> from the fact that the first money I earned in Bioinformatics 
> was on a contract gig where I relied on RemoteBlast (and the 
> related text parsers).
> 
> For my money, I just needed anyone, anywhere, to say they 
> desired a pure perl implementation to meet my personal 
> threshold. So far, you're the second. ;}
> 
> I do, however, see the advantage in shifting to XML-formatted 
> reporting and parsing *only* as soon as every BLAST flavor 
> supports it, if not before.
> (Anyone - is this still an issue? Please educate me.)
> 
> At the moment, I'm leaning towards adding an option to 
> RemoteBlast. The default (no option) would use a "pure perl" 
> implementation, and the enhancement (with explicit option) 
> would merely wrap the NCBI executable.
> However, there are other issues (queuing, batches) that I 
> don't fully understand in context, so I haven't zeroed in on 
> a complete recommendation yet. Additionally, the end of 
> text-formatted reports, while drawing near, is not yet 
> agreed, although it is pretty clear that the only way text 
> support will be continued is if I insist on it and then 
> deliver the support myself.
> :}
> 
> In any case, I am very interested in a pure perl 
> implementation for exactly the two reasons stated thus far: 
> it's one less thing for a newbie to worry about, and it will 
> run on every platform that runs perl. 
> 
> Thanks much for the input!
> 
> Roger Hall
> Technical Director
> MidSouth Bioinformatics Center
> University of Arkansas at Little Rock
> (501) 569-8074
> 
> 
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Paul Boutros
> Sent: Tuesday, February 07, 2006 7:39 PM
> To: BioPerl Mailing List
> Cc: Roger Hall
> Subject: [Bioperl-l] (no subject)
> 
> Hi Roger,
> 
> I would definitely prefer a fully Perl-based implementation.  
> For starters, I have not been successful in compiling the 
> Toolkit that contains netblast for some platforms (e.g. 
> AIX 5.2 w/gcc 4.0).
> 
> I haven't been following the discussion: is there some 
> compelling reason to prefer a netblast-based system that's 
> come up recently?  I'm guessing that adding a new non-perl 
> dependency would only be done if there was considerable 
> justification for this type of change, but I'm not clear from 
> your message what that justification is.
> 
> Paul
> 
> 
> 
> ------------------------------ 
> 
> Message: 12
> Date: Mon, 6 Feb 2006 20:46:44 -0600
> From: "Roger Hall" 
> Subject: [Bioperl-l] RemoteBlast users - potentially major changes - 
>         please        reply 
> To: 
> Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL> 
> Content-Type: text/plain;        charset="us-ascii" 
> 
> To everyone who uses RemoteBlast.pm: 
> 
> Would anyone object to RemoteBlast being rewritten in a way 
> that requires NCBI's blastcl3 executable? 
> 
> Binary downloads of blastcl3 (column "netblast") are 
> available for numerous platforms at: 
> http://ncbi.nih.gov/BLAST/download.shtml 
> 
> Does anyone require or desire a "pure perl" implementation? 
> If so, please explain the advantage you see with such an 
> implementation. 
> 
> Thanks! 
>  
> 
> Roger Hall 
> 
> Technical Director 
> 
> MidSouth Bioinformatics Center 
> 
> University of Arkansas at Little Rock 
> 
> (501) 569-8074 
> 
>   
> 
> 
> 



From hubert.prielinger at gmx.at  Wed Feb  8 15:51:41 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Wed, 08 Feb 2006 14:51:41 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
Message-ID: <43EA59DD.1030608@gmx.at>

Hi,
If I want to parse a Blast Output (Version 2.2.12) with Bio::SearchIO,  
I get the following error message:

MSG: no data for midline Query  1   WWWKWRW  7
STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
STACK toplevel /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21

is that a bug......

If I want to parse Blast Output (version 2.2.13), I don't get anything.....
I'm using bioperl 1.4

Before I installed bioperl 1.4, parsing Blast output (version 2.2.12)
worked fine, but I don't remember which bioperl version I had installed
then.

thanks in advance

Hubert





From cjfields at uiuc.edu  Wed Feb  8 17:15:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 16:15:23 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <43EA59DD.1030608@gmx.at>
Message-ID: <001101c62cfd$28605df0$15327e82@pyrimidine>

My guess is you're running into text parsing problems in
Bio::SearchIO::blast.  Upgrade to the latest developer version (1.5.1) or
bioperl-live (CVS), then see the bug below. 

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

I think the first problem you ran into is solved in bioperl 1.5.1, the last
problem (more recent, not related to the first) has been fixed but hasn't
been committed to bioperl-live yet.  The fixed SearchIO::blast is available
in the link above, but realize it hasn't been committed yet and may change.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  




From hubert.prielinger at gmx.at  Wed Feb  8 16:41:04 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Wed, 08 Feb 2006 15:41:04 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <001101c62cfd$28605df0$15327e82@pyrimidine>
References: <001101c62cfd$28605df0$15327e82@pyrimidine>
Message-ID: <43EA6570.9070909@gmx.at>

hi chris,
thanks, I have upgraded to version 1.5.1 but it still isn't working. Do
you have any other idea? The problem is that I have to parse a
lot of text files....
or shall I look for another option to parse those files...

regards
Hubert






From cjfields at uiuc.edu  Wed Feb  8 18:00:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 17:00:21 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <43EA6570.9070909@gmx.at>
Message-ID: <001201c62d03$703178c0$15327e82@pyrimidine>

Make sure you ran a full installation of bioperl-1.5.1 or bioperl-live (not
just the modules you want; mixing bioperl versions might work, but you might
run into interoperability problems).  Then replace the Bio::SearchIO::blast
with the one in Bugzilla.  The 'other option' you mentioned might be trying
XML instead of text, which is more stable in the long run.  You will still
need to run a full upgrade to bioperl 1.5.1 for that; make sure you read
this:

http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast

If you're using SearchIO directly instead of RemoteBlast, you should be able
to set the '-format' flag to 'blastxml'.

It also wouldn't hurt to know what OS you're using or see some code.  Roger
is out there somewhere (I think) and may also have some input.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  




From hubert.prielinger at gmx.at  Wed Feb  8 17:22:44 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Wed, 08 Feb 2006 16:22:44 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <001201c62d03$703178c0$15327e82@pyrimidine>
References: <001201c62d03$703178c0$15327e82@pyrimidine>
Message-ID: <43EA6F34.4090007@gmx.at>

hi,
I have installed the Core, Run and Ext packages from the following page:
http://news.open-bio.org/archives/2005_10.html
I'm using only SearchIO, without the RemoteBlast module, because I
already have all my Blast output files.
My operating system is fedora core 9.

Code:

#!/usr/bin/perl -w

use Bio::SearchIO;

print "start program\n";
my $directory = 
"/home/Hubert/installed/eclipse/workspace/Database_Search/result_4";
opendir(DIR, $directory) || die("Cannot open directory");
print "opened directory\n";

foreach my $file (readdir(DIR))  {
print "read file\n";

my $search = new Bio::SearchIO (-format => 'blast',
                                -file => $file);
                               
my $cutoff_len = 10;
                               


#iterate over each query sequence
while (my $result = $search->next_result) {
print "entered 1st while loop\n";
   
    #iterate over each hit on the query sequence
    while (my $hit = $result->next_hit) {
       
        #iterate over each HSP in the hit
        while (my $hsp = $hit->next_hsp) {
           
            if ($hsp->length('sbjct') <= $cutoff_len) {
                #print $hsp->hit_string, "\n";
                for ($hsp->hit_string) {
               
                   
                    if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 || 
tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {
                       
                        # Print some tab-delimited data about this HSP
           
                           open (bigShot, ">>BlastOutputTrial.txt") || 
die ("Could not open file. $!");
                                #print $result->query_name, "\t";
           
#                        print $hit->significance, "\t";
                         print bigShot $hit->name, "-->";
                         print bigShot $hit->description, "\n";
                         #print bigShot "Query:   ", $hsp->start('query'), "  ",
                         #    $hsp->query_string, "  ", $hsp->end('query'), "\n";
                         print bigShot "Seq:     ", $hsp->start('hit'), 
"  ", $hsp->hit_string, "  ", $hsp->end('hit'), "\n";
                          
#                        print $hsp->rank, "\t";
#                        print $hsp->percent_identity, "\t";
#                        print $hsp->evalue, "\t";
#                        print $hsp->hsp_length, "\n";
                   
                        close (bigShot);
                       
                    };
               
           
            }
        }
        }
    }
}

}

closedir(DIR);
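
The residue-count condition in the script above mixes || and && without
parentheses. In Perl, && binds more tightly than ||, so the test is read as
three alternatives. A minimal sketch that makes that grouping explicit
follows; the function name passes_residue_filter is made up for
illustration, and whether this grouping is the intended one is an
assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: count K, R and W in a sequence string and apply
# the same condition as the script, with the precedence Perl already
# uses spelled out by parentheses.
sub passes_residue_filter {
    my ($seq) = @_;
    my $k = ($seq =~ tr/K//);   # tr with an empty replacement only counts
    my $r = ($seq =~ tr/R//);
    my $w = ($seq =~ tr/W//);
    return (    $k >= 2
             || ($r >= 2 && $w >= 2)
             || ($k == 1 && $r == 1 && $w >= 2) ) ? 1 : 0;
}

for my $seq (qw(WWWKWRW AAKKA RRA AAA)) {
    printf "%-8s %s\n", $seq, passes_residue_filter($seq) ? 'keep' : 'skip';
}
```

If the intent was instead "(two or more K, or two or more R) and two or
more W", the parentheses would have to be moved accordingly; the sketch
only shows what the unparenthesized expression evaluates.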




From rahall2 at ualr.edu  Wed Feb  8 18:34:45 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Wed, 8 Feb 2006 17:34:45 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <43EA6F34.4090007@gmx.at>
Message-ID: <000401c62d08$3ede6b70$4301a8c0@LIBERAL>

Hubert,

Give me a bit to look over your code and think this through. I am still
re-familiarizing myself with the relevant modules, so I can't give an answer
off the top of my head.

Also, please send me one or more of your blast reports (zipped) if you don't
mind (and maybe avoid including the list in your reply). Let's take this
"offline" relative to the list - we'll include the list again if there is a
Bioperl issue and solution. (In case you are concerned at all, I promise not
to share or study the actual BLAST results.)

I'm not particularly familiar with the Fedora distributions, but I'm sure I
can either chase down the perl problem or at least eliminate everything else
but Fedora as the culprit. ;}

(Chris - I'm not quite paying attention on an hourly basis yet, but I do
intend to help support these issues for the foreseeable future. Thanks as
always for the assist.)

Thanks!

Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock
(501) 569-8074






From injunjoel at hotmail.com  Wed Feb  8 19:54:26 2006
From: injunjoel at hotmail.com (Joel Steele)
Date: Wed, 08 Feb 2006 16:54:26 -0800
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	Blastoutput
In-Reply-To: <43EA6F34.4090007@gmx.at>
Message-ID: 

Greetings,
I'm not well versed in Bio::SearchIO, but there are a few comments about your 
code that may or may not be relevant...

first thing:

=-=-=-=-=code snippet=-=-=-=-=

#!/usr/bin/perl -w
use strict;   #save yourself the headaches and force yourself to write
              #clean code.

=-=-=-=-=code snippet=-=-=-=-=

next thing:
when you are reading the files from the directory you are not doing any sort 
of filtering as to what is returned. If you are on a Unix flavored system 
you may be getting the '.' and '..' entries from your readdir(DIR) call. I 
would suggest placing a grep in there somewhere to get only blast files.
something like:

=-=-=-=-=code snippet=-=-=-=-=

#assuming the file extension for blast files is .bls
#-f is a filetest; readdir returns bare names, so the directory has to be
#prepended before testing ($directory as in your script). Here is a link
#for reference on the filetests available in Perl.
#
# http://www.perlmonks.org/?node_id=370

my @files_to_parse = grep{/\w+\.bls$/ && -f "$directory/$_"} readdir(DIR);
closedir(DIR);

#then proceed with your foreach but over @files_to_parse

foreach my $file(@files_to_parse){
     #do cool stuff here...
}

=-=-=-=-=code snippet=-=-=-=-=
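
One more thing that may matter here: readdir hands back bare file names, not paths, so a name passed straight to -file is looked up relative to the current working directory rather than the directory that was opened. A minimal sketch of building full paths first, using only core modules; the sub name blast_files_in and the '.bls' extension are assumptions for illustration, not anything from bioperl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return the full paths of regular files in $dir whose names end in $ext.
# blast_files_in is a hypothetical helper; '.bls' is an assumed extension.
sub blast_files_in {
    my ($dir, $ext) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @paths =
        grep { -f $_ }                           # keep regular files only
        map  { File::Spec->catfile($dir, $_) }   # prepend the directory
        grep { /\Q$ext\E$/ }                     # '.' and '..' never match
        readdir($dh);
    closedir($dh);
    my @sorted = sort @paths;
    return @sorted;
}
```

Each returned path could then be given to Bio::SearchIO as the -file argument inside the foreach loop, in place of the bare $file.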

Hope that helps.
-Joel Steele


"The surest way to corrupt a youth is to instruct him to hold in higher 
regard those who think alike than those who think differently." -Nietzsche

"I do not feel obliged to believe that the same God who endowed us with 
sense, reason and intellect has intended us to forego their use." -Galileo




>From: Hubert Prielinger 
>To: Chris Fields , bioperl-l at bioperl.org, 
>rahall2 at ualr.edu
>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing 
>Blastoutput
>Date: Wed, 08 Feb 2006 16:22:44 -0600
>
>hi,
>I have installed from the following page:
>http://news.open-bio.org/archives/2005_10.html,  the Core, Run and Ext.
>I'm using only the SearchIO without remoteblast module, because I have
>already all my Blast output files.
>My operating system is fedora core 4.
>
>Code:
>
>#!/usr/bin/perl -w
>
>use Bio::SearchIO;
>
>print "start program\n";
>my $directory =
>"/home/Hubert/installed/eclipse/workspace/Database_Search/result_4";
>opendir(DIR, $directory) || die("Cannot open directory");
>print "opened directory\n";
>
>foreach my $file (readdir(DIR))  {
>print "read file\n";
>
>my $search = new Bio::SearchIO (-format => 'blast',
>                                 -file => $file);
>
>my $cutoff_len = 10;
>
>
>
>#iterate over each query sequence
>while (my $result = $search->next_result) {
>print "entered 1st while loop\n";
>
>     #iterate over each hit on the query sequence
>     while (my $hit = $result->next_hit) {
>
>         #iterate over each HSP in the hit
>         while (my $hsp = $hit->next_hsp) {
>
>             if ($hsp->length('sbjct') <= $cutoff_len) {
>                 #print $hsp->hit_string, "\n";
>                 for ($hsp->hit_string) {
>
>
>                     if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 ||
>tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {
>
>                         # Print some tab-delimited data about this HSP
>
>                            open (bigShot, ">>BlastOutputTrial.txt") ||
>die ("Could not open file. $!");
>                                 #print $result->query_name, "\t";
>
>#                        print $hit->significance, "\t";
>                          print bigShot $hit->name, "-->";
>                          print bigShot $hit->description, "\n";
>                          #print bigShot "Query:   ",
>$hsp->start('query'), "  ", $hsp->query_string, "  ",
>$hsp->end('query'), "\n";
>                          print bigShot "Seq:     ", $hsp->start('hit'),
>"  ", $hsp->hit_string, "  ", $hsp->end('hit'), "\n";
>
>#                        print $hsp->rank, "\t";
>#                        print $hsp->percent_identity, "\t";
>#                        print $hsp->evalue, "\t";
>#                        print $hsp->hsp_length, "\n";
>
>                         close (bigShot);
>
>                     };
>
>
>             }
>         }
>         }
>     }
>}
>
>}
>
>closedir(DIR);
>
>
>Chris Fields wrote:
>
> >Make sure you ran a full installation of bioperl-1.5.1 or bioperl-live
> >(not just the modules you want; mixing bioperl versions might work, but
> >you might run into interoperability problems).  Then replace the
> >Bio::SearchIO::blast with the one in Bugzilla.  The 'other option' you
> >mentioned might be trying XML instead of text, which is more stable in
> >the long run.  You will still need to run a full upgrade to bioperl
> >1.5.1 for that; make sure you read this:
> >
> >http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast
> >
> >If you're using SearchIO directly instead of Remoteblast, you should be
> >able to set the '-readmethod' flag to 'blastxml'.
> >
> >It also wouldn't hurt to know what OS you're using or see some code.
> >Roger is out there somewhere (I think) and may also have some input.
> >
> >Christopher Fields
> >Postdoctoral Researcher - Switzer Lab
> >Dept. of Biochemistry
> >University of Illinois Urbana-Champaign
> >
> >>-----Original Message-----
> >>From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at]
> >>Sent: Wednesday, February 08, 2006 3:41 PM
> >>To: Chris Fields; bioperl-l at bioperl.org
> >>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
> >>parsing Blast output
> >>
> >>hi chris,
> >>thanks, I have upgraded to version 1.5.1 but it still isn't
> >>working, do you have any other idea, the problem I have is
> >>that I have to parse a lot of textfiles....
> >>or shall I look for another option to parse those files...
> >>
> >>regards
> >>Hubert
> >>
> >>Chris Fields wrote:
> >>
> >>>My guess is you're running into text parsing problems in
> >>>Bio::SearchIO::blast.  Upgrade to the latest developer version (1.5.1)
> >>>or bioperl-live (CVS), then see the bug below.
> >>>
> >>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>
> >>>I think the first problem you ran into is solved in bioperl 1.5.1, the
> >>>last problem (more recent, not related to the first) has been fixed but
> >>>hasn't been committed to bioperl-live yet.  The fixed SearchIO::blast
> >>>is available in the link above, but realize it hasn't been committed
> >>>yet and may change.
> >>>
> >>>Christopher Fields
> >>>Postdoctoral Researcher - Switzer Lab
> >>>Dept. of Biochemistry
> >>>University of Illinois Urbana-Champaign
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l




From saldroubi at yahoo.com  Wed Feb  8 20:12:16 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Wed, 8 Feb 2006 17:12:16 -0800 (PST)
Subject: [Bioperl-l] Documentation link?
Message-ID: <20060209011216.39949.qmail@web34311.mail.mud.yahoo.com>

All,
  
 Forgive me but I don't see the documentation link on the new website.  I only see a link to the HOWTO's. I think I am looking for the Pdoc link.
  
  Thank you. 
  


Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From saldroubi at yahoo.com  Wed Feb  8 20:24:23 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Wed, 8 Feb 2006 17:24:23 -0800 (PST)
Subject: [Bioperl-l] Count or weight matrix in bioperl?
Message-ID: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com>

All,
  
  Say I have an array of nucleotide sequences of length N.  I want to calculate the count matrix (weight matrix). That is, for each position 1..N, I want to know how many As, Cs, Ts and Gs there are.  Is there code already written in bioperl to build this matrix if I pass it those strings?
  
  Please excuse my lack of knowledge as I am a newcomer to bioinformatics.
  
  Thank you. 
  
  
  
  

Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From osborne1 at optonline.net  Wed Feb  8 20:44:56 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 08 Feb 2006 20:44:56 -0500
Subject: [Bioperl-l] Documentation link?
In-Reply-To: <20060209011216.39949.qmail@web34311.mail.mud.yahoo.com>
Message-ID: 

Sam,

http://bioperl.open-bio.org/wiki/Main_Page

Look for the API Docs under "main links".

Brian O.


On 2/8/06 8:12 PM, "Sam Al-Droubi"  wrote:

> All,
>   
>  Forgive me but I don't see the documentation link on the  new website.  I
> only see a link to the HOWTO's. I think I am  looking for the Pdoc link.
>   
>   Thank you. 
>   
> 
> 
> Sincerely, 
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From torsten.seemann at infotech.monash.edu.au  Wed Feb  8 21:54:39 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 09 Feb 2006 13:54:39 +1100
Subject: [Bioperl-l] Count or weight matrix in bioperl?
In-Reply-To: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com>
References: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com>
Message-ID: <43EAAEEF.3000304@infotech.monash.edu.au>

>   Say I have an array of nucleotide sequences of length N.  I want to calculate the count matrix (weight matrix). That is, for each position 1..N, I want to know how many As, Cs, Ts and Gs there are.  Is there code already written in bioperl to build this matrix if I pass it those strings?
>   Please excuse my lack of knowledge as I am a newcomer to bioinformatics.

Use the Bio::Tools::SeqStats module. The PDoc documentation even has an 
example similar to what you want to do:

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html
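
If per-column counts are what is needed (as far as I know, count_monomers in SeqStats tallies over a whole sequence rather than per position), they can also be collected in plain Perl. A minimal sketch, assuming the sequences are plain strings of equal length; the sub name count_matrix is made up for illustration and is not part of bioperl.

```perl
use strict;
use warnings;

# Tally A/C/G/T per column for equal-length sequences.
# Returns a hash ref: base => array ref of counts, one entry per position.
# count_matrix is a hypothetical name, not a bioperl function.
sub count_matrix {
    my @seqs = @_;
    return {} unless @seqs;
    my $len = length $seqs[0];
    my %counts;
    $counts{$_} = [ (0) x $len ] for qw(A C G T);
    for my $seq (@seqs) {
        die "sequences must all have length $len\n"
            unless length($seq) == $len;
        my @bases = split //, uc $seq;
        for my $i (0 .. $#bases) {
            # characters other than A, C, G, T (e.g. N) are skipped
            $counts{ $bases[$i] }[$i]++ if exists $counts{ $bases[$i] };
        }
    }
    return \%counts;
}

# One row per base, one tab-separated column per position.
my $m = count_matrix(qw(ACGT ACGA TCGA));
for my $base (qw(A C G T)) {
    print join("\t", $base, @{ $m->{$base} }), "\n";
}
```

Dividing each column by the number of sequences would turn the counts into frequencies, which is the usual starting point for a weight matrix.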

--Torsten Seemann



From cjfields at uiuc.edu  Thu Feb  9 00:07:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 23:07:15 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	Blastoutput
In-Reply-To: 
References: 
Message-ID: 


On Feb 8, 2006, at 6:54 PM, Joel Steele wrote:

> Greetings,
> I'm not well versed in Bio::SearchIO but there are a few comments about
> your code that may or may not be relevant...
>
> first thing:
>
> =-=-=-=-=code snippet=-=-=-=-=
>
> #!/usr/bin/perl -w
> use strict;   #save yourself the headaches and force yourself to write clean code.
>
> =-=-=-=-=code snippet=-=-=-=-=
>

Tread very carefully here.  Just about every book on perl suggests
'use strict' and adding warnings for code development (e.g. the Camel,
the Llama, and others); in fact, these are the very books most
beginners start from.  Some would consider NOT using -w or 'use
strict' a bad habit; everybody has an opinion (I would repeat an
oft-heard Texas saying, but I'll refrain).  Just remember: try to be a
little more constructive in your critique and insert a little less
about your personal coding style.  If you hit the wrong person, you
might get flamed.

Here's a link that may help a bit here:

http://bioperl.org/Core/Latest/biodesign.html#respect_people_s_code__in_particular_if_it_works_

> next thing:
> when you are reading the files from the directory you are not doing any
> sort of filtering as to what is returned. If you are on a Unix flavored
> system you may be getting the '.' and '..' entries from your
> readdir(DIR) call. I would suggest placing a grep in there somewhere to
> get only blast files.
> something like:
>

I agree here.  You could probably also use something like File::Find  
here to make things a bit easier with the file names as well; works  
wonderfully, esp. when traversing a directory tree.
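
For what it's worth, a File::Find version might look something like the following. This is only a sketch: the sub name find_blast_files and the '.bls' extension are assumptions, and it relies on the default behaviour of find(), which changes into each directory before calling the callback.

```perl
use strict;
use warnings;
use File::Find;

# Collect every regular file under $top whose name ends in $ext.
# find_blast_files is a hypothetical helper name.
sub find_blast_files {
    my ($top, $ext) = @_;
    my @found;
    find(
        sub {
            # inside the callback $_ is the bare file name and
            # $File::Find::name is the path including $top
            push @found, $File::Find::name if -f $_ && /\Q$ext\E$/;
        },
        $top
    );
    my @sorted = sort @found;
    return @sorted;
}
```

Because $File::Find::name already carries the directory part, the returned entries can be opened from anywhere, which sidesteps the bare-name issue with readdir.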

> =-=-=-=-=code snippet=-=-=-=-=
>
> #assuming the file extension for blast files is .bls
> #-f is a filetest; readdir returns bare names, so the directory has to be
> #prepended before testing ($directory as in your script). Here is a link
> #for reference on the filetests available in Perl.
> #
> # http://www.perlmonks.org/?node_id=370
>
> my @files_to_parse = grep{/\w+\.bls$/ && -f "$directory/$_"} readdir(DIR);
> closedir(DIR);
>
> #then proceed with your foreach but over @files_to_parse
>
> foreach my $file(@files_to_parse){
>      #do cool stuff here...
> }
>

Again, agreed.  But, does it really solve the main problem, which is  
an issue with SearchIO::blast?  It seemed to try parsing a blast file...

> =-=-=-=-=code snippet=-=-=-=-=
>
> Hope that helps.
> -Joel Steele
>
>
> "The surest way to corrupt a youth is to instruct him to hold in higher
> regard those who think alike than those who think differently." -Nietzsche
>
> "I do not feel obliged to believe that the same God who endowed us with
> sense, reason and intellect has intended us to forego their use." -Galileo
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From golharam at umdnj.edu  Wed Feb  8 23:46:43 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 08 Feb 2006 23:46:43 -0500
Subject: [Bioperl-l] Tool to mutate DNA sequence
Message-ID: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>

Does anyone know of a tool to mutate a DNA sequence by a specified amount?
For instance, say I have a DNA sequence 1000 bases long, and I want to
simulate mutations to make it 75% (or 80%, etc) similar to the original.


Ryan



From torsten.seemann at infotech.monash.edu.au  Thu Feb  9 06:15:28 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 09 Feb 2006 22:15:28 +1100
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <43EB2450.6000606@infotech.monash.edu.au>

Ryan,

> Does anyone know of a tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.

The EMBOSS suite comes with a tool called "msbar" which can controllably 
mutate sequences:

http://emboss.sourceforge.net/apps/msbar.html
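
If point substitutions alone are enough, a few lines of plain Perl can also do it. This is a rough sketch and not a description of how msbar works: it assumes an A/C/G/T-only sequence, measures similarity as position-by-position identity, and introduces no insertions or deletions. The sub name mutate_to_identity is invented for the example.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Substitute bases at randomly chosen, distinct positions so that the
# result is $identity (between 0 and 1) identical to the input.
# mutate_to_identity is a hypothetical helper name.
sub mutate_to_identity {
    my ($seq, $identity) = @_;
    my $len       = length $seq;
    my $n_changes = int( $len * (1 - $identity) + 0.5 );   # round to nearest
    my @positions = (shuffle 0 .. $len - 1)[ 0 .. $n_changes - 1 ];
    for my $pos (@positions) {
        my $old     = uc substr($seq, $pos, 1);
        my @choices = grep { $_ ne $old } qw(A C G T);     # never the same base
        substr($seq, $pos, 1) = $choices[ int rand @choices ];
    }
    return $seq;
}
```

With a 1000-base input and an identity of 0.75, this changes exactly 250 positions, each to a different base, so the output differs from the original at 25% of sites.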

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/


From cjfields at uiuc.edu  Thu Feb  9 11:16:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 9 Feb 2006 10:16:28 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <57361AED-AFEA-4927-AF29-9944E7F0895B@duke.edu>
Message-ID: <001b01c62d94$2e8bee50$15327e82@pyrimidine>


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
> Sent: Thursday, February 09, 2006 9:13 AM
> To: Hubert Prielinger
> Cc: Chris Fields; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing Blast output
> 
> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> > hi chris,
> > thanks, I have upgraded to version 1.5.1 but it isn't still working,
> > do you have any other idea, the problem I have is that I have to parse
> > a lot of textfiles....
> > or shall I look for another option to parse those files...
> >
> > regards
> > Hubert
> 
> 
> The code from Bioperl 1.5.1 works fine for me for blast 
> 2.2.13 reports but unless you post your blast report we can't 
> really determine the problem.
> 
> If you are still getting the same error like this I am not 
> convinced you have upgraded to 1.5.1 which includes a fix in 
> the fact that NCBI changed the HSP result format to remove 
> the ':' from the Query/Sbjct prefixes.  We fixed this as soon 
> as it was apparent sometime in September.
> 
> >>> MSG: no data for midline Query  1   WWWKWRW  7
> >>> STACK Bio::SearchIO::blast::next_result
> >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>> STACK toplevel
> >>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> 
> If you are just getting no results but also no warnings wrt 
> parsing, are you sure your logic is correct?
> 
> If you remove your filters do you see all the HSPS?
> 
> 
> while (my $result = $search->next_result) {
>     print $result->query_name, "\n";
>     #iterate over each hit on the query sequence
>     while (my $hit = $result->next_hit) {
>         print $hit->name, "\n";
>         #iterate over each HSP in the hit
>         while (my $hsp = $hit->next_hsp) {
>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>                 $hsp->hit_string, "\n";
>         }
>     }
> }

I tested some of the BLAST results that Hubert sent Roger and me with a
similar script to the above.  I removed the file parsing logic and it seemed
to work just fine.  It may very well be a logic issue or that he hasn't
installed the latest fix.
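
To make the format change from the quoted message concrete: older text reports label alignment lines "Query:" and "Sbjct:", while the newer ones drop the colon, as in the line from the error message. A pattern that tolerates both could look like the sketch below. This is an illustration only, not the code in Bio::SearchIO::blast, and parse_alignment_line is an invented name.

```perl
use strict;
use warnings;

# Parse one alignment line such as
#   "Query: 1   WWWKWRW 7"    (older text reports)
#   "Query  1   WWWKWRW  7"   (colon dropped, as in the error message)
# Returns (label, start, residues, end), or an empty list if no match.
# parse_alignment_line is a hypothetical helper, not bioperl's parser.
sub parse_alignment_line {
    my ($line) = @_;
    if ($line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/) {
        return ($1, $2, $3, $4);
    }
    return;
}
```

A parser that insists on the colon would not recognise the newer Query line at all, which fits the "no data for midline" complaint.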
    
It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), even
though the returned output was from nr, the top of the blast output showed
that it was v2.2.12:  

BLASTP 2.2.12 [Aug-07-2005]

I double-checked my local version and it's definitely v.2.2.13:
-------------------------------------
C:\Perl\Scripts>blastcl3 -

blastcl3 2.2.13   arguments:...
-------------------------------------

If you use RemoteBlast using the same settings, the version in the header
looks like this:

BLASTP 2.2.13 [Nov-27-2005]

I'm wondering if all the blast executables (blast and netblast) from NCBI
have text output like v.2.2.12, while the wwwblast outputs a new format
(2.2.13).  I'll ask blast-help at NCBI about this.
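
In the meantime, a quick way to see which version a given report claims is to read it off the header line. A small sketch, assuming the header looks like the two examples above (program name, version, bracketed date); blast_header is a name made up for this example.

```perl
use strict;
use warnings;

# Scan a report for a header line such as "BLASTP 2.2.12 [Aug-07-2005]"
# and return (program, version, date); returns an empty list if none found.
# blast_header is a hypothetical helper name.
sub blast_header {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        if ($line =~ /^(T?BLAST[NPX])\s+(\d+(?:\.\d+)+)\s+\[([^\]]+)\]/) {
            return ($1, $2, $3);
        }
    }
    return;
}
```

Running this over the blastcl3 and RemoteBlast outputs side by side would show whether the header version and the alignment line style actually go together.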

> 
> To clarify some stuff -
> Chris I don't necessarily think the XML is best way forward 
> for BLAST reports generated locally, it isn't as detailed as 
> the Text format and it is what most people expect to be able 
> to scroll through and parse -- it is also harder for the 
> format to change dramatically if you have a static binary on 
> your machine =).  I think for remoteblast the XML format 
> should be the way forward but I expect Bioperl to maintain 
> support of any plain text BLAST report format that people use 
> on a regular basis.
> 

Does XML lack some specific info that text output has?  Didn't know that.  I
believe that XML should be default in RemoteBlast since it will not break,
but I agree with you about text output.  I also agree that it will need
somebody to maintain it constantly, much like RemoteBlast.

> -jason
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  



From cjfields at uiuc.edu  Thu Feb  9 12:53:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 9 Feb 2006 11:53:24 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <200602080853.58889.heikki@sanbi.ac.za>
Message-ID: <000001c62da1$ba346ba0$15327e82@pyrimidine>

Heikki, 

I've added the Bio::Tools::RNAMotif module with test suite (24 tests) and
two test data files to bugzilla.  The first data file is needed for normal
tests, the second is for testing parsing with modified data in the score tag
(using sprintf() in the RNAMotif descriptor).  I ran 'perl t\RNAMotif.t' and
they all passed.

Thanks!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Heikki Lehvaslaiho
> Sent: Wednesday, February 08, 2006 12:54 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Chris Fields
> Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> 
> Chris,
> 
> Post your files to bugzilla (ticket type enhancement, add 
> files to ticket after creation)  and someone with commit 
> ability will add them to CVS once the code is in satisfactory 
> condition. 
> 
> Thanks,
> 
> 	-Heikki
> 
> On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> > I want to submit a module for parsing RNAMotif output 
> > (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning 
> > output and returning Bio::SeqFeature::Generic objects with 
> added tags 
> > for descriptors/sequences/file info.  I'm in the process of 
> writing up 
> > tests and going through biodesign to make sure everything's kosher, 
> > but the module itself is essentially ready-to-go.  What should I do 
> > next?
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From jason.stajich at duke.edu  Thu Feb  9 10:13:09 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 9 Feb 2006 10:13:09 -0500
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <43EA6570.9070909@gmx.at>
References: <001101c62cfd$28605df0$15327e82@pyrimidine>
	<43EA6570.9070909@gmx.at>
Message-ID: <57361AED-AFEA-4927-AF29-9944E7F0895B@duke.edu>

On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> hi chris,
> thanks, I have upgraded to version 1.5.1 but it still isn't
> working. Do
> you have any other idea? The problem is that I have to parse a
> lot of text files....
> or shall I look for another option to parse those files...
>
> regards
> Hubert


The code from Bioperl 1.5.1 works fine for me for blast 2.2.13  
reports but unless you post your blast report we can't really  
determine the problem.

If you are still getting this same error, I am not convinced you have
upgraded to 1.5.1, which includes a fix for the fact that NCBI changed
the HSP result format to remove the ':' from the Query/Sbjct prefixes.
We fixed this as soon as it became apparent, sometime in September.
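
To illustrate the format change, here is a minimal sketch. It is not the
actual code in Bio::SearchIO::blast, and the sub name parse_alignment_line
is made up for this example. It only shows a pattern that accepts the
Query/Sbjct prefix with or without the ':':

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bioperl: split one alignment line of a
# BLAST text report into prefix, start, sequence and end.  The optional
# ':' covers both the older "Query: 1" and the newer "Query  1" layout.
sub parse_alignment_line {
    my ($line) = @_;
    if ($line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/) {
        return { type => $1, start => $2, seq => $3, end => $4 };
    }
    return;    # not an alignment line
}

for my $line ("Query: 1   WWWKWRW  7", "Query  1   WWWKWRW  7") {
    my $parsed = parse_alignment_line($line);
    print "$parsed->{type} $parsed->{start}-$parsed->{end} $parsed->{seq}\n";
}
```

A parser that insists on the ':' would reject the second line, which is
the kind of failure that shows up as "no data for midline".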

>>> MSG: no data for midline Query  1   WWWKWRW  7
>>> STACK Bio::SearchIO::blast::next_result
>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>> STACK toplevel
>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21

If you are just getting no results but also no warnings from the parser,
are you sure your logic is correct?

If you remove your filters, do you see all the HSPs?


use Bio::SearchIO;

# $search is assumed to be your existing Bio::SearchIO object
while (my $result = $search->next_result) {
    print $result->query_name, "\n";
    # iterate over each hit on the query sequence
    while (my $hit = $result->next_hit) {
        print $hit->name, "\n";
        # iterate over each HSP in the hit
        while (my $hsp = $hit->next_hsp) {
            print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
                  $hsp->hit_string, "\n";
        }
    }
}


To clarify some stuff -
Chris, I don't necessarily think XML is the best way forward for BLAST
reports generated locally. It isn't as detailed as the text format, and
the text format is what most people expect to be able to scroll through
and parse -- it is also harder for the format to change dramatically if
you have a static binary on your machine =).  I think for remoteblast
the XML format should be the way forward, but I expect Bioperl to
maintain support of any plain text BLAST report format that people
use on a regular basis.

-jason
>
>
> Chris Fields wrote:
>
>> My guess is you're running into text parsing problems in
>> Bio::SearchIO::blast.  Upgrade to the latest developer version  
>> (1.5.1) or
>> bioperl-live (CVS), then see the bug below.
>>
>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>
>> I think the first problem you ran into is solved in bioperl 1.5.1,  
>> the last
>> problem (more recent, not related to the first) has been fixed but  
>> hasn't
>> been committed to bioperl-live yet.  The fixed SearchIO::blast is  
>> available
>> in the link above, but realize it hasn't been committed yet and  
>> may change.
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Hubert Prielinger
>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>> To: bioperl-l at bioperl.org
>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>> parsing Blast output
>>>
>>> Hi,
>>> If I want to parse a Blast Output (Version 2.2.12) with
>>> Bio::SearchIO, I get the following error message:
>>>
>>> MSG: no data for midline Query  1   WWWKWRW  7
>>> STACK Bio::SearchIO::blast::next_result
>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>> STACK toplevel
>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>> is that a bug......
>>>
>>> If I want to parse Blast Output (version 2.2.13), I don't get
>>> anything.....
>>> I'm using bioperl 1.4
>>>
>>> before, I have installed bioperl 1.4, it worked fine parsing
>>> Blast Output (version 2.2.12), but I don't remember which
>>> bioperl version I had installed
>>>
>>> thanks in advance
>>>
>>> Hubert
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>>
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From barry.m.dancis at gsk.com  Wed Feb  8 16:44:55 2006
From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com)
Date: Wed, 8 Feb 2006 16:44:55 -0500
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: <007701c62c37$7914af60$15327e82@pyrimidine>
Message-ID: 

Hi Chris--

The problem I am solving is: given a mature miRNA name, how do I use it
to search for its pre/pri miRNA, and vice versa? For example, how do I
go from mir-102a* to hsa-mir-102a-1*? Yes, I can write a parser for it,
but I'm hoping that someone else has already done it and has some bells
and whistles to go with it.  Below is a hierarchy chart of a data
structure to hold the naming information. The parsing is not trivial, and
given data in that structure there could be all kinds of neat functions
that return various aspects of the names.
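
To make the question concrete, here is a rough sketch of the kind of
name parser I mean. Everything in it is an assumption for illustration:
the sub names and field names are invented, and the pattern only covers
names shaped like the two examples above, not the full naming scheme.

```perl
use strict;
use warnings;

# Hypothetical parser: break a miRNA name into its parts.
# Returns a hash reference, or undef if the name does not fit the pattern.
sub parse_mirna_name {
    my ($name) = @_;
    return undef
        unless $name =~ /^(?:([a-z]{3,4})-)?(mir|miR|let)-(\d+)([a-z]?)(?:-(\d+))?(\*?)$/;
    return {
        species => $1,                  # e.g. "hsa"; undef when absent
        type    => $2,                  # "mir", "miR" or "let"
        number  => $3,                  # family number, e.g. 102
        variant => $4,                  # letter suffix, e.g. "a"; '' when absent
        copy    => $5,                  # copy number, e.g. 1; undef when absent
        star    => $6 eq '*' ? 1 : 0,   # trailing '*' present?
    };
}

# Two names are treated as the same family member when the number and
# the letter suffix agree, whatever the species prefix or copy number.
sub same_family_member {
    my ($first, $second) = @_;
    my $x = parse_mirna_name($first);
    my $y = parse_mirna_name($second);
    return 0 unless $x && $y;
    return ($x->{number} == $y->{number} && $x->{variant} eq $y->{variant}) ? 1 : 0;
}

print same_family_member('mir-102a*', 'hsa-mir-102a-1*') ? "match\n" : "no match\n";
```

A lookup from a short name to its full names could then be a hash keyed
on number plus variant, but that part depends on where the names come
from.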

Barry












"Chris Fields"  
Sent by: bioperl-l-bounces at lists.open-bio.org
07-Feb-2006 17:40
 
To
barry.m.dancis at gsk.com, "'bioperl-l'" 
cc

Subject
Re: [Bioperl-l] Handling miRNA's






Are you talking about sequences or text output from a specific program? If
you are talking about sequences in a particular format, then listen to
Brian.  If you are talking about output, then we need to know which 
program
you're using, as a parser may exist or could be built. 

There are a few modules in Bio::Tools that handle RNA (like QRNA,
tRNAscan-SE), so check those out first.  I'm currently finishing up a
Bio::Tools module for RNAMotif and have plans for making an ERPIN parser.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> barry.m.dancis at gsk.com
> Sent: Tuesday, February 07, 2006 2:26 PM
> To: bioperl-l; bioperl-l-bounces at lists.open-bio.org
> Subject: Re: [Bioperl-l] Handling miRNA's
> 
> It's the parser in particular that I need
> 
> 
> 
> 
> "Brian Osborne"  Sent by: 
> bioperl-l-bounces at lists.open-bio.org
> 07-Feb-2006 12:05
> 
> To
> barry.m.dancis at gsk.com, "bioperl-l" , 
> bioperl-l-bounces at lists.open-bio.org
> cc
> 
> Subject
> Re: [Bioperl-l] Handling miRNA's
> 
> 
> 
> 
> 
> 
> Barry,
> 
> If the sequence information is in one of the formats that 
> Bioperl understands (Genbank, Swissprot flat, and so on) then 
> the answer is yes.
> This assumes that the details on sequence that you mentioned 
> are found in some sequence feature section in the file. But 
> it looks to me like there's no specialized parser for miRNA 
> sequence per se, I'll be corrected if I'm wrong.
> 
> Brian O.
> 
> 
> On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" 
> wrote:
> 
> > Hi --
> > 
> >         Are there any classes for manipulating miRNA's with 
> functions
> such
> > as parsing the name, storing and interlinking pri/pre/mat sequences,
> etc?
> > 
> > Thanks,
> > 
> > Barry
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 8775 bytes
Desc: not available
URL: 

From pmr at ebi.ac.uk  Thu Feb  9 03:25:24 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Thu, 9 Feb 2006 08:25:24 -0000 (GMT)
Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <2714.86.132.216.50.1139473524.squirrel@webmail.ebi.ac.uk>

Ryan Golhar writes:

> Does anyone know of tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.

EMBOSS has the msbar program ("mutate sequence beyond all recognition")
which allows you to select the number and type of changes.

With some tuning of options to match the sequence length you should be
able to get results that match whatever your definition of 75% similar
might be (amazing how much more similarity you can get by adding gaps in
an alignment :-)

If you can specify a clear and generally useful way to define what you
need we could of course add a "percent change" option to the msbar program
for a future release.

Hope that helps,

Peter



From sofia at neuro.utah.edu  Thu Feb  9 13:00:05 2006
From: sofia at neuro.utah.edu (Sofia Robb)
Date: Thu, 09 Feb 2006 11:00:05 -0700
Subject: [Bioperl-l] Bio::Assembly::IO::phrap and Bio::Assembly::IO::ace
	with large files
Message-ID: <43EB8325.6050501@neuro.utah.edu>

I am having trouble parsing large (2030 contigs) phrap.out and ace.1
files.  I have no problem with small files (1 contig).  Here are the
errors I get when I try the code that is at the end of my email.  My
script fails on this line:  my $assembly = $in->next_assembly;  I think
it may have something to do with BTREE in Collection.pm, but I have been
unable to correct the errors.

-------

file with 2030 contigs
Bio::Assembly::IO::ace
Can't call method "get_dup" on an undefined value at 
/Library/Perl/5.8.6/Bio/SeqFeature/Collection.pm line 359,  line 
17699.

line 17699 of my ace file is the last line of the record for Contig253

------

file with 2030 contigs
Bio::Assembly::IO::phrap
Can't call method "put" on an undefined value at 
/Library/Perl/5.8.6/Bio/SeqFeature/Collection.pm line 225,  line 
39839. 

line 39839 of my phrap.out file is first line of the record for Contig253

------

use Bio::Assembly::IO;

my $filename = $ARGV[0];

my $in = Bio::Assembly::IO->new(-file=>"$filename",
                                -format=>"phrap"    # or -format=>"ace" for ace.1 files
                                );
my $assembly = $in->next_assembly;
my @contigs = $assembly->all_contigs();
foreach my $contig ($assembly->all_contigs){
        my $id = $contig->id();
        print "contig id = $id ";
        my $seqObj = $contig->get_consensus_sequence();
        my $seq = $seqObj->seq();
        print "is $seq\n";
}
my $id = $assembly->id();
print "$id\n";       

-----

Thanks for any input,
Sofia

Sofia Robb
Molecular Biology Ph.D Program
Sanchez Laboratory
Department of Neurobiology and Anatomy
University of Utah
http://planaria.neuro.utah.edu





From hubert.prielinger at gmx.at  Thu Feb  9 12:32:39 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 09 Feb 2006 11:32:39 -0600
Subject: [Bioperl-l] zip file
In-Reply-To: 
References: <43EA75FF.7010504@gmx.at>
	
Message-ID: <43EB7CB7.7040602@gmx.at>

Hi Chris,
It doesn't work with the simple input line either. I have also tried my
script on the command line with the file scanning part, and it runs,
but it takes more than 10 minutes to read one file and it doesn't
create the output file, so there is no output. Before that, I ran the
script in the Eclipse IDE.
I'm trying to upgrade to bioperl 1.5.1 once more; hopefully that's the
problem. I have installed the core, run and ext parts from bioperl.org...
The output as you got it is just fine, but nevertheless I need the
script with the file scanning part, because I have a lot of files.

To Roger: I have tried it with different files, but always with the same
result: it reads the files, but takes a very long time and produces no
output file.
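
For illustration only, this is roughly what I mean by the file scanning
part. It is a sketch, not my actual script: the sub name report_files and
the ".txt" suffix are assumptions, and the parsing itself is left out.
Timing each file separately should at least show where the 10 minutes go.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the plain files in $dir whose names end in
# $suffix, sorted by name.
sub report_files {
    my ($dir, $suffix) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @files = grep { -f $_ }
                map  { File::Spec->catfile($dir, $_) }
                grep { /\Q$suffix\E$/ } readdir($dh);
    closedir($dh);
    @files = sort @files;
    return @files;
}

my $dir = shift(@ARGV) || '.';
for my $file (report_files($dir, '.txt')) {
    my $started = time;
    # ... parse $file here, e.g. with the Bio::SearchIO loop quoted below ...
    printf STDERR "%s: %d s\n", $file, time - $started;
}
```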


Hubert




Chris Fields wrote:

> Hubert,
>
> I tried this script out it and it managed to parse your reports.  I  
> removed the file scanning and replaced it with a simple arg line  
> input (i.e. script.pl blast_file).   I attached one of the output files.
>
> Chris
>
>
>
> #!perl
>
> $file = shift @ARGV;
>
> use Bio::SearchIO;
> my $cutoff_len = 10;
> my $searchio = Bio::SearchIO->new( -format => 'blast',
>                                    -file   =>  $file );
> while ( my $result = $searchio->next_result() ) {
>       while( my $hit = $result->next_hit ) {
>           while(my $hsp = $hit->next_hsp) {
>             if ($hsp->length('sbjct') <= $cutoff_len) {
>                 for ($hsp->hit_string) {
>                     if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 ||
>                         tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {
>                          #Print some tab-delimited data about this HSP
>                            open (bigShot, ">>BlastOutputTrial.txt") ||
>                                  die ("Could not open file. $!");
>                          #print $result->query_name, "\t";
>                          #print $hit->significance, "\t";
>                          print bigShot $hit->name, "-->";
>                          print bigShot $hit->description, "\n";
>                          print bigShot "Query:   ",
>                             $hsp->start('query'), "  ", $hsp->query_string, "  ",
>                             $hsp->end('query'), "\n";
>                          print bigShot "Seq:     ", $hsp->start('hit'),
>                             "  ", $hsp->hit_string, "  ", $hsp->end('hit'), "\n";
> #                        print $hsp->rank, "\t";
> #                        print $hsp->percent_identity, "\t";
> #                        print $hsp->evalue, "\t";
> #                        print $hsp->hsp_length, "\n";
>
>                         close (bigShot);
>
>                     };
>
>
>             }
>         }
>         }
>     }
> }
>
>------------------------------------------------------------------------
>
>  
>



From heikki at sanbi.ac.za  Thu Feb  9 09:54:30 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 9 Feb 2006 16:54:30 +0200
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <200602091654.30890.heikki@sanbi.ac.za>

Ryan,

I should have made this very clear in my first reply:

You have to plan very carefully what rules you use when you mutate your
sequence, because they will directly affect the resulting sequences. Of
course, all that depends on what you will be using the sequences for. If
you are going to draw evolutionary conclusions from those sequences, you
must mutate them in a way that simulates evolutionary principles.

My earlier pseudocode, for example, should really allow mutations at
every location, including ones that have already changed. Mutations do
occur multiple times at the same site as sequences become saturated with
mutations. Also, you should decide the relative frequency of
transversions versus transitions. Then there are indels; do you want to
take those into account?
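
A minimal sketch of those two points, assuming an upper-case ACGT string
and ignoring indels. The sub name mutate_with_repeats and the
$ti_fraction parameter are made up for this example, and it is in no way
a realistic evolutionary model:

```perl
use strict;
use warnings;

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions;
# the two remaining choices for each base are transversions.
my %transition   = (A => 'G', G => 'A', C => 'T', T => 'C');
my %transversion = (A => [qw(C T)], G => [qw(C T)],
                    C => [qw(A G)], T => [qw(A G)]);

# Hypothetical helper: apply $events substitutions to $seq.  Sites are
# drawn with replacement, so one site can be hit more than once;
# $ti_fraction (0..1) is the chance that an event is a transition.
sub mutate_with_repeats {
    my ($seq, $events, $ti_fraction) = @_;
    for (1 .. $events) {
        my $pos  = int(rand(length($seq)));
        my $base = substr($seq, $pos, 1);
        next unless exists $transition{$base};    # leave N etc. untouched
        my $new = rand() < $ti_fraction
                ? $transition{$base}
                : $transversion{$base}[ int(rand(2)) ];
        substr($seq, $pos, 1) = $new;
    }
    return $seq;
}

print mutate_with_repeats('ACGT' x 10, 10, 0.67), "\n";
```

Because a site can be hit twice (and even change back), the number of
observed differences will usually be lower than the number of events.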

Also, check the EMBOSS program 'msbar'.

You did not ask this, but... I remember that during the early days of
Celera, one of the tools that enabled them to estimate the feasibility of
whole genome shotgun sequence assembly was a very complete program to
'synthesize' in silico the whole complexity of the human genome. I have
no idea if that program is generally available now.

Yours,

    -Heikki

On Thursday 09 February 2006 06:46, Ryan Golhar wrote:
> Does anyone know of tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Thu Feb  9 06:31:20 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 9 Feb 2006 13:31:20 +0200
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <200602091331.21690.heikki@sanbi.ac.za>

Ryan,

Instructions in pseudo code:

take the sequence string out of the object
use a hash to store changed locations
repeat
    pick a location in the string randomly
    if the location is not yet in the hash (i.e. not changed already),
        change it into something else
        add the changed location into the hash
    if enough locations have been changed (scalar keys %hash), exit loop
put the sequence string back into the seq object
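
The same steps as a Perl sketch, assuming an upper-case ACGT string and
taking "75% similar" to mean 75% identity without gaps. The sub name
mutate_positions is made up for this example:

```perl
use strict;
use warnings;

# Change exactly $n distinct positions of $seq to a different base.
sub mutate_positions {
    my ($seq, $n) = @_;
    die "cannot change $n of " . length($seq) . " positions\n"
        if $n > length($seq);
    my %changed;                                  # positions already mutated
    while (keys(%changed) < $n) {
        my $pos = int(rand(length($seq)));        # pick a location at random
        next if $changed{$pos};                   # changed already, pick again
        my $old    = substr($seq, $pos, 1);
        my @others = grep { $_ ne $old } qw(A C G T);
        substr($seq, $pos, 1) = $others[ int(rand(@others)) ];
        $changed{$pos} = 1;
    }
    return $seq;
}

# 1000 bases at 75% identity means 250 changed positions.
my $original = join '', map { (qw(A C G T))[ int(rand(4)) ] } 1 .. 1000;
my $mutated  = mutate_positions($original, 250);
```

With a sequence object, the string would come from $seqobj->seq and go
back in with $seqobj->seq($mutated).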

   -Heikki   

On Thursday 09 February 2006 06:46, Ryan Golhar wrote:
> Does anyone know of tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From jason.stajich at duke.edu  Thu Feb  9 14:10:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 9 Feb 2006 14:10:54 -0500
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>

Depending on whether or not you want to use evolutionarily realistic
models...
* evolver, which comes with PAML, lets you evolve sequences on a tree
* SeqGen from Andrew Rambaut
  (http://evolve.zoo.ox.ac.uk/software.html?id=seqgen) also lets you do this
I believe there are PISE interfaces to both of these at the Pasteur
bioweb site - http://bioweb.pasteur.fr/
-jason
On Feb 8, 2006, at 11:46 PM, Ryan Golhar wrote:

> Does anyone know of tool to mutate a DNA sequence by a specified  
> amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the  
> original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From heikki at sanbi.ac.za  Thu Feb  9 14:41:33 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 9 Feb 2006 21:41:33 +0200
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <000001c62da1$ba346ba0$15327e82@pyrimidine>
References: <000001c62da1$ba346ba0$15327e82@pyrimidine>
Message-ID: <200602092141.34401.heikki@sanbi.ac.za>

Chris,

I committed your files. All tests pass; the code looks like it was written
by a long-term bioperl contributor! Impressive.

I truncated the larger test file from 270K to 20K (200 lines), to not bloat
the distribution unnecessarily. Tests pass, which is the main thing. Shout
if you disagree.

Great job!

	-Heikki
 

On Thursday 09 February 2006 19:53, Chris Fields wrote:
> Heikki,
>
> I've added the Bio::Tools::RNAMotif module with test suite (24 tests) and
> two test data files to bugzilla.  The first data file is needed for normal
> tests, the second is for testing parsing with modified data in the score
> tag (using sprintf() in the RNAMotif descriptor).  I ran 'perl
> t\RNAMotif.t' and they all passed.
>
> Thanks!
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> > Heikki Lehvaslaiho
> > Sent: Wednesday, February 08, 2006 12:54 AM
> > To: bioperl-l at lists.open-bio.org
> > Cc: Chris Fields
> > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> >
> > Chris,
> >
> > Post your files to bugzilla (ticket type enhancement, add
> > files to ticket after creation)  and someone with commit
> > ability will add them to CVS once the code is in satisfactory
> > condition.
> >
> > Thanks,
> >
> > 	-Heikki
> >
> > On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> > > I want to submit a module for parsing RNAMotif output
> > > (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning
> > > output and returning Bio::SeqFeature::Generic objects with
> >
> > added tags
> >
> > > for descriptors/sequences/file info.  I'm in the process of
> >
> > writing up
> >
> > > tests and going through biodesign to make sure everything's kosher,
> > > but the module itself is essentially ready-to-go.  What should I do
> > > next?
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher
> > > Lab of Dr. Robert Switzer
> > > Dept of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > ______ _/      _/_____________________________________________________
> >       _/      _/
> >      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
> >    _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >   _/  _/  _/  University of Western Cape, South Africa
> >      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___ _/_/_/_/_/________________________________________________________
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From hubert.prielinger at gmx.at  Thu Feb  9 15:13:31 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 09 Feb 2006 14:13:31 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	Blast	output
In-Reply-To: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
References: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
Message-ID: <43EBA26B.4010907@gmx.at>

Dear Roger,
I got this error message when I tried to parse Blast output (version
2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have a lot
of Blast output files with version 2.2.13, and for those I don't get any
error message..... it just doesn't work.

Hubert



Roger Hall wrote:

>Guys - I'm looking at the error message:
>
>MSG: no data for midline Query  1   WWWKWRW  7
>STACK Bio::SearchIO::blast::next_result
>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>STACK toplevel
>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>
>This is my line of thought:
>1. "no data for midline $_" is a unique message generated by blast.pm in one
>location only at the point of a. reading three lines b. dropping lines with
>spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3)
>2. There is a regexp match that fails in order to reach that error message
>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>4. It does anyway
>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>reports
>
>I suspect a newline/chomp/metacharacter issue. Not finding the string
>anywhere has me thoroughly confused - I asked Hubert for the additional
>file, assuming that I didn't have it.
>
>My next thought is to write a quick script to test perl behavior on "Fedora
>Core 4".
>
>Thoughts?
>
>Did I misread the issue entirely? :}
>
>Roger
>
>
>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>Sent: Thursday, February 09, 2006 10:16 AM
>To: 'Jason Stajich'; 'Hubert Prielinger'
>Cc: bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>output
>
>
>  
>
>>-----Original Message-----
>>From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>>Sent: Thursday, February 09, 2006 9:13 AM
>>To: Hubert Prielinger
>>Cc: Chris Fields; bioperl-l at bioperl.org
>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>parsing Blast output
>>
>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>hi chris,
>>>thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>do you have any other idea, the problem I have is that I have to parse
>>>a lot of textfiles....
>>>or shall I look for another option to parse those files...
>>>
>>>regards
>>>Hubert
>>
>>The code from Bioperl 1.5.1 works fine for me for blast 
>>2.2.13 reports but unless you post your blast report we can't 
>>really determine the problem.
>>
>>If you are still getting the same error like this I am not 
>>convinced you have upgraded to 1.5.1 which includes a fix for 
>>the fact that NCBI changed the HSP result format to remove 
>>the ':' from the Query/Sbjct prefixes.  We fixed this as soon 
>>as it was apparent sometime in September.
>>
>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>STACK toplevel
>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>
>>If you are just getting no results but also no warnings wrt 
>>parsing, are you sure your logic is correct?
>>
>>If you remove your filters do you see all the HSPS?
>>
>>
>>while (my $result = $search->next_result) {
>>    print $result->query_name, "\n";
>>    # iterate over each hit on the query sequence
>>    while (my $hit = $result->next_hit) {
>>        print $hit->name, "\n";
>>        # iterate over each HSP in the hit
>>        while (my $hsp = $hit->next_hsp) {
>>            print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>                  $hsp->hit_string, "\n";
>>        }
>>    }
>>}
>>
>
>I tested some of the BLAST results that Hubert sent Roger and me with a
>similar script to the above.  I removed the file parsing logic and it seemed
>to work just fine.  It may very well be a logic issue or that he hasn't
>installed the latest fix.
>    
>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), even
>though the returned output was from nr, the top of the blast output showed
>that it was v2.2.12:  
>
>BLASTP 2.2.12 [Aug-07-2005]
>
>I double-checked my local version and it's definitely v.2.2.13:
>-------------------------------------
>C:\Perl\Scripts>blastcl3 -
>
>blastcl3 2.2.13   arguments:...
>-------------------------------------
>
>If you use RemoteBlast using the same settings, the version in the header
>looks like this:
>
>BLASTP 2.2.13 [Nov-27-2005]
>
>I'm wondering if all the blast executables (blast and netblast) from NCBI
>have text output like v.2.2.12, while the wwwblast outputs a new format
>(2.2.13).  I'll ask blast-help at NCBI about this.
>
>  
>
>>To clarify some stuff -
>>Chris I don't necessarily think the XML is best way forward 
>>for BLAST reports generated locally, it isn't as detailed as 
>>the Text format and it is what most people expect to be able 
>>to scroll through and parse -- it is also harder for the 
>>format to change dramatically if you have a static binary on 
>>your machine =).  I think for remoteblast the XML format 
>>should be the way forward but I expect Bioperl to maintain 
>>support of any plain text BLAST report format that people use 
>>on a regular basis.
>>
>>    
>>
>
>Does XML lack some specific info that text output has?  Didn't know that.  I
>believe that XML should be default in RemoteBlast since it will not break,
>but I agree with you about text output.  I also agree that it will need
>somebody to maintain it constantly, much like RemoteBlast.
>
>  
>
>>-jason
>>    
>>
>>>Chris Fields wrote:
>>>
>>>      
>>>
>>>>My guess is you're running into text parsing problems in 
>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>(1.5.1) or
>>>>bioperl-live (CVS), then see the bug below.
>>>>
>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>
>>>>I think the first problem you ran into is solved in bioperl 1.5.1, 
>>>>the last problem (more recent, not related to the first) has been 
>>>>fixed but hasn't been committed to bioperl-live yet.  The fixed 
>>>>SearchIO::blast is available in the link above, but realize it hasn't
>>>>been committed yet and may change.
>>>>Christopher Fields
>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
>>>>University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>        
>>>>
>>>>>-----Original Message-----
>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert 
>>>>>Prielinger
>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>To: bioperl-l at bioperl.org
>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>output
>>>>>
>>>>>Hi,
>>>>>If I want to parse a Blast Output (Version 2.2.12) with 
>>>>>Bio::SearchIO, I get the following error message:
>>>>>
>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>STACK toplevel
>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>
>>>>>is that a bug......
>>>>>
>>>>>If I want to parse Blast Output (version 2.2.13), I don't get 
>>>>>anything.....
>>>>>I'm using bioperl 1.4
>>>>>
>>>>>before I installed bioperl 1.4, it worked fine parsing Blast
>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>I had installed
>>>>>
>>>>>thanks in advance
>>>>>
>>>>>Hubert
>>>>>
>>>>>
>>>>>
>>>>>_______________________________________________
>>>>>Bioperl-l mailing list
>>>>>Bioperl-l at lists.open-bio.org
>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>
>>>>
>>>>        
>>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l at lists.open-bio.org
>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>      
>>>
>>--
>>Jason Stajich
>>Duke University
>>http://www.duke.edu/~jes12
>>
>>    
>>
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept. of Biochemistry
>University of Illinois Urbana-Champaign  
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>  
>



From rahall2 at ualr.edu  Thu Feb  9 15:09:52 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Thu, 09 Feb 2006 14:09:52 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	Blast	output
In-Reply-To: <001b01c62d94$2e8bee50$15327e82@pyrimidine>
Message-ID: <004301c62db4$c9bcbab0$d416a790@LIBERAL>

Guys - I'm looking at the error message:

MSG: no data for midline Query  1   WWWKWRW  7
STACK Bio::SearchIO::blast::next_result
/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
STACK toplevel
/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21

This is my line of thought:
1. "no data for midline $_" is a unique message generated by blast.pm in one
location only at the point of a. reading three lines b. dropping lines with
spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3)
2. There is a regexp match that fails in order to reach that error message
3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
4. It does anyway
5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
reports

I suspect a newline/chomp/metacharacter issue. Not finding the string
anywhere has me thoroughly confused - I asked Hubert for the additional
file, assuming that I didn't have it.

My next thought is to write a quick script to test perl behavior on "Fedora
Core 4".
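
A minimal sketch of such a quick test script, assuming the suspicion is a
stray carriage return or other non-printing character in the report line.
The helper name show_hidden and the sample string are made up for
illustration; nothing here comes from blast.pm itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Replace every character outside printable ASCII (space through tilde)
# with its hex code in angle brackets, so that carriage returns, tabs and
# similar characters become visible.
sub show_hidden {
    my ($line) = @_;
    (my $visible = $line) =~ s{([^\x20-\x7e])}{ sprintf '<%02X>', ord $1 }ge;
    return $visible;
}

# Hypothetical sample: the reported line with a DOS line ending attached.
print show_hidden("Query  1   WWWKWRW  7\r\n"), "\n";
```

Running each line of a report through show_hidden before it reaches the
parser would show whether the files carry DOS line endings or other
invisible characters.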

Thoughts?

Did I misread the issue entirely? :}

Roger


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Thursday, February 09, 2006 10:16 AM
To: 'Jason Stajich'; 'Hubert Prielinger'
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
output


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
> Sent: Thursday, February 09, 2006 9:13 AM
> To: Hubert Prielinger
> Cc: Chris Fields; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing Blast output
> 
> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> > hi chris,
> > thanks, I have upgraded to version 1.5.1 but it still isn't working,
> > do you have any other idea, the problem I have is that I have to parse
> > a lot of textfiles....
> > or shall I look for another option to parse those files...
> >
> > regards
> > Hubert
> 
> 
> The code from Bioperl 1.5.1 works fine for me for blast 
> 2.2.13 reports but unless you post your blast report we can't 
> really determine the problem.
> 
> If you are still getting the same error like this I am not 
> convinced you have upgraded to 1.5.1 which includes a fix for 
> the fact that NCBI changed the HSP result format to remove 
> the ':' from the Query/Sbjct prefixes.  We fixed this as soon 
> as it was apparent sometime in September.
> 
> >>> MSG: no data for midline Query  1   WWWKWRW  7
> >>> STACK Bio::SearchIO::blast::next_result
> >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>> STACK toplevel
> >>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> 
> If you are just getting no results but also no warnings wrt 
> parsing, are you sure your logic is correct?
> 
> If you remove your filters do you see all the HSPS?
> 
> 
> while (my $result = $search->next_result) {
>     print $result->query_name, "\n";
>     # iterate over each hit on the query sequence
>     while (my $hit = $result->next_hit) {
>         print $hit->name, "\n";
>         # iterate over each HSP in the hit
>         while (my $hsp = $hit->next_hsp) {
>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>                   $hsp->hit_string, "\n";
>         }
>     }
> }

I tested some of the BLAST results that Hubert sent Roger and me with a
similar script to the above.  I removed the file parsing logic and it seemed
to work just fine.  It may very well be a logic issue or that he hasn't
installed the latest fix.
    
It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), even
though the returned output was from nr, the top of the blast output showed
that it was v2.2.12:  

BLASTP 2.2.12 [Aug-07-2005]

I double-checked my local version and it's definitely v.2.2.13:
-------------------------------------
C:\Perl\Scripts>blastcl3 -

blastcl3 2.2.13   arguments:...
-------------------------------------

If you use RemoteBlast using the same settings, the version in the header
looks like this:

BLASTP 2.2.13 [Nov-27-2005]

I'm wondering if all the blast executables (blast and netblast) from NCBI
have text output like v.2.2.12, while the wwwblast outputs a new format
(2.2.13).  I'll ask blast-help at NCBI about this.
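
For anyone who wants to check which version their own reports claim to be,
here is a rough sketch that pulls the program name, version and date out of
a header line like the two shown above. The function name blast_header is
made up for this example, and the pattern is only an assumption based on
those two header lines, not on any NCBI format specification.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return (program, version, date) from a report header line such as
# "BLASTP 2.2.12 [Aug-07-2005]", or an empty list if the line does not match.
sub blast_header {
    my ($line) = @_;
    return $line =~ /^(T?BLAST[NPX])\s+(\d+(?:\.\d+)+)\s+\[([^\]]+)\]/
        ? ($1, $2, $3)
        : ();
}

for my $header ('BLASTP 2.2.12 [Aug-07-2005]', 'BLASTP 2.2.13 [Nov-27-2005]') {
    my ($program, $version, $date) = blast_header($header);
    print "$program version $version ($date)\n";
}
```

Applied to the first line of each report file, this would separate the
reports that say 2.2.12 from the ones that say 2.2.13, whatever executable
produced them.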

> 
> To clarify some stuff -
> Chris I don't necessarily think the XML is best way forward 
> for BLAST reports generated locally, it isn't as detailed as 
> the Text format and it is what most people expect to be able 
> to scroll through and parse -- it is also harder for the 
> format to change dramatically if you have a static binary on 
> your machine =).  I think for remoteblast the XML format 
> should be the way forward but I expect Bioperl to maintain 
> support of any plain text BLAST report format that people use 
> on a regular basis.
> 

Does XML lack some specific info that text output has?  Didn't know that.  I
believe that XML should be default in RemoteBlast since it will not break,
but I agree with you about text output.  I also agree that it will need
somebody to maintain it constantly, much like RemoteBlast.

> -jason
> >
> >
> > Chris Fields wrote:
> >
> >> My guess is you're running into text parsing problems in 
> >> Bio::SearchIO::blast.  Upgrade to the latest developer version
> >> (1.5.1) or
> >> bioperl-live (CVS), then see the bug below.
> >>
> >> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>
> >> I think the first problem you ran into is solved in bioperl 1.5.1, 
> >> the last problem (more recent, not related to the first) has been 
> >> fixed but hasn't been committed to bioperl-live yet.  The fixed 
> >> SearchIO::blast is available in the link above, but realize it hasn't
> >> been committed yet and may change.
> >>
> >> Christopher Fields
> >> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
> >> University of Illinois Urbana-Champaign
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: bioperl-l-bounces at lists.open-bio.org
> >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert 
> >>> Prielinger
> >>> Sent: Wednesday, February 08, 2006 2:52 PM
> >>> To: bioperl-l at bioperl.org
> >>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
> >>> output
> >>>
> >>> Hi,
> >>> If I want to parse a Blast Output (Version 2.2.12) with 
> >>> Bio::SearchIO, I get the following error message:
> >>>
> >>> MSG: no data for midline Query  1   WWWKWRW  7
> >>> STACK Bio::SearchIO::blast::next_result
> >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>> STACK toplevel
> >>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>>
> >>> is that a bug......
> >>>
> >>> If I want to parse Blast Output (version 2.2.13), I don't get 
> >>> anything.....
> >>> I'm using bioperl 1.4
> >>>
> >>> before I installed bioperl 1.4, it worked fine parsing Blast
> >>> Output (version 2.2.12), but I don't remember which bioperl version
> >>> I had installed
> >>>
> >>> thanks in advance
> >>>
> >>> Hubert
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l



From Lalancettec at AGR.GC.CA  Thu Feb  9 15:53:10 2006
From: Lalancettec at AGR.GC.CA (Lalancette, Claudia)
Date: Thu, 9 Feb 2006 15:53:10 -0500
Subject: [Bioperl-l] module for finding restriction site in batch of
	sequences?
Message-ID: 

Greetings,

I need to find a way to look for a specific restriction enzyme site in
hundreds of sequences.  I have been looking at Bio::Restriction, but I am
not sure if it will work...  Any suggestions?

Thanks,

Claudia
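
One possible starting point, sketched in plain Perl with a regular
expression rather than Bio::Restriction. It assumes the recognition site is
an exact string with no ambiguity codes; the function name site_positions
and the sample sequences are made up for illustration. GAATTC is the EcoRI
recognition site.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the 1-based start positions of every occurrence of a recognition
# site in a sequence, ignoring case. The lookahead lets overlapping
# occurrences be found as well.
sub site_positions {
    my ($sequence, $site) = @_;
    my @positions;
    while ($sequence =~ /(?=\Q$site\E)/gi) {
        push @positions, pos($sequence) + 1;
    }
    return @positions;
}

# Hypothetical input: sequence id => DNA string.
my %sequences = (
    seq1 => 'ttgaattcggatccgaattcaa',
    seq2 => 'acgtacgtacgt',
);
my $ecori = 'GAATTC';    # EcoRI recognition site

for my $id (sort keys %sequences) {
    my @hits = site_positions($sequences{$id}, $ecori);
    print "$id: ", (@hits ? join(',', @hits) : 'no site'), "\n";
}
```

For a palindromic site such as GAATTC, scanning one strand is enough. A
non-palindromic site would also need a scan for its reverse complement.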



From cjfields at uiuc.edu  Thu Feb  9 16:25:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 9 Feb 2006 15:25:01 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <200602092141.34401.heikki@sanbi.ac.za>
Message-ID: <000901c62dbf$49bfae20$15327e82@pyrimidine>

Thanks!  I think, as long as the tests pass everything is fine with me.  I
may be submitting another module or two in the next few weeks; just depends
on how much time I can spend on them.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: Heikki Lehvaslaiho [mailto:heikki at sanbi.ac.za] 
> Sent: Thursday, February 09, 2006 1:42 PM
> To: bioperl-l at lists.open-bio.org
> Cc: Chris Fields
> Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> 
> Chris,
> 
> I committed your file. All tests pass; code looks like 
> written by a long term bioperl contributor! Impressive.
> 
> I truncated the larger test file from 270K to 20K (200 
> lines), to not bloat the distribution unnecessarily. Tests 
> pass which is the main thing. Shout if you disagree.
> 
> Great job!
> 
> 	-Heikki
>  
> 
> On Thursday 09 February 2006 19:53, Chris Fields wrote:
> > Heikki,
> >
> > I've added the Bio::Tools::RNAMotif module with test suite (24 tests)
> > and two test data files to bugzilla.  The first data file is needed
> > for normal tests, the second is for testing parsing with modified data
> > in the score tag (using sprintf() in the RNAMotif descriptor).  I ran
> > 'perl t\RNAMotif.t' and they all passed.
> >
> > Thanks!
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org
> > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki 
> > > Lehvaslaiho
> > > Sent: Wednesday, February 08, 2006 12:54 AM
> > > To: bioperl-l at lists.open-bio.org
> > > Cc: Chris Fields
> > > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> > >
> > > Chris,
> > >
> > > Post your files to bugzilla (ticket type enhancement, add files to
> > > ticket after creation)  and someone with commit ability will add 
> > > them to CVS once the code is in satisfactory condition.
> > >
> > > Thanks,
> > >
> > > 	-Heikki
> > >
> > > On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> > > > I want to submit a module for parsing RNAMotif output 
> > > > (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning
> > > > output and returning Bio::SeqFeature::Generic objects with added tags
> > > > for descriptors/sequences/file info.  I'm in the process of writing up
> > > > tests and going through biodesign to make sure everything's 
> > > > kosher, but the module itself is essentially ready-to-go.  What 
> > > > should I do next?
> > > >
> > > > Christopher Fields
> > > > Postdoctoral Researcher
> > > > Lab of Dr. Robert Switzer
> > > > Dept of Biochemistry
> > > > University of Illinois Urbana-Champaign
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > --
> > > ______ _/      _/_____________________________________________________
> > >       _/      _/
> > >      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> > >     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
> > >    _/  _/  _/  SANBI, South African National Bioinformatics Institute
> > >   _/  _/  _/  University of Western Cape, South Africa
> > >      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > > ___ _/_/_/_/_/________________________________________________________
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________



From golharam at umdnj.edu  Thu Feb  9 16:19:46 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 09 Feb 2006 16:19:46 -0500
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <200602091654.30890.heikki@sanbi.ac.za>
Message-ID: <002801c62dbe$8d4d7e20$e6028a0a@GOLHARMOBILE1>

Thanks all.  The responses I got were definitely more than helpful.  FYI
- I did initially look at msbar, but I overlooked the "Number of times to
perform mutation operations" option, which is what I was looking for.

I'm looking to statistically test some simple scoring matrices.  I think
msbar will do.

Ryan

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki
Lehvaslaiho
Sent: Thursday, February 09, 2006 9:55 AM
To: bioperl-l at lists.open-bio.org; golharam at umdnj.edu
Cc: 'The general forum at Bioinformatics.Org'; 'bioperl-l';
emboss at emboss.open-bio.org
Subject: Re: [Bioperl-l] Tool to mutate DNA sequence


Ryan,

I should have made this very clear in my first reply:

You have to plan very carefully what rules you use when you mutate your
sequence because it will affect directly the resulting sequences. Of
course, all that depends on what you will be using the sequences for. If
you are going to draw evolutionary conclusions from those sequences, you
must mutate them in a way that simulates evolutionary principles.

My earlier pseudocode example, for example, should allow mutations in
every location. Mutations do occur multiple times in same places as
sequences get saturated by mutations. Also, you should decide the relative
occurrence of transversions versus transitions. Then there are indels; do
you want to take those into account?
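
The points above could be sketched roughly as follows. This is only an
illustration under stated assumptions: the function names mutate and
identity, the transition fraction of 2/3 and the 300 events are all made
up, only substitutions are modelled (no indels), and it is not the
pseudocode from the earlier message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Transition partner for each base (purine<->purine, pyrimidine<->pyrimidine).
my %transition = (A => 'G', G => 'A', C => 'T', T => 'C');
# The two possible transversion products for each base.
my %transversion = (A => [qw(C T)], G => [qw(C T)], C => [qw(A G)], T => [qw(A G)]);

# Apply $events point substitutions to $sequence. Every position can be
# chosen at every event, so a site may be hit more than once or revert.
# $ti_fraction is the share of events that are transitions.
sub mutate {
    my ($sequence, $events, $ti_fraction) = @_;
    my @bases = split //, uc $sequence;
    for (1 .. $events) {
        my $i    = int rand @bases;
        my $base = $bases[$i];
        next unless exists $transition{$base};    # skip N and other symbols
        $bases[$i] = rand() < $ti_fraction
            ? $transition{$base}
            : $transversion{$base}[ int rand 2 ];
    }
    return join '', @bases;
}

# Fraction of identical positions between two strings of equal length.
sub identity {
    my ($x, $y) = @_;
    my $same = grep { substr($x, $_, 1) eq substr($y, $_, 1) } 0 .. length($x) - 1;
    return $same / length $x;
}

# Hypothetical run: a random 1000-base sequence and 300 substitution events.
my $original = join '', map { (qw(A C G T))[ int rand 4 ] } 1 .. 1000;
my $mutated  = mutate($original, 300, 2 / 3);
printf "identity after 300 events: %.1f%%\n", 100 * identity($original, $mutated);
```

Because of repeated hits and reversions, the number of events needed to
reach a given similarity is larger than the number of positions that end
up different, so the identity is best measured after the fact as done here.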

Also, check the EMBOSS program 'msbar'.

You did not ask this, but... I remember that during the early days of
Celera, one of the tools that enabled them to estimate the feasibility of
the whole genome shotgun sequence assembly, was a very complete program to
'synthesize' in-silico the whole complexity of the human genome. I have no
idea if that program is generally available now.

Yours,

    -Heikki

On Thursday 09 February 2006 06:46, Ryan Golhar wrote:
> Does anyone know of a tool to mutate a DNA sequence by a specified
> amount? For instance, say I have a DNA sequence 1000 bases long, and I
> want to simulate mutations to make it 75% (or 80%, etc) similar to the
> original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org 
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l



From injunjoel at hotmail.com  Thu Feb  9 16:33:45 2006
From: injunjoel at hotmail.com (Joel Steele)
Date: Thu, 09 Feb 2006 13:33:45 -0800
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsingBlast
	output
In-Reply-To: <43EBA26B.4010907@gmx.at>
Message-ID: 

Greetings again,
It's the colon...
observe.

-=Code Snippet=-
#!/usr/bin/perl -w
use strict;

#the string as reported from your error.
my $string1 = 'Query  1   WWWKWRW  7';

#your string with a colon thrown in for testing.
my $string2 = 'Query:  1   WWWKWRW  7';

foreach ($string1, $string2){
	if(/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/){
		print "Match Found in $_\n";
		print $1."\n";
		print $2."\n";
		print $3."\n";
		print $4."\n";
		print $5."\n";
	}else{
		print "no Match for $_\n";
	}
}

-=End Code=-

The Output

-=Code Snippet=-
no Match for Query  1   WWWKWRW  7
Match Found in Query:  1   WWWKWRW  7
Query:  1
Query
1
WWWKWRW
7

-=End Code=-


Now I would suggest changing the regexp

From:
/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/

To:
/^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/

in Bio::SearchIO::blast.
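
A quick check of the suggested pattern against both line styles, as a
sketch only. The helper name parse_hsp_line is made up for this test and is
not part of blast.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Suggested pattern: the colon after Query/Sbjct is optional.
my $hsp_line = qr/^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/;

# Return (label, start, residues, end) for an alignment line,
# or an empty list if the line does not match.
sub parse_hsp_line {
    my ($line) = @_;
    return $line =~ $hsp_line ? ($2, $3, $4, $5) : ();
}

for my $line ('Query  1   WWWKWRW  7', 'Query:  1   WWWKWRW  7') {
    my @fields = parse_hsp_line($line);
    print @fields ? "match: @fields\n" : "no match: $line\n";
}
```

With the optional colon, both the old "Query:" style and the newer style
without the colon should yield the same four fields.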

General suggestion:
Again I would like to suggest that everyone get used to using the strict 
pragma. Though it may not be applicable to this particular problem, it becomes 
essential if you wish to progress in your use of Perl.
It is a core module so there is nothing to download from CPAN. It helps with 
development and once your code can run without warnings and errors you can 
remove it. This is not a targeted attack as some may interpret it, rather a 
general FYI for those out there new to Perl or programming in general. 
Better to start learning the rules early before bad habits creep in.
One more thing. There is a wonderfully supportive Perl community available 
to anyone who wants to join at PerlMonks.org. Check it out; who knows, you may 
even catch a glimpse of Larry Wall while you're there.

-Joel Steele

"The surest way to corrupt a youth is to instruct him to hold in higher 
regard those who think alike than those who think differently." -Nietzsche

"I do not feel obliged to believe that the same God who endowed us with 
sense, reason and intellect has intended us to forego their use." -Galileo




>From: Hubert Prielinger 
>To: rahall2 at ualr.edu, bioperl-l at bioperl.org, Chris Fields 
>,        Jason Stajich 
>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>parsingBlast	output
>Date: Thu, 09 Feb 2006 14:13:31 -0600
>
>dear roger,
>this error message I got, when I tried to parse Blast output (version
>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have a lot
>of Blast output files
>with version 2.2.13 and for that I don't get any error message.....it
>just doesn't work
>
>Hubert
>
>
>
>Roger Hall wrote:
>
> >Guys - I'm looking at the error message:
> >
> >MSG: no data for midline Query  1   WWWKWRW  7
> >STACK Bio::SearchIO::blast::next_result
> >/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >STACK toplevel
> >/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >
> >This is my line of thought:
> >1. "no data for midline $_" is a unique message generated by blast.pm in
> >one location only at the point of a. reading three lines b. dropping lines
> >with spaces only c. identifying the Query, Midline, and Match lines
> >(0 <= $i < 3)
> >2. There is a regexp match that fails in order to reach that error message
> >3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
> >4. It does anyway
> >5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
> >reports
> >
> >I suspect a newline/chomp/metacharacter issue. Not finding the string
> >anywhere has me thoroughly confused - I asked Hubert for the additional
> >file, assuming that I didn't have it.
> >
> >My next thought is to write a quick script to test perl behavior on
> >"Fedora Core 4".
> >
> >Thoughts?
> >
> >Did I misread the issue entirely? :}
> >
> >Roger
> >
> >
> >-----Original Message-----
> >From: bioperl-l-bounces at lists.open-bio.org
> >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >Sent: Thursday, February 09, 2006 10:16 AM
> >To: 'Jason Stajich'; 'Hubert Prielinger'
> >Cc: bioperl-l at bioperl.org
> >Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
> >output
> >
> >
> >
> >
> >>-----Original Message-----
> >>From: Jason Stajich [mailto:jason.stajich at duke.edu]
> >>Sent: Thursday, February 09, 2006 9:13 AM
> >>To: Hubert Prielinger
> >>Cc: Chris Fields; bioperl-l at bioperl.org
> >>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
> >>parsing Blast output
> >>
> >>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> >>
> >>
> >>>hi chris,
> >>>thanks, I have upgraded to version 1.5.1 but it isn't still
> >>>
> >>>
> >>working,
> >>
> >>
> >>>do you have any ohter idea, the problem I have is that I
> >>>
> >>>
> >>have to parse
> >>
> >>
> >>>a lot of textfiles....
> >>>or shall I look for another option to parse those files...
> >>>
> >>>regards
> >>>Hubert
> >>>
> >>>
> >>The code from Bioperl 1.5.1 works fine for me for blast
> >>2.2.13 reports but unless you post your blast report we can't
> >>really determine the problem.
> >>
> >>If you are still getting the same error like this I am not
> >>convinced you have upgraded to 1.5.1 which includes a fix in
> >>the fact that NCBI changed the HSP result format to remove
> >>the ':' from the Query/Sbjct prefixes.  We fixed this as soon
> >>as it was apparent sometime in September.
> >>
> >>
> >>
> >>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>STACK toplevel
> >>>>>
> >>>>>
> >>>>>
> >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>
> >>If you are just getting no results but also no warnings wrt
> >>parsing, are you sure your logic is correct?
> >>
> >>If you remove your filters do you see all the HSPS?
> >>
> >>
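> >>use Bio::SearchIO;
> >>
> >># assumed setup, not in the original post; the file name is a placeholder
> >>my $search = Bio::SearchIO->new(-format => 'blast',
> >>                                -file   => 'report.blast');
> >>
> >>while (my $result = $search->next_result) {
> >>    print $result->query_name, "\n";
> >>    # iterate over each hit on the query sequence
> >>    while (my $hit = $result->next_hit) {
> >>        print $hit->name, "\n";
> >>        # iterate over each HSP in the hit
> >>        while (my $hsp = $hit->next_hsp) {
> >>            print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
> >>                  $hsp->hit_string, "\n";
> >>        }
> >>    }
> >>}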
> >>
> >>
> >
> >I tested some of the BLAST results that Hubert sent Roger and me with a
> >similar script to the above.  I removed the file parsing logic and it
> >seemed to work just fine.  It may very well be a logic issue or that he
> >hasn't installed the latest fix.
> >
> >It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
> >even though the returned output was from nr, the top of the blast output
> >showed that it was v2.2.12:
> >
> >BLASTP 2.2.12 [Aug-07-2005]
> >
> >I double-checked my local version and it's definitely v.2.2.13:
> >-------------------------------------
> >C:\Perl\Scripts>blastcl3 -
> >
> >blastcl3 2.2.13   arguments:...
> >-------------------------------------
> >
> >If you use RemoteBlast using the same settings, the version in the header
> >looks like this:
> >
> >BLASTP 2.2.13 [Nov-27-2005]
> >
> >I'm wondering if all the blast executables (blast and netblast) from NCBI
> >have text output like v.2.2.12, while the wwwblast outputs a new format
> >(2.2.13).  I'll ask blast-help at NCBI about this.
> >
> >
> >
> >>To clarify some stuff -
> >>Chris I don't necessarily think the XML is best way forward
> >>for BLAST reports generated locally, it isn't as detailed as
> >>the Text format and it is what most people expect to be able
> >>to scroll through and parse -- it is also harder for the
> >>format to change dramatically if you have a static binary on
> >>your machine =).  I think for remoteblast the XML format
> >>should be the way forward but I expect Bioperl to maintain
> >>support of any plain text BLAST report format that people use
> >>on a regular basis.
> >>
> >>
> >>
> >
> >Does XML lack some specific info that text output has?  Didn't know that.
> >I believe that XML should be default in RemoteBlast since it will not
> >break, but I agree with you about text output.  I also agree that it will
> >need somebody to maintain it constantly, much like RemoteBlast.
> >
> >
> >
> >>-jason
> >>
> >>
> >>>Chris Fields wrote:
> >>>
> >>>
> >>>
> >>>>My guess is you're running into text parsing problems in
> >>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
> >>>>(1.5.1) or
> >>>>bioperl-live (CVS), then see the bug below.
> >>>>
> >>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>
> >>>>I think the first problem you ran into is solved in bioperl 1.5.1,
> >>>>the last problem (more recent, not related to the first) has been
> >>>>fixed but hasn't been committed to bioperl-live yet.  The fixed
> >>>>SearchIO::blast is available in the link above, but
> >>>>
> >>>>
> >>realize it hasn't
> >>
> >>
> >>>>been committed yet and may change.
> >>>>
> >>>>Christopher Fields
> >>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
> >>>>University of Illinois Urbana-Champaign
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>-----Original Message-----
> >>>>>From: bioperl-l-bounces at lists.open-bio.org
> >>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
> >>>>>Prielinger
> >>>>>Sent: Wednesday, February 08, 2006 2:52 PM
> >>>>>To: bioperl-l at bioperl.org
> >>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
> >>>>>
> >>>>>
> >>parsing Blast
> >>
> >>
> >>>>>output
> >>>>>
> >>>>>Hi,
> >>>>>If I want to parse a Blast Output (Version 2.2.12) with
> >>>>>Bio::SearchIO, I get the following error message:
> >>>>>
> >>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>STACK toplevel
> >>>>>
> >>>>>
> >>>>>
> >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>
> >>
> >>>>>is that a bug......
> >>>>>
> >>>>>If I want to parse Blast Output (version 2.2.13), I don't get
> >>>>>anything.....
> >>>>>I'm using bioperl 1.4
> >>>>>
> >>>>>before, I have installed bioperl 1.4, it worked fine
> >>>>>
> >>>>>
> >>parsing Blast
> >>
> >>
> >>>>>Output (version 2.2.12), but I don't remember which
> >>>>>
> >>>>>
> >>bioperl version
> >>
> >>
> >>>>>I had installed
> >>>>>
> >>>>>thanks in advance
> >>>>>
> >>>>>Hubert
> >>>>>
> >>>>>
> >>>>>
> >>>>>_______________________________________________
> >>>>>Bioperl-l mailing list
> >>>>>Bioperl-l at lists.open-bio.org
> >>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>--
> >>Jason Stajich
> >>Duke University
> >>http://www.duke.edu/~jes12
> >>
> >>
> >>
> >
> >Christopher Fields
> >Postdoctoral Researcher - Switzer Lab
> >Dept. of Biochemistry
> >University of Illinois Urbana-Champaign
> >
> >
> >
> >
> >
>




From jason.stajich at duke.edu  Thu Feb  9 17:13:16 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 9 Feb 2006 17:13:16 -0500
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsingBlast
	output
In-Reply-To: 
References: 
Message-ID: 

Uh, that was done in sept see the CVS log...

On Feb 9, 2006, at 4:33 PM, Joel Steele wrote:

> Greetings again,
> Its the colon...
> observe.
>
> -=Code Snippet=-
> #!/usr/bin/perl -w
> use strict;
>
> #the string as reported from your error.
> my $string1 = 'Query  1   WWWKWRW  7';
>
> #your string with a colon thrown in for testing.
> my $string2 = 'Query:  1   WWWKWRW  7';
>
> foreach ($string1, $string2){
> 	if(/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/){
> 		print "Match Found in $_\n";
> 		print $1."\n";
> 		print $2."\n";
> 		print $3."\n";
> 		print $4."\n";
> 		print $5."\n";
> 	}else{
> 		print "no Match for $_\n";
> 	}
> }
>
> -=End Code=-
>
> The Output
>
> -=Code Snippet=-
> no Match for Query  1   WWWKWRW  7
> Match Found in Query:  1   WWWKWRW  7
> Query:  1
> Query
> 1
> WWWKWRW
> 7
>
> -=End Code=-
>
>
> Now I would suggest changing the regexp
>
> From:
> /^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/
>
> To:
> /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/
>
> in SearchIO::Blast.
>
> General suggestion:
> Again I would like to suggest that everyone get used to using the strict
> pragma. Though it may not be applicable to this particular problem, it
> becomes essential if you wish to progress in your use of Perl.
> It is a core module so there is nothing to download from CPAN. It helps
> with development and once your code can run without warnings and errors
> you can remove it. This is not a targeted attack as some may interpret it,
> rather a general FYI for those out there new to Perl or programming in
> general. Better to start learning the rules early before bad habits creep in.
> One more thing. There is a wonderfully supportive Perl community available
> to anyone who wants to join at PerlMonks.org. Check it out; who knows, you
> may even catch a glimpse of Larry Wall while you're there.
>
> -Joel Steele
>
> "The surest way to corrupt a youth is to instruct him to hold in  
> higher
> regard those who think alike than those who think differently." - 
> Nietzsche
>
> "I do not feel obliged to believe that the same God who endowed us  
> with
> sense, reason and intellect has intended us to forego their use." - 
> Galileo
>
>
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From boris.steipe at utoronto.ca  Thu Feb  9 16:54:53 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Thu, 9 Feb 2006 16:54:53 -0500
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
Message-ID: <1B7E8DA9-86F5-4411-B16C-E6573E5E8C36@utoronto.ca>

Golf, anyone?


#!/usr/bin/perl -nl
for(split//){push@a,$_}
END{
   while($n/@a<0.5) {
     $p=rand(@a);
     if($a[$p]=~/[A-Z]/){$a[$p]=lc((grep!/$a[$p]/,split//,"ACGT")[rand(3)]);
       $n++;
     }
   }
print @a;
}

(144, not counting \s and the # !line )

:-)
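
For anyone who would rather read than golf, here is an ungolfed sketch of the same idea, with the mutated fraction passed as a parameter instead of the hard-coded 0.5. The sub name mutate, the test sequence and the 0.25 value are assumptions made for illustration; it expects an upper-case A/C/G/T sequence and, like the golfed script, writes the mutated bases in lower case.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Return a copy of $seq in which the given fraction of positions has been
# replaced by a different base. Mutated bases are written in lower case.
sub mutate {
    my ($seq, $fraction) = @_;
    my @bases = split //, $seq;
    my $n_mut = int($fraction * @bases);

    # Pick distinct positions so that no site is hit twice.
    my @positions = (shuffle 0 .. $#bases)[0 .. $n_mut - 1];

    foreach my $p (@positions) {
        # Choose one of the three bases that differ from the current one.
        my @others = grep { $_ ne $bases[$p] } qw(A C G T);
        $bases[$p] = lc $others[ int rand @others ];
    }
    return join '', @bases;
}

my $original = 'ACGT' x 25;              # 100 bases of made-up sequence
my $mutated  = mutate($original, 0.25);  # aim for 75% identity
print "$original\n$mutated\n";
```

Because each chosen site always receives a different base, a fraction of 0.25 gives exactly 75% identity to the original, which is closer to what Ryan asked for than the fixed 50% above.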


B.



>> Does anyone know of tool to mutate a DNA sequence by a specified
>> amount?
>> For instance, say I have a DNA sequence 1000 bases long, and I  
>> want to
>> simulate mutations to make it 75% (or 80%, etc) similar to the
>> original.
>>
>>
>> Ryan
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From hubert.prielinger at gmx.at  Thu Feb  9 17:20:46 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 09 Feb 2006 16:20:46 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast
	output
In-Reply-To: <000e01c62dca$bc66df60$15327e82@pyrimidine>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>
Message-ID: <43EBC03E.4040900@gmx.at>

Hi Chris,
I'm incredibly sorry for causing so much inconvenience. Yes, you are
right, I only had to change the blast.pm file; it is working fine now.
Thank you very much. And you are right, you had mentioned earlier
that I should change the file... ;)

but I have another question: does it work with the WU-Blast output too? 

regards
Hubert


Chris Fields wrote:

>Ha!  I come back from meeting and there's a billion emails!  What have we
>started? ;p .  Sorry about this Jason; I know you're busy.
>
>Hubert, if you're out there, I sent you an email with an attachment.  You
>said the output looks like what you were expecting.  So I think we have two
>problems:
>
>1)  I haven't delved into the file scanning, but the fact that it takes so
>long should tell you something's seriously wrong there.  Strip that part out
>and start with a simple script, say, like the one Jason or that I sent you;
>the script I used to generate that output works fine (on two OS's, WinXP and
>Mac OS X).  Use it on one file at a time.  Do everything on command line
>(not through Eclipse).  IDE's can be notoriously flaky about running
>scripts, esp. when they run debugging.  
>
>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still
>not work whenever the text blast output has the following header, which
>comes from the new web version of BLAST:
>
>-----------------------------------------------------
>BLASTP 2.2.13 [Nov-27-2005]
>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
>Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman 
>(1997), "Gapped BLAST and PSI-BLAST: a new generation of 
>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>
>RID: 1139501210-857-165793005128.BLASTQ1
>
>
>Database: All non-redundant GenBank CDS
>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>           3,292,813 sequences; 1,128,164,434 total letters
>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>tuberculosis 
>H37Rv].
>Length=193
>.......
>-----------------------------------------------------
>
>It will work if the text output has the following header (or is an older
>version of BLAST):
>
>-----------------------------------------------------
>BLASTP 2.2.12 [Aug-07-2005]
>
>
>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
>Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
>"Gapped BLAST and PSI-BLAST: a new generation of protein database search
>programs",  Nucleic Acids Res. 25:3389-3402.
>
>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>tuberculosis H37Rv].
>         (193 letters)
>
>Database: All non-redundant GenBank CDS
>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>           2,895,325 sequences; 997,103,285 total letters
>-----------------------------------------------------
>You have the former (2.2.13) version.  I know b/c I have your BLAST files.
>Therefore, even bioperl-1.5.1 will not work!
>
>If you want the really gory details on why this is a problem, look here:
>
>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>
>So, any text output with the above header will not work; it will either hang
>or end abruptly (depending on OS, perl version, memory, patience).  If you
>look in the above, I have added a preliminary fix for this.  I'll reiterate
>for the billionth time, it hasn't been committed yet, so don't kill me if
>it blows your computer up ;>
>
>Here's the direct link:
>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>This is a modified version of Bio::SearchIO::blast.pm (it says it's version
>1.90, but it's lying, I didn't change the version, only the regex; sorry
>Jason).  From what you've been posting it doesn't sound like you've tried
>this, and I believe I've suggested this fix before.
>
>Replace the one in your Bio/SearchIO directory (which looks like
>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev.
>message) with this file.  Make sure the filename stays the same (blast.pm).
>
>Run everything again, one file at a time.  Make sure you use Jason's script
>as well as the one I sent you.  Do NOT rely on running through multiple
>files yet.  Fix one bug at a time.  And heed Joel's words about file checks.
>
>
>Here's a small chunk of output from one of your blast files using the
>modified script I sent you:
>
>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>Query:   1  RWKWKRKK  8
>Seq:     542  RWAWRRKK  549
>
>Look familiar?
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept. of Biochemistry
>University of Illinois Urbana-Champaign  
>
>  
>
>>-----Original Message-----
>>From: Roger Hall [mailto:rahall2 at ualr.edu] 
>>Sent: Thursday, February 09, 2006 3:24 PM
>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>parsing Blast output
>>
>>In other words, yes, I'm on the wrong trail. :}
>>
>>Sorry - I'll look at the output issue this evening (or 
>>realize that Chris already solved the issue).  ;}
>>
>>Thanks!
>>
>>Roger
>>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org
>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>>Hubert Prielinger
>>Sent: Thursday, February 09, 2006 2:14 PM
>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; 
>>Jason Stajich
>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>parsing Blast output
>>
>>dear roger,
>>this error message I got, when I tried to parse Blast output (version
>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I 
>>have a lot of Blast output files with version 2.2.13 and for 
>>that I don't get any error message.....it just doesn't work
>>
>>Hubert
>>
>>
>>
>>Roger Hall wrote:
>>
>>    
>>
>>>Guys - I'm looking at the error message:
>>>
>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>STACK Bio::SearchIO::blast::next_result
>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>STACK toplevel
>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>>This is my line of thought:
>>>1. "no data for midline $_" is a unique message generated by blast.pm in
>>>one location only at the point of a. reading three lines b. dropping lines
>>>with spaces only c. identifying the Query, Midline, and Match lines
>>>(0 <= $i < 3)
>>>2. There is a regexp match that fails in order to reach that error message
>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>4. It does anyway
>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>>reports
>>>
>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>anywhere has me thoroughly confused - I asked Hubert for the additional
>>>file, assuming that I didn't have it.
>>>
>>>My next thought is to write a quick script to test perl behavior on
>>>"Fedora Core 4".
>>>
>>>Thoughts?
>>>
>>>Did I misread the issue entirely? :}
>>>
>>>Roger
>>>
>>>
>>>-----Original Message-----
>>>From: bioperl-l-bounces at lists.open-bio.org
>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>Cc: bioperl-l at bioperl.org
>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>>>Blast output
>>>
>>>>-----Original Message-----
>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>To: Hubert Prielinger
>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing 
>>>>Blast output
>>>>
>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>
>>>>>hi chris,
>>>>>thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>>>do you have any other idea, the problem I have is that I have to parse
>>>>>a lot of textfiles....
>>>>>or shall I look for another option to parse those files...
>>>>>
>>>>>regards
>>>>>Hubert
>>>>>
>>>>The code from Bioperl 1.5.1 works fine for me for blast 2.2.13 reports
>>>>but unless you post your blast report we can't really determine the
>>>>problem.
>>>>
>>>>If you are still getting the same error like this I am not convinced
>>>>you have upgraded to 1.5.1 which includes a fix in the fact that NCBI
>>>>changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>prefixes.  We fixed this as soon as it was apparent sometime in
>>>>September.
>>>>
>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>
>>>>If you are just getting no results but also no warnings wrt parsing,
>>>>are you sure your logic is correct?
>>>>
>>>>If you remove your filters do you see all the HSPS?
>>>>
>>>>
>>>>while (my $result = $search->next_result) {
>>>>    print $result->query_name, "\n";
>>>>    # iterate over each hit on the query sequence
>>>>    while (my $hit = $result->next_hit) {
>>>>        print $hit->name, "\n";
>>>>        # iterate over each HSP in the hit
>>>>        while (my $hsp = $hit->next_hsp) {
>>>>            print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>                $hsp->hit_string, "\n";
>>>>        }
>>>>    }
>>>>}
>>>>
>>>I tested some of the BLAST results that Hubert sent Roger and me with a
>>>similar script to the above.  I removed the file parsing logic and it
>>>seemed to work just fine.  It may very well be a logic issue or that he
>>>hasn't installed the latest fix.
>>>
>>>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>>even though the returned output was from nr, the top of the blast
>>>output showed that it was v2.2.12:
>>>
>>>BLASTP 2.2.12 [Aug-07-2005]
>>>
>>>I double-checked my local version and it's definitely v.2.2.13:
>>>-------------------------------------
>>>C:\Perl\Scripts>blastcl3 -
>>>
>>>blastcl3 2.2.13   arguments:...
>>>-------------------------------------
>>>
>>>If you use RemoteBlast using the same settings, the version in the 
>>>header looks like this:
>>>
>>>BLASTP 2.2.13 [Nov-27-2005]
>>>
>>>I'm wondering if all the blast executables (blast and netblast) from
>>>NCBI have text output like v.2.2.12, while the wwwblast outputs a new
>>>format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>
>>>>To clarify some stuff -
>>>>Chris I don't necessarily think the XML is best way forward for BLAST
>>>>reports generated locally, it isn't as detailed as the Text format and
>>>>it is what most people expect to be able to scroll through and parse
>>>>-- it is also harder for the format to change dramatically if you have
>>>>a static binary on your machine =).  I think for remoteblast the XML
>>>>format should be the way forward but I expect Bioperl to maintain
>>>>support of any plain text BLAST report format that people use on a
>>>>regular basis.
>>>>
>>>Does XML lack some specific info that text output has?  Didn't know that.
>>>I believe that XML should be default in RemoteBlast since it will not
>>>break, but I agree with you about text output.  I also agree that it
>>>will need somebody to maintain it constantly, much like RemoteBlast.
>>>
>>>>-jason
>>>>
>>>>>Chris Fields wrote:
>>>>>
>>>>>     
>>>>>
>>>>>          
>>>>>
>>>>>>My guess is you're running into text parsing problems in
>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>(1.5.1) or bioperl-live (CVS), then see the bug below.
>>>>>>
>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>
>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>the last problem (more recent, not related to the first) has been
>>>>>>fixed but hasn't been committed to bioperl-live yet.  The fixed
>>>>>>SearchIO::blast is available in the link above, but realize it hasn't
>>>>>>been committed yet and may change.
>>>>>>
>>>>>>Christopher Fields
>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>Dept. of Biochemistry
>>>>>>University of Illinois Urbana-Champaign
>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>Prielinger
>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>output
>>>>>>>
>>>>>>>Hi,
>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>
>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>>is that a bug......
>>>>>>>
>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't get
>>>>>>>anything.....
>>>>>>>I'm using bioperl 1.4
>>>>>>>
>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>I had installed
>>>>>>>
>>>>>>>thanks in advance
>>>>>>>
>>>>>>>Hubert
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>Bioperl-l mailing list
>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>--
>>>>Jason Stajich
>>>>Duke University
>>>>http://www.duke.edu/~jes12
>>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>
>
>  
>
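
Jason's note quoted above says that NCBI dropped the ':' after the
Query/Sbjct prefixes in the HSP lines. As a rough illustration of what a
parser has to tolerate, here is a minimal pure-Perl sketch. It is not the
regex used in Bio::SearchIO::blast; the helper name parse_alignment_line is
made up for this example, and the sample lines are taken from the messages
in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's actual code): split one alignment
# line into label, start, residues and end. The colon after the label
# is optional, so both "Query:" and "Query" styles are accepted.
sub parse_alignment_line {
    my ($line) = @_;
    if ( $line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/ ) {
        return { label => $1, start => $2, seq => $3, end => $4 };
    }
    return;    # not an alignment line (e.g. the midline)
}

for my $line (
    'Query: 1    WWWKWRW 7',          # older style, with colon
    'Query  1   WWWKWRW  7',          # newer style, no colon
    'Sbjct  542  RWAWRRKK  549',
    )
{
    my $hsp = parse_alignment_line($line) or next;
    printf "%s %d-%d %s\n", @{$hsp}{qw(label start end seq)};
}
```

A line that matches neither form returns nothing, so a caller can decide
whether to warn or to skip it instead of dying on the first surprise.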



From olenka.m at gmail.com  Thu Feb  9 17:49:48 2006
From: olenka.m at gmail.com (Olena Morozova)
Date: Thu, 9 Feb 2006 17:49:48 -0500
Subject: [Bioperl-l] Bio::TreeIO
Message-ID: <259a224c0602091449u353e4bf1g5a3cfbb46297217a@mail.gmail.com>

Hi all,

Probably a very stupid question, but the get_lca function does not
work for unrooted trees, does it?
I am trying to get the LCA for a set of nodes in a phylip tree, and I
am using the script in the HOWTO.
Thanks,
Olena




From victor.ruotti at gmail.com  Thu Feb  9 18:22:11 2006
From: victor.ruotti at gmail.com (Victor)
Date: Thu, 9 Feb 2006 17:22:11 -0600
Subject: [Bioperl-l] Running BLAT with BioPerl
Message-ID: <36d7e5550602091522g114728a2w57f2a1cb7c1383ee@mail.gmail.com>

Hi,
Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module is up to date
in the latest bioperl release?



use Bio::Tools::Run::Alignment::Blat;
my $factory = Bio::Tools::Run::Alignment::Blat->new();
my $seq =
"TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA";

my @feats = $factory->run( $seq);

Here is what I get when trying to use it:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Blat call (/usr/local/bin/blat/blat -out=blast  TGAAATAAAACTCAGTA
/tmp/fB09bp5F76) crashed: -1

Notice that it is using "blat" twice in the path. The way that I fixed this
is by going to the Blat.pm module and changing the following lines:
#my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
my $str= Bio::Root::IO->catfile($self->program_name);

Any ideas, maybe I'm missing the $ENV variable somewhere?
I'd like to avoid making this change. Also, does anyone have a known synopsis
of this blat module (where to set the parameters, and whether it allows you
to have a config file)?
I'll be happy to add a better synopsis to the module if needed.

Thanks in advance,
Victor
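
The doubled "blat/blat" in the exception above is what you get when a value
that already ends in the program name is joined with the program name again.
Below is a minimal sketch of that effect using File::Spec from core Perl.
The variable values and the blat_path helper are hypothetical: they only
mirror the path in the error message and are not taken from Blat.pm, and
the sketch assumes Unix-style paths. It does not explain the "crashed: -1"
part of the message.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical values, chosen to mirror the path in the exception above.
my $executable   = '/usr/local/bin/blat';    # assumed: already the full path
my $program_name = 'blat';

# Joining the two appends the program name a second time.
my $doubled = File::Spec->catfile( $executable, $program_name );

# Hypothetical guard: append the name only if the path does not end in it.
sub blat_path {
    my ( $exe, $name ) = @_;
    my ( undef, undef, $last ) = File::Spec->splitpath($exe);
    return $last eq $name ? $exe : File::Spec->catfile( $exe, $name );
}

print "$doubled\n";
print blat_path( $executable,      $program_name ), "\n";
print blat_path( '/usr/local/bin', $program_name ), "\n";
```

If a directory variable in the environment is what feeds the path, pointing
it at the directory rather than at the binary itself may give the same
result without editing the module; that is a guess, not something confirmed
in this thread.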



From osborne1 at optonline.net  Thu Feb  9 20:37:39 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 09 Feb 2006 20:37:39 -0500
Subject: [Bioperl-l] module for finding restriction site in batch of
 sequences?
In-Reply-To: 
Message-ID: 

Claudia,

Yes, Bio::Restriction does this, see bptutorial.pl for code examples. Note
the statement "@fragments = $analysis->fragments($enzyme)". If the array
@fragments has more than 1 element that means your sequence has a site for
the enzyme in question.

Alternatively it sounds like you could use some kind of regular expression.

Brian O.


On 2/9/06 3:53 PM, "Lalancette, Claudia"  wrote:

> Greetings,
> 
>  
> 
> I need to find a way to look for a specific restriction enzyme site in
> hundreds of sequences.  Been looking at Bio::Restriction, but not sure
> if will work...  Any suggestions?
> 
>  
> 
> Thanks,
> 
> Claudia
> 
>  
> 
>  
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
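
For the regular-expression route mentioned above, a minimal pure-Perl sketch
might look like the following. The sequences, the ids, and the choice of
EcoRI (GAATTC) are made up for illustration, and site_positions is a
hypothetical helper, not part of Bio::Restriction.

```perl
use strict;
use warnings;

# Hypothetical input: sequence id => DNA string. EcoRI (GAATTC) is used
# only as an example recognition site.
my %seqs = (
    seq1 => 'ATGGAATTCCGTTAGAATTCA',
    seq2 => 'ATGCCCGGGTTTAAA',
);
my $site = 'GAATTC';

# Return the 1-based start position of every occurrence of $site.
sub site_positions {
    my ( $seq, $site ) = @_;
    my @pos;
    # The zero-width lookahead lets overlapping occurrences be found too;
    # pos() is the 0-based offset where the lookahead matched.
    while ( $seq =~ /(?=\Q$site\E)/gi ) {
        push @pos, pos($seq) + 1;
    }
    return @pos;
}

for my $id ( sort keys %seqs ) {
    my @pos = site_positions( $seqs{$id}, $site );
    print "$id: ", ( @pos ? join( ',', @pos ) : 'no site' ), "\n";
}
```

This scans one strand only. That is enough for a palindromic site such as
GAATTC, but for a non-palindromic site the reverse complement would need to
be searched as well, and ambiguity codes in the recognition sequence would
have to be expanded into character classes first.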




From cjfields at uiuc.edu  Thu Feb  9 20:52:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 9 Feb 2006 19:52:34 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast
	output
In-Reply-To: <43EBC03E.4040900@gmx.at>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>
	<43EBC03E.4040900@gmx.at>
Message-ID: 

 From 'perldoc Bio::SearchIO::blast':

DESCRIPTION
        This object encapsulated the necessary methods for generating events
        suitable for building Bio::Search objects from a BLAST report file.
        Read the Bio::SearchIO for more information about how to use this.

        This driver can parse:

        o   NCBI produced plain text BLAST reports from blastall, this also
            includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.  NCBI
            XML BLAST output is parsed with the blastxml SearchIO driver

        o   WU-BLAST all reports

        o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)

        o   BLAST-like output from Paracel BTK output

So, it should.  Let us know if it doesn't.

On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:

> Hi Chris,
> I'm incredibly sorry for causing so much inconvenience, yes you are
> right, I only had to change the blast.pm file, it is working very
> well, thank you very much, and you are right, you mentioned earlier
> that I should change the file... ;)
>
> but I have another question: does it work with the WU-Blast output
> too?
> regards
> Hubert
>
>
> Chris Fields wrote:
>
>> Ha!  I come back from meeting and there's a billion emails!  What have we
>> started? ;p .  Sorry about this Jason; I know you're busy.
>>
>> Hubert, if you're out there, I sent you an email with an attachment.  You
>> said the output looks like what you were expecting.  So I think we have two
>> problems:
>>
>> 1)  I haven't delved into the file scanning, but the fact that it takes so
>> long should tell you something's seriously wrong there.  Strip that part out
>> and start with a simple script, say, like the one Jason or that I sent you;
>> the script I used to generate that output works fine (on two OS's, WinXP and
>> Mac OS X).  Use it on one file at a time.  Do everything on command line
>> (not through Eclipse).  IDE's can be notoriously flaky about running
>> scripts, esp. when they run debugging.
>>
>> 2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still
>> not work whenever the text blast output has the following header, which
>> comes from the new web version of BLAST:
>>
>> -----------------------------------------------------
>> BLASTP 2.2.13 [Nov-27-2005]
>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>> Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>> Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
>> protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>
>> RID: 1139501210-857-165793005128.BLASTQ1
>>
>>
>> Database: All non-redundant GenBank CDS
>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>           3,292,813 sequences; 1,128,164,434 total letters
>> Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>> tuberculosis H37Rv].
>> Length=193
>> .......
>> -----------------------------------------------------
>>
>> It will work if the text output has the following header (or is an older
>> version of BLAST):
>>
>> -----------------------------------------------------
>> BLASTP 2.2.12 [Aug-07-2005]
>>
>>
>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>> Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>> Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
>> protein database search programs",  Nucleic Acids Res. 25:3389-3402.
>>
>> Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>> tuberculosis H37Rv].
>>         (193 letters)
>>
>> Database: All non-redundant GenBank CDS
>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>           2,895,325 sequences; 997,103,285 total letters
>> -----------------------------------------------------
>> You have the former (2.2.13) version.  I know b/c I have your BLAST files.
>> Therefore, even bioperl-1.5.1 will not work!
>>
>> If you want the really gory details on why this is a problem, look here:
>>
>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>
>> So, any text output with the above header will not work; it will either hang
>> or end abruptly (depending on OS, perl version, memory, patience).  If you
>> look in the above, I have added a preliminary fix for this.  I'll reiterate
>> for the billionth time, it hasn't been committed yet, so don't kill me if
>> it blows your computer up ;>
>>
>> Here's the direct link:
>> http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>> This is a modified version of Bio::SearchIO::blast.pm (it says it's version
>> 1.90, but it's lying, I didn't change the version, only the regex; sorry
>> Jason).  From what you've been posting it doesn't sound like you've tried
>> this, and I believe I've suggested this fix before.
>>
>> Replace the one in your Bio/SearchIO directory (which looks like
>> '/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev.
>> message) with this file.  Make sure the filename stays the same (blast.pm).
>>
>> Run everything again, one file at a time.  Make sure you use Jason's script
>> as well as the one I sent you.  Do NOT rely on running through multiple
>> files yet.  Fix one bug at a time.  And heed Joel's words about file checks.
>>
>>
>> Here's a small chunk of output from one of your blast files using the
>> modified script I sent you:
>>
>> sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>> Query:   1  RWKWKRKK  8
>> Seq:     542  RWAWRRKK  549
>>
>> Look familiar?
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign






From heikki at sanbi.ac.za  Thu Feb  9 23:47:42 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 10 Feb 2006 06:47:42 +0200
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <000901c62dbf$49bfae20$15327e82@pyrimidine>
References: <000901c62dbf$49bfae20$15327e82@pyrimidine>
Message-ID: <200602100647.43173.heikki@sanbi.ac.za>

On Thursday 09 February 2006 23:25, Chris Fields wrote:
> Thanks!  I think, as long as the tests pass everything is fine with me.  I
> may be submitting another module or two in the next few weeks; just depends
> on how much time I can spend on them.

Looking forward to them!

	-Heikki

> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> > -----Original Message-----
> > From: Heikki Lehvaslaiho [mailto:heikki at sanbi.ac.za]
> > Sent: Thursday, February 09, 2006 1:42 PM
> > To: bioperl-l at lists.open-bio.org
> > Cc: Chris Fields
> > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> >
> > Chris,
> >
> > I committed your file. All tests pass; code looks like
> > written by a long term bioperl contributor! Impressive.
> >
> > I truncated the larger test file from 270K to 20K (200
> > lines), to not bloat the distribution unnecessarily. Tests
> > pass which is the main thing. Shout if you disagree.
> >
> > Great job!
> >
> > 	-Heikki
> >
> > On Thursday 09 February 2006 19:53, Chris Fields wrote:
> > > Heikki,
> > >
> > > I've added the Bio::Tools::RNAMotif module with test suite
> >
> > (24 tests)
> >
> > > and two test data files to bugzilla.  The first data file is needed
> > > for normal tests, the second is for testing parsing with
> >
> > modified data
> >
> > > in the score tag (using sprintf() in the RNAMotif
> >
> > descriptor).  I ran
> >
> > > 'perl t\RNAMotif.t' and they all passed.
> > >
> > > Thanks!
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > > > -----Original Message-----
> > > > From: bioperl-l-bounces at lists.open-bio.org
> > > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki
> > > > Lehvaslaiho
> > > > Sent: Wednesday, February 08, 2006 12:54 AM
> > > > To: bioperl-l at lists.open-bio.org
> > > > Cc: Chris Fields
> > > > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> > > >
> > > > Chris,
> > > >
> > > > Post your files to bugzilla (ticket type enhancement, add
> >
> > files to
> >
> > > > ticket after creation)  and someone with commit ability will add
> > > > them to CVS once the code is in satisfactory condition.
> > > >
> > > > Thanks,
> > > >
> > > > 	-Heikki
> > > >
> > > > On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> > > > > I want to submit a module for parsing RNAMotif output
> > > > > (Bio::Tools::RNAMotif).  It is capable, at the moment,
> >
> > of scanning
> >
> > > > > output and returning Bio::SeqFeature::Generic objects with
> > > >
> > > > added tags
> > > >
> > > > > for descriptors/sequences/file info.  I'm in the process of
> > > >
> > > > writing up
> > > >
> > > > > tests and going through biodesign to make sure everything's
> > > > > kosher, but the module itself is essentially ready-to-go.  What
> > > > > should I do next?
> > > > >
> > > > > Christopher Fields
> > > > > Postdoctoral Researcher
> > > > > Lab of Dr. Robert Switzer
> > > > > Dept of Biochemistry
> > > > > University of Illinois Urbana-Champaign
> > > > >
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > > > --
> > > > ______ _/
> >
> > _/_____________________________________________________
> >
> > > >       _/      _/
> > > >      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> > > >     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
> > > >    _/  _/  _/  SANBI, South African National
> >
> > Bioinformatics Institute
> >
> > > >   _/  _/  _/  University of Western Cape, South Africa
> > > >      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > > > ___
> > > > _/_/_/_/_/________________________________________________________
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > ______ _/      _/_____________________________________________________
> >       _/      _/
> >      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
> >    _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >   _/  _/  _/  University of Western Cape, South Africa
> >      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___
> > _/_/_/_/_/________________________________________________________
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Thu Feb  9 23:51:11 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 10 Feb 2006 06:51:11 +0200
Subject: [Bioperl-l] module for finding restriction site in batch of
	sequences?
In-Reply-To: 
References: 
Message-ID: <200602100651.12028.heikki@sanbi.ac.za>


It should:

# loop over each seq (@seqs stands for your own list of sequence objects)
foreach my $seq (@seqs) {
    my $ra = Bio::Restriction::Analysis->new(-seq => $seq);
    my @cuts = $ra->fragments('EcoRI'); # or call some other method
}

or is it something else you are trying to do?
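
For a quick cross-check that does not depend on Bio::Restriction at all, a plain-Perl sketch along these lines would list where the EcoRI recognition site (GAATTC) starts in each sequence of a batch. The helper find_sites and the %seqs hash are made up for illustration, and the sketch reports recognition-site positions only, not the fragments that Bio::Restriction::Analysis returns; degenerate recognition sites are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based start positions of a
# recognition site in a sequence string (case-insensitive).
sub find_sites {
    my ($seq, $site) = @_;
    my @positions;
    while ($seq =~ /\Q$site\E/gi) {
        push @positions, $-[0] + 1;   # $-[0] is the 0-based match start
    }
    return @positions;
}

# Made-up example batch; in practice the strings would come from
# sequence files read with Bio::SeqIO.
my %seqs = (
    seq1 => 'AAGAATTCTTGGAATTCA',
    seq2 => 'ACGTACGTACGT',
);

for my $id (sort keys %seqs) {
    my @pos = find_sites($seqs{$id}, 'GAATTC');
    print "$id: ", (@pos ? join(',', @pos) : 'no site'), "\n";
}
```

With the two made-up sequences above, seq1 contains the site at positions 3 and 12 and seq2 contains none.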

Yours,
	-Heikki


On Thursday 09 February 2006 22:53, Lalancette, Claudia wrote:
> Greetings,
>
>
>
> I need to find a way to look for a specific restriction enzyme site in
> hundreds of sequences.  Been looking at Bio::Restriction, but not sure
> if will work...  Any suggestions?
>
>
>
> Thanks,
>
> Claudia
>
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Fri Feb 10 02:06:11 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 10 Feb 2006 09:06:11 +0200
Subject: [Bioperl-l] planning sequence mutating modules
Message-ID: <200602100906.11885.heikki@sanbi.ac.za>


Ryan Golhar's mail got me thinking that we should have a simple framework for 
mutating sequences to a desired level. The model can then be extended to 
necessary complexity when needed by subclassing.

To start with, I have been planning:


Bio::SeqEvolution::EvolutionI - interface file
Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
        (defaults to Bio::PrimarySeq)
Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
Bio::SeqEvolution::EvolutionI::each_seqs($count) 
       - returns an array of $count seqs
Bio::SeqEvolution::EvolutionI::_generate_seq() 
Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
      converted to probabilities of change internally

  various methods to define the extent of divergence:
  only one to start with:
Bio::SeqEvolution::EvolutionI::pam() - percentage accepted mutation
   (= 100% - identity)

Bio::SeqEvolution::Factory - core class to call,
         instantiates subclasses, Bio::SeqEvolution::DNASimple for nucleotides
Bio::SeqEvolution::EvolutionI::type() - evolution model,
      defaults to Bio::SeqEvolution::DNASimple for nucleotides


Bio::SeqEvolution::DNASimple - default for nucleotides
Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
        e.g. 5 => 5:1, defaults to 1:1
        simple alternative to a scoring matrix


I am soliciting the usual comments and suggestions about naming and minimal 
functionality.
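
To make the DNASimple idea concrete, here is a rough plain-Perl sketch of what the core of such a class might do. Nothing in it is existing BioPerl code: mutate_dna, its arguments and the two lookup tables are hypothetical, the "5 => 5:1" setting is read here as an assumed weight of the transition against each single transversion, and only unambiguous A/C/G/T input is handled.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical stand-in for the proposed Bio::SeqEvolution::DNASimple.
my %transition    = (A => 'G', G => 'A', C => 'T', T => 'C');
my %transversions = (A => [qw(C T)], G => [qw(C T)],
                     C => [qw(A G)], T => [qw(A G)]);

# Mutate exactly round(length * pam / 100) distinct positions.
# $ts_weight is the ASSUMED weight of the transition relative to each
# transversion (5 is read as 5:1, the default 1 as 1:1).
sub mutate_dna {
    my ($seq, $pam, $ts_weight) = @_;
    $ts_weight = 1 unless defined $ts_weight;
    my @bases = split //, uc $seq;
    my $n_mut = int(@bases * $pam / 100 + 0.5);
    # Distinct positions, so identity ends up at exactly 100% - pam.
    my @sites = (shuffle 0 .. $#bases)[0 .. $n_mut - 1];
    # One transition and two transversions compete for each change.
    my $p_ts = $ts_weight / ($ts_weight + 2);
    for my $i (@sites) {
        my $old = $bases[$i];
        $bases[$i] = rand() < $p_ts
            ? $transition{$old}
            : $transversions{$old}[int rand 2];
    }
    return join '', @bases;
}

my $orig    = 'ACGT' x 25;               # 100 nt example sequence
my $mutated = mutate_dna($orig, 10, 5);  # 10% divergence, 5:1 weighting
my $diffs   = grep { substr($orig, $_, 1) ne substr($mutated, $_, 1) }
              0 .. length($orig) - 1;
print "changed $diffs of ", length($orig), " positions\n";
```

Because every chosen position is replaced by a different base and no position is chosen twice, the example always changes 10 of the 100 positions; a real implementation would more likely draw changes from the scoring matrix, so multiple hits at one site would be possible.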


   -Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From Pieter.Monsieurs at esat.kuleuven.be  Fri Feb 10 03:53:43 2006
From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs)
Date: Fri, 10 Feb 2006 09:53:43 +0100
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	blast	output
In-Reply-To: 
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>	<43EBC03E.4040900@gmx.at>
	
Message-ID: <43EC5497.3050505@esat.kuleuven.be>

Hi Chris,

The parsing of the Blast output still doesn't work for me with the bug-fix 
download of blast.pm.
The module keeps looping in the while loop at line 487, which looks 
for a database line or the query size:

while( defined ($_) ) {
	if( /^Database:/ ) {
		$self->_pushback($_);
		last;
	}
	chomp;               
	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
		$size = $1;
		$size =~ s/,//g;
		last;
	} else {
		$q .= " $_";
		$q =~ s/ +/ /g;
		$q =~ s/^ | $//g;
	}
	$_ = $self->_readline;
}


The code keeps looking for the database information; however, as you 
mentioned, this information comes before the query line in the new 
Blast output format.
As a result, all hits and HSPs end up in the query description 
($hit->query_description), no hits are found and query_length is 0.
Because you already adapted the module to retrieve the database information 
at another position in the module, deleting the while loop and adding 
the following lines after $_ = $self->_readline (line 486) worked fine 
for me (using blastn and blastp):

if (/Length=([\d,]+)/) {
	$size = $1;
	$size =~ s/,//g;
}
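
To illustrate the two header layouts outside of blast.pm, here is a small stand-alone sketch that pulls the query length out of either style with the same two patterns. The function name query_length and the sample header lines are only for illustration (the lines are taken from the headers posted earlier in this thread); this is not the code in the module.

```perl
use strict;
use warnings;

# Extract the query length from either BLAST text header style.
# Old style (e.g. 2.2.12):  "        (193 letters)"
# New style (e.g. 2.2.13):  "Length=193"
sub query_length {
    my (@lines) = @_;
    for my $line (@lines) {
        if (   $line =~ /\(([\d,]+)\s+letters.*\)/
            || $line =~ /^Length=([\d,]+)/ ) {
            (my $size = $1) =~ s/,//g;   # drop thousands separators
            return $size;
        }
    }
    return;   # no length line found
}

my @old = ('Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium',
           'tuberculosis H37Rv].',
           '        (193 letters)');
my @new = ('          3,292,813 sequences; 1,128,164,434 total letters',
           'Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium',
           'tuberculosis H37Rv].',
           'Length=193');

print "old header: ", query_length(@old), "\n";
print "new header: ", query_length(@new), "\n";
```

Note that the "total letters" line of the database block does not match the first pattern, because that pattern requires the surrounding parentheses.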


Regards,
Pieter



Chris Fields wrote:

> From 'perldoc Bio::SearchIO::blast':
>
>DESCRIPTION
>        This object encapsulated the necessary methods for generating events
>        suitable for building Bio::Search objects from a BLAST report file.
>        Read the Bio::SearchIO for more information about how to use this.
>
>        This driver can parse:
>
>        o   NCBI produced plain text BLAST reports from blastall, this also
>            includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.  NCBI
>            XML BLAST output is parsed with the blastxml SearchIO driver
>
>        o   WU-BLAST all reports
>
>        o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
>
>        o   BLAST-like output from Paracel BTK output
>
>So, it should.  Let us know if it doesn't.
>
>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>
>>Hi Chris,
>>I'm incredibly sorry for causing so much inconvenience, yes you are
>>right, I had only to change the blast.pm file, it is working very
>>fine, thank you very much, and you are right, you have mentioned it
>>earlier either to change the file... ;)
>>
>>but I have another question: does it work with the WU-Blast output too?
>>regards
>>Hubert
>>
>>
>>Chris Fields wrote:
>>
>>>Ha!  I come back from meeting and there's a billion emails!  What  
>>>have we
>>>started? ;p .  Sorry about this Jason; I know you're busy.
>>>
>>>Hubert, if you're out there, I sent you an email with an  
>>>attachment.  You
>>>said the output looks like what you were expecting.  So I think we  
>>>have two
>>>problems:
>>>
>>>1)  I haven't delved into the file scanning, but the fact that it  
>>>takes so
>>>long should tell you something's seriously wrong there.  Strip  
>>>that part out
>>>and start with a simple script, say, like the one Jason or that I  
>>>sent you;
>>>the script I used to generate that output works fine (on two OS's,  
>>>WinXP and
>>>Mac OS X).  Use it on one file at a time.  Do everything on  
>>>command line
>>>(not through Eclipse).  IDE's can be notoriously flaky about running
>>>scripts, esp. when they run debugging.
>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast  
>>>will still
>>>not work whenever the text blast output has the following header,  
>>>which
>>>comes from the new web version of BLAST:
>>>
>>>-----------------------------------------------------
>>>BLASTP 2.2.13 [Nov-27-2005]
>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>
>>>RID: 1139501210-857-165793005128.BLASTQ1
>>>
>>>
>>>Database: All non-redundant GenBank CDS
>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>          3,292,813 sequences; 1,128,164,434 total letters
>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>tuberculosis H37Rv].
>>>Length=193
>>>.......
>>>-----------------------------------------------------
>>>
>>>It will work if the text output has the following header (or is an  
>>>older
>>>version of BLAST):
>>>
>>>-----------------------------------------------------
>>>BLASTP 2.2.12 [Aug-07-2005]
>>>
>>>
>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>protein database search
>>>programs",  Nucleic Acids Res. 25:3389-3402.
>>>
>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>tuberculosis H37Rv].
>>>        (193 letters)
>>>
>>>Database: All non-redundant GenBank CDS
>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>          2,895,325 sequences; 997,103,285 total letters
>>>-----------------------------------------------------
>>>You have the former (2.2.13) version.  I know b/c I have your  
>>>BLAST files.
>>>Therefore, even bioperl-1.5.1 will not work!
>>>
>>>If you want the really gory details on why this is a problem, look  
>>>here:
>>>
>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>
>>>So, any text output with the above header will not work; it will  
>>>either hang
>>>or end abruptly (depending on OS, perl version, memory,  
>>>patience).  If you
>>>look in the above, I have added a preliminary fix for this.  I'll  
>>>reiterate
>>>for the billionth time, it hasn't been committed yet, so don't  
>>>kill me if
>>>blows your computer up ;>
>>>Here's the direct link:
>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>>This is a modified version of Bio::SearchIO::blast.pm (it says  
>>>it's version
>>>1.90, but it's lying, I didn't change the version, only the regex;  
>>>sorry
>>>Jason).  From what you've been posting it doesn't sound like  
>>>you've tried
>>>this, and I believe I've suggested this fix before.
>>>
>>>Replace the one in your Bio/SearchIO directory (which looks like
>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your  
>>>prev.
>>>message) with this file.  Make sure the filename stays the same  
>>>(blast.pm).
>>>
>>>Run everything again, one file at a time.  Make sure you use  
>>>Jason's script
>>>as well as the one I sent you.  Do NOT rely on running through  
>>>multiple
>>>files yet.  Fix one bug at a time.  And heed Joel's words about  
>>>file checks.
>>>
>>>
>>>Here's a small chunk of output from one of your blast files using the
>>>modified script I sent you:
>>>
>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>>>Query:   1  RWKWKRKK  8
>>>Seq:     542  RWAWRRKK  549
>>>
>>>Look familiar?
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>      
>>>
>>>>-----Original Message-----
>>>>From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday,  
>>>>February 09, 2006 3:24 PM
>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>parsing Blast output
>>>>
>>>>In other words, yes, I'm on the wrong trail. :}
>>>>
>>>>Sorry - I'll look at the output issue this evening (or realize  
>>>>that Chris already solved the issue).  ;}
>>>>
>>>>Thanks!
>>>>
>>>>Roger
>>>>
>>>>-----Original Message-----
>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert  
>>>>Prielinger
>>>>Sent: Thursday, February 09, 2006 2:14 PM
>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason  
>>>>Stajich
>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>parsing Blast output
>>>>
>>>>dear roger,
>>>>this error message I got, when I tried to parse Blast output  
>>>>(version
>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have  
>>>>a lot of Blast output files with version 2.2.13 and for that I  
>>>>don't get any error message.....it just doesn't work
>>>>
>>>>Hubert
>>>>
>>>>
>>>>
>>>>Roger Hall wrote:
>>>>
>>>>>Guys - I'm looking at the error message:
>>>>>
>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>STACK toplevel
>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>
>>>>>This is my line of thought:
>>>>>1. "no data for midline $_" is a unique message generated by blast.pm
>>>>>in one location only at the point of a. reading three lines b. dropping
>>>>>lines with spaces only c. identifying the Query, Midline, and Match
>>>>>lines (0 <= $i < 3)
>>>>>2. There is a regexp match that fails in order to reach that error message
>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>>>4. It does anyway
>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>>>>reports
>>>>>
>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional
>>>>>file, assuming that I didn't have it.
>>>>>
>>>>>My next thought is to write a quick script to test perl behavior
>>>>>on "Fedora Core 9".
>>>>>
>>>>>Thoughts?
>>>>>
>>>>>Did I misread the issue entirely? :}
>>>>>
>>>>>Roger
>>>>>
>>>>>
>>>>>-----Original Message-----
>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>>>Cc: bioperl-l at bioperl.org
>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>parsing Blast output
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>>>To: Hubert Prielinger
>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>>parsing Blast output
>>>>>>
>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>>
>>>>>>>hi chris,
>>>>>>>thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>>>>>do you have any other idea, the problem I have is that I have to parse
>>>>>>>a lot of textfiles....
>>>>>>>or shall I look for another option to parse those files...
>>>>>>>
>>>>>>>regards
>>>>>>>Hubert
>>>>>>>
>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>>>2.2.13 reports but unless you post your blast report we can't really
>>>>>>determine the problem.
>>>>>>
>>>>>>If you are still getting the same error like this I am not convinced
>>>>>>you have upgraded to 1.5.1 which includes a fix for the fact that NCBI
>>>>>>changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>>>prefixes.  We fixed this as soon as it was apparent sometime in
>>>>>>September.
>>>>>>
>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>STACK toplevel
>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>
>>>>>>If you are just getting no results but also no warnings wrt parsing,
>>>>>>are you sure your logic is correct?
>>>>>>
>>>>>>If you remove your filters do you see all the HSPS?
>>>>>>
>>>>>>
>>>>>>while (my $result = $search->next_result) {
>>>>>>    print $result->query_name, "\n";
>>>>>>    # iterate over each hit on the query sequence
>>>>>>    while (my $hit = $result->next_hit) {
>>>>>>        print $hit->name, "\n";
>>>>>>        # iterate over each HSP in the hit
>>>>>>        while (my $hsp = $hit->next_hsp) {
>>>>>>            print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>>>                  $hsp->hit_string, "\n";
>>>>>>        }
>>>>>>    }
>>>>>>}
>>>>>>
>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a
>>>>>similar script to the above.  I removed the file parsing logic and it
>>>>>seemed to work just fine.  It may very well be a logic issue or that he
>>>>>hasn't installed the latest fix.
>>>>>
>>>>>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>>>>even though the returned output was from nr, the top of the
>>>>>blast output showed that it was v2.2.12:
>>>>>
>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>
>>>>>I double-checked my local version and it's definitely v.2.2.13:
>>>>>-------------------------------------
>>>>>C:\Perl\Scripts>blastcl3 -
>>>>>
>>>>>blastcl3 2.2.13   arguments:...
>>>>>-------------------------------------
>>>>>
>>>>>If you use RemoteBlast using the same settings, the version in
>>>>>the header looks like this:
>>>>>
>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>
>>>>>I'm wondering if all the blast executables (blast and netblast)
>>>>>from NCBI have text output like v.2.2.12, while the wwwblast outputs a
>>>>>new format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>>>
>>>>>>To clarify some stuff -
>>>>>>Chris I don't necessarily think the XML is best way forward for BLAST
>>>>>>reports generated locally, it isn't as detailed as the Text format and
>>>>>>it is what most people expect to be able to scroll through and parse
>>>>>>-- it is also harder for the format to change dramatically if you have
>>>>>>a static binary on your machine =).  I think for remoteblast the XML
>>>>>>format should be the way forward but I expect Bioperl to
>>>>>>maintain support of any plain text BLAST report format that
>>>>>>people use on a regular basis.
>>>>>>
>>>>>Does XML lack some specific info that text output has?  Didn't know
>>>>>that.  I believe that XML should be default in RemoteBlast since it will
>>>>>not break, but I agree with you about text output.  I also agree
>>>>>that it will need somebody to maintain it constantly, much like
>>>>>RemoteBlast.
>>>>>
>>>>>>-jason
>>>>>>
>>>>>>>Chris Fields wrote:
>>>>>>>
>>>>>>>>My guess is you're running into text parsing problems in
>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>>>(1.5.1) or bioperl-live (CVS), then see the bug below.
>>>>>>>>
>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>
>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>>>the last problem (more recent, not related to the first) has
>>>>>>>>been fixed but hasn't been committed to bioperl-live yet.
>>>>>>>>The fixed SearchIO::blast is available in the link above, but
>>>>>>>>realize it hasn't been committed yet and may change.
>>>>>>>>
>>>>>>>>Christopher Fields
>>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>>Dept. of Biochemistry
>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>
>>>>>>>>>-----Original Message-----
>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>>>Prielinger
>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>>>output
>>>>>>>>>
>>>>>>>>>Hi,
>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>>>
>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>STACK toplevel
>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>>
>>>>>>>>>is that a bug......
>>>>>>>>>
>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't
>>>>>>>>>get anything.....
>>>>>>>>>I'm using bioperl 1.4
>>>>>>>>>
>>>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>>>I had installed
>>>>>>>>>
>>>>>>>>>thanks in advance
>>>>>>>>>
>>>>>>>>>Hubert
>>>>>>>>>
>>>>>>--
>>>>>>Jason Stajich
>>>>>>Duke University
>>>>>>http://www.duke.edu/~jes12
>>>>>>
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>Christopher Fields
>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>Dept. of Biochemistry
>>>>>University of Illinois Urbana-Champaign
>Christopher Fields
>Postdoctoral Researcher
>Lab of Dr. Robert Switzer
>Dept of Biochemistry
>University of Illinois Urbana-Champaign
>
>
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>  
>


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



From Pieter.Monsieurs at esat.kuleuven.be  Fri Feb 10 04:44:10 2006
From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs)
Date: Fri, 10 Feb 2006 10:44:10 +0100
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
	parsing	blast	output
In-Reply-To: <43EC5497.3050505@esat.kuleuven.be>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>	<43EBC03E.4040900@gmx.at>	
	<43EC5497.3050505@esat.kuleuven.be>
Message-ID: <43EC606A.20003@esat.kuleuven.be>

Sorry for disturbing. It now works correctly with the bug fix of Chris. 
Thanx,
Pieter

Pieter Monsieurs wrote:

>Hi Chris,
>
>The parsing of the Blast output still doesn't work for me with the bug 
>fix download of blast.pm.
>The module keeps turning around in the while loop at line 487 looking 
>for a database or query-size:
>
>while( defined ($_) ) {
>	if( /^Database:/ ) {
>		$self->_pushback($_);
>		last;
>	}
>	chomp;               
>	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
>		$size = $1;
>		$size =~ s/,//g;
>		last;
>	} else {
>		$q .= " $_";
>		$q =~ s/ +/ /g;
>		$q =~ s/^ | $//g;
>	}
>	$_ = $self->_readline;
>}
>
>
>The code keeps looking for the database information, however - as you 
>mentioned - this information is given before the query line in the new 
>Blast output format.
>This way, all hits and hsps are stored in the query_description 
>($hit->query_description), no hits are found and query_length is 0.
>Because you already adapted the module to retrieve database information 
>at another position in the module, deleting the while loop and adding 
>the following lines after $_ = $self->_readline (line 486), worked fine 
>for me (using blastn and blastp):
>
>if (/Length=([\d,]+)/) {
>	$size = $1;
>	$size =~ s/,//g;
>}
>
>
>Regards,
>Pieter
>
>
>
>Chris Fields wrote:
>
>  
>
>>From 'perldoc Bio::SearchIO::blast':
>>
>>DESCRIPTION
>>       This object encapsulated the necessary methods for generating events
>>       suitable for building Bio::Search objects from a BLAST report file.
>>       Read the Bio::SearchIO for more information about how to use this.
>>
>>       This driver can parse:
>>
>>       o   NCBI produced plain text BLAST reports from blastall, this also
>>           includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.  NCBI
>>           XML BLAST output is parsed with the blastxml SearchIO driver
>>
>>       o   WU-BLAST all reports
>>
>>       o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
>>
>>       o   BLAST-like output from Paracel BTK output
>>
>>So, it should.  Let us know if it doesn't.
>>
>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>>
>> 
>>
>>    
>>
>>>Hi Chris,
>>>I'm incredibly sorry for causing so much inconvenience, yes you are
>>>right, I only had to change the blast.pm file, it is working very
>>>well, thank you very much, and you are right, you had mentioned
>>>earlier as well that I should change the file... ;)
>>>
>>>but I have another question: does it work with the WU-Blast output
>>>too?
>>>regards
>>>Hubert
>>>
>>>
>>>Chris Fields wrote:
>>>
>>>   
>>>
>>>      
>>>
>>>>Ha!  I come back from meeting and there's a billion emails!  What have we
>>>>started? ;p .  Sorry about this Jason; I know you're busy.
>>>>
>>>>Hubert, if you're out there, I sent you an email with an attachment.  You
>>>>said the output looks like what you were expecting.  So I think we have two
>>>>problems:
>>>>
>>>>1)  I haven't delved into the file scanning, but the fact that it takes so
>>>>long should tell you something's seriously wrong there.  Strip that part out
>>>>and start with a simple script, say, like the one Jason or that I sent you;
>>>>the script I used to generate that output works fine (on two OS's, WinXP and
>>>>Mac OS X).  Use it on one file at a time.  Do everything on command line
>>>>(not through Eclipse).  IDE's can be notoriously flaky about running
>>>>scripts, esp. when they run debugging.
>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still
>>>>not work whenever the text blast output has the following header, which
>>>>comes from the new web version of BLAST:
>>>>
>>>>-----------------------------------------------------
>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>>
>>>>RID: 1139501210-857-165793005128.BLASTQ1
>>>>
>>>>
>>>>Database: All non-redundant GenBank CDS
>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>         3,292,813 sequences; 1,128,164,434 total letters
>>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>tuberculosis H37Rv].
>>>>Length=193
>>>>.......
>>>>-----------------------------------------------------
>>>>
>>>>It will work if the text output has the following header (or is an older
>>>>version of BLAST):
>>>>
>>>>-----------------------------------------------------
>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>
>>>>
>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>protein database search
>>>>programs",  Nucleic Acids Res. 25:3389-3402.
>>>>
>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>tuberculosis H37Rv].
>>>>       (193 letters)
>>>>
>>>>Database: All non-redundant GenBank CDS
>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>         2,895,325 sequences; 997,103,285 total letters
>>>>-----------------------------------------------------
>>>>You have the former (2.2.13) version.  I know b/c I have your BLAST files.
>>>>Therefore, even bioperl-1.5.1 will not work!
>>>>
>>>>If you want the really gory details on why this is a problem, look here:
>>>>
>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>
>>>>So, any text output with the above header will not work; it will either hang
>>>>or end abruptly (depending on OS, perl version, memory, patience).  If you
>>>>look in the above, I have added a preliminary fix for this.  I'll reiterate
>>>>for the billionth time, it hasn't been committed yet, so don't kill me if
>>>>it blows your computer up ;>
>>>>Here's the direct link:
>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>>>This is a modified version of Bio::SearchIO::blast.pm (it says it's version
>>>>1.90, but it's lying, I didn't change the version, only the regex; sorry
>>>>Jason).  From what you've been posting it doesn't sound like you've tried
>>>>this, and I believe I've suggested this fix before.
>>>>
>>>>Replace the one in your Bio/SearchIO directory (which looks like
>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev.
>>>>message) with this file.  Make sure the filename stays the same (blast.pm).
>>>>
>>>>Run everything again, one file at a time.  Make sure you use Jason's script
>>>>as well as the one I sent you.  Do NOT rely on running through multiple
>>>>files yet.  Fix one bug at a time.  And heed Joel's words about file checks.
>>>>
>>>>
>>>>Here's a small chunk of output from one of your blast files using the
>>>>modified script I sent you:
>>>>
>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>>>>Query:   1  RWKWKRKK  8
>>>>Seq:     542  RWAWRRKK  549
>>>>
>>>>Look familiar?
>>>>
>>>>Christopher Fields
>>>>Postdoctoral Researcher - Switzer Lab
>>>>Dept. of Biochemistry
>>>>University of Illinois Urbana-Champaign
>>>>
>>>>     
>>>>
>>>>        
>>>>
>>>>>-----Original Message-----
>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu]
>>>>>Sent: Thursday, February 09, 2006 3:24 PM
>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
>>>>>
>>>>>In other words, yes, I'm on the wrong trail. :}
>>>>>
>>>>>Sorry - I'll look at the output issue this evening (or realize  
>>>>>that Chris already solved the issue).  ;}
>>>>>
>>>>>Thanks!
>>>>>
>>>>>Roger
>>>>>
>>>>>-----Original Message-----
>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert  
>>>>>Prielinger
>>>>>Sent: Thursday, February 09, 2006 2:14 PM
>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason  
>>>>>Stajich
>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>parsing Blast output
>>>>>
>>>>>dear roger,
>>>>>this error message I got, when I tried to parse Blast output  
>>>>>(version
>>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have  
>>>>>a lot of Blast output files with version 2.2.13 and for that I  
>>>>>don't get any error message.....it just doesn't work
>>>>>
>>>>>Hubert
>>>>>
>>>>>
>>>>>
>>>>>Roger Hall wrote:
>>>>>
>>>>>
>>>>>       
>>>>>
>>>>>          
>>>>>
>>>>>>Guys - I'm looking at the error message:
>>>>>>
>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>STACK toplevel
>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>
>>>>>>This is my line of thought:
>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm in
>>>>>>one location only at the point of a. reading three lines b. dropping lines
>>>>>>with spaces only c. identifying the Query, Midline, and Match lines
>>>>>>(0 <= $i < 3)
>>>>>>2. There is a regexp match that fails in order to reach that error message
>>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>>>>4. It does anyway
>>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>>>>>reports
>>>>>>
>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional
>>>>>>file, assuming that I didn't have it.
>>>>>>
>>>>>>My next thought is to write a quick script to test perl behavior  
>>>>>>on "Fedora Core 9".
>>>>>>
>>>>>>Thoughts?
>>>>>>
>>>>>>Did I misread the issue entirely? :}
>>>>>>
>>>>>>Roger
>>>>>>
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>>>>Cc: bioperl-l at bioperl.org
>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>         
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>>>>To: Hubert Prielinger
>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>>parsing Blast output
>>>>>>>
>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>hi chris,
>>>>>>>>thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>>>>>>do you have any other idea, the problem I have is that I have to parse
>>>>>>>>a lot of textfiles....
>>>>>>>>or shall I look for another option to parse those files...
>>>>>>>>
>>>>>>>>regards
>>>>>>>>Hubert
>>>>>>>>
>>>>>>>>
>>>>>>>>             
>>>>>>>>
>>>>>>>>                
>>>>>>>>
>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>>>>2.2.13 reports but unless you post your blast report we can't really
>>>>>>>determine the problem.
>>>>>>>
>>>>>>>If you are still getting the same error like this I am not convinced
>>>>>>>you have upgraded to 1.5.1, which includes a fix for the fact that NCBI
>>>>>>>changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>>>>prefixes.  We fixed this as soon as it was apparent sometime in
>>>>>>>September.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>STACK toplevel
>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>>If you are just getting no results but also no warnings wrt parsing,
>>>>>>>are you sure your logic is correct?
>>>>>>>
>>>>>>>If you remove your filters do you see all the HSPS?
>>>>>>>
>>>>>>>
>>>>>>>while (my $result = $search->next_result) {
>>>>>>>  print $result->query_name, "\n";
>>>>>>>  # iterate over each hit on the query sequence
>>>>>>>  while (my $hit = $result->next_hit) {
>>>>>>>    print $hit->name, "\n";
>>>>>>>    # iterate over each HSP in the hit
>>>>>>>    while (my $hsp = $hit->next_hsp) {
>>>>>>>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>>>>        $hsp->hit_string, "\n";
>>>>>>>    }
>>>>>>>  }
>>>>>>>}
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a
>>>>>>similar script to the above.  I removed the file parsing logic and it seemed
>>>>>>to work just fine.  It may very well be a logic issue or that he hasn't
>>>>>>installed the latest fix.
>>>>>>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>>>>>even though the returned output was from nr, the top of the
>>>>>>blast output showed that it was v2.2.12:
>>>>>>
>>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>>
>>>>>>I double-checked my local version and it's definitely v.2.2.13:
>>>>>>-------------------------------------
>>>>>>C:\Perl\Scripts>blastcl3 -
>>>>>>
>>>>>>blastcl3 2.2.13   arguments:...
>>>>>>-------------------------------------
>>>>>>
>>>>>>If you use RemoteBlast using the same settings, the version in  
>>>>>>the header looks like this:
>>>>>>
>>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>>
>>>>>>I'm wondering if all the blast executables (blast and netblast)
>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast outputs a new
>>>>>>format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>>>>
>>>>>>
>>>>>>
>>>>>>         
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>To clarify some stuff -
>>>>>>>Chris I don't necessarily think the XML is best way forward for BLAST
>>>>>>>reports generated locally, it isn't as detailed as the Text format and
>>>>>>>it is what most people expect to be able to scroll through and parse
>>>>>>>-- it is also harder for the format to change dramatically if you have
>>>>>>>a static binary on your machine =).  I think for remoteblast the XML
>>>>>>>format should be the way forward but I expect Bioperl to
>>>>>>>maintain support of any plain text BLAST report format that
>>>>>>>people use on a regular basis.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>Does XML lack some specific info that text output has?  Didn't know that.
>>>>>>I believe that XML should be default in RemoteBlast since it will
>>>>>>not break, but I agree with you about text output.  I also agree
>>>>>>that it will need somebody to maintain it constantly, much like
>>>>>>RemoteBlast.
>>>>>>
>>>>>>
>>>>>>
>>>>>>         
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>-jason
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>             
>>>>>>>>
>>>>>>>>                
>>>>>>>>
>>>>>>>>>My guess is you're running into text parsing problems in  
>>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>>>>(1.5.1) or
>>>>>>>>>bioperl-live (CVS), then see the bug below.
>>>>>>>>>
>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>
>>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>>>>the last problem (more recent, not related to the first) has
>>>>>>>>>been fixed but hasn't been committed to bioperl-live yet.
>>>>>>>>>The fixed SearchIO::blast is available in the link above, but
>>>>>>>>>realize it hasn't been committed yet and may change.
>>>>>>>>>
>>>>>>>>>Christopher Fields
>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry  
>>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>               
>>>>>>>>>
>>>>>>>>>                  
>>>>>>>>>
>>>>>>>>>>-----Original Message-----
>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>>>>Prielinger
>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>>>>output
>>>>>>>>>>
>>>>>>>>>>Hi,
>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>>>>
>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>STACK toplevel
>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>>>
>>>>>>>>>>is that a bug......
>>>>>>>>>>
>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't
>>>>>>>>>>get anything.....
>>>>>>>>>>I'm using bioperl 1.4
>>>>>>>>>>
>>>>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>>>>I had installed
>>>>>>>>>>
>>>>>>>>>>thanks in advance
>>>>>>>>>>
>>>>>>>>>>Hubert
>>>>>>>>>>
>>>>>>>--
>>>>>>>Jason Stajich
>>>>>>>Duke University
>>>>>>>http://www.duke.edu/~jes12
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>Christopher Fields
>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>Dept. of Biochemistry
>>>>>>University of Illinois Urbana-Champaign
>>>>>>
>>Christopher Fields
>>Postdoctoral Researcher
>>Lab of Dr. Robert Switzer
>>Dept of Biochemistry
>>University of Illinois Urbana-Champaign


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



From andrej.kastrin at guest.arnes.si  Fri Feb 10 09:28:19 2006
From: andrej.kastrin at guest.arnes.si (Andrej Kastrin)
Date: Fri, 10 Feb 2006 15:28:19 +0100
Subject: [Bioperl-l] Medline to XML
Message-ID: <43ECA303.8090904@guest.arnes.si>

Dear users,

my problem is not directly related to this list, but I hope you can help
me. Is there any tool to convert a standard Medline record to XML format?
I know there is a built-in function (med2xml) in Pubmed, but I'm looking
for some independent perl script.

Thanks in advance for any suggestions or pointers.

Cheers, Andrej


From cjfields at uiuc.edu  Fri Feb 10 12:01:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 10 Feb 2006 11:01:27 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: <001801c62e63$a4a71090$15327e82@pyrimidine>

I don't think there's anything like this in Bioperl, and I'm unfamiliar with
the naming scheme you're using.  If you're searching for specific miRNA's, a
good resource looks like the miRNA database, which seems to be updated
regularly (http://microrna.sanger.ac.uk/sequences/) and uses the same system
for RNA annotation that you use (which, I'm guessing, is a standardized
annotation scheme of some sort).  I believe the database is downloadable and
searchable by name, so you could probably build a querying scheme using LWP
or HTTP::Request (if the web interface allows for this).  I know that Sean
Eddy's Rfam database (http://www.sanger.ac.uk/Software/Rfam/) also has
information on miRNA's, but it's somewhat limited. 
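
As a starting point for the name handling asked about in the quoted message below, here is a rough sketch of a parser. The grammar is an assumption pieced together from the two example names in this thread (mir-102a* and hsa-mir-102a-1*), not an official specification, and parse_mirna_name and mirna_family_key are hypothetical names that do not exist in Bioperl.

```perl
use strict;
use warnings;

# Hypothetical parser. The grammar below is an ASSUMPTION based on the
# example names in this thread, not an authoritative naming standard.
sub parse_mirna_name {
    my ($name) = @_;
    return unless $name =~ m/^
        (?:([a-z]{3,4})-)?   # optional species prefix such as "hsa"
        (mir|miR)            # tag (assumed: "mir" precursor, "miR" mature)
        -(\d+)               # family number
        ([a-z]?)             # optional letter suffix
        (?:-(\d+))?          # optional copy number
        (\*?)                # optional trailing star
    $/x;
    return {
        species => $1,       # undef when no prefix was given
        tag     => $2,
        number  => $3,
        letter  => $4,
        copy    => $5,       # undef when no copy number was given
        star    => ( $6 eq '*' ? 1 : 0 ),
    };
}

# Key shared by names of the same family, e.g. "102a" for both forms.
sub mirna_family_key {
    my ($name) = @_;
    my $parsed = parse_mirna_name($name) or return;
    return $parsed->{number} . $parsed->{letter};
}

for my $name ( 'mir-102a*', 'hsa-mir-102a-1*' ) {
    print "$name => family ", mirna_family_key($name), "\n";
}
```

Comparing family keys is one simple way to link a short name to its longer, species-qualified form; a real implementation would need to check the key against the names actually present in the database.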


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> barry.m.dancis at gsk.com
> Sent: Wednesday, February 08, 2006 3:45 PM
> To: 'bioperl-l'; bioperl-l-bounces at lists.open-bio.org
> Cc: James.R.Brown at gsk.com
> Subject: Re: [Bioperl-l] Handling miRNA's
> 
> Hi Chris--
> 
>         The problem I am solving is given a mature miRna 
> name, how do I use it to search for its pre/pri miRna and 
> vice versa. For example, how to go from mir-102a* to 
> hsa-mir-102a-1*. Yes, I can write a parser for it, but I'm 
> hoping that someone else has already done it and has some 
> bells and whistles to go with it.  Below is a hierarchy chart 
> of a data structure to hold the naming information. The 
> parsing is not trivial and given data in that structure there 
> could be all kinds of neat functions that return various 
> aspects of the names.
> 
> Barry
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> "Chris Fields" 
> Sent by: bioperl-l-bounces at lists.open-bio.org
> 07-Feb-2006 17:40
>  
> To
> barry.m.dancis at gsk.com, "'bioperl-l'"  cc
> 
> Subject
> Re: [Bioperl-l] Handling miRNA's
> 
> 
> 
> 
> 
> 
> Are you talking about sequences or text output from a 
> specific program? If you are talking about sequences in a 
> particular format, then listen to Brian.  If you are talking 
> about output, then we need to know which program you're 
> using, as a parser may exist or could be built. 
> 
> There are a few modules in Bio::Tools that handle RNA (like 
> QRNA, tRNAscan-SE), so check those out first.  I'm currently 
> finishing up a Bio::Tools module for RNAMotif and have plans 
> for making an ERPIN parser.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> > barry.m.dancis at gsk.com
> > Sent: Tuesday, February 07, 2006 2:26 PM
> > To: bioperl-l; bioperl-l-bounces at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Handling miRNA's
> > 
> > It's the parser in particular that I need
> > 
> > 
> > 
> > 
> > "Brian Osborne"  Sent by: 
> > bioperl-l-bounces at lists.open-bio.org
> > 07-Feb-2006 12:05
> > 
> > To
> > barry.m.dancis at gsk.com, "bioperl-l" , 
> > bioperl-l-bounces at lists.open-bio.org
> > cc
> > 
> > Subject
> > Re: [Bioperl-l] Handling miRNA's
> > 
> > 
> > 
> > 
> > 
> > 
> > Barry,
> > 
> > If the sequence information is in one of the formats that Bioperl
> > understands (Genbank, Swissprot flat, and so on) then the answer is
> > yes.
> > This assumes that the details on sequence that you mentioned are found
> > in some sequence feature section in the file. But it looks to me like
> > there's no specialized parser for miRNA sequence per se, I'll be
> > corrected if I'm wrong.
> > 
> > Brian O.
> > 
> > 
> > On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com"
> > wrote:
> >
> > > Hi --
> > >
> > >         Are there any classes for manipulating miRNA's with functions
> > > such as parsing the name, storing and interlinking pri/pre/mat
> > > sequences, etc?
> > > 
> > > Thanks,
> > > 
> > > Barry



From allenday at ucla.edu  Fri Feb 10 11:13:39 2006
From: allenday at ucla.edu (Allen Day)
Date: Fri, 10 Feb 2006 08:13:39 -0800 (PST)
Subject: [Bioperl-l] Medline to XML
In-Reply-To: <43ECA303.8090904@guest.arnes.si>
References: <43ECA303.8090904@guest.arnes.si>
Message-ID: 

why not just retrieve xml directly from the eutils service?
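
A minimal sketch of what that could look like, assuming the public NCBI E-utilities efetch endpoint and its db, id and retmode parameters. The helper name efetch_pubmed_xml_url is hypothetical, and the block only builds the URL; the actual download is left as a comment so nothing here touches the network.

```perl
use strict;
use warnings;

# Assumed endpoint: NCBI E-utilities efetch service.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Hypothetical helper: build an efetch URL asking for PubMed records as XML.
sub efetch_pubmed_xml_url {
    my (@pmids) = @_;
    die "need at least one PMID\n" unless @pmids;
    die "PMIDs must be numeric\n" if grep { !/^\d+$/ } @pmids;
    return sprintf '%s?db=pubmed&retmode=xml&id=%s',
        $EFETCH, join( ',', @pmids );
}

my $url = efetch_pubmed_xml_url( 11111111, 22222222 );    # placeholder IDs
print "$url\n";

# To actually fetch, something along these lines should work with LWP:
#   use LWP::UserAgent;
#   my $xml = LWP::UserAgent->new->get($url)->decoded_content;
```

The PMIDs above are placeholders; substitute the identifiers of the records you want.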

-allen

On Fri, 10 Feb 2006, Andrej Kastrin wrote:

> Dear users,
> 
> my problem is not directly related to this list, but I hope you can help
> me. Is there any tool to convert a standard Medline record to XML format?
> I know there is a built-in function (med2xml) in Pubmed, but I'm looking
> for some independent perl script.
>
> Thanks in advance for any suggestions or pointers.
> 
> Cheers, Andrej
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From jason.stajich at duke.edu  Fri Feb 10 12:15:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 10 Feb 2006 12:15:17 -0500
Subject: [Bioperl-l] Remote BLAST support discussion
In-Reply-To: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>
References: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>
Message-ID: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu>

Paul -

The reason for suggesting a change has to do with the instability of
the CGI interface/format of the returned data: the text format returned
by the webserver is not stable, and reportedly it will cease to be
reliably parsable.  Yes we can keep hacking the blast parser code to
handle this, but the bioperl release cycle is certainly not tied to
the NCBI blast release cycle so I find it unsatisfying to know that
we are going to have broken code when they change the output formats
(but not know when).

Mostly I think we need to try and support something that will
"ALWAYS" work, so that individuals can set up webservices which rely
on remote blast functionality.  In theory, netblast/blastcl3 should
always work since NCBI has to update the exe when they change their
server setup.

In terms of the web-based queues - I think the best change we can  
make is have the XML be the preferred retrieval method.

I also see value in providing a wrapper for netblast since it should  
look an awful lot like running blast locally.

Ideally I'd like to see a more extensible system, something like (and  
please feel free to come up with better names for the modules!):

Bio::Tools::Run::Blast
  -->             StandAlone (support for both WU-BLAST and NCBI-BLAST
                  local binaries and MPI-BLAST too if simple)
  -->             RemoteNCBI (currently the RemoteBlast server)
  -->             RemoteEBISOAP (EBI has a nice SOAP interface that
                  works quite well, but may not provide all the same
                  databases as what people expect from NCBI)
  -->             RemoteNetBlast (blastcl3 or netblast local executable)
  (other things that people want)
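
One way to picture the layout listed above is a small factory that maps a
backend name onto a class and hands back an object with a common run()
method. This is only a minimal sketch: the My::Blast::* package names, the
-backend argument and the run() method are all made up for illustration and
are not existing BioPerl API.

```perl
use strict;
use warnings;

# Hypothetical base class: backends share new() and must provide run().
package My::Blast::Backend;
sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}
sub run { die ref(shift) . " must implement run()\n" }

# Hypothetical backends, one per arrow in the list above.
package My::Blast::StandAlone;
our @ISA = ('My::Blast::Backend');
sub run { my ($self, $query) = @_; return "local blast of $query" }

package My::Blast::RemoteNCBI;
our @ISA = ('My::Blast::Backend');
sub run { my ($self, $query) = @_; return "NCBI web queue blast of $query" }

package My::Blast::RemoteNetBlast;
our @ISA = ('My::Blast::Backend');
sub run { my ($self, $query) = @_; return "blastcl3 blast of $query" }

# Factory: the only place that knows which backends exist.
package My::Blast;
my %BACKEND = (
    standalone => 'My::Blast::StandAlone',
    remotencbi => 'My::Blast::RemoteNCBI',
    netblast   => 'My::Blast::RemoteNetBlast',
);
sub new {
    my ($class, %args) = @_;
    my $name    = lc(delete $args{-backend} // 'standalone');
    my $backend = $BACKEND{$name}
        or die "Unknown BLAST backend '$name'\n";
    return $backend->new(%args);
}

package main;
my $blast = My::Blast->new(-backend => 'netblast');
print $blast->run('ROA1_HUMAN'), "\n";
```

Calling code would only ever see run(), so adding something like an EBI
SOAP backend later would mean one new class and one new entry in the table.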

[note: Whether or not these ideas are appealing, someone should archive
the discussions and decisions on the wiki page so we can rely less
on people searching the mailing archives for how a decision was
made.  Perhaps Roger can do this sort of editing in addition to the
planning for support of this module].

-jason

On Feb 7, 2006, at 8:38 PM, Paul Boutros wrote:

> Hi Roger,
>
> I would definitely prefer a fully Perl-based implementation.  For  
> starters, I have not
> been successful in compiling the Toolkit that contains netblast for  
> some platforms (e.g.
> AIX 5.2 w/gcc 4.0).
>
> I haven't been following the discussion: is there some compelling  
> reason to prefer a
> netblast-based system that's come up recently?  I'm guessing that  
> adding a new non-perl
> dependency would only be done if there was considerable  
> justification for this type of
> change, but I'm not clear from your message what that justification  
> is.
>
> Paul
>
>
>
> ------------------------------
>
> Message: 12
> Date: Mon, 6 Feb 2006 20:46:44 -0600
> From: "Roger Hall" 
> Subject: [Bioperl-l] RemoteBlast users - potentially major changes -
>         please        reply
> To: 
> Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL>
> Content-Type: text/plain;        charset="us-ascii"
>
> To everyone who uses RemoteBlast.pm:
>
> Would anyone object to RemoteBlast being rewritten in a way that  
> requires
> NCBI's blastcl3 executable?
>
> Binary downloads of blastcl3 (column "netblast") are available for  
> numerous
> platforms at: http://ncbi.nih.gov/BLAST/download.shtml
>
> Does anyone require or desire a "pure perl" implementation? If so,  
> please
> explain the advantage you see with such an implementation.
>
> Thanks!
>
>
> Roger Hall
>
> Technical Director
>
> MidSouth Bioinformatics Center
>
> University of Arkansas at Little Rock
>
> (501) 569-8074
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From hubert.prielinger at gmx.at  Fri Feb 10 11:26:47 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 10 Feb 2006 10:26:47 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
	parsing	blast	output
In-Reply-To: <43EC606A.20003@esat.kuleuven.be>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>	<43EBC03E.4040900@gmx.at>	
	<43EC5497.3050505@esat.kuleuven.be>
	<43EC606A.20003@esat.kuleuven.be>
Message-ID: <43ECBEC7.7040506@gmx.at>

Hi,
I'm sorry to bother you once more. Yesterday the script was working;
today it isn't working at all, although I didn't change anything. I get
the following error message:

------------- EXCEPTION  -------------
MSG: Could not open comp80swiss2114.txt: No such file or directory
STACK Bio::Root::IO::_initialize_io /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167
STACK toplevel ./Blast.pl:14

--------------------------------------

The file exists, and I applied the bug fix yesterday.
Thanks for the help.

Hubert
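
One possible cause of a "No such file or directory" for a file that does
exist is that the relative name is resolved against a different working
directory than expected. A minimal pre-flight check using only core Perl
might look like the sketch below; the helper name check_report is made up
for illustration, and the file name is the one from the error above.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

# Hypothetical helper: returns an absolute path if the report is usable,
# otherwise dies and names the directory the relative name was resolved in.
sub check_report {
    my ($file) = @_;
    my $abs = File::Spec->rel2abs($file);
    die "Cannot find '$file' (looked in " . getcwd() . ")\n" unless -e $abs;
    die "'$abs' is not a plain file\n" unless -f $abs;
    die "'$abs' is not readable\n"     unless -r $abs;
    return $abs;
}

my $report = eval { check_report('comp80swiss2114.txt') };
print defined $report ? "OK: $report\n" : "Problem: $@";
```

The absolute path returned here could then be passed as the -file argument
to Bio::SearchIO->new, which takes the working directory out of the picture.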




Pieter Monsieurs wrote:

> Sorry for disturbing. It now works correctly with Chris's bug fix.
> Thanks,
> Pieter
>
> Pieter Monsieurs wrote:
>
>>Hi Chris,
>>
>>The parsing of the Blast output still doesn't work for me with the bug 
>>fix download of blast.pm.
>>The module keeps turning around in the while loop at line 487 looking 
>>for a database or query-size:
>>
>>while( defined ($_) ) {
>>	if( /^Database:/ ) {
>>		$self->_pushback($_);
>>		last;
>>	}
>>	chomp;               
>>	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
>>		$size = $1;
>>		$size =~ s/,//g;
>>		last;
>>	} else {
>>		$q .= " $_";
>>		$q =~ s/ +/ /g;
>>		$q =~ s/^ | $//g;
>>	}
>>	$_ = $self->_readline;
>>}
>>
>>
>>The code keeps looking for the database information, however - as you 
>>mentioned - this information is given before the query line in the new 
>>Blast output format.
>>This way, all hits and hsps are stored in the query_description 
>>($hit->query_description), no hits are found and query_length is 0.
>>Because you already adapted the module to retrieve database information 
>>at another position in the module, deleting the while loop and adding 
>>the following lines after $_ = $self->_readline (line 486), worked fine 
>>for me (using blastn and blastp):
>>
>>if (/Length=([\d,]+)/) {
>>	$size = $1;
>>	$size =~ s/,//g;
>>}
>>
>>
>>Regards,
>>Pieter
>>
>>
>>
>>Chris Fields wrote:
>>
>>>From 'perldoc Bio::SearchIO::blast':
>>>
>>>DESCRIPTION
>>>       This object encapsulated the necessary methods for generating events
>>>       suitable for building Bio::Search objects from a BLAST report file.
>>>       Read the Bio::SearchIO for more information about how to use this.
>>>
>>>       This driver can parse:
>>>
>>>       o   NCBI produced plain text BLAST reports from blastall, this also
>>>           includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.  NCBI
>>>           XML BLAST output is parsed with the blastxml SearchIO driver
>>>
>>>       o   WU-BLAST all reports
>>>
>>>       o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
>>>
>>>       o   BLAST-like output from Paracel BTK output
>>>
>>>So, it should.  Let us know if it doesn't.
>>>
>>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>>>
>>>>Hi Chris,
>>>>I'm incredibly sorry for causing so much inconvenience, yes you are  
>>>>right, I had only to change the blast.pm file, it is working very  
>>>>fine, thank you very much, and you are right, you have mentioned it  
>>>>earlier either to change the file... ;)
>>>>
>>>>but I have another question: does it work with the WU-Blast output  
>>>>too?
>>>>regards
>>>>Hubert
>>>>
>>>>
>>>>Chris Fields wrote:
>>>>
>>>>>Ha!  I come back from meeting and there's a billion emails!  What  
>>>>>have we
>>>>>started? ;p .  Sorry about this Jason; I know you're busy.
>>>>>
>>>>>Hubert, if you're out there, I sent you an email with an  
>>>>>attachment.  You
>>>>>said the output looks like what you were expecting.  So I think we  
>>>>>have two
>>>>>problems:
>>>>>
>>>>>1)  I haven't delved into the file scanning, but the fact that it  
>>>>>takes so
>>>>>long should tell you something's seriously wrong there.  Strip  
>>>>>that part out
>>>>>and start with a simple script, say, like the one Jason or that I  
>>>>>sent you;
>>>>>the script I used to generate that output works fine (on two OS's,  
>>>>>WinXP and
>>>>>Mac OS X).  Use it on one file at a time.  Do everything on  
>>>>>command line
>>>>>(not through Eclipse).  IDE's can be notoriously flaky about running
>>>>>scripts, esp. when they run debugging.
>>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast  
>>>>>will still
>>>>>not work whenever the text blast output has the following header,  
>>>>>which
>>>>>comes from the new web version of BLAST:
>>>>>
>>>>>-----------------------------------------------------
>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>>>
>>>>>RID: 1139501210-857-165793005128.BLASTQ1
>>>>>
>>>>>
>>>>>Database: All non-redundant GenBank CDS
>>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>>         3,292,813 sequences; 1,128,164,434 total letters
>>>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>>tuberculosis H37Rv].
>>>>>Length=193
>>>>>.......
>>>>>-----------------------------------------------------
>>>>>
>>>>>It will work if the text output has the following header (or is an  
>>>>>older
>>>>>version of BLAST):
>>>>>
>>>>>-----------------------------------------------------
>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>
>>>>>
>>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>>protein database search
>>>>>programs",  Nucleic Acids Res. 25:3389-3402.
>>>>>
>>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>>tuberculosis H37Rv].
>>>>>       (193 letters)
>>>>>
>>>>>Database: All non-redundant GenBank CDS
>>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>>         2,895,325 sequences; 997,103,285 total letters
>>>>>-----------------------------------------------------
>>>>>You have the former (2.2.13) version.  I know b/c I have your  
>>>>>BLAST files.
>>>>>Therefore, even bioperl-1.5.1 will not work!
>>>>>
>>>>>If you want the really gory details on why this is a problem, look  
>>>>>here:
>>>>>
>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>
>>>>>So, any text output with the above header will not work; it will  
>>>>>either hang
>>>>>or end abruptly (depending on OS, perl version, memory,  
>>>>>patience).  If you
>>>>>look in the above, I have added a preliminary fix for this.  I'll  
>>>>>reiterate
>>>>>for the billionth time, it hasn't been committed yet, so don't
>>>>>kill me if it blows your computer up ;>
>>>>>Here's the direct link:
>>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>>>>This is a modified version of Bio::SearchIO::blast.pm (it says  
>>>>>it's version
>>>>>1.90, but it's lying, I didn't change the version, only the regex;  
>>>>>sorry
>>>>>Jason).  From what you've been posting it doesn't sound like  
>>>>>you've tried
>>>>>this, and I believe I've suggested this fix before.
>>>>>
>>>>>Replace the one in your Bio/SearchIO directory (which looks like
>>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your  
>>>>>prev.
>>>>>message) with this file.  Make sure the filename stays the same  
>>>>>(blast.pm).
>>>>>
>>>>>Run everything again, one file at a time.  Make sure you use  
>>>>>Jason's script
>>>>>as well as the one I sent you.  Do NOT rely on running through  
>>>>>multiple
>>>>>files yet.  Fix one bug at a time.  And heed Joel's words about  
>>>>>file checks.
>>>>>
>>>>>
>>>>>Here's a small chunk of output from one of your blast files using the
>>>>>modified script I sent you:
>>>>>
>>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>>>>>Query:   1  RWKWKRKK  8
>>>>>Seq:     542  RWAWRRKK  549
>>>>>
>>>>>Look familiar?
>>>>>
>>>>>Christopher Fields
>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>Dept. of Biochemistry
>>>>>University of Illinois Urbana-Champaign
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday,  
>>>>>>February 09, 2006 3:24 PM
>>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>parsing Blast output
>>>>>>
>>>>>>In other words, yes, I'm on the wrong trail. :}
>>>>>>
>>>>>>Sorry - I'll look at the output issue this evening (or realize  
>>>>>>that Chris already solved the issue).  ;}
>>>>>>
>>>>>>Thanks!
>>>>>>
>>>>>>Roger
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert  
>>>>>>Prielinger
>>>>>>Sent: Thursday, February 09, 2006 2:14 PM
>>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason  
>>>>>>Stajich
>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>parsing Blast output
>>>>>>
>>>>>>dear roger,
>>>>>>this error message I got, when I tried to parse Blast output  
>>>>>>(version
>>>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have  
>>>>>>a lot of Blast output files with version 2.2.13 and for that I  
>>>>>>don't get any error message.....it just doesn't work
>>>>>>
>>>>>>Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>>Roger Hall wrote:
>>>>>>
>>>>>>>Guys - I'm looking at the error message:
>>>>>>>
>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>>This is my line of thought:
>>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm in
>>>>>>>one location only at the point of a. reading three lines b. dropping lines
>>>>>>>with spaces only c. identifying the Query, Midline, and Match lines (0
>>>>>>><= $i < 3)
>>>>>>>2. There is a regexp match that fails in order to reach that error message
>>>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>>>>>4. It does anyway
>>>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>>>>>>reports
>>>>>>>
>>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional
>>>>>>>file, assuming that I didn't have it.
>>>>>>>
>>>>>>>My next thought is to write a quick script to test perl behavior
>>>>>>>on "Fedora Core 9".
>>>>>>>
>>>>>>>Thoughts?
>>>>>>>
>>>>>>>Did I misread the issue entirely? :}
>>>>>>>
>>>>>>>Roger
>>>>>>>
>>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>>>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>>>>>Cc: bioperl-l at bioperl.org
>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>>>>>To: Hubert Prielinger
>>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>>>parsing Blast output
>>>>>>>>
>>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>>>>
>>>>>>>>>hi chris,
>>>>>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still working,
>>>>>>>>>do you have any ohter idea, the problem I have is that I have to parse
>>>>>>>>>a lot of textfiles....
>>>>>>>>>or shall I look for another option to parse those files...
>>>>>>>>>
>>>>>>>>>regards
>>>>>>>>>Hubert
>>>>>>>>>
>>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>>>>>2.2.13 reports but unless you post your blast report we can't really
>>>>>>>>determine the problem.
>>>>>>>>
>>>>>>>>If you are still getting the same error like this I am not convinced
>>>>>>>>you have upgraded to 1.5.1 which includes a fix in the fact that NCBI
>>>>>>>>changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>>>>>prefixes.  We fixed this as soon as it was apparent sometime in
>>>>>>>>September.
>>>>>>>>
>>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>>STACK toplevel
>>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>
>>>>>>>>If you are just getting no results but also no warnings wrt parsing,
>>>>>>>>are you sure your logic is correct?
>>>>>>>>
>>>>>>>>If you remove your filters do you see all the HSPS?
>>>>>>>>
>>>>>>>>
>>>>>>>>while (my $result = $search->next_result) {
>>>>>>>>  print $result->query_name, "\n";
>>>>>>>>  #iterate over each hit on the query sequence
>>>>>>>>  while (my $hit = $result->next_hit) {
>>>>>>>>    print $hit->name, "\n";
>>>>>>>>    #iterate over each HSP in the hit
>>>>>>>>    while (my $hsp = $hit->next_hsp) {
>>>>>>>>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n";
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>>}
>>>>>>>>
>>>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a
>>>>>>>similar script to the above.  I removed the file parsing logic and it
>>>>>>>seemed to work just fine.  It may very well be a logic issue or that he
>>>>>>>hasn't installed the latest fix.
>>>>>>>
>>>>>>>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>>>>>>even though the returned output was from nr, the top of the
>>>>>>>blast output showed that it was v2.2.12:
>>>>>>>
>>>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>>>
>>>>>>>I double-checked my local version and it's definitely v.2.2.13:
>>>>>>>-------------------------------------
>>>>>>>C:\Perl\Scripts>blastcl3 -
>>>>>>>
>>>>>>>blastcl3 2.2.13   arguments:...
>>>>>>>-------------------------------------
>>>>>>>
>>>>>>>If you use RemoteBlast using the same settings, the version in
>>>>>>>the header looks like this:
>>>>>>>
>>>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>>>
>>>>>>>I'm wondering if all the blast executables (blast and netblast)
>>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast outputs a
>>>>>>>new format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>>>>>
>>>>>>>>To clarify some stuff -
>>>>>>>>Chris I don't necessarily think the XML is best way forward for BLAST
>>>>>>>>reports generated locally, it isn't as detailed as the Text format and
>>>>>>>>it is what most people expect to be able to scroll through and parse
>>>>>>>>-- it is also harder for the format to change dramatically if you have
>>>>>>>>a static binary on your machine =).  I think for remoteblast the XML
>>>>>>>>format should be the way forward but I expect Bioperl to
>>>>>>>>maintain support of any plain text BLAST report format that
>>>>>>>>people use on a regular basis.
>>>>>>>>
>>>>>>>Does XML lack some specific info that text output has?  Didn't know
>>>>>>>that.  I believe that XML should be default in RemoteBlast since it will
>>>>>>>not break, but I agree with you about text output.  I also agree
>>>>>>>that it will need somebody to maintain it constantly, much like
>>>>>>>RemoteBlast.
>>>>>>>
>>>>>>>>-jason
>>>>>>>>
>>>>>>>>>Chris Fields wrote:
>>>>>>>>>
>>>>>>>>>>My guess is you're running into text parsing problems in
>>>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>>>>>(1.5.1) or
>>>>>>>>>>bioperl-live (CVS), then see the bug below.
>>>>>>>>>>
>>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>>
>>>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>>>>>the last problem (more recent, not related to the first) has
>>>>>>>>>>been fixed but hasn't been committed to bioperl-live yet.
>>>>>>>>>>The fixed SearchIO::blast is available in the link above, but
>>>>>>>>>>realize it hasn't been committed yet and may change.
>>>>>>>>>>
>>>>>>>>>>Christopher Fields
>>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
>>>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>>>
>>>>>>>>>>>-----Original Message-----
>>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>>>>>Prielinger
>>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>>>>>output
>>>>>>>>>>>
>>>>>>>>>>>Hi,
>>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>>>>>
>>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>>STACK toplevel
>>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>>>>
>>>>>>>>>>>is that a bug......
>>>>>>>>>>>
>>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't
>>>>>>>>>>>get anything.....
>>>>>>>>>>>I'm using bioperl 1.4
>>>>>>>>>>>
>>>>>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>>>>>I had installed
>>>>>>>>>>>
>>>>>>>>>>>thanks in advance
>>>>>>>>>>>
>>>>>>>>>>>Hubert
>>>>>>>>>>>
>>>>>>>>>>>_______________________________________________
>>>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>>
>>>>>>>>>_______________________________________________
>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>
>>>>>>>>--
>>>>>>>>Jason Stajich
>>>>>>>>Duke University
>>>>>>>>http://www.duke.edu/~jes12
>>>>>>>>
>>>>>>>Christopher Fields
>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>Dept. of Biochemistry
>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>Bioperl-l mailing list
>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>_______________________________________________
>>>>>>Bioperl-l mailing list
>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>
>>>Christopher Fields
>>>Postdoctoral Researcher
>>>Lab of Dr. Robert Switzer
>>>Dept of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l at lists.open-bio.org
>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> 
>>>
>>>    
>>>
>>
>>
>>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>  
>>
>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm for more 
> information.
>
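
The difference between the two report headers quoted above comes down to
where the query length appears: "(193 letters)" in the 2.2.12 style and
"Length=193" in the 2.2.13 style. The sketch below applies the two patterns
from the quoted blast.pm loop to one line of each kind. It is a standalone
illustration only, not the parser's actual code path, and the helper name
query_length is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the query length out of a single header line,
# accepting both the old "(193 letters)" and the new "Length=193" styles.
sub query_length {
    my ($line) = @_;
    if (   $line =~ /\((\-?[\d,]+)\s+letters.*\)/
        || $line =~ /^Length=(\-?[\d,]+)/ )
    {
        my $size = $1;
        $size =~ s/,//g;    # drop thousands separators, e.g. "1,128" -> "1128"
        return $size;
    }
    return;                 # not a length line
}

for my $line ( '       (193 letters)', 'Length=193', 'Query=  NP_215895' ) {
    my $len = query_length($line);
    printf "%-24s => %s\n", "'$line'", defined $len ? $len : 'no length';
}
```

Running a check like this on the first few lines of a report is a quick way
to tell which header style a given file has before handing it to the parser.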



From cjfields at uiuc.edu  Fri Feb 10 12:45:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 10 Feb 2006 11:45:32 -0600
Subject: [Bioperl-l] Remote BLAST support discussion
In-Reply-To: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu>
Message-ID: <002201c62e69$ca8363d0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Jason Stajich
> Sent: Friday, February 10, 2006 11:15 AM
> To: Paul.Boutros at utoronto.ca
> Cc: BioPerl Mailing List
> Subject: [Bioperl-l] Remote BLAST support discussion
> 
> Paul -
> 
> The reason for suggesting a change has to do with the 
> instability of the CGI interface/format of the returned data, 
> the text format is not a stable format from the webserver 
> which reportedly will cease to be reliably parsed.  Yes we 
> can keep hacking the blast parser code to handle this, but 
> the bioperl release cycle is certainly not tied to the NCBI 
> blast release cycle so I find it unsatisfying to know that we 
> are going to have broken code when they change the output 
> formats (but not know when).
> 
> Mostly I think we need to try and support something that will 
> "ALWAYS" work so that individuals setting up webservices 
> which rely on remote blast functionality.  In theory, 
> netblast/blastcl3 should always work since NCBI has to update 
> the exe when they change their server setup.
> 
> In terms of the web-based queues - I think the best change we 
> can make is have the XML be the preferred retrieval method.
> 
> I also see value in providing a wrapper for netblast since it 
> should look an awful lot like running blast locally.
> 
> Ideally I'd like to see a more extensible system, something 
> like (and please feel free to come up with better names for 
> the modules!):
> 
> Bio::Tools::Run::Blast
>   -->             StandAlone (support for both WU-BLAST and NCBI-BLAST
>                   local binaries and MPI-BLAST too if simple)
>   -->             RemoteNCBI (currently the RemoteBlast server)
>   -->             RemoteEBISOAP (EBI has a nice SOAP interface that works
>                   quite well, but may not provide all the same databases
>                   as what people expect from NCBI)
>   -->             RemoteNetBlast (blastcl3 or netblast local executable)
>   (other things that people want)

Sounds good to me.  I think any wrapper for netblast could most easily be
based on StandAloneBlast; the parameters look pretty much identical, though
it'll probably need a little configuring as a quick text search through
StandAloneBlast didn't show any 'xml' tags.  Roger seemed to agree on this.
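
A netblast wrapper along these lines would mostly be a matter of assembling
a blastall-style argument list and asking for XML output. The sketch below
assumes that the legacy NCBI option letters (-p, -d, -i, -o, and -m 7 for
XML) apply to blastcl3 as they do to blastall; the function name, option
names and file names are hypothetical, and nothing is actually executed.

```perl
use strict;
use warnings;

# Hypothetical helper: build (but do not run) a blastcl3 command line.
sub netblast_command {
    my (%opt) = @_;
    for my $required (qw(program database query)) {
        die "Missing option '$required'\n" unless defined $opt{$required};
    }
    my @cmd = (
        $opt{executable} // 'blastcl3',
        '-p', $opt{program},     # e.g. blastp
        '-d', $opt{database},    # e.g. nr
        '-i', $opt{query},       # FASTA input file
        '-m', 7,                 # XML output (assumed same as blastall)
    );
    push @cmd, '-o', $opt{output} if defined $opt{output};
    return @cmd;
}

my @cmd = netblast_command(
    program  => 'blastp',
    database => 'nr',
    query    => 'roa1.fa',
    output   => 'roa1.xml',
);
print join(' ', @cmd), "\n";
```

Keeping the command as a list means it could later be passed to a list-form
system() call, which avoids shell quoting problems with odd file names.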
 
> [note: If these ideas are appealing or not, someone should 
> archive the discussions and discussions on the wiki page so 
> we can rely less on people searching the mailing archives for 
> how a decision was made.  Perhaps Roger can do this sort of 
> editing in addition to the planning for support of this module].
> 
> -jason
> 
> On Feb 7, 2006, at 8:38 PM, Paul Boutros wrote:
> 
> > Hi Roger,
> >
> > I would definitely prefer a fully Perl-based implementation.  For 
> > starters, I have not been successful in compiling the Toolkit that 
> > contains netblast for some platforms (e.g.
> > AIX 5.2 w/gcc 4.0).
> >
> > I haven't been following the discussion: is there some compelling 
> > reason to prefer a netblast-based system that's come up recently?  I'm 
> > guessing that adding a new non-perl dependency would only be done if 
> > there was considerable justification for this type of change, but I'm 
> > not clear from your message what that justification is.
> >
> > Paul
> >
> >
> >
> > ------------------------------
> >
> > Message: 12
> > Date: Mon, 6 Feb 2006 20:46:44 -0600
> > From: "Roger Hall" 
> > Subject: [Bioperl-l] RemoteBlast users - potentially major changes -
> >         please        reply
> > To: 
> > Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL>
> > Content-Type: text/plain;        charset="us-ascii"
> >
> > To everyone who uses RemoteBlast.pm:
> >
> > Would anyone object to RemoteBlast being rewritten in a way that 
> > requires NCBI's blastcl3 executable?
> >
> > Binary downloads of blastcl3 (column "netblast") are available for 
> > numerous platforms at: http://ncbi.nih.gov/BLAST/download.shtml
> >
> > Does anyone require or desire a "pure perl" implementation? If so, 
> > please explain the advantage you see with such an implementation.
> >
> > Thanks!
> >
> >
> > Roger Hall
> >
> > Technical Director
> >
> > MidSouth Bioinformatics Center
> >
> > University of Arkansas at Little Rock
> >
> > (501) 569-8074
> >
> >
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  



From rahall2 at ualr.edu  Fri Feb 10 12:54:23 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Fri, 10 Feb 2006 11:54:23 -0600
Subject: [Bioperl-l] Remote BLAST support discussion
In-Reply-To: <002201c62e69$ca8363d0$15327e82@pyrimidine>
Message-ID: <002501c62e6b$0686be30$d416a790@LIBERAL>

It seems so obvious now. :}

The only issue I see is likely obvious to those of you who have maintained
this over the years - no backward compatibility, but I can live with that if
y'all can.

I will document this on the wiki as suggested and then build the RemoteNCBI
module described. After that is tested and committed, I will contact Torsten
to see if I can help with the rest.

Thanks!

Roger 





From rahall2 at ualr.edu  Fri Feb 10 13:00:51 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Fri, 10 Feb 2006 12:00:51 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't
	work	parsing	blast	output
In-Reply-To: <43ECBEC7.7040506@gmx.at>
Message-ID: <002701c62e6b$edd845b0$d416a790@LIBERAL>

Hubert,

I got the same message when I first ran your script. The issue for me was
that "readdir(DIR)" doesn't return the full path, only the file name.

I edited your script to include:

	$file = $directory . '/' . $file;

just before the Bio::SearchIO call.
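
As a rough illustration of the point above (readdir returns bare file names,
so each one has to be joined to its directory before it is opened), here is a
small core-Perl sketch. The sub name report_paths is made up for this example
and is not taken from Hubert's script; File::Spec->catfile is used instead of
string concatenation, which amounts to the same thing on Unix.

```perl
use strict;
use warnings;
use File::Spec;

# Return full paths of the plain files in $directory. readdir() itself
# yields names only (e.g. "comp80swiss2114.txt"), never paths.
sub report_paths {
    my ($directory) = @_;
    opendir( my $dh, $directory ) or die "Cannot open $directory: $!";
    my @paths;
    for my $file ( sort readdir $dh ) {
        next if $file =~ /^\.\.?$/;    # skip . and ..
        my $path = File::Spec->catfile( $directory, $file );
        push @paths, $path if -f $path;    # keep plain files only
    }
    closedir $dh;
    return @paths;
}

my $directory = @ARGV ? $ARGV[0] : '.';
print "$_\n" for report_paths($directory);
```

Each returned path can then be handed to Bio::SearchIO as its -file argument,
whatever the current working directory happens to be.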

Roger


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
Sent: Friday, February 10, 2006 10:27 AM
To: Pieter Monsieurs; bioperl-l at bioperl.org; Chris Fields; rahall2 at ualr.edu
Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast
output

Hi,
I'm sorry for disturbing you once more. Yesterday the script was working; 
today it isn't working at all, although I didn't change anything. I get the 
following error message:

------------- EXCEPTION  -------------
MSG: Could not open comp80swiss2114.txt: No such file or directory
STACK Bio::Root::IO::_initialize_io 
/usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167
STACK toplevel ./Blast.pl:14

--------------------------------------

The file exists, and the bug I fixed yesterday.
Thanks for the help.

Hubert




Pieter Monsieurs wrote:

> Sorry for disturbing. It now works correctly with the bug fix of Chris. 
> Thanx,
> Pieter
>
> Pieter Monsieurs wrote:
>
>>Hi Chris,
>>
>>The parsing of the Blast output still doesn't work for me with the bug 
>>fix download of blast.pm.
>>The module keeps turning around in the while loop at line 487 looking 
>>for a database or query-size:
>>
>>while( defined ($_) ) {
>>	if( /^Database:/ ) {
>>		$self->_pushback($_);
>>		last;
>>	}
>>	chomp;               
>>	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
>>		$size = $1;
>>		$size =~ s/,//g;
>>		last;
>>	} else {
>>		$q .= " $_";
>>		$q =~ s/ +/ /g;
>>		$q =~ s/^ | $//g;
>>	}
>>	$_ = $self->_readline;
>>}
>>
>>
>>The code keeps looking for the database information, however - as you 
>>mentioned - this information is given before the query line in the new 
>>Blast output format.
>>This way, all hits and hsps are stored in the query_description 
>>($hit->query_description), no hits are found and query_length is 0.
>>Because you already adapted the module to retrieve database information 
>>at another position in the module, deleting the while loop and adding 
>>the following lines after $_ = $self->_readline (line 486), worked fine 
>>for me (using blastn and blastp):
>>
>>if (/Length=([\d,]+)/) {
>>	$size = $1;
>>	$size =~ s/,//g;
>>}
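
To make the two length formats concrete, here is a small stand-alone sketch
that extracts the query length from either layout, using patterns modelled on
the ones quoted above. The sub name query_length is hypothetical, and this is
not the code in Bio::SearchIO::blast; it only shows the matching logic in
isolation.

```perl
use strict;
use warnings;

# Pull the query length out of a header line in either layout:
#   older reports:  "       (193 letters)"
#   newer reports:  "Length=193"
# Returns undef when the line carries no length.
sub query_length {
    my ($line) = @_;
    if (   $line =~ /\(([\d,]+)\s+letters.*\)/
        || $line =~ /^Length=([\d,]+)/ )
    {
        ( my $size = $1 ) =~ s/,//g;    # "1,234" becomes "1234"
        return $size;
    }
    return;
}

for my $line ( '       (193 letters)', 'Length=193', 'Query= NP_215895' ) {
    my $len = query_length($line);
    printf "%-24s => %s\n", "'$line'", defined $len ? $len : 'no length';
}
```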
>>
>>
>>Regards,
>>Pieter
>>
>>
>>
>>Chris Fields wrote:
>>
>>  
>>
>>>From 'perldoc Bio::SearchIO::blast':
>>>
>>>DESCRIPTION
>>>       This object encapsulated the necessary methods for generating  
>>>events
>>>       suitable for building Bio::Search objects from a BLAST report  
>>>file.
>>>       Read the Bio::SearchIO for more information about how to use  
>>>this.
>>>
>>>       This driver can parse:
>>>
>>>       o   NCBI produced plain text BLAST reports from blastall,  
>>>this also
>>>           includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq  
>>>reports.  NCBI
>>>           XML BLAST output is parsed with the blastxml SearchIO driver
>>>
>>>       o   WU-BLAST all reports
>>>
>>>       o   Jim Kent's BLAST-like output from his programs (BLASTZ,  
>>>BLAT)
>>>
>>>       o   BLAST-like output from Paracel BTK output
>>>
>>>So, it should.  Let us know if it doesn't.
>>>
>>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>>>
>>> 
>>>
>>>    
>>>
>>>>Hi Chris,
>>>>I'm incredibly sorry for causing so much inconvenience, yes you are  
>>>>right, I had only to change the blast.pm file, it is working very  
>>>>fine, thank you very much, and you are right, you also mentioned  
>>>>earlier that I should change the file... ;)
>>>>
>>>>but I have another question: does it work with the WU-Blast output  
>>>>too?
>>>>regards
>>>>Hubert
>>>>
>>>>
>>>>Chris Fields wrote:
>>>>
>>>>   
>>>>
>>>>      
>>>>
>>>>>Ha!  I come back from meeting and there's a billion emails!  What  
>>>>>have we
>>>>>started? ;p .  Sorry about this Jason; I know you're busy.
>>>>>
>>>>>Hubert, if you're out there, I sent you an email with an  
>>>>>attachment.  You
>>>>>said the output looks like what you were expecting.  So I think we  
>>>>>have two
>>>>>problems:
>>>>>
>>>>>1)  I haven't delved into the file scanning, but the fact that it  
>>>>>takes so
>>>>>long should tell you something's seriously wrong there.  Strip  
>>>>>that part out
>>>>>and start with a simple script, say, like the one Jason or that I  
>>>>>sent you;
>>>>>the script I used to generate that output works fine (on two OS's,  
>>>>>WinXP and
>>>>>Mac OS X).  Use it on one file at a time.  Do everything on  
>>>>>command line
>>>>>(not through Eclipse).  IDE's can be notoriously flaky about running
>>>>>scripts, esp. when they run debugging.
>>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast  
>>>>>will still
>>>>>not work whenever the text blast output has the following header,  
>>>>>which
>>>>>comes from the new web version of BLAST:
>>>>>
>>>>>-----------------------------------------------------
>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>>>
>>>>>RID: 1139501210-857-165793005128.BLASTQ1
>>>>>
>>>>>
>>>>>Database: All non-redundant GenBank CDS
>>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>>         3,292,813 sequences; 1,128,164,434 total letters
>>>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>>tuberculosis H37Rv].
>>>>>Length=193
>>>>>.......
>>>>>-----------------------------------------------------
>>>>>
>>>>>It will work if the text output has the following header (or is an  
>>>>>older
>>>>>version of BLAST):
>>>>>
>>>>>-----------------------------------------------------
>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>
>>>>>
>>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>>protein database search
>>>>>programs",  Nucleic Acids Res. 25:3389-3402.
>>>>>
>>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>>tuberculosis H37Rv].
>>>>>       (193 letters)
>>>>>
>>>>>Database: All non-redundant GenBank CDS
>>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>>         2,895,325 sequences; 997,103,285 total letters
>>>>>-----------------------------------------------------
>>>>>You have the former (2.2.13) version.  I know b/c I have your  
>>>>>BLAST files.
>>>>>Therefore, even bioperl-1.5.1 will not work!
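
One way to tell the two header layouts shown above apart before parsing is to
check which of the "Database:" and "Query=" lines comes first. The sketch
below assumes only that ordering difference; the sub name header_layout and
its return labels are invented for this example, and the sample lines are
abbreviated from the headers quoted above.

```perl
use strict;
use warnings;

# Classify a text BLAST report by which header line appears first.
# 'database-first' is the newer web layout described above as failing;
# 'query-first' is the older layout. The labels are this sketch's own.
sub header_layout {
    my (@lines) = @_;
    for my $line (@lines) {
        return 'database-first' if $line =~ /^Database:/;
        return 'query-first'    if $line =~ /^Query=/;
    }
    return 'unknown';
}

my @web_style = (
    'BLASTP 2.2.13 [Nov-27-2005]',
    'Database: All non-redundant GenBank CDS',
    'Query=  NP_215895 pyrimidine regulatory protein PyrR',
    'Length=193',
);
my @older_style = (
    'BLASTP 2.2.12 [Aug-07-2005]',
    'Query= NP_215895 pyrimidine regulatory protein PyrR',
    '       (193 letters)',
    'Database: All non-redundant GenBank CDS',
);

print 'web style:   ', header_layout(@web_style),   "\n";
print 'older style: ', header_layout(@older_style), "\n";
```

A check like this could be run over a directory of reports to find out which
files need the patched blast.pm before committing to a long parsing run.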
>>>>>
>>>>>If you want the really gory details on why this is a problem, look  
>>>>>here:
>>>>>
>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>
>>>>>So, any text output with the above header will not work; it will  
>>>>>either hang
>>>>>or end abruptly (depending on OS, perl version, memory,  
>>>>>patience).  If you
>>>>>look in the above, I have added a preliminary fix for this.  I'll  
>>>>>reiterate
>>>>>for the billionth time, it hasn't been committed yet, so don't  
>>>>>kill me if
>>>>>blows your computer up ;>
>>>>>Here's the direct link:
>>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>>>>This is a modified version of Bio::SearchIO::blast.pm (it says  
>>>>>it's version
>>>>>1.90, but it's lying, I didn't change the version, only the regex;  
>>>>>sorry
>>>>>Jason).  From what you've been posting it doesn't sound like  
>>>>>you've tried
>>>>>this, and I believe I've suggested this fix before.
>>>>>
>>>>>Replace the one in your Bio/SearchIO directory (which looks like
>>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your  
>>>>>prev.
>>>>>message) with this file.  Make sure the filename stays the same  
>>>>>(blast.pm).
>>>>>
>>>>>Run everything again, one file at a time.  Make sure you use  
>>>>>Jason's script
>>>>>as well as the one I sent you.  Do NOT rely on running through  
>>>>>multiple
>>>>>files yet.  Fix one bug at a time.  And heed Joel's words about  
>>>>>file checks.
>>>>>
>>>>>
>>>>>Here's a small chunk of output from one of your blast files using the
>>>>>modified script I sent you:
>>>>>
>>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>>>>>Query:   1  RWKWKRKK  8
>>>>>Seq:     542  RWAWRRKK  549
>>>>>
>>>>>Look familiar?
>>>>>
>>>>>Christopher Fields
>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>Dept. of Biochemistry
>>>>>University of Illinois Urbana-Champaign
>>>>>
>>>>>     
>>>>>
>>>>>        
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday,  
>>>>>>February 09, 2006 3:24 PM
>>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>parsing Blast output
>>>>>>
>>>>>>In other words, yes, I'm on the wrong trail. :}
>>>>>>
>>>>>>Sorry - I'll look at the output issue this evening (or realize  
>>>>>>that Chris already solved the issue).  ;}
>>>>>>
>>>>>>Thanks!
>>>>>>
>>>>>>Roger
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert  
>>>>>>Prielinger
>>>>>>Sent: Thursday, February 09, 2006 2:14 PM
>>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason  
>>>>>>Stajich
>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>parsing Blast output
>>>>>>
>>>>>>dear roger,
>>>>>>this error message I got, when I tried to parse Blast output  
>>>>>>(version
>>>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have  
>>>>>>a lot of Blast output files with version 2.2.13 and for that I  
>>>>>>don't get any error message.....it just doesn't work
>>>>>>
>>>>>>Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>>Roger Hall wrote:
>>>>>>
>>>>>>
>>>>>>       
>>>>>>
>>>>>>          
>>>>>>
>>>>>>>Guys - I'm looking at the error message:
>>>>>>>
>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ 
>>>>>>>Blast.pl:21
>>>>>>>
>>>>>>>This is my line of thought:
>>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm
>>>>>>>   in one location only at the point of a. reading three lines b.
>>>>>>>   dropping lines with spaces only c. identifying the Query, Midline,
>>>>>>>   and Match lines (0 <= $i < 3)
>>>>>>>2. There is a regexp match that fails in order to reach that error
>>>>>>>   message
>>>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>>>>>4. It does anyway
>>>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the
>>>>>>>   blast reports
>>>>>>>
>>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional
>>>>>>>file, assuming that I didn't have it.
>>>>>>>
>>>>>>>My next thought is to write a quick script to test perl behavior  
>>>>>>>on "Fedora Core 9".
>>>>>>>
>>>>>>>Thoughts?
>>>>>>>
>>>>>>>Did I misread the issue entirely? :}
>>>>>>>
>>>>>>>Roger
>>>>>>>
>>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>>>>         
>>>>>>>
>>>>>>>            
>>>>>>>
>>>>>>Chris Fields
>>>>>>
>>>>>>       
>>>>>>
>>>>>>          
>>>>>>
>>>>>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>>>>>Cc: bioperl-l at bioperl.org
>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>>parsing Blast output
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>            
>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>>>>>To: Hubert Prielinger
>>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>>>parsing Blast output
>>>>>>>>
>>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>>>hi chris,
>>>>>>>>>thanks, I have upgraded to version 1.5.1 but it still isn't
>>>>>>>>>working, do you have any other idea, the problem I have is that I
>>>>>>>>>have to parse a lot of textfiles....
>>>>>>>>>or shall I look for another option to parse those files...
>>>>>>>>>
>>>>>>>>>regards
>>>>>>>>>Hubert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             
>>>>>>>>>
>>>>>>>>>                
>>>>>>>>>
>>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>>>>>2.2.13 reports but unless you post your blast report we
>>>>>>>>can't really determine the problem.
>>>>>>>>
>>>>>>>>If you are still getting the same error like this I am not
>>>>>>>>convinced you have upgraded to 1.5.1 which includes a fix in the
>>>>>>>>fact that NCBI changed the HSP result format to remove the ':'
>>>>>>>>from the Query/Sbjct prefixes.  We fixed this as soon as it was
>>>>>>>>apparent sometime in September.
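
To illustrate the format change described above (the ':' dropped from the
Query/Sbjct prefixes), here is a sketch of a pattern that accepts an alignment
line with or without the colon. It is not the regex used in
Bio::SearchIO::blast; the sub name parse_alignment_line and the hash keys are
assumptions made for the example.

```perl
use strict;
use warnings;

# Split one alignment line into its parts, accepting both "Query: 1 ..."
# (older reports) and "Query  1 ..." (newer reports). Returns a hash
# reference, or nothing when the line is not an alignment line.
sub parse_alignment_line {
    my ($line) = @_;
    # label, optional ':', start coordinate, residues, end coordinate
    if ( $line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/ ) {
        return { label => $1, start => $2, seq => $3, end => $4 };
    }
    return;
}

for my $line ( 'Query  1   WWWKWRW  7', 'Query: 1   WWWKWRW  7' ) {
    my $parts = parse_alignment_line($line);
    print "$line -> ",
        $parts
        ? "$parts->{label} $parts->{start}..$parts->{end} $parts->{seq}"
        : 'not an alignment line', "\n";
}
```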
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>>STACK toplevel
>>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>
>>>>>>>>If you are just getting no results but also no warnings wrt
>>>>>>>>parsing, are you sure your logic is correct?
>>>>>>>>
>>>>>>>>If you remove your filters do you see all the HSPS?
>>>>>>>>
>>>>>>>>
>>>>>>>>while (my $result = $search->next_result) {
>>>>>>>>  print $result->query_name, "\n";
>>>>>>>>  # iterate over each hit on the query sequence
>>>>>>>>  while (my $hit = $result->next_hit) {
>>>>>>>>    print $hit->name, "\n";
>>>>>>>>    # iterate over each HSP in the hit
>>>>>>>>    while (my $hsp = $hit->next_hsp) {
>>>>>>>>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>>>>>            $hsp->hit_string, "\n";
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>>}
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>I tested some of the BLAST results that Hubert sent Roger
>>>>>>>and me with a similar script to the above.  I removed the file
>>>>>>>parsing logic and it seemed to work just fine.  It may very well
>>>>>>>be a logic issue or that he hasn't installed the latest fix.
>>>>>>>
>>>>>>>It's a funny thing, though.  When I tried using blastcl3 (v.
>>>>>>>2.2.13), even though the returned output was from nr, the top of
>>>>>>>the blast output showed that it was v2.2.12:
>>>>>>>
>>>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>>>
>>>>>>>I double-checked my local version and it's definitely v.2.2.13:
>>>>>>>-------------------------------------
>>>>>>>C:\Perl\Scripts>blastcl3 -
>>>>>>>
>>>>>>>blastcl3 2.2.13   arguments:...
>>>>>>>-------------------------------------
>>>>>>>
>>>>>>>If you use RemoteBlast using the same settings, the version in  
>>>>>>>the header looks like this:
>>>>>>>
>>>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>>>
>>>>>>>I'm wondering if all the blast executables (blast and netblast)
>>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast
>>>>>>>outputs a new format (2.2.13).  I'll ask blast-help at NCBI about
>>>>>>>this.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>            
>>>>>>>
>>>>>>>>To clarify some stuff -
>>>>>>>>Chris I don't necessarily think the XML is best way forward
>>>>>>>>for BLAST reports generated locally, it isn't as detailed as the
>>>>>>>>Text format and it is what most people expect to be able to
>>>>>>>>scroll through and parse -- it is also harder for the format to
>>>>>>>>change dramatically if you have a static binary on your machine
>>>>>>>>=).  I think for remoteblast the XML format should be the way
>>>>>>>>forward but I expect Bioperl to maintain support of any plain
>>>>>>>>text BLAST report format that people use on a regular basis.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>Does XML lack some specific info that text output has?
>>>>>>>Didn't know that.  I believe that XML should be default in
>>>>>>>RemoteBlast since it will not break, but I agree with you about
>>>>>>>text output.  I also agree that it will need somebody to maintain
>>>>>>>it constantly, much like RemoteBlast.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>            
>>>>>>>
>>>>>>>>-jason
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>>>Chris Fields wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             
>>>>>>>>>
>>>>>>>>>                
>>>>>>>>>
>>>>>>>>>>My guess is you're running into text parsing problems in  
>>>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>>>>>(1.5.1) or
>>>>>>>>>>bioperl-live (CVS), then see the bug below.
>>>>>>>>>>
>>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>>
>>>>>>>>>>I think the first problem you ran into is solved in
>>>>>>>>>>bioperl 1.5.1, the last problem (more recent, not related to the
>>>>>>>>>>first) has been fixed but hasn't been committed to bioperl-live
>>>>>>>>>>yet.  The fixed SearchIO::blast is available in the link above,
>>>>>>>>>>but realize it hasn't been committed yet and may change.
>>>>>>>>>>
>>>>>>>>>>Christopher Fields
>>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry  
>>>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>               
>>>>>>>>>>
>>>>>>>>>>                  
>>>>>>>>>>
>>>>>>>>>>>-----Original Message-----
>>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>>>>>>>>Hubert Prielinger
>>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>>>>>>>parsing Blast output
>>>>>>>>>>>
>>>>>>>>>>>Hi,
>>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with  
>>>>>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>>>>>
>>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>>STACK toplevel
>>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>>>>>is that a bug......
>>>>>>>>>>>
>>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't  
>>>>>>>>>>>get anything.....
>>>>>>>>>>>I'm using bioperl 1.4
>>>>>>>>>>>
>>>>>>>>>>>Before I installed bioperl 1.4, it worked fine
>>>>>>>>>>>parsing Blast Output (version 2.2.12), but I don't remember which
>>>>>>>>>>>bioperl version I had installed
>>>>>>>>>>>
>>>>>>>>>>>thanks in advance
>>>>>>>>>>>
>>>>>>>>>>>Hubert
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>_______________________________________________
>>>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                 
>>>>>>>>>>>
>>>>>>>>>>>                    
>>>>>>>>>>>
>>>>>>>>>>               
>>>>>>>>>>
>>>>>>>>>>                  
>>>>>>>>>>
>>>>>>>>>_______________________________________________
>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             
>>>>>>>>>
>>>>>>>>>                
>>>>>>>>>
>>>>>>>>--
>>>>>>>>Jason Stajich
>>>>>>>>Duke University
>>>>>>>>http://www.duke.edu/~jes12
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>Christopher Fields
>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>Dept. of Biochemistry
>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>Bioperl-l mailing list
>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>            
>>>>>>>
>>>>>>_______________________________________________
>>>>>>Bioperl-l mailing list
>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>       
>>>>>>
>>>>>>          
>>>>>>
>>>>>     
>>>>>
>>>>>        
>>>>>
>>>Christopher Fields
>>>Postdoctoral Researcher
>>>Lab of Dr. Robert Switzer
>>>Dept of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l at lists.open-bio.org
>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> 
>>>
>>>    
>>>
>>
>>
>>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>  
>>
>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm for more 
> information.
>

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l




From cjfields at uiuc.edu  Fri Feb 10 13:08:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 10 Feb 2006 12:08:37 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't
	work	parsing	blast	output
In-Reply-To: <002701c62e6b$edd845b0$d416a790@LIBERAL>
Message-ID: <002501c62e6d$04158530$15327e82@pyrimidine>

Makes sense.  I didn't see this since I passed the files directly from
command-line.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: Roger Hall [mailto:rahall2 at ualr.edu] 
> Sent: Friday, February 10, 2006 12:01 PM
> To: 'Hubert Prielinger'; 'Pieter Monsieurs'; 
> bioperl-l at bioperl.org; 'Chris Fields'
> Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing blast output
> 
> Hubert,
> 
> I got the same message when I first ran your script. The 
> issue for me was that "readdir(DIR)" doesn't return the full 
> path, only the file name.
> 
> I edited your script to include:
> 
> 	$file = $directory . '/' . $file;
> 
> just before the Bio::SearchIO call.
> 
> Roger
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Hubert Prielinger
> Sent: Friday, February 10, 2006 10:27 AM
> To: Pieter Monsieurs; bioperl-l at bioperl.org; Chris Fields; 
> rahall2 at ualr.edu
> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing blast output
> 
> Hi,
> I'm sorry to bother you once more. Yesterday the script was
> working; today it isn't working at all, although I didn't change
> anything. I get the following error message:
> 
> ------------- EXCEPTION  -------------
> MSG: Could not open comp80swiss2114.txt: No such file or directory
> STACK Bio::Root::IO::_initialize_io /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273
> STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213
> STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135
> STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167
> STACK toplevel ./Blast.pl:14
> 
> --------------------------------------
> 
> The file exists, and I fixed the bug yesterday. Thanks for the help.
> 
> Hubert
> 
> 
> 
> 
> Pieter Monsieurs wrote:
> 
> > Sorry for disturbing. It now works correctly with Chris's bug fix.
> > Thanks,
> > Pieter
> >
> > Pieter Monsieurs wrote:
> >
> >>Hi Chris,
> >>
> >>The parsing of the Blast output still doesn't work for me with the
> >>bug-fix download of blast.pm.
> >>The module keeps spinning in the while loop at line 487, looking
> >>for a database or query size:
> >>
> >>while( defined ($_) ) {
> >>	if( /^Database:/ ) {
> >>		$self->_pushback($_);
> >>		last;
> >>	}
> >>	chomp;               
> >>	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
> >>		$size = $1;
> >>		$size =~ s/,//g;
> >>		last;
> >>	} else {
> >>		$q .= " $_";
> >>		$q =~ s/ +/ /g;
> >>		$q =~ s/^ | $//g;
> >>	}
> >>	$_ = $self->_readline;
> >>}
> >>
> >>
> >>The code keeps looking for the database information; however, as you
> >>mentioned, this information is given before the query line in the new
> >>Blast output format.
> >>As a result, all hits and HSPs are stored in the query description
> >>($hit->query_description), no hits are found and query_length is 0.
> >>Because you already adapted the module to retrieve database
> >>information at another position in the module, deleting the while loop
> >>and adding the following lines after $_ = $self->_readline (line 486)
> >>worked fine for me (using blastn and blastp):
> >>
> >>if (/Length=([\d,]+)/) {
> >>	$size = $1;
> >>	$size =~ s/,//g;
> >>}
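
A minimal standalone sketch of the length extraction discussed here, covering both header styles: the older "(193 letters)" line and the newer "Length=193" line. The sub name query_length is hypothetical and this is not the code in blast.pm; it only exercises the two regular expressions from the loop quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the query length found on one header line,
# or undef when the line carries no length.
sub query_length {
    my ($line) = @_;
    if (   $line =~ /\((\-?[\d,]+)\s+letters.*\)/
        || $line =~ /^Length=(\-?[\d,]+)/ )
    {
        my $size = $1;
        $size =~ s/,//g;    # strip thousands separators, e.g. "1,128" -> "1128"
        return $size;
    }
    return;
}

for my $line ( '       (193 letters)', 'Length=193', 'Length=1,128', 'Query= NP_215895' ) {
    my $size = query_length($line);
    printf "%-22s => %s\n", $line, defined $size ? $size : 'no length';
}
```

Keeping both patterns means the same check would accept reports written in either header layout.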
> >>
> >>
> >>Regards,
> >>Pieter
> >>
> >>
> >>
> >>Chris Fields wrote:
> >>
> >>  
> >>
> >>>From 'perldoc Bio::SearchIO::blast':
> >>>
> >>>DESCRIPTION
> >>>       This object encapsulated the necessary methods for generating events
> >>>       suitable for building Bio::Search objects from a BLAST report file.
> >>>       Read the Bio::SearchIO for more information about how to use this.
> >>>
> >>>       This driver can parse:
> >>>
> >>>       o   NCBI produced plain text BLAST reports from blastall, this also
> >>>           includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.  NCBI
> >>>           XML BLAST output is parsed with the blastxml SearchIO driver
> >>>
> >>>       o   WU-BLAST all reports
> >>>
> >>>       o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
> >>>
> >>>       o   BLAST-like output from Paracel BTK output
> >>>
> >>>So, it should.  Let us know if it doesn't.
> >>>
> >>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
> >>>
> >>>>Hi Chris,
> >>>>I'm incredibly sorry for causing so much inconvenience. Yes, you are
> >>>>right, I only had to change the blast.pm file; it is working very
> >>>>well now, thank you very much. And you are right, you mentioned
> >>>>earlier that I should change the file... ;)
> >>>>
> >>>>But I have another question: does it work with the WU-Blast output
> >>>>too?
> >>>>regards
> >>>>Hubert
> >>>>
> >>>>
> >>>>Chris Fields wrote:
> >>>>
> >>>>>Ha!  I come back from a meeting and there's a billion emails!  What
> >>>>>have we started? ;p  Sorry about this, Jason; I know you're busy.
> >>>>>
> >>>>>Hubert, if you're out there, I sent you an email with an
> >>>>>attachment.  You said the output looks like what you were
> >>>>>expecting.  So I think we have two problems:
> >>>>>
> >>>>>1)  I haven't delved into the file scanning, but the fact that it
> >>>>>takes so long should tell you something's seriously wrong there.
> >>>>>Strip that part out and start with a simple script, say, like the
> >>>>>one that Jason or I sent you; the script I used to generate that
> >>>>>output works fine (on two OSes, WinXP and Mac OS X).  Use it on one
> >>>>>file at a time.  Do everything on the command line (not through
> >>>>>Eclipse).  IDEs can be notoriously flaky about running scripts,
> >>>>>esp. when they run debugging.
> >>>>>
> >>>>>2)  Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast
> >>>>>will still not work whenever the text blast output has the
> >>>>>following header, which comes from the new web version of BLAST:
> >>>>>
> >>>>>-----------------------------------------------------
> >>>>>BLASTP 2.2.13 [Nov-27-2005]
> >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
> >>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
> >>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
> >>>>>
> >>>>>RID: 1139501210-857-165793005128.BLASTQ1
> >>>>>
> >>>>>
> >>>>>Database: All non-redundant GenBank CDS
> >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> >>>>>         3,292,813 sequences; 1,128,164,434 total letters
> >>>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
> >>>>>tuberculosis H37Rv].
> >>>>>Length=193
> >>>>>.......
> >>>>>-----------------------------------------------------
> >>>>>
> >>>>>It will work if the text output has the following header (or is an
> >>>>>older version of BLAST):
> >>>>>
> >>>>>-----------------------------------------------------
> >>>>>BLASTP 2.2.12 [Aug-07-2005]
> >>>>>
> >>>>>
> >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
> >>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
> >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of 
> >>>>>protein database search programs",  Nucleic Acids Res. 
> >>>>>25:3389-3402.
> >>>>>
> >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
> >>>>>tuberculosis H37Rv].
> >>>>>       (193 letters)
> >>>>>
> >>>>>Database: All non-redundant GenBank CDS
> >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> >>>>>         2,895,325 sequences; 997,103,285 total letters
> >>>>>-----------------------------------------------------
> >>>>>You have the former (2.2.13) version.  I know b/c I have your BLAST
> >>>>>files.
> >>>>>Therefore, even bioperl-1.5.1 will not work!
> >>>>>
> >>>>>If you want the really gory details on why this is a problem, look
> >>>>>here:
> >>>>>
> >>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>>
> >>>>>So, any text output with the above header will not work; it will
> >>>>>either hang or end abruptly (depending on OS, perl version, memory,
> >>>>>patience).  If you look in the above, I have added a preliminary
> >>>>>fix for this.  I'll reiterate for the billionth time, it hasn't
> >>>>>been committed yet, so don't kill me if it blows your computer up ;>
> >>>>>Here's the direct link:
> >>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
> >>>>>This is a modified version of Bio::SearchIO::blast.pm (it says it's
> >>>>>version 1.90, but it's lying, I didn't change the version, only the
> >>>>>regex; sorry Jason).  From what you've been posting it doesn't
> >>>>>sound like you've tried this, and I believe I've suggested this fix
> >>>>>before.
> >>>>>
> >>>>>Replace the one in your Bio/SearchIO directory (which looks like
> >>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your
> >>>>>prev. message) with this file.  Make sure the filename stays the same
> >>>>>(blast.pm).
> >>>>>
> >>>>>Run everything again, one file at a time.  Make sure you use 
> >>>>>Jason's script as well as the one I sent you.  Do NOT rely on 
> >>>>>running through multiple files yet.  Fix one bug at a time.  And 
> >>>>>heed Joel's words about file checks.
> >>>>>
> >>>>>
> >>>>>Here's a small chunk of output from one of your blast files using
> >>>>>the modified script I sent you:
> >>>>>
> >>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
> >>>>>Query:   1  RWKWKRKK  8
> >>>>>Seq:     542  RWAWRRKK  549
> >>>>>
> >>>>>Look familiar?
> >>>>>
> >>>>>Christopher Fields
> >>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
> >>>>>University of Illinois Urbana-Champaign
> >>>>>
> >>>>>>-----Original Message-----
> >>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu]
> >>>>>>Sent: Thursday, February 09, 2006 3:24 PM
> >>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
> >>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
> >>>>>>Blast output
> >>>>>>
> >>>>>>In other words, yes, I'm on the wrong trail. :}
> >>>>>>
> >>>>>>Sorry - I'll look at the output issue this evening (or realize 
> >>>>>>that Chris already solved the issue).  ;}
> >>>>>>
> >>>>>>Thanks!
> >>>>>>
> >>>>>>Roger
> >>>>>>
> >>>>>>-----Original Message-----
> >>>>>>From: bioperl-l-bounces at lists.open-bio.org
> >>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
> >>>>>>Prielinger
> >>>>>>Sent: Thursday, February 09, 2006 2:14 PM
> >>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason
> >>>>>>Stajich
> >>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
> >>>>>>Blast output
> >>>>>>
> >>>>>>Dear Roger,
> >>>>>>I got this error message when I tried to parse Blast output
> >>>>>>(version 2.2.12) with bioperl 1.5.1, but it doesn't matter, because
> >>>>>>I have a lot of Blast output files with version 2.2.13 and for those
> >>>>>>I don't get any error message.....it just doesn't work
> >>>>>>
> >>>>>>Hubert
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>Roger Hall wrote:
> >>>>>>
> >>>>>>>Guys - I'm looking at the error message:
> >>>>>>>
> >>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>>>STACK toplevel
> >>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/
> >>>>>>>Blast.pl:21
> >>>>>>>
> >>>>>>>This is my line of thought:
> >>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm
> >>>>>>>in one location only at the point of a. reading three lines b.
> >>>>>>>dropping lines with spaces only c. identifying the Query, Midline, and
> >>>>>>>Match lines (0 <= $i < 3)
> >>>>>>>2. There is a regexp match that fails in order to reach that
> >>>>>>>error message
> >>>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the
> >>>>>>>expression
> >>>>>>>4. It does anyway
> >>>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere
> >>>>>>>in the blast reports
> >>>>>>>
> >>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
> >>>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional
> >>>>>>>file, assuming that I didn't have it.
> >>>>>>>
> >>>>>>>My next thought is to write a quick script to test perl behavior
> >>>>>>>on "Fedora Core 4".
> >>>>>>>
> >>>>>>>Thoughts?
> >>>>>>>
> >>>>>>>Did I misread the issue entirely? :}
> >>>>>>>
> >>>>>>>Roger
> >>>>>>>
> >>>>>>>
> >>>>>>>-----Original Message-----
> >>>>>>>From: bioperl-l-bounces at lists.open-bio.org
> >>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >>>>>>>Sent: Thursday, February 09, 2006 10:16 AM
> >>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
> >>>>>>>Cc: bioperl-l at bioperl.org
> >>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
> >>>>>>>parsing Blast output
> >>>>>>>
> >>>>>>>>-----Original Message-----
> >>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
> >>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
> >>>>>>>>To: Hubert Prielinger
> >>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
> >>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> >>>>>>>>parsing Blast output
> >>>>>>>>
> >>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> >>>>>>>>
> >>>>>>>>>hi chris,
> >>>>>>>>>thanks, I have upgraded to version 1.5.1 but it still isn't working.
> >>>>>>>>>Do you have any other idea? The problem I have is that I have to parse
> >>>>>>>>>a lot of text files....
> >>>>>>>>>Or shall I look for another option to parse those files...
> >>>>>>>>>
> >>>>>>>>>regards
> >>>>>>>>>Hubert
> >>>>>>>>>
> >>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
> >>>>>>>>2.2.13 reports but unless you post your blast report we can't really
> >>>>>>>>determine the problem.
> >>>>>>>>
> >>>>>>>>If you are still getting the same error like this I am not convinced
> >>>>>>>>you have upgraded to 1.5.1, which includes a fix for the fact that NCBI
> >>>>>>>>changed the HSP result format to remove the ':' from the Query/Sbjct
> >>>>>>>>prefixes.  We fixed this as soon as it was apparent sometime in
> >>>>>>>>September.
> >>>>>>>>
> >>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>>>>>>>STACK toplevel
> >>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/
> >>>>>>>>>>>Blast.pl:21
> >>>>>>>>
> >>>>>>>>If you are just getting no results but also no warnings wrt parsing,
> >>>>>>>>are you sure your logic is correct?
> >>>>>>>>
> >>>>>>>>If you remove your filters do you see all the HSPs?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>while (my $result = $search->next_result) {
> >>>>>>>>  print $result->query_name, "\n";
> >>>>>>>>  # iterate over each hit on the query sequence
> >>>>>>>>  while (my $hit = $result->next_hit) {
> >>>>>>>>    print $hit->name, "\n";
> >>>>>>>>    # iterate over each HSP in the hit
> >>>>>>>>    while (my $hsp = $hit->next_hsp) {
> >>>>>>>>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
> >>>>>>>>            $hsp->hit_string, "\n";
> >>>>>>>>    }
> >>>>>>>>  }
> >>>>>>>>}
> >>>>>>>>
> >>>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a
> >>>>>>>similar script to the above.  I removed the file parsing logic
> >>>>>>>and it seemed to work just fine.  It may very well be a logic issue
> >>>>>>>or that he hasn't installed the latest fix.
> >>>>>>>
> >>>>>>>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
> >>>>>>>even though the returned output was from nr, the top of the blast
> >>>>>>>output showed that it was v2.2.12:
> >>>>>>>
> >>>>>>>BLASTP 2.2.12 [Aug-07-2005]
> >>>>>>>
> >>>>>>>I double-checked my local version and it's definitely v.2.2.13:
> >>>>>>>-------------------------------------
> >>>>>>>C:\Perl\Scripts>blastcl3 -
> >>>>>>>
> >>>>>>>blastcl3 2.2.13   arguments:...
> >>>>>>>-------------------------------------
> >>>>>>>
> >>>>>>>If you use RemoteBlast using the same settings, the version in
> >>>>>>>the header looks like this:
> >>>>>>>
> >>>>>>>BLASTP 2.2.13 [Nov-27-2005]
> >>>>>>>
> >>>>>>>I'm wondering if all the blast executables (blast and netblast)
> >>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast
> >>>>>>>outputs a new format (2.2.13).  I'll ask blast-help at NCBI about this.
> >>>>>>>
> >>>>>>>>To clarify some stuff -
> >>>>>>>>Chris I don't necessarily think the XML is best way forward for BLAST
> >>>>>>>>reports generated locally, it isn't as detailed as the Text format and
> >>>>>>>>it is what most people expect to be able to scroll through and parse
> >>>>>>>>-- it is also harder for the format to change dramatically if you have
> >>>>>>>>a static binary on your machine =).  I think for remoteblast the XML
> >>>>>>>>format should be the way forward but I expect Bioperl to
> >>>>>>>>maintain support of any plain text BLAST report format that
> >>>>>>>>people use on a regular basis.
> >>>>>>>>
> >>>>>>>Does XML lack some specific info that text output has?  Didn't know that.
> >>>>>>>I believe that XML should be default in RemoteBlast since it will
> >>>>>>>not break, but I agree with you about text output.  I also agree
> >>>>>>>that it will need somebody to maintain it constantly, much like
> >>>>>>>RemoteBlast.
> >>>>>>>
> >>>>>>>>-jason
> >>>>>>>>
> >>>>>>>>>Chris Fields wrote:
> >>>>>>>>>
> >>>>>>>>>>My guess is you're running into text parsing problems in
> >>>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
> >>>>>>>>>>(1.5.1) or
> >>>>>>>>>>bioperl-live (CVS), then see the bug below.
> >>>>>>>>>>
> >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>>>>>>>
> >>>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
> >>>>>>>>>>the last problem (more recent, not related to the first) has
> >>>>>>>>>>been fixed but hasn't been committed to bioperl-live yet.
> >>>>>>>>>>The fixed SearchIO::blast is available in the link above, but
> >>>>>>>>>>realize it hasn't
> >>>>>>>>>>been committed yet and may change.
> >>>>>>>>>>
> >>>>>>>>>>Christopher Fields
> >>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
> >>>>>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>>>>
> >>>>>>>>>>>-----Original Message-----
> >>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
> >>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
> >>>>>>>>>>>Prielinger
> >>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
> >>>>>>>>>>>To: bioperl-l at bioperl.org
> >>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
> >>>>>>>>>>>output
> >>>>>>>>>>>
> >>>>>>>>>>>Hi,
> >>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with 
> >>>>>>>>>>>Bio::SearchIO, I get the following error message:
> >>>>>>>>>>>
> >>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>>>>>>>STACK toplevel
> >>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/
> >>>>>>>>>>>Blast.pl:21
> >>>>>>>>>>>
> >>>>>>>>>>>is that a bug......
> >>>>>>>>>>>
> >>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't get
> >>>>>>>>>>>anything.....
> >>>>>>>>>>>I'm using bioperl 1.4
> >>>>>>>>>>>
> >>>>>>>>>>>Before I installed bioperl 1.4, it worked fine parsing Blast
> >>>>>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
> >>>>>>>>>>>I had installed
> >>>>>>>>>>>thanks in advance
> >>>>>>>>>>>
> >>>>>>>>>>>Hubert
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>--
> >>>>>>>>Jason Stajich
> >>>>>>>>Duke University
> >>>>>>>>http://www.duke.edu/~jes12
> >>>>>>>>
> >>>>>>>Christopher Fields
> >>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
> >>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>
> >>>Christopher Fields
> >>>Postdoctoral Researcher
> >>>Lab of Dr. Robert Switzer
> >>>Dept of Biochemistry
> >>>University of Illinois Urbana-Champaign




From victor.ruotti at gmail.com  Fri Feb 10 15:09:16 2006
From: victor.ruotti at gmail.com (Victor)
Date: Fri, 10 Feb 2006 14:09:16 -0600
Subject: [Bioperl-l] Running BLAT with BioPerl
In-Reply-To: 
References: 
	
Message-ID: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com>

Hi Jason,
Well, in my environment BLATDIR was not set up at all. When setting BLATDIR to
/usr/local/bin, I get the same problem. I think this might have to do with
the _run internal method/sub. If you look at that subroutine, you'll see
that it is using both $self->executable and $self->program_name. The test
passes fine, but we might need to write a better test for this particular
case.

Instead of saying:
     my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
I think the author meant to say:
     my $str= Bio::Root::IO->catfile($self->program_dir,$self->program_name);

I quickly used Data::Dumper on both executable and program_name and this is
what I get:
$VAR1 = 'blat';
$VAR1 = 'blat';

So the path is hardcoded to be /usr/local/bin/blat/blat when calling run
through the factory.
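
To make the doubled path concrete, here is a small sketch using the core File::Spec module, on the assumption that Bio::Root::IO->catfile joins path components the same way. The variable names and directory values are illustrative only and are not taken from the module.

```perl
use strict;
use warnings;
use File::Spec;

my $program_name = 'blat';

# If the first component already ends in the program name, the name is doubled.
my $doubled = File::Spec->catfile('/usr/local/bin/blat', $program_name);

# Joining the install directory with the program name gives the intended path.
my $intended = File::Spec->catfile('/usr/local/bin', $program_name);

print "doubled:  $doubled\n";
print "intended: $intended\n";
```

Printing the two strings side by side is a quick way to check which component carries the extra "blat" before changing the module.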

I'd like to change the constructor a bit to deal with the params a little
better and include a config file using Config::General. Also, I noticed
that there is another Blat.pm module, a parser module. Should we integrate
this parser with the blat run module?

Brian/Jason. Does that sound like a good idea?

Victor


On 2/10/06, Jason Stajich  wrote:
>
> brian -   just FYI -
>
> The AUTOLOAD stuff is present in a great number of the run modules so this
> is standard per se in that set.
>
> I think Victor's problem may have been the BLATDIR env variable pointing
> to /usr/local/bin/blat instead of /usr/local/bin - is that the case victor?
>
> Tests passed for me before I did the 1.5.1 release for this module so it
> basically works.   It definitely needs a caretaker as a lot of these run
> modules were built during the fugu group annotation project and never got
> audited/revised after that.
>
>
> -jason
> On Feb 10, 2006, at 11:34 AM, Brian Osborne wrote:
>
> Victor,
>
> Fantastic, this is certainly a module in need, in fact there was already a
> note on this in the Wiki, I'll update it:
>
> http://bioperl.open-bio.org/wiki/Orphan_modules
>
> So all I did was:
>
> >cd bioperl-run
> >perl -I. -w t/Blat.t
>
> This is the most recent bioperl-run, the live version, and all tests
> passed. I'd downloaded the most recent binaries and put them in my
> /usr/local/bin, already in my PATH. That's it.
>
> That's the saddest looking new() I've ever seen in Bioperl, a mixture of
> named and unnamed parameters like that, how bizarre. The "proper" way, of
> course, is to use _rearrange, and not use AUTOLOAD.
>
> Thanks again,
>
> Brian O.
>
>
> On 2/10/06 11:02 AM, "Victor"  wrote:
>
> Brian,
> I'd be happy to do that. Can you send me a quick snap on how you got it to
> work first. I'd like to see what is working first, before I start fixing
> things.
>
> And yes I'll take a look at the Blat.t to see more on it.
>
> Victor
>
>
> On 2/9/06, *Brian Osborne*  wrote:
>
> Victor,
>
> Yes, it may be that blat is not in your path, bioperl-run/t/Blat.t is
> working for me even though I haven't set BLATDIR. This is using the latest
> blat, v. 33.
>
> There is a problem here though, you can see it if you read Blat.t. The
> constructor does not look like your usual new():
>
> my $factory = Bio::Tools::Run::Alignment::Blat->new('quiet'  => 1,
>                             -verbose => $verbose,
>                             "DB"     => $db);
>
> Unfortunate - would you be willing to do more than add a useful SYNOPSIS
> and actually fix new()? There is a subtext here, we're trying to find
> people who would be willing to maintain useful modules like these, the
> ideal person in this case would be someone who'd regularly use the module.
>
> Brian O.
>
>
> On 2/9/06 6:22 PM, "Victor"  wrote:
>
> > Hi,
> > Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module is up to
> > date in the latest bioperl release?
> >
> >
> >
> > use Bio::Tools::Run::Alignment::Blat;
> > my $factory = Bio::Tools::Run::Alignment::Blat->new();
> > my $seq =
> > "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA";
> >
> > my @feats = $factory->run( $seq);
> >
> > Here is what I get when trying to use it:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Blat call (/usr/local/bin/blat/blat -out=blast  TGAAATAAAACTCAGTA
> > /tmp/fB09bp5F76) crashed: -1
> >
> > Notice that it is using "blat" twice in the path. The way that I fixed
> > this is by going to the blat.pm module and changing the following lines:
> > #my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
> > my $str= Bio::Root::IO->catfile($self->program_name);
> >
> > Any ideas, maybe I'm missing the $ENV variable somewhere?
> > I'd like to avoid making this change. Also does anyone have a known
> > synopsis of this blat module (where to set the parameters, and whether
> > it allows you to have a config file).
> > I'll be happy to add a better synopsis to the module if needed.
> >
> > Thanks in advance,
> > Victor
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>
>
>
>
>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12 
>
>
>



From jason.stajich at duke.edu  Fri Feb 10 15:36:04 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 10 Feb 2006 15:36:04 -0500
Subject: [Bioperl-l] Running BLAT with BioPerl
In-Reply-To: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com>
References: 
	
	<36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com>
Message-ID: <7F520AFA-84C9-485B-A408-7A9DEFC1186E@duke.edu>


On Feb 10, 2006, at 3:09 PM, Victor wrote:

> Hi Jason,
> Well, in my env. BLATDIR was not setup at all. When setting BLATDIR  
> to /usr/local/bin, I get the same problem. I think this might have  
> to do with the _run internal method/sub. If you look at that  
> subroutine, you'll see that it is using both $self->executable and  
> $self->program_name. The test passes fine, but we might need to  
> write a better test for this particular case.
>
> Instead of saying:
>      my $str= Bio::Root::IO->catfile($self->executable,
>                                      $self->program_name);
> I think the author meant to say:
>      my $str= Bio::Root::IO->catfile($self->program_dir,
>                                      $self->program_name);
>
> I quickly used Data::Dumper on both executable and program_name and
> this is what I get:
> $VAR1 = 'blat';
> $VAR1 = 'blat';
>
> So the path is hardcoded to be /usr/local/bin/blat/blat when
> calling run through the factory.
>
Hmm are you sure you are looking at the 1.5.1 code and/or what is in  
CVS?

> I'd like to change the constructor a bit to deal with the params a  
> little better and include a config file using
> Config::General. Also, I noticed that there is a another Blat.pm  
> module, a parser module. Should we integrate this parser with the  
> blat run module?
>
Well maybe as another parser option - I believe I added/edited it to  
use the PSL parser in Bio::SearchIO is that not what you see?

Ick, there are also some system commands in this module which need
to be removed and replaced with File::Copy, or we figure out how to
remove them altogether.
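
As a rough illustration of the File::Copy route, here is a minimal sketch
using only core modules. The file names and contents are made up for the
example and are not taken from Blat.pm.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec;
use File::Temp qw(tempdir);

# Scratch directory that is removed when the script exits.
my $dir = tempdir(CLEANUP => 1);
my $src = File::Spec->catfile($dir, 'query.fa');       # hypothetical input
my $dst = File::Spec->catfile($dir, 'query_copy.fa');  # hypothetical target

open my $out, '>', $src or die "Cannot write $src: $!";
print {$out} ">query\nTGAAATAAAACTCAGTA\n";
close $out or die "Cannot close $src: $!";

# Instead of system("cp $src $dst"): no shell is involved, so there are
# no quoting problems, and the return value can be checked directly.
copy($src, $dst) or die "Copy from $src to $dst failed: $!";

print "copied ", -s $dst, " bytes\n";
```

File::Copy also provides move() for the cases where the module currently
shells out to mv.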


> Brian/Jason. Does that sound like a good idea?

But yes, it needs some TLC.
I'm not sure I know enough about Config::General to say yes or no,
but all of the run modules need some help in standardization, so I
would propose trying to integrate some changes into the base class
(WrapperBase) that can be utilized by all the sub-classes. If you
want to use this as a model for how to do it that would be great too.

thx,
-j
>
> Victor
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12





From hlapp at gmx.net  Fri Feb 10 16:39:39 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 10 Feb 2006 13:39:39 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <000001c62e60$9acecca0$c2987ca5@pc13>
References: <000001c62e60$9acecca0$c2987ca5@pc13>
Message-ID: 

Sohel,

please allow me to copy the list in my response. There are many good and
insightful people on the list who may have something to add or
different ideas.

I've come across that problem myself, for instance with InterPro. What 
I've done so far simply is to stick it unstructured into the definition 
slot, which is not helpful if your purpose goes further than just 
displaying it in an unstructured fashion.

I'm not sure you would want to create another class for this (like
AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
implementation, probably not the interface) annotatable (i.e.,
implement Bio::AnnotatableI), which supposedly would be simple to do
(Bio::Annotation::Collection is already implemented, you'd just return an
instance of it).

Even though tag/value pairs sound like a quick and fast way to go, I'm
leaning against it; in essence we're moving away from that elsewhere
(SeqFeatureI) and hence I don't think we should restart it here.

I'm not giving a definitive answer here, just my (initial) thoughts. 
Hope that helps nonetheless. Can you fancy yourself trying the 
Annotatable approach and let us know how it goes?

	-hilmar


On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:

> Hi Hilmar,
> How are you doing? I am Sohel Merchant, a programmer at dictyBase,
> Northwestern University. I am working on a parser for an ontology
> file. I really like the ontology object model which you have
> contributed to Bioperl. I think it's just awesome!! One of the things
> which I thought would be great to capture is the ontology headers. Right
> now one can specify only the name and authority information. I was
> wondering if there is any way I could also capture other ontology file
> headers, like the version of the file and the date when that ontology
> file was made. I was thinking of making a header class, or alternatively
> it could go as a hash of values in the Bio::Ontology::Ontology class
> itself. I wanted to know what your thoughts are on this.
>
> Thanks,
> Sohel Merchant
> dictyBase
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------





From osborne1 at optonline.net  Fri Feb 10 16:49:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 10 Feb 2006 16:49:18 -0500
Subject: [Bioperl-l] Running BLAT with BioPerl
In-Reply-To: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com>
Message-ID: 

Victor,

Just a note on "convention", excuse me if this is obvious. A few different
greps on the modules in bioperl-run shows that executable() gets or sets the
full path to the program in question, program() or program_name() gets or
sets the name of the app (e.g. "blat"). program_dir() does what it sounds
like. So you're right, "($self->executable,$self->program_name)", doesn't
make sense.
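
To make the difference concrete, here is a small sketch. It assumes that
Bio::Root::IO->catfile joins its arguments the same way File::Spec's
catfile does, and the three values are assumptions chosen to match the
convention described above, not values read from a real factory object.

```perl
use strict;
use warnings;
use File::Spec::Unix;   # Unix rules, so the result is the same everywhere

# Assumed values, following the convention described above.
my $executable   = '/usr/local/bin/blat';   # full path to the program
my $program_dir  = '/usr/local/bin';        # directory holding the program
my $program_name = 'blat';                  # bare name of the program

# Joining a full path with the name repeats the last component.
my $wrong = File::Spec::Unix->catfile($executable, $program_name);

# Joining the directory with the name gives the intended path.
my $right = File::Spec::Unix->catfile($program_dir, $program_name);

print "executable  + program_name: $wrong\n";   # /usr/local/bin/blat/blat
print "program_dir + program_name: $right\n";   # /usr/local/bin/blat
```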

I can't speak to Config::General but I'd say that my first concern would be
that the thing works in the normal way, either by naming parameters or by
passing an array of arguments, but not a mixture of both!
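
For what it's worth, a constructor that takes only named parameters could
look roughly like the sketch below. My::BlatFactory and its accessors are
hypothetical stand-ins so that the example runs without bioperl-run; in
the real module the key normalisation would presumably be left to
_rearrange rather than done by hand.

```perl
use strict;
use warnings;

package My::BlatFactory;   # hypothetical class, not the real Blat.pm

sub new {
    my ($class, @args) = @_;
    die "Expected name => value pairs\n" if @args % 2;
    my %raw = @args;

    # Normalise '-db', '-DB' and 'db' to the same lower-case key.
    my %param;
    for my $key (keys %raw) {
        (my $name = lc $key) =~ s/^-//;
        $param{$name} = $raw{$key};
    }

    my $self = {
        db      => $param{db},
        quiet   => $param{quiet}   ? 1 : 0,
        verbose => $param{verbose} ? 1 : 0,
    };
    return bless $self, $class;
}

sub db      { return $_[0]{db} }
sub quiet   { return $_[0]{quiet} }
sub verbose { return $_[0]{verbose} }

package main;

my $factory = My::BlatFactory->new(-db => 'genome.fa', -quiet => 1);
printf "db=%s quiet=%d verbose=%d\n",
    $factory->db, $factory->quiet, $factory->verbose;
```

The point of the sketch is only that every argument has a name, so the
call site reads the same whichever order the parameters come in.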

Of course you're right in thinking that tying execution to parsing is a good
idea, and it looks like this is done already, just glancing at t/Blat.t.

Brian O.


On 2/10/06 3:09 PM, "Victor"  wrote:

> Hi Jason,
> Well, in my env. BLATDIR was not setup at all. When setting BLATDIR to
> /usr/local/bin, I get the same problem. I think this might have to do with
> the _run internal method/sub. If you look at that subroutine, you'll see
> that it is using both $self->executable and $self->program_name. The test
> passes fine, but we might need to write a better test for this particular
> case.
> 
> Instead of saying:
>      my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
> I think the author meant to say:
>      my $str= Bio::Root::IO->catfile($self->program_dir,$self->program_name);
> 
> I quickly used Data::Dumper on both executable and program_name and this is
> what I get:
> $VAR1 = 'blat';
> $VAR1 = 'blat';
> 
> So the path is hardcoded to be /usr/local/bin/blat/blat when calling run
> through the factory.
> 
> I'd like to change the constructor a bit to deal with the params a little
> better and include a config file using
> Config::General. Also, I noticed that there is a another Blat.pm module, a
> parser module. Should we integrate this parser with the blat run module?
> 
> Brian/Jason. Does that sound like a good idea?
> 
> Victor
> 
> 





From heikki at sanbi.ac.za  Sat Feb 11 01:54:51 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Sat, 11 Feb 2006 08:54:51 +0200
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: 
References: <000001c62e60$9acecca0$c2987ca5@pc13>
	
Message-ID: <200602110854.52116.heikki@sanbi.ac.za>


I second Hilmar's suggestion to use Bio::Annotation::Collection for database
(ontology database in this case) metadata. While you are at it, why not
define or use an existing (?) public ontology to do that? ;-)

	-Heikki

On Friday 10 February 2006 23:39, Hilmar Lapp wrote:
> Sohel,
>
> please allow me to copy the list in my response. There's many good and
> insightful people on the list who may have something to add or
> different ideas.
>
> I've come across that problem myself, for instance with InterPro. What
> I've done so far simply is to stick it unstructured into the definition
> slot, which is not helpful if your purpose goes further than just
> displaying it in an unstructured fashion.
>
> I'm not sure you would want to create another class for this (like
> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
> implementation, probably not the interface) annotatable (i.e.,
> implement Bio::Annotatable), which supposedly would be simple to do
> (AnnotationCollection is already implemented, you'd just return an
> instance of it).
>
> Even though tag/value pairs sound like quick&fast way to go I'm leaning
> against it; in essence we're moving away from that elsewhere
> (SeqFeatureI) and hence I don't think we should restart it here.
>
> I'm not giving a definitive answer here, just my (initial) thoughts.
> Hope that helps nonetheless. Can you fancy yourself trying the
> Annotatable approach and let us know how it goes?
>
> 	-hilmar
>
> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
> > Hi Hilmar,
> > How are you doing? I am Sohel Merchant, a programmer at dictyBase,
> > Northwestern University. I am working on a parser for an ontology
> > file. I really like the ontology object model which you have
> > contributed to Bioperl. I think its just Awesome!! One of things which
> > I thought would be great to capture is the ontology headers. Right now
> > one can specify only the name, authority information. I was wondering
> > if there is any way, I could also capture other ontology file headers
> > like version of the file, date when that ontology file was made. I was
> > thinking of making a header class or alternatively it could go as Hash
> > of values in the Bio::Ontology::Ontology class itself. I wanted to
> > know whets your thoughts about on this.
> >
> > Thanks,
> > Sohel Merchant
> > dictyBase

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________



From hlapp at gmx.net  Sun Feb 12 00:10:35 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 11 Feb 2006 21:10:35 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <000001c62e9a$4f82eee0$c2987ca5@pc13>
References: <000001c62e9a$4f82eee0$c2987ca5@pc13>
Message-ID: <3666b00b7322d2bfe4d82129b047e5ce@gmx.net>

Sohel, please do keep the discussion on the list, in your own interest 
as there's a multitude of people who can respond to you.

SimpleValue would probably be what I'd use too. As Heikki hinted you 
might even create an ontology for annotating ontologies, which would 
allow you to use Annotation::OntologyTerm for annotation, but then 
there's no qualifier value ...
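
A minimal sketch of the SimpleValue idea, assuming BioPerl is installed.
The header tags and values are invented for the example, and the
collection is created on its own here rather than being returned by an
annotatable Bio::Ontology::Ontology, since that part does not exist yet.

```perl
use strict;
use warnings;
use Bio::Annotation::Collection;
use Bio::Annotation::SimpleValue;

# Invented header values, standing in for what a parser would read.
my %header = (
    'format-version' => '1.2',
    'date'           => '10:02:2006 12:00',
);

my $collection = Bio::Annotation::Collection->new();
for my $tag (sort keys %header) {
    my $value = Bio::Annotation::SimpleValue->new(
        -tagname => $tag,
        -value   => $header{$tag},
    );
    # Store each header under its own key in the collection.
    $collection->add_Annotation($tag, $value);
}

# Read the headers back out again.
for my $tag ($collection->get_all_annotation_keys) {
    for my $ann ($collection->get_Annotations($tag)) {
        print $tag, ": ", $ann->value, "\n";
    }
}
```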

Bioperl 1.5.1 was released last year, please check the website.

	-hilmar

On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote:

> Hi Hilmar,
>   I really like your suggestion of implementing the Bio::AnnotatableI
> interface in the Bio::Ontology::Ontology class. I am going to implement
> this and play around a little with it. I am planning to use
> Bio::Annotation::SimpleValue for annotating the header as it provides a
> good way of specifying the Tag/value pair. What are your thoughts on
> using this?
>
>   Also, I was wondering if you have any idea about the scheduled date
> for the Bioperl 1.51 release. I would like to contribute some stuff in
> the next release.
>
> Thanks,
> Sohel.
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------





From hjm at tacgi.com  Sun Feb 12 01:46:38 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Sat, 11 Feb 2006 22:46:38 -0800
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
Message-ID: <200602112246.38926.hjm@tacgi.com>

Hi All,

After perusing the tutorial and other docs for an evening, I still can't
find the answer to this.  Forgive me if I've missed something obvious.

This should not be a novel request, but I've not found it answered.  If
bioperl isn't the best way to do this, I'd be grateful for a pointer to a
better way, especially if it includes an illuminating bit of code.

The problem is to retrieve genomic sequences plus & minus some offset from a 
locus determined by HUGO keyword or GeneID.  This would be a common followup 
chore for some extra analysis from a gene expression expt.  Or maybe this is 
in the DBFetch routines, but I've missed the sequence type to specify...?


TIA!

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 


From osborne1 at optonline.net  Sun Feb 12 11:37:39 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 12 Feb 2006 11:37:39 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602112246.38926.hjm@tacgi.com>
Message-ID: 

Harry,

Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
from its documentation:

  use Bio::DB::Fasta;

  # create database from directory of fasta files
  my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');

  # simple access (for those without Bioperl)
  my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
  my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
  my @ids     = $db->ids;
  my $length   = $db->length('CHROMOSOME_I');
  my $alphabet = $db->alphabet('CHROMOSOME_I');
  my $header   = $db->header('CHROMOSOME_I');

  # Bioperl-style access
  my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');

  my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
  my $seq     = $obj->seq;
  my $subseq  = $obj->subseq(4_000_000 => 4_100_000);

Do you already have the offsets?
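
If you do, the flanking part is just arithmetic. Below is a minimal
sketch of it; flanked_window is a hypothetical helper, and the
coordinates, flank size and chromosome length are made-up numbers. The
resulting pair is what you would hand to $db->seq or $obj->subseq as in
the synopsis above.

```perl
use strict;
use warnings;

# Hypothetical helper: widen a 1-based interval by $flank on both sides
# and clamp it so it never runs off either end of the chromosome.
sub flanked_window {
    my ($start, $end, $flank, $chr_length) = @_;
    my $from = $start - $flank;
    $from = 1 if $from < 1;
    my $to = $end + $flank;
    $to = $chr_length if $to > $chr_length;
    return ($from, $to);
}

# Made-up locus and chromosome length, for illustration only.
my ($from, $to) = flanked_window(4_000_000, 4_100_000, 5_000, 15_000_000);
print "$from-$to\n";    # 3995000-4105000
```

Getting from a HUGO name or GeneID to the start and end coordinates is a
separate lookup that this sketch does not cover.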

Brian O.


On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:

> Hi All,
> 
> After perusing the tutorial and other docs for a an evening, I still can't
> find the answer to this.  Forgive me if I've missed something obvious.
> 
> This should not be a novel request, but I've not found it answered.  If
> bioperl isn't the best way to do this, I'd be grateful to a pointer to a
> better way, especially if it includes an illuminating bit of code.
> 
> The problem is to retrieve genomic sequences plus & minus some offset from a
> locus determined by HUGO keyword or GeneID.  This would be a common followup
> chore for some extra analysis from a gene expression expt.  Or maybe this is
> in the DBFetch routines, but I've missed the sequence type to specify...?
> 
> 
> TIA!




From pmiguel at purdue.edu  Sun Feb 12 15:05:47 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sun, 12 Feb 2006 15:05:47 -0500
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
	parsing	Blast	output
In-Reply-To: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
References: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
Message-ID: <43EF951B.4030601@purdue.edu>

Roger,
Just a data point, but in case you were not already aware of it, the 
characters W, K and R may be included in some DNA sequences. 'W' means 
'A' or 'T', [AT], 'K' means [TG] and 'R' means [AG] if I remember 
correctly. These are ambiguous bases, where a basecaller isn't sure, for 
example, whether a particular peak is an A or a T. Although I see these 
ambiguous bases less frequently these days, even common modern 
basecallers (such as Applied Biosystems basecallers) can generally be 
configured so they will generate them. Downstream applications may not 
like them, however.
    I may be just stating the obvious, or this might be irrelevant to 
the issue at hand. If so, my apologies.
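
For reference, a minimal sketch of how those ambiguity codes expand,
using the standard IUPAC nucleotide table. The helper name
iupac_to_regex and the test sequence are made up for the example.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes mapped to regex character classes.
my %iupac = (
    A => 'A',      C => 'C',      G => 'G',      T => 'T',
    R => '[AG]',   Y => '[CT]',   S => '[CG]',   W => '[AT]',
    K => '[GT]',   M => '[AC]',
    B => '[CGT]',  D => '[AGT]',  H => '[ACT]',  V => '[ACG]',
    N => '[ACGT]',
);

# Hypothetical helper: turn a sequence with ambiguity codes into a pattern.
sub iupac_to_regex {
    my ($seq) = @_;
    my $pattern = '';
    for my $code (split //, uc $seq) {
        die "Unknown IUPAC code '$code'\n" unless exists $iupac{$code};
        $pattern .= $iupac{$code};
    }
    return $pattern;
}

my $pattern = iupac_to_regex('WWWKWRW');
print "$pattern\n";    # [AT][AT][AT][GT][AT][AG][AT]
print "ATTGAAT matches\n" if 'ATTGAAT' =~ /^$pattern$/;
```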

Phillip
Roger Hall wrote:
> Guys - I'm looking at the error message:
>
> MSG: no data for midline Query  1   WWWKWRW  7
> STACK Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> STACK toplevel
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>
> This is my line of thought:
> 1. "no data for midline $_" is a unique message generated by blast.pm in one
> location only at the point of a. reading three lines b. dropping lines with
> spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3)
> 2. There is a regexp match that fails in order to reach that error message
> 3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
> 4. It does anyway
> 5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
> reports
>
> I suspect a newline/chomp/metacharacter issue. Not finding the string
> anywhere has me thoroughly confused - I asked Hubert for the additional
> file, assuming that I didn't have it.
>
> My next thought is to write a quick script to test perl behavior on "Fedora
> Core 9".
>
> Thoughts?
>
> Did I misread the issue entirely? :}
>
> Roger
>
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Thursday, February 09, 2006 10:16 AM
> To: 'Jason Stajich'; 'Hubert Prielinger'
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
> output
>
>
>   
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>> Sent: Thursday, February 09, 2006 9:13 AM
>> To: Hubert Prielinger
>> Cc: Chris Fields; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>> parsing Blast output
>>
>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>> hi chris,
>>> thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>> do you have any other idea, the problem I have is that I have to parse
>>> a lot of textfiles....
>>> or shall I look for another option to parse those files...
>>>
>>> regards
>>> Hubert
>>
>> The code from Bioperl 1.5.1 works fine for me for blast 
>> 2.2.13 reports but unless you post your blast report we can't 
>> really determine the problem.
>>
>> If you are still getting the same error like this I am not 
>> convinced you have upgraded to 1.5.1 which includes a fix in 
>> the fact that NCBI changed the HSP result format to remove 
>> the ':' from the Query/Sbjct prefixes.  We fixed this as soon 
>> as it was apparent sometime in September.
>>
>>     
>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>> STACK Bio::SearchIO::blast::next_result
>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>> STACK toplevel
>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>
>> If you are just getting no results but also no warnings wrt 
>> parsing, are you sure your logic is correct?
>>
>> If you remove your filters do you see all the HSPS?
>>
>>
>> while (my $result = $search->next_result) {
>>     print $result->query_name, "\n";
>>     # iterate over each hit on the query sequence
>>     while (my $hit = $result->next_hit) {
>>         print $hit->name, "\n";
>>         # iterate over each HSP in the hit
>>         while (my $hsp = $hit->next_hsp) {
>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>                 $hsp->hit_string, "\n";
>>         }
>>     }
>> }
>>     
>
> I tested some of the BLAST results that Hubert sent Roger and me with a
> similar script to the above.  I removed the file parsing logic and it seemed
> to work just fine.  It may very well be a logic issue or that he hasn't
> installed the latest fix.
>     
> It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), even
> though the returned output was from nr, the top of the blast output showed
> that it was v2.2.12:  
>
> BLASTP 2.2.12 [Aug-07-2005]
>
> I double-checked my local version and it's definitely v.2.2.13:
> -------------------------------------
> C:\Perl\Scripts>blastcl3 -
>
> blastcl3 2.2.13   arguments:...
> -------------------------------------
>
> If you use RemoteBlast using the same settings, the version in the header
> looks like this:
>
> BLASTP 2.2.13 [Nov-27-2005]
>
> I'm wondering if all the blast executables (blast and netblast) from NCBI
> have text output like v.2.2.12, while the wwwblast outputs a new format
> (2.2.13).  I'll ask blast-help at NCBI about this.
>
>   
>> To clarify some stuff -
>> Chris I don't necessarily think the XML is best way forward 
>> for BLAST reports generated locally, it isn't as detailed as 
>> the Text format and it is what most people expect to be able 
>> to scroll through and parse -- it is also harder for the 
>> format to change dramatically if you have a static binary on 
>> your machine =).  I think for remoteblast the XML format 
>> should be the way forward but I expect Bioperl to maintain 
>> support of any plain text BLAST report format that people use 
>> on a regular basis.
>>
>>     
>
> Does XML lack some specific info that text output has?  Didn't know that.  I
> believe that XML should be default in RemoteBlast since it will not break,
> but I agree with you about text output.  I also agree that it will need
> somebody to maintain it constantly, much like RemoteBlast.
>
>   
>> -jason
>>     
>>> Chris Fields wrote:
>>>
>>>       
>>>> My guess is you're running into text parsing problems in 
>>>> Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>> (1.5.1) or
>>>> bioperl-live (CVS), then see the bug below.
>>>>
>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>
>>>> I think the first problem you ran into is solved in bioperl 1.5.1, 
>>>> the last problem (more recent, not related to the first) has been 
>>>> fixed but hasn't been committed to bioperl-live yet.  The fixed 
>>>> SearchIO::blast is available in the link above, but realize it hasn't
>>>> been committed yet and may change.
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>         
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert 
>>>>> Prielinger
>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>> To: bioperl-l at bioperl.org
>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>> output
>>>>>
>>>>> Hi,
>>>>> If I want to parse a Blast Output (Version 2.2.12) with 
>>>>> Bio::SearchIO, I get the following error message:
>>>>>
>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>> STACK Bio::SearchIO::blast::next_result
>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>> STACK toplevel
>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>     
>>>>> is that a bug......
>>>>>
>>>>> If I want to parse Blast Output (version 2.2.13), I don't get 
>>>>> anything.....
>>>>> I'm using bioperl 1.4
>>>>>
>>>>> before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>> Output (version 2.2.12), but I don't remember which bioperl version
>>>>> I had installed
>>>>>
>>>>> thanks in advance
>>>>>
>>>>> Hubert
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>           
>>>>
>>>>
>>>>         
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>       
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>     
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign  
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>   
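The Query-line symptom discussed in the quoted thread above can be looked at in isolation. The sketch below is not the expression used in Bio::SearchIO::blast; the pattern and the helper name parse_alignment_line are assumptions made purely for illustration. It accepts the prefix with or without the trailing ':' (the format change Jason describes) and strips a DOS-style line ending before matching, which is one way to test the newline/chomp hypothesis:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: split one alignment line of a
# BLAST text report into (label, start, sequence, end).
sub parse_alignment_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # chomp() alone would leave a DOS "\r" behind
    # Both "Query:" (older reports) and "Query" (newer reports) are accepted.
    if ( $line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/ ) {
        return ( $1, $2, $3, $4 );
    }
    return;    # empty list: not an alignment line
}

for my $line ( "Query: 1    WWWKWRW 7\n", "Query  1   WWWKWRW  7\r\n" ) {
    my @fields = parse_alignment_line($line);
    print @fields ? "parsed: @fields\n" : "no match\n";
}
```

If a line taken from a real report fails a pattern like this one, dumping it with something like Data::Dumper (with $Data::Dumper::Useqq set) should show any hidden control characters.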



From cjfields at uiuc.edu  Sun Feb 12 17:30:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 12 Feb 2006 16:30:07 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
	parsing	Blast	output
In-Reply-To: <43EF951B.4030601@purdue.edu>
References: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
	<43EF951B.4030601@purdue.edu>
Message-ID: <855DEC6F-8057-47BA-9D1D-9BDC16D1D83B@uiuc.edu>

Sequences are converted to FASTA format in RemoteBlast using  
Bio::SeqIO, which I think includes IUPAC base and amino acid  
ambiguities like you mention, so my guess is any errors (like odd non- 
IUPAC letters in nucleotide or aa queries) are likely caught there.   
As long as it passes Bio::SeqIO it shouldn't be a problem.  Haven't  
tried this myself, though, so I can't say that with absolute certainty.
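For reference, the nucleotide ambiguity codes under discussion can be written out as a lookup table. This is a stand-alone sketch, not how Bio::SeqIO validates input; the hash %IUPAC_DNA and the helper iupac_to_regex are names invented here. If the report is a protein one (the headers quoted in this thread say BLASTP), W, K and R would simply be tryptophan, lysine and arginine.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to the bases each one stands for.
my %IUPAC_DNA = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Hypothetical helper: turn an ambiguous DNA string into a regex source
# string, e.g. "WK" becomes "[AT][GT]".
sub iupac_to_regex {
    my ($seq) = @_;
    my $pattern = '';
    for my $code ( split //, uc $seq ) {
        my $bases = $IUPAC_DNA{$code};
        die "not an IUPAC nucleotide code: $code\n" unless defined $bases;
        $pattern .= length($bases) > 1 ? "[$bases]" : $bases;
    }
    return $pattern;
}

print iupac_to_regex('AWKR'), "\n";    # prints A[AT][GT][AG]
```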

Chris



On Feb 12, 2006, at 2:05 PM, Phillip SanMiguel wrote:

> Roger,
> Just a data point, but in case you were not already aware of it, the
> characters W, K and R may be included in some DNA sequences. 'W' means
> 'A' or 'T', [AT], 'K' means [TG] and 'R' means [AG] if I remember
> correctly. These are ambiguous bases, where a basecaller isn't sure, for
> example, whether a particular peak is an A or a T. Although I see these
> basecallers (such as Applied Biosystems basecallers) can generally be
> configured so they will generate them. Downstream applications may not
> like them, however.
>     I may be just stating the obvious, or this might be irrelevant to
> the issue at hand. If so, my apologies.
>
> Phillip
> Roger Hall wrote:
>> Guys - I'm looking at the error message:
>>
>> MSG: no data for midline Query  1   WWWKWRW  7
>> STACK Bio::SearchIO::blast::next_result
>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>> STACK toplevel
>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>
>> This is my line of thought:
>> 1. "no data for midline $_" is a unique message generated by blast.pm in one
>> location only at the point of a. reading three lines b. dropping lines with
>> spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3)
>> 2. There is a regexp match that fails in order to reach that error message
>> 3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>> 4. It does anyway
>> 5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>> reports
>>
>> I suspect a newline/chomp/metacharacter issue. Not finding the string
>> anywhere has me thoroughly confused - I asked Hubert for the additional
>> file, assuming that I didn't have it.
>>
>> My next thought is to write a quick script to test perl behavior on "Fedora
>> Core 9".
>>
>> Thoughts?
>>
>> Did I misread the issue entirely? :}
>>
>> Roger
>>
>>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Thursday, February 09, 2006 10:16 AM
>> To: 'Jason Stajich'; 'Hubert Prielinger'
>> Cc: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>> output
>>
>>
>>
>>> -----Original Message-----
>>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>> Sent: Thursday, February 09, 2006 9:13 AM
>>> To: Hubert Prielinger
>>> Cc: Chris Fields; bioperl-l at bioperl.org
>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>> parsing Blast output
>>>
>>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>
>>>> hi chris,
>>>> thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>> do you have any other idea, the problem I have is that I have to parse
>>>> a lot of textfiles....
>>>> or shall I look for another option to parse those files...
>>>>
>>>> regards
>>>> Hubert
>>>>
>>> The code from Bioperl 1.5.1 works fine for me for blast
>>> 2.2.13 reports but unless you post your blast report we can't
>>> really determine the problem.
>>>
>>> If you are still getting the same error like this I am not
>>> convinced you have upgraded to 1.5.1 which includes a fix in
>>> the fact that NCBI changed the HSP result format to remove
>>> the ':' from the Query/Sbjct prefixes.  We fixed this as soon
>>> as it was apparent sometime in September.
>>>
>>>
>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>> STACK toplevel
>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>> If you are just getting no results but also no warnings wrt
>>> parsing, are you sure your logic is correct?
>>>
>>> If you remove your filters do you see all the HSPS?
>>>
>>>
>>> while (my $result = $search->next_result) {
>>>     print $result->query_name, "\n";
>>>     # iterate over each hit on the query sequence
>>>     while (my $hit = $result->next_hit) {
>>>         print $hit->name, "\n";
>>>         # iterate over each HSP in the hit
>>>         while (my $hsp = $hit->next_hsp) {
>>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>                 $hsp->hit_string, "\n";
>>>         }
>>>     }
>>> }
>>>
>>
>> I tested some of the BLAST results that Hubert sent Roger and me with a
>> similar script to the above.  I removed the file parsing logic and it seemed
>> to work just fine.  It may very well be a logic issue or that he hasn't
>> installed the latest fix.
>>
>> It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), even
>> though the returned output was from nr, the top of the blast output showed
>> that it was v2.2.12:
>>
>> BLASTP 2.2.12 [Aug-07-2005]
>>
>> I double-checked my local version and it's definitely v.2.2.13:
>> -------------------------------------
>> C:\Perl\Scripts>blastcl3 -
>>
>> blastcl3 2.2.13   arguments:...
>> -------------------------------------
>>
>> If you use RemoteBlast using the same settings, the version in the header
>> looks like this:
>>
>> BLASTP 2.2.13 [Nov-27-2005]
>>
>> I'm wondering if all the blast executables (blast and netblast) from NCBI
>> have text output like v.2.2.12, while the wwwblast outputs a new format
>> (2.2.13).  I'll ask blast-help at NCBI about this.
>>
>>
>>> To clarify some stuff -
>>> Chris I don't necessarily think the XML is best way forward
>>> for BLAST reports generated locally, it isn't as detailed as
>>> the Text format and it is what most people expect to be able
>>> to scroll through and parse -- it is also harder for the
>>> format to change dramatically if you have a static binary on
>>> your machine =).  I think for remoteblast the XML format
>>> should be the way forward but I expect Bioperl to maintain
>>> support of any plain text BLAST report format that people use
>>> on a regular basis.
>>>
>>>
>>
>> Does XML lack some specific info that text output has?  Didn't know that.  I
>> believe that XML should be default in RemoteBlast since it will not break,
>> but I agree with you about text output.  I also agree that it will need
>> somebody to maintain it constantly, much like RemoteBlast.
>>
>>
>>> -jason
>>>
>>>> Chris Fields wrote:
>>>>
>>>>
>>>>> My guess is you're running into text parsing problems in
>>>>> Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>> (1.5.1) or
>>>>> bioperl-live (CVS), then see the bug below.
>>>>>
>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>
>>>>> I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>> the last problem (more recent, not related to the first) has been
>>>>> fixed but hasn't been committed to bioperl-live yet.  The fixed
>>>>> SearchIO::blast is available in the link above, but realize it hasn't
>>>>> been committed yet and may change.
>>>>>
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
>>>>> University of Illinois Urbana-Champaign
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>> Prielinger
>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>> To: bioperl-l at bioperl.org
>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>> output
>>>>>>
>>>>>> Hi,
>>>>>> If I want to parse a Blast Output (Version 2.2.12) with
>>>>>> Bio::SearchIO, I get the following error message:
>>>>>>
>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>> STACK toplevel
>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>>>>> is that a bug......
>>>>>>
>>>>>> If I want to parse Blast Output (version 2.2.13), I don't get
>>>>>> anything.....
>>>>>> I'm using bioperl 1.4
>>>>>>
>>>>>> before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>> Output (version 2.2.12), but I don't remember which bioperl version
>>>>>> I had installed
>>>>>>
>>>>>> thanks in advance
>>>>>>
>>>>>> Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>> --
>>> Jason Stajich
>>> Duke University
>>> http://www.duke.edu/~jes12
>>>
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From torsten.seemann at infotech.monash.edu.au  Sun Feb 12 18:56:32 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 13 Feb 2006 10:56:32 +1100
Subject: [Bioperl-l] RemoteBlast
In-Reply-To: <004401c62c6e$da906a40$4301a8c0@LIBERAL>
References: <004401c62c6e$da906a40$4301a8c0@LIBERAL>
Message-ID: <1139788592.29375.13.camel@chauvel.csse.monash.edu.au>

Roger,

> I think that most core Bioperl folks have long since moved away from
> RemoteBlast and are using the functionality in StandAloneBlast to run their
> own local servers. 

Agreed. Even smaller centres like my workplace need the throughput that
a local PC, SMP system or Cluster can provide.

> wave of the future, but I think there is still some concern that not every
> flavor of BLAST produces XML yet. Even so, the XML parser is considered to
> be very strong, and only helps hasten the end of text-formatted support,
> since parsing text-formatted reports is the primary source of pain. 

If BioPerl switches primarily to XML parsing, the tool authors will soon
add support for XML (not very difficult really) due to BioPerl's
pervasiveness?

> I do, however, see the advantage in shifting to XML-formatted reporting and
> parsing *only* as soon as every BLAST flavor supports it, if not before.
> (Anyone - is this still an issue. Please educate me.)

The four BLAST flavours I utilise all support XML output: 
1) NCBI BLAST 2) WU-BLAST 3) MPI-BLAST 4) FSA-BLAST.

> At the moment, I'm leaning towards adding an option to RemoteBlast. The
> default (no option) would use a "pure perl" implementation, and the
> enhancement (with explicit option) would merely wrap the NCBI executable.

If the API is done correctly both of these could co-exist with very
little redundant code. (I personally rarely use remote blast).

-- 
Torsten Seemann 
Victorian Bioinformatics Consortium



From torsten.seemann at infotech.monash.edu.au  Sun Feb 12 19:35:06 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 13 Feb 2006 11:35:06 +1100
Subject: [Bioperl-l] Remote BLAST support discussion
In-Reply-To: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu>
References: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>
	<1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu>
Message-ID: <1139790906.29375.27.camel@chauvel.csse.monash.edu.au>

> Mostly I think we need to try and support something that will  
> "ALWAYS" work so that individuals setting up webservices which rely  
> on remote blast functionality.  In theory, netblast/blastcl3 should  
> always work since NCBI has to update the exe when they change their  
> server setup.

What usually happens when an older 'blastcl3' binary is used on a newer
server setup? I guess it fails in a deterministic manner so the BioPerl
user can throw a useful exception.

> I also see value in providing a wrapper for netblast since it should  
> look an awful lot like running blast locally.

Agreed - they are virtually indistinguishable.

> Ideally I'd like to see a more extensible system, something like (and  
> please feel free to come up with better names for the modules!):

Do BioPerl coding standards require "::Blast" over "::BLAST" ?
(not important anyway)

> Bio::Tools::Run::Blast
>   -->             StandAlone (support for [..as many flavours as poss])
>   -->             RemoteNCBI (currently the RemoteBlast server)
>   -->             RemoteEBISOAP (EBI has a nice SOAP interface that  
>   -->             RemoteNetBlast (blastcl3 or netblast local executable)
>   (other things that people want)

Looks reasonable. I assume there's some interfaces in there like
Bio::Tools::Blast::BlastI etc.

Could probably call "RemoteNetBlast" just "RemoteNet" because it is
already in the Blast:: namespace. (not important though)

My only suggestion for StandAlone (and RemoteNetBlast) is that they both
do a generic "run a local binary with env. vars and parameters and
capture the stdout, stderr and return code". This needs to be abstracted
away (or re-use existing code from bioperl-run?). Jason mentioned
Ensembl::Runnable as a source of code we could incorporate into Bioperl.
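As a rough illustration of that generic "run a local binary" layer, here is a minimal sketch. It is not taken from bioperl-run or from Ensembl; run_local_binary is a hypothetical name, and the fork/exec approach assumes a Unix-like system. Output is sent to temporary files rather than pipes so that a chatty stderr cannot block the child.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use POSIX ();

# Hypothetical helper: run @cmd with extra environment variables and return
# (stdout, stderr, exit code).
sub run_local_binary {
    my ( $env, @cmd ) = @_;
    my ( undef, $out_file ) = tempfile( UNLINK => 1 );
    my ( undef, $err_file ) = tempfile( UNLINK => 1 );

    local $| = 1;    # flush pending parent output before forking
    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;
    if ( $pid == 0 ) {    # child: set env, redirect, then replace itself
        @ENV{ keys %$env } = values %$env;
        open STDOUT, '>', $out_file or POSIX::_exit(126);
        open STDERR, '>', $err_file or POSIX::_exit(126);
        exec { $cmd[0] } @cmd or POSIX::_exit(127);    # 127: exec failed
    }
    waitpid( $pid, 0 );
    my $exit_code = $? >> 8;

    my %captured;
    for my $file ( $out_file, $err_file ) {
        open my $fh, '<', $file or die "cannot read $file: $!\n";
        local $/;    # slurp mode
        $captured{$file} = <$fh> // '';
    }
    return ( $captured{$out_file}, $captured{$err_file}, $exit_code );
}

my ( $out, $err, $code ) = run_local_binary( { GREETING => 'hello' },
    'sh', '-c', 'echo $GREETING; echo oops >&2; exit 3' );
print "stdout=$out", "stderr=$err", "exit=$code\n";
```

A StandAlone or RemoteNet wrapper would then only need to build the argument list for its binary and decide what a non-zero exit code means.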

-- 
Torsten Seemann 
Victorian Bioinformatics Consortium



From cjfields at uiuc.edu  Mon Feb 13 11:45:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 13 Feb 2006 10:45:14 -0600
Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
In-Reply-To: <20060213152603.ed3f3118@dogwood.plantbio.uga.edu>
Message-ID: <001801c630bc$dd35bff0$15327e82@pyrimidine>

If you're using RemoteBlast 1.28, then you've likely updated from CVS which
isn't the latest fix.  

 

Make sure that you check the following: 

 

1) Always post to the mailing list:
http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .  

 

2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed
first.  Perform a clean installation; do not upgrade only
Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee
that mixing modules from old and new distributions (1.4 and 1.5.1, for
instance) will work.  A bioperl-1.5.1 or bioperl-live installation will
allow text output from BLAST v.2.2.12 to be saved and parsed; it will not
parse the newest BLAST text output from NCBI (v2.2.13) but it should still
save it. I believe as long as next_result() isn't called, it will work.

 

3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output
are NOT in CVS; they haven't been cleared and checked in by Roger Hall
(who's now taking care of RemoteBlast) and the powers that be (Jason or
whomever is in charge of Bio::SearchIO).  They can be found in Bugzilla:

 

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

http://bugzilla.bioperl.org/show_bug.cgi?id=1935
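To see which of the cases in item 2 a saved report falls into, the program name and version can be read from its first line. A small sketch, assuming the header looks like the two examples quoted in this thread; blast_report_version and version_cmp are hypothetical helpers, and the 2.2.12 threshold only restates the claim in item 2 about a stock 1.5.1 installation.

```perl
use strict;
use warnings;

# Hypothetical helper: return (program, version) from a header line such as
# "BLASTP 2.2.13 [Nov-27-2005]", or an empty list if the line does not match.
sub blast_report_version {
    my ($header) = @_;
    return $header =~ /^\s*(T?BLAST[NPX])\s+(\d+(?:\.\d+)+)/ ? ( $1, $2 ) : ();
}

# Compare two dotted version strings numerically, field by field.
sub version_cmp {
    my @a = split /\./, $_[0];
    my @b = split /\./, $_[1];
    while ( @a || @b ) {
        my $cmp = ( shift(@a) // 0 ) <=> ( shift(@b) // 0 );
        return $cmp if $cmp;
    }
    return 0;
}

for my $header ( 'BLASTP 2.2.12 [Aug-07-2005]', 'BLASTP 2.2.13 [Nov-27-2005]' ) {
    my ( $program, $version ) = blast_report_version($header);
    my $note = version_cmp( $version, '2.2.12' ) > 0
        ? 'newer text format, may need the patched parser'
        : 'older text format';
    print "$program $version: $note\n";
}
```

The header alone is not conclusive: as noted elsewhere in this thread, blastcl3 2.2.13 can return a report whose header still says 2.2.12.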

 

The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of saving
XML output, so it isn't necessary if you don't plan on using this option.  And,
remember, they haven't been committed yet to CVS, which means that the final
committed version may differ from what is posted there.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

  _____  

From: Guojun Yang [mailto:gyang at plantbio.uga.edu] 
Sent: Monday, February 13, 2006 9:26 AM
To: Chris Fields
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

 

Hi, Chris

Thanks for your suggestion, however, it doesn't seem to work for my cgi even
after I replace both blast.pm and RemoteBlast.pm. I didn't even get any RID.
Is there any suggestion?

 

Guojun



Guojun Yang
Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun

  _____  

From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
Sent: Fri, 03 Feb 2006 16:07:29 -0500
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

I would say give the new code a try, but realize that it hasn't been checked
in (like I said below). I will try going over the modified
Bio::SearchIO::blast again this weekend to see if there is anything I might
have missed. The changed order in the header of BLAST text output has me a
bit worried that it might not catch everything, but it at least doesn't hang
in the while() loop I described in the bug report below (bug #1934) and
seems to process everything fine.

If you want more stability in the code, you might consider changing over to
XML output and parsing with Bio::SearchIO::blastxml. There are some changes
in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML
output, but I believe it parses everything regardless. If you look back the
last month or so there has been a bit of discussion here about it. Jason
describes a bit on how to set up RemoteBlast for XML:

http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Friday, February 03, 2006 1:45 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> 
> Hi, Everybody,
> I see this post and am wondering if this is the reason for the
> malfunctioning of my webserver. We set up a webserver named MAK, for MITE
> sequence analysis. It was working very well until around November 2005,
> when it stopped returning any result (the site is fine and seems to be
> doing something after submission). In the CGI script, I used remoteblast (that
> work was done in 2003) to do searches. I currently do not have access to
> the server because I moved. Quite a few people sent emails to us about
> its malfunctioning. Is there any suggestion on fixing the problem? Should
> I simply ask that RemoteBlast.pm be replaced with the new version?
> Thanks a lot,
> Guojun
> 
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
> _____
> 
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-
> l at bioperl.org]
> Sent: Fri, 03 Feb 2006 10:45:23 -0500
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> 
> Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> will
> work for saving text output. However, it will not parse anything using
> next_result (it will likely hang) and will not save XML format. See these
> bugs:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> 
> for explanations and possible fixes (changes to RemoteBlast and
> Bio::SearchIO::blast). Note that these haven't been checked in yet so are
> still not included in bioperl-live; they may be further modified before
> committing to CVS. If you're not worried about XML, you could just try the
> first fix, which is a change to SearchIO::blast.
> 
> Nagesh, I remember you posting to the list a month ago using a script
> which
> had problems; the script you used saves the output but doesn't actually
> parse it (i.e. you don't use next_result() to go through the data). Is the
> version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
> parsing the output using "-readmethod => SearchIO" or "-readmethod =>
> blast"
> using your version of RemoteBlast and method next_result()? Like below
> (from
> perldoc):
> 
> while ( my @rids = $factory->each_rid ) {
>     foreach my $rid ( @rids ) {
>         my $rc = $factory->retrieve_blast($rid);
>         if( !ref($rc) ) {
>             # no report object yet: drop the RID on error, otherwise wait
>             if( $rc < 0 ) {
>                 $factory->remove_rid($rid);
>             }
>             print STDERR "." if ( $v > 0 );
>             sleep 5;
>         } else { # parsing starts here
>             my $result = $rc->next_result(); # it should hang here
>             # save the output
>             my $filename = $result->query_name()."\.out";
>             $factory->save_output($filename);
>             $factory->remove_rid($rid);
>             print "\nQuery Name: ", $result->query_name(), "\n";
>             while ( my $hit = $result->next_hit ) {
>                 next unless ( $v > 0);
>                 print "\thit name is ", $hit->name, "\n";
>                 while( my $hsp = $hit->next_hsp ) {
>                     print "\t\tscore is ", $hsp->score, "\n";
>                 }
>             }
>         }
>     }
> }
> 
> 
> My script hung if I used next_result() in any way prior to the fixes. I
> want to see how many others are having the same issues with parsing using
> the CVS version of bioperl-live.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > Sent: Thursday, February 02, 2006 7:24 PM
> > To: Huang Jian; bioperl-l
> > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> >
> > Hi Huang,
> > Thanks for the message. The older version of RemoteBlast.pm works on the
> > logic of checking the temporary file size to determine whether the Blast
> > results are ready. This condition is not getting satisfied may be due to
> > some changes brought about by NCBI. I had this problem recently and
> > figured out that the solution was to use the latest version which has
> > this problem fixed (does not use file size logic any more) which is not
> > yet included in the BioPerl package.
> > Cheers
> > Nagesh
> >
> > Huang Jian wrote:
> >
> > > Dear Nagesh,
> > >
> > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send
> > > me. Now it works perfectly!!!
> > >
> > > Thank you!!
> > >
> > > Huang
> > >
> > > ----- Original Message ----- From: "Nagesh Chakka"
> > > 
> > > To: "Huang Jian" ; "bioperl-l"
> > > 
> > > Sent: Friday, February 03, 2006 7:48 AM
> > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > > via email
> > >
> > >
> > >> Hi Huang,
> > >> I see that you are submitting a sequence for a remote blast search.
> Can
> > >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If
> > >> not I have attached it with this email, try to replace it with the
> old
> > >> one which has a bug.
> > >> Let me know if it works.
> > >> Nagesh
> > >
> > >
> > >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 




 

 



From gyang at plantbio.uga.edu  Mon Feb 13 13:32:14 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 13 Feb 2006 13:32:14 -0500
Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
In-Reply-To: <001801c630bc$dd35bff0$15327e82@pyrimidine>
Message-ID: <20060213183214.342b90da@dogwood.plantbio.uga.edu>

Hi, Chris,  
I do have different versions of bioperl on my Linux machine (1.4 and 1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or do I need to uninstall and remove the previous versions first? I could not find any hint on uninstalling bioperl on Linux. Could you please give me some suggestions?  
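One way to see which copies are in play, before removing anything, is to list every directory in @INC that holds the module in question; Perl loads the first one it finds. The following is a generic sketch rather than a BioPerl tool, and find_module_copies is a name made up for this example.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every file on the given search path (default
# @INC) that provides $module, in the order Perl would consider them.
sub find_module_copies {
    my ( $module, @search_path ) = @_;
    @search_path = @INC unless @search_path;
    ( my $relative = "$module.pm" ) =~ s{::}{/}g;
    return grep { -f } map { File::Spec->catfile( $_, $relative ) }
        grep { !ref } @search_path;    # skip code-ref hooks in @INC
}

my @copies = find_module_copies('Bio::SearchIO::blast');
print @copies ? "copies found (first one wins):\n" : "not installed\n";
print "  $_\n" for @copies;
```

More than one line of output would suggest that two distributions are installed side by side, which is the mixed-install situation the clean-installation advice is meant to avoid.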
Thanks,  
Guojun

Department of Plant Biology
University of Georgia
      _____  

  From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
Sent: Mon, 13 Feb 2006 11:45:14 -0500
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

  
  
If you're using RemoteBlast 1.28, then you've likely updated from CVS, which doesn't have the latest fix.

Make sure that you check the following:

1) Always post to the mailing list: http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .

2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed first.  Perform a clean installation; do not upgrade only Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing modules from old and new distributions (1.4 and 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live installation will allow text output from BLAST v.2.2.12 to be saved and parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13) but it should still save it. I believe as long as next_result() isn't called, it will work.

3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output are NOT in CVS; they haven't been cleared and checked in by Roger Hall (who's now taking care of RemoteBlast) and the powers that be (Jason or whoever is in charge of Bio::SearchIO).  They can be found in Bugzilla:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of saving XML output, so it isn't necessary if you don't plan on using this option.  And, remember, they haven't been committed yet to CVS, which means that the final version will change to reflect the new version.
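
Since the advice above depends on whether a saved report came from BLAST 2.2.12 or 2.2.13, it can help to read the version straight from the report header. The following is only a minimal sketch: it assumes the text report begins with a header line of the form "BLASTN 2.2.13 [date]", and the helper names blast_text_version and version_at_least are made up for illustration; they are not BioPerl functions.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): scan the first lines of a saved
# BLAST text report for a header such as "BLASTN 2.2.13 [Nov-27-2005]" and
# return (program, version), or an empty list if no header is found.
sub blast_text_version {
    my ($fh) = @_;
    my $seen = 0;
    while ( my $line = <$fh> ) {
        last if ++$seen > 20;    # the header is expected near the top
        if ( $line =~ /^\s*(\w*BLAST\w*)\s+(\d+\.\d+\.\d+)/ ) {
            return ( $1, $2 );
        }
    }
    return;
}

# Compare two dotted version strings numerically, field by field.
sub version_at_least {
    my ( $have, $want ) = @_;
    my @h = split /\./, $have;
    my @w = split /\./, $want;
    for my $i ( 0 .. $#w ) {
        my $field = defined $h[$i] ? $h[$i] : 0;
        my $cmp   = $field <=> $w[$i];
        return $cmp > 0 if $cmp != 0;
    }
    return 1;    # all fields equal
}

# Demo on an in-memory stand-in for a saved report (assumed header format).
my $report = "BLASTN 2.2.13 [Nov-27-2005]\n\nQuery= query new seq\n";
open my $fh, '<', \$report or die "Cannot open in-memory report: $!";
my ( $program, $version ) = blast_text_version($fh);
close $fh;

print "$program $version\n";
print "text output may need the unreleased parser fixes\n"
    if version_at_least( $version, '2.2.13' );
```

For a real report, open the file written by save_output() instead of the in-memory string.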
  

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign   
  
  
    _____  

    
From: Guojun Yang [mailto:gyang at plantbio.uga.edu] 
Sent: Monday, February 13, 2006 9:26 AM
To: Chris Fields
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28  
   
  
Hi, Chris  
  
Thanks for your suggestion, however, it doesn't seem to work for my cgi even after I replace both blast.pm and RemoteBlast.pm. I didn't even get any RID. Is there any suggestion?  
  
   
  
Guojun  


Guojun Yang
Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun  
    _____  

    
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
Sent: Fri, 03 Feb 2006 16:07:29 -0500
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

I would say give the new code a try, but realize that it hasn't been checked
in (like I said below). I will try going over the modified
Bio::SearchIO::blast again this weekend to see if there is anything I might
have missed. The changed order in the header of BLAST text output has me a
bit worried that it might not catch everything, but it at least doesn't hang
in the while() loop I described in the bug report below (bug #1934) and
seems to process everything fine.

If you want more stability in the code, you might consider changing over to
XML output and parsing with Bio::SearchIO::blastxml. There are some changes
in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML
output, but I believe it parses everything regardless. If you look back the
last month or so there has been a bit of discussion here about it. Jason
describes a bit on how to set up RemoteBlast for XML:

http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Friday, February 03, 2006 1:45 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> 
> Hi, Everybody,
> I see this post and am wondering if this is the reason for the
> malfunctioning of my webserver. We set up a webserver named MAK, for MITE
> sequence analysis. It was working very well until around November 2005,
> when it stopped returning any result (the site is fine and seems to be
> doing something after submission). In the CGI script, I used remoteblast
> (that work was done in 2003) to do searches. I currently do not have access
> to the server because I moved. Several people have sent emails to us about
> its malfunctioning. Is there any suggestion on fixing the problem? Should
> I simply ask that remoteblast.pm be replaced with the new version?
> Thanks a lot,
> Guojun
> 
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
> _____
> 
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-
> l at bioperl.org]
> Sent: Fri, 03 Feb 2006 10:45:23 -0500
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> 
> Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> will
> work for saving text output. However, it will not parse anything using
> next_result (it will likely hang) and will not save XML format. See these
> bugs:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> 
> for explanations and possible fixes (changes to RemoteBlast and
> Bio::SearchIO::blast). Note that these haven't been checked in yet so are
> still not included in bioperl-live; they may be further modified before
> committing to CVS. If you're not worried about XML, you could just try the
> first fix, which is a change to SearchIO::blast.
> 
> Nagesh, I remember you posting to the list a month ago using a script
> which
> had problems; the script you used saves the output but doesn't actually
> parse it (i.e. you don't use next_result() to go through the data). Is the
> version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
> parsing the output using "-readmethod => SearchIO" or "-readmethod =>
> blast"
> using your version of RemoteBlast and method next_result()? Like below
> (from
> perldoc):
> 
> while ( my @rids = $factory->each_rid ) {
>     foreach my $rid ( @rids ) {
>         my $rc = $factory->retrieve_blast($rid);
>         if( !ref($rc) ) {
>             if( $rc < 0 ) {
>                 $factory->remove_rid($rid);
>             }
>             print STDERR "." if ( $v > 0 );
>             sleep 5;
>         } else {                              # parsing starts here
>             my $result = $rc->next_result();  # it should hang here
>             # save the output
>             my $filename = $result->query_name()."\.out";
>             $factory->save_output($filename);
>             $factory->remove_rid($rid);
>             print "\nQuery Name: ", $result->query_name(), "\n";
>             while ( my $hit = $result->next_hit ) {
>                 next unless ( $v > 0);
>                 print "\thit name is ", $hit->name, "\n";
>                 while( my $hsp = $hit->next_hsp ) {
>                     print "\t\tscore is ", $hsp->score, "\n";
>                 }
>             }
>         }
>     }
> }
> 
> 
> My script hanged if I used next_result() in any way prior to the fixes. I
> want to see how many others are having the same issues with parsing using
> the CVS version of bioperl-live.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > Sent: Thursday, February 02, 2006 7:24 PM
> > To: Huang Jian; bioperl-l
> > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> >
> > Hi Huang,
> > Thanks for the message. The older version of RemoteBlast.pm works on the
> > logic of checking the temporary file size to determine whether the Blast
> > results are ready. This condition is no longer satisfied, possibly due to
> > some changes brought about by NCBI. I had this problem recently and
> > figured out that the solution was to use the latest version, which has
> > this problem fixed (it does not use the file size logic any more) but is
> > not yet included in the BioPerl package.
> > Cheers
> > Nagesh
> >
> > Huang Jian wrote:
> >
> > > Dear Nagesh,
> > >
> > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send
> > > me. Now it works perfectly!!!
> > >
> > > Thank you!!
> > >
> > > Huang
> > >
> > > ----- Original Message ----- From: "Nagesh Chakka"
> > > 
> > > To: "Huang Jian" ; "bioperl-l"
> > > 
> > > Sent: Friday, February 03, 2006 7:48 AM
> > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > > via email
> > >
> > >
> > >> Hi Huang,
> > >> I see that you are submitting a sequence for a remote blast search.
> Can
> > >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If
> > >> not I have attached it with this email, try to replace it with the
> old
> > >> one which has a bug.
> > >> Let me know if it works.
> > >> Nagesh
> > >
> > >
> > >
> >

From cjfields at uiuc.edu  Mon Feb 13 15:39:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 13 Feb 2006 14:39:38 -0600
Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
In-Reply-To: <20060213183214.342b90da@dogwood.plantbio.uga.edu>
Message-ID: <000901c630dd$9be54f40$15327e82@pyrimidine>

How do you know two versions are installed (i.e. how are you checking the
version)?  Do you have two complete bioperl distributions (in two
separate directories) or are you looking in modules?  Here's the way to
check the version (from the FAQ):

perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'

If you have two full bioperl distributions on your computer, normally only
one will be in use unless you have explicitly set the environment variable
PERL5LIB.  The PERL5LIB  directories will be searched first before your
normal perl directory list (@INC) is searched.  You MAY get some mixing
then, but only if perl can't find a particular module in the path designated
in PERL5LIB; then it will progress through the directories listed in @INC.
This may happen if a module is unique to a particular release, but shouldn't
happen for the majority of modules, including RemoteBlast.  You can check
what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will differ
depending on your OS, perl build, etc.
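
To see which copy of a particular module perl actually picks up, you can ask perl itself. This is a minimal sketch using only core features ('require' and the %INC hash); the helper names loaded_from and all_copies are hypothetical, and File::Spec is used as the default only so the example runs on a machine without Bioperl.

```perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and report the file it came from.
# %INC maps 'Some/Module.pm' to the path of the file perl actually loaded.
sub loaded_from {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return unless eval { require $file; 1 };    # not found in any @INC dir
    return $INC{$file};
}

# List every @INC directory that holds a copy of the module, in search order.
# More than one entry suggests that two installations are visible to perl.
sub all_copies {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return grep { !ref($_) && -e "$_/$file" } @INC;
}

my $module = shift(@ARGV) || 'File::Spec';
my $path   = loaded_from($module);
print defined $path
    ? "$module loaded from $path\n"
    : "$module not found in \@INC\n";
print "  copy found under $_\n" for all_copies($module);
```

Saved as, say, which_module.pl (name is arbitrary), it could be run as 'perl which_module.pl Bio::Tools::Run::RemoteBlast' to check which RemoteBlast.pm is in use.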

Regardless, if you follow the directions for installing bioperl for your
system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you
explicitly change the installation directory when using 'perl Makefile.PL'),
then 'uninstalling' Bioperl shouldn't be a problem as it will install the
Bioperl distribution you downloaded over the old version in @INC.  See this
page:

http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL

for more details.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Monday, February 13, 2006 12:32 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> 
> Hi, Chris,
> I do have different versions of bioperl on my Linux machine (1.4. and
> 1.5.0), this may be the problem. Should I just install bioperl-1.5.1 or I
> need to uninstall and remove the previous versions. I could not find any
> hint on uninstalling bioperl on linux. Could you please give me some
> suggestion?
> Thanks,
> Guojun
> 
> Department of Plant Biology
> University of Georgia



From gyang at plantbio.uga.edu  Mon Feb 13 16:00:11 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 13 Feb 2006 16:00:11 -0500
Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
Message-ID: <20060213160011.1e89108c@dogwood.plantbio.uga.edu>

Thanks, Chris,
I installed version 1.5.1 and replaced the blast.pm file with the one from your bug report. The command you sent me reports the running version as 1.5. But when I tried the script, not much changed. Here is a portion of my remoteblast code:

sub search {
local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}= 'no';
local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
my $query = Bio::Seq -> new ( -seq=>"$_[0]",
			      -id=>"query",
			      -desc=>"new seq");
my $len=$query->length();
@db=('nr','htgs','wgs');
foreach my $db (@db) {
my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
						'-data' =>"$db",
					        '-expect'=>"$E_value");


my $blast_report = $factory->submit_blast($query);

my @rids = $factory->each_rid();
foreach my $rid ( @rids ) {
    print STDERR "$rid\n";
}
# RID = Remote Blast ID (e.g: 1017772174-16400-6638)
print STDERR "waiting...";
sleep 60;

foreach my $rid ( @rids ) {
    my $rc = $factory->retrieve_blast($rid);
    while (!ref($rc) ) {
	if( $rc < 0 ) {
# retrieve_blast returns -1 on error
	    $factory->remove_rid($rid);
	    print "Error!\n";
	    send_error($email,$function,$seqname,$queryname[$ST]);
	    die "Can't retrieve $rid";
	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
	    sleep 60;
	    $rc = $factory->retrieve_blast($rid);
	}	
    }
    if (ref($rc)) {
	print STDERR "Done.\n";
	 while( my $result = $rc->next_result) {
	    while( my $hit = $result->next_hit()) {
	    	$hit_name=$hit->name;
		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
		$name=$1;
		@left_plus_start=();
		@left_plus_end=();
		@left_minus_start=();
		@left_minus_end=();
		@right_plus_start=();
		@right_plus_end=();
		@right_minus_start=();
		@right_minus_end=();

		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
		while( my $hsp = $hit->next_hsp()) { 
......

It was working quite well until around October last year, but it has not worked since. When a submission is sent via the webpage, the CGI starts to work and uses about 20 MB of memory. Then it hangs; eventually the expected email is received, but without real results, although it does contain output from other parts of the script. Apparently the search sub did not return anything (I know that something should be returned). Is it also possible that the format of the NCBI output for each result has changed?
Thank you,
Guojun


Department of Plant Biology
University of Georgia
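
The waiting logic in a loop like the one above can be factored into a small helper with an upper bound on retries, so that a CGI process cannot wait forever. This is only a sketch under assumptions: it relies on the return convention described in the code comments above (a reference when results are ready, 0 while the job is unfinished, a negative value on error), and poll_until_ready, its options, and the stub job are hypothetical, not part of RemoteBlast.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch->() until it returns a reference (results
# ready), giving up after max_tries attempts. $fetch is assumed to follow the
# convention: reference = done, 0 = not finished yet, negative = error.
sub poll_until_ready {
    my ( $fetch, %opt ) = @_;
    my $max_tries = $opt{max_tries} || 30;
    my $delay     = defined $opt{delay} ? $opt{delay} : 60;    # seconds

    for my $try ( 1 .. $max_tries ) {
        my $rc = $fetch->();
        return $rc if ref($rc);    # finished: hand back the report object
        die "retrieval failed (code $rc)\n" if $rc < 0;
        sleep $delay if $delay && $try < $max_tries;
    }
    die "gave up after $max_tries attempts\n";
}

# With RemoteBlast this might be called as (assuming $factory and $rid exist):
#   my $rc = poll_until_ready( sub { $factory->retrieve_blast($rid) } );

# Stand-in for a remote job that becomes ready on the third poll.
my $calls = 0;
my $stub  = sub { return ++$calls < 3 ? 0 : +{ rid => 'demo' } };
my $ready = poll_until_ready( $stub, delay => 0 );
print "ready after $calls polls\n";
```

Wrapping the call in eval lets the script report a timeout or retrieval error (for example by email) instead of hanging silently.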



----- Original Message -----
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28


> How do you know two versions are installed (i.e. how are you checking the
> version)?  Do you have two complete bioperl distributions (in two
> separate directories) or are you looking in modules?  Here's the way to
> check the version (from the FAQ):
>
> perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
>
> If you have two full bioperl distributions on your computer, normally only
> one will be in use unless you have explicitly set the environment variable
> PERL5LIB.  The PERL5LIB  directories will be searched first before your
> normal perl directory list (@INC) is searched.  You MAY get some mixing
> then, but only if perl can't find a particular module in the path designated
> in PERL5LIB; then it will progress through the directories listed in @INC.
> This may happen if a module is unique to a particular release, but shouldn't
> happen for the majority of modules, including RemoteBlast.  You can check
> what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will differ
> depending on your OS, perl build, etc.
>
> Regardless, if you follow the directions for installing bioperl for your
> system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you
> explicitly change the installation directory when using 'perl Makefile.PL'),
> then 'uninstalling' Bioperl shouldn't be a problem as it will install the
> Bioperl distribution you downloaded over the old version in @INC.  See this
> page:
>
> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
>
> for more details.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > Sent: Monday, February 13, 2006 12:32 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >
> > Hi, Chris,
> > I do have different versions of bioperl on my Linux machine (1.4. and
> > 1.5.0), this may be the problem. Should I just install bioperl-1.5.1 or I
> > need to uninstall and remove the previous versions. I could not find any
> > hint on uninstalling bioperl on linux. Could you please give me some
> > suggestion?
> > Thanks,
> > Guojun
> >
> > Department of Plant Biology
> > University of Georgia
> >       _____
> >
> >   From: Chris Fields [mailto:cjfields at uiuc.edu]
> > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> > 1.28
> >
> >
> >
> > If you're using RemoteBlast 1.28, then you've likely updated from CVS
> > which isn't the latest fix.
> >
> > Make sure that you check the following:
> >
> > 1) Always post to the mailing list:
> > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> >
> > 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> > installed first.  Perform a clean installation; do not upgrade only
> > Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> > guarantee that mixing modules from old and new distributions (1.4 and
> > 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > installation will allow text output from BLAST v.2.2.12 to be saved and
> > parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13)
> > but it should still save it. I believe as long as next_result() isn't
> > called, it will work.
> >
> > 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output
> > are NOT in CVS; they haven't been cleared and checked in by Roger Hall
> > (who's now taking care of RemoteBlast) and the powers that be (Jason or
> > whomever is in charge of Bio::SearchIO).  They can be found in Bugzilla:
> >
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> >
> > The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> > saving XML output, so isn't necessary if you don't plan on using this
> > option.  And, remember, they haven't been committed yet to CVS, which
> > means that the final version will change to reflect the new version.
> >
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >     _____
> >
> >
> > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > Sent: Monday, February 13, 2006 9:26 AM
> > To: Chris Fields
> > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> > 1.28
> >
> >
> > Hi, Chris
> >
> > Thanks for your suggestion, however, it doesn't seem to work for my cgi
> > even after I replace both blast.pm and RemoteBlast.pm. I didn't even get
> > any RID. Is there any suggestion?
> >
> >
> >
> > Guojun
> >
> >
> > Guojun Yang
> > Department of Plant Biology
> > University of Georgia
> > Tel: 706-542-1857
> > Fax: 706-542-1805
> > http://www.arches.uga.edu/~guojun
> >     _____
> >
> >
> > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> > 1.28
> >
> > I would say give the new code a try, but realize that it hasn't been
> > checked
> > in (like I said below). I will try going over the modified
> > Bio::SearchIO::blast again this weekend to see if there is anything I
> > might
> > have missed. The changed order in the header of BLAST text output has me a
> > bit worried that it might not catch everything, but it at least doesn't
> > hang
> > in the while() loop I described in the bug report below (bug #1934) and
> > seems to process everything fine.
> >
> > If you want more stability in the code, you might consider changing over
> > to
> > XML output and parsing with Bio::SearchIO::blastxml. There are some
> > changes
> > in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML
> > output, but I believe it parses everything regardless. If you look back
> > the
> > last month or so there has been a bit of discussion here about it. Jason
> > describes a bit on how to set up RemoteBlast for XML:
> >
> > http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > Sent: Friday, February 03, 2006 1:45 PM
> > > To: bioperl-l at bioperl.org
> > > Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> > >
> > > Hi, Everybody,
> > > I see this post and am wondering if this is the reason for the
> > > malfunctioning of my webserver. We set up a webserver named MAK, for
> > > MITE sequence analysis. It was working very well until around November
> > > 2005, when it stopped returning any result (the site is fine and seems
> > > to be doing something after submission). In the CGI script, I used
> > > remoteblast (that work was done in 2003) to do searches. I currently do
> > > not have access to the server because I moved. Several people have sent
> > > emails to us about its malfunctioning. Is there any suggestion on fixing
> > > the problem? Should
> > > I simply ask that remoteblast.pm be replaced with the new version?
> > > Thanks a lot,
> > > Guojun
> > >
> > > Department of Plant Biology
> > > University of Georgia
> > > Tel: 706-542-1857
> > > Fax: 706-542-1805
> > > http://www.arches.uga.edu/~guojun
> > > _____
> > >
> > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-
> > > l at bioperl.org]
> > > Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >
> > > Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It will
> > > work for saving text output. However, it will not parse anything using
> > > next_result (it will likely hang) and will not save XML format. See these
> > > bugs:
> > >
> > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > >
> > > for explanations and possible fixes (changes to RemoteBlast and
> > > Bio::SearchIO::blast). Note that these haven't been checked in yet so are
> > > still not included in bioperl-live; they may be further modified before
> > > committing to CVS. If you're not worried about XML, you could just try the
> > > first fix, which is a change to SearchIO::blast.
> > >
> > > Nagesh, I remember you posting to the list a month ago using a script which
> > > had problems; the script you used saves the output but doesn't actually
> > > parse it (i.e. you don't use next_result() to go through the data). Is the
> > > version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
> > > parsing the output using "-readmethod => SearchIO" or "-readmethod => blast"
> > > using your version of RemoteBlast and method next_result()? Like below (from
> > > perldoc):
> > >
> > > while ( my @rids = $factory->each_rid ) {
> > >     foreach my $rid ( @rids ) {
> > >         my $rc = $factory->retrieve_blast($rid);
> > >         if( !ref($rc) ) {
> > >             if( $rc < 0 ) {
> > >                 $factory->remove_rid($rid);
> > >             }
> > >             print STDERR "." if ( $v > 0 );
> > >             sleep 5;
> > >         } else {                             # parsing starts here
> > >             my $result = $rc->next_result(); # it should hang here
> > >             # save the output
> > >             my $filename = $result->query_name()."\.out";
> > >             $factory->save_output($filename);
> > >             $factory->remove_rid($rid);
> > >             print "\nQuery Name: ", $result->query_name(), "\n";
> > >             while ( my $hit = $result->next_hit ) {
> > >                 next unless ( $v > 0);
> > >                 print "\thit name is ", $hit->name, "\n";
> > >                 while( my $hsp = $hit->next_hsp ) {
> > >                     print "\t\tscore is ", $hsp->score, "\n";
> > >                 }
> > >             }
> > >         }
> > >     }
> > > }
> > >
> > >
> > > My script hung if I used next_result() in any way prior to the fixes. I
> > > want to see how many others are having the same issues with parsing using
> > > the CVS version of bioperl-live.
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > > > -----Original Message-----
> > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > > > Sent: Thursday, February 02, 2006 7:24 PM
> > > > To: Huang Jian; bioperl-l
> > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > >
> > > > Hi Huang,
> > > > Thanks for the message. The older version of RemoteBlast.pm relies on
> > > > checking the temporary file size to determine whether the Blast
> > > > results are ready. This condition is no longer being satisfied, maybe
> > > > due to some changes brought about by NCBI. I had this problem recently
> > > > and figured out that the solution was to use the latest version, which
> > > > has this problem fixed (it does not use the file size logic any more)
> > > > but is not yet included in the BioPerl package.
> > > > Cheers
> > > > Nagesh
> > > >
> > > > Huang Jian wrote:
> > > >
> > > > > Dear Nagesh,
> > > > >
> > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you
> > > > > > sent me. Now it works perfectly!!!
> > > > >
> > > > > Thank you!!
> > > > >
> > > > > Huang
> > > > >
> > > > > ----- Original Message ----- From: "Nagesh Chakka"
> > > > > 
> > > > > To: "Huang Jian" ; "bioperl-l"
> > > > > 
> > > > > Sent: Friday, February 03, 2006 7:48 AM
> > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > > > > via email
> > > > >
> > > > >
> > > > > >> Hi Huang,
> > > > > >> I see that you are submitting a sequence for a remote blast search. Can
> > > > > >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09)? If
> > > > > >> not, I have attached it to this email; try replacing the old one, which
> > > > > >> has a bug, with it.
> > > > > >> Let me know if it works.
> > > > > >> Nagesh
> > > > >
> > > > >
> > > > >
> > > >



From akarger at CGR.Harvard.edu  Mon Feb 13 15:57:08 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 13 Feb 2006 15:57:08 -0500
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
Message-ID: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>

I'm trying to get the sequences of each exon in a gene. I have a genbank
file with mRNA and exon features (among others) that look like: 
     mRNA            join(complement(22257..22386),complement(22067..22186),
                     complement(16753..17101),complement(13840..13962),
                     complement(10649..10820),complement(502..3028))
                     /gene="ENSG00000005812"
                     /note="transcript_id=ENST00000355619"
     exon            complement(13840..13962)
                     /note="exon_id=ENSE00000802462"

I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
the mRNA above. I tried writing the below code, but it doesn't do what I
want. (You'll note that the code is stolen from the Bio::Seq and Feature
HOWTOs.)

use Bio::SeqIO;

# $file and $format must be set before this point
my $inseq = Bio::SeqIO->new(-file => "<$file", -format => $format);
while (my $seq = $inseq->next_seq) {
    my @features = $seq->get_SeqFeatures(); # just top level
    foreach my $feat ( @features ) {
        my $type = $feat->primary_tag;
        if ($type eq "mRNA") {
            print "Feature ",$feat->primary_tag,
                  " starts ",$feat->start," ends ", $feat->end,
                  " strand ",$feat->strand,"\n";
            my @feats = $feat->get_SeqFeatures();
            print "Found ", scalar @feats, " sub-features\n";
        } elsif ($type eq "exon") {
            print "Feature ",$feat->primary_tag,
                  " starts ",$feat->start," ends ", $feat->end,
                  " strand ",$feat->strand,"\n";
        }
    }
}

When I run the above, it says that the mRNA features have no sub-features.
So how do I pull out the 6 sequences?

Thanks,
- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626


From cjfields at uiuc.edu  Mon Feb 13 18:18:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 13 Feb 2006 17:18:24 -0600
Subject: [Bioperl-l] INSTALL.WIN in wiki
Message-ID: <000001c630f3$c9efa5f0$15327e82@pyrimidine>

I just added "Installing Bioperl on Windows" to the wiki.  It needs some
major updating and changes in formatting:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Jason has mentioned changing up some of the INSTALL docs for the wiki
(http://www.bioperl.org/wiki/Talk:Getting_BioPerl).  Any thoughts?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From osborne1 at optonline.net  Mon Feb 13 20:38:30 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 13 Feb 2006 20:38:30 -0500
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
In-Reply-To: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
Message-ID: 

Amir,

The idea is to look at the sub-locations in the SplitLocation object, this
is discussed in FAQ 5.2:

http://www.bioperl.org/wiki/FAQ#How_do_I_parse_the_CDS_join_or_complement_statements_in_GenBank_or_EMBL_files_to_get_the_sub-locations.3F

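The sequence of the feature itself can be obtained by using the seq() method
(or spliced_seq() for a split location); entire_seq() returns the whole
sequence the feature is attached to: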

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Getting_Sequences


Brian O.


On 2/13/06 3:57 PM, "Amir Karger"  wrote:

> I'm trying to get the sequences of each exon in a gene. I have a genbank
> file with mRNA and exon features (among others) that look like:
>      mRNA            join(complement(22257..22386),complement(22067..22186),
>                      complement(16753..17101),complement(13840..13962),
>                      complement(10649..10820),complement(502..3028))
>                      /gene="ENSG00000005812"
>                      /note="transcript_id=ENST00000355619"
>      exon            complement(13840..13962)
>                      /note="exon_id=ENSE00000802462"
> 
> I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
> the mRNA above. I tried writing the below code, but it doesn't do what I
> want. (You'll note that the code is stolen from the Bio::Seq and Feature
> HOWTOs.)
> 
> my $inseq = Bio::SeqIO->new(-file   => "<$file", -format => $format );
> while (my $seq = $inseq->next_seq) {
>     my @features = $seq->get_SeqFeatures(); # just top level
>     foreach my $feat ( @features ) {
>         my $type = $feat->primary_tag;
>         if ($type eq "mRNA") {
>                 print "Feature ",$feat->primary_tag,
>                       " starts ",$feat->start," ends ", $feat->end,
>                       " strand ",$feat->strand,"\n";
>                 my @feats = $feat->get_SeqFeatures();
>                 print "Found ", scalar @feats, " sub-features\n";
>         } elsif ($type eq "exon") {
>                 print "Feature ",$feat->primary_tag,
>                       " starts ",$feat->start," ends ", $feat->end,
>                       " strand ",$feat->strand,"\n";
>         }
>      }
> }
> 
> When I run the above, it says that the mRNA features have no sub-features.
> So how do I pull out the 6 sequences?
> 
> Thanks,
> - Amir Karger
> Computational Biology Group
> Bauer Center for Genomics Research
> Harvard University
> 617-496-0626




From hlapp at gmx.net  Mon Feb 13 18:58:46 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 13 Feb 2006 15:58:46 -0800
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
In-Reply-To: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
References: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
Message-ID: 

Why do you want subfeatures? This is GenBank format you're parsing,
right? Your mRNA features will have a split location. Loop over
$feat->location->each_Location() and get $seq->subseq() with the start
and end of each sublocation. If you don't know how to do this, check
out the implementation of $feature->spliced_seq().
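
A minimal sketch of that loop, written in plain Perl on a toy sequence so
that it runs without BioPerl installed. The helper names (parse_location,
revcom, exon_seqs) and the toy data are made up for illustration. In a real
script each_Location() and subseq() would take the place of the regex and
substr(), and the sketch assumes the sub-sequence comes back on the forward
strand, so complement() segments are reverse-complemented by hand. It only
handles the per-segment complement() form shown in Amir's file, not
complement(join(...)).

```perl
use strict;
use warnings;

# Split a GenBank-style location string into [start, end, strand] triples,
# in the order the segments are written.
sub parse_location {
    my ($loc) = @_;
    my @parts;
    while ( $loc =~ /(complement\()?(\d+)\.\.(\d+)/g ) {
        push @parts, [ $2, $3, defined $1 ? -1 : 1 ];
    }
    return @parts;
}

# Reverse-complement a DNA string.
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Return one sequence string per sublocation (1-based, inclusive coordinates).
sub exon_seqs {
    my ( $dna, $loc ) = @_;
    my @exons;
    for my $part ( parse_location($loc) ) {
        my ( $start, $end, $strand ) = @$part;
        my $sub = substr( $dna, $start - 1, $end - $start + 1 );
        $sub = revcom($sub) if $strand < 0;
        push @exons, $sub;
    }
    return @exons;
}

# Toy data: a 12-base "genome" and a two-exon minus-strand mRNA.
my $dna = 'AAACCCGGGTTT';
my $loc = 'join(complement(10..12),complement(4..6))';

# Print the exons as FASTA records.
my $n = 0;
for my $exon ( exon_seqs( $dna, $loc ) ) {
    $n++;
    print ">exon_$n\n$exon\n";
}
```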

This should be in the HOWTO. Is it not?

    -hilmar


On 2/13/06, Amir Karger  wrote:
> I'm trying to get the sequences of each exon in a gene. I have a genbank
> file with mRNA and exon features (among others) that look like:
>      mRNA            join(complement(22257..22386),complement(22067..22186),
>                      complement(16753..17101),complement(13840..13962),
>                      complement(10649..10820),complement(502..3028))
>                      /gene="ENSG00000005812"
>                      /note="transcript_id=ENST00000355619"
>      exon            complement(13840..13962)
>                      /note="exon_id=ENSE00000802462"
>
> I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
> the mRNA above. I tried writing the below code, but it doesn't do what I
> want. (You'll note that the code is stolen from the Bio::Seq and Feature
> HOWTOs.)
>
> my $inseq = Bio::SeqIO->new(-file   => "<$file", -format => $format );
> while (my $seq = $inseq->next_seq) {
>     my @features = $seq->get_SeqFeatures(); # just top level
>     foreach my $feat ( @features ) {
>         my $type = $feat->primary_tag;
>         if ($type eq "mRNA") {
>                 print "Feature ",$feat->primary_tag,
>                       " starts ",$feat->start," ends ", $feat->end,
>                       " strand ",$feat->strand,"\n";
>                 my @feats = $feat->get_SeqFeatures();
>                 print "Found ", scalar @feats, " sub-features\n";
>         } elsif ($type eq "exon") {
>                 print "Feature ",$feat->primary_tag,
>                       " starts ",$feat->start," ends ", $feat->end,
>                       " strand ",$feat->strand,"\n";
>         }
>      }
> }
>
> When I run the above, it says that the mRNA features have no sub-features.
> So how do I pull out the 6 sequences?
>
> Thanks,
> - Amir Karger
> Computational Biology Group
> Bauer Center for Genomics Research
> Harvard University
> 617-496-0626


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From osborne1 at optonline.net  Mon Feb 13 21:11:33 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 13 Feb 2006 21:11:33 -0500
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
In-Reply-To: 
Message-ID: 

Hilmar,

It could be spelled out a bit more explicitly.

Brian O.


On 2/13/06 6:58 PM, "Hilmar Lapp"  wrote:

> This should be in the HOWTO. Is it not?




From rmb32 at cornell.edu  Mon Feb 13 17:12:10 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 13 Feb 2006 17:12:10 -0500
Subject: [Bioperl-l] game xml SeqIO
Message-ID: <43F1043A.2000205@cornell.edu>

Hi all,

Currently, the SeqIO for doing GAME XML does not seem to support writing 
(or reading?)  elements.  Am I correct?

If I am, are there any plans to add this functionality?  Can I help / do it?

If there are plans to add this, how would one distinguish SeqFeatures 
that should be rendered as  from SeqFeatures 
that should be rendered as ?  Would we do that with 
Bio::SeqFeature::Computation?  I assume that a given Seq can have 
SeqFeatures of different types associated with it (I don't know, I'm a 
bioperl newb).

Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 607-255-2360
rmb32 at cornell.edu
http://www.sgn.cornell.edu




From heikki at sanbi.ac.za  Tue Feb 14 01:59:29 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 14 Feb 2006 08:59:29 +0200
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602100906.11885.heikki@sanbi.ac.za>
References: <200602100906.11885.heikki@sanbi.ac.za>
Message-ID: <200602140859.30136.heikki@sanbi.ac.za>

I've committed an interim solution to the sequence evolution problem:

    $newseq = Bio::SeqUtils->evolve
        ($seq, $similarity, $transition_transversion_rate);

I will go on to transform this code into a fully OO, extensible solution.
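
To make the three arguments concrete, here is a rough plain-Perl sketch of
the idea. It is not the code committed to Bio::SeqUtils. The function name
mutate_to_similarity is invented, $similarity is assumed to be a percentage,
and the third argument is assumed to mean transitions:transversions (2 means
2:1). The sketch mutates exactly the number of distinct positions needed to
reach the requested identity and accepts only A, C, G and T.

```perl
use strict;
use warnings;

# Transition partner of each base (purine<->purine, pyrimidine<->pyrimidine).
my %TRANSITION = ( A => 'G', G => 'A', C => 'T', T => 'C' );

# The two possible transversions of each base.
my %TRANSVERSION = (
    A => [qw(C T)], G => [qw(C T)],
    C => [qw(A G)], T => [qw(A G)],
);

sub mutate_to_similarity {
    my ( $dna, $similarity, $ts_tv ) = @_;
    $ts_tv = 1 unless defined $ts_tv;

    my @bases = split //, uc $dna;
    die "only A, C, G and T are handled in this sketch\n"
        if grep { !exists $TRANSITION{$_} } @bases;

    my $len   = @bases;
    my $n_mut = int( $len * ( 100 - $similarity ) / 100 + 0.5 );

    # Partial Fisher-Yates shuffle: the first $n_mut entries of @pos end up
    # as distinct, uniformly chosen positions.
    my @pos = ( 0 .. $len - 1 );
    for my $i ( 0 .. $n_mut - 1 ) {
        my $j = $i + int rand( $len - $i );
        @pos[ $i, $j ] = @pos[ $j, $i ];
    }

    # Each chosen position always changes to a different base.
    for my $p ( @pos[ 0 .. $n_mut - 1 ] ) {
        my $base = $bases[$p];
        if ( rand() < $ts_tv / ( $ts_tv + 1 ) ) {
            $bases[$p] = $TRANSITION{$base};
        }
        else {
            $bases[$p] = $TRANSVERSION{$base}[ int rand 2 ];
        }
    }
    return join '', @bases;
}

# Usage: a 40-base sequence mutated to 75% identity, transitions favoured 2:1.
my $seq     = 'ACGT' x 10;
my $mutated = mutate_to_similarity( $seq, 75, 2 );
print "$seq\n$mutated\n";
```

Because the positions are distinct and every substitution changes the base,
the result differs from the input at exactly the rounded number of sites; a
model that allows multiple hits per site would not give that guarantee.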

   -Heikki


On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
> Ryan Golhar's mail got me thinking that we should have a simple framework
> for mutating sequences to a desired level. The model can then be extended
> to necessary complexity when needed by subclassing.
>
> To start with, I have been planning:
>
>
> Bio::SeqEvolution::EvolutionI - interface file
> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>         (defaults to Bio::PrimarySeq)
> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
> Bio::SeqEvolution::EvolutionI::each_seqs($count)
>        - returns an array of $count seqs
> Bio::SeqEvolution::EvolutionI::_generate_seq()
> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
>       converted to probabilities of change internally
>
>   various methods to define the extent of divergence:
>   only one to start with:
> Bio::SeqEvolution::EvolutionI::pam() - percentage accepted mutation
>    (= 100% - identity)
>
> Bio::SeqEvolution::Factory - core class to call,
>          instantiates subclasses, Bio::SeqEvolution::DNASimple for
>          nucleotides
> Bio::SeqEvolution::EvolutionI::type() - evolution model,
>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
>
>
> Bio::SeqEvolution::DNASimple - default for nucleotides
> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>         e.g. 5 => 5:1, defaults to 1:1
>         simple alternative to a scoring matrix
>
>
> I am soliciting usual comments and suggestions about naming and minimal
> functionality.
>
>
>    -Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From gbazykin at Princeton.EDU  Tue Feb 14 09:34:54 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Tue, 14 Feb 2006 09:34:54 -0500
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602140859.30136.heikki@sanbi.ac.za>
References: <200602100906.11885.heikki@sanbi.ac.za>
	<200602140859.30136.heikki@sanbi.ac.za>
Message-ID: <214316262.20060214093454@princeton.edu>

Hi,

Just a thought: I really think that, in the long run, it would be nice
to be able to evolve the sequence along a tree of a given shape. I think
PAML's "evolver" has this functionality. I've already been doing this
in my scripts, but I am not sure how to couple the tree and the
sequence data properly.

Yegor (George) Bazykin


------------------------------
Tuesday, February 14, 2006, 1:59:29 AM, you wrote:

> I've committed an interim solution to the sequence evolution problem:

>     $newseq = Bio::SeqUtils-> evolve
>         ($seq, $similarity, $transition_transversion_rate);

> I will go on to transform this code to fully OO, extensible solution.

>    -Heikki


> On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
>> Ryan Golhar's mail got me thinking that we should have a simple framework
>> for mutating sequences to a desired level. The model can then be extended
>> to necessary complexity when needed by subclassing.
>>
>> To start with, I have been planning:
>>
>>
>> Bio::SeqEvolution::EvolutionI - interface file
>> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
>> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>>         (defaults to Bio::PrimarySeq)
>> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
>> Bio::SeqEvolution::EvolutionI::each_seqs($count)
>>        - returns an array of $count seqs
>> Bio::SeqEvolution::EvolutionI::_generate_seq()
>> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
>>       converted to probabilities of change internally
>>
>>   various methods to define the extent of divergence:
>>   only one to start with:
>> Bio::SeqEvolution::EvolutionI::pam() - percentage accepted mutation
>>    (= 100% - identity)
>>
>> Bio::SeqEvolution::Factory - core class to call,
>>          instantiates subclasses, Bio::SeqEvolution::DNASimple for
>>          nucleotides
>> Bio::SeqEvolution::EvolutionI::type() - evolution model,
>>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
>>
>>
>> Bio::SeqEvolution::DNASimple - default for nucleotides
>> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>>         e.g. 5 => 5:1, defaults to 1:1
>>         simple alternative to a scoring matrix
>>
>>
>> I am soliciting usual comments and suggestions about naming and minimal
>> functionality.
>>
>>
>>    -Heikki




From maximilianh at gmail.com  Tue Feb 14 05:11:42 2006
From: maximilianh at gmail.com (Maximilian Haeussler)
Date: Tue, 14 Feb 2006 11:11:42 +0100
Subject: [Bioperl-l] [BiO BB] Re:  Tool to mutate DNA sequence
In-Reply-To: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
Message-ID: <76f031ae0602140211n2a0bbf4fl@mail.gmail.com>

The tool ROSE also evolves sequences on a tree. There is a web
interface and downloadable source at
http://bibiserv.techfak.uni-bielefeld.de/rose/

Max

On 09/02/06, Jason Stajich  wrote:
> Depending on whether or not you want to use evolutionary realistic
> models...
> * evolver which comes with PAML lets you evolve sequences on a tree
> * SeqGen from Andrew Rambaut
>   http://evolve.zoo.ox.ac.uk/software.html?id=seqgen
>   also lets you do this
> I believe there are PISE interfaces to both of these at the pasteur
> bioweb site - http://bioweb.pasteur.fr/
>
> -jason
> On Feb 8, 2006, at 11:46 PM, Ryan Golhar wrote:
>
> > Does anyone know of tool to mutate a DNA sequence by a specified
> > amount?
> > For instance, say I have a DNA sequence 1000 bases long, and I want to
> > simulate mutations to make it 75% (or 80%, etc) similar to the
> > original.
> >
> >
> > Ryan
> >
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioinformatics.Org general forum  -  BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>


--
Maximilian Haeussler,
CNRS Gif-sur-Yvette, Paris
tel: +33 6 12 82 76 16
icq: 3825815  -- msn: maximilian.haeussler at hpi.uni-potsdam.de
skype: maximilianhaeussler



From heikki at sanbi.ac.za  Tue Feb 14 11:09:27 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 14 Feb 2006 18:09:27 +0200
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <214316262.20060214093454@princeton.edu>
References: <200602100906.11885.heikki@sanbi.ac.za>
	<200602140859.30136.heikki@sanbi.ac.za>
	<214316262.20060214093454@princeton.edu>
Message-ID: <200602141809.28057.heikki@sanbi.ac.za>


Yegor,

Like you said, there are examples of how it is done. It should be possible to
evolve sequences based on a rooted tree. You just walk the tree and evolve
each sequence from its parent. If there is an agreement on how the branch
lengths get translated to mutations, even that could be done. Do you have
any suggestions?
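
A rough sketch of that tree walk in plain Perl. The tree is a nested hash
invented for the example (a real version would presumably walk a
Bio::Tree::Tree instead), and the translation of branch lengths to mutations
is only one possible convention: branch lengths are read as expected
substitutions per site under the Jukes-Cantor model, so a site differs from
its parent with probability 3/4 * (1 - exp(-4d/3)). Input is assumed to be
A, C, G and T only.

```perl
use strict;
use warnings;

my @BASES = qw(A C G T);

# Evolve one sequence along one branch of length $d (substitutions per site),
# using the Jukes-Cantor probability that a site ends up different.
sub evolve_branch {
    my ( $dna, $d ) = @_;
    my $p   = 0.75 * ( 1 - exp( -4 * $d / 3 ) );
    my $out = '';
    for my $base ( split //, $dna ) {
        my $new = $base;
        if ( rand() < $p ) {
            # Pick one of the three other bases with equal probability.
            my @others = grep { $_ ne $base } @BASES;
            $new = $others[ int rand @others ];
        }
        $out .= $new;
    }
    return $out;
}

# Walk the tree from the root; every node's sequence is evolved from the
# sequence of its parent. Results are collected by node name.
sub evolve_tree {
    my ( $node, $parent_seq, $result ) = @_;
    my $seq = evolve_branch( $parent_seq, $node->{length} || 0 );
    $result->{ $node->{name} } = $seq;
    evolve_tree( $_, $seq, $result ) for @{ $node->{children} || [] };
    return $result;
}

# Toy tree, equivalent to ((A:0.1,B:0.2)AB:0.05,C:0.3)root;
my $tree = {
    name     => 'root',
    length   => 0,
    children => [
        {   name     => 'AB',
            length   => 0.05,
            children => [
                { name => 'A', length => 0.1 },
                { name => 'B', length => 0.2 },
            ],
        },
        { name => 'C', length => 0.3 },
    ],
};

my $seqs = evolve_tree( $tree, 'ACGT' x 10, {} );
printf "%-5s %s\n", $_, $seqs->{$_} for sort keys %$seqs;
```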

	-Heikki



On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
> Hi,
>
> Just a thought: I really think that in perspective, it would be nice
> to be able to evolve the sequence along a tree of given shape. I think
> PAML's "evolver" has this functionality. I've already been doing this
> in my scripts, but I am not sure how to couple the tree and the
> sequence data properly.
>
> Yegor (George) Bazykin
>
>
> ------------------------------
>
> Tuesday, February 14, 2006, 1:59:29 AM, you wrote:
> > I've committed an interim solution to the sequence evolution problem:
> >
> >     $newseq = Bio::SeqUtils-> evolve
> >         ($seq, $similarity, $transition_transversion_rate);
> >
> > I will go on to transform this code to fully OO, extensible solution.
> >
> >    -Heikki
> >
> > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
> >> Ryan Golhar's mail got me thinking that we should have a simple
> >> framework for mutating sequences to a desired level. The model can then
> >> be extended to necessary complexity when needed by subclassing.
> >>
> >> To start with, I have been planning:
> >>
> >>
> >> Bio::SeqEvolution::EvolutionI - interface file
> >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
> >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
> >>         (defaults to Bio::PrimarySeq)
> >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
> >> Bio::SeqEvolution::EvolutionI::each_seqs($count)
> >>        - returns an array of $count seqs
> >> Bio::SeqEvolution::EvolutionI::_generate_seq()
> >> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
> >>       converted to probabilities of change internally
> >>
> >>   various methods to define the extent of divergence:
> >>   only one to start with:
> >> Bio::SeqEvolution::EvolutionI::pam() - percentage accepted mutation
> >>    (= 100% - identity)
> >>
> >> Bio::SeqEvolution::Factory - core class to call,
> >>          instantiates subclasses, Bio::SeqEvolution::DNASimple for
> >>          nucleotides
> >> Bio::SeqEvolution::EvolutionI::type() - evolution model,
> >>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
> >>
> >>
> >> Bio::SeqEvolution::DNASimple - default for nucleotides
> >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
> >>         e.g. 5 => 5:1, defaults to 1:1
> >>         simple alternative to a scoring matrix
> >>
> >>
> >> I am soliciting usual comments and suggestions about naming and minimal
> >> functionality.
> >>
> >>
> >>    -Heikki
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From golharam at umdnj.edu  Tue Feb 14 12:01:38 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 14 Feb 2006 12:01:38 -0500
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602141809.28057.heikki@sanbi.ac.za>
Message-ID: <016401c63188$52c9d4b0$2f01a8c0@GOLHARMOBILE1>

Here are my two cents....

1.  Allow sequences to be mutated by some percent amount.
2.  Use mutation patterns implied by PAM matrices or some known models
of mutation.
3.  Have the output show the original sequences and the mutated sequence
so you can easily identify what was mutated and what is conserved.
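
For point 3, one simple way to show this might be a marker line between the
two sequences. The sketch below is plain Perl; the function name marker_line
and the example sequences are made up, and it assumes substitutions only, so
both sequences have the same length.

```perl
use strict;
use warnings;

# Build a marker line: '|' where the two sequences agree, '*' where they differ.
sub marker_line {
    my ( $orig, $mut ) = @_;
    die "sequences must have equal length\n"
        unless length($orig) == length($mut);
    my $line = '';
    for my $i ( 0 .. length($orig) - 1 ) {
        $line .= substr( $orig, $i, 1 ) eq substr( $mut, $i, 1 ) ? '|' : '*';
    }
    return $line;
}

# Usage: print the original, the marker line and the mutated sequence.
my ( $orig, $mut ) = ( 'ACGTACGT', 'ACGAACTT' );
my $marks   = marker_line( $orig, $mut );
my $changed = ( $marks =~ tr/*// );    # count the '*' markers

print "original $orig\n";
print "         $marks\n";
print "mutated  $mut\n";
print "$changed of ", length($orig), " positions changed\n";
```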

Ryan


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki
Lehvaslaiho
Sent: Tuesday, February 14, 2006 11:09 AM
To: bioperl-l at lists.open-bio.org; Georgii A Bazykin
Subject: Re: [Bioperl-l] planning sequence mutating modules



Yegor,

Like you said, there are examples how it is done.. It should be possible to
evolve sequences based on a rooted tree. You just walk the tree and evolve
each sequence from its parent.  If there is  an agreement how the branch
lengths get translated to  mutations, even that could be done. Do you have
any suggestions?

	-Heikki



On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
> Hi,
>
> Just a thought: I really think that in perspective, it would be nice 
> to be able to evolve the sequence along a tree of given shape. I think

> PAML's "evolver" has this functionality. I've already been doing this 
> in my scripts, but I am not sure how to couple the tree and the 
> sequence data properly.
>
> Yegor (George) Bazykin
>
>
> ------------------------------
>
> Tuesday, February 14, 2006, 1:59:29 AM, you wrote:
> > I've committed an interim solution to the sequence evolution 
> > problem:
> >
> >     $newseq = Bio::SeqUtils-> evolve
> >         ($seq, $similarity, $transition_transversion_rate);
> >
> > I will go on to transform this code to fully OO, extensible 
> > solution.
> >
> >    -Heikki
> >
> > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
> >> Ryan Golhar's mail got me thinking that we should have a simple 
> >> framework for mutating sequences to a desired level. The model can 
> >> then be extended to necessary complexity when needed by 
> >> subclassing.
> >>
> >> To start with, I have been planning:
> >>
> >>
> >> Bio::SeqEvolution::EvolutionI - interface file
> >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
> >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
> >>         (defaults to Bio::PrimarySeq)
> >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by 
> >> subclasses
> >> Bio::SeqEvolution::EvolutionI::each_seqs($count)
> >>        - returns an array of $count seqs
> >> Bio::SeqEvolution::EvolutionI::_generate_seq()
> >> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
> >>       converted to probabilities of change internally
> >>
> >>   various methods to define the extent of divergence:
> >>   only one to start with:
> >> Bio::SeqEvolution::EvolutionI::pam() - percentage accepted mutation
> >>    (= 100% - identity)
> >>
> >> Bio::SeqEvolution::Factory - core class to call,
> >>          instantiates subclasses, Bio::SeqEvolution::DNASimple for
> >>          nucleotides
> >> Bio::SeqEvolution::EvolutionI::type() - evolution model,
> >>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
> >>
> >>
> >> Bio::SeqEvolution::DNASimple - default for nucleotides 
> >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
> >>         e.g. 5 => 5:1, defaults to 1:1
> >>         simple alternative to a scoring matrix
> >>
> >>
> >> I am soliciting usual comments and suggestions about naming and 
> >> minimal functionality.
> >>
> >>
> >>    -Heikki
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________



From hjm at tacgi.com  Tue Feb 14 12:15:11 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Tue, 14 Feb 2006 09:15:11 -0800
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: 
References: 
Message-ID: <200602140915.11604.hjm@tacgi.com>

Hi Brian,

Thanks very much for the pointers and the speed of your reply and apologies 
for the speed of mine.

This looks good, but what I was looking for was a bioP approach for hooking to 
an API at NCBI or EBI so I could get this info and seqs from them.  In this 
case, speed of retrieval is not critical and I'd rather not download the 
entirety of the sequences to a local disk to hack at them.

I've determined a screen-scraping approach to get them and could script that,
but I thought that bioP had a method for using NCBI's external APIs, tho it
may be that my memory is faulty or the approach is no longer supported due to
overload.

Does NCBI make such APIs available anymore?  I searched a bit for docs on them
but couldn't find anything (unless it's buried in the NCBI toolkit, which I
haven't started to excavate).

Failing that, would SEALS provide such a service? Any PerlPinipeds listening?

Harry






On Sunday 12 February 2006 08:37, Brian Osborne wrote:
> Harry,
>
> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
> from its documentation:
>
>   use Bio::DB::Fasta;
>
>   # create database from directory of fasta files
>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>
>   # simple access (for those without Bioperl)
>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>   my @ids     = $db->ids;
>   my $length   = $db->length('CHROMOSOME_I');
>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>   my $header   = $db->header('CHROMOSOME_I');
>
>   # Bioperl-style access
>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>
>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>   my $seq     = $obj->seq;
>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>
> Do you already have the offsets?
>
> Brian O.
>
> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
> > Hi All,
> >
> > After perusing the tutorial and other docs for an evening, I still
> > can't find the answer to this.  Forgive me if I've missed something
> > obvious.
> >
> > This should not be a novel request, but I've not found it answered.  If
> > bioperl isn't the best way to do this, I'd be grateful to a pointer to a
> > better way, especially if it includes an illuminating bit of code.
> >
> > The problem is to retrieve genomic sequences plus & minus some offset
> > from a locus determined by HUGO keyword or GeneID.  This would be a
> > common followup chore for some extra analysis from a gene expression
> > expt.  Or maybe this is in the DBFetch routines, but I've missed the
> > sequence type to specify...?
> >
> >
> > TIA!

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 


From jason.stajich at duke.edu  Tue Feb 14 13:25:21 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 14 Feb 2006 13:25:21 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
References: 
	<200602140915.11604.hjm@tacgi.com>
Message-ID: <13B3724F-3716-4C4B-95A7-6849EF167A80@duke.edu>

Are you working with species that are in Ensembl?  Is what you need not
provided by Ensembl/EnsMart? Seems like they are doing the best job of
integrating gene ids in a central place.

It is not exactly clear what API you are referring to - you can query  
Entrez via Bio::DB::Query::GenBank so if you can construct your query  
via the Entrez syntax you can access and retrieve it in bioperl.
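
A rough sketch of that route, in case it helps. The helper names and the choice of the [Gene Name] and [Organism] Entrez field tags are my own assumptions for illustration, and BioPerl is loaded lazily so the query-building part can be tried without it. Note that this returns whatever nucleotide records Entrez matches for the gene; it does not by itself cut out a genomic window around a locus.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an Entrez query string from a gene symbol and an organism name.
# The helper name is hypothetical; the field tags are ordinary Entrez qualifiers.
sub build_entrez_query {
    my ($symbol, $organism) = @_;
    return sprintf '%s[Gene Name] AND %s[Organism]', $symbol, $organism;
}

# Run the query through BioPerl and return up to $max sequence objects.
# BioPerl is required at run time so the helper above works without it.
sub fetch_by_query {
    my ($query_string, $max) = @_;
    require Bio::DB::Query::GenBank;
    require Bio::DB::GenBank;

    my $query = Bio::DB::Query::GenBank->new(
        -db    => 'nucleotide',
        -query => $query_string,
    );
    my $stream = Bio::DB::GenBank->new->get_Stream_by_query($query);

    my @seqs;
    while ( my $seq = $stream->next_seq ) {
        push @seqs, $seq;
        last if defined $max && @seqs >= $max;
    }
    return @seqs;
}

# Only contact NCBI when a gene symbol is given on the command line.
if (@ARGV) {
    my $q = build_entrez_query( $ARGV[0], $ARGV[1] || 'Homo sapiens' );
    print STDERR "Entrez query: $q\n";
    for my $seq ( fetch_by_query( $q, 5 ) ) {
        printf "%s\t%d bp\n", $seq->accession_number, $seq->length;
    }
}
```

Run it as, say, 'perl fetch_gene.pl BRCA1' (the script name is arbitrary); with no arguments it does nothing, so the query string can be checked separately.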

-jason
On Feb 14, 2006, at 12:15 PM, Harry Mangalam wrote:

> Hi Brian,
>
> Thanks very much for the pointers and the speed of your reply and  
> apologies
> for the speed of mine.
>
> This looks good, but what I was looking for was a bioP approach for  
> hooking to
> an API at NCBI or EBI so I could get this info and seqs from them.   
> In this
> case, speed of retrieval is not critical and I'd rather not  
> download the
> entirety of the sequences to a local disk to hack at them.
>
> I've determined a screen-scraping approach to get them and could  
> script that,
> but I thought that bioP had a method for using NCBI's external  
> APIs, tho it
> may be that my memory is faulty or the approach is no longer  
> supported due to
> overload.
>
> Does NCBI make such APIs available anymore?  I searched a bit for  
> docs on them
> but couldn't find anything (unless it's buried in the NCBI toolkit,  
> which I
> haven't started to excavate).
>
> Failing that, would SEALS provide such a service? Any PerlPinipeds  
> listening?
>
> Harry
>
>
>
>
>
>
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>>
>> Hope you're doing well. The approach could be based on  
>> Bio::DB::Fasta. So,
>> from its documentation:
>>
>>   use Bio::DB::Fasta;
>>
>>   # create database from directory of fasta files
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>
>>   # simple access (for those without Bioperl)
>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>   my @ids     = $db->ids;
>>   my $length   = $db->length('CHROMOSOME_I');
>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>   my $header   = $db->header('CHROMOSOME_I');
>>
>>   # Bioperl-style access
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>
>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>   my $seq     = $obj->seq;
>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>
>> Do you already have the offsets?
>>
>> Brian O.
>>
>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>> Hi All,
>>>
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this.  Forgive me if I've missed something
>>> obvious.
>>>
>>> This should not be a novel request, but I've not found it  
>>> answered.  If
>>> bioperl isn't the best way to do this, I'd be grateful to a  
>>> pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>>
>>> The problem is to retrieve genomic sequences plus & minus some  
>>> offset
>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>>
>>>
>>> TIA!
>
> -- 
> Cheers, Harry
> Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Tue Feb 14 13:40:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 14 Feb 2006 12:40:31 -0600
Subject: [Bioperl-l] FW:  more on RemoteBlast.pm version 1.2
Message-ID: <000e01c63196$225159d0$15327e82@pyrimidine>

Sorry, forgot to add that I didn't see the regex issue that you mentioned.
It could be a perl-related issue.  Try the fixes I mentioned and see what
happens.
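
For what it's worth, the two checks discussed in the forwarded message below (dashed option names and the program-name pattern) can be sketched as a small stand-alone test. This is only an illustration: check_blast_params is a made-up helper, not part of RemoteBlast, and the anchored pattern is my assumption based on the 't?blast[pnx]' text in the error message.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a list of problems found in a RemoteBlast-style parameter list.
# An empty list means nothing obviously wrong was found.
# (Hypothetical helper, for illustration only.)
sub check_blast_params {
    my %param = @_;
    my @problems;

    # Every option name should carry a leading dash, e.g. -prog, -data.
    for my $key ( sort keys %param ) {
        push @problems, "option '$key' is missing its leading dash"
            unless $key =~ /^-/;
    }

    # Pattern taken from the error message; it accepts blastp, blastn,
    # blastx, tblastn and tblastx (and, strictly speaking, 'tblastp' too).
    my $prog = $param{'-prog'};
    if ( !defined $prog ) {
        push @problems, 'no -prog given';
    }
    elsif ( $prog !~ /^t?blast[pnx]$/ ) {
        push @problems, "program '$prog' does not match t?blast[pnx]";
    }
    return @problems;
}

# Undashed keys, as in the script that failed, versus the corrected form.
my @bad  = check_blast_params(  prog => 'blastp',  data => 'swissprot' );
my @good = check_blast_params( -prog => 'blastp', -data => 'swissprot' );
print "$_\n" for @bad;
```

Note that quoting matters as well: under 'use strict' the values have to be written as 'blastp' and 'swissprot', while the option names before '=>' are quoted automatically.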

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Tuesday, February 14, 2006 12:36 PM
> To: 'gyang at plantbio.uga.edu'
> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> 
> It's a good habit to always add single quotes around words.  The perl
> interpreter may think a single bare word is a subroutine or perlfunc
> called with no args so will try to find a subroutine named blastp().  My
> debugger actually gives the error that the bare word blastp may conflict
> with a future reserved word.  Like you said, 'use strict' will point that
> out.
> 
> As for the regex, it should match all the blast programs at NCBI (blastp,
> blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
> else passes through.
> 
> So, if you are using the script below, there are several errors.  The bare
> words for $prog and $db need quotes, and the flags for your @params array
> don't have a dash before them.  I get this after adding quotes but before
> adding the dashes to @params:
> 
> C:\Perl\Scripts>test_blast.pl
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG:
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> STACK: C:\Perl\Scripts\test_blast.pl:15
> -----------------------------------------------------------
> 
> The last line indicates a problem with this line:
> 
> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> 
> Changing the @params to this:
> 
> my @params=( -prog=>$prog,
> 	-data=>$db,
> 	-expect=>$e_val,
> 	-readmethod=>'SearchIO');
> 
> fixes it, and I get output as expected.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> > -----Original Message-----
> > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > Sent: Tuesday, February 14, 2006 11:48 AM
> > To: Chris Fields; bioperl-l at lists.open-bio.org
> > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >
> > Hi, Chris,
> > When I tried with the perldoc script, it did not work either. First it
> > says $prog can not be bare word if I "use strict". I added quotes on the
> > words, then it says the value for $prog does not match expression
> > t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The script
> > is shown below. Why is the expression "t?blast[pnx]"?
> >
> > #!/usr/bin/perl
> >
> > use Bio::SeqIO;
> > use Bio::Seq;
> > use Bio::Tools::Run::RemoteBlast;
> > use Bio::SearchIO;
> >
> >
> > my $prog=blastp;
> > my $db=swissprot;
> > my $e_val=1e-10;
> > my @params=( prog=>$prog,
> > 	data=>$db,
> > 	expect=>$e_val,
> > 	readmethod=>'SearchIO');
> > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >
> > my $v = 1;
> >
> > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> >
> > while (my $input = $str->next_seq()){
> >   #Blast a sequence against a database:
> >   #Alternatively, you could  pass in a file with many
> >   #sequences rather than loop through sequence one at a time
> >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> >   #and swap the two lines below for an example of that.
> >   my $r = $factory->submit_blast($input);
> >   #my $r = $factory->submit_blast('amino.fa');
> >   print STDERR "waiting..." if( $v > 0 );
> >   while ( my @rids = $factory->each_rid ) {
> >     foreach my $rid ( @rids ) {
> >       my $rc = $factory->retrieve_blast($rid);
> >       if( !ref($rc) ) {
> >         if( $rc < 0 ) {
> >           $factory->remove_rid($rid);
> >         }
> >         print STDERR "." if ( $v > 0 );
> >         sleep 5;
> >       } else {
> >         my $result = $rc->next_result();
> >         #save the output
> >         my $filename = $result->query_name()."\.out";
> >         $factory->save_output($filename);
> >         $factory->remove_rid($rid);
> >         print "\nQuery Name: ", $result->query_name(), "\n";
> >         while ( my $hit = $result->next_hit ) {
> >           next unless ( $v > 0);
> >           print "\thit name is ", $hit->name, "\n";
> >           while( my $hsp = $hit->next_hsp ) {
> >             print "\t\tscore is ", $hsp->score, "\n";
> >           }
> >         }
> >       }
> >     }
> >   }
> > }
> >
> > Thank you for your help!
> >
> >
> > Guojun
> > Department of Plant Biology
> > University of Georgia
> >
> > ----- Original Message -----
> > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > To: gyang at plantbio.uga.edu
> > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >
> >
> > > Try two things:
> > >
> > > 1)  Use a much simpler script, like the one in 'perldoc
> > > Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something wrong
> > > with the logic in your subroutine:
> > >
> > > my $v = 1;
> > >
> > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >
> > > while (my $input = $str->next_seq()){
> > >   #Blast a sequence against a database:
> > >   #Alternatively, you could  pass in a file with many
> > >   #sequences rather than loop through sequence one at a time
> > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >   #and swap the two lines below for an example of that.
> > >   my $r = $factory->submit_blast($input);
> > >   #my $r = $factory->submit_blast('amino.fa');
> > >   print STDERR "waiting..." if( $v > 0 );
> > >   while ( my @rids = $factory->each_rid ) {
> > >     foreach my $rid ( @rids ) {
> > >       my $rc = $factory->retrieve_blast($rid);
> > >       if( !ref($rc) ) {
> > >         if( $rc < 0 ) {
> > >           $factory->remove_rid($rid);
> > >         }
> > >         print STDERR "." if ( $v > 0 );
> > >         sleep 5;
> > >       } else {
> > >         my $result = $rc->next_result();
> > >         #save the output
> > >         my $filename = $result->query_name()."\.out";
> > >         $factory->save_output($filename);
> > >         $factory->remove_rid($rid);
> > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > >         while ( my $hit = $result->next_hit ) {
> > >           next unless ( $v > 0);
> > >           print "\thit name is ", $hit->name, "\n";
> > >           while( my $hsp = $hit->next_hsp ) {
> > >             print "\t\tscore is ", $hsp->score, "\n";
> > >           }
> > >         }
> > >       }
> > >     }
> > >   }
> > > }
> > >
> > > 2) Try the RemoteBlast from Bugzilla and see if that works.  It really
> > > shouldn't make that much of a difference, but I noticed that the CVS
> > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> > > released; the Bugzilla version is based off CVS.
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > > > > -----Original Message-----
> > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > Sent: Monday, February 13, 2006 3:00 PM
> > > > > To: bioperl-l at lists.open-bio.org
> > > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > >
> > > > > Thanks, Chris,
> > > > > I installed version 1.5.1 and replaced the blast.pm file with the one from
> > > > > your bug report. The running version is 1.5 when I use the command you
> > > > > sent me. But when I tried the script, it doesn't change much. My
> > > > > remoteblast code (portion) is here:
> > > > >
> > > > > sub search {
> > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
> > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > > > 			      -id=>"query",
> > > > 			      -desc=>"new seq");
> > > > my $len=$query->length();
> > > > @db=('nr','htgs','wgs');
> > > > foreach my $db (@db) {
> > > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> > > > > 						'-data' =>"$db",
> > > > > 						'-expect'=>"$E_value");
> > > > >
> > > > > my $blast_report = $factory->submit_blast($query);
> > > > >
> > > > > my @rids = $factory->each_rid();
> > > > > foreach my $rid ( @rids ) {
> > > > >     print STDERR "$rid\n";
> > > > > }
> > > > > # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > > > > print STDERR "waiting...";
> > > > > sleep 60;
> > > > >
> > > > > foreach my $rid ( @rids ) {
> > > >     my $rc = $factory->retrieve_blast($rid);
> > > >     while (!ref($rc) ) {
> > > > 	if( $rc < 0 ) {
> > > > # retrieve_blast returns -1 on error
> > > > 	    $factory->remove_rid($rid);
> > > > 	    print "Error!\n";
> > > > 	    send_error($email,$function,$seqname,$queryname[$ST]);
> > > > 	    die "Can't retrieve $rid";
> > > > 	} if ($rc==0) { # retrieve_blast returns 0 on 'job not
> finished'
> > > > 	    sleep 60;
> > > > 	    $rc = $factory->retrieve_blast($rid);
> > > > 	}
> > > >     }
> > > >     if (ref($rc)) {
> > > > 	print STDERR "Done.\n";
> > > > 	 while( my $result = $rc->next_result) {
> > > > 	    while( my $hit = $result->next_hit()) {
> > > > 	    	$hit_name=$hit->name;
> > > > 		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > > > 		$name=$1;
> > > > 		@left_plus_start=();
> > > > 		@left_plus_end=();
> > > > 		@left_minus_start=();
> > > > 		@left_minus_end=();
> > > > 		@right_plus_start=();
> > > > 		@right_plus_end=();
> > > > 		@right_minus_start=();
> > > > 		@right_minus_end=();
> > > > > > 		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > > > 		while( my $hsp = $hit->next_hsp()) {
> > > > ......
> > > > >
> > > > > It was working quite well until around October last year, but it has
> > > > > stopped since then. When a submission is sent via a webpage, the cgi
> > > > > starts to work and uses about 20 Mb of memory. Then it hangs there;
> > > > > finally the expected email is received but without real results, although
> > > > > it does contain something from other parts of the script. Apparently the
> > > > > search sub did not return anything (I know there is something that should
> > > > > be returned). Is it also possible the format of the NCBI output for each
> > > > > result has changed?
> > > > > Thank you,
> > > > > Guojun
> > > > >
> > > > > Department of Plant Biology
> > > > > University of Georgia
> > > > >
> > > > > ----- Original Message -----
> > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > >
> > > > > > How do you know two versions are installed (i.e. how are you checking
> > > > > > the version)?  Do you have two complete bioperl distributions (in two
> > > > > > separate directories) or are you looking in modules?  Here's the way to
> > > > > > check the version (from the FAQ):
> > > > > >
> > > > > > perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> > > > > >
> > > > > > If you have two full bioperl distributions on your computer, normally
> > > > > > only one will be in use unless you have explicitly set the environment
> > > > > > variable PERL5LIB.  The PERL5LIB directories will be searched first,
> > > > > > before your normal perl directory list (@INC) is searched.  You MAY get
> > > > > > some mixing then, but only if perl can't find a particular module in the
> > > > > > path designated in PERL5LIB; then it will progress through the
> > > > > > directories listed in @INC.  This may happen if a module is unique to a
> > > > > > particular release, but shouldn't happen for the majority of modules,
> > > > > > including RemoteBlast.  You can check what @INC and PERL5LIB are set to
> > > > > > by using 'perl -V'.  @INC will differ depending on your OS, perl build,
> > > > > > etc.
> > > > > >
> > > > > > Regardless, if you follow the directions for installing bioperl for your
> > > > > > system ('perl Makefile.PL', 'make', 'make test', 'make install', unless
> > > > > > you explicitly change the installation directory when using 'perl
> > > > > > Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a problem as it
> > > > > > will install the Bioperl distribution you downloaded over the old version
> > > > > > in @INC.  See this page:
> > > > > >
> > > > > > http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > > > > >
> > > > > > for more details.
> > > > > >
> > > > > > Christopher Fields
> > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > Dept. of Biochemistry
> > > > > > University of Illinois Urbana-Champaign
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > Sent: Monday, February 13, 2006 12:32 PM
> > > > > > > To: bioperl-l at lists.open-bio.org
> > > > > > > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > >
> > > > > > > Hi, Chris,
> > > > > > > I do have different versions of bioperl on my Linux machine (1.4 and
> > > > > > > 1.5.0); this may be the problem. Should I just install bioperl-1.5.1,
> > > > > > > or do I need to uninstall and remove the previous versions first? I
> > > > > > > could not find any hint on uninstalling bioperl on Linux. Could you
> > > > > > > please give me some suggestions?
> > > > > > > Thanks,
> > > > > > > Guojun
> > > > > > >
> > > > > > > Department of Plant Biology
> > > > > > > University of Georgia
> > > > > > >       _____
> > > > > > >
> > > > > > >   From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > > > Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> > > > > > > 1.28
> > > > > > >
> > > > > > > If you're using RemoteBlast 1.28, then you've likely updated from CVS,
> > > > > > > which isn't the latest fix.
> > > > > > >
> > > > > > > Make sure that you check the following:
> > > > > > >
> > > > > > > 1) Always post to the mailing list:
> > > > > > > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > > > > > >
> > > > > > > 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> > > > > > > installed first.  Perform a clean installation; do not upgrade only
> > > > > > > Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> > > > > > > guarantee that mixing modules from old and new distributions (1.4 and
> > > > > > > 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > > > > > > installation will allow text output from BLAST v.2.2.12 to be saved and
> > > > > > > parsed; it will not parse the newest BLAST text output from NCBI
> > > > > > > (v2.2.13) but it should still save it. I believe as long as
> > > > > > > next_result() isn't called, it will work.
> > > > > > >
> > > > > > > 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text
> > > > > > > output are NOT in CVS; they haven't been cleared and checked in by Roger
> > > > > > > Hall (who's now taking care of RemoteBlast) and the powers that be
> > > > > > > (Jason or whoever is in charge of Bio::SearchIO).  They can be found in
> > > > > > > Bugzilla:
> > > > > > >
> > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > >
> > > > > > > The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> > > > > > > saving XML output, so isn't necessary if you don't plan on using this
> > > > > > > option.  And, remember, they haven't been committed yet to CVS, which
> > > > > > > means that the final version will change to reflect the new version.
> > > > > > >
> > > > > > > Christopher Fields
> > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > Dept. of Biochemistry
> > > > > > > University of Illinois Urbana-Champaign
> > > > > > >
> > > > > > >     _____
> > > > > > >
> > > > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > > > > > Sent: Monday, February 13, 2006 9:26 AM
> > > > > > > To: Chris Fields
> > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> > > > > > > 1.28
> > > > > > >
> > > > > > > Hi, Chris
> > > > > > >
> > > > > > > Thanks for your suggestion; however, it doesn't seem to work for my cgi
> > > > > > > even after I replace both blast.pm and RemoteBlast.pm. I didn't even
> > > > > > > get any RID. Is there any suggestion?
> > > > > > >
> > > > > > > Guojun
> > > > > > > > > > Guojun Yang
> > > > > > Department of Plant Biology
> > > > > > University of Georgia
> > > > > > Tel: 706-542-1857
> > > > > > Fax: 706-542-1805
> > > > > > http://www.arches.uga.edu/~guojun
> > > > > >     _____
> > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > > > > > Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > version
> > > > > > 1.28
> > > > > > >
> > > > > > > I would say give the new code a try, but realize that it hasn't been
> > > > > > > checked in (like I said below). I will try going over the modified
> > > > > > > Bio::SearchIO::blast again this weekend to see if there is anything I
> > > > > > > might have missed. The changed order in the header of BLAST text output
> > > > > > > has me a bit worried that it might not catch everything, but it at
> > > > > > > least doesn't hang in the while() loop I described in the bug report
> > > > > > > below (bug #1934) and seems to process everything fine.
> > > > > > >
> > > > > > > If you want more stability in the code, you might consider changing
> > > > > > > over to XML output and parsing with Bio::SearchIO::blastxml. There are
> > > > > > > some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that
> > > > > > > accommodate saving XML output, but I believe it parses everything
> > > > > > > regardless. If you look back over the last month or so there has been a
> > > > > > > bit of discussion here about it. Jason describes a bit on how to set up
> > > > > > > RemoteBlast for XML:
> > > > > > >
> > > > > > > http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> > > > > > >
> > > > > > > Christopher Fields
> > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > Dept. of Biochemistry
> > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > -----Original Message-----
> > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > Sent: Friday, February 03, 2006 1:45 PM
> > > > > > > To: bioperl-l at bioperl.org
> > > > > > > Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> > version
> > > > 1.28
> > > > > > >
> > > > > > > > Hi, Everybody,
> > > > > > > > I see this post and am wondering if this is the reason for the
> > > > > > > > malfunctioning of my webserver. We set up a webserver named MAK, for
> > > > > > > > MITE sequence analysis. It was working very well until around November
> > > > > > > > 2005, when it stopped returning any result (the site is fine and seems
> > > > > > > > to be doing something after submission). In the CGI script, I used
> > > > > > > > remoteblast (that work was done in 2003) to do searches. I currently do
> > > > > > > > not have access to the server because I moved. Several people have sent
> > > > > > > > emails to us about its malfunctioning. Is there any suggestion on
> > > > > > > > fixing the problem? Should I simply ask that RemoteBlast.pm be replaced
> > > > > > > > with the new version?
> > > > > > > > Thanks a lot,
> > > > > > > > Guojun
> > > > > > >
> > > > > > > Department of Plant Biology
> > > > > > > University of Georgia
> > > > > > > Tel: 706-542-1857
> > > > > > > Fax: 706-542-1805
> > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > > _____
> > > > > > >
> > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
> > Jian'
> > > > > > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> [mailto:bioperl-
> > > > > > > l at bioperl.org]
> > > > > > > Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > >
> > > > > > > > Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> > > > > > > > will work for saving text output. However, it will not parse anything
> > > > > > > > using next_result (it will likely hang) and will not save XML format.
> > > > > > > > See these bugs:
> > > > > > > >
> > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > > >
> > > > > > > > for explanations and possible fixes (changes to RemoteBlast and
> > > > > > > > Bio::SearchIO::blast). Note that these haven't been checked in yet so
> > > > > > > > are still not included in bioperl-live; they may be further modified
> > > > > > > > before committing to CVS. If you're not worried about XML, you could
> > > > > > > > just try the first fix, which is a change to SearchIO::blast.
> > > > > > >
> > > > > > > > Nagesh, I remember you posting to the list a month ago using a script
> > > > > > > > which had problems; the script you used saves the output but doesn't
> > > > > > > > actually parse it (i.e. you don't use next_result() to go through the
> > > > > > > > data). Is the version of BLAST in your text output 2.2.12 or 2.2.13?
> > > > > > > > Have you tried parsing the output using "-readmethod => SearchIO" or
> > > > > > > > "-readmethod => blast" using your version of RemoteBlast and method
> > > > > > > > next_result()? Like below (from perldoc):
> > > > > > >
> > > > > > > > while ( my @rids = $factory->each_rid ) {
> > > > > > > >   foreach my $rid ( @rids ) {
> > > > > > > >     my $rc = $factory->retrieve_blast($rid);
> > > > > > > >     if( !ref($rc) ) {
> > > > > > > >       if( $rc < 0 ) {
> > > > > > > >         $factory->remove_rid($rid);
> > > > > > > >       }
> > > > > > > >       print STDERR "." if ( $v > 0 );
> > > > > > > >       sleep 5;
> > > > > > > >     } else {                           # parsing starts here
> > > > > > > >       my $result = $rc->next_result(); # it should hang here
> > > > > > > >       #save the output
> > > > > > > >       my $filename = $result->query_name()."\.out";
> > > > > > > >       $factory->save_output($filename);
> > > > > > > >       $factory->remove_rid($rid);
> > > > > > > >       print "\nQuery Name: ", $result->query_name(), "\n";
> > > > > > > >       while ( my $hit = $result->next_hit ) {
> > > > > > > >         next unless ( $v > 0);
> > > > > > > >         print "\thit name is ", $hit->name, "\n";
> > > > > > > >         while( my $hsp = $hit->next_hsp ) {
> > > > > > > >           print "\t\tscore is ", $hsp->score, "\n";
> > > > > > > >         }
> > > > > > > >       }
> > > > > > > >     }
> > > > > > > >   }
> > > > > > > > }
> > > > > > >
> > > > > > >
> > > > > > > > My script hung if I used next_result() in any way prior to the fixes.
> > > > > > > > I want to see how many others are having the same issues with parsing
> > > > > > > > using the CVS version of bioperl-live.
> > > > > > >
> > > > > > > Christopher Fields
> > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > Dept. of Biochemistry
> > > > > > > University of Illinois Urbana-Champaign
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> l-
> > > > > > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > > > > > > > Sent: Thursday, February 02, 2006 7:24 PM
> > > > > > > > To: Huang Jian; bioperl-l
> > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > >
> > > > > > > > > Hi Huang,
> > > > > > > > > Thanks for the message. The older version of RemoteBlast.pm works on
> > > > > > > > > the logic of checking the temporary file size to determine whether the
> > > > > > > > > Blast results are ready. This condition is not getting satisfied,
> > > > > > > > > maybe due to some changes brought about by NCBI. I had this problem
> > > > > > > > > recently and figured out that the solution was to use the latest
> > > > > > > > > version, which has this problem fixed (it does not use the file size
> > > > > > > > > logic any more) and which is not yet included in the BioPerl package.
> > > > > > > > > Cheers
> > > > > > > > > Nagesh
> > > > > > > >
> > > > > > > > Huang Jian wrote:
> > > > > > > >
> > > > > > > > > Dear Nagesh,
> > > > > > > > >
> > > > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent
> > > > > > > > > me. Now it works perfectly!!!
> > > > > > > > >
> > > > > > > > > Thank you!!
> > > > > > > > >
> > > > > > > > > Huang
> > > > > > > > >
> > > > > > > > > ----- Original Message ----- From: "Nagesh Chakka"
> > > > > > > > > 
> > > > > > > > > To: "Huang Jian" ; "bioperl-l"
> > > > > > > > > 
> > > > > > > > > Sent: Friday, February 03, 2006 7:48 AM
> > > > > > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > > > > > > > > via email
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >> Hi Huang,
> > > > > > > > > >> I see that you are submitting a sequence for a remote blast search. Can
> > > > > > > > > >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09)? If
> > > > > > > > > >> not, I have attached it to this email; try replacing the old one,
> > > > > > > > > >> which has a bug, with it.
> > > > > > > > >> Let me know if it works.
> > > > > > > > >> Nagesh
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Bioperl-l mailing list
> > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > >



From sdavis2 at mail.nih.gov  Tue Feb 14 15:02:59 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 14 Feb 2006 15:02:59 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
Message-ID: 

You can get the upstream regions for genes via the table browser at
UCSC.  If you want to do it yourself, just download their refGene table (as
a tab-delimited text file), which includes the HUGO gene name.  Then, use the
method given by Brian to look up the locations.  The genome just isn't THAT
big to download and store locally.  Note that most of the big sites (like
NCBI, for example) impose restrictions on the number and timing of hits, so
using them for high-throughput analysis (like for gene expression
studies) is not always feasible.  I have found that having the data locally
is almost always better.
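
As a rough illustration of the do-it-yourself route, here is a minimal
sketch of turning one refGene-style row into an upstream window.  The
column order of the made-up row, the coordinate convention (txStart
0-based, txEnd 1-based inclusive) and the helper name upstream_window are
assumptions, so check them against the table you actually download.  For
minus-strand genes the helper returns start > end, on the assumption that
Bio::DB::Fasta hands back the reverse strand for descending coordinates, as
the $revseq line in Brian's example suggests.

```perl
use strict;
use warnings;

# Hypothetical helper: given a gene's strand and transcript bounds
# (assumed UCSC-style: txStart 0-based, txEnd 1-based inclusive),
# return 1-based coordinates of the $flank bases upstream of the
# transcription start. For '-' genes start > end (reverse strand).
sub upstream_window {
    my ($strand, $tx_start, $tx_end, $flank) = @_;
    if ($strand eq '+') {
        return () if $tx_start < 1;        # gene starts at base 1: nothing upstream
        my $start = $tx_start - $flank + 1;
        $start = 1 if $start < 1;          # clamp at the chromosome start
        return ($start, $tx_start);
    }
    # minus strand: upstream lies beyond txEnd (not clamped to chromosome length)
    return ($tx_end + $flank, $tx_end + 1);
}

# One made-up, tab-delimited row; the column order here is an assumption:
# accession, chromosome, strand, txStart, txEnd, gene symbol
my $row = join "\t", 'NM_000000', 'chr1', '-', 1000, 2000, 'GENE1';
my ($name, $chrom, $strand, $tx_start, $tx_end, $symbol) = split /\t/, $row;

my ($from, $to) = upstream_window($strand, $tx_start, $tx_end, 500);
print "$symbol: $chrom $from => $to\n";

# With a local genome indexed by Bio::DB::Fasta this would become:
#   my $upstream = $db->seq($chrom, $from => $to);
```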

Sean
 


On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:

> Hi Brian,
> 
> Thanks very much for the pointers and the speed of your reply and apologies
> for the speed of mine.
> 
> This looks good, but what I was looking for was a bioP approach for hooking to
> an API at NCBI or EBI so I could get this info and seqs from them.  In this
> case, speed of retrieval is not critical and I'd rather not download the
> entirety of the sequences to a local disk to hack at them.
> 
> I've determined a screen-scraping approach to get them and could script that,
> but I thought that bioP had a method for using NCBI's external API's, tho it
> may be that my memory is faulty or the approach is no longer supported due to
> overload.  
> 
> Does NCBI make such APIs available anymore?  I searched a bit for docs on them
> but couldn't find anything (unless it's buried in the NCBI toolkit, which I
> haven't started to excavate).
> 
> Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
> 
> Harry
> 
> 
> 
> 
> 
> 
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>> 
>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
>> from its documentation:
>> 
>>   use Bio::DB::Fasta;
>> 
>>   # create database from directory of fasta files
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   # simple access (for those without Bioperl)
>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>   my @ids     = $db->ids;
>>   my $length   = $db->length('CHROMOSOME_I');
>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>   my $header   = $db->header('CHROMOSOME_I');
>> 
>>   # Bioperl-style access
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>   my $seq     = $obj->seq;
>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>> 
>> Do you already have the offsets?
>> 
>> Brian O.
>> 
>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>> Hi All,
>>> 
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this.  Forgive me if I've missed something
>>> obvious.
>>> 
>>> This should not be a novel request, but I've not found it answered.  If
>>> bioperl isn't the best way to do this, I'd be grateful for a pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>> 
>>> The problem is to retrieve genomic sequences plus & minus some offset
>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>> 
>>> 
>>> TIA!




From cjfields at uiuc.edu  Tue Feb 14 15:32:42 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 14 Feb 2006 14:32:42 -0600
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki,
	problems with bioperl-db
Message-ID: <001201c631a5$ce7496f0$15327e82@pyrimidine>

Hilmar, 

Good News: I've added a section to the bioperl wiki on installing bioperl-db
in Windows:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db

Bad News:  There's a new problem now. I updated from CVS yesterday; I walked
through the steps and ran 'nmake test', with everything passing fine.
However, load_seqdatabase.pl is extremely slow; it's loading a sequence
every 5 minutes or so.  I noticed (when using '-debug') that it is hanging
up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a database,
load the biosql schema, and load sequences w/o loading taxonomy, the problem
goes away.

Here's the debugging output (I cut it off at the point it hangs up):
----------------------------------------------------------------------------
C:\Perl\src\bioperl\bioperl-db\scripts\biosql>load_seqdatabase.pl -driver mysql -namespace test -dbname biosql -dbuser root -dbpass ********** -format genbank  -debug NP_252217.gpt
Loading NP_252217.gpt ...
attempting to load adaptor class for Bio::Seq::RichSeq
        attempting to load module Bio::DB::BioSQL::RichSeqAdaptor
attempting to load adaptor class for Bio::Seq
        attempting to load module Bio::DB::BioSQL::SeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load adaptor class for Bio::Species
        attempting to load module Bio::DB::BioSQL::SpeciesAdaptor
instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor
attempting to load adaptor class for Bio::Annotation::Collection
        attempting to load module Bio::DB::BioSQL::CollectionAdaptor
attempting to load adaptor class for Bio::Root::Root
        attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Root::RootI
        attempting to load module Bio::DB::BioSQL::RootIAdaptor
        attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::AnnotationCollectionI
        attempting to load module Bio::DB::BioSQL::AnnotationCollectionIAdaptor
        attempting to load module Bio::DB::BioSQL::AnnotationCollectionAdaptor
instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor
attempting to load adaptor class for Bio::Annotation::TypeManager
        attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for Bio::Annotation::SimpleValue
        attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor
instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor
attempting to load adaptor class for Bio::Annotation::Reference
        attempting to load module Bio::DB::BioSQL::ReferenceAdaptor
instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor
attempting to load adaptor class for Bio::Annotation::Comment
        attempting to load module Bio::DB::BioSQL::CommentAdaptor
instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor
attempting to load adaptor class for Bio::Annotation::DBLink
        attempting to load module Bio::DB::BioSQL::DBLinkAdaptor
instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor
attempting to load adaptor class for Bio::PrimarySeq
        attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load adaptor class for Bio::SeqFeature::Generic
        attempting to load module Bio::DB::BioSQL::GenericAdaptor
attempting to load adaptor class for Bio::SeqFeatureI
        attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor
        attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor
attempting to load adaptor class for Bio::Location::Simple
        attempting to load module Bio::DB::BioSQL::SimpleAdaptor
attempting to load adaptor class for Bio::Location::Atomic
        attempting to load module Bio::DB::BioSQL::AtomicAdaptor
attempting to load adaptor class for Bio::LocationI
        attempting to load module Bio::DB::BioSQL::LocationIAdaptor
        attempting to load module Bio::DB::BioSQL::LocationAdaptor
instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for BioNamespace
        attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor
instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load driver for adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
attempting to load driver for adaptor class Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer for Bio::DB::BioSQL::BioNamespaceAdaptor
preparing UK select statement: SELECT biodatabase.biodatabase_id, biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ?
BioNamespaceAdaptor: binding UK column 1 to "test" (namespace)
preparing INSERT statement: INSERT INTO biodatabase (name, authority) VALUES (?, ?)
BioNamespaceAdaptor::insert: binding column 1 to "test" (namespace)
BioNamespaceAdaptor::insert: binding column 2 to "" (authority)
attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor
Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for Bio::DB::BioSQL::SpeciesAdaptor
preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id = ?
SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)
----------------------------------------------------------------------------

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From osborne1 at optonline.net  Tue Feb 14 16:32:42 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 14 Feb 2006 16:32:42 -0500
Subject: [Bioperl-l] game xml SeqIO
In-Reply-To: <43F1043A.2000205@cornell.edu>
Message-ID: 

Robert,

It looks like you're right that this data isn't handled by SeqIO/game. If
you'd like to add this then feel free to do it, the modified files or
patches can be submitted to bugzilla.bioperl.org. If you take this on then
please add a test or 2 to t/game.t as well.

Yes, Bio::SeqFeature::Computation sounds right - does it match the data
you're trying to parse? SeqFeature::Generic is the most commonly used, and
it's flexible, but if another type of SeqFeature fits your data more
precisely then that's the one you should use.

Brian O.


On 2/13/06 5:12 PM, "Robert Buels"  wrote:

> Hi all,
> 
> Currently, the SeqIO for doing GAME XML does not seem to support writing
> (or reading?)  elements.  Am I correct?
> 
> If I am, are there any plans to add this functionality?  Can I help / do it?
> 
> If there are plans to add this, how would one distinguish SeqFeatures
> that should be rendered as  from SeqFeatures
> that should be rendered as ?  Would we do that with
> Bio::SeqFeature::Computation?  I assume that a given Seq can have
> SeqFeatures of different types associated with it (I don't know, I'm a
> bioperl newb).
> 
> Rob




From saldroubi at yahoo.com  Tue Feb 14 22:54:42 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Tue, 14 Feb 2006 19:54:42 -0800 (PST)
Subject: [Bioperl-l] Error using Bio::Matrix::GenericMatrix
Message-ID: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com>

All,
 
 I am trying to use Bio::Matrix::GenericMatrix module.  
 I simply put this line in my program:
     use Bio::Matrix::GenericMatrix;
 
 but I get the following error:
 
 Can't locate Bio/Matrix/GenericMatrix.pm in @INC (@INC contains: /usr/lib/perl5/5.8.6/i586-linux-thread-multi /usr/lib/perl5/5.8.6 /usr/lib/perl5/site_perl/5.8.6/i586-linux-thread-multi /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.6/i586-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl .) at sf.pl line 18.
 BEGIN failed--compilation aborted at sf.pl line 18.
 
 Using find, I located the module; it is called Generic.pm and lives in this directory
     /usr/lib/perl5/site_perl/5.8.6/Bio/Matrix
 
 Could someone tell me why it is not working?  I have no trouble including these modules in my file.
     use Bio::SeqIO;
     use Bio::DB::GenBank;
 
 Thank you. 
 
   

Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From jason.stajich at duke.edu  Tue Feb 14 23:10:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 14 Feb 2006 23:10:56 -0500
Subject: [Bioperl-l] Error using Bio::Matrix::GenericMatrix
In-Reply-To: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com>
References: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com>
Message-ID: 

try:
use Bio::Matrix::Generic;

Apparently I screwed up the SYNOPSIS.  fixed that just now.
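
For anyone hitting the same "Can't locate ... in @INC" message, here is a
small plain-Perl sketch of the lookup behind it: the package name is turned
into a relative path (Bio::Matrix::Generic becomes Bio/Matrix/Generic.pm)
and each @INC directory is searched for that file.  The helper names
module_path and find_module are made up for illustration and are not part
of Bioperl.

```perl
use strict;
use warnings;

# Convert a package name to the relative file path that `use` looks for.
sub module_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Return the first file under @INC that would satisfy `use $module`,
# or nothing if no such file exists (the "Can't locate ..." case).
sub find_module {
    my ($module) = @_;
    my $rel = module_path($module);
    for my $dir (@INC) {
        next if ref $dir;                  # skip code-ref hooks in @INC
        return "$dir/$rel" if -f "$dir/$rel";
    }
    return;
}

# Compare the name from the old SYNOPSIS with the real module name.
for my $module (qw(Bio::Matrix::GenericMatrix Bio::Matrix::Generic)) {
    my $found = find_module($module);
    printf "%-28s %s\n", $module, defined $found ? $found : 'not found in @INC';
}
```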

-jason
On Feb 14, 2006, at 10:54 PM, Sam Al-Droubi wrote:

> All,
>
>  I am trying to use Bio::Matrix::GenericMatrix module.
>  I simply put this line in my program:
>      use Bio::Matrix::GenericMatrix;
>
>  but I get the followin error:
>
>  Can't locate Bio/Matrix/GenericMatrix.pm in @INC (@INC contains: / 
> usr/lib/perl5/5.8.6/i586-linux-thread-multi /usr/lib/perl5/5.8.6 / 
> usr/lib/perl5/site_perl/5.8.6/i586-linux-thread-multi /usr/lib/ 
> perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl /usr/lib/perl5/ 
> vendor_perl/5.8.6/i586-linux-thread-multi /usr/lib/perl5/ 
> vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl .) at sf.pl line 18.
>  BEGIN failed--compilation aborted at sf.pl line 18.
>
>  I found this module using find which is called Generic.pm in this  
> directory
>      /usr/lib/perl5/site_perl/5.8.6/Bio/Matrix
>
>  Could someone tell me why it is not working.  I have no trouble  
> including these modules in my file.
>      use Bio::SeqIO;
>      use Bio::DB::GenBank;
>
>  Thank you.
>
>
>
> Sincerely,
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From daniel.lang at biologie.uni-freiburg.de  Wed Feb 15 05:35:40 2006
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Wed, 15 Feb 2006 11:35:40 +0100
Subject: [Bioperl-l] distmat matrix
Message-ID: <43F303FC.9000806@biologie.uni-freiburg.de>

Hi,

I need to go through an uncorrected distmat matrix (EMBOSS, run locally)
to filter sequences from an MSA.
I had a look around and didn't find an obvious candidate. Before I start
writing something of my own...
Is there a bioperl parser for reading distmat matrices, or can I trick
the Bio::MapIO parsers for scoring or PHYLIP into doing so?
Of course, if anyone knows a tool supported by bioperl that generates an
uncorrected distance matrix of protein MSAs, that would also be OK
for me :)

I have no experience with the Pise
(Bio::Tools::Run::PiseApplication::distmat) stuff, but as I understand
it it's only to execute the application on a remote web server? Or can I
solve my task with Pise?

Thanks in advance!

Daniel



From praveecbt at yahoo.co.in  Wed Feb 15 03:57:44 2006
From: praveecbt at yahoo.co.in (Praveen Raj)
Date: Wed, 15 Feb 2006 08:57:44 +0000 (GMT)
Subject: [Bioperl-l] Help
Message-ID: <20060215085744.14911.qmail@web8711.mail.in.yahoo.com>

Dear  Peter Schattner Sir,
   
                                       I have one problem with the profile_align() of  Clustalw object.
   
  I have given the code like this,
   ......
  12 @seq_array=($seqobj1,$seqobj2,$seqobj3);
13 $seq_array_ref=\@seq_array;
  14 $aln=$factory->align($seq_array_ref);
  15 print $out $aln;   # this works fine
  16 $sen = Bio::Seq->new(-display_id => '>gi|userdata|',
17                      -seq => "MTKKPGGPGKNRA....",
18                      -format => "fasta");
19 $aln=$factory->profile_align($aln,$sen); #problem here
  20 print $out1 $aln;
   
  I have got one error like this in Line No. 19
   
  ERROR: Could not open sequence file (-profile) 
  No. of seqs. read = -1. No alignment!
   
  How can I solve this problem?
  I hope you can provide a solution.
   
                           Thanking you,
                                         Praveen Raj,
                                         Project Student,
                                         NIV, India.

				


From jason.stajich at duke.edu  Wed Feb 15 08:19:41 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 15 Feb 2006 08:19:41 -0500
Subject: [Bioperl-l] distmat matrix
In-Reply-To: <43F303FC.9000806@biologie.uni-freiburg.de>
References: <43F303FC.9000806@biologie.uni-freiburg.de>
Message-ID: <550C115C-1216-4285-8BE5-EC217C3F1BE9@duke.edu>

Bioperl can parse PHYLIP distance matrices; see Bio::Matrix::IO.  I
didn't write an EMBOSS distmat result parser, but that would be nice
to have (check first that EMBOSS doesn't already allow output in phylip
format).

There is a pure-perl distance matrix calculation from an MSA for DNA
sequences in
Bio::Align::DNAStatistics
and for protein in
Bio::Align::ProteinStatistics
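
If only the uncorrected case is needed, the calculation is simple enough to
sketch in plain Perl.  The block below computes a p-distance (differing
columns divided by compared columns) for each pair in a toy alignment.  It
is not the EMBOSS distmat code and not a call into the modules above;
skipping columns that contain a gap and the name p_distance are assumptions
made for illustration, and the output scale may differ from what distmat
reports.

```perl
use strict;
use warnings;

# Uncorrected (p-) distance between two aligned sequences of equal length:
# differing columns / compared columns, skipping columns with a gap.
sub p_distance {
    my ($s1, $s2) = @_;
    die "aligned sequences must have equal length\n"
        unless length($s1) == length($s2);
    my ($compared, $diff) = (0, 0);
    for my $i (0 .. length($s1) - 1) {
        my $x = uc(substr($s1, $i, 1));
        my $y = uc(substr($s2, $i, 1));
        next if $x =~ /[-.]/ or $y =~ /[-.]/;   # gap in either sequence
        $compared++;
        $diff++ if $x ne $y;
    }
    # undef when no column could be compared
    return $compared ? $diff / $compared : undef;
}

# Pairwise matrix for a made-up alignment (id => aligned string).
my %aln = (
    seqA => 'MTKKPGGPGK',
    seqB => 'MTKRPGG-GK',
    seqC => 'MSKRPGGPGR',
);
my @ids = sort keys %aln;
for my $i (@ids) {
    print join("\t", $i,
        map { sprintf '%.3f', p_distance($aln{$i}, $aln{$_}) } @ids), "\n";
}
```

Sequences could then be filtered by comparing each entry against whatever
distance threshold suits the analysis.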

There is some initial discussion here on the website, but could  
certainly use some more details.

http://bioperl.org/wiki/Phylogenetics
http://bioperl.org/wiki/HOWTO:Trees
http://bioperl.org/wiki/Module:Bio::Align::DNAStatistics


-jason
On Feb 15, 2006, at 5:35 AM, Daniel Lang wrote:

> Hi,
>
> I need to go through a uncorrected distmat matrix (EMBOSS, run  
> locally)
> to filter sequences from an MSA.
> I had a look around and didn't find an obvious candidate. Before I  
> start
> writing something my own...
> Is there a bioperl parser for reading distmat matrices or can I trick
> the Bio::MapIO parsers for scoring or PHYLIP in doing so?
> If anyone knows of course a tool to generate an uncorrected distance
> matrix of protein MSAs that is supported by bioperl, would be also OK
> for me:)
>
> I have no experience with the Pise
> (Bio::Tools::Run::PiseApplication::distmat) stuff, but as I understand
> it it's only to execute the application on a remote web server? Or  
> can I
> solve my task with Pise?
>
> Thanks in advance!
>
> Daniel
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From michael.watson at bbsrc.ac.uk  Wed Feb 15 10:06:29 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed, 15 Feb 2006 15:06:29 -0000
Subject: [Bioperl-l] Website issues
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

The links on the left of bioperl.org don't work in konqueror 3.1.1,
which is a real b*gger because that's the browser I use on Linux... :-S

Mick



From rmb32 at cornell.edu  Wed Feb 15 11:01:07 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Wed, 15 Feb 2006 11:01:07 -0500
Subject: [Bioperl-l] Bio::Tools::GFF parsing error
Message-ID: <43F35043.7070705@cornell.edu>

Hi all,

I'm parsing a GFF2 file with Bio::Tools::GFF (I would be using 
FeatureIO, except it purports not to support gff 2), and the file looks 
like:

##gff-version 2
##date 2006-02-13
##sequence-region C01HBa0088L02.seq 1 120525
C01HBa0088L02   RepeatMasker    similarity      3537    4267     3.3    -       .       Target "Motif:bac_end_repeat_family_345" 1 740
C01HBa0088L02   RepeatMasker    similarity      4172    4279     2.9    +       .       Target "Motif:HRSiTERT00100141" 1 104
C01HBa0088L02   RepeatMasker    similarity      4267    4323     0.0    -       .       Target "Motif:k_29" 150 206
C01HBa0088L02   RepeatMasker    similarity      4322    4492    26.6    +       .       Target "Motif:PRSiTERT00300001" 1960 2129
C01HBa0088L02   RepeatMasker    similarity      4557    5124    29.5    +       .       Target "Motif:PRSiTERT00300001" 2142 2711

Notice the score column is padded with spaces.

Bio::Tools::GFF does not like this, and says that ' 3.3' is not a valid 
score.  My question is, who is wrong here, my input file or 
Bio::Tools::GFF?  Should Bio::Tools::GFF be able to read this file?

Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 607-255-2360
rmb32 at cornell.edu
http://www.sgn.cornell.edu




From jason.stajich at duke.edu  Wed Feb 15 11:12:59 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 15 Feb 2006 11:12:59 -0500
Subject: [Bioperl-l] Website issues
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu>

Okay I guess someone will have to look into that.  Can you normally  
browse on wikipedia, we're just using their software, maybe it is a  
javascript problem?

Please send a system bug request to our helpdesk:
support at open-bio.org

-jason
On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote:

> Hi
>
> The links on the left of bioperl.org don't work in konqueror 3.1.1,
> which is a real b*gger because that's the browser I use on  
> Linux... :-S
>
> Mick
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From Marc.Logghe at DEVGEN.com  Wed Feb 15 11:13:16 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 15 Feb 2006 17:13:16 +0100
Subject: [Bioperl-l] Bio::Tools::GFF parsing error
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B2E@ANTARESIA.be.devgen.com>

Hi Rob,
According to the GFF Specifications Document @
http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml :

All of the above described fields should be separated by TAB characters
('\t'). All values of the mandatory fields should not include whitespace
(i.e. the strings for <seqname>, <source> and <feature> fields).

Reading that, I am afraid you have to pre-process your gff input file
...
HTH,
Marc


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Robert Buels
> Sent: Wednesday, February 15, 2006 5:01 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Bio::Tools::GFF parsing error
> 
> Hi all,
> 
> I'm parsing a GFF2 file with Bio::Tools::GFF (I would be 
> using FeatureIO, except it purports not to support gff 2), 
> and the file looks
> like:
> 
> ##gff-version 2
> ##date 2006-02-13
> ##sequence-region C01HBa0088L02.seq 1 120525
> C01HBa0088L02   RepeatMasker    similarity      3537    4267  
>    3.3    
> -       .       Target "Motif:bac_end_repeat_family_345" 1 740
> C01HBa0088L02   RepeatMasker    similarity      4172    4279  
>    2.9    
> +       .       Target "Motif:HRSiTERT00100141" 1 104
> C01HBa0088L02   RepeatMasker    similarity      4267    4323  
>    0.0    
> -       .       Target "Motif:k_29" 150 206
> C01HBa0088L02   RepeatMasker    similarity      4322    4492  
>   26.6    
> +       .       Target "Motif:PRSiTERT00300001" 1960 2129
> C01HBa0088L02   RepeatMasker    similarity      4557    5124  
>   29.5    
> +       .       Target "Motif:PRSiTERT00300001" 2142 2711
> 
> Notice the score column is padded with spaces.
> 
> Bio::Tools::GFF does not like this, and says that ' 3.3' is 
> not a valid score.  My question is, who is wrong here, my 
> input file or Bio::Tools::GFF?  Should Bio::Tools::GFF be 
> able to read this file?
> 
> Rob
> 
> --
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 607-255-2360
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 



From jason.stajich at duke.edu  Wed Feb 15 11:29:14 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 15 Feb 2006 11:29:14 -0500
Subject: [Bioperl-l] Website issues
In-Reply-To: <93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu>
References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
	<93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu>
Message-ID: <82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu>

I can replicate it with konqueror 3.1.4-9.legacy RedHat (from KDE  
3.1.4-9)

But it works fine for me on 3.2.2-8.FC2 ....

So I'm going to go with this being a konqueror bug, sorry to say, but  
feel free to still report the bug to the helpdesk.
	
-jason
On Feb 15, 2006, at 11:12 AM, Jason Stajich wrote:

> Okay I guess someone will have to look into that.  Can you normally
> browse on wikipedia, we're just using their software, maybe it is a
> javascript problem?
>
> Please send a system bug request to our helpdesk:
> support at open-bio.org
>
> -jason
> On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote:
>
>> Hi
>>
>> The links on the left of bioperl.org don't work in konqueror 3.1.1,
>> which is a real b*gger because that's the browser I use on
>> Linux... :-S
>>
>> Mick
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Wed Feb 15 11:57:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Feb 2006 10:57:13 -0600
Subject: [Bioperl-l] Added 'Installing Bioperl for Unix' to wiki
Message-ID: <000301c63250$de506120$15327e82@pyrimidine>

I added an Installing Bioperl for Unix page, 

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix

which is a quick redo of the INSTALL text file in the bioperl distribution.
It's in workable shape but needs links revisions etc.  

Please leave any comments on the discussion pages here.  

http://www.bioperl.org/wiki/Talk:Getting_BioPerl
http://www.bioperl.org/wiki/Talk:Installing_Bioperl_for_Unix

Thanks to Brian for helping out with the Windows install doc!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From khoueiry at ibdm.univ-mrs.fr  Wed Feb 15 12:23:21 2006
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Wed, 15 Feb 2006 18:23:21 +0100
Subject: [Bioperl-l] Website issues
In-Reply-To: <82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu>
References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
	<93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu>
	<82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu>
Message-ID: <1140024202.2689.45.camel@localhost>

I tested it on konqueror 3.4.2 and it works well!

From heikki at sanbi.ac.za  Wed Feb 15 13:55:07 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 15 Feb 2006 20:55:07 +0200
Subject: [Bioperl-l] Website issues
In-Reply-To: <1140024202.2689.45.camel@localhost>
References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
	<82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu>
	<1140024202.2689.45.camel@localhost>
Message-ID: <200602152055.07667.heikki@sanbi.ac.za>

Konqueror 3.5.1 has no problems, either. Clearly, older Konqueror versions
had a bug that has since been fixed.

Michael, time for you to upgrade.

	-Heikki

On Wednesday 15 February 2006 19:23, khoueiry wrote:
> I test it on konqueror 3.4.2 and it works well !!!
>
> On Wed, 2006-02-15 at 11:29 -0500, Jason Stajich wrote:
> > I can replicate it with konqueror 3.1.4-9.legacy RedHat (from KDE
> > 3.1.4-9)
> >
> > But it works fine for me on 3.2.2-8.FC2 ....
> >
> > So I'm going to go with this being a konqueror bug, sorry to say, but
> > feel free to still report the bug to the helpdesk.
> >
> > -jason
> >
> > On Feb 15, 2006, at 11:12 AM, Jason Stajich wrote:
> > > Okay I guess someone will have to look into that.  Can you normally
> > > browse on wikipedia, we're just using their software, maybe it is a
> > > javascript problem?
> > >
> > > Please send a system bug request to our helpdesk:
> > > support at open-bio.org
> > >
> > > -jason
> > >
> > > On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote:
> > >> Hi
> > >>
> > >> The links on the left of bioperl.org don't work in konqueror 3.1.1,
> > >> which is a real b*gger because that's the browser I use on
> > >> Linux... :-S
> > >>
> > >> Mick
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > --
> > > Jason Stajich
> > > Duke University
> > > http://www.duke.edu/~jes12
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > Jason Stajich
> > Duke University
> > http://www.duke.edu/~jes12
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From gyang at plantbio.uga.edu  Wed Feb 15 14:39:41 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Wed, 15 Feb 2006 14:39:41 -0500
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
	version 1.28
Message-ID: <20060215143941.54e91487@dogwood.plantbio.uga.edu>

Hi, Chris,
Finally the remoteblast test script works for the amino.fa query, but when I try a nucleic acid sequence (see below), an error occurs:
"
waiting........
------------- EXCEPTION  -------------
MSG: no data for midline  Features flanking this part of subject sequence:
STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
STACK toplevel remoteblast_test:40
"
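In case it helps anyone reproduce this, here is a rough sketch of trapping the parser exception so the loop can carry on and the raw report can still be saved. Note that parse_report() below is only a hypothetical stand-in for $rc->next_result(); it is not a BioPerl call, it just dies with the same message. try_parse() is my own name as well.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $rc->next_result(); it dies the way the
# real parser did on my blastn report.
sub parse_report {
    my ($text) = @_;
    die "no data for midline\n" if $text =~ /Features flanking/;
    return { query_name => 'ost3' };
}

# Run the parser inside eval so an exception does not kill the script.
# Returns (result, error); exactly one of the two is defined.
sub try_parse {
    my ($text) = @_;
    my $result = eval { parse_report($text) };
    return ( undef, $@ ) unless defined $result;
    return ( $result, undef );
}

my ( $res, $err ) = try_parse("Features flanking this part of subject sequence:");
print defined $err ? "parse failed: $err" : "parsed $res->{query_name}\n";
```

In the real script the eval would go around the next_result() call, with save_output() called whether or not parsing succeeds.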
The query sequence is:
CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG

The script (basically the same as the remoteblast test; I only changed the database to 'nr', the program to 'blastn', and the filename to 'ost3'):
#!/usr/bin/perl

use Bio::SeqIO;
use Bio::Seq;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use strict;
my $prog='blastn';
my $db='nr';
my $e_val=1e-10;
my @params=( -prog=>$prog,
	-data=>$db,
	-expect=>$e_val,
	-readmethod=>'SearchIO');
my $factory=Bio::Tools::Run::RemoteBlast->new(@params);

my $v = 1;

my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );

while (my $input = $str->next_seq()){
  #Blast a sequence against a database:
  #Alternatively, you could  pass in a file with many
  #sequences rather than loop through sequence one at a time
  #Remove the loop starting 'while (my $input = $str->next_seq())'
  #and swap the two lines below for an example of that.
  my $r = $factory->submit_blast($input);
  #my $r = $factory->submit_blast('amino.fa');
  print STDERR "waiting..." if( $v > 0 );
  while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
      my $rc = $factory->retrieve_blast($rid);
      if( !ref($rc) ) {
        if( $rc < 0 ) {
          $factory->remove_rid($rid);
        }
        print STDERR "." if ( $v > 0 );
        sleep 5;
      } else {
        my $result = $rc->next_result();
        #save the output
        my $filename = $result->query_name()."\.out";
        $factory->save_output($filename);
        $factory->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), "\n";
        while ( my $hit = $result->next_hit ) {
          next unless ( $v > 0);
          print "\thit name is ", $hit->name, "\n";
          while( my $hsp = $hit->next_hsp ) {
            print "\t\tscore is ", $hsp->score, "\n";
          }
        }
      }
    }
  }
}
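
The loop above keeps polling for as long as NCBI does not answer. Below is a sketch of the same polling with a cap on the number of tries. Here $fetch is a hypothetical code ref standing in for $factory->retrieve_blast($rid), and I am assuming the return convention used elsewhere in this thread (-1 on error, 0 while the job is not finished, an object when done). The names poll_with_cap, $max_tries and $delay are my own.

```perl
use strict;
use warnings;

# Poll $fetch->() until it returns a reference, reports an error (< 0),
# or $max_tries attempts have been used up.
sub poll_with_cap {
    my ( $fetch, $max_tries, $delay ) = @_;
    for my $try ( 1 .. $max_tries ) {
        my $rc = $fetch->();
        return ( 'done',  $rc )   if ref $rc;
        return ( 'error', undef ) if $rc < 0;
        sleep $delay if $delay && $try < $max_tries;
    }
    return ( 'timeout', undef );
}

# Fake job that is "not finished" twice and then yields a report.
my @answers = ( 0, 0, { report => 'BLASTN ...' } );
my ( $status, $report ) = poll_with_cap( sub { shift @answers }, 5, 0 );
print "status: $status\n";
```

With a real factory the delay would be a few seconds rather than 0, and a 'timeout' status would be the point to call remove_rid() and give up on that RID.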


Do you think there might still be something in the NCBI output format?
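
To check that, here is a small sketch that pulls the BLAST version out of the first line of a saved report, so it can be compared with 2.2.12 (which reportedly parses) and 2.2.13 (which reportedly does not). I am assuming the report starts with a header line such as "BLASTN 2.2.13 [date]"; blast_version() and version_cmp() are my own helpers, not part of BioPerl.

```perl
use strict;
use warnings;

# Extract the version string from a report header such as
# "BLASTN 2.2.13 [...]"; returns undef when no header is found.
sub blast_version {
    my ($report) = @_;
    return $report =~ /^T?BLAST[NPX]\s+(\d+(?:\.\d+)+)/mi ? $1 : undef;
}

# Compare two dotted versions numerically, field by field.
sub version_cmp {
    my @a = split /\./, shift;
    my @b = split /\./, shift;
    while ( @a || @b ) {
        my $c = ( shift(@a) // 0 ) <=> ( shift(@b) // 0 );
        return $c if $c;
    }
    return 0;
}

my $v = blast_version("BLASTN 2.2.13 [illustrative header]\n");
print "report version $v is newer than 2.2.12\n"
    if defined $v && version_cmp( $v, '2.2.12' ) > 0;
```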

Thank you,
Guojun




Guojun Yang
Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun



----- Original Message -----
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2


> Sorry, forgot to add that I didn't see the regex issue that you mentioned.
> It could be a perl-related issue.  Try the fixes I mentioned and see what
> happens.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > Sent: Tuesday, February 14, 2006 12:36 PM
> > To: 'gyang at plantbio.uga.edu'
> > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >
> > It's a good habit to always put quotes around string values.  The perl
> > interpreter may think a single bare word is a subroutine or perlfunc
> > called with no args, so it will try to find a subroutine named blastp().  My
> > debugger actually gives the error that the bare word blastp may conflict
> > with a future reserved word.  Like you said, 'use strict' will point that
> > out.
> >
> > As for the regex, it should match all the blast programs at NCBI (blastp,
> > blastn, blastx, tblastn, tblastx) and is built in to make sure nothing
> > else passes through.
> >
> > So, if you are using the script below, there are several errors.  The bare
> > words for $prog and $db need quotes, and the flags in your @params array
> > don't have a dash before them.  I get this after adding quotes but before
> > adding the dashes to @params:
> >
> > C:\Perl\Scripts>test_blast.pl
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG:
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> > STACK: C:\Perl\Scripts\test_blast.pl:15
> > -----------------------------------------------------------
> >
> > The last line indicates a problem with this line:
> >
> > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >
> > Changing the @params to this:
> >
> > my @params=( -prog=>$prog,
> > 	-data=>$db,
> > 	-expect=>$e_val,
> > 	-readmethod=>'SearchIO');
> >
> > fixes it, and I get output as expected.
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> > > -----Original Message-----
> > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > Sent: Tuesday, February 14, 2006 11:48 AM
> > > To: Chris Fields; bioperl-l at lists.open-bio.org
> > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >
> > > Hi, Chris,
> > > When I tried the perldoc script, it did not work either. First it
> > > says $prog cannot be a bare word if I "use strict". I added quotes around the
> > > words; then it says the value for $prog does not match the expression
> > > t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The script
> > > is shown below. Why is the expression "t?blast[pnx]"?
> > >
> > > #!/usr/bin/perl
> > >
> > > use Bio::SeqIO;
> > > use Bio::Seq;
> > > use Bio::Tools::Run::RemoteBlast;
> > > use Bio::SearchIO;
> > >
> > >
> > > my $prog=blastp;
> > > my $db=swissprot;
> > > my $e_val=1e-10;
> > > my @params=( prog=>$prog,
> > > 	data=>$db,
> > > 	expect=>$e_val,
> > > 	readmethod=>'SearchIO');
> > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >
> > > my $v = 1;
> > >
> > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >
> > > while (my $input = $str->next_seq()){
> > >   #Blast a sequence against a database:
> > >   #Alternatively, you could  pass in a file with many
> > >   #sequences rather than loop through sequence one at a time
> > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >   #and swap the two lines below for an example of that.
> > >   my $r = $factory->submit_blast($input);
> > >   #my $r = $factory->submit_blast('amino.fa');
> > >   print STDERR "waiting..." if( $v > 0 );
> > >   while ( my @rids = $factory->each_rid ) {
> > >     foreach my $rid ( @rids ) {
> > >       my $rc = $factory->retrieve_blast($rid);
> > >       if( !ref($rc) ) {
> > >         if( $rc < 0 ) {
> > >           $factory->remove_rid($rid);
> > >         }
> > >         print STDERR "." if ( $v > 0 );
> > >         sleep 5;
> > >       } else {
> > >         my $result = $rc->next_result();
> > >         #save the output
> > >         my $filename = $result->query_name()."\.out";
> > >         $factory->save_output($filename);
> > >         $factory->remove_rid($rid);
> > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > >         while ( my $hit = $result->next_hit ) {
> > >           next unless ( $v > 0);
> > >           print "\thit name is ", $hit->name, "\n";
> > >           while( my $hsp = $hit->next_hsp ) {
> > >             print "\t\tscore is ", $hsp->score, "\n";
> > >           }
> > >         }
> > >       }
> > >     }
> > >   }
> > > }
> > >
> > > Thank you for your help!
> > >
> > >
> > > Guojun
> > > Department of Plant Biology
> > > University of Georgia
> > >
> > > ----- Original Message -----
> > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > To: gyang at plantbio.uga.edu
> > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >
> > >
> > > > Try two things:
> > > >
> > > > 1)  Use a much simpler script, like the one in 'perldoc
> > > > Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something wrong
> > > > with the logic in your subroutine:
> > > >
> > > > my $v = 1;
> > > >
> > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > > >
> > > > while (my $input = $str->next_seq()){
> > > >   #Blast a sequence against a database:
> > > >   #Alternatively, you could  pass in a file with many
> > > >   #sequences rather than loop through sequence one at a time
> > > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >   #and swap the two lines below for an example of that.
> > > >   my $r = $factory->submit_blast($input);
> > > >   #my $r = $factory->submit_blast('amino.fa');
> > > >   print STDERR "waiting..." if( $v > 0 );
> > > >   while ( my @rids = $factory->each_rid ) {
> > > >     foreach my $rid ( @rids ) {
> > > >       my $rc = $factory->retrieve_blast($rid);
> > > >       if( !ref($rc) ) {
> > > >         if( $rc < 0 ) {
> > > >           $factory->remove_rid($rid);
> > > >         }
> > > >         print STDERR "." if ( $v > 0 );
> > > >         sleep 5;
> > > >       } else {
> > > >         my $result = $rc->next_result();
> > > >         #save the output
> > > >         my $filename = $result->query_name()."\.out";
> > > >         $factory->save_output($filename);
> > > >         $factory->remove_rid($rid);
> > > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > > >         while ( my $hit = $result->next_hit ) {
> > > >           next unless ( $v > 0);
> > > >           print "\thit name is ", $hit->name, "\n";
> > > >           while( my $hsp = $hit->next_hsp ) {
> > > >             print "\t\tscore is ", $hsp->score, "\n";
> > > >           }
> > > >         }
> > > >       }
> > > >     }
> > > >   }
> > > > }
> > > >
> > > > 2) Try the RemoteBlast from Bugzilla and see if that works.  It really
> > > > shouldn't make that much of a difference, but I noticed that the CVS
> > > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> > > > released; the Bugzilla version is based off CVS.
> > > >
> > > > Christopher Fields
> > > > Postdoctoral Researcher - Switzer Lab
> > > > Dept. of Biochemistry
> > > > University of Illinois Urbana-Champaign
> > > >
> > > > > -----Original Message-----
> > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > Sent: Monday, February 13, 2006 3:00 PM
> > > > > To: bioperl-l at lists.open-bio.org
> > > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > >
> > > > > Thanks, Chris,
> > > > > I installed version 1.5.1 and replaced the blast.pm file with the one from
> > > > > your bug report. The running version is 1.5 when I use the command you
> > > > > sent me. But when I tried the script, it doesn't change much. My
> > > > > remoteblast code (portion) is here:
> > > > >
> > > > > sub search {
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > > > > 			      -id=>"query",
> > > > > 			      -desc=>"new seq");
> > > > > my $len=$query->length();
> > > > > @db=('nr','htgs','wgs');
> > > > > foreach my $db (@db) {
> > > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> > > > > 						'-data' =>"$db",
> > > > > 						'-expect'=>"$E_value");
> > > > >
> > > > >
> > > > > my $blast_report = $factory->submit_blast($query);
> > > > >
> > > > > my @rids = $factory->each_rid();
> > > > > foreach my $rid ( @rids ) {
> > > > >     print STDERR "$rid\n";
> > > > > }
> > > > > # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > > > > print STDERR "waiting...";
> > > > > sleep 60;
> > > > > > > foreach my $rid ( @rids ) {
> > > > >     my $rc = $factory->retrieve_blast($rid);
> > > > >     while (!ref($rc) ) {
> > > > > 	if( $rc < 0 ) {
> > > > > # retrieve_blast returns -1 on error
> > > > > 	    $factory->remove_rid($rid);
> > > > > 	    print "Error!\n";
> > > > > 	    send_error($email,$function,$seqname,$queryname[$ST]);
> > > > > 	    die "Can't retrieve $rid";
> > > > > 	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
> > > > > 	    sleep 60;
> > > > > 	    $rc = $factory->retrieve_blast($rid);
> > > > > 	}
> > > > >     }
> > > > >     if (ref($rc)) {
> > > > > 	print STDERR "Done.\n";
> > > > > 	 while( my $result = $rc->next_result) {
> > > > > 	    while( my $hit = $result->next_hit()) {
> > > > > 	    	$hit_name=$hit->name;
> > > > > 		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > > > > 		$name=$1;
> > > > > 		@left_plus_start=();
> > > > > 		@left_plus_end=();
> > > > > 		@left_minus_start=();
> > > > > 		@left_minus_end=();
> > > > > 		@right_plus_start=();
> > > > > 		@right_plus_end=();
> > > > > 		@right_minus_start=();
> > > > > 		@right_minus_end=();
> > > > > > > 		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > > > > 		while( my $hsp = $hit->next_hsp()) {
> > > > > ......
> > > > >
> > > > > It was working quite well until around October last year, but it has
> > > > > stopped since then. When a submission is sent via the webpage, the CGI
> > > > > starts to work and uses ~20 MB of memory. Then it hangs there; finally
> > > > > the expected email is received, but without real results, although it does
> > > > > contain something from other parts of the script. Apparently the search
> > > > > sub did not return anything (I know there is something that should be
> > > > > returned). Is it also possible that the format of the NCBI output for each
> > > > > result has changed?
> > > > >
> > > > > Thank you,
> > > > > Guojun
> > > > >
> > > > >
> > > > > Department of Plant Biology
> > > > > University of Georgia
> > > > >
> > > > >
> > > > >
> > > > > ----- Original Message -----
> > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > >
> > > > > > How do you know two versions are installed (i.e. how are you checking the
> > > > > > version)?  Do you have two complete bioperl distributions (in two
> > > > > > separate directories) or are you looking in modules?  Here's the way to
> > > > > > check the version (from the FAQ):
> > > > > >
> > > > > > perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> > > > > >
> > > > > > If you have two full bioperl distributions on your computer, normally only
> > > > > > one will be in use unless you have explicitly set the environment variable
> > > > > > PERL5LIB.  The PERL5LIB directories will be searched first, before your
> > > > > > normal perl directory list (@INC) is searched.  You MAY get some mixing
> > > > > > then, but only if perl can't find a particular module in the path designated
> > > > > > in PERL5LIB; then it will progress through the directories listed in @INC.
> > > > > > This may happen if a module is unique to a particular release, but shouldn't
> > > > > > happen for the majority of modules, including RemoteBlast.  You can check
> > > > > > what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will differ
> > > > > > depending on your OS, perl build, etc.
> > > > > >
> > > > > > Regardless, if you follow the directions for installing bioperl for your
> > > > > > system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you
> > > > > > explicitly change the installation directory when using 'perl Makefile.PL'),
> > > > > > then 'uninstalling' Bioperl shouldn't be a problem as it will install the
> > > > > > Bioperl distribution you downloaded over the old version in @INC.  See this
> > > > > > page:
> > > > > >
> > > > > > http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > > > > >
> > > > > > for more details.
> > > > > >
> > > > > > Christopher Fields
> > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > Dept. of Biochemistry
> > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > -----Original Message-----
> > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > Sent: Monday, February 13, 2006 12:32 PM
> > > > > > > To: bioperl-l at lists.open-bio.org
> > > > > > > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > > > > Hi, Chris,
> > > > > > > I do have different versions of bioperl on my Linux machine
> > (1.4.
> > > and
> > > > > > > 1.5.0), this may be the problem. Should I just install bioperl-
> > > 1.5.1
> > > > > or I
> > > > > > > need to uninstall and remove the previous versions. I could not
> > > find
> > > > > any
> > > > > > > hint on uninstalling bioperl on linux. Could you please give me
> > > some
> > > > > > > suggestion?
> > > > > > > Thanks,
> > > > > > > Guojun
> > > > > > > > > Department of Plant Biology
> > > > > > > University of Georgia
> > > > > > >       _____
> > > > > > > > >   From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > > > Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > > version
> > > > > > > 1.28
> > > > > > > > > > > > > If you're using RemoteBlast 1.28, then you've likely
> > > > > updated from CVS
> > > > > > > which isn't the latest fix.
> > > > > > > > > Make sure that you check the following:
> > > > > > > > > 1) Always post to the mailing list:
> > > > > > > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > > > > > > > > 2) You must have the complete bioperl-1.5.1 or bioperl-live
> > > (CVS)
> > > > > > > installed first.  Perform a clean installation; do not upgrade
> > > only
> > > > > > > Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we
> > can't
> > > > > > > guarantee that mixing modules from old and new distributions
> > (1.4
> > > and
> > > > > > > 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > > > > > > installation will allow text output from BLAST v.2.2.12 to be
> > > saved
> > > > > and
> > > > > > > parsed; it will not parse the newest BLAST text output from NCBI
> > > > > (v2.2.13)
> > > > > > > but it should still save it. I believe as long as next_results()
> > > isn't
> > > > > > > called, it will work.
> > > > > > > > > 3) The bug fixes for the above issue with parsing BLAST
> > 2.2.13
> > > > > text output
> > > > > > > are NOT in CVS; they haven't been cleared and checked in by
> > Roger
> > > Hall
> > > > > > > (who's now taking care of RemoteBlast) and the powers that be
> > > (Jason
> > > > > or
> > > > > > > whomever is in charge of Bio::SearchIO).  They can be found in
> > > > > Bugzilla:
> > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > >
> > > > > > > The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> > > > > > > saving XML output, so it isn't necessary if you don't plan on using this
> > > > > > > option.  And, remember, they haven't been committed yet to CVS, which
> > > > > > > means that the final version will change to reflect the new version.
> > > > > > > > > > > Christopher Fields
> > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > Dept. of Biochemistry
> > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > > >     _____
> > > > > > > > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > > > > > Sent: Monday, February 13, 2006 9:26 AM
> > > > > > > To: Chris Fields
> > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > > version
> > > > > > > 1.28
> > > > > > > > > > > Hi, Chris
> > > > > > > > > Thanks for your suggestion, however, it doesn't seem to work
> > > for
> > > > > my cgi
> > > > > > > even after I replace both blast.pm and RemoteBlast.pm. I didn't
> > > even
> > > > > get
> > > > > > > any RID. Is there any suggestion?
> > > > > > > > > > > > > Guojun
> > > > > > > > > > > Guojun Yang
> > > > > > > Department of Plant Biology
> > > > > > > University of Georgia
> > > > > > > Tel: 706-542-1857
> > > > > > > Fax: 706-542-1805
> > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > >     _____
> > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > > > > > > Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > > version
> > > > > > > 1.28
> > > > > > > > > I would say give the new code a try, but realize that it
> > > hasn't
> > > > > been
> > > > > > > checked
> > > > > > > in (like I said below). I will try going over the modified
> > > > > > > Bio::SearchIO::blast again this weekend to see if there is
> > > anything I
> > > > > > > might
> > > > > > > have missed. The changed order in the header of BLAST text
> > output
> > > has
> > > > > me a
> > > > > > > bit worried that it might not catch everything, but it at least
> > > > > doesn't
> > > > > > > hang
> > > > > > > in the while() loop I described in the bug report below (bug
> > > #1934)
> > > > > and
> > > > > > > seems to process everything fine.
> > > > > > > > > If you want more stability in the code, you might consider
> > > > > changing over
> > > > > > > to
> > > > > > > XML output and parsing with Bio::SearchIO::blastxml. There are
> > > some
> > > > > > > changes
> > > > > > > in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
> > > saving
> > > > > XML
> > > > > > > output, but I believe it parses everything regardless. If you
> > look
> > > > > back
> > > > > > > the
> > > > > > > last month or so there has been a bit of discussion here about
> > it.
> > > > > Jason
> > > > > > > describes a bit on how to set up RemoteBlast for XML:
> > > > > > > > > http://bioperl.org/news/2005/11/06/getting-blastxml-using-
> > > > > remoteblast/
> > > > > > > > > Christopher Fields
> > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > Dept. of Biochemistry
> > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > > -----Original Message-----
> > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > > Sent: Friday, February 03, 2006 1:45 PM
> > > > > > > > To: bioperl-l at bioperl.org
> > > > > > > > Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> > > version
> > > > > 1.28
> > > > > > > >
> > > > > > > > Hi, Everybody,
> > > > > > > > I see this post and am wondering if this is the reason for the
> > > > > > > > malfunctioning of my webserver. We set up a webserver named MAK, for MITE
> > > > > > > > sequence analysis. It was working very well until around November 2005,
> > > > > > > > when it stopped returning any result (the site is fine and seems to be
> > > > > > > > doing something after submission). In the CGI script, I used remoteblast (that
> > > > > > > > work was done in 2003) to do searches. I currently do not have access to
> > > > > > > > the server because I moved. Several people have sent emails to us about
> > > > > > > > its malfunctioning. Is there any suggestion on fixing the problem? Should
> > > > > > > > I simply ask that RemoteBlast.pm be replaced with the new version?
> > > > > > > > Thanks a lot,
> > > > > > > > Guojun
> > > > > > > >
> > > > > > > > Department of Plant Biology
> > > > > > > > University of Georgia
> > > > > > > > Tel: 706-542-1857
> > > > > > > > Fax: 706-542-1805
> > > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > > > _____
> > > > > > > >
> > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
> > > Jian'
> > > > > > > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > [mailto:bioperl-
> > > > > > > > l at bioperl.org]
> > > > > > > > Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > >
> > > > > > > > Like Nagesh says, try the latest RemoteBlast from bioperl-live
> > > CVS.
> > > > > It
> > > > > > > > will
> > > > > > > > work for saving text output. However, it will not parse
> > anything
> > > > > using
> > > > > > > > next_result (it will likely hang) and will not save XML
> > format.
> > > See
> > > > > > > these
> > > > > > > > bugs:
> > > > > > > >
> > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > > >
> > > > > > > > for explanations and possible fixes (changes to RemoteBlast
> > and
> > > > > > > > Bio::SearchIO::blast). Note that these haven't been checked in
> > > yet
> > > > > so
> > > > > > > are
> > > > > > > > still not included in bioperl-live; they may be further
> > modified
> > > > > before
> > > > > > > > committing to CVS. If you're not worried about XML, you could
> > > just
> > > > > try
> > > > > > > the
> > > > > > > > first fix, which is a change to SearchIO::blast.
> > > > > > > >
> > > > > > > > Nagesh, I remember you posting to the list a month ago using a
> > > > > script
> > > > > > > > which
> > > > > > > > had problems; the script you used saves the output but doesn't
> > > > > actually
> > > > > > > > parse it (i.e. you don't use next_result() to go through the
> > > data).
> > > > > Is
> > > > > > > the
> > > > > > > > version of BLAST in your text output 2.2.12 or 2.2.13? Have
> > you
> > > > > tried
> > > > > > > > parsing the output using "-readmethod => SearchIO" or "-
> > > readmethod
> > > > > =>
> > > > > > > > blast"
> > > > > > > > using your version of RemoteBlast and method next_result()?
> > Like
> > > > > below
> > > > > > > > (from
> > > > > > > > perldoc):
> > > > > > > >
> > > > > > > > while ( my @rids = $factory->each_rid ) {
> > > > > > > > foreach my $rid ( @rids ) {
> > > > > > > > my $rc = $factory->retrieve_blast($rid);
> > > > > > > > if( !ref($rc) ) {
> > > > > > > > if( $rc < 0 ) {
> > > > > > > > $factory->remove_rid($rid);
> > > > > > > > }
> > > > > > > > print STDERR "." if ( $v > 0 );
> > > > > > > > sleep 5;
> > > > > > > > } else { # parsing starts here
> > > > > > > > my $result = $rc->next_result(); # it should hang here
> > > > > > > > #save the output
> > > > > > > > my $filename = $result->query_name()."\.out";
> > > > > > > > $factory->save_output($filename);
> > > > > > > > $factory->remove_rid($rid);
> > > > > > > > print "\nQuery Name: ", $result->query_name(), "\n";
> > > > > > > > while ( my $hit = $result->next_hit ) {
> > > > > > > > next unless ( $v > 0);
> > > > > > > > print "\thit name is ", $hit->name, "\n";
> > > > > > > > while( my $hsp = $hit->next_hsp ) {
> > > > > > > > print "\t\tscore is ", $hsp->score, "\n";
> > > > > > > > }
> > > > > > > > }
> > > > > > > > }
> > > > > > > > }
> > > > > > > > }
> > > > > > > > }
> > > > > > > >
> > > > > > > >
> > > > > > > > My script hung if I used next_result() in any way prior to the fixes. I
> > > > > > > > want to see how many others are having the same issues with parsing using
> > > > > > > > the CVS version of bioperl-live.
> > > > > > > >
> > > > > > > > Christopher Fields
> > > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > > Dept. of Biochemistry
> > > > > > > > University of Illinois Urbana-Champaign
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> > l-
> > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > > > > > > > > Sent: Thursday, February 02, 2006 7:24 PM
> > > > > > > > > To: Huang Jian; bioperl-l
> > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > > >
> > > > > > > > > Hi Huang,
> > > > > > > > > Thanks for the message. The older version of RemoteBlast.pm works on the
> > > > > > > > > logic of checking the temporary file size to determine whether the Blast
> > > > > > > > > results are ready. This condition is no longer being satisfied, maybe due to
> > > > > > > > > some changes brought about by NCBI. I had this problem recently and
> > > > > > > > > figured out that the solution was to use the latest version, which has
> > > > > > > > > this problem fixed (it does not use the file size logic any more) but is not
> > > > > > > > > yet included in the BioPerl package.
> > > > > > > > > Cheers
> > > > > > > > > Nagesh
> > > > > > > > >
> > > > > > > > > Huang Jian wrote:
> > > > > > > > >
> > > > > > > > > > Dear Nagesh,
> > > > > > > > > >
> > > > > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28
> > > > > > > > > > you sent me. Now it works perfectly!!!
> > > > > > > > > >
> > > > > > > > > > Thank you!!
> > > > > > > > > >
> > > > > > > > > > Huang
> > > > > > > > > >
> > > > > > > > > > ----- Original Message ----- From: "Nagesh Chakka"
> > > > > > > > > > 
> > > > > > > > > > To: "Huang Jian" ; "bioperl-l"
> > > > > > > > > > 
> > > > > > > > > > Sent: Friday, February 03, 2006 7:48 AM
> > > > > > > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the
> > net,
> > > so
> > > > > still
> > > > > > > > > > via email
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >> Hi Huang,
> > > > > > > > > > >> I see that you are submitting a sequence for a remote blast
> > > > > > > > > > >> search. Can you check if the RemoteBlast.pm being used is v 1.28
> > > > > > > > > > >> (2005/12/09)? If not, I have attached it to this email; try
> > > > > > > > > > >> replacing the old one, which has a bug, with it.
> > > > > > > > > >> Let me know if it works.
> > > > > > > > > >> Nagesh
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > _______________________________________________
> > > > > > > > > Bioperl-l mailing list
> > > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > > >
> > 



From cjfields at uiuc.edu  Wed Feb 15 15:17:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Feb 2006 14:17:27 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on
	RemoteBlast.pmversion 1.28
In-Reply-To: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
Message-ID: <000001c6326c$d72dd640$15327e82@pyrimidine>

This looks like a genuine bug and may be something that changed in BLASTN
text output; I'm getting it here, too.  Running verbose shows that text
output is returned, so, from that and from the stack trace it looks like
another error in text parsing in Bio::SearchIO::blast.  Bio::SearchIO::blast
line 1172 throws a conditional exception.  

I'm adding this to bug 1934 in bugzilla (reference to your email and this
response) for now.  I'll try messing around with it when I can; I'm really
busy this week.  I'll also forward this to Roger Hall.
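
Until the parser is fixed, one possible stopgap is to run the parsing step inside eval so that the exception does not end the script. The following is only a minimal sketch under stated assumptions: StubReport is a hypothetical stand-in for the object returned by retrieve_blast(), and try_next_result() is an illustrative helper, not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser object returned by retrieve_blast();
# its next_result() dies the way Bio::SearchIO::blast does in the trace.
package StubReport;
sub new         { return bless {}, shift }
sub next_result { die "MSG: no data for midline\n" }

package main;

# Run the parsing step inside eval so an exception becomes a return value
# instead of ending the script.
sub try_next_result {
    my ($report) = @_;
    my $result = eval { $report->next_result };
    return ( $result, $@ );
}

my ( $result, $err ) = try_next_result( StubReport->new );
print defined $result ? "parsed\n" : "parse failed: $err";
```

In a real script the same eval could wrap $rc->next_result(). Whether the raw report can still be written with save_output() after a failed parse has not been checked here.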

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Wednesday, February 15, 2006 1:40 PM
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] OK for aa seq but not a na seq on
> RemoteBlast.pmversion 1.28
> 
> Hi, Chris,
> Finally the remoteblast test script works for the amino.fa query, but when
> I try a nucleic acid sequence (see below), an error occurs:
> "
> waiting........
> ------------- EXCEPTION  -------------
> MSG: no data for midline  Features flanking this part of subject sequence:
> STACK Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.3/Bio/Searc
> hIO/blast.pm:1172
> STACK toplevel remoteblast_test:40
> "
> The query sequence is:
> CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> 
> The script (basically same as the remoteblast test, I only changed
> database to 'nr' and program to 'blastn' and filename to 'ost3'):
> #!/usr/bin/perl
> 
> use Bio::SeqIO;
> use Bio::Seq;
> use Bio::Tools::Run::RemoteBlast;
> use Bio::SearchIO;
> use strict;
> my $prog='blastn';
> my $db='nr';
> my $e_val=1e-10;
> my @params=( -prog=>$prog,
> 	-data=>$db,
> 	-expect=>$e_val,
> 	-readmethod=>'SearchIO');
> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> 
> my $v = 1;
> 
> my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> 
> while (my $input = $str->next_seq()){
>   #Blast a sequence against a database:
>   #Alternatively, you could  pass in a file with many
>   #sequences rather than loop through sequence one at a time
>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>   #and swap the two lines below for an example of that.
>   my $r = $factory->submit_blast($input);
>   #my $r = $factory->submit_blast('amino.fa');
>   print STDERR "waiting..." if( $v > 0 );
>   while ( my @rids = $factory->each_rid ) {
>     foreach my $rid ( @rids ) {
>       my $rc = $factory->retrieve_blast($rid);
>       if( !ref($rc) ) {
>         if( $rc < 0 ) {
>           $factory->remove_rid($rid);
>         }
>         print STDERR "." if ( $v > 0 );
>         sleep 5;
>       } else {
>         my $result = $rc->next_result();
>         #save the output
>         my $filename = $result->query_name()."\.out";
>         $factory->save_output($filename);
>         $factory->remove_rid($rid);
>         print "\nQuery Name: ", $result->query_name(), "\n";
>         while ( my $hit = $result->next_hit ) {
>           next unless ( $v > 0);
>           print "\thit name is ", $hit->name, "\n";
>           while( my $hsp = $hit->next_hsp ) {
>             print "\t\tscore is ", $hsp->score, "\n";
>           }
>         }
>       }
>     }
>   }
> }
> 
> 
> Do you think there might still be something in the NCBI output format?
> 
> Thank you,
> Guojun
> 
> 
> 
> 
> Guojun Yang
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
> 
> 
> 
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> 
> 
> > Sorry, forgot to add that I didn't see the regex issue that you
> > mentioned. It could be a perl-related issue. Try the fixes I mentioned
> > and see what happens.
> > > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> > > > > -----Original Message-----
> > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > Sent: Tuesday, February 14, 2006 12:36 PM
> > > To: 'gyang at plantbio.uga.edu'
> > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > >
> > > > It's a good habit to always add single quotes around words. The perl
> > > > interpreter may think a single bare word is a subroutine or perlfunc
> > > > called with no args, so it will try to find a subroutine named blastp().
> > > > My debugger actually gives the error that the bare word blastp may
> > > > conflict with a future reserved word. Like you said, 'use strict' will
> > > > point that out.
> > > >
> > > > As for the regex, it should match all the blast programs at NCBI
> > > > (blastp, blastn, blastx, tblastn, tblastx) and is built in to make sure
> > > > nothing else passes through.
> > > >
> > > > So, if you are using the script below, there are several errors. The
> > > > bare words for $prog and $db need quotes, and the flags for your @params
> > > > array don't have a dash before them. I get this after adding quotes but
> > > > before adding the dashes to @params:
> > > > > C:\Perl\Scripts>test_blast.pl
> > > > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG:
> > > STACK: Error::throw
> > > STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-
> > > live/Bio/Root/Root.pm:328
> > > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> > > C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > > STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-
> > > live/Bio/Tools/Run/RemoteBlast.pm:256
> > > STACK: C:\Perl\Scripts\test_blast.pl:15
> > > -----------------------------------------------------------
> > > > > The last line indicates a problem with this line:
> > > > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > > > Changing the @params to this:
> > > > > my @params=( -prog=>$prog,
> > > 	-data=>$db,
> > > 	-expect=>$e_val,
> > > 	-readmethod=>'SearchIO');
> > > > > fixes it, and I get output as expected.
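
The two fixes described above (quoted values, dash-prefixed parameter names) and the program-name check can be tried without BioPerl installed. This is only a sketch: is_blast_program() is a hypothetical helper, and anchoring the quoted pattern t?blast[pnx] with ^ and $ is an assumption, not necessarily how RemoteBlast.pm applies it.

```perl
use strict;
use warnings;

# Quoted strings, not barewords, so 'use strict' accepts them.
my $prog  = 'blastp';
my $db    = 'swissprot';
my $e_val = 1e-10;

# Named parameters with a leading dash, as in the corrected @params.
my @params = (
    -prog       => $prog,
    -data       => $db,
    -expect     => $e_val,
    -readmethod => 'SearchIO',
);

# Hypothetical helper: the pattern quoted in the error message,
# anchored here (an assumption) so that nothing else passes.
sub is_blast_program {
    my ($name) = @_;
    return $name =~ /^t?blast[pnx]$/ ? 1 : 0;
}

my %args = @params;
print $args{'-prog'}, ': ',
    ( is_blast_program( $args{'-prog'} ) ? 'ok' : 'rejected' ), "\n";
```

With the dashes missing, the hash keys would be 'prog', 'data' and so on, so a constructor that looks for '-prog' would not find a program name at all.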
> > > > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > > > > > > > -----Original Message-----
> > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > > Sent: Tuesday, February 14, 2006 11:48 AM
> > > > To: Chris Fields; bioperl-l at lists.open-bio.org
> > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > >
> > > > Hi, Chris,
> > > > > When I tried the perldoc script, it did not work either. First it
> > > > > says $prog cannot be a bare word if I "use strict". I added quotes to
> > > > > the words; then it says the value for $prog does not match expression
> > > > > t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256. The
> > > > > script is shown below. Why is the expression "t?blast[pnx]"?
> > > >
> > > > #!/usr/bin/perl
> > > >
> > > > use Bio::SeqIO;
> > > > use Bio::Seq;
> > > > use Bio::Tools::Run::RemoteBlast;
> > > > use Bio::SearchIO;
> > > >
> > > >
> > > > my $prog=blastp;
> > > > my $db=swissprot;
> > > > my $e_val=1e-10;
> > > > my @params=( prog=>$prog,
> > > > 	data=>$db,
> > > > 	expect=>$e_val,
> > > > 	readmethod=>'SearchIO');
> > > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > >
> > > > my $v = 1;
> > > >
> > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > > >
> > > > while (my $input = $str->next_seq()){
> > > >   #Blast a sequence against a database:
> > > >   #Alternatively, you could  pass in a file with many
> > > >   #sequences rather than loop through sequence one at a time
> > > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >   #and swap the two lines below for an example of that.
> > > >   my $r = $factory->submit_blast($input);
> > > >   #my $r = $factory->submit_blast('amino.fa');
> > > >   print STDERR "waiting..." if( $v > 0 );
> > > >   while ( my @rids = $factory->each_rid ) {
> > > >     foreach my $rid ( @rids ) {
> > > >       my $rc = $factory->retrieve_blast($rid);
> > > >       if( !ref($rc) ) {
> > > >         if( $rc < 0 ) {
> > > >           $factory->remove_rid($rid);
> > > >         }
> > > >         print STDERR "." if ( $v > 0 );
> > > >         sleep 5;
> > > >       } else {
> > > >         my $result = $rc->next_result();
> > > >         #save the output
> > > >         my $filename = $result->query_name()."\.out";
> > > >         $factory->save_output($filename);
> > > >         $factory->remove_rid($rid);
> > > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > > >         while ( my $hit = $result->next_hit ) {
> > > >           next unless ( $v > 0);
> > > >           print "\thit name is ", $hit->name, "\n";
> > > >           while( my $hsp = $hit->next_hsp ) {
> > > >             print "\t\tscore is ", $hsp->score, "\n";
> > > >           }
> > > >         }
> > > >       }
> > > >     }
> > > >   }
> > > > }
> > > >
> > > > Thank you for your help!
> > > >
> > > >
> > > > Guojun
> > > > Department of Plant Biology
> > > > University of Georgia
> > > >
> > > > ----- Original Message -----
> > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > To: gyang at plantbio.uga.edu
> > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >
> > > >
> > > > > Try two things:
> > > > > > 1)  Use a much simpler script, like the one in 'perldoc
> > > > > Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's
> something
> > > > wrong
> > > > > with the logic in your subroutine:
> > > > > > my $v = 1;
> > > > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta'
> );
> > > > > > while (my $input = $str->next_seq()){
> > > > >   #Blast a sequence against a database:
> > > > >   #Alternatively, you could  pass in a file with many
> > > > >   #sequences rather than loop through sequence one at a time
> > > > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > > >   #and swap the two lines below for an example of that.
> > > > >   my $r = $factory->submit_blast($input);
> > > > >   #my $r = $factory->submit_blast('amino.fa');
> > > > >   print STDERR "waiting..." if( $v > 0 );
> > > > >   while ( my @rids = $factory->each_rid ) {
> > > > >     foreach my $rid ( @rids ) {
> > > > >       my $rc = $factory->retrieve_blast($rid);
> > > > >       if( !ref($rc) ) {
> > > > >         if( $rc < 0 ) {
> > > > >           $factory->remove_rid($rid);
> > > > >         }
> > > > >         print STDERR "." if ( $v > 0 );
> > > > >         sleep 5;
> > > > >       } else {
> > > > >         my $result = $rc->next_result();
> > > > >         #save the output
> > > > >         my $filename = $result->query_name()."\.out";
> > > > >         $factory->save_output($filename);
> > > > >         $factory->remove_rid($rid);
> > > > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > > > >         while ( my $hit = $result->next_hit ) {
> > > > >           next unless ( $v > 0);
> > > > >           print "\thit name is ", $hit->name, "\n";
> > > > >           while( my $hsp = $hit->next_hsp ) {
> > > > >             print "\t\tscore is ", $hsp->score, "\n";
> > > > >           }
> > > > >         }
> > > > >       }
> > > > >     }
> > > > >   }
> > > > > }
> > > > > > 2) Try the RemoteBlast from Bugzilla and see if that works.  It
> > > really
> > > > > shouldn't make that much of a difference, but I noticed that the
> CVS
> > > > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1
> was
> > > > > released; the Bugzilla version is based off CVS.
> > > > > > Christopher Fields
> > > > > Postdoctoral Researcher - Switzer Lab
> > > > > Dept. of Biochemistry
> > > > > University of Illinois Urbana-Champaign
> > > > > > > -----Original Message-----
> > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > Sent: Monday, February 13, 2006 3:00 PM
> > > > > > To: bioperl-l at lists.open-bio.org
> > > > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > > > Thanks, Chris,
> > > > > > I installed version 1.5.1 and replaced the blast.pm file with
> the
> > > one
> > > > from
> > > > > > your bug report. The running version is 1.5 when I use the
> command
> > > you
> > > > > > sent me. But when I tried the script, it doesn't change much. My
> > > > > > remoteblast code (portion) is here:
> > > > > > > > sub search {
> > > > > > local
> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > > > > > local
> $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > > > > > local
> > > > > >
> > > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}=
> > > > > > 'no';
> > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > > > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > > > > > 			      -id=>"query",
> > > > > > 			      -desc=>"new seq");
> > > > > > my $len=$query->length();
> > > > > > @db=('nr','htgs','wgs');
> > > > > > foreach my $db (@db) {
> > > > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog'
> =>'blastn',
> > > > > > 						'-data' =>"$db",
> > > > > >
> > '-expect'=>"$E_value");
> > > > > > > > > > my $blast_report = $factory->submit_blast($query);
> > > > > > > > my @rids = $factory->each_rid();
> > > > > > foreach my $rid ( @rids ) {
> > > > > >     print STDERR "$rid\n";
> > > > > > }
> > > > > > # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > > > > > print STDERR "waiting...";
> > > > > > sleep 60;
> > > > > > > > foreach my $rid ( @rids ) {
> > > > > >     my $rc = $factory->retrieve_blast($rid);
> > > > > >     while (!ref($rc) ) {
> > > > > > 	if( $rc < 0 ) {
> > > > > > # retrieve_blast returns -1 on error
> > > > > > 	    $factory->remove_rid($rid);
> > > > > > 	    print "Error!\n";
> > > > > > 	    send_error($email,$function,$seqname,$queryname[$ST]);
> > > > > > 	    die "Can't retrieve $rid";
> > > > > > 	} if ($rc==0) { # retrieve_blast returns 0 on 'job not
> > > finished'
> > > > > > 	    sleep 60;
> > > > > > 	    $rc = $factory->retrieve_blast($rid);
> > > > > > 	}
> > > > > >     }
> > > > > >     if (ref($rc)) {
> > > > > > 	print STDERR "Done.\n";
> > > > > > 	 while( my $result = $rc->next_result) {
> > > > > > 	    while( my $hit = $result->next_hit()) {
> > > > > > 	    	$hit_name=$hit->name;
> > > > > > 		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > > > > > 		$name=$1;
> > > > > > 		@left_plus_start=();
> > > > > > 		@left_plus_end=();
> > > > > > 		@left_minus_start=();
> > > > > > 		@left_minus_end=();
> > > > > > 		@right_plus_start=();
> > > > > > 		@right_plus_end=();
> > > > > > 		@right_minus_start=();
> > > > > > 		@right_minus_end=();
> > > > > > > > 		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i))
{
> > > > > > 		while( my $hsp = $hit->next_hsp()) {
> > > > > > ......
> > > > > > >
> > > > > > > It was working quite well until around October last year, but it
> > > > > > > has stopped since then. When a submission is sent via a webpage, the
> > > > > > > cgi starts to work and uses ~20 Mb of memory. Then it hangs there;
> > > > > > > finally the expected email is received, but without real results,
> > > > > > > although it does contain something from other parts of the script.
> > > > > > > Apparently the search sub did not return anything (I know there is
> > > > > > > something that should be returned). Is it also possible that the
> > > > > > > format of the NCBI output for each result has changed?
> > > > > > Thank you,
> > > > > > Guojun
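
The search sub above retries for as long as retrieve_blast() returns 0, so a job that never finishes keeps the CGI waiting. A bounded retry is one way to make that failure visible. The sketch below assumes the return convention stated in the script's own comments (-1 on error, 0 on 'job not finished'); poll_report() and the stub closure are hypothetical, not BioPerl calls.

```perl
use strict;
use warnings;

# Poll $fetch->() until it returns a reference (a finished report),
# a negative value (error), or $max_tries attempts have been used.
sub poll_report {
    my ( $fetch, $max_tries, $delay ) = @_;
    for my $try ( 1 .. $max_tries ) {
        my $rc = $fetch->();
        return $rc if ref $rc;
        die "retrieval failed\n" if $rc < 0;
        sleep $delay if $delay && $try < $max_tries;
    }
    die "gave up after $max_tries tries\n";
}

# Stub standing in for a closure around retrieve_blast():
# it reports 'not finished' twice and is ready on the third call.
my $calls  = 0;
my $report = poll_report( sub { ++$calls < 3 ? 0 : +{ done => 1 } }, 10, 0 );
print "ready after $calls calls\n";
```

In the real script the closure would be something like sub { $factory->retrieve_blast($rid) }, with a delay of 60 seconds as in the original loop.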
> > > > > > > > > > Department of Plant Biology
> > > > > > University of Georgia
> > > > > > > > > > > > ----- Original Message -----
> > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > > >
> > > > > > > > How do you know two versions are installed (i.e. how are you
> > > > > > > > checking the version)? Do you have two complete bioperl
> > > > > > > > distributions (in two separate directories) or are you looking in
> > > > > > > > modules? Here's the way to check the version (from the FAQ):
> > > > > > > >
> > > > > > > > perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> > > > > > > >
> > > > > > > > If you have two full bioperl distributions on your computer,
> > > > > > > > normally only one will be in use unless you have explicitly set the
> > > > > > > > environment variable PERL5LIB. The PERL5LIB directories will be
> > > > > > > > searched first, before your normal perl directory list (@INC) is
> > > > > > > > searched. You MAY get some mixing then, but only if perl can't find
> > > > > > > > a particular module in the path designated in PERL5LIB; then it
> > > > > > > > will progress through the directories listed in @INC. This may
> > > > > > > > happen if a module is unique to a particular release, but shouldn't
> > > > > > > > happen for the majority of modules, including RemoteBlast. You can
> > > > > > > > check what @INC and PERL5LIB are set to by using 'perl -V'. @INC
> > > > > > > > will differ depending on your OS, perl build, etc.
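
To see which copy of a module perl would pick up when several distributions are installed, the search through @INC can be reproduced by hand. This is a minimal sketch; first_in_inc() is a hypothetical helper written for illustration, and it only reports the first matching file rather than loading anything.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first file in @INC that provides the named module,
# which is the copy 'use' would load; undef if none is found.
sub first_in_inc {
    my ($module) = @_;
    ( my $rel = "$module.pm" ) =~ s{::}{/}g;
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks in @INC
        my $path = File::Spec->catfile( $dir, $rel );
        return $path if -f $path;
    }
    return;
}

for my $module ( 'Bio::Tools::Run::RemoteBlast', 'Bio::SearchIO::blast' ) {
    my $path = first_in_inc($module);
    printf "%-32s %s\n", $module,
        defined $path ? $path : '(not found in @INC)';
}
```

If the two modules resolve to directories belonging to different bioperl releases, that would point to the kind of mixing described above.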
> > > > > > > > Regardless, if you follow the directions for installing
> bioperl
> > > > for
> > > > > > your
> > > > > > > system ('perl Makefile.PL', 'make', 'make test', 'make
> install',
> > > > unless
> > > > > > you
> > > > > > > explicitly change the installation directory when using 'perl
> > > > > > Makefile.PL'),
> > > > > > > then 'uninstalling' Bioperl shouldn't be a problem as it will
> > > > install
> > > > > > the
> > > > > > > Bioperl distribution you downloaded over the old version in
> @INC.
> > > > See
> > > > > > this
> > > > > > > page:
> > > > > > > > http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > > > > > > > for more details.
> > > > > > > > Christopher Fields
> > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > Dept. of Biochemistry
> > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > > -----Original Message-----
> > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> l-
> > > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > > Sent: Monday, February 13, 2006 12:32 PM
> > > > > > > > To: bioperl-l at lists.open-bio.org
> > > > > > > > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > > > > > Hi, Chris,
> > > > > > > > I do have different versions of bioperl on my Linux machine
> > > (1.4.
> > > > and
> > > > > > > > 1.5.0), this may be the problem. Should I just install
> bioperl-
> > > > 1.5.1
> > > > > > or I
> > > > > > > > need to uninstall and remove the previous versions. I could
> not
> > > > find
> > > > > > any
> > > > > > > > hint on uninstalling bioperl on linux. Could you please give
> me
> > > > some
> > > > > > > > suggestion?
> > > > > > > > Thanks,
> > > > > > > > Guojun
> > > > > > > > > > Department of Plant Biology
> > > > > > > > University of Georgia
> > > > > > > >       _____
> > > > > > > > > >   From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > > > > Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > > > > > > > Subject: RE: [Bioperl-l] more question regarding
> RemoteBlast.pm
> > > > > > version
> > > > > > > > 1.28
> > > > > > > > > > > > > > If you're using RemoteBlast 1.28, then you've
> likely
> > > > > > updated from CVS
> > > > > > > > which isn't the latest fix.
> > > > > > > > > > Make sure that you check the following:
> > > > > > > > > > 1) Always post to the mailing list:
> > > > > > > >
> http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > > > > > > > > > 2) You must have the complete bioperl-1.5.1 or bioperl-
> live
> > > > (CVS)
> > > > > > > > installed first.  Perform a clean installation; do not
> upgrade
> > > > only
> > > > > > > > Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we
> > > can't
> > > > > > > > guarantee that mixing modules from old and new distributions
> > > (1.4
> > > > and
> > > > > > > > 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-
> live
> > > > > > > > installation will allow text output from BLAST v.2.2.12 to
> be
> > > > saved
> > > > > > and
> > > > > > > > parsed; it will not parse the newest BLAST text output from
> NCBI
> > > > > > (v2.2.13)
> > > > > > > > but it should still save it. I believe as long as
> next_results()
> > > > isn't
> > > > > > > > called, it will work.
> > > > > > > > > > 3) The bug fixes for the above issue with parsing BLAST
> > > 2.2.13
> > > > > > text output
> > > > > > > > are NOT in CVS; they haven't been cleared and checked in by
> > > Roger
> > > > Hall
> > > > > > > > (who's now taking care of RemoteBlast) and the powers that
> be
> > > > (Jason
> > > > > > or
> > > > > > > > whomever is in charge of Bio::SearchIO).  They can be found
> in
> > > > > > Bugzilla:
> > > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > > > > > The fix in RemoteBlast in Bugzilla (#1935) is to allow
> the
> > > > option
> > > > > > of
> > > > > > > > saving XML output, so isn't necessary if you don't plan on
> using
> > > > this
> > > > > > > > option.  And, remember, they haven't been committed yet to
> CVS,
> > > > which
> > > > > > > > means that the final version will change to refle the new
> > > version.
> > > > > > > > > > > > Christopher Fields
> > > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > > Dept. of Biochemistry
> > > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > > > >     _____
> > > > > > > > > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > > > > > > Sent: Monday, February 13, 2006 9:26 AM
> > > > > > > > To: Chris Fields
> > > > > > > > Subject: RE: [Bioperl-l] more question regarding
> RemoteBlast.pm
> > > > > > version
> > > > > > > > 1.28
> > > > > > > > > > > > Hi, Chris
> > > > > > > > > > Thanks for your suggestion, however, it doesn't seem to
> work
> > > > for
> > > > > > my cgi
> > > > > > > > even after I replace both blast.pm and RemoteBlast.pm. I
> didn't
> > > > even
> > > > > > get
> > > > > > > > any RID. Is there any suggestion?
> > > > > > > > > > > > > > Guojun
> > > > > > > > > > > > Guojun Yang
> > > > > > > > Department of Plant Biology
> > > > > > > > University of Georgia
> > > > > > > > Tel: 706-542-1857
> > > > > > > > Fax: 706-542-1805
> > > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > > >     _____
> > > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > > > > > > > Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > > > > > > > Subject: RE: [Bioperl-l] more question regarding
> RemoteBlast.pm
> > > > > > version
> > > > > > > > 1.28
> > > > > > > > > > I would say give the new code a try, but realize that it
> > > > hasn't
> > > > > > been
> > > > > > > > checked
> > > > > > > > in (like I said below). I will try going over the modified
> > > > > > > > Bio::SearchIO::blast again this weekend to see if there is
> > > > anything I
> > > > > > > > might
> > > > > > > > have missed. The changed order in the header of BLAST text
> > > output
> > > > has
> > > > > > me a
> > > > > > > > bit worried that it might not catch everything, but it at
> least
> > > > > > doesn't
> > > > > > > > hang
> > > > > > > > in the while() loop I described in the bug report below (bug
> > > > #1934)
> > > > > > and
> > > > > > > > seems to process everything fine.
> > > > > > > > > > If you want more stability in the code, you might
> consider
> > > > > > changing over
> > > > > > > > to
> > > > > > > > XML output and parsing with Bio::SearchIO::blastxml. There
> are
> > > > some
> > > > > > > > changes
> > > > > > > > in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
> > > > saving
> > > > > > XML
> > > > > > > > output, but I believe it parses everything regardless. If
> you
> > > look
> > > > > > back
> > > > > > > > the
> > > > > > > > last month or so there has been a bit of discussion here
> about
> > > it.
> > > > > > Jason
> > > > > > > > describes a bit on how to set up RemoteBlast for XML:
> > > > > > > > > > http://bioperl.org/news/2005/11/06/getting-blastxml-
> using-
> > > > > > remoteblast/
> > > > > > > > > > Christopher Fields
> > > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > > Dept. of Biochemistry
> > > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-
> > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > > > Sent: Friday, February 03, 2006 1:45 PM
> > > > > > > > > To: bioperl-l at bioperl.org
> > > > > > > > > Subject: [Bioperl-l] more question regarding
> RemoteBlast.pm
> > > > version
> > > > > > 1.28
> > > > > > > > >
> > > > > > > > > Hi, Everybody,
> > > > > > > > > I see this post and am wondering if this is the reason for
> the
> > > > > > > > > malfunctionning of my webserver. We set up a webserver
> named
> > > > MAK,
> > > > > > for
> > > > > > > > MITE
> > > > > > > > > sequence analysis. It was working very well until around
> > > > November
> > > > > > 2005,
> > > > > > > > > when it stopped returning any result (the site is fine and
> > > seems
> > > > to
> > > > > > be
> > > > > > > > > doing sth after submission). In the CGI script, I used
> > > > remoteblast
> > > > > > (that
> > > > > > > > > work was done in 2003) to do searches. I currently do not
> have
> > > > > > access to
> > > > > > > > > the server because I moved. Quite several people sent
> emails
> > > to
> > > > us
> > > > > > about
> > > > > > > > > its malfunctioning. Is there any suggestion on fixing the
> > > > problem?
> > > > > > > > Should
> > > > > > > > > I simplily ask the remoteblast.pm be replaced with the new
> > > > version?
> > > > > > > > > Thanks a lot,
> > > > > > > > > Guojun
> > > > > > > > >
> > > > > > > > > Department of Plant Biology
> > > > > > > > > University of Georgia
> > > > > > > > > Tel: 706-542-1857
> > > > > > > > > Fax: 706-542-1805
> > > > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > > > > _____
> > > > > > > > >
> > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au],
> 'Huang
> > > > Jian'
> > > > > > > > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > > [mailto:bioperl-
> > > > > > > > > l at bioperl.org]
> > > > > > > > > Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > > >
> > > > > > > > > Like Nagesh says, try the latest RemoteBlast from bioperl-
> live
> > > > CVS.
> > > > > > It
> > > > > > > > > will
> > > > > > > > > work for saving text output. However, it will not parse
> > > anything
> > > > > > using
> > > > > > > > > next_result (it will likely hang) and will not save XML
> > > format.
> > > > See
> > > > > > > > these
> > > > > > > > > bugs:
> > > > > > > > >
> > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > > > >
> > > > > > > > > for explanations and possible fixes (changes to
> RemoteBlast
> > > and
> > > > > > > > > Bio::SearchIO::blast). Note that these haven't been
> checked in
> > > > yet
> > > > > > so
> > > > > > > > are
> > > > > > > > > still not included in bioperl-live; they may be further
> > > modified
> > > > > > before
> > > > > > > > > committing to CVS. If you're not worried about XML, you
> could
> > > > just
> > > > > > try
> > > > > > > > the
> > > > > > > > > first fix, which is a change to SearchIO::blast.
> > > > > > > > >
> > > > > > > > > Nagesh, I remember you posting to the list a month ago
> using a
> > > > > > script
> > > > > > > > > which
> > > > > > > > > had problems; the script you used saves the output but
> doesn't
> > > > > > actually
> > > > > > > > > parse it (i.e. you don't use next_result() to go through
> the
> > > > data).
> > > > > > Is
> > > > > > > > the
> > > > > > > > > version of BLAST in your text output 2.2.12 or 2.2.13?
> Have
> > > you
> > > > > > tried
> > > > > > > > > parsing the output using "-readmethod => SearchIO" or "-
> > > > readmethod
> > > > > > =>
> > > > > > > > > blast"
> > > > > > > > > using your version of RemoteBlast and method
> next_result()?
> > > Like
> > > > > > below
> > > > > > > > > (from
> > > > > > > > > perldoc):
> > > > > > > > >
> > > > > > > > > while ( my @rids = $factory->each_rid ) {
> > > > > > > > > foreach my $rid ( @rids ) {
> > > > > > > > > my $rc = $factory->retrieve_blast($rid);
> > > > > > > > > if( !ref($rc) ) {
> > > > > > > > > if( $rc < 0 ) {
> > > > > > > > > $factory->remove_rid($rid);
> > > > > > > > > }
> > > > > > > > > print STDERR "." if ( $v > 0 );
> > > > > > > > > sleep 5;
> > > > > > > > > } else { # parsing
> > > > > > > > > starts here
> > > > > > > > > my $result = $rc->next_result(); # it should hang
> > > > > > > > > here
> > > > > > > > > #save the output
> > > > > > > > > my $filename = $result->query_name()."\.out";
> > > > > > > > > $factory->save_output($filename);
> > > > > > > > > $factory->remove_rid($rid);
> > > > > > > > > print "\nQuery Name: ", $result->query_name(), "\n";
> > > > > > > > > while ( my $hit = $result->next_hit ) {
> > > > > > > > > next unless ( $v > 0);
> > > > > > > > > print "\thit name is ", $hit->name, "\n";
> > > > > > > > > while( my $hsp = $hit->next_hsp ) {
> > > > > > > > > print "\t\tscore is ", $hsp->score, "\n";
> > > > > > > > > }
> > > > > > > > > }
> > > > > > > > > }
> > > > > > > > > }
> > > > > > > > > }
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > My script hanged if I used next_result() in any way prior
> to
> > > the
> > > > > > fixes.
> > > > > > > > I
> > > > > > > > > want to see how many others are having the same issues
> with
> > > > parsing
> > > > > > > > using
> > > > > > > > > the CVS version of bioperl-live.
> > > > > > > > >
> > > > > > > > > Christopher Fields
> > > > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > > > Dept. of Biochemistry
> > > > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-
> > > l-
> > > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > > > > > > > > > Sent: Thursday, February 02, 2006 7:24 PM
> > > > > > > > > > To: Huang Jian; bioperl-l
> > > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > > > >
> > > > > > > > > > Hi Huang,
> > > > > > > > > > Thanks for the message. The older version of
> RemoteBlast.pm
> > > > works
> > > > > > on
> > > > > > > > the
> > > > > > > > > > logic of checking the temporary file size to determine
> > > whether
> > > > the
> > > > > > > > Blast
> > > > > > > > > > results are ready. This condition is not getting
> satisfied
> > > may
> > > > be
> > > > > > due
> > > > > > > > to
> > > > > > > > > > some changes brought about by NCBI. I had this problem
> > > > recently
> > > > > > and
> > > > > > > > > > figured out that the solution was to use the latest
> version
> > > > which
> > > > > > has
> > > > > > > > > > this problem fixed (does not use file size logic any
> more)
> > > > which
> > > > > > is
> > > > > > > > not
> > > > > > > > > > yet included in the BioPerl package.
> > > > > > > > > > Cheers
> > > > > > > > > > Nagesh
> > > > > > > > > >
> > > > > > > > > > Huang Jian wrote:
> > > > > > > > > >
> > > > > > > > > > > Dear Nagesh,
> > > > > > > > > > >
> > > > > > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with v
> 1.28
> > > > you
> > > > > > send
> > > > > > > > > > > me. Now it works perfectly!!!
> > > > > > > > > > >
> > > > > > > > > > > Thank you!!
> > > > > > > > > > >
> > > > > > > > > > > Huang
> > > > > > > > > > >
> > > > > > > > > > > ----- Original Message ----- From: "Nagesh Chakka"
> > > > > > > > > > > 
> > > > > > > > > > > To: "Huang Jian" ;
> "bioperl-l"
> > > > > > > > > > > 
> > > > > > > > > > > Sent: Friday, February 03, 2006 7:48 AM
> > > > > > > > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the
> > > net,
> > > > so
> > > > > > still
> > > > > > > > > > > via email
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >> Hi Huang,
> > > > > > > > > > >> I see that you are submitting a sequence for a remote
> > > blast
> > > > > > search.
> > > > > > > > > Can
> > > > > > > > > > >> you check if the RemoteBlast.pm being used is v 1.28
> > > > > > (2005/12/09).
> > > > > > > > If
> > > > > > > > > > >> not I have attached it with this email, try to
> replace it
> > > > with
> > > > > > the
> > > > > > > > > old
> > > > > > > > > > >> one which has a bug.
> > > > > > > > > > >> Let me know if it works.
> > > > > > > > > > >> Nagesh
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > _______________________________________________
> > > > > > > > > > Bioperl-l mailing list
> > > > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > > > >



From sdavis2 at mail.nih.gov  Wed Feb 15 19:39:33 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 16 Feb 2006 00:39:33 -0000
Subject: [Bioperl-l] error running load_seqdatabase.pl
References: 
Message-ID: <000c01c63291$5de08600$6601a8c0@WATSON>


----- Original Message ----- 
From: "Angshu Kar" 
To: "bioperl-l" 
Sent: Thursday, December 29, 2005 5:50 PM
Subject: [Bioperl-l] error running load_seqdatabase.pl


> Hi,
>
> I'm getting the following error while trying to run :
>
> ./load_seqdatabase.pl -host localhost -dbname USBA -dbuser 
> postgres -format
> genbank NC_003076.gbk
>
> But I've a postgreSQL db and not a MySQL one...could anyone please guide 
> me
> troubleshoot this?

Angshu,

I would probably start with:

perldoc load_seqdatabase.pl

I think that will likely give you your answer.  Again, it is best to exhaust 
the resources at hand and to let the list know that you have done so 
(like--"I read the perldoc and tried this....").

Sean




From cain at cshl.edu  Wed Feb 15 11:07:28 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 15 Feb 2006 11:07:28 -0500
Subject: [Bioperl-l] Bio::Tools::GFF parsing error
In-Reply-To: <43F35043.7070705@cornell.edu>
References: <43F35043.7070705@cornell.edu>
Message-ID: <1140019648.2849.58.camel@localhost.localdomain>

Hi Robert,

No column should ever be padded with spaces; GFF columns should always
be separated by a single tab.  Therefore, I don't think Bio::Tools::GFF
is at fault here.
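
For anyone stuck with a padded file like this, a minimal sketch of a
clean-up step follows. It assumes the columns really are tab-separated
and only padded with spaces inside each column; clean_gff_line is a
hypothetical helper, not part of Bio::Tools::GFF. If the file uses
spaces in place of tabs, this will not help.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::GFF): trim padding from
# each tab-separated column of a GFF2 feature line.
sub clean_gff_line {
    my ($line) = @_;
    chomp $line;
    # Leave pragma/comment lines (## ...) and blank lines untouched.
    return $line if $line =~ /^#/ || $line !~ /\S/;
    # A limit of -1 keeps trailing empty fields, so the column count is preserved.
    my @cols = split /\t/, $line, -1;
    for my $col (@cols) {
        $col =~ s/^\s+//;    # leading padding
        $col =~ s/\s+$//;    # trailing padding
    }
    return join "\t", @cols;
}

# Example: the first feature line from the post, with the padded score.
my $padded = join "\t",
    'C01HBa0088L02', 'RepeatMasker', 'similarity', '3537', '4267',
    ' 3.3', '-', '.', 'Target "Motif:bac_end_repeat_family_345" 1 740';
print clean_gff_line($padded), "\n";
```

Running every line of the file through this before handing it to the
parser should leave the score column as a bare number.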

Scott


On Wed, 2006-02-15 at 11:01 -0500, Robert Buels wrote:
> Hi all,
> 
> I'm parsing a GFF2 file with Bio::Tools::GFF (I would be using 
> FeatureIO, except it purports not to support gff 2), and the file looks 
> like:
> 
> ##gff-version 2
> ##date 2006-02-13
> ##sequence-region C01HBa0088L02.seq 1 120525
> C01HBa0088L02   RepeatMasker    similarity      3537    4267     3.3    
> -       .       Target "Motif:bac_end_repeat_family_345" 1 740
> C01HBa0088L02   RepeatMasker    similarity      4172    4279     2.9    
> +       .       Target "Motif:HRSiTERT00100141" 1 104
> C01HBa0088L02   RepeatMasker    similarity      4267    4323     0.0    
> -       .       Target "Motif:k_29" 150 206
> C01HBa0088L02   RepeatMasker    similarity      4322    4492    26.6    
> +       .       Target "Motif:PRSiTERT00300001" 1960 2129
> C01HBa0088L02   RepeatMasker    similarity      4557    5124    29.5    
> +       .       Target "Motif:PRSiTERT00300001" 2142 2711
> 
> Notice the score column is padded with spaces.
> 
> Bio::Tools::GFF does not like this, and says that ' 3.3' is not a valid 
> score.  My question is, who is wrong here, my input file or 
> Bio::Tools::GFF?  Should Bio::Tools::GFF be able to read this file?
> 
> Rob
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From hlapp at gmx.net  Wed Feb 15 20:54:01 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 15 Feb 2006 17:54:01 -0800
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki,
	problems with bioperl-db
In-Reply-To: <001201c631a5$ce7496f0$15327e82@pyrimidine>
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
Message-ID: 


On Feb 14, 2006, at 12:32 PM, Chris Fields wrote:

> Hilmar,
>
> Good News: I've added a section to the bioperl wiki on installing  
> bioperl-db
> in Windows:
>
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db
>
> Bad News:  There's a new problem now. I updated from CVS yesterday; I  
> walked
> through the steps and ran 'nmake test', with everything passing fine.
> However, load_seqdatabase.pl is extremely slow; it's loading a sequence
> every 5 minutes or so.  I noticed (when using '-debug') that it is  
> hanging
> up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a  
> database,
> load the biosql schema, and load sequences w/o loading taxonomy, the  
> problem
> goes away.
>
> Here's the debugging output (I cut it off at the point it hangs up):
> [...]

> preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
> taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
> taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND  
> ncbi_taxon_id =
> ?
> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)

I'm a bit surprised if this is the query where it hangs. Are the  
indexes all there? There should be a primary key index on  
taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on taxon_name  
over (taxon_id,name,name_class). Also, there should be separate indexes  
on taxon_name.taxon_id and taxon_name.name. Are they all there? If you  
reinstantiated the schema from the DDL then it seems unlikely that  
somehow the indexes have vanished except if you messed with the schema  
or the DDL.

Putting an index on taxon_name.name_class really can't make sense, so  
let's assume it can't be that.

So really I suspect this has something to do with the state of the  
database and the version of MySQL. In particular, from some 4.x version  
of MySQL under certain circumstances you have to analyze the statistics  
of the tables in order to get the optimizer to pick up the indexes  
properly. Are you on MySQL 4.x and if so, have you done that?

There's the ANALYZE TABLE command:
http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html

Note the comment: "This statement works with MyISAM, BDB, and (as of  
MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?

Also, you can check the execution plan for the query using EXPLAIN.
http://dev.mysql.com/doc/refman/4.1/en/explain.html

This should show you whether the index would be picked up for the query  
or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to  
the db using the mysql shell (mysql).
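
A minimal sketch of those checks, for what it is worth: the script
below only prints the SQL and executes nothing, so its output can be
pasted or piped into the mysql shell. The table names and the bound
values are taken from the debug output above; diagnostic_sql is a
hypothetical name of my own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The two tables joined by the slow SpeciesAdaptor lookup.
my @tables = qw(taxon taxon_name);

# Build the diagnostic statements as strings; nothing is executed here.
sub diagnostic_sql {
    my @sql;
    push @sql, "SHOW INDEX FROM $_;" for @tables;    # are the indexes there?
    push @sql, "ANALYZE TABLE $_;"   for @tables;    # refresh optimizer statistics
    # The query from the debug output, with the two bound values filled in.
    push @sql, join ' ',
        'EXPLAIN SELECT taxon_name.taxon_id, NULL, NULL,',
        'taxon.ncbi_taxon_id, taxon_name.name, NULL',
        'FROM taxon, taxon_name',
        'WHERE taxon.taxon_id = taxon_name.taxon_id',
        q{AND name_class = 'scientific name'},
        'AND ncbi_taxon_id = 208964;';
    return @sql;
}

print "$_\n" for diagnostic_sql();
```

Something like "perl diag.pl | mysql -u USER -p DBNAME" should run the
lot, where diag.pl, USER and DBNAME are placeholders. In the EXPLAIN
output, a NULL in the "key" column for either table would suggest the
index is not being used.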

I believe something similarly strange was encountered by someone using  
DB::GFF (or Chado) under MySQL, and if I recall correctly the solution  
was to optimize (analyze) the tables. Maybe someone who was in that  
thread reads this and can comment?

	-hilmar


>
> ----------------------------------------------------------------------- 
> -----
> -------------------------
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
-- 
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From cjfields at uiuc.edu  Wed Feb 15 22:56:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Feb 2006 21:56:14 -0600
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki,
	problems with bioperl-db
In-Reply-To: 
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
	
Message-ID: <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>



On Feb 15, 2006, at 7:54 PM, Hilmar Lapp wrote:

>
> On Feb 14, 2006, at 12:32 PM, Chris Fields wrote:
>
>> Hilmar,
>>
>> Good News: I've added a section to the bioperl wiki on installing
>> bioperl-db
>> in Windows:
>>
>> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db
>>
>> Bad News:  There's a new problem now. I updated from CVS yesterday; I
>> walked
>> through the steps and ran 'nmake test', with everything passing fine.
>> However, load_seqdatabase.pl is extremely slow; it's loading a  
>> sequence
>> every 5 minutes or so.  I noticed (when using '-debug') that it is
>> hanging
>> up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a
>> database,
>> load the biosql schema, and load sequences w/o loading taxonomy, the
>> problem
>> goes away.
>>
>> Here's the debugging output (I cut it off at the point it hangs up):
>> [...]
>
>> preparing UK select statement: SELECT taxon_name.taxon_id, NULL,  
>> NULL,
>> taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name  
>> WHERE
>> taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND
>> ncbi_taxon_id =
>> ?
>> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
>> SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)
>
> I'm a bit surprised if this is the query where it hangs. Are the
> indexes all there? There should be a primary key index on
> taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on  
> taxon_name
> over (taxon_id,name,name_class). Also, there should be separate  
> indexes
> on taxon_name.taxon_id and taxon_name.name. Are they all there? If you
> reinstantiated the schema from the DDL then it seems unlikely that
> somehow the indexes have vanished except if you messed with the schema
> or the DDL.

I looked in the mailing list archives and Barry mentions something here:

http://bioperl.org/pipermail/bioperl-l/2005-January/018093.html

He rebuilt the database from scratch and got it working; no reason  
was given.  I wouldn't be surprised if it is something MySQL-related  
that pops up.  The strange thing is that only a few months ago  
everything ran well with this version of MySQL (v.5); this was with  
the first test database I installed on it.  Another strange thing (I  
think I mentioned it) is that NOT loading the taxonomy with  
load_ncbi_taxonomy.pl worked (everything was entered).  I'll try  
rebuilding the database from scratch to see what happens.  I am  
running this on Windows, so this is new territory...

> Putting an index on taxon_name.name_class really can't make sense, so
> let's assume it can't be that.
>
> So really I suspect this has something to do with the state of the
> database and the version of MySQL. In particular, from some 4.x  
> version
> of MySQL under certain circumstances you have to analyze the  
> statistics
> of the tables in order to get the optimizer pick up the indexes
> properly. Are you on MySQL 4.x and if so, have you done that?
>
> There's the ANALYZE TABLE command:
> http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html
>
> Note the comment: "This statement works with MyISAM, BDB, and (as of
> MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?
>
> Also, you can check the execution plan for the query using EXPLAIN.
> http://dev.mysql.com/doc/refman/4.1/en/explain.html
>
> This should show you whether the index would be picked up for the  
> query
> or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to
> the db using the mysql shell (mysql).

I'll give these a shot and post what I find in the next few days.

> I believe something similarly strange was encountered by someone using
> DB::GFF (or Chado) under MySQL, and if I recall correctly the solution
> was to optimize (analyze) the tables. Maybe someone who was in that
> thread reads this and can comment?
>
> 	-hilmar

I wanted to also mention that we shouldn't check in the modifications  
to Bio::Root::Root until I confirm something (I'm at home and  
currently can't).  I tried running a script on an unrelated module  
using the modified Bio::Root::Root (with the commas added after the  
'throw $class' statements).  Everything worked for $self->throw(),  
except the thrown message wasn't displayed.  I'll dig into it a bit  
more to see what happens.

>
>
>>
>> --------------------------------------------------------------------- 
>> --
>> -----
>> -------------------------
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
> -- 
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From osborne1 at optonline.net  Thu Feb 16 00:16:04 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 00:16:04 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
Message-ID: 

Harry,

It's not clear to me that NCBI's eutils offers this capability directly. You
can probably download Entrez Gene entries and parse them for coordinates but
I know of no way to remotely retrieve genomic sequences like this from NCBI
(ENSEMBL API perhaps?). What I had in mind uses the local approach that some
of us favor and to prove to myself that this is simple to do I wrote a
script that I just added to examples/tools, it's called extract_genes.pl and
it's based on Bio::DB::Fasta. Download the sequence files for a given
species to some dir, download Entrez Gene's gene2accession file, and run. It
creates and stores a hash for lookups, so it won't read gene2accession each
time it runs.
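
To give a flavor of the lookup part, here is a minimal sketch. It is
not the code in extract_genes.pl; the function names are hypothetical,
the column positions and the 0-based coordinates are my assumptions
about gene2accession (check the header line and README of your copy),
and caching with Storable is simply one way to avoid re-reading the
file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(store retrieve);

# ASSUMED 0-based column positions in gene2accession; verify them
# against the header line of your copy before relying on them.
my %COL = ( gene_id => 1, genomic_acc => 7, start => 9, end => 10 );

# Build GeneID => { acc, start, end } from gene2accession-style lines.
sub build_lookup {
    my ($fh) = @_;
    my %lookup;
    while ( my $line = <$fh> ) {
        next if $line =~ /^#/;    # header line
        chomp $line;
        my @f = split /\t/, $line;
        my ( $id, $acc, $start, $end ) =
            @f[ @COL{qw(gene_id genomic_acc start end)} ];
        # Rows without a genomic placement have '-' in these columns.
        next unless defined $end && $start =~ /^\d+$/ && $end =~ /^\d+$/;
        $lookup{$id} ||= { acc => $acc, start => $start, end => $end };
    }
    return \%lookup;
}

# Reuse a stored hash if one exists, otherwise parse the file and store it.
sub cached_lookup {
    my ( $source, $cache ) = @_;
    return retrieve($cache) if -e $cache;
    open my $fh, '<', $source or die "Cannot open $source: $!";
    my $lookup = build_lookup($fh);
    close $fh;
    store( $lookup, $cache );
    return $lookup;
}

# Widen a gene's interval by $flank bases on each side. The file's
# coordinates are ASSUMED to be 0-based and are shifted to 1-based here.
# The caller still has to clamp the end to the sequence length.
sub flanked_region {
    my ( $lookup, $gene_id, $flank ) = @_;
    my $hit = $lookup->{$gene_id} or return;
    my $start = $hit->{start} + 1 - $flank;
    $start = 1 if $start < 1;
    my $end = $hit->{end} + 1 + $flank;
    return ( $hit->{acc}, $start, $end );
}

# Demo on one made-up row (GeneID 1234 and NC_000000.1 are not real records).
my $demo = join( "\t", 9606, 1234, ('-') x 5, 'NC_000000.1', '-', 999, 1999, '+' ) . "\n";
open my $demo_fh, '<', \$demo or die "Cannot open in-memory file: $!";
my $lookup = build_lookup($demo_fh);
my ( $acc, $from, $to ) = flanked_region( $lookup, 1234, 500 );
print "$acc:$from-$to\n";
```

The accession and coordinates can then be handed to the Bio::DB::Fasta
call from my earlier mail, $db->seq($acc, $from => $to), provided the
IDs in the fasta files match the accessions in gene2accession.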

Brian O.


On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:

> Hi Brian,
> 
> Thanks very much for the pointers and the speed of your reply and apologies
> for the speed of mine.
> 
> This looks good, but what I was looking for was a bioP approach for hooking to
> an API at NCBI or EBI so I could get this info and seqs from them.  In this
> case, speed of retrieval is not critical and I'd rather not download the
> entirety of the sequences to a local disk to hack at them.
> 
> I've determined a screen-scraping approach to get them and could script that,
> but I thought that bioP had a method for using NCBI's external API's, tho it
> may be that my memory is faulty or the approach is no longer supported due to
> overload.  
> 
> Does NCBI make such APIs available anymore?  I searched a bit for docs on them
> but couldn't find anything (unless it's buried in the NCBI toolkit, which I
> haven't started to excavate).
> 
> Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
> 
> Harry
> 
> 
> 
> 
> 
> 
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>> 
>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
>> from its documentation:
>> 
>>   use Bio::DB::Fasta;
>> 
>>   # create database from directory of fasta files
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   # simple access (for those without Bioperl)
>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>   my @ids     = $db->ids;
>>   my $length   = $db->length('CHROMOSOME_I');
>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>   my $header   = $db->header('CHROMOSOME_I');
>> 
>>   # Bioperl-style access
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>   my $seq     = $obj->seq;
>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>> 
>> Do you already have the offsets?
>> 
>> Brian O.
>> 
>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>> Hi All,
>>> 
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this.  Forgive me if I've missed something
>>> obvious.
>>> 
>>> This should not be a novel request, but I've not found it answered.  If
>>> bioperl isn't the best way to do this, I'd be grateful for a pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>> 
>>> The problem is to retrieve genomic sequences plus & minus some offset
>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>> 
>>> 
>>> TIA!




From hlapp at gmx.net  Thu Feb 16 01:31:54 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 15 Feb 2006 22:31:54 -0800
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki,
	problems with bioperl-db
In-Reply-To: <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
	
	<12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>
Message-ID: 


On Feb 15, 2006, at 7:56 PM, Chris Fields wrote:

> [...]
> I looked in the mailing list archives and Barry mentions something 
> here:
>
> http://bioperl.org/pipermail/bioperl-l/2005-January/018093.html
>
> He rebuilt the database from scratch and got it working; no reason
> was given.  I wouldn't be surprised if it is something Mysql-related
> that pops up.

Note though that he was using PostgreSQL. With Pg you definitely need
to run VACUUM ANALYZE after a bulk load; VACUUM reclaims dead rows and
ANALYZE refreshes the planner statistics for the table(s).

>   The strange thing is that only a few months ago
> everything ran well with this version of MySQL (v.5); this was with
> the first test database I installed on it.  Another strange thing (I
> think I mentioned it) is that NOT loading the taxonomy with
> load_ncbi_taxonomy.pl worked (everything was entered).

That's not really strange, it is in fact consistent with the query you 
report as taking a long time. If you don't pre-load the taxonomy then 
the taxon and taxon_name tables are empty or almost empty and look-ups 
and joins of empty tables are amazingly fast :-J

[...]
> I wanted to also mention that we shouldn't check in the modifications
> to Bio::Root::Root until I confirm something (I'm at home and
> currently can't).

OK we'll hold off.

	-hilmar
-- 
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From michael.watson at bbsrc.ac.uk  Thu Feb 16 05:31:54 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 16 Feb 2006 10:31:54 -0000
Subject: [Bioperl-l] CONTIG sequence files from the NCBI
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95030081AC@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

I have two questions really.  I fetched bacterial genome sequences from
the NCBI using Bio::DB::GenBank.

Some of these sequence entries are CONTIG sequences, ie they just point
to other sequences that need to be joined together to form the entire
genome.

Looking at my downloads, it looks as if bioperl has done all the
necessary joining for me - or maybe it was the NCBI that did the
joining?

OK, so firstly, did bioperl do the joining, and if so, are all the
co-ordinates of the features updated to reflect their new location on
the new, joined sequence?

And secondly, sequence versions... I'm thinking that possibly the
sequence version of the CONTIG may be 1 (as it hasn't changed) yet the
versions of the sequences it refers to might have changed, so when I ask
bioperl if these sequences have been updated, I will be told no because
the CONTIG sequence version is 1, but I should be told yes because the
underlying sequences have...?

Make sense?

Thanks
Mick



From cjfields at uiuc.edu  Thu Feb 16 07:51:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 06:51:50 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
	version 1.28
In-Reply-To: <43F449E1.80605@esat.kuleuven.be>
References: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
	<43F449E1.80605@esat.kuleuven.be>
Message-ID: <369C1D1F-DBCB-4161-A24A-7C3E579D337A@uiuc.edu>

Yeah, looks like the extra feature lines NCBI added break parsing of
nucleotide text output.  XML output parsing still works though (as
expected).  I'll give it a look.
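
In the meantime, a minimal sketch of a stopgap, assuming the only
trouble is the "Features ..." blocks shown in Pieter's example below:
drop those blocks from the saved text report before parsing it.
strip_feature_blocks is a hypothetical helper of mine, not anything in
Bio::SearchIO::blast, and I have not run the result through the parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove NCBI's "Features flanking/in this part of
# subject sequence:" blocks from a BLAST text report.
sub strip_feature_blocks {
    my ($text) = @_;
    my @kept;
    my $skipping = 0;
    for my $line ( split /\n/, $text, -1 ) {
        # A feature block starts at its "Features ..." header line.
        $skipping = 1
            if $line =~ /^\s*Features (?:flanking|in) this part of subject sequence:/;
        # It ends at the HSP's "Score = ..." line, which is kept.
        $skipping = 0 if $line =~ /^\s*Score\s*=/;
        push @kept, $line unless $skipping;
    }
    return join "\n", @kept;
}

# Example on a fragment of the posted report.
my $report = <<'END';
 Features flanking this part of subject sequence:
   102 bp at 5' side: bZIP transcription factor, putative
   3740 bp at 3' side: yeast dcp1, putative

 Score = 32.2 bits (16),  Expect = 0.87
 Identities = 16/16 (100%), Gaps = 0/16 (0%)
END
print strip_feature_blocks($report);
```

The cleaned text could then be written to a file and read with
Bio::SearchIO->new(-format => 'blast', -file => $file). The feature
annotations are lost this way, so it is only a workaround until the
parser itself is fixed.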

Chris

On Feb 16, 2006, at 3:46 AM, Pieter Monsieurs wrote:

> Hi,
>
> I have the same problem with blast.pm.
> NCBI has added some extra information to the BLAST output (see e.g.
> "Features flanking this part..." or "Features in this part ..."); an
> example is included below.
> The blast.pm module starts looking for the HSP alignment information,
> but it dies when it hits this feature information.
>
> Pieter
>
>
> >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> chromosome 12, complete sequence
> Length=27492551
>
> Features flanking this part of subject sequence:
>   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
>   2655 bp at 3' side: hypothetical protein
>
> Score = 36.2 bits (18),  Expect = 0.22
> Identities = 18/18 (100%), Gaps = 0/18 (0%)
> Strand=Plus/Minus
>
> Query  4         GTACTACTCTACTCTACT  21
>                 ||||||||||||||||||
>
> Sbjct  19257436  GTACTACTCTACTCTACT  19257419
>
>
> Features flanking this part of subject sequence:
>   2991 bp at 5' side: hypothetical protein
>   1131 bp at 3' side: hypothetical protein
>
> Score = 36.2 bits (18),  Expect = 0.22
> Identities = 18/18 (100%), Gaps = 0/18 (0%)
> Strand=Plus/Minus
>
> Query  2         ATGTACTACTCTACTCTA  19
>                 ||||||||||||||||||
> Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
>
>
>
> Features in this part of subject sequence:
>   DHHC zinc finger domain, putative
>
> Score = 34.2 bits (17),  Expect = 0.87
> Identities = 17/17 (100%), Gaps = 0/17 (0%)
> Strand=Plus/Plus
>
> Query  5         TACTACTCTACTCTACT  21
>                 |||||||||||||||||
> Sbjct  17616437  TACTACTCTACTCTACT  17616453
>
>
>
> Features flanking this part of subject sequence:
>   102 bp at 5' side: bZIP transcription factor, putative
>   3740 bp at 3' side: yeast dcp1, putative
>
> Score = 32.2 bits (16),  Expect = 3.4
> Identities = 16/16 (100%), Gaps = 0/16 (0%)
> Strand=Plus/Plus
>
> Query  7        CTACTCTACTCTACTC  22
>                ||||||||||||||||
> Sbjct  2775880  CTACTCTACTCTACTC  2775895
>
>
> Features flanking this part of subject sequence:
>   21 bp at 5' side: peptide transporter T17F3.11, putative
>   10230 bp at 3' side: transposon protein, putative, unclassified
>
> Score = 32.2 bits (16),  Expect = 3.4
> Identities = 16/16 (100%), Gaps = 0/16 (0%)
> Strand=Plus/Minus
>
> Query  7         CTACTCTACTCTACTC  22
>
>                 ||||||||||||||||
> Sbjct  27323153  CTACTCTACTCTACTC  27323138
>
>
>
>
> Guojun Yang wrote:
>
>> Hi, Chris,
>> Finally the remoteblast test script works for the amino.fa query.  
>> but when I try a nucleic acid sequence (see below), Error occurs: "
>> waiting........
>> ------------- EXCEPTION  -------------
>> MSG: no data for midline  Features flanking this part of subject  
>> sequence:
>> STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/ 
>> 5.8.3/Bio/Searc                             hIO/blast.pm:1172
>> STACK toplevel remoteblast_test:40
>> "
>> The query sequence is:
>> CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
>> GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
>> AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
>> AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
>>
>> The script (basically same as the remoteblast test, I only changed  
>> database to 'nr' and program to 'blastn' and filename to 'ost3'):
>> #!/usr/bin/perl
>>
>> use Bio::SeqIO;
>> use Bio::Seq;
>> use Bio::Tools::Run::RemoteBlast;
>> use Bio::SearchIO;
>> use strict;
>> my $prog='blastn';
>> my $db='nr';
>> my $e_val=1e-10;
>> my @params=( -prog=>$prog,
>> 	-data=>$db,
>> 	-expect=>$e_val,
>> 	-readmethod=>'SearchIO');
>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>
>> my $v = 1;
>>
>> my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
>>
>> while (my $input = $str->next_seq()){
>>  #Blast a sequence against a database:
>>  #Alternatively, you could  pass in a file with many
>>  #sequences rather than loop through sequence one at a time
>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>  #and swap the two lines below for an example of that.
>>  my $r = $factory->submit_blast($input);
>>  #my $r = $factory->submit_blast('amino.fa');
>>  print STDERR "waiting..." if( $v > 0 );
>>  while ( my @rids = $factory->each_rid ) {
>>    foreach my $rid ( @rids ) {
>>      my $rc = $factory->retrieve_blast($rid);
>>      if( !ref($rc) ) {
>>        if( $rc < 0 ) {
>>          $factory->remove_rid($rid);
>>        }
>>        print STDERR "." if ( $v > 0 );
>>        sleep 5;
>>      } else {
>>        my $result = $rc->next_result();
>>        #save the output
>>        my $filename = $result->query_name()."\.out";
>>        $factory->save_output($filename);
>>        $factory->remove_rid($rid);
>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>        while ( my $hit = $result->next_hit ) {
>>          next unless ( $v > 0);
>>          print "\thit name is ", $hit->name, "\n";
>>          while( my $hsp = $hit->next_hsp ) {
>>            print "\t\tscore is ", $hsp->score, "\n";
>>          }
>>        }
>>      }
>>    }
>>  }
>> }
>>
>>
>> Do you think there might still be something in the NCBI output  
>> format?
>>
>> Thank you,
>> Guojun
>>
>>
>>
>>
>> Guojun Yang
>> Department of Plant Biology
>> University of Georgia
>> Tel: 706-542-1857
>> Fax: 706-542-1805
>> http://www.arches.uga.edu/~guojun
>>
>>
>>
>> ----- Original Message -----
>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>> Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>
>>
>>
>>> Sorry, forgot to add that I didn't see the regex issue that you
>>> mentioned.  It could be a perl-related issue.  Try the fixes I
>>> mentioned and see what happens.
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept. of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>>>> -----Original Message-----
>>>>>>
>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>> Sent: Tuesday, February 14, 2006 12:36 PM
>>>> To: 'gyang at plantbio.uga.edu'
>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>>>
>>>> It's a good habit to always add single quotes around words.  The perl
>>>> interpreter may think a single bare word is a subroutine or perlfunc
>>>> called with no args so will try to find a subroutine named blastp().
>>>> My debugger actually gives the error that the bare word blastp may
>>>> conflict with a future reserved word.  Like you said, 'use strict'
>>>> will point that out.
>>>>
>>>> As for the regex, it should match all the blast programs at NCBI
>>>> (blastp, blastn, blastx, tblastn, tblastx) and is built-in to make
>>>> sure nothing else passes through.
>>>>
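
For reference, a minimal pure-Perl sketch of what a program-name check
built on t?blast[pnx] accepts.  The helper name valid_program and the
^...$ anchors are assumptions made for this illustration; the thread
only quotes the bare expression from the error message, not the code
in RemoteBlast.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: returns 1 when the program
# name matches the quoted pattern, 0 otherwise.  The ^ and $ anchors
# are an assumption added for this sketch.
sub valid_program {
    my ($prog) = @_;
    return defined $prog && $prog =~ /^t?blast[pnx]$/ ? 1 : 0;
}

# Quoted strings, not bare words, so this runs cleanly under 'use strict'.
for my $prog ('blastp', 'blastn', 'blastx', 'tblastn', 'tblastx',
              'tblastp', 'megablast') {
    printf "%-10s %s\n", $prog,
        valid_program($prog) ? 'accepted' : 'rejected';
}
```

Note that the pattern also lets 'tblastp' through, which is not one of
the five programs named in the quoted text.
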
>>>> So, if you are using the script below, there are several errors.  The
>>>> bare words for $prog and $db need quotes, and the flags for your
>>>> @params array don't have a dash before them.  I get this after adding
>>>> quotes but before adding the dashes to @params:
>>>>
>>>> C:\Perl\Scripts>test_blast.pl
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG:
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
>>>> STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
>>>> STACK: C:\Perl\Scripts\test_blast.pl:15
>>>> -----------------------------------------------------------
>>>>
>>>> The last line indicates a problem with this line:
>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>>> Changing the @params to this:
>>>> my @params=( -prog=>$prog,
>>>> 	-data=>$db,
>>>> 	-expect=>$e_val,
>>>> 	-readmethod=>'SearchIO');
>>>>
>>>> fixes it, and I get output as expected.
>>>> Christopher Fields
>>>> Postdoctoral Researcher - Switzer Lab
>>>> Dept. of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>>
>>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
>>>>> Sent: Tuesday, February 14, 2006 11:48 AM
>>>>> To: Chris Fields; bioperl-l at lists.open-bio.org
>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>>>>
>>>>> Hi, Chris,
>>>>> When I tried with the perldoc script, It did not work either.  
>>>>> First it
>>>>> says $prog can not be bare word if I "use strict". I added  
>>>>> quotes on the
>>>>> words, then it says the value for $prog does not match expression
>>>>> t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The
>>>>>
>>>> script
>>>>
>>>>> is shown below. Why is the expression "t?blast[pnx]"?
>>>>>
>>>>> #!/usr/bin/perl
>>>>>
>>>>> use Bio::SeqIO;
>>>>> use Bio::Seq;
>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>> use Bio::SearchIO;
>>>>>
>>>>>
>>>>> my $prog=blastp;
>>>>> my $db=swissprot;
>>>>> my $e_val=1e-10;
>>>>> my @params=( prog=>$prog,
>>>>> 	data=>$db,
>>>>> 	expect=>$e_val,
>>>>> 	readmethod=>'SearchIO');
>>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>
>>>>> my $v = 1;
>>>>>
>>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format =>  
>>>>> 'fasta' );
>>>>>
>>>>> while (my $input = $str->next_seq()){
>>>>>  #Blast a sequence against a database:
>>>>>  #Alternatively, you could  pass in a file with many
>>>>>  #sequences rather than loop through sequence one at a time
>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>  #and swap the two lines below for an example of that.
>>>>>  my $r = $factory->submit_blast($input);
>>>>>  #my $r = $factory->submit_blast('amino.fa');
>>>>>  print STDERR "waiting..." if( $v > 0 );
>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>    foreach my $rid ( @rids ) {
>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>      if( !ref($rc) ) {
>>>>>        if( $rc < 0 ) {
>>>>>          $factory->remove_rid($rid);
>>>>>        }
>>>>>        print STDERR "." if ( $v > 0 );
>>>>>        sleep 5;
>>>>>      } else {
>>>>>        my $result = $rc->next_result();
>>>>>        #save the output
>>>>>        my $filename = $result->query_name()."\.out";
>>>>>        $factory->save_output($filename);
>>>>>        $factory->remove_rid($rid);
>>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>        while ( my $hit = $result->next_hit ) {
>>>>>          next unless ( $v > 0);
>>>>>          print "\thit name is ", $hit->name, "\n";
>>>>>          while( my $hsp = $hit->next_hsp ) {
>>>>>            print "\t\tscore is ", $hsp->score, "\n";
>>>>>          }
>>>>>        }
>>>>>      }
>>>>>    }
>>>>>  }
>>>>> }
>>>>>
>>>>> Thank you for your help!
>>>>>
>>>>>
>>>>> Guojun
>>>>> Department of Plant Biology
>>>>> University of Georgia
>>>>>
>>>>> ----- Original Message -----
>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>> To: gyang at plantbio.uga.edu
>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>
>>>>>
>>>>>
>>>>>> Try two things:
>>>>>>
>>>>>> 1)  Use a much simpler script, like the one in 'perldoc
>>>>>> Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something
>>>>>> wrong with the logic in your subroutine:
>>>>>>
>>>>>> my $v = 1;
>>>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
>>>>>> while (my $input = $str->next_seq()){
>>>>>>  #Blast a sequence against a database:
>>>>>>  #Alternatively, you could  pass in a file with many
>>>>>>  #sequences rather than loop through sequence one at a time
>>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>  #and swap the two lines below for an example of that.
>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>  #my $r = $factory->submit_blast('amino.fa');
>>>>>>  print STDERR "waiting..." if( $v > 0 );
>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>    foreach my $rid ( @rids ) {
>>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>>      if( !ref($rc) ) {
>>>>>>        if( $rc < 0 ) {
>>>>>>          $factory->remove_rid($rid);
>>>>>>        }
>>>>>>        print STDERR "." if ( $v > 0 );
>>>>>>        sleep 5;
>>>>>>      } else {
>>>>>>        my $result = $rc->next_result();
>>>>>>        #save the output
>>>>>>        my $filename = $result->query_name()."\.out";
>>>>>>        $factory->save_output($filename);
>>>>>>        $factory->remove_rid($rid);
>>>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>>        while ( my $hit = $result->next_hit ) {
>>>>>>          next unless ( $v > 0);
>>>>>>          print "\thit name is ", $hit->name, "\n";
>>>>>>          while( my $hsp = $hit->next_hsp ) {
>>>>>>            print "\t\tscore is ", $hsp->score, "\n";
>>>>>>          }
>>>>>>        }
>>>>>>      }
>>>>>>    }
>>>>>>  }
>>>>>> }
>>>>>>
>>>>>> 2) Try the RemoteBlast from Bugzilla and see if that works.  It really
>>>>>> shouldn't make that much of a difference, but I noticed that the CVS
>>>>>> RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
>>>>>> released; the Bugzilla version is based off CVS.
>>>>>>
>>>>>> Christopher Fields
>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>> Dept. of Biochemistry
>>>>>> University of Illinois Urbana-Champaign
>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>>
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>> Sent: Monday, February 13, 2006 3:00 PM
>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>> Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>>
>>>>>>>>> Thanks, Chris,
>>>>>>>>>
>>>>>>> I installed version 1.5.1 and replaced the blast.pm file with  
>>>>>>> the
>>>>>>>
>>>> one
>>>>
>>>>> from
>>>>>
>>>>>>> your bug report. The running version is 1.5 when I use the  
>>>>>>> command
>>>>>>>
>>>> you
>>>>
>>>>>>> sent me. But when I tried the script, it doesn't change much. My
>>>>>>> remoteblast code (portion) is here:
>>>>>>>
>>>>>>> sub search {
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
>>>>>>> my $query = Bio::Seq -> new ( -seq=>"$_[0]",
>>>>>>> 			      -id=>"query",
>>>>>>> 			      -desc=>"new seq");
>>>>>>> my $len=$query->length();
>>>>>>> @db=('nr','htgs','wgs');
>>>>>>> foreach my $db (@db) {
>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
>>>>>>> 						'-data' =>"$db",
>>>>>>> 						'-expect'=>"$E_value");
>>>>>>> my $blast_report = $factory->submit_blast($query);
>>>>>>> my @rids = $factory->each_rid();
>>>>>>> foreach my $rid ( @rids ) {
>>>>>>>    print STDERR "$rid\n";
>>>>>>> }
>>>>>>> # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
>>>>>>> print STDERR "waiting...";
>>>>>>> sleep 60;
>>>>>>>
>>>>>>>>> foreach my $rid ( @rids ) {
>>>>>>>>>
>>>>>>>    my $rc = $factory->retrieve_blast($rid);
>>>>>>>    while (!ref($rc) ) {
>>>>>>> 	if( $rc < 0 ) {
>>>>>>> # retrieve_blast returns -1 on error
>>>>>>> 	    $factory->remove_rid($rid);
>>>>>>> 	    print "Error!\n";
>>>>>>> 	    send_error($email,$function,$seqname,$queryname[$ST]);
>>>>>>> 	    die "Can't retrieve $rid";
>>>>>>> 	} if ($rc==0) { # retrieve_blast returns 0 on 'job not
>>>>>>>
>>>> finished'
>>>>
>>>>>>> 	    sleep 60;
>>>>>>> 	    $rc = $factory->retrieve_blast($rid);
>>>>>>> 	}
>>>>>>>    }
>>>>>>>    if (ref($rc)) {
>>>>>>> 	print STDERR "Done.\n";
>>>>>>> 	 while( my $result = $rc->next_result) {
>>>>>>> 	    while( my $hit = $result->next_hit()) {
>>>>>>> 	    	$hit_name=$hit->name;
>>>>>>> 		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
>>>>>>> 		$name=$1;
>>>>>>> 		@left_plus_start=();
>>>>>>> 		@left_plus_end=();
>>>>>>> 		@left_minus_start=();
>>>>>>> 		@left_minus_end=();
>>>>>>> 		@right_plus_start=();
>>>>>>> 		@right_plus_end=();
>>>>>>> 		@right_minus_start=();
>>>>>>> 		@right_minus_end=();
>>>>>>>
>>>>>>>>> 		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
>>>>>>>>>
>>>>>>> 		while( my $hsp = $hit->next_hsp()) {
>>>>>>> ......
>>>>>>>
>>>>>>> It was working quite well until around October last year, but it has
>>>>>>> stopped since then.  When a submission is sent via a webpage, the cgi
>>>>>>> starts to work and uses ~20 Mb of memory.  Then it hangs there;
>>>>>>> finally the expected email is received but without real results,
>>>>>>> although it does contain something from other parts of the script.
>>>>>>> Apparently the search sub did not return anything (I know there is
>>>>>>> something that should be returned).  Is it also possible the format of
>>>>>>> the NCBI output for each result has changed?
>>>>>>> Thank you,
>>>>>>> Guojun
>>>>>>>
>>>>>>>>>>> Department of Plant Biology
>>>>>>>>>>>
>>>>>>> University of Georgia
>>>>>>>
>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>
>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>>>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>>
>>>>>>>> How do you know two versions are installed (i.e. how are you checking
>>>>>>>> the version)?  Do you have two complete bioperl distributions (in two
>>>>>>>> separate directories) or are you looking in modules?  Here's the way
>>>>>>>> to check the version (from the FAQ):
>>>>>>>>
>>>>>>>> perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
>>>>>>>>
>>>>>>>> If you have two full bioperl distributions on your computer, normally
>>>>>>>> only one will be in use unless you have explicitly set the environment
>>>>>>>> variable PERL5LIB.  The PERL5LIB directories will be searched first
>>>>>>>> before your normal perl directory list (@INC) is searched.  You MAY
>>>>>>>> get some mixing then, but only if perl can't find a particular module
>>>>>>>> in the path designated in PERL5LIB; then it will progress through the
>>>>>>>> directories listed in @INC.  This may happen if a module is unique to
>>>>>>>> a particular release, but shouldn't happen for the majority of
>>>>>>>> modules, including RemoteBlast.  You can check what @INC and PERL5LIB
>>>>>>>> are set to by using 'perl -V'.  @INC will differ depending on your OS,
>>>>>>>> perl build, etc.
>>>>>>>>
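
One way to see which copy of a module perl actually picks up when
several distributions are installed is to load it and look it up in
%INC.  A minimal sketch; module_path is a hypothetical helper, not part
of BioPerl, and File::Spec appears only because it ships with perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and return the file perl
# read it from, or undef if the module cannot be loaded.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# File::Spec is a core module, so the first lookup works anywhere; the
# second only resolves on a machine with BioPerl installed.
for my $module ('File::Spec', 'Bio::Tools::Run::RemoteBlast') {
    my $path = module_path($module);
    print "$module => ",
        defined $path ? $path : 'not found in @INC', "\n";
}

# Directories are searched in this order (PERL5LIB entries come first).
print "Search order:\n";
print "  $_\n" for @INC;
```

If the path printed for a BioPerl module points into an old
distribution, that is the copy being run, whatever else is installed.
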
>>>>>>>>> Regardless, if you follow the directions for installing  
>>>>>>>>> bioperl
>>>>>>>>>
>>>>> for
>>>>>
>>>>>>> your
>>>>>>>
>>>>>>>> system ('perl Makefile.PL', 'make', 'make test', 'make  
>>>>>>>> install',
>>>>>>>>
>>>>> unless
>>>>>
>>>>>>> you
>>>>>>>
>>>>>>>> explicitly change the installation directory when using 'perl
>>>>>>>>
>>>>>>> Makefile.PL'),
>>>>>>>
>>>>>>>> then 'uninstalling' Bioperl shouldn't be a problem as it will
>>>>>>>>
>>>>> install
>>>>>
>>>>>>> the
>>>>>>>
>>>>>>>> Bioperl distribution you downloaded over the old version in  
>>>>>>>> @INC.
>>>>>>>>
>>>>> See
>>>>>
>>>>>>> this
>>>>>>>
>>>>>>>> page:
>>>>>>>>
>>>>>>>>> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
>>>>>>>>> for more details.
>>>>>>>>> Christopher Fields
>>>>>>>>>
>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>> Dept. of Biochemistry
>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>
>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>>>> Sent: Monday, February 13, 2006 12:32 PM
>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>>>> Hi, Chris,
>>>>>>>>>>>
>>>>>>>>> I do have different versions of bioperl on my Linux machine
>>>>>>>>>
>>>> (1.4.
>>>>
>>>>> and
>>>>>
>>>>>>>>> 1.5.0), this may be the problem. Should I just install  
>>>>>>>>> bioperl-
>>>>>>>>>
>>>>> 1.5.1
>>>>>
>>>>>>> or I
>>>>>>>
>>>>>>>>> need to uninstall and remove the previous versions. I could  
>>>>>>>>> not
>>>>>>>>>
>>>>> find
>>>>>
>>>>>>> any
>>>>>>>
>>>>>>>>> hint on uninstalling bioperl on linux. Could you please  
>>>>>>>>> give me
>>>>>>>>>
>>>>> some
>>>>>
>>>>>>>>> suggestion?
>>>>>>>>> Thanks,
>>>>>>>>> Guojun
>>>>>>>>>
>>>>>>>>>>> Department of Plant Biology
>>>>>>>>>>>
>>>>>>>>> University of Georgia
>>>>>>>>>      _____
>>>>>>>>>
>>>>>>>>>>>  From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>>>>
>>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>>>>>>>>> Sent: Mon, 13 Feb 2006 11:45:14 -0500
>>>>>>>>> Subject: RE: [Bioperl-l] more question regarding  
>>>>>>>>> RemoteBlast.pm
>>>>>>>>>
>>>>>>> version
>>>>>>>
>>>>>>>>> 1.28
>>>>>>>>>
>>>>>>>>>>>>>>> If you're using RemoteBlast 1.28, then you've likely
>>>>>>>>>>>>>>>
>>>>>>> updated from CVS
>>>>>>>
>>>>>>>>> which isn't the latest fix.
>>>>>>>>>
>>>>>>>>>>> Make sure that you check the following:
>>>>>>>>>>> 1) Always post to the mailing list:
>>>>>>>>>>>
>>>>>>>>> http://www.bioperl.org/wiki/ 
>>>>>>>>> HOWTO:Beginners#Getting_Assistance .
>>>>>>>>>
>>>>>>>>>>> 2) You must have the complete bioperl-1.5.1 or bioperl-live
>>>>>>>>>>>
>>>>> (CVS)
>>>>>
>>>>>>>>> installed first.  Perform a clean installation; do not upgrade
>>>>>>>>>
>>>>> only
>>>>>
>>>>>>>>> Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we
>>>>>>>>>
>>>> can't
>>>>
>>>>>>>>> guarantee that mixing modules from old and new distributions
>>>>>>>>>
>>>> (1.4
>>>>
>>>>> and
>>>>>
>>>>>>>>> 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl- 
>>>>>>>>> live
>>>>>>>>> installation will allow text output from BLAST v.2.2.12 to be
>>>>>>>>>
>>>>> saved
>>>>>
>>>>>>> and
>>>>>>>
>>>>>>>>> parsed; it will not parse the newest BLAST text output from  
>>>>>>>>> NCBI
>>>>>>>>>
>>>>>>> (v2.2.13)
>>>>>>>
>>>>>>>>> but it should still save it. I believe as long as  
>>>>>>>>> next_results()
>>>>>>>>>
>>>>> isn't
>>>>>
>>>>>>>>> called, it will work.
>>>>>>>>>
>>>>>>>>>>> 3) The bug fixes for the above issue with parsing BLAST
>>>>>>>>>>>
>>>> 2.2.13
>>>>
>>>>>>> text output
>>>>>>>
>>>>>>>>> are NOT in CVS; they haven't been cleared and checked in by
>>>>>>>>>
>>>> Roger
>>>>
>>>>> Hall
>>>>>
>>>>>>>>> (who's now taking care of RemoteBlast) and the powers that be
>>>>>>>>>
>>>>> (Jason
>>>>>
>>>>>>> or
>>>>>>>
>>>>>>>>> whomever is in charge of Bio::SearchIO).  They can be found in
>>>>>>>>>
>>>>>>> Bugzilla:
>>>>>>>
>>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>>>
>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>>>>>>>>>
>>>>>>>>>>> The fix in RemoteBlast in Bugzilla (#1935) is to allow the
>>>>>>>>>>>
>>>>> option
>>>>>
>>>>>>> of
>>>>>>>
>>>>>>>>> saving XML output, so isn't necessary if you don't plan on  
>>>>>>>>> using
>>>>>>>>>
>>>>> this
>>>>>
>>>>>>>>> option.  And, remember, they haven't been committed yet to  
>>>>>>>>> CVS,
>>>>>>>>>
>>>>> which
>>>>>
>>>>>>>>> means that the final version will change to reflect the new
>>>>>>>>> version.
>>>>
>>>>>>>>>>>>> Christopher Fields
>>>>>>>>>>>>>
>>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>>> Dept. of Biochemistry
>>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>>>>>>    _____
>>>>>>>>>>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
>>>>>>>>>>>>>
>>>>>>>>> Sent: Monday, February 13, 2006 9:26 AM
>>>>>>>>> To: Chris Fields
>>>>>>>>> Subject: RE: [Bioperl-l] more question regarding  
>>>>>>>>> RemoteBlast.pm
>>>>>>>>>
>>>>>>> version
>>>>>>>
>>>>>>>>> 1.28
>>>>>>>>>
>>>>>>>>>>>>> Hi, Chris
>>>>>>>>>>>>>
>>>>>>>>>>> Thanks for your suggestion, however, it doesn't seem to work
>>>>>>>>>>>
>>>>> for
>>>>>
>>>>>>> my cgi
>>>>>>>
>>>>>>>>> even after I replace both blast.pm and RemoteBlast.pm. I  
>>>>>>>>> didn't
>>>>>>>>>
>>>>> even
>>>>>
>>>>>>> get
>>>>>>>
>>>>>>>>> any RID. Is there any suggestion?
>>>>>>>>>
>>>>>>>>>>>>>>> Guojun
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> Guojun Yang
>>>>>>>>>>>>>
>>>>>>>>> Department of Plant Biology
>>>>>>>>> University of Georgia
>>>>>>>>> Tel: 706-542-1857
>>>>>>>>> Fax: 706-542-1805
>>>>>>>>> http://www.arches.uga.edu/~guojun
>>>>>>>>>    _____
>>>>>>>>>
>>>>>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>>>>>>
>>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
>>>>>>>>> Sent: Fri, 03 Feb 2006 16:07:29 -0500
>>>>>>>>> Subject: RE: [Bioperl-l] more question regarding  
>>>>>>>>> RemoteBlast.pm
>>>>>>>>>
>>>>>>> version
>>>>>>>
>>>>>>>>> 1.28
>>>>>>>>>
>>>>>>>>>>> I would say give the new code a try, but realize that it
>>>>>>>>>>>
>>>>> hasn't
>>>>>
>>>>>>> been
>>>>>>>
>>>>>>>>> checked
>>>>>>>>> in (like I said below). I will try going over the modified
>>>>>>>>> Bio::SearchIO::blast again this weekend to see if there is
>>>>>>>>>
>>>>> anything I
>>>>>
>>>>>>>>> might
>>>>>>>>> have missed. The changed order in the header of BLAST text
>>>>>>>>>
>>>> output
>>>>
>>>>> has
>>>>>
>>>>>>> me a
>>>>>>>
>>>>>>>>> bit worried that it might not catch everything, but it at  
>>>>>>>>> least
>>>>>>>>>
>>>>>>> doesn't
>>>>>>>
>>>>>>>>> hang
>>>>>>>>> in the while() loop I described in the bug report below (bug
>>>>>>>>>
>>>>> #1934)
>>>>>
>>>>>>> and
>>>>>>>
>>>>>>>>> seems to process everything fine.
>>>>>>>>>
>>>>>>>>>>> If you want more stability in the code, you might consider
>>>>>>>>>>>
>>>>>>> changing over
>>>>>>>
>>>>>>>>> to
>>>>>>>>> XML output and parsing with Bio::SearchIO::blastxml. There are
>>>>>>>>>
>>>>> some
>>>>>
>>>>>>>>> changes
>>>>>>>>> in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
>>>>>>>>>
>>>>> saving
>>>>>
>>>>>>> XML
>>>>>>>
>>>>>>>>> output, but I believe it parses everything regardless. If you
>>>>>>>>>
>>>> look
>>>>
>>>>>>> back
>>>>>>>
>>>>>>>>> the
>>>>>>>>> last month or so there has been a bit of discussion here about
>>>>>>>>>
>>>> it.
>>>>
>>>>>>> Jason
>>>>>>>
>>>>>>>>> describes a bit on how to set up RemoteBlast for XML:
>>>>>>>>>
>>>>>>>>>>> http://bioperl.org/news/2005/11/06/getting-blastxml-using-
>>>>>>>>>>>
>>>>>>> remoteblast/
>>>>>>>
>>>>>>>>>>> Christopher Fields
>>>>>>>>>>>
>>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>>> Dept. of Biochemistry
>>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>>>>> Sent: Friday, February 03, 2006 1:45 PM
>>>>>>>>>> To: bioperl-l at bioperl.org
>>>>>>>>>> Subject: [Bioperl-l] more question regarding RemoteBlast.pm
>>>>>>>>>>
>>>>> version
>>>>>
>>>>>>> 1.28
>>>>>>>
>>>>>>>>>> Hi, Everybody,
>>>>>>>>>> I see this post and am wondering if this is the reason for the
>>>>>>>>>> malfunctioning of my webserver.  We set up a webserver named MAK,
>>>>>>>>>> for MITE sequence analysis.  It was working very well until around
>>>>>>>>>> November 2005, when it stopped returning any result (the site is
>>>>>>>>>> fine and seems to be doing something after submission).  In the CGI
>>>>>>>>>> script, I used remoteblast (that work was done in 2003) to do
>>>>>>>>>> searches.  I currently do not have access to the server because I
>>>>>>>>>> moved.  Several people have sent emails to us about its
>>>>>>>>>> malfunctioning.  Is there any suggestion on fixing the problem?
>>>>>>>>>> Should I simply ask that remoteblast.pm be replaced with the new
>>>>>>>>>> version?
>>>>>>>>>> Thanks a lot,
>>>>>>>>>> Guojun
>>>>>>>>>>
>>>>>>>>>> Department of Plant Biology
>>>>>>>>>> University of Georgia
>>>>>>>>>> Tel: 706-542-1857
>>>>>>>>>> Fax: 706-542-1805
>>>>>>>>>> http://www.arches.uga.edu/~guojun
>>>>>>>>>> _____
>>>>>>>>>>
>>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>>> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
>>>>>>>>>>
>>>>> Jian'
>>>>>
>>>>>>>>>> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
>>>>>>>>>>
>>>> [mailto:bioperl-
>>>>
>>>>>>>>>> l at bioperl.org]
>>>>>>>>>> Sent: Fri, 03 Feb 2006 10:45:23 -0500
>>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>>>>>>>>>>
>>>>>>>>>> Like Nagesh says, try the latest RemoteBlast from bioperl- 
>>>>>>>>>> live
>>>>>>>>>>
>>>>> CVS.
>>>>>
>>>>>>> It
>>>>>>>
>>>>>>>>>> will
>>>>>>>>>> work for saving text output. However, it will not parse
>>>>>>>>>>
>>>> anything
>>>>
>>>>>>> using
>>>>>>>
>>>>>>>>>> next_result (it will likely hang) and will not save XML
>>>>>>>>>>
>>>> format.
>>>>
>>>>> See
>>>>>
>>>>>>>>> these
>>>>>>>>>
>>>>>>>>>> bugs:
>>>>>>>>>>
>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>>>>>>>>>>
>>>>>>>>>> for explanations and possible fixes (changes to RemoteBlast
>>>>>>>>>>
>>>> and
>>>>
>>>>>>>>>> Bio::SearchIO::blast). Note that these haven't been  
>>>>>>>>>> checked in
>>>>>>>>>>
>>>>> yet
>>>>>
>>>>>>> so
>>>>>>>
>>>>>>>>> are
>>>>>>>>>
>>>>>>>>>> still not included in bioperl-live; they may be further
>>>>>>>>>>
>>>> modified
>>>>
>>>>>>> before
>>>>>>>
>>>>>>>>>> committing to CVS. If you're not worried about XML, you could
>>>>>>>>>>
>>>>> just
>>>>>
>>>>>>> try
>>>>>>>
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>>> first fix, which is a change to SearchIO::blast.
>>>>>>>>>>
>>>>>>>>>> Nagesh, I remember you posting to the list a month ago  
>>>>>>>>>> using a
>>>>>>>>>>
>>>>>>> script
>>>>>>>
>>>>>>>>>> which
>>>>>>>>>> had problems; the script you used saves the output but  
>>>>>>>>>> doesn't
>>>>>>>>>>
>>>>>>> actually
>>>>>>>
>>>>>>>>>> parse it (i.e. you don't use next_result() to go through the
>>>>>>>>>>
>>>>> data).
>>>>>
>>>>>>> Is
>>>>>>>
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>>> version of BLAST in your text output 2.2.12 or 2.2.13? Have
>>>>>>>>>>
>>>> you
>>>>
>>>>>>> tried
>>>>>>>
>>>>>>>>>> parsing the output using "-readmethod => SearchIO" or "-
>>>>>>>>>>
>>>>> readmethod
>>>>>
>>>>>>> =>
>>>>>>>
>>>>>>>>>> blast"
>>>>>>>>>> using your version of RemoteBlast and method next_result()?
>>>>>>>>>>
>>>> Like
>>>>
>>>>>>> below
>>>>>>>
>>>>>>>>>> (from
>>>>>>>>>> perldoc):
>>>>>>>>>>
>>>>>>>>>> while ( my @rids = $factory->each_rid ) {
>>>>>>>>>>   foreach my $rid ( @rids ) {
>>>>>>>>>>     my $rc = $factory->retrieve_blast($rid);
>>>>>>>>>>     if( !ref($rc) ) {
>>>>>>>>>>       if( $rc < 0 ) {
>>>>>>>>>>         $factory->remove_rid($rid);
>>>>>>>>>>       }
>>>>>>>>>>       print STDERR "." if ( $v > 0 );
>>>>>>>>>>       sleep 5;
>>>>>>>>>>     } else { # parsing starts here
>>>>>>>>>>       my $result = $rc->next_result(); # it should hang here
>>>>>>>>>>       #save the output
>>>>>>>>>>       my $filename = $result->query_name()."\.out";
>>>>>>>>>>       $factory->save_output($filename);
>>>>>>>>>>       $factory->remove_rid($rid);
>>>>>>>>>>       print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>>>>>>       while ( my $hit = $result->next_hit ) {
>>>>>>>>>>         next unless ( $v > 0);
>>>>>>>>>>         print "\thit name is ", $hit->name, "\n";
>>>>>>>>>>         while( my $hsp = $hit->next_hsp ) {
>>>>>>>>>>           print "\t\tscore is ", $hsp->score, "\n";
>>>>>>>>>>         }
>>>>>>>>>>       }
>>>>>>>>>>     }
>>>>>>>>>>   }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> My script hung if I used next_result() in any way prior to the
>>>>>>>>>> fixes.  I want to see how many others are having the same issues
>>>>>>>>>> with parsing using the CVS version of bioperl-live.
>>>>>>>>>>
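
If a call can loop forever, one generic guard is to wrap it in an
alarm-based timeout.  This is only a sketch under assumptions:
with_timeout is a hypothetical helper (not a BioPerl function), and
alarm is not available on every platform, notably some Windows perls.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run $code, giving up after $seconds.  Returns
# whatever $code returns, or an empty list on timeout.  Errors other
# than the timeout are re-thrown unchanged.
sub with_timeout {
    my ($seconds, $code) = @_;
    my @result;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        @result = $code->();
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;                      # make sure no alarm is left pending
    return @result if $ok;
    die $err unless $err eq "timeout\n";
    return;                       # empty list signals a timeout
}

# A call that finishes in time, and one that does not.
my ($fast) = with_timeout(2, sub { return 'done' });
print defined $fast ? "fast call: $fast\n" : "fast call: timed out\n";

my ($slow) = with_timeout(1, sub { select(undef, undef, undef, 3); 'done' });
print defined $slow ? "slow call: $slow\n" : "slow call: timed out\n";
```

With the loop quoted above, the parsing call could be wrapped as
my ($result) = with_timeout(60, sub { $rc->next_result }); the
60-second limit is arbitrary.
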
>>>>>>>>>> Christopher Fields
>>>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>>>> Dept. of Biochemistry
>>>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
>>>>>>>>>>>
>>>> l-
>>>>
>>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
>>>>>>>>>>> Sent: Thursday, February 02, 2006 7:24 PM
>>>>>>>>>>> To: Huang Jian; bioperl-l
>>>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>>>>>>>>>>>
>>>>>>>>>>> Hi Huang,
>>>>>>>>>>> Thanks for the message. The older version of RemoteBlast.pm
>>>>>>>>>>>
>>>>> works
>>>>>
>>>>>>> on
>>>>>>>
>>>>>>>>> the
>>>>>>>>>
>>>>>>>>>>> logic of checking the temporary file size to determine
>>>>>>>>>>>
>>>> whether
>>>>
>>>>> the
>>>>>
>>>>>>>>> Blast
>>>>>>>>>
>>>>>>>>>>> results are ready. This condition is not getting satisfied
>>>>>>>>>>>
>>>> may
>>>>
>>>>> be
>>>>>
>>>>>>> due
>>>>>>>
>>>>>>>>> to
>>>>>>>>>
>>>>>>>>>>> some changes brought about by NCBI. I had this problem
>>>>>>>>>>>
>>>>> recently
>>>>>
>>>>>>> and
>>>>>>>
>>>>>>>>>>> figured out that the solution was to use the latest version
>>>>>>>>>>>
>>>>> which
>>>>>
>>>>>>> has
>>>>>>>
>>>>>>>>>>> this problem fixed (does not use file size logic any more)
>>>>>>>>>>>
>>>>> which
>>>>>
>>>>>>> is
>>>>>>>
>>>>>>>>> not
>>>>>>>>>
>>>>>>>>>>> yet included in the BioPerl package.
>>>>>>>>>>> Cheers
>>>>>>>>>>> Nagesh
>>>>>>>>>>>
>>>>>>>>>>> Huang Jian wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Dear Nagesh,
>>>>>>>>>>>>
>>>>>>>>>>>> I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28
>>>>>>>>>>>>
>>>>> you
>>>>>
>>>>>>> send
>>>>>>>
>>>>>>>>>>>> me. Now it works perfectly!!!
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you!!
>>>>>>>>>>>>
>>>>>>>>>>>> Huang
>>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message ----- From: "Nagesh Chakka"
>>>>>>>>>>>> 
>>>>>>>>>>>> To: "Huang Jian" ; "bioperl-l"
>>>>>>>>>>>> 
>>>>>>>>>>>> Sent: Friday, February 03, 2006 7:48 AM
>>>>>>>>>>>> Subject: Re: [Bioperl-l] Sorry, failure in post on the
>>>>>>>>>>>>
>>>> net,
>>>>
>>>>> so
>>>>>
>>>>>>> still
>>>>>>>
>>>>>>>>>>>> via email
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Huang,
>>>>>>>>>>>>> I see that you are submitting a sequence for a remote
>>>>>>>>>>>>>
>>>> blast
>>>>
>>>>>>> search.
>>>>>>>
>>>>>>>>>> Can
>>>>>>>>>>
>>>>>>>>>>>>> you check if the RemoteBlast.pm being used is v 1.28
>>>>>>>>>>>>>
>>>>>>> (2005/12/09).
>>>>>>>
>>>>>>>>> If
>>>>>>>>>
>>>>>>>>>>>>> not I have attached it with this email, try to replace it
>>>>>>>>>>>>>
>>>>> with
>>>>>
>>>>>>> the
>>>>>>>
>>>>>>>>>> old
>>>>>>>>>>
>>>>>>>>>>>>> one which has a bug.
>>>>>>>>>>>>> Let me know if it works.
>>>>>>>>>>>>> Nagesh
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From cjfields at uiuc.edu  Thu Feb 16 07:52:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 06:52:31 -0600
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: 
References: 
Message-ID: 

I think a method was recently implemented in Bio::DB::GenBank to  
retrieve a segment of DNA given start and end coordinates in GenBank  
format; that should contain the features you need.  I requested it  
~Nov-Dec in the mailing list but didn't get a chance to test it.   
Would that help?

On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:

> Harry,
>
> It's not clear to me that NCBI's eutils offers this capability  
> directly. You
> can probably download Entrez Gene entries and parse them for  
> coordinates but
> I know of no way to remotely retrieve genomic sequences like this  
> from NCBI
> (ENSEMBL API perhaps?). What I had in mind uses the local approach  
> that some
> of us favor and to prove to myself that this is simple to do I wrote a
> script that I just added to examples/tools, it's called  
> extract_genes.pl and
> it's based on Bio::DB::Fasta. Download the sequence files for a given
> species to some dir, download Entrez Gene's gene2accession file,  
> and run. It
> creates and stores a hash for lookups, it won't read gene2accession  
> each
> time it runs.
>
> Brian O.
>
>
> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
>
>> Hi Brian,
>>
>> Thanks very much for the pointers and the speed of your reply and  
>> apologies
>> for the speed of mine.
>>
>> This looks good, but what I was looking for was a bioP approach  
>> for hooking to
>> an API at NCBI or EBI so I could get this info and seqs from  
>> them.  In this
>> case, speed of retrieval is not critical and I'd rather not  
>> download the
>> entirety of the sequences to a local disk to hack at them.
>>
>> I've determined a screen-scraping approach to get them and could  
>> script that,
>> but I thought that bioP had a method for using NCBI's external  
>> API's, tho it
>> may be that my memory is faulty or the approach is no longer  
>> supported due to
>> overload.
>>
>> Does NCBI make such APIs available anymore?  I searched a bit for  
>> docs on them
>> but couldn't find anything (unless it's buried in the NCBI toolkit,  
>> which I
>> haven't started to excavate).
>>
>> Failing that, would SEALS provide such a service? Any PerlPinipeds  
>> listening?
>>
>> Harry
>>
>>
>>
>>
>>
>>
>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>> Harry,
>>>
>>> Hope you're doing well. The approach could be based on  
>>> Bio::DB::Fasta. So,
>>> from its documentation:
>>>
>>>   use Bio::DB::Fasta;
>>>
>>>   # create database from directory of fasta files
>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>
>>>   # simple access (for those without Bioperl)
>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>   my @ids     = $db->ids;
>>>   my $length   = $db->length('CHROMOSOME_I');
>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>   my $header   = $db->header('CHROMOSOME_I');
>>>
>>>   # Bioperl-style access
>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>
>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>>   my $seq     = $obj->seq;
>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>>
>>> Do you already have the offsets?
>>>
>>> Brian O.
>>>
>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>>> Hi All,
>>>>
>>>> After perusing the tutorial and other docs for an evening, I  
>>>> still
>>>> can't find the answer to this.  Forgive me if I've missed something
>>>> obvious.
>>>>
>>>> This should not be a novel request, but I've not found it  
>>>> answered.  If
>>>> bioperl isn't the best way to do this, I'd be grateful for a  
>>>> pointer to a
>>>> better way, especially if it includes an illuminating bit of code.
>>>>
>>>> The problem is to retrieve genomic sequences plus & minus some  
>>>> offset
>>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>>> common followup chore for some extra analysis from a gene  
>>>> expression
>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed  
>>>> the
>>>> sequence type to specify...?
>>>>
>>>>
>>>> TIA!
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From anst at kvl.dk  Thu Feb 16 04:24:51 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Thu, 16 Feb 2006 10:24:51 +0100
Subject: [Bioperl-l] searchIO bug?
Message-ID: <43F452F30200009B00000EC9@gwia.kvl.dk>

Hi! 
 
 
I am blasting a protein seq against an identical protein.
I am trying to parse the protein header using the query_description
method of the SearchIO module.
After calling query_description I use split / / in order
to easily access the different header components.
Here I discover that query_description somehow introduces
a space between the fifth comma and the following chromosome position
number in the exon chromosome position list!?
This truncates the list of exon chromosome positions from 7 to 4, later
yielding a wrong intron count.
 
Is this a bug? 
 
Attached is: 
 
testblast1.pl: the blastprogram to run. 
 
Q0045: the seq that is used as both query and database seq.
(Q0045 has to be formatted in order to be used as a database: formatdb -i
Q0045 -p T -o F)
 
 
Regards Anders. 
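
A possible workaround, sketched under assumptions: if the extra space
comes from the long description line being wrapped in the BLAST report
and rejoined with a space when parsed (an assumption, not checked against
the SearchIO source), the whitespace after each comma can be removed
before splitting. The description string and the helper name
split_description below are hypothetical, made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical description with a stray space after the fifth comma,
# mimicking the symptom described above (not the real Q0045 header).
my $desc = 'Q0045 hypothetical_gene chrX '
         . '100-200,300-400,500-600,700-800,900-1000, 1100-1200,1300-1400';

# Remove whitespace that follows a comma, then split on whitespace.
sub split_description {
    my ($text) = @_;
    (my $clean = $text) =~ s/,\s+/,/g;
    return split ' ', $clean;
}

my @fields = split_description($desc);
my @exons  = split /,/, $fields[-1];
print scalar(@exons), " exon ranges\n";   # prints "7 exon ranges"
```

This only repairs comma-separated lists; a description that legitimately
contains ", " elsewhere would need a narrower pattern.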

-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastp5.pl
Type: application/octet-stream
Size: 50384 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045
Type: application/octet-stream
Size: 873 bytes
Desc: not available
URL: 

From anst at kvl.dk  Thu Feb 16 05:20:06 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Thu, 16 Feb 2006 11:20:06 +0100
Subject: [Bioperl-l] another searchIO bug?
Message-ID: <43F45FE60200009B00000ED6@gwia.kvl.dk>

Hi! 
 
I am blasting a protein seq (query) against an identical seq with a
deletion of Aa nr 61 (subject). 
Then I print out the type of nomatch Aa and its position. 
The nomatch for the query seq is Aa G at position 61, which is correct. 
The nomatch for the subject seq is V at position 60, which is definitely
not correct!? 
 
Is this a bug? 
 
testblast2.pl is the program to run 
 
Q0045 is the query seq. 
 
Q0045del61 is the subject seq (it has to be formatted: formatdb -i
Q0045del61 -p T -o F).
 
Regards Anders. 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045
Type: application/octet-stream
Size: 873 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testblast2.pl
Type: application/octet-stream
Size: 6109 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045del61
Type: application/octet-stream
Size: 872 bytes
Desc: not available
URL: 

From mcoyne at channing.harvard.edu  Wed Feb 15 16:20:17 2006
From: mcoyne at channing.harvard.edu (Michael Coyne)
Date: Wed, 15 Feb 2006 16:20:17 -0500
Subject: [Bioperl-l] Primer maps?
Message-ID: <6.2.0.14.0.20060215155422.01d44a98@localhost>

An HTML attachment was scrubbed...
URL: 

From Pieter.Monsieurs at esat.kuleuven.be  Thu Feb 16 04:46:09 2006
From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs)
Date: Thu, 16 Feb 2006 10:46:09 +0100
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
 version 1.28
In-Reply-To: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
References: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
Message-ID: <43F449E1.80605@esat.kuleuven.be>

Hi,

I have the same problem with the blast.pm file.
NCBI has added some extra information to the BLAST output
(see e.g. "Features flanking this part..." or "Features in this part
..."); an example is included below.
The blast.pm module starts looking for the HSP alignment information,
but it dies when it hits this feature information.

Pieter


>gi|77552765|gb|DP000011.1|  Oryza sativa (japonica cultivar-group) chromosome 12, complete 
sequence
Length=27492551

 Features flanking this part of subject sequence:
   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
   2655 bp at 3' side: hypothetical protein

 Score = 36.2 bits (18),  Expect = 0.22
 Identities = 18/18 (100%), Gaps = 0/18 (0%)
 Strand=Plus/Minus

Query  4         GTACTACTCTACTCTACT  21
                 ||||||||||||||||||
Sbjct  19257436  GTACTACTCTACTCTACT  19257419


 Features flanking this part of subject sequence:
   2991 bp at 5' side: hypothetical protein
   1131 bp at 3' side: hypothetical protein

 Score = 36.2 bits (18),  Expect = 0.22
 Identities = 18/18 (100%), Gaps = 0/18 (0%)
 Strand=Plus/Minus

Query  2         ATGTACTACTCTACTCTA  19
                 ||||||||||||||||||
Sbjct  27006915  ATGTACTACTCTACTCTA  27006898


 Features in this part of subject sequence:
   DHHC zinc finger domain, putative

 Score = 34.2 bits (17),  Expect = 0.87
 Identities = 17/17 (100%), Gaps = 0/17 (0%)
 Strand=Plus/Plus

Query  5         TACTACTCTACTCTACT  21
                 |||||||||||||||||
Sbjct  17616437  TACTACTCTACTCTACT  17616453


 Features flanking this part of subject sequence:
   102 bp at 5' side: bZIP transcription factor, putative
   3740 bp at 3' side: yeast dcp1, putative

 Score = 32.2 bits (16),  Expect = 3.4
 Identities = 16/16 (100%), Gaps = 0/16 (0%)
 Strand=Plus/Plus

Query  7        CTACTCTACTCTACTC  22
                ||||||||||||||||
Sbjct  2775880  CTACTCTACTCTACTC  2775895


 Features flanking this part of subject sequence:
   21 bp at 5' side: peptide transporter T17F3.11, putative
   10230 bp at 3' side: transposon protein, putative, unclassified

 Score = 32.2 bits (16),  Expect = 3.4
 Identities = 16/16 (100%), Gaps = 0/16 (0%)
 Strand=Plus/Minus

Query  7         CTACTCTACTCTACTC  22
                 ||||||||||||||||
Sbjct  27323153  CTACTCTACTCTACTC  27323138
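
Until the parser handles these blocks, one stopgap is to strip them from
the saved report text before parsing it. The following is a minimal
sketch in plain Perl, not a fix to blast.pm. It assumes each block starts
with a "Features flanking/in this part of subject sequence:" line and
runs until the next " Score =" line, as in the report above. The helper
name strip_feature_blocks is hypothetical.

```perl
use strict;
use warnings;

# Drop "Features ..." annotation blocks from a BLAST text report,
# keeping everything from the following " Score =" line onward.
sub strip_feature_blocks {
    my ($report) = @_;
    my @kept;
    my $in_features = 0;
    for my $line (split /\n/, $report, -1) {
        if ($line =~ /^\s*Features (?:flanking|in) this part of subject sequence:/) {
            $in_features = 1;          # start skipping
            next;
        }
        if ($in_features) {
            next unless $line =~ /^\s*Score\s*=/;
            $in_features = 0;          # HSP header reached, stop skipping
        }
        push @kept, $line;
    }
    return join "\n", @kept;
}

# Small sample built from lines of the report above.
my $report = join "\n",
    'Length=27492551',
    '',
    ' Features flanking this part of subject sequence:',
    "   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class",
    "   2655 bp at 3' side: hypothetical protein",
    '',
    ' Score = 36.2 bits (18),  Expect = 0.22',
    ' Identities = 18/18 (100%), Gaps = 0/18 (0%)',
    '';

print strip_feature_blocks($report);
```

The filtered text could then be written to a file and read with
Bio::SearchIO (-format => 'blast'); whether the rest of a 2.2.13 report
parses cleanly is not guaranteed by this sketch.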




Guojun Yang wrote:

>Hi, Chris,
>Finally the remoteblast test script works for the amino.fa query, but when I try a nucleic acid sequence (see below), an error occurs:
>"
>waiting........
>------------- EXCEPTION  -------------
>MSG: no data for midline  Features flanking this part of subject sequence:
>STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
>STACK toplevel remoteblast_test:40
>"
>The query sequence is:
>CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
>GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
>AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
>AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
>
>The script (basically same as the remoteblast test, I only changed database to 'nr' and program to 'blastn' and filename to 'ost3'):
>#!/usr/bin/perl
>
>use Bio::SeqIO;
>use Bio::Seq;
>use Bio::Tools::Run::RemoteBlast;
>use Bio::SearchIO;
>use strict;
>my $prog='blastn';
>my $db='nr';
>my $e_val=1e-10;
>my @params=( -prog=>$prog,
>	-data=>$db,
>	-expect=>$e_val,
>	-readmethod=>'SearchIO');
>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>
>my $v = 1;
>
>my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
>
>while (my $input = $str->next_seq()){
>  #Blast a sequence against a database:
>  #Alternatively, you could  pass in a file with many
>  #sequences rather than loop through sequence one at a time
>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>  #and swap the two lines below for an example of that.
>  my $r = $factory->submit_blast($input);
>  #my $r = $factory->submit_blast('amino.fa');
>  print STDERR "waiting..." if( $v > 0 );
>  while ( my @rids = $factory->each_rid ) {
>    foreach my $rid ( @rids ) {
>      my $rc = $factory->retrieve_blast($rid);
>      if( !ref($rc) ) {
>        if( $rc < 0 ) {
>          $factory->remove_rid($rid);
>        }
>        print STDERR "." if ( $v > 0 );
>        sleep 5;
>      } else {
>        my $result = $rc->next_result();
>        #save the output
>        my $filename = $result->query_name()."\.out";
>        $factory->save_output($filename);
>        $factory->remove_rid($rid);
>        print "\nQuery Name: ", $result->query_name(), "\n";
>        while ( my $hit = $result->next_hit ) {
>          next unless ( $v > 0);
>          print "\thit name is ", $hit->name, "\n";
>          while( my $hsp = $hit->next_hsp ) {
>            print "\t\tscore is ", $hsp->score, "\n";
>          }
>        }
>      }
>    }
>  }
>}
>
>
>Do you think there might still be something in the NCBI output format?
>
>Thank you,
>Guojun
>
>
>
>
>Guojun Yang
>Department of Plant Biology
>University of Georgia
>Tel: 706-542-1857
>Fax: 706-542-1805
>http://www.arches.uga.edu/~guojun
>
>
>
>----- Original Message -----
>From: Chris Fields [mailto:cjfields at uiuc.edu]
>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
>
>
>  
>
>>Sorry, forgot to add that I didn't see the regex issue that you mentioned.
>>It could be a perl-related issue.  Try the fixes I mentioned and see what
>>happens.
>>
>>Christopher Fields
>>Postdoctoral Researcher - Switzer Lab
>>Dept. of Biochemistry
>>University of Illinois Urbana-Champaign
>>
>>>-----Original Message-----
>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>Sent: Tuesday, February 14, 2006 12:36 PM
>>>To: 'gyang at plantbio.uga.edu'
>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>>
>>>It's a good habit to always add single quotes around words.  The perl
>>>interpreter may think a single bare word is a subroutine or perlfunc
>>>called with no args so will try to find a subroutine named blastp().  My
>>>debugger actually gives the error that the bare word blastp may conflict
>>>with a future reserved word.  Like you said, 'use strict' will point that
>>>out.
>>>
>>>As for the regex, it should match all the blast programs at NCBI (blastp,
>>>blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
>>>else passes through.
>>>
>>>So, if you are using the script below, there are several errors.  The bare
>>>words for $prog and $db need quotes, and the flags for your @params array
>>>don't have a dash before them.  I get this after adding quotes but before
>>>adding the dashes to @params:
>>>
>>>C:\Perl\Scripts>test_blast.pl
>>>------------- EXCEPTION: Bio::Root::Exception -------------
>>>MSG:
>>>STACK: Error::throw
>>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
>>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
>>>C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
>>>STACK: Bio::Tools::Run::RemoteBlast::new
>>>C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
>>>STACK: C:\Perl\Scripts\test_blast.pl:15
>>>-----------------------------------------------------------
>>>
>>>The last line indicates a problem with this line:
>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>>Changing the @params to this:
>>>my @params=( -prog=>$prog,
>>>	-data=>$db,
>>>	-expect=>$e_val,
>>>	-readmethod=>'SearchIO');
>>>fixes it, and I get output as expected.
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>>>>>>-----Original Message-----
>>>>>>>>                
>>>>>>>>
>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
>>>>Sent: Tuesday, February 14, 2006 11:48 AM
>>>>To: Chris Fields; bioperl-l at lists.open-bio.org
>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>>>
>>>>Hi, Chris,
>>>>When I tried with the perldoc script, It did not work either. First it
>>>>says $prog can not be bare word if I "use strict". I added quotes on the
>>>>words, then it says the value for $prog does not match expression
>>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The script
>>>>is shown below. Why is the expression "t?blast[pnx]"?
>>>>
>>>>#!/usr/bin/perl
>>>>
>>>>use Bio::SeqIO;
>>>>use Bio::Seq;
>>>>use Bio::Tools::Run::RemoteBlast;
>>>>use Bio::SearchIO;
>>>>
>>>>
>>>>my $prog=blastp;
>>>>my $db=swissprot;
>>>>my $e_val=1e-10;
>>>>my @params=( prog=>$prog,
>>>>	data=>$db,
>>>>	expect=>$e_val,
>>>>	readmethod=>'SearchIO');
>>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>>>
>>>>my $v = 1;
>>>>
>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
>>>>
>>>>while (my $input = $str->next_seq()){
>>>>  #Blast a sequence against a database:
>>>>  #Alternatively, you could  pass in a file with many
>>>>  #sequences rather than loop through sequence one at a time
>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>  #and swap the two lines below for an example of that.
>>>>  my $r = $factory->submit_blast($input);
>>>>  #my $r = $factory->submit_blast('amino.fa');
>>>>  print STDERR "waiting..." if( $v > 0 );
>>>>  while ( my @rids = $factory->each_rid ) {
>>>>    foreach my $rid ( @rids ) {
>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>      if( !ref($rc) ) {
>>>>        if( $rc < 0 ) {
>>>>          $factory->remove_rid($rid);
>>>>        }
>>>>        print STDERR "." if ( $v > 0 );
>>>>        sleep 5;
>>>>      } else {
>>>>        my $result = $rc->next_result();
>>>>        #save the output
>>>>        my $filename = $result->query_name()."\.out";
>>>>        $factory->save_output($filename);
>>>>        $factory->remove_rid($rid);
>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>>>        while ( my $hit = $result->next_hit ) {
>>>>          next unless ( $v > 0);
>>>>          print "\thit name is ", $hit->name, "\n";
>>>>          while( my $hsp = $hit->next_hsp ) {
>>>>            print "\t\tscore is ", $hsp->score, "\n";
>>>>          }
>>>>        }
>>>>      }
>>>>    }
>>>>  }
>>>>}
>>>>
>>>>Thank you for your help!
>>>>
>>>>
>>>>Guojun
>>>>Department of Plant Biology
>>>>University of Georgia
>>>>
>>>>----- Original Message -----
>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>To: gyang at plantbio.uga.edu
>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>
>>>>
>>>>        
>>>>
>>>>>Try two things:
>>>>>
>>>>>1)  Use a much simpler script, like the one in 'perldoc
>>>>>Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something wrong
>>>>>with the logic in your subroutine:
>>>>>
>>>>>my $v = 1;
>>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
>>>>>while (my $input = $str->next_seq()){
>>>>>  #Blast a sequence against a database:
>>>>>  #Alternatively, you could  pass in a file with many
>>>>>  #sequences rather than loop through sequence one at a time
>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>  #and swap the two lines below for an example of that.
>>>>>  my $r = $factory->submit_blast($input);
>>>>>  #my $r = $factory->submit_blast('amino.fa');
>>>>>  print STDERR "waiting..." if( $v > 0 );
>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>    foreach my $rid ( @rids ) {
>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>      if( !ref($rc) ) {
>>>>>        if( $rc < 0 ) {
>>>>>          $factory->remove_rid($rid);
>>>>>        }
>>>>>        print STDERR "." if ( $v > 0 );
>>>>>        sleep 5;
>>>>>      } else {
>>>>>        my $result = $rc->next_result();
>>>>>        #save the output
>>>>>        my $filename = $result->query_name()."\.out";
>>>>>        $factory->save_output($filename);
>>>>>        $factory->remove_rid($rid);
>>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>        while ( my $hit = $result->next_hit ) {
>>>>>          next unless ( $v > 0);
>>>>>          print "\thit name is ", $hit->name, "\n";
>>>>>          while( my $hsp = $hit->next_hsp ) {
>>>>>            print "\t\tscore is ", $hsp->score, "\n";
>>>>>          }
>>>>>        }
>>>>>      }
>>>>>    }
>>>>>  }
>>>>>}
>>>>>
>>>>>2) Try the RemoteBlast from Bugzilla and see if that works.  It really
>>>>>shouldn't make that much of a difference, but I noticed that the CVS
>>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
>>>>>released; the Bugzilla version is based off CVS.
>>>>>
>>>>>Christopher Fields
>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>Dept. of Biochemistry
>>>>>University of Illinois Urbana-Champaign
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>Sent: Monday, February 13, 2006 3:00 PM
>>>>>>To: bioperl-l at lists.open-bio.org
>>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>
>>>>>>Thanks, Chris,
>>>>>>
>>>>>>I installed version 1.5.1 and replaced the blast.pm file with the one from
>>>>>>your bug report. The running version is 1.5 when I use the command you
>>>>>>sent me. But when I tried the script, it doesn't change much. My
>>>>>>remoteblast code (portion) is here:
>>>>>>
>>>>>>sub search {
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
>>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]",
>>>>>>			      -id=>"query",
>>>>>>			      -desc=>"new seq");
>>>>>>my $len=$query->length();
>>>>>>@db=('nr','htgs','wgs');
>>>>>>foreach my $db (@db) {
>>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
>>>>>>						'-data' =>"$db",
>>>>>>						'-expect'=>"$E_value");
>>>>>>my $blast_report = $factory->submit_blast($query);
>>>>>>my @rids = $factory->each_rid();
>>>>>>foreach my $rid ( @rids ) {
>>>>>>    print STDERR "$rid\n";
>>>>>>}
>>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638)
>>>>>>print STDERR "waiting...";
>>>>>>sleep 60;
>>>>>>foreach my $rid ( @rids ) {
>>>>>>    my $rc = $factory->retrieve_blast($rid);
>>>>>>    while (!ref($rc) ) {
>>>>>>	if( $rc < 0 ) {
>>>>>># retrieve_blast returns -1 on error
>>>>>>	    $factory->remove_rid($rid);
>>>>>>	    print "Error!\n";
>>>>>>	    send_error($email,$function,$seqname,$queryname[$ST]);
>>>>>>	    die "Can't retrieve $rid";
>>>>>>	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
>>>>>>	    sleep 60;
>>>>>>	    $rc = $factory->retrieve_blast($rid);
>>>>>>	}
>>>>>>    }
>>>>>>    if (ref($rc)) {
>>>>>>	print STDERR "Done.\n";
>>>>>>	 while( my $result = $rc->next_result) {
>>>>>>	    while( my $hit = $result->next_hit()) {
>>>>>>	    	$hit_name=$hit->name;
>>>>>>		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
>>>>>>		$name=$1;
>>>>>>		@left_plus_start=();
>>>>>>		@left_plus_end=();
>>>>>>		@left_minus_start=();
>>>>>>		@left_minus_end=();
>>>>>>		@right_plus_start=();
>>>>>>		@right_plus_end=();
>>>>>>		@right_minus_start=();
>>>>>>		@right_minus_end=();
>>>>>>		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
>>>>>>		while( my $hsp = $hit->next_hsp()) {
>>>>>>......
>>>>>>
>>>>>>It was working quite well until around October last year, but it has
>>>>>>stopped since then. When a submission is sent via a webpage, the cgi
>>>>>>starts to work and uses ~20 Mb of memory. Then it hangs there; finally
>>>>>>the expected email is received but without real results, although it does
>>>>>>contain something from other parts of the script. Apparently the search
>>>>>>sub did not return anything (I know something should be
>>>>>>returned). Is it also possible the format of the NCBI output for each
>>>>>>result has changed?
>>>>>>Thank you,
>>>>>>Guojun
>>>>>>
>>>>>>Department of Plant Biology
>>>>>>University of Georgia
>>>>>>
>>>>>>----- Original Message -----
>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>
>>>>>>>How do you know two versions are installed (i.e. how are you checking the
>>>>>>>version)?  Do you have two complete bioperl distributions (in two
>>>>>>>separate directories) or are you looking in modules?  Here's the way to
>>>>>>>check the version (from the FAQ):
>>>>>>>
>>>>>>>perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
>>>>>>>
>>>>>>>If you have two full bioperl distributions on your computer, normally only
>>>>>>>one will be in use unless you have explicitly set the environment variable
>>>>>>>PERL5LIB.  The PERL5LIB directories will be searched first before your
>>>>>>>normal perl directory list (@INC) is searched.  You MAY get some mixing
>>>>>>>then, but only if perl can't find a particular module in the path
>>>>>>>designated in PERL5LIB; then it will progress through the directories
>>>>>>>listed in @INC.  This may happen if a module is unique to a particular
>>>>>>>release, but shouldn't happen for the majority of modules, including
>>>>>>>RemoteBlast.  You can check what @INC and PERL5LIB are set to by using
>>>>>>>'perl -V'.  @INC will differ depending on your OS, perl build, etc.
>>>>>>>
>>>>>>>Regardless, if you follow the directions for installing bioperl for your
>>>>>>>system ('perl Makefile.PL', 'make', 'make test', 'make install', unless
>>>>>>>you explicitly change the installation directory when using 'perl
>>>>>>>Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a problem as it
>>>>>>>will install the Bioperl distribution you downloaded over the old version
>>>>>>>in @INC.  See this page:
>>>>>>>
>>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
>>>>>>>
>>>>>>>for more details.
>>>>>>>
>>>>>>>Christopher Fields
>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>Dept. of Biochemistry
>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>>>Sent: Monday, February 13, 2006 12:32 PM
>>>>>>>>To: bioperl-l at lists.open-bio.org
>>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>>>
>>>>>>>>Hi, Chris,
>>>>>>>>
>>>>>>>>I do have different versions of bioperl on my Linux machine (1.4 and
>>>>>>>>1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or
>>>>>>>>do I need to uninstall and remove the previous versions? I could not find
>>>>>>>>any hint on uninstalling bioperl on linux. Could you please give me some
>>>>>>>>suggestions?
>>>>>>>>Thanks,
>>>>>>>>Guojun
>>>>>>>>
>>>>>>>>Department of Plant Biology
>>>>>>>>University of Georgia
>>>>>>>>      _____
>>>>>>>>
>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500
>>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
>>>>>>>>1.28
>>>>>>>>
>>>>>>>>If you're using RemoteBlast 1.28, then you've likely updated from CVS
>>>>>>>>which isn't the latest fix.
>>>>>>>>
>>>>>>>>Make sure that you check the following:
>>>>>>>>
>>>>>>>>1) Always post to the mailing list:
>>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
>>>>>>>>
>>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
>>>>>>>>installed first.  Perform a clean installation; do not upgrade only
>>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
>>>>>>>>guarantee that mixing modules from old and new distributions (1.4 and
>>>>>>>>1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
>>>>>>>>installation will allow text output from BLAST v.2.2.12 to be saved and
>>>>>>>>parsed; it will not parse the newest BLAST text output from NCBI
>>>>>>>>(v2.2.13) but it should still save it. I believe as long as next_result()
>>>>>>>>isn't called, it will work.
>>>>>>>>
>>>>>>>>>>3) The bug fixes for the above issue with parsing BLAST
>>>>>>>>>>                    
>>>>>>>>>>
>>>2.2.13
>>>      
>>>
>>>>>>text output
>>>>>>            
>>>>>>
>>>>>>>>are NOT in CVS; they haven't been cleared and checked in by
>>>>>>>>                
>>>>>>>>
>>>Roger
>>>      
>>>
>>>>Hall
>>>>        
>>>>
>>>>>>>>(who's now taking care of RemoteBlast) and the powers that be
>>>>>>>>                
>>>>>>>>
>>>>(Jason
>>>>        
>>>>
>>>>>>or
>>>>>>            
>>>>>>
>>>>>>>>whomever is in charge of Bio::SearchIO).  They can be found in
>>>>>>>>                
>>>>>>>>
>>>>>>Bugzilla:
>>>>>>            
>>>>>>
>>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>>>>>>>>                
>>>>>>>>
>>>>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the
>>>>>>>>>>                    
>>>>>>>>>>
>>>>option
>>>>        
>>>>
>>>>>>of
>>>>>>            
>>>>>>
>>>>>>>>saving XML output, so isn't necessary if you don't plan on using
>>>>>>>>                
>>>>>>>>
>>>>this
>>>>        
>>>>
>>>>>>>>option.  And, remember, they haven't been committed yet to CVS,
>>>>>>>>                
>>>>>>>>
>>>>which
>>>>        
>>>>
>>>>>>>>means that the final version will change to refle the new
>>>>>>>>                
>>>>>>>>
>>>version.
>>>      
>>>
>>>>>>>>>>>>Christopher Fields
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>>Dept. of Biochemistry
>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>                
>>>>>>>>
>>>>>>>>>>>>    _____
>>>>>>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>>>>>>>Sent: Monday, February 13, 2006 9:26 AM
>>>>>>>>To: Chris Fields
>>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
>>>>>>>>                
>>>>>>>>
>>>>>>version
>>>>>>            
>>>>>>
>>>>>>>>1.28
>>>>>>>>                
>>>>>>>>
>>>>>>>>Hi, Chris
>>>>>>>>Thanks for your suggestion; however, it doesn't seem to work for my cgi
>>>>>>>>even after I replace both blast.pm and RemoteBlast.pm. I didn't even get
>>>>>>>>any RID. Is there any suggestion?
>>>>>>>>Guojun
>>>>>>>>>>>>>>                            
>>>>>>>>>>>>>>
>>>>>>>>>>>>Guojun Yang
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>>>>>>>Department of Plant Biology
>>>>>>>>University of Georgia
>>>>>>>>Tel: 706-542-1857
>>>>>>>>Fax: 706-542-1805
>>>>>>>>http://www.arches.uga.edu/~guojun
>>>>>>>>    _____
>>>>>>>>                
>>>>>>>>
>>>>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
>>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500
>>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
>>>>>>>>                
>>>>>>>>
>>>>>>version
>>>>>>            
>>>>>>
>>>>>>>>1.28
>>>>>>>>                
>>>>>>>>
>>>>>>>>I would say give the new code a try, but realize that it hasn't been
>>>>>>>>checked in (like I said below). I will try going over the modified
>>>>>>>>Bio::SearchIO::blast again this weekend to see if there is anything I
>>>>>>>>might have missed. The changed order in the header of BLAST text output
>>>>>>>>has me a bit worried that it might not catch everything, but it at least
>>>>>>>>doesn't hang in the while() loop I described in the bug report below
>>>>>>>>(bug #1934) and seems to process everything fine.
>>>>>>>>
>>>>>>>>If you want more stability in the code, you might consider changing over
>>>>>>>>to XML output and parsing with Bio::SearchIO::blastxml. There are some
>>>>>>>>changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
>>>>>>>>saving XML output, but I believe it parses everything regardless. If you
>>>>>>>>look back over the last month or so there has been a bit of discussion
>>>>>>>>here about it. Jason describes a bit on how to set up RemoteBlast for XML:
>>>>>>>>
>>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
>>>>>>            
>>>>>>
>>>>>>>>>>Christopher Fields
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>>Dept. of Biochemistry
>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>                
>>>>>>>>
>>>>>>>>>>>-----Original Message-----
>>>>>>>>>>>                      
>>>>>>>>>>>
>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM
>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm
>>>>>>>>>                  
>>>>>>>>>
>>>>version
>>>>        
>>>>
>>>>>>1.28
>>>>>>            
>>>>>>
>>>>>>>>>Hi, Everybody,
>>>>>>>>>I see this post and am wondering if this is the reason for the
>>>>>>>>>malfunctioning of my webserver. We set up a webserver named MAK, for
>>>>>>>>>MITE sequence analysis. It was working very well until around November
>>>>>>>>>2005, when it stopped returning any result (the site is fine and seems
>>>>>>>>>to be doing something after submission). In the CGI script, I used
>>>>>>>>>remoteblast (that work was done in 2003) to do searches. I currently do
>>>>>>>>>not have access to the server because I moved. Quite a few people sent
>>>>>>>>>emails to us about its malfunctioning. Is there any suggestion on fixing
>>>>>>>>>the problem? Should I simply ask that the remoteblast.pm be replaced
>>>>>>>>>with the new version?
>>>>>>>>>Thanks a lot,
>>>>>>>>>Guojun
>>>>>>>>>
>>>>>>>>>Department of Plant Biology
>>>>>>>>>University of Georgia
>>>>>>>>>Tel: 706-542-1857
>>>>>>>>>Fax: 706-542-1805
>>>>>>>>>http://www.arches.uga.edu/~guojun
>>>>>>>>>_____
>>>>>>>>>
>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
>>>>>>>>>                  
>>>>>>>>>
>>>>Jian'
>>>>        
>>>>
>>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
>>>>>>>>>                  
>>>>>>>>>
>>>[mailto:bioperl-
>>>      
>>>
>>>>>>>>>l at bioperl.org]
>>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500
>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
>>>>>>>>>will work for saving text output. However, it will not parse anything
>>>>>>>>>using next_result (it will likely hang) and will not save XML format.
>>>>>>>>>See these bugs:
>>>>>>>>>
>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>>>>>>>>>
>>>>>>>>>for explanations and possible fixes (changes to RemoteBlast and
>>>>>>>>>Bio::SearchIO::blast). Note that these haven't been checked in yet so
>>>>>>>>>are still not included in bioperl-live; they may be further modified
>>>>>>>>>before committing to CVS. If you're not worried about XML, you could
>>>>>>>>>just try the first fix, which is a change to SearchIO::blast.
>>>>>>>>>
>>>>>>>>>Nagesh, I remember you posting to the list a month ago using a script
>>>>>>>>>which had problems; the script you used saves the output but doesn't
>>>>>>>>>actually parse it (i.e. you don't use next_result() to go through the
>>>>>>>>>data). Is the version of BLAST in your text output 2.2.12 or 2.2.13?
>>>>>>>>>Have you tried parsing the output using "-readmethod => SearchIO" or
>>>>>>>>>"-readmethod => blast" using your version of RemoteBlast and method
>>>>>>>>>next_result()? Like below (from perldoc):
>>>>>>>>>
>>>>>>>>>while ( my @rids = $factory->each_rid ) {
>>>>>>>>>    foreach my $rid ( @rids ) {
>>>>>>>>>        my $rc = $factory->retrieve_blast($rid);
>>>>>>>>>        if( !ref($rc) ) {
>>>>>>>>>            if( $rc < 0 ) {
>>>>>>>>>                $factory->remove_rid($rid);
>>>>>>>>>            }
>>>>>>>>>            print STDERR "." if ( $v > 0 );
>>>>>>>>>            sleep 5;
>>>>>>>>>        } else { # parsing starts here
>>>>>>>>>            my $result = $rc->next_result(); # it should hang here
>>>>>>>>>            # save the output
>>>>>>>>>            my $filename = $result->query_name()."\.out";
>>>>>>>>>            $factory->save_output($filename);
>>>>>>>>>            $factory->remove_rid($rid);
>>>>>>>>>            print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>>>>>            while ( my $hit = $result->next_hit ) {
>>>>>>>>>                next unless ( $v > 0);
>>>>>>>>>                print "\thit name is ", $hit->name, "\n";
>>>>>>>>>                while( my $hsp = $hit->next_hsp ) {
>>>>>>>>>                    print "\t\tscore is ", $hsp->score, "\n";
>>>>>>>>>                }
>>>>>>>>>            }
>>>>>>>>>        }
>>>>>>>>>    }
>>>>>>>>>}
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>My script hung if I used next_result() in any way prior to the fixes.
>>>>>>>>>I want to see how many others are having the same issues with parsing
>>>>>>>>>using the CVS version of bioperl-live.
>>>>>>>>>
>>>>>>>>>Christopher Fields
>>>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>>>Dept. of Biochemistry
>>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>>                  
>>>>>>>>>
>>>>>>>>>>-----Original Message-----
>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
>>>>>>>>>>                    
>>>>>>>>>>
>>>l-
>>>      
>>>
>>>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
>>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM
>>>>>>>>>>To: Huang Jian; bioperl-l
>>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>>>>>>>>>>
>>>>>>>>>>Hi Huang,
>>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm works on
>>>>>>>>>>the logic of checking the temporary file size to determine whether the
>>>>>>>>>>Blast results are ready. This condition is not getting satisfied, maybe
>>>>>>>>>>due to some changes brought about by NCBI. I had this problem recently
>>>>>>>>>>and figured out that the solution was to use the latest version, which
>>>>>>>>>>has this problem fixed (does not use file size logic any more) and which
>>>>>>>>>>is not yet included in the BioPerl package.
>>>>>>>>>>Cheers
>>>>>>>>>>Nagesh
>>>>>>>>>>
>>>>>>>>>>Huang Jian wrote:
>>>>>>>>>>
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>>>>>>Dear Nagesh,
>>>>>>>>>>>
>>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you
>>>>>>>>>>>sent me. Now it works perfectly!!!
>>>>>>>>>>>
>>>>>>>>>>>Thank you!!
>>>>>>>>>>>
>>>>>>>>>>>Huang
>>>>>>>>>>>
>>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka"
>>>>>>>>>>>
>>>>>>>>>>>To: "Huang Jian" ; "bioperl-l"
>>>>>>>>>>>
>>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM
>>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the
>>>>>>>>>>>                      
>>>>>>>>>>>
>>>net,
>>>      
>>>
>>>>so
>>>>        
>>>>
>>>>>>still
>>>>>>            
>>>>>>
>>>>>>>>>>>via email
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                      
>>>>>>>>>>>
>>>>>>>>>>>>Hi Huang,
>>>>>>>>>>>>I see that you are submitting a sequence for a remote blast search.
>>>>>>>>>>>>Can you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09)?
>>>>>>>>>>>>If not, I have attached it to this email; try replacing the old one,
>>>>>>>>>>>>which has a bug, with it.
>>>>>>>>>>>>Let me know if it works.
>>>>>>>>>>>>Nagesh
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>>>>>>>>>>                      
>>>>>>>>>>>
>>>>>>>>>>_______________________________________________
>>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>                    
>>>>>>>>>>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>  
>

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



From jason.stajich at duke.edu  Thu Feb 16 09:00:01 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 16 Feb 2006 09:00:01 -0500
Subject: [Bioperl-l] searchIO bug?
In-Reply-To: <43F452F30200009B00000EC9@gwia.kvl.dk>
References: <43F452F30200009B00000EC9@gwia.kvl.dk>
Message-ID: <11B49C84-9C04-4F43-9278-A3AA09C9B773@duke.edu>

i think it would be more helpful if you posted the actual report  
rather than the protein since this may be dependent on the version of  
blast you are using.

if you used
split(/\s+/, $header)
  it wouldn't matter how many spaces.
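
A minimal sketch of the difference, using a made-up header string (the
contents of $header are an assumption for illustration, not the actual
Q0045 description):

```perl
use strict;
use warnings;

# Hypothetical description line with a run of two spaces in the middle.
my $header = "Q0045 COX1  chr:13818,16322";

# Splitting on a single literal space produces an empty field
# wherever two spaces are adjacent.
my @single = split / /, $header;

# Splitting on one-or-more whitespace characters collapses the run.
my @multi = split /\s+/, $header;

print scalar(@single), " fields with / /\n";
print scalar(@multi), " fields with /\\s+/\n";
```

If the string can start with whitespace, split ' ', $header also drops
the leading empty field that /\s+/ would otherwise produce.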

On Feb 16, 2006, at 4:24 AM, Anders Stegmann wrote:

> Hi!
>
>
> I am blasting a protein seq against an identical protein.
> I am trying to parse the protein header by using the query_description
> method in the SearchIO module.
> After calling query_description I use split / / in order
> to easily access the different header components.
> Here I discover that the query_description method is somehow introducing
> a space between the fifth comma and the following chromosome position
> number in the exon chromosome position list!?
> This truncates the list of exon chromosome positions from 7 to 4, later
> yielding a wrong intron count.
>
> Is this a bug?
>
> Attached is:
>
> testblast1.pl: the blastprogram to run.
>
> Q0045 the seq that is used as both query and database seq.
> (Q0045 has to be formated in order to be used as a database:  
> formatdb -i
> Q0045 -p T -o F)
>
>
> Regards Anders.
>
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/




From cjfields at uiuc.edu  Thu Feb 16 10:50:04 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 09:50:04 -0600
Subject: [Bioperl-l] additional error message
In-Reply-To: <20060216100410.54a1a6d5@dogwood.plantbio.uga.edu>
Message-ID: <002901c63310$a7da1b20$15327e82@pyrimidine>

I don't think the apache error is related to the main issue here, but you
could always try upgrading LWP to see if that fixes it.  The second issue is
a text parsing problem in SearchIO specific to nucleotide BLAST information,
which I'm looking into.

Jason has posted a bit on using XML.  Basically, do the following:

my $prog = 'blastn';
my $db = 'nr';
my $e_val=1e-10;
my $v = 1;
my @params=(-prog=>$prog,
            -data=>$db,
            -expect=>$e_val,
            -readmethod=>'xml');

my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
$factory->retrieve_parameter('FORMAT_TYPE', 'XML');

You'll also need to modify following line:

my $filename = $result->query_name()."\.out";

b/c the XML tag for this feature is actually part of the rid for some
reason, so you'll get a weird output file name.  This is a problem with
NCBI's XML output, not SearchIO::XML parsing.
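
One possible way to do that, as a rough sketch only: name the file after
the request ID instead of the query name.  The helper rid_filename() and
the sample RID below are hypothetical, not part of RemoteBlast.

```perl
use strict;
use warnings;

# Hypothetical helper: build an output file name from the request ID
# rather than from $result->query_name().
sub rid_filename {
    my ($rid) = @_;
    # Replace anything that is not a letter, digit, underscore,
    # dot or hyphen with an underscore.
    (my $safe = $rid) =~ s/[^A-Za-z0-9_.-]/_/g;
    return "$safe.out";
}

# Made-up RID, for illustration only.
print rid_filename('1140100000-12345-67890.BLASTQ4'), "\n";
```

Inside the retrieval loop this would replace the $filename line, e.g.
$factory->save_output(rid_filename($rid)), since $rid is already in
scope there.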

XML BLAST files can be really big (~5 MB and up depending on how much
information is returned), so it may take a little time to go through the
data.  Right now, XML is the only consistently reliable way to parse the
output, as NCBI keeps changing the text output, sending us back
into "SearchIO::blast hell," as J.S. puts it.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> Sent: Thursday, February 16, 2006 9:04 AM
> To: Chris Fields; Pieter Monsieurs
> Cc: bioperl-l at lists.open-bio.org
> Subject: additional error message
> 
> When I check my apache error_log, there is one line saying:
> "waiting...Parsing of undecoded UTF-8 will give garbage when decoding
> entities at /usr/lib/perl5/site_perl/5.8.3/LWP/Protocol.pm line 137.,"
> I also see an error saying "MSG: no data for midline  Features flanking
> this part of subject sequence:, " that is mentioned by Pieter.
> Chris, may I have your suggestions on changing it to XML parsing? I read
> Jason's comments/suggestions about it, but could not make it work.
> Thanks
> 
> Guojun
> Department of Plant Biology
> University of Georgia
> 
> 
> 
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: Pieter Monsieurs [mailto:Pieter.Monsieurs at esat.kuleuven.be]
> Cc: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
> 	version 1.28
> 
> 
> > Yeah, looks like it broke text output nucleotide parsing with that.
> > XML output parsing still works though (as expected).  I'll give it a
> > look.
> >
> > Chris
> >
> > On Feb 16, 2006, at 3:46 AM, Pieter Monsieurs wrote:
> >
> > > Hi,
> > >
> > > I have the same problem with the blast.pm-file.
> > > The people of NCBI added some extra info when giving the Blast-
> > > output. (see e.g. "Features flanking this part..." or "Features in
> > > this part ..."), example added.
> > > The blast.pm module starts looking for the hsp-alignement-
> > > information, but it dies when it hits this Feature-information.
> > >
> > > Pieter
> > >
> > >
> > >> gi|77552765|gb|DP000011.1|  > >> query.fcgi?
> > >> cmd=Retrieve&db=Nucleotide&list_uids=77552765&dopt=GenBank> Oryza
> > >> sativa (japonica cultivar-group) chromosome 12, complete
> > >
> > > sequence
> > > Length=27492551
> > >
> > > Features flanking this part of subject sequence:
> > >   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm
> > > sub-class  > > val=77552765&db=Nucleotide&from=19251479&to=19253693&view=gbwithparts>
> > >   2655 bp at 3' side: hypothetical protein  > > www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?
> > > val=77552765&db=Nucleotide&from=19260091&to=19260600&view=gbwithparts>
> > >
> > > Score = 36.2 bits (18),  Expect = 0.22
> > > Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > > Strand=Plus/Minus
> > >
> > > Query  4         GTACTACTCTACTCTACT  21
> > >                 ||||||||||||||||||
> > >
> > > Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> > >
> > >
> > > Features flanking this part of subject sequence:
> > >   2991 bp at 5' side: hypothetical protein  > > www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?
> > > val=77552765&db=Nucleotide&from=27003164&to=27003907&view=gbwithparts>
> > >   1131 bp at 3' side: hypothetical protein
> > >  > > val=77552765&db=Nucleotide&from=27008046&to=27010752&view=gbwithparts>
> > >
> > > Score = 36.2 bits (18),  Expect = 0.22
> > > Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > > Strand=Plus/Minus
> > >
> > > Query  2         ATGTACTACTCTACTCTA  19
> > >                 ||||||||||||||||||
> > > Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> > >
> > >
> > >
> > > Features in this part of subject sequence:
> > >   DHHC zinc finger domain, putative
> > >  > > val=77552765&db=Nucleotide&from=17614825&to=17618687&view=gbwithparts>
> > >
> > > Score = 34.2 bits (17),  Expect = 0.87
> > > Identities = 17/17 (100%), Gaps = 0/17 (0%)
> > > Strand=Plus/Plus
> > >
> > > Query  5         TACTACTCTACTCTACT  21
> > >                 |||||||||||||||||
> > > Sbjct  17616437  TACTACTCTACTCTACT  17616453
> > >
> > >
> > >
> > > Features flanking this part of subject sequence:
> > >   102 bp at 5' side: bZIP transcription factor, putative
> > >  > > val=77552765&db=Nucleotide&from=2774964&to=2775778&view=gbwithparts>
> > >   3740 bp at 3' side: yeast dcp1, putative  > > www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?
> > > val=77552765&db=Nucleotide&from=2779635&to=2782508&view=gbwithparts>
> > >
> > > Score = 32.2 bits (16),  Expect = 3.4
> > > Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > > Strand=Plus/Plus
> > >
> > > Query  7        CTACTCTACTCTACTC  22
> > >                ||||||||||||||||
> > > Sbjct  2775880  CTACTCTACTCTACTC  2775895
> > >
> > >
> > > Features flanking this part of subject sequence:
> > >
> > >   21 bp at 5' side: peptide transporter T17F3.11, putative  > > www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?
> > > val=77552765&db=Nucleotide&from=27321354&to=27323117&view=gbwithparts>
> > >   10230 bp at 3' side: transposon protein, putative, unclassified
> > >  > > val=77552765&db=Nucleotide&from=27333383&to=27334285&view=gbwithparts>
> > >
> > > Score = 32.2 bits (16),  Expect = 3.4
> > > Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > > Strand=Plus/Minus
> > >
> > > Query  7         CTACTCTACTCTACTC  22
> > >
> > >                 ||||||||||||||||
> > > Sbjct  27323153  CTACTCTACTCTACTC  27323138
> > >
> > >
> > >
> > >
> > > Guojun Yang wrote:
> > >
> > >> Hi, Chris,
> > >> Finally the remoteblast test script works for the amino.fa query.
> > >> but when I try a nucleic acid sequence (see below), Error occurs: "
> > >> waiting........
> > >> ------------- EXCEPTION  -------------
> > >> MSG: no data for midline  Features flanking this part of subject
> > >> sequence:
> > > >> STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> > >> STACK toplevel remoteblast_test:40
> > >> "
> > >> The query sequence is:
> > >> CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> > >> GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> > >> AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> > >> AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> > >>
> > >> The script (basically same as the remoteblast test, I only changed
> > >> database to 'nr' and program to 'blastn' and filename to 'ost3'):
> > >> #!/usr/bin/perl
> > >>
> > >> use Bio::SeqIO;
> > >> use Bio::Seq;
> > >> use Bio::Tools::Run::RemoteBlast;
> > >> use Bio::SearchIO;
> > >> use strict;
> > >> my $prog='blastn';
> > >> my $db='nr';
> > >> my $e_val=1e-10;
> > >> my @params=( -prog=>$prog,
> > >> 	-data=>$db,
> > >> 	-expect=>$e_val,
> > >> 	-readmethod=>'SearchIO');
> > >> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>
> > >> my $v = 1;
> > >>
> > >> my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> > >>
> > >> while (my $input = $str->next_seq()){
> > >>  #Blast a sequence against a database:
> > >>  #Alternatively, you could  pass in a file with many
> > >>  #sequences rather than loop through sequence one at a time
> > >>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>  #and swap the two lines below for an example of that.
> > >>  my $r = $factory->submit_blast($input);
> > >>  #my $r = $factory->submit_blast('amino.fa');
> > >>  print STDERR "waiting..." if( $v > 0 );
> > >>  while ( my @rids = $factory->each_rid ) {
> > >>    foreach my $rid ( @rids ) {
> > >>      my $rc = $factory->retrieve_blast($rid);
> > >>      if( !ref($rc) ) {
> > >>        if( $rc < 0 ) {
> > >>          $factory->remove_rid($rid);
> > >>        }
> > >>        print STDERR "." if ( $v > 0 );
> > >>        sleep 5;
> > >>      } else {
> > >>        my $result = $rc->next_result();
> > >>        #save the output
> > >>        my $filename = $result->query_name()."\.out";
> > >>        $factory->save_output($filename);
> > >>        $factory->remove_rid($rid);
> > >>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>        while ( my $hit = $result->next_hit ) {
> > >>          next unless ( $v > 0);
> > >>          print "\thit name is ", $hit->name, "\n";
> > >>          while( my $hsp = $hit->next_hsp ) {
> > >>            print "\t\tscore is ", $hsp->score, "\n";
> > >>          }
> > >>        }
> > >>      }
> > >>    }
> > >>  }
> > >> }
> > >>
> > >>
> > >> Do you think there might still be something in the NCBI output
> > >> format?
> > >>
> > >> Thank you,
> > >> Guojun
> > >>
> > >>
> > >>
> > >>
> > >> Guojun Yang
> > >> Department of Plant Biology
> > >> University of Georgia
> > >> Tel: 706-542-1857
> > >> Fax: 706-542-1805
> > >> http://www.arches.uga.edu/~guojun
> > >>
> > >>
> > >>
> > >> ----- Original Message -----
> > >> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >> Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>
> > >>
> > >>
> > >>> Sorry, forgot to add that I didn't see the regex issue that you
> > >>> mentioned.
> > >>> It could be a perl-related issue.  Try the fixes I mentioned and
> > >>> see what
> > >>> happens.
> > >>>
> > >>>> Christopher Fields
> > >>>>
> > >>> Postdoctoral Researcher - Switzer Lab
> > >>> Dept. of Biochemistry
> > >>> University of Illinois Urbana-Champaign
> > >>>>>> -----Original Message-----
> > >>>>>>
> > >>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>> Sent: Tuesday, February 14, 2006 12:36 PM
> > >>>> To: 'gyang at plantbio.uga.edu'
> > >>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>>>
> > >>>> It's a good habit to always put single quotes around string values.
> > >>>> The perl interpreter may treat a single bare word as a subroutine or
> > >>>> perlfunc called with no args, so it will try to find a subroutine
> > >>>> named blastp().  My debugger actually reports that the bare word
> > >>>> blastp may conflict with a future reserved word.  Like you said,
> > >>>> 'use strict' will point that out.
> > >>>>
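The bareword behaviour described above can be reproduced without BioPerl. The following is only a minimal sketch: compile_error() is a hypothetical helper written for this illustration, not part of any module discussed in the thread. It compiles two one-line snippets at run time and reports whether 'use strict' rejects them.

```perl
use strict;
use warnings;

# Hypothetical helper (illustration only): compile and run a snippet with
# string eval and return the error text, or an empty string on success.
sub compile_error {
    my ($code) = @_;
    eval $code;
    my $err = $@;
    return $err;
}

# Unquoted value, as in the script under discussion.
my $bare   = compile_error(q{ use strict; my $prog = blastp; 1 });
# Quoted value, as suggested in the reply.
my $quoted = compile_error(q{ use strict; my $prog = 'blastp'; 1 });

print "bareword: ", ($bare   ? $bare   : "ok\n");
print "quoted:   ", ($quoted ? $quoted : "ok\n");
```

Under 'use strict' the first snippet fails to compile with a "Bareword ... not allowed" message, while the quoted form compiles cleanly.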
> > >>>> As for the regex, it should match all the blast programs at NCBI
> > >>>> (blastp, blastn, blastx, tblastn, tblastx) and is built in to make
> > >>>> sure nothing else passes through.
> > >>>>
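To see what the pattern quoted in the error message lets through, here is a small stand-alone sketch. The thread only quotes the pattern as "t?blast[pnx]"; the ^ and $ anchors and the function name is_valid_blast_program() are assumptions made for this example, not taken from RemoteBlast.pm.

```perl
use strict;
use warnings;

# Assumption: the pattern is anchored at both ends. The thread only shows
# the bare pattern t?blast[pnx], so the anchors are a guess.
sub is_valid_blast_program {
    my ($prog) = @_;
    return $prog =~ /^t?blast[pnx]$/ ? 1 : 0;
}

# The five program names listed above, plus one name that should not match.
for my $prog (qw(blastp blastn blastx tblastn tblastx megablast)) {
    printf "%-10s %s\n", $prog,
        is_valid_blast_program($prog) ? 'accepted' : 'rejected';
}
```

The optional leading "t" covers the translated searches and the character class [pnx] covers the protein, nucleotide and translated-query variants, which is why a single short pattern is enough for all five names.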
> > >>>> So, if you are using the script below, there are several errors.  The
> > >>>> bare words for $prog and $db need quotes, and the flags in your
> > >>>> @params array don't have a dash before them.  I get this after adding
> > >>>> quotes but before adding the dashes to @params:
> > >>>>
> > >>>> C:\Perl\Scripts>test_blast.pl
> > >>>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > >>>> MSG:
> > >>>> STACK: Error::throw
> > >>>> STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> > >>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > >>>> STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> > >>>> STACK: C:\Perl\Scripts\test_blast.pl:15
> > >>>> -----------------------------------------------------------
> > >>>>
> > >>>> The last line indicates a problem with this line:
> > >>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>>> Changing the @params to this:
> > >>>> my @params=( -prog=>$prog,
> > >>>> 	-data=>$db,
> > >>>> 	-expect=>$e_val,
> > >>>> 	-readmethod=>'SearchIO');
> > >>>>
> > >>>> fixes it, and I get output as expected.
> > >>>>>> Christopher Fields
> > >>>>>>
> > >>>> Postdoctoral Researcher - Switzer Lab
> > >>>> Dept. of Biochemistry
> > >>>> University of Illinois Urbana-Champaign
> > >>>>
> > >>>>>>>>> -----Original Message-----
> > >>>>>>>>>
> > >>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > >>>>> Sent: Tuesday, February 14, 2006 11:48 AM
> > >>>>> To: Chris Fields; bioperl-l at lists.open-bio.org
> > >>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>>>>
> > >>>>> Hi, Chris,
> > >>>>> When I tried the perldoc script, it did not work either. First it
> > >>>>> says $prog cannot be a bare word if I "use strict". I added quotes
> > >>>>> around the words, then it says the value for $prog does not match
> > >>>>> expression t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and
> > >>>>> 256.  The script is shown below. Why is the expression
> > >>>>> "t?blast[pnx]"?
> > >>>>>
> > >>>>> #!/usr/bin/perl
> > >>>>>
> > >>>>> use Bio::SeqIO;
> > >>>>> use Bio::Seq;
> > >>>>> use Bio::Tools::Run::RemoteBlast;
> > >>>>> use Bio::SearchIO;
> > >>>>>
> > >>>>>
> > >>>>> my $prog=blastp;
> > >>>>> my $db=swissprot;
> > >>>>> my $e_val=1e-10;
> > >>>>> my @params=( prog=>$prog,
> > >>>>> 	data=>$db,
> > >>>>> 	expect=>$e_val,
> > >>>>> 	readmethod=>'SearchIO');
> > >>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>>>>
> > >>>>> my $v = 1;
> > >>>>>
> > >>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >>>>>
> > >>>>> while (my $input = $str->next_seq()){
> > >>>>>  #Blast a sequence against a database:
> > >>>>>  #Alternatively, you could  pass in a file with many
> > >>>>>  #sequences rather than loop through sequence one at a time
> > >>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>>>>  #and swap the two lines below for an example of that.
> > >>>>>  my $r = $factory->submit_blast($input);
> > >>>>>  #my $r = $factory->submit_blast('amino.fa');
> > >>>>>  print STDERR "waiting..." if( $v > 0 );
> > >>>>>  while ( my @rids = $factory->each_rid ) {
> > >>>>>    foreach my $rid ( @rids ) {
> > >>>>>      my $rc = $factory->retrieve_blast($rid);
> > >>>>>      if( !ref($rc) ) {
> > >>>>>        if( $rc < 0 ) {
> > >>>>>          $factory->remove_rid($rid);
> > >>>>>        }
> > >>>>>        print STDERR "." if ( $v > 0 );
> > >>>>>        sleep 5;
> > >>>>>      } else {
> > >>>>>        my $result = $rc->next_result();
> > >>>>>        #save the output
> > >>>>>        my $filename = $result->query_name()."\.out";
> > >>>>>        $factory->save_output($filename);
> > >>>>>        $factory->remove_rid($rid);
> > >>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>        while ( my $hit = $result->next_hit ) {
> > >>>>>          next unless ( $v > 0);
> > >>>>>          print "\thit name is ", $hit->name, "\n";
> > >>>>>          while( my $hsp = $hit->next_hsp ) {
> > >>>>>            print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>          }
> > >>>>>        }
> > >>>>>      }
> > >>>>>    }
> > >>>>>  }
> > >>>>> }
> > >>>>>
> > >>>>> Thank you for your help!
> > >>>>>
> > >>>>>
> > >>>>> Guojun
> > >>>>> Department of Plant Biology
> > >>>>> University of Georgia
> > >>>>>
> > >>>>> ----- Original Message -----
> > >>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>> To: gyang at plantbio.uga.edu
> > >>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> Try two things:
> > >>>>>>
> > >>>>>> 1)  Use a much simpler script, like the one in 'perldoc
> > >>>>>> Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something
> > >>>>>> wrong with the logic in your subroutine:
> > >>>>>>
> > >>>>>>> my $v = 1;
> > >>>>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >>>>>>> while (my $input = $str->next_seq()){
> > >>>>>>>
> > >>>>>>  #Blast a sequence against a database:
> > >>>>>>  #Alternatively, you could  pass in a file with many
> > >>>>>>  #sequences rather than loop through sequence one at a time
> > >>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>>>>>  #and swap the two lines below for an example of that.
> > >>>>>>  my $r = $factory->submit_blast($input);
> > >>>>>>  #my $r = $factory->submit_blast('amino.fa');
> > >>>>>>  print STDERR "waiting..." if( $v > 0 );
> > >>>>>>  while ( my @rids = $factory->each_rid ) {
> > >>>>>>    foreach my $rid ( @rids ) {
> > >>>>>>      my $rc = $factory->retrieve_blast($rid);
> > >>>>>>      if( !ref($rc) ) {
> > >>>>>>        if( $rc < 0 ) {
> > >>>>>>          $factory->remove_rid($rid);
> > >>>>>>        }
> > >>>>>>        print STDERR "." if ( $v > 0 );
> > >>>>>>        sleep 5;
> > >>>>>>      } else {
> > >>>>>>        my $result = $rc->next_result();
> > >>>>>>        #save the output
> > >>>>>>        my $filename = $result->query_name()."\.out";
> > >>>>>>        $factory->save_output($filename);
> > >>>>>>        $factory->remove_rid($rid);
> > >>>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>>        while ( my $hit = $result->next_hit ) {
> > >>>>>>          next unless ( $v > 0);
> > >>>>>>          print "\thit name is ", $hit->name, "\n";
> > >>>>>>          while( my $hsp = $hit->next_hsp ) {
> > >>>>>>            print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>>          }
> > >>>>>>        }
> > >>>>>>      }
> > >>>>>>    }
> > >>>>>>  }
> > >>>>>> }
> > >>>>>>
> > >>>>>> 2) Try the RemoteBlast from Bugzilla and see if that works.  It really
> > >>>>>> shouldn't make that much of a difference, but I noticed that the CVS
> > >>>>>> RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> > >>>>>> released; the Bugzilla version is based off CVS.
> > >>>>>>
> > >>>>>>> Christopher Fields
> > >>>>>>>
> > >>>>>> Postdoctoral Researcher - Switzer Lab
> > >>>>>> Dept. of Biochemistry
> > >>>>>> University of Illinois Urbana-Champaign
> > >>>>>>
> > >>>>>>>> -----Original Message-----
> > >>>>>>>>
> > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>> Sent: Monday, February 13, 2006 3:00 PM
> > >>>>>>> To: bioperl-l at lists.open-bio.org
> > >>>>>>> Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>>
> > >>>>>>> Thanks, Chris,
> > >>>>>>>
> > >>>>>>> I installed version 1.5.1 and replaced the blast.pm file with the one
> > >>>>>>> from your bug report. The running version is 1.5 when I use the
> > >>>>>>> command you sent me. But when I tried the script, it doesn't change
> > >>>>>>> much. My remoteblast code (portion) is here:
> > >>>>>>>
> > >>>>>>> sub search {
> > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
> > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > >>>>>>> my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > >>>>>>> 			      -id=>"query",
> > >>>>>>> 			      -desc=>"new seq");
> > >>>>>>> my $len=$query->length();
> > >>>>>>> @db=('nr','htgs','wgs');
> > >>>>>>> foreach my $db (@db) {
> > >>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> > >>>>>>> 						'-data' =>"$db",
> > >>>>>>> 						'-expect'=>"$E_value");
> > >>>>>>> my $blast_report = $factory->submit_blast($query);
> > >>>>>>> my @rids = $factory->each_rid();
> > >>>>>>> foreach my $rid ( @rids ) {
> > >>>>>>>    print STDERR "$rid\n";
> > >>>>>>> }
> > >>>>>>> # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > >>>>>>> print STDERR "waiting...";
> > >>>>>>> sleep 60;
> > >>>>>>>
> > >>>>>>>>> foreach my $rid ( @rids ) {
> > >>>>>>>>>
> > >>>>>>>    my $rc = $factory->retrieve_blast($rid);
> > >>>>>>>    while (!ref($rc) ) {
> > >>>>>>> 	if( $rc < 0 ) {
> > >>>>>>> # retrieve_blast returns -1 on error
> > >>>>>>> 	    $factory->remove_rid($rid);
> > >>>>>>> 	    print "Error!\n";
> > >>>>>>> 	    send_error($email,$function,$seqname,$queryname[$ST]);
> > >>>>>>> 	    die "Can't retrieve $rid";
> > >>>>>>> 	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
> > >>>>>>> 	    sleep 60;
> > >>>>>>> 	    $rc = $factory->retrieve_blast($rid);
> > >>>>>>> 	}
> > >>>>>>>    }
> > >>>>>>>    if (ref($rc)) {
> > >>>>>>> 	print STDERR "Done.\n";
> > >>>>>>> 	 while( my $result = $rc->next_result) {
> > >>>>>>> 	    while( my $hit = $result->next_hit()) {
> > >>>>>>> 	    	$hit_name=$hit->name;
> > >>>>>>> 		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > >>>>>>> 		$name=$1;
> > >>>>>>> 		@left_plus_start=();
> > >>>>>>> 		@left_plus_end=();
> > >>>>>>> 		@left_minus_start=();
> > >>>>>>> 		@left_minus_end=();
> > >>>>>>> 		@right_plus_start=();
> > >>>>>>> 		@right_plus_end=();
> > >>>>>>> 		@right_minus_start=();
> > >>>>>>> 		@right_minus_end=();
> > >>>>>>>
> > >>>>>>>>> 		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > >>>>>>>>>
> > >>>>>>> 		while( my $hsp = $hit->next_hsp()) {
> > >>>>>>> ......
> > >>>>>>>
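The retrieval loop quoted above keeps waiting for as long as the job reports "not finished", which could be one reason a CGI appears to hang. Below is a minimal sketch of the same polling idea with an upper bound on attempts. It relies only on the return convention stated in the code comments above (a reference when done, 0 while pending, a negative value on error). poll_until_ready() and the stubbed reply list are hypothetical stand-ins for $factory->retrieve_blast($rid) so that the example runs without network access.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch->() until it returns a reference
# (finished), a negative value (error), or $max_tries attempts are used up.
sub poll_until_ready {
    my ($fetch, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $fetch->();
        return $rc if ref $rc;                 # result object is ready
        die "retrieval failed\n" if $rc < 0;   # error code
        sleep $delay if $delay && $try < $max_tries;
    }
    return;                                    # still pending: give up
}

# Stand-in for retrieve_blast: pending twice, then a finished result.
my @replies = (0, 0, { status => 'done' });
my $result  = poll_until_ready(sub { shift @replies }, 5, 0);
print defined $result ? "finished: $result->{status}\n" : "timed out\n";
```

With a bound in place the caller can report a timeout to the user rather than waiting indefinitely when the remote service never returns a result.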
> > >>>>>>> It was working quite well until around October last year, but it has
> > >>>>>>> not worked since then. When a submission is sent via the webpage, the
> > >>>>>>> cgi starts to work and uses ~20 Mb of memory. Then it hangs there;
> > >>>>>>> finally the expected email is received but without real results,
> > >>>>>>> although it does contain something from other parts of the script.
> > >>>>>>> Apparently the search sub did not return anything (I know there is
> > >>>>>>> something that should be returned). Is it also possible that the
> > >>>>>>> format of the NCBI output for each result has changed?
> > >>>>>>> Thank you,
> > >>>>>>> Guojun
> > >>>>>>>
> > >>>>>>>>>>> Department of Plant Biology
> > >>>>>>>>>>>
> > >>>>>>> University of Georgia
> > >>>>>>>
> > >>>>>>>>>>>>> ----- Original Message -----
> > >>>>>>>>>>>>>
> > >>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >>>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>>
> > >>>>>>>> How do you know two versions are installed (i.e. how are you checking
> > >>>>>>>> the version)?  Do you have two complete bioperl distributions (in two
> > >>>>>>>> separate directories) or are you looking in modules?  Here's the way
> > >>>>>>>> to check the version (from the FAQ):
> > >>>>>>>>
> > >>>>>>>> perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> > >>>>>>>>
> > >>>>>>>> If you have two full bioperl distributions on your computer, normally
> > >>>>>>>> only one will be in use unless you have explicitly set the
> > >>>>>>>> environment variable PERL5LIB.  The PERL5LIB directories will be
> > >>>>>>>> searched first before your normal perl directory list (@INC) is
> > >>>>>>>> searched.  You MAY get some mixing then, but only if perl can't find
> > >>>>>>>> a particular module in the path designated in PERL5LIB; then it will
> > >>>>>>>> progress through the directories listed in @INC.  This may happen if
> > >>>>>>>> a module is unique to a particular release, but shouldn't happen for
> > >>>>>>>> the majority of modules, including RemoteBlast.  You can check what
> > >>>>>>>> @INC and PERL5LIB are set to by using 'perl -V'.  @INC will differ
> > >>>>>>>> depending on your OS, perl build, etc.
> > >>>>>>>>
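One way to see which copy of a module a given search order would pick is to walk the directory list by hand. This is only a sketch of the lookup described above: locate_module() is a hypothetical helper written for this example, and it simply returns the first matching file, which is the copy perl would load for that list of directories.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first file in @dirs that would satisfy
# "use $module", or nothing if no directory contains it.
sub locate_module {
    my ($module, @dirs) = @_;
    (my $rel = "$module.pm") =~ s{::}{/}g;   # Bio::Seq -> Bio/Seq.pm
    for my $dir (@dirs) {
        next if ref $dir;                    # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, $rel);
        return $path if -e $path;
    }
    return;
}

# @INC already has any PERL5LIB directories at the front.
my $found = locate_module('Bio::Tools::Run::RemoteBlast', @INC);
print defined $found ? "Would load: $found\n" : "Not found in \@INC\n";
```

If the printed path points into an older distribution than expected, that is a sign the search order is picking up the wrong installation.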
> > >>>>>>>> Regardless, if you follow the directions for installing bioperl for
> > >>>>>>>> your system ('perl Makefile.PL', 'make', 'make test', 'make install',
> > >>>>>>>> unless you explicitly change the installation directory when using
> > >>>>>>>> 'perl Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a
> > >>>>>>>> problem as it will install the Bioperl distribution you downloaded
> > >>>>>>>> over the old version in @INC.  See this page:
> > >>>>>>>>
> > >>>>>>>> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > >>>>>>>>
> > >>>>>>>> for more details.
> > >>>>>>>>> Christopher Fields
> > >>>>>>>>>
> > >>>>>>>> Postdoctoral Researcher - Switzer Lab
> > >>>>>>>> Dept. of Biochemistry
> > >>>>>>>> University of Illinois Urbana-Champaign
> > >>>>>>>>
> > >>>>>>>>>>> -----Original Message-----
> > >>>>>>>>>>>
> > >>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>>>> Sent: Monday, February 13, 2006 12:32 PM
> > >>>>>>>>> To: bioperl-l at lists.open-bio.org
> > >>>>>>>>> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>>>>
> > >>>>>>>>>>> Hi, Chris,
> > >>>>>>>>>>>
> > >>>>>>>>> I do have different versions of bioperl on my Linux machine (1.4 and
> > >>>>>>>>> 1.5.0); this may be the problem. Should I just install bioperl-1.5.1,
> > >>>>>>>>> or do I need to uninstall and remove the previous versions? I could
> > >>>>>>>>> not find any hint on uninstalling bioperl on Linux. Could you please
> > >>>>>>>>> give me some suggestions?
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Guojun
> > >>>>>>>>>
> > >>>>>>>>>>> Department of Plant Biology
> > >>>>>>>>>>>
> > >>>>>>>>> University of Georgia
> > >>>>>>>>>      _____
> > >>>>>>>>>
> > >>>>>>>>>>>  From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>>>
> > >>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >>>>>>>>> Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding
> > >>>>>>>>> RemoteBlast.pm
> > >>>>>>>>>
> > >>>>>>> version
> > >>>>>>>
> > >>>>>>>>> 1.28
> > >>>>>>>>>
> > >>>>>>>>> If you're using RemoteBlast 1.28, then you've likely updated from
> > >>>>>>>>> CVS, which isn't the latest fix.
> > >>>>>>>>>
> > >>>>>>>>> Make sure that you check the following:
> > >>>>>>>>>
> > >>>>>>>>> 1) Always post to the mailing list:
> > >>>>>>>>> http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > >>>>>>>>>
> > >>>>>>>>> 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> > >>>>>>>>> installed first.  Perform a clean installation; do not upgrade only
> > >>>>>>>>> Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> > >>>>>>>>> guarantee that mixing modules from old and new distributions (1.4
> > >>>>>>>>> and 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > >>>>>>>>> installation will allow text output from BLAST v.2.2.12 to be saved
> > >>>>>>>>> and parsed; it will not parse the newest BLAST text output from NCBI
> > >>>>>>>>> (v2.2.13) but it should still save it. I believe as long as
> > >>>>>>>>> next_result() isn't called, it will work.
> > >>>>>>>>>
> > >>>>>>>>> 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text
> > >>>>>>>>> output are NOT in CVS; they haven't been cleared and checked in by
> > >>>>>>>>> Roger Hall (who's now taking care of RemoteBlast) and the powers that
> > >>>>>>>>> be (Jason or whoever is in charge of Bio::SearchIO).  They can be
> > >>>>>>>>> found in Bugzilla:
> > >>>>>>>>>
> > >>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > >>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > >>>>>>>>>
> > >>>>>>>>> The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> > >>>>>>>>> saving XML output, so it isn't necessary if you don't plan on using
> > >>>>>>>>> this option.  And, remember, the fixes haven't been committed to CVS
> > >>>>>>>>> yet, which means that the final version may still change.
> > >>>>
> > >>>>>>>>>>>>> Christopher Fields
> > >>>>>>>>>>>>>
> > >>>>>>>>> Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>> Dept. of Biochemistry
> > >>>>>>>>> University of Illinois Urbana-Champaign
> > >>>>>>>>>
> > >>>>>>>>>>>>>    _____
> > >>>>>>>>>>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > >>>>>>>>>>>>>
> > >>>>>>>>> Sent: Monday, February 13, 2006 9:26 AM
> > >>>>>>>>> To: Chris Fields
> > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding
> > >>>>>>>>> RemoteBlast.pm
> > >>>>>>>>>
> > >>>>>>> version
> > >>>>>>>
> > >>>>>>>>> 1.28
> > >>>>>>>>>
> > >>>>>>>>> Hi, Chris
> > >>>>>>>>>
> > >>>>>>>>> Thanks for your suggestion; however, it doesn't seem to work for my
> > >>>>>>>>> cgi even after I replace both blast.pm and RemoteBlast.pm. I didn't
> > >>>>>>>>> even get any RID. Is there any suggestion?
> > >>>>>>>>>
> > >>>>>>>>>>>>>>> Guojun
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>> Guojun Yang
> > >>>>>>>>>>>>>
> > >>>>>>>>> Department of Plant Biology
> > >>>>>>>>> University of Georgia
> > >>>>>>>>> Tel: 706-542-1857
> > >>>>>>>>> Fax: 706-542-1805
> > >>>>>>>>> http://www.arches.uga.edu/~guojun
> > >>>>>>>>>    _____
> > >>>>>>>>>
> > >>>>>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>>>>>
> > >>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > >>>>>>>>> Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding
> > >>>>>>>>> RemoteBlast.pm
> > >>>>>>>>>
> > >>>>>>> version
> > >>>>>>>
> > >>>>>>>>> 1.28
> > >>>>>>>>>
> > >>>>>>>>> I would say give the new code a try, but realize that it hasn't been
> > >>>>>>>>> checked in (like I said below). I will try going over the modified
> > >>>>>>>>> Bio::SearchIO::blast again this weekend to see if there is anything I
> > >>>>>>>>> might have missed. The changed order in the header of BLAST text
> > >>>>>>>>> output has me a bit worried that it might not catch everything, but
> > >>>>>>>>> it at least doesn't hang in the while() loop I described in the bug
> > >>>>>>>>> report below (bug #1934) and seems to process everything fine.
> > >>>>>>>>>
> > >>>>>>>>> If you want more stability in the code, you might consider changing
> > >>>>>>>>> over to XML output and parsing with Bio::SearchIO::blastxml. There
> > >>>>>>>>> are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that
> > >>>>>>>>> accommodate saving XML output, but I believe it parses everything
> > >>>>>>>>> regardless. If you look back over the last month or so there has been
> > >>>>>>>>> a bit of discussion here about it. Jason describes a bit on how to
> > >>>>>>>>> set up RemoteBlast for XML:
> > >>>>>>>>>
> > >>>>>>>>> http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> > >>>>>>>>>
> > >>>>>>>
> > >>>>>>>>>>> Christopher Fields
> > >>>>>>>>>>>
> > >>>>>>>>> Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>> Dept. of Biochemistry
> > >>>>>>>>> University of Illinois Urbana-Champaign
> > >>>>>>>>>
> > >>>>>>>>>>>> -----Original Message-----
> > >>>>>>>>>>>>
> > >>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>>>>> Sent: Friday, February 03, 2006 1:45 PM
> > >>>>>>>>>> To: bioperl-l at bioperl.org
> > >>>>>>>>>> Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> > >>>>>>>>>>
> > >>>>> version
> > >>>>>
> > >>>>>>> 1.28
> > >>>>>>>
> > >>>>>>>>>> Hi, Everybody,
> > >>>>>>>>>> I see this post and am wondering if this is the reason for the
> > >>>>>>>>>> malfunctioning of my webserver. We set up a webserver named MAK for
> > >>>>>>>>>> MITE sequence analysis. It was working very well until around
> > >>>>>>>>>> November 2005, when it stopped returning any result (the site is
> > >>>>>>>>>> fine and seems to be doing something after submission). In the CGI
> > >>>>>>>>>> script, I used remoteblast (that work was done in 2003) to do
> > >>>>>>>>>> searches. I currently do not have access to the server because I
> > >>>>>>>>>> moved. Several people have sent emails to us about its
> > >>>>>>>>>> malfunctioning. Is there any suggestion on fixing the problem?
> > >>>>>>>>>> Should I simply ask that RemoteBlast.pm be replaced with the new
> > >>>>>>>>>> version?
> > >>>>>
> > >>>>>>>>>> Thanks a lot,
> > >>>>>>>>>> Guojun
> > >>>>>>>>>>
> > >>>>>>>>>> Department of Plant Biology
> > >>>>>>>>>> University of Georgia
> > >>>>>>>>>> Tel: 706-542-1857
> > >>>>>>>>>> Fax: 706-542-1805
> > >>>>>>>>>> http://www.arches.uga.edu/~guojun
> > >>>>>>>>>> _____
> > >>>>>>>>>>
> > >>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
> > >>>>>>>>>>
> > >>>>> Jian'
> > >>>>>
> > >>>>>>>>>> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > >>>>>>>>>>
> > >>>> [mailto:bioperl-
> > >>>>
> > >>>>>>>>>> l at bioperl.org]
> > >>>>>>>>>> Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > >>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >>>>>>>>>>
> > >>>>>>>>>> Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS.
> > >>>>>>>>>> It will work for saving text output. However, it will not parse
> > >>>>>>>>>> anything using next_result (it will likely hang) and will not save
> > >>>>>>>>>> XML format. See these bugs:
> > >>>>>>>>>>
> > >>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > >>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > >>>>>>>>>>
> > >>>>>>>>>> for explanations and possible fixes (changes to RemoteBlast and
> > >>>>>>>>>> Bio::SearchIO::blast). Note that these haven't been checked in yet
> > >>>>>>>>>> so are still not included in bioperl-live; they may be further
> > >>>>>>>>>> modified before committing to CVS. If you're not worried about XML,
> > >>>>>>>>>> you could just try the first fix, which is a change to
> > >>>>>>>>>> SearchIO::blast.
> > >>>>>>>>>>
> > >>>>>>>>>> Nagesh, I remember you posting to the list a month ago using a
> > >>>>>>>>>> script which had problems; the script you used saves the output but
> > >>>>>>>>>> doesn't actually parse it (i.e. you don't use next_result() to go
> > >>>>>>>>>> through the data). Is the version of BLAST in your text output
> > >>>>>>>>>> 2.2.12 or 2.2.13? Have you tried parsing the output using
> > >>>>>>>>>> "-readmethod => SearchIO" or "-readmethod => blast" using your
> > >>>>>>>>>> version of RemoteBlast and method next_result()? Like below (from
> > >>>>>>>>>> perldoc):
> > >>>>>>>>>>
> > >>>>>>>>>>  while ( my @rids = $factory->each_rid ) {
> > >>>>>>>>>>    foreach my $rid ( @rids ) {
> > >>>>>>>>>>      my $rc = $factory->retrieve_blast($rid);
> > >>>>>>>>>>      if( !ref($rc) ) {
> > >>>>>>>>>>        if( $rc < 0 ) {
> > >>>>>>>>>>          $factory->remove_rid($rid);
> > >>>>>>>>>>        }
> > >>>>>>>>>>        print STDERR "." if ( $v > 0 );
> > >>>>>>>>>>        sleep 5;
> > >>>>>>>>>>      } else { # parsing starts here
> > >>>>>>>>>>        my $result = $rc->next_result(); # it should hang here
> > >>>>>>>>>>        #save the output
> > >>>>>>>>>>        my $filename = $result->query_name()."\.out";
> > >>>>>>>>>>        $factory->save_output($filename);
> > >>>>>>>>>>        $factory->remove_rid($rid);
> > >>>>>>>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>>>>>>        while ( my $hit = $result->next_hit ) {
> > >>>>>>>>>>          next unless ( $v > 0);
> > >>>>>>>>>>          print "\thit name is ", $hit->name, "\n";
> > >>>>>>>>>>          while( my $hsp = $hit->next_hsp ) {
> > >>>>>>>>>>            print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>>>>>>          }
> > >>>>>>>>>>        }
> > >>>>>>>>>>      }
> > >>>>>>>>>>    }
> > >>>>>>>>>>  }
> > >>>>>>>>>> }
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> My script hung if I used next_result() in any way prior to the
> > >>>>>>>>>> fixes. I want to see how many others are having the same issues
> > >>>>>>>>>> with parsing using the CVS version of bioperl-live.
> > >>>>>>>>>>
> > >>>>>>>>>> Christopher Fields
> > >>>>>>>>>> Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>>> Dept. of Biochemistry
> > >>>>>>>>>> University of Illinois Urbana-Champaign
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> -----Original Message-----
> > >>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > >>>>>>>>>>> Sent: Thursday, February 02, 2006 7:24 PM
> > >>>>>>>>>>> To: Huang Jian; bioperl-l
> > >>>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >>>>>>>>>>>
> > >>>>>>>>>>> Hi Huang,
> > >>>>>>>>>>> Thanks for the message. The older version of RemoteBlast.pm
> > >>>>>>>>>>> works by checking the temporary file size to determine whether
> > >>>>>>>>>>> the Blast results are ready. This condition is no longer being
> > >>>>>>>>>>> satisfied, maybe due to some changes brought about by NCBI. I had
> > >>>>>>>>>>> this problem recently and figured out that the solution was to
> > >>>>>>>>>>> use the latest version, which has this problem fixed (it does not
> > >>>>>>>>>>> use the file size logic any more) but is not yet included in the
> > >>>>>>>>>>> BioPerl package.
> > >>>>>>>>>>> Cheers
> > >>>>>>>>>>> Nagesh
> > >>>>>>>>>>>
> > >>>>>>>>>>> Huang Jian wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Dear Nagesh,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28
> > >>>>>>>>>>>> you send me. Now it works perfectly!!!
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thank you!!
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Huang
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> ----- Original Message ----- From: "Nagesh Chakka"
> > >>>>>>>>>>>> To: "Huang Jian" ; "bioperl-l"
> > >>>>>>>>>>>> Sent: Friday, February 03, 2006 7:48 AM
> > >>>>>>>>>>>> Subject: Re: [Bioperl-l] Sorry, failure in post on the net,
> > >>>>>>>>>>>> so still via email
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi Huang,
> > >>>>>>>>>>>>> I see that you are submitting a sequence for a remote blast
> > >>>>>>>>>>>>> search. Can you check if the RemoteBlast.pm being used is
> > >>>>>>>>>>>>> v 1.28 (2005/12/09). If not I have attached it with this
> > >>>>>>>>>>>>> email, try to replace it with the old one which has a bug.
> > >>>>>>>>>>>>> Let me know if it works.
> > >>>>>>>>>>>>> Nagesh
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>> _______________________________________________
> > >>>>>>>>>>> Bioperl-l mailing list
> > >>>>>>>>>>> Bioperl-l at lists.open-bio.org
> > >>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>>>>>>>>>>
> > >>
> > >>
> > >
> > > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
> > >
> > > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >



From Marc.Logghe at DEVGEN.com  Thu Feb 16 10:47:13 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Thu, 16 Feb 2006 16:47:13 +0100
Subject: [Bioperl-l] Primer maps?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com>

Hi Mike,
Another route you might take is to map your primers onto
Bio::SeqFeature::Generic objects and add them to the seq object. Then
you dump the object in a rich sequence format like GenBank and pass
that to EMBOSS's showseq application.
Or you might do it completely with showseq. Here the only thing you need
is an annotation file containing the positions of the primers, each
followed by any text (e.g. the primer name).
Then you do:
showseq   -translate - -format 4
-annotation 
Have a look at http://emboss.sourceforge.net/apps/showseq.html for more
options.
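
The annotation file mentioned above could be produced along these lines. This is only a sketch in core Perl, not something taken from the thread: it assumes exact (non-degenerate) primer matches, and it assumes that one "start-end text" entry per line is an acceptable layout for the showseq annotation input, which should be checked against the showseq documentation. The sub names and the FKP-5 primer sequence are made up for illustration; only the template sequence comes from Mike's example map.

```perl
use strict;
use warnings;

# Reverse-complement a plain A/C/G/T string.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Return [start, end, strand] triples (1-based, inclusive, top-strand
# coordinates) for every exact occurrence of $primer on either strand.
sub find_primer {
    my ($template, $primer) = @_;
    my $target = uc $template;
    my @hits;
    for my $strand (1, -1) {
        my $query = uc($strand == 1 ? $primer : revcomp($primer));
        my $from  = 0;
        while ((my $at = index($target, $query, $from)) >= 0) {
            push @hits, [ $at + 1, $at + length($query), $strand ];
            $from = $at + 1;
        }
    }
    return sort { $a->[0] <=> $b->[0] } @hits;
}

# Turn a hash ref of name => primer sequence into annotation lines.
# The "start-end text" layout is an assumption, not a documented format.
sub annotation_lines {
    my ($template, $primers) = @_;
    my @lines;
    for my $name (sort keys %$primers) {
        for my $hit (find_primer($template, $primers->{$name})) {
            my ($start, $end, $strand) = @$hit;
            my $label = $strand == 1 ? $name . '->' : '<-' . $name;
            push @lines, sprintf('%d-%d %s', $start, $end, $label);
        }
    }
    return @lines;
}

# Template taken from the example map; the primer sequence is hypothetical.
my $template = 'CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT';
my %primers  = ( 'FKP-5' => 'GGTGCTATGGAAATAG' );
print "$_\n" for annotation_lines($template, \%primers);
```

The same hit list could instead be turned into Bio::SeqFeature::Generic objects for the first route Marc describes; that part is left out here so the sketch runs without BioPerl installed.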
 
HTH,
Marc
 

Marc Logghe, PhD
Expert Scientist Bioinformatics
deVGen NV
Technologiepark 30
B - 9052 Ghent-Zwijnaarde
Tel. +32 9 324 24 83
Fax. +32 9 324 24 25
Web: www.devgen.com

 --- Disclaimer start ---
This e-mail and any attachments thereto may contain information which is
confidential and/or which is proprietary to the sender. Accordingly,
this e-mail and any attachments thereto, as well as any and all
information contained therein, are intended for the sole use of the
recipient or recipients designated above. Any use of this e-mail, of any
attachments thereto, of any and all information contained therein,
and/or of any part(s) thereof (including, without limitation, total or
partial reproduction, communication and/or distribution in any form) by
persons other than the designated recipient(s) is prohibited. If you
have received this e-mail in error, please notify the sender either by
telephone or by e-mail and delete the material from any computer.
Thank you for your cooperation.
--- Disclaimer end ---
  

 


________________________________

	From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Michael Coyne
	Sent: Wednesday, February 15, 2006 10:20 PM
	To: bioperl-l at lists.open-bio.org
	Subject: [Bioperl-l] Primer maps?
	
	
	Hello all --
	
	I'm having a devil of a time figuring out how to make
restriction maps using BioPerl.  What I'm going for is output similar to
GCG's map program, but instead of using a set of defined restriction
enzymes, I'd like to use a set of primers, to create a primer map rather
than a restriction map.  I do not need a table of restriction enzymes
that cut or don't cut (or primers that match or don't match, in this
case), but an honest-to-goodness map, something like:
	
	                                       FKP-5->
	                                             |
	     CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT
	1921 ---------+---------+---------+---------+---------+---------+ 1980
	     GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA
	 
	a                         M  E  I  V  S  T  F  D  E  L  Q  D  Y   -
	
	I also need translations of orfs, but I can use GenBank files as
input to the program and thus the CDS translations are already there, so
I'm guessing that shouldn't be too hard....  How does one create such a
map using the BioPerl modules?
	
	There are intriguing indications out there that such a thing is
possible (e.g. the Bio::Map:: * and Bio::Restriction:: * modules), but I
can't find a single example of code that creates such a basic,
bread-and-butter thing as a restriction map with orf translations.  The
documentation to these modules is fairly useless to me, consisting
mostly of internal methods and function prototypes.  Perhaps my skills
as a Perl programmer are to blame, but a clear example of how a map like
this is constructed would be a big help.
	
	Right now, I'm generating primer maps with system calls to
EMBOSS's remap, pointing it at a file of primer sequences rather than a
file of restriction enzyme sequences, but the results are less than
desired.  I'm considering trying to adapt tacg 4.1.0 or sequence
extractor 1.1 web-based code to my needs, but this seems like a lot of
work for an operation I suspect is possible in BioPerl.
	
	Any help greatly appreciated...
	
	Mike
	

	
	---------------------------------------------------------------------
	 //=\   Michael J. Coyne                       phone: (617) 525-7820
	 \=//   Channing Laboratory                    FAX:   (617) 264-5193
	  //=\  EBRC, Room 617
	  \=//  221 Longwood Avenue       email:mcoyne at channing.harvard.edu
	   //=\ Boston, MA 02115                 mjcoyne at comcast.net
	   \=//
	---------------------------------------------------------------------
	




From sdavis2 at mail.nih.gov  Thu Feb 16 09:43:45 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 16 Feb 2006 09:43:45 -0500
Subject: [Bioperl-l] Primer maps?
In-Reply-To: <6.2.0.14.0.20060215155422.01d44a98@localhost>
Message-ID: 

Do you mean that you want to use Bio::Graphics to make a picture, or just
map your primers onto a sequence?

Sean



On 2/15/06 4:20 PM, "Michael Coyne"  wrote:

> Hello all --
> 
> I'm having a devil of a time figuring out how to make restriction maps using
> BioPerl.  What I'm going for is output similar to GCG's map program, but
> instead of using a set of defined restriction enzymes, I'd like to use a set
> of primers, to create a primer map rather than a restriction map.  I do not
> need a table of restriction enzymes that cut or don't cut (or primers that
> match or don't match, in this case), but an honest-to-goodness map, something
> like:
> 
>                                       FKP-5->
>                                             |
>     CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT
> 1921 ---------+---------+---------+---------+---------+---------+ 1980
>     GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA
>  
> a                        M  E  I  V  S  T  F  D  E L  Q  D  Y   -
> 
> I also need translations of orfs, but I can use GenBank files as input to the
> program and thus the CDS translations are already there, so I'm guessing that
> shouldn't be too hard....  How does one create such a map using the BioPerl
> modules?
> 
> There are intriguing indications out there that such a thing is possible (e.g.
> the Bio::Map:: * and Bio::Restriction:: * modules), but I can't find a single
> example of code that creates such a basic, bread-and-butter thing as a
> restriction map with orf translations.  The documentation to these modules is
> fairly useless to me, consisting mostly of internal methods and function
> prototypes.  Perhaps my skills as a Perl programmer are to blame, but a clear
> example of how a map like this is constructed would be a big help.
> 
> Right now, I'm generating primer maps with system calls to EMBOSS's remap,
> pointing it at a file of primer sequences rather than a file of restriction
> enzyme sequences, but the results are less than desired.  I'm considering
> trying to adapt tacg 4.1.0 or sequence extractor 1.1 web-based code to my
> needs, but this seems like a lot of work for an operation I suspect is
> possible in BioPerl.
> 
> Any help greatly appreciated...
> 
> Mike
> 
> ---------------------------------------------------------------------
>  //=\   Michael J. Coyne                      phone: (617) 525-7820
>  \=//   Channing Laboratory                   FAX:   (617) 264-5193
>   //=\  EBRC, Room 617
>   \=//  221 Longwood Avenue       email:mcoyne at channing.harvard.edu
>    //=\ Boston, MA 02115                mjcoyne at comcast.net
>    \=// 
> ---------------------------------------------------------------------




From osborne1 at optonline.net  Thu Feb 16 11:27:13 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 11:27:13 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
Message-ID: 

Harry,

I've long suspected, but never demonstrated, that the easiest way to do
something like this is through ENSEMBL, and Jason hinted at this as well. In
fact your question is something of a FAQ, and my previous responses always
included a plea to some anonymous ENSEMBL API expert, always unheeded. At
any rate, here is an example script I made:

#!/usr/bin/perl

use strict;
use warnings;
use lib "/Users/bosborne/ensembl/modules";
use DBI;
use Getopt::Long;
use Bio::EnsEMBL::DBSQL::DBAdaptor;

# Gene name or external identifier to look up, passed as -n <name>
my $name;
GetOptions( "n=s" => \$name );
die "Usage: $0 -n <name>\n" unless defined $name;

# Connect anonymously to the public ENSEMBL core database
my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
    -user   => "anonymous",
    -dbname => "homo_sapiens_core_37_35j",
    -host   => "ensembldb.ensembl.org",
    -pass   => "",
    -driver => 'mysql'
);

my $gene_adaptor  = $db->get_GeneAdaptor;
my $slice_adaptor = $db->get_SliceAdaptor;

my @genes = @{$gene_adaptor->fetch_all_by_external_name($name)};

# Print the genomic sequence spanned by each transcript of each gene
for my $gene (@genes) {
  for my $trans (@{$gene->get_all_Transcripts}) {
      my $seq = $slice_adaptor->fetch_by_region("chromosome",
             $trans->seq_region_name,
             $trans->start,
             $trans->end);

      print "\n",$seq->seq,"\n";
  }
}

There are some issues, the largest of which is that, though this script
prints out big sequences, it's completely untested! Another is that it makes
assumptions about transcripts; you should verify for yourself that ENSEMBL's
definition of a transcript fits yours. Finally, the
fetch_all_by_external_name() method does not seem to accept a second
argument, i.e. a namespace. I found this surprising. Anyway, if more than one
gene is retrieved using some name or id you're in a quandary.
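
Harry's original question also asked for the locus "plus & minus some offset", which the script above does not handle. A minimal sketch of that coordinate arithmetic in core Perl follows. The sub name flanked_region and the example coordinates are hypothetical; the resulting pair is meant to be passed in place of $trans->start and $trans->end in the fetch_by_region() call already shown, which is as untested as the rest of the script.

```perl
use strict;
use warnings;

# Widen a 1-based, inclusive region by $offset bases on each side.
# The result is clamped to position 1 and, when the length of the
# chromosome or contig is supplied, to that length as well.
sub flanked_region {
    my ($start, $end, $offset, $seq_length) = @_;
    ($start, $end) = ($end, $start) if $start > $end;   # tolerate swapped input
    my $from = $start - $offset;
    $from = 1 if $from < 1;
    my $to = $end + $offset;
    $to = $seq_length if defined $seq_length && $to > $seq_length;
    return ($from, $to);
}

# Hypothetical locus at 150000..152500 with 2 kb of flank on each side.
my ($from, $to) = flanked_region(150_000, 152_500, 2_000);
print "fetch region $from..$to\n";
```

Clamping is done here rather than left to the database layer so that a locus near the start of a sequence never produces a zero or negative coordinate.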

For more on this API see:

http://www.ensembl.org/info/software/core/core_tutorial.html

There are tons of modules and methods in this API, I've barely scratched the
surface here.


Brian O.




On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:

> Hi Brian,
> 
> Thanks very much for the pointers and the speed of your reply and apologies
> for the speed of mine.
> 
> This looks good, but what I was looking for was a bioP approach for hooking to
> an API at NCBI or EBI so I could get this info and seqs from them.  In this
> case, speed of retrieval is not critical and I'd rather not download the
> entirety of the sequences to a local disk to hack at them.
> 
> I've determined a screen-scraping approach to get them and could script that,
> but I thought that bioP had a method for using NCBI's external API's, tho it
> may be that my memory is faulty or the approach is no longer supported due to
> overload.  
> 
> Does NCBI make such APIs available anymore?  I searched a bit for docs on them
> but couldn't find anything (unless it's buried in the NCBI toolkit, which I
> haven't started to excavate).
> 
> Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
> 
> Harry
> 
> 
> 
> 
> 
> 
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>> 
>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
>> from its documentation:
>> 
>>   use Bio::DB::Fasta;
>> 
>>   # create database from directory of fasta files
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   # simple access (for those without Bioperl)
>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>   my @ids     = $db->ids;
>>   my $length   = $db->length('CHROMOSOME_I');
>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>   my $header   = $db->header('CHROMOSOME_I');
>> 
>>   # Bioperl-style access
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>   my $seq     = $obj->seq;
>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>> 
>> Do you already have the offsets?
>> 
>> Brian O.
>> 
>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>> Hi All,
>>> 
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this.  Forgive me if I've missed something
>>> obvious.
>>> 
>>> This should not be a novel request, but I've not found it answered.  If
>>> bioperl isn't the best way to do this, I'd be grateful to a pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>> 
>>> The problem is to retrieve genomic sequences plus & minus some offset
>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>> 
>>> 
>>> TIA!




From heikki at sanbi.ac.za  Thu Feb 16 12:32:51 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 16 Feb 2006 19:32:51 +0200
Subject: [Bioperl-l] Primer maps?
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com>
Message-ID: <200602161932.51552.heikki@sanbi.ac.za>

Mike,

Marc's suggestion is the best I've heard.

We really do not have any kind of pretty print functionality within BioPerl.
I guess there has not been a pressing need.  Bio::Graphics has filled in the 
need for sequence display.

I think Bio::Seq::PrettyPrint could be a great way to design pretty-printing
in a very modular way, so that it can print out anything mapped to a sequence
location. The EMBOSS showseq would be a great help there. A student
project?
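
To make the idea concrete, here is a rough sketch in core Perl of what printing one block of such a map might involve. It is not an existing BioPerl module or API: the sub names, the input shape (a hash of coordinate => label) and the position chosen for the FKP-5 label are all assumptions made for illustration. The sequence and the 1921..1980 coordinates are taken from Mike's example.

```perl
use strict;
use warnings;

# Complement (without reversing) so the result can sit under the top strand.
sub complement {
    my ($dna) = @_;
    (my $comp = $dna) =~ tr/ACGTacgt/TGCAtgca/;
    return $comp;
}

# Build the text lines for one block of a map.
#   $seq       - top-strand sequence of the block
#   $first_pos - coordinate of the first base in the block
#   $labels    - hash ref of coordinate => label text (hypothetical shape)
sub map_block {
    my ($seq, $first_pos, $labels) = @_;
    my $last_pos = $first_pos + length($seq) - 1;
    my $margin   = length($last_pos) + 1;     # room for the left coordinate
    my $pad      = ' ' x $margin;

    my @lines;
    for my $pos (sort { $a <=> $b } keys %$labels) {
        next if $pos < $first_pos || $pos > $last_pos;
        my $indent = $pad . ' ' x ($pos - $first_pos);
        push @lines, $indent . $labels->{$pos}, $indent . '|';
    }

    # Ruler with a '+' at every tenth base of the block.
    my $ruler = join '', map { ($_ % 10) ? '-' : '+' } 1 .. length($seq);

    push @lines,
        $pad . $seq,
        sprintf('%-*d%s %d', $margin, $first_pos, $ruler, $last_pos),
        $pad . complement($seq);
    return @lines;
}

my $block = 'CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT';
print "$_\n" for map_block($block, 1921, { 1936 => 'FKP-5' });
```

A modular design would presumably take features from the sequence object rather than a bare hash, and would add translation lines; both are left out here to keep the layout logic visible.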

Would anyone be interested? 

   -Heikki




On Thursday 16 February 2006 17:47, Marc Logghe wrote:
> Hi Mike,
> Another route you might take is mapping your primers into
> Bio::SeqFeature::Generic objects and add them to the seq object. Then
> you dump the object into a rich sequence format like genbank and pass
> that to EMBOSS's showseq application
> Or you might do it completely with showseq. Here the only thing you need
> is an annotation file containing the positions of the primers, followed
> by any text (e.g. primer name).
> Then you do:
> showseq   -translate - -format 4
> -annotation 
> Have a look at http://emboss.sourceforge.net/apps/showseq.html for more
> options
>
> HTH,
> Marc
>
>
> Marc Logghe, PhD
> Expert Scientist Bioinformatics
> deVGen NV
> Technologiepark 30
> B - 9052 Ghent-Zwijnaarde
> Tel. +32 9 324 24 83
> Fax. +32 9 324 24 25
> Web: www.devgen.com
>
>
>
>
>
>
> ________________________________
>
> 	From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Michael Coyne
> 	Sent: Wednesday, February 15, 2006 10:20 PM
> 	To: bioperl-l at lists.open-bio.org
> 	Subject: [Bioperl-l] Primer maps?
>
>
> 	Hello all --
>
> 	I'm having a devil of a time figuring out how to make
> restriction maps using BioPerl.  What I'm going for is output similar to
> GCG's map program, but instead of using a set of defined restriction
> enzymes, I'd like to use a set of primers, to create a primer map rather
> than a restriction map.  I do not need a table of restriction enzymes
> that cut or don't cut (or primers that match or don't match, in this
> case), but an honest-to-goodness map, something like:
>
> 	                                       FKP-5->
>
>
> 	     CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT
> 	1921 ---------+---------+---------+---------+---------+---------+ 1980
> 	     GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA
>
> 	a                         M  E  I  V  S  T  F  D  E  L  Q  D  Y   -
>
> 	I also need translations of orfs, but I can use GenBank files as
> input to the program and thus the CDS translations are already there, so
> I'm guessing that shouldn't be too hard....  How does one create such a
> map using the BioPerl modules?
>
> 	There are intriguing indications out there that such a thing is
> possible (e.g. the Bio::Map:: * and Bio::Restriction:: * modules), but I
> can't find a single example of code that creates such a basic,
> bread-and-butter thing as a restriction map with orf translations.  The
> documentation to these modules is fairly useless to me, consisting
> mostly of internal methods and function prototypes.  Perhaps my skills
> as a Perl programmer are to blame, but a clear example of how a map like
> this is constructed would be a big help.
>
> 	Right now, I'm generating primer maps with system calls to
> EMBOSS's remap, pointing it at a file of primer sequences rather than a
> file of restriction enzyme sequences, but the results are less than
> desired.  I'm considering trying to adapt tacg 4.1.0 or sequence
> extractor 1.1 web-based code to my needs, but this seems like a lot of
> work for an operation I suspect is possible in BioPerl.
>
> 	Any help greatly appreciated...
>
> 	Mike
>
>
>
> 	---------------------------------------------------------------------
> 	 //=\   Michael J. Coyne                       phone: (617) 525-7820
> 	 \=//   Channing Laboratory                    FAX:   (617) 264-5193
> 	  //=\  EBRC, Room 617
> 	  \=//  221 Longwood Avenue       email:mcoyne at channing.harvard.edu
> 	   //=\ Boston, MA 02115                 mjcoyne at comcast.net
> 	   \=//
> 	---------------------------------------------------------------------
>
>
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From osborne1 at optonline.net  Thu Feb 16 12:59:37 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 12:59:37 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602160823.03534.hjm@tacgi.com>
Message-ID: 

Chris and Harry,

I'm writing a Wiki page on this, it's linked to the FAQ as Wiki is
complaining that the FAQ is getting too big. I'll fill in the ENSEMBL API
and Bio::DB::Fasta approaches, if you would comment on the BioPerl/eutils
approach at some point that would be superb:

http://bioperl.open-bio.org/wiki/Getting_Genomic_Sequences

Brian O.


On 2/16/06 11:23 AM, "Harry Mangalam"  wrote:

> Yes, I'm going to  try this 1st.  Also the pointer to the NCBI eutils page was
> helpful.  They describe the same thing and I think that API will give me what
> I need.  I'll post back to report.
> 
> Sorry for the delay in answering - this is a side project and as such is going
> slow.
> 
> Many thanks to you guys, especially Brian for the example code - much more
> than I had a right to expect.  Virtual Beers all round and real ones should
> we ever meet up.
> 
> Harry
> 
> 
> On Thursday 16 February 2006 04:52, Chris Fields wrote:
>> I think a method was recently implemented in Bio::DB::GenBank to
>> retrieve a segment of DNA given start and end coordinates in GenBank
>> format; that should contain the features you need.  I requested it
>> ~Nov-Dec in the mailing list but didn't get a chance to test it.
>> Would that help?
>> 
>> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
>>> Harry,
>>> 
>>> It's not clear to me that NCBI's eutils offers this capability
>>> directly. You can probably download Entrez Gene entries and parse them
>>> for coordinates but I know of no way to remotely retrieve genomic
>>> sequences like this from NCBI (ENSEMBL API perhaps?). What I had in
>>> mind uses the local approach that some of us favor and to prove to
>>> myself that this is simple to do I wrote a script that I just added to
>>> examples/tools, it's called extract_genes.pl and it's based on
>>> Bio::DB::Fasta. Download the sequence files for a given species to some
>>> dir, download Entrez Gene's gene2accession file, and run. It creates
>>> and stores a hash for lookups, it won't read gene2accession each time
>>> it runs.
>>> 
>>> Brian O.
>>> 
>>> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
>>>> Hi Brian,
>>>> 
>>>> Thanks very much for the pointers and the speed of your reply and
>>>> apologies for the speed of mine.
>>>> 
>>>> This looks good, but what I was looking for was a bioP approach for
>>>> hooking to an API at NCBI or EBI so I could get this info and seqs
>>>> from them.  In this case, speed of retrieval is not critical and I'd
>>>> rather not download the entirety of the sequences to a local disk to
>>>> hack at them.
>>>> 
>>>> I've determined a screen-scraping approach to get them and could
>>>> script that, but I thought that bioP had a method for using NCBI's
>>>> external API's, tho it may be that my memory is faulty or the approach
>>>> is no longer supported due to overload.
>>>> 
>>>> Does NCBI make such APIs available anymore?  I searched a bit for docs
>>>> on them but couldn't find anything (unless it's buried in the NCBI
>>>> toolkit, which I haven't started to excavate).
>>>> 
>>>> Failing that, would SEALS provide such a service? Any PerlPinipeds
>>>> listening?
>>>> 
>>>> Harry
>>>> 
>>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>>>> Harry,
>>>>> 
>>>>> Hope you're doing well. The approach could be based on
>>>>> Bio::DB::Fasta. So, from its documentation:
>>>>> 
>>>>>   use Bio::DB::Fasta;
>>>>> 
>>>>>   # create database from directory of fasta files
>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>> 
>>>>>   # simple access (for those without Bioperl)
>>>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>>>   my @ids     = $db->ids;
>>>>>   my $length   = $db->length('CHROMOSOME_I');
>>>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>>>   my $header   = $db->header('CHROMOSOME_I');
>>>>> 
>>>>>   # Bioperl-style access
>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>> 
>>>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>>>>   my $seq     = $obj->seq;
>>>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>>>> 
>>>>> Do you already have the offsets?
>>>>> 
>>>>> Brian O.
>>>>> 
>>>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>>>>> Hi All,
>>>>>> 
>>>>>> After perusing the tutorial and other docs for an evening, I still
>>>>>> can't find the answer to this.  Forgive me if I've missed something
>>>>>> obvious.
>>>>>> 
>>>>>> This should not be a novel request, but I've not found it answered.
>>>>>> If bioperl isn't the best way to do this, I'd be grateful for a
>>>>>> pointer to a better way, especially if it includes an illuminating
>>>>>> bit of code.
>>>>>> 
>>>>>> The problem is to retrieve genomic sequences plus & minus some offset
>>>>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>>>>> common followup chore for some extra analysis from a gene expression
>>>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>>>>> sequence type to specify...?
>>>>>> 
>>>>>> 
>>>>>> TIA!
>>> 
>> 
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign




From hjm at tacgi.com  Thu Feb 16 12:02:07 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Thu, 16 Feb 2006 09:02:07 -0800
Subject: [Bioperl-l] Primer maps?
In-Reply-To: <6.2.0.14.0.20060215155422.01d44a98@localhost>
References: <6.2.0.14.0.20060215155422.01d44a98@localhost>
Message-ID: <200602160902.07383.hjm@tacgi.com>

A bit off the bioperl topic - if you must have bioperl, ignore this (or just
system() wrap the command) - but you can do exactly this mapping and in-line
translation with a thing I wrote called tacg. You make a GCG-formatted file
of primers, i.e. for each pattern you need a line like:

   
;         Top                         Bottom
;Name    Offset Recognition Pattern   Offset    ! comments
primer1    0   tcgggywmkkgg               0    ! ...
primer2    0   gcttggctgaggag             0    !
 .
 .
 .
Obviously the offsets can be set to 0 for non-REs.
There's no limit to the number of primer patterns (tho I think there's a
compiled-in limit of 30 chars in the pattern - easily changed in the header)
and no limit to the amount of seq searched. It handles degeneracies and
searches at ~4 Mbases/s on a 2G Opteron (120 patterns).
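
For anyone generating such a pattern file from a script, a small sketch in core Perl follows. It only mimics the column layout shown above (name, top offset, pattern, bottom offset, comment marker); the exact spacing tacg accepts, the sub name and the 30-character default are assumptions based on this message, not on the tacg documentation.

```perl
use strict;
use warnings;

# Build GCG-style pattern lines from a hash ref of name => primer sequence.
# Both offsets are written as 0, as suggested for non-restriction patterns.
sub tacg_pattern_lines {
    my ($primers, $max_len) = @_;
    $max_len = 30 unless defined $max_len;   # limit mentioned above; may differ
    my @lines = (
        ';         Top                         Bottom',
        ';Name    Offset Recognition Pattern   Offset    ! comments',
    );
    for my $name (sort keys %$primers) {
        my $pattern = lc $primers->{$name};
        die "$name: not an IUPAC nucleotide pattern\n"
            unless $pattern =~ /^[acgtrykmswbdhvn]+$/;
        die "$name: pattern longer than $max_len characters\n"
            if length($pattern) > $max_len;
        push @lines, sprintf('%-10s 0   %-26s 0    !', $name, $pattern);
    }
    return @lines;
}

# The two example patterns from the message above.
my %primers = (
    primer1 => 'tcgggywmkkgg',
    primer2 => 'gcttggctgaggag',
);
print "$_\n" for tacg_pattern_lines(\%primers);
```

Rejecting over-long or non-IUPAC patterns up front is a design choice of this sketch, so that a bad primer is reported by name instead of surfacing as a parse problem later.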
 
It also does searching with errors (slowly), with regexes (at PCRE speeds),
and with matrices.  Other neat stuff, too.

The output is sort of as you describe - replace the RE names with your primer 
labels and you'll have it.

6-frame translation with 3-letter abbreviations:

                  BsrGI    BsrGI AflII                      DraI
                   \        \     \                          \
    121   gtgtgtatttgtacactttgtacacttaagacctacacatttcattgtgtttaaattatt    180
   3453   cacacataaacatgtgaaacatgtgaattctggatgtgtaaagtaacacaaatttaataa   3512
              ^    *    ^    *    ^    *    ^    *    ^    *    ^    *
1         ValCysIleCysThrLeuCysThrLeuLysThrTyrThrPheHisCysValTerIleIle
2          CysValPheValHisPheValHisLeuArgProThrHisPheIleValPheLysLeuLeu
3           ValTyrLeuTyrThrLeuTyrThrTerAspLeuHisIleSerLeuCysLeuAsnTyrTyr

4           HisIleGlnValSerGlnValSerLeuTerValValAsnTerGlnThrTerIleIleVal
5          ThrTyrLysTyrValLysTyrValTerArgSerCysMetGluAsnHisLysPheTerTer
6         HisThrAsnThrCysLysThrCysLysGlyLeuValCysLysMetThrAsnLeuAsnAsn

or 3 frames with 1-letter abbreviations:

                   BsrGI    BsrGI AflII                      DraI
                   \        \     \                          \
    121   gtgtgtatttgtacactttgtacacttaagacctacacatttcattgtgtttaaattatt    180
   3453   cacacataaacatgtgaaacatgtgaattctggatgtgtaaagtaacacaaatttaataa   3512
              ^    *    ^    *    ^    *    ^    *    ^    *    ^    *
1         V  C  I  C  T  L  C  T  L  K  T  Y  T  F  H  C  V  *  I  I
2          C  V  F  V  H  F  V  H  L  R  P  T  H  F  I  V  F  K  L  L
3           V  Y  L  Y  T  L  Y  T  *  D  L  H  I  S  L  C  L  N  Y  Y

Read more at tacg.sf.net, or reply to me for the latest docs and version; I
have to admit the sf site is a bit moldy.

hjm


On Wednesday 15 February 2006 13:20, Michael Coyne wrote:
>  Hello all --
>
>  I'm having a devil of a time figuring out how to make restriction maps
> using BioPerl.  What I'm going for is output similar to GCG's map program,
> but instead of using a set of defined restriction enzymes, I'd like to use
> a set of primers, to create a primer map rather than a restriction map.  I
> do not need a table of restriction enzymes that cut or don't cut (or
> primers that match or don't match, in this case), but an honest-to-goodness
> map, something like:
>
>                                           FKP-5->
>                                               |
>       CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT
>  1921 ---------+---------+---------+---------+---------+---------+ 1980
>       GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA
>
>  a                         M  E  I  V  S  T  F  D  E  L  Q  D  Y   -
>
>  I also need translations of orfs, but I can use GenBank files as input to
> the program and thus the CDS translations are already there, so I'm
> guessing that shouldn't be too hard....  How does one create such a map
> using the BioPerl modules?
>
>  There are intriguing indications out there that such a thing is possible
> (e.g. the Bio::Map:: * and Bio::Restriction:: * modules), but I can't find
> a single example of code that creates such a basic, bread-and-butter thing
> as a restriction map with orf translations.  The documentation to these
> modules is fairly useless to me, consisting mostly of internal methods and
> function prototypes.  Perhaps my skills as a Perl programmer are to blame,
> but a clear example of how a map like this is constructed would be a big
> help.
>
>  Right now, I'm generating primer maps with system calls to EMBOSS's remap,
> pointing it at a file of primer sequences rather than a file of restriction
> enzyme sequences, but the results are less than desired.  I'm considering
> trying to adapt tacg 4.1.0 or sequence extractor 1.1 web-based code to my
> needs, but this seems like a lot of work for an operation I suspect is
> possible in BioPerl.
>
>  Any help greatly appreciated...
>
>  Mike
>
>  ---------------------------------------------------------------------
>   //=\   Michael J. Coyne                       phone: (617) 525-7820
>   \=//   Channing Laboratory                    FAX:   (617) 264-5193
>    //=\  EBRC, Room 617
>    \=//  221 Longwood Avenue        email:mcoyne at channing.harvard.edu
>     //=\ Boston, MA 02115                 mjcoyne at comcast.net
>     \=//
>  ---------------------------------------------------------------------

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 
            <>



From hjm at tacgi.com  Thu Feb 16 11:23:02 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Thu, 16 Feb 2006 08:23:02 -0800
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: 
References: 
	
Message-ID: <200602160823.03534.hjm@tacgi.com>

Yes, I'm going to try this first.  Also the pointer to the NCBI eutils page was 
helpful.  They describe the same thing and I think that API will give me what 
I need.  I'll post back to report.  

Sorry for the delay in answering - this is a side project and as such is going 
slow.

Many thanks to you guys, especially Brian for the example code - much more 
than I had a right to expect.  Virtual Beers all round and real ones should 
we ever meet up.

Harry


On Thursday 16 February 2006 04:52, Chris Fields wrote:
> I think a method was recently implemented in Bio::DB::GenBank to
> retrieve a segment of DNA given start and end coordinates in GenBank
> format; that should contain the features you need.  I requested it
> ~Nov-Dec in the mailing list but didn't get a chance to test it.
> Would that help?
>
> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
> > Harry,
> >
> > It's not clear to me that NCBI's eutils offers this capability
> > directly. You
> > can probably download Entrez Gene entries and parse them for
> > coordinates but
> > I know of no way to remotely retrieve genomic sequences like this
> > from NCBI
> > (ENSEMBL API perhaps?). What I had in mind uses the local approach
> > that some
> > of us favor and to prove to myself that this is simple to do I wrote a
> > script that I just added to examples/tools, it's called
> > extract_genes.pl and
> > it's based on Bio::DB::Fasta. Download the sequence files for a given
> > species to some dir, download Entrez Gene's gene2accession file,
> > and run. It
> > creates and stores a hash for lookups, it won't read gene2accession
> > each
> > time it runs.
> >
> > Brian O.
> >
> > On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
> >> Hi Brian,
> >>
> >> Thanks very much for the pointers and the speed of your reply and
> >> apologies
> >> for the speed of mine.
> >>
> >> This looks good, but what I was looking for was a bioP approach
> >> for hooking to
> >> an API at NCBI or EBI so I could get this info and seqs from
> >> them.  In this
> >> case, speed of retrieval is not critical and I'd rather not
> >> download the
> >> entirety of the sequences to a local disk to hack at them.
> >>
> >> I've determined a screen-scraping approach to get them and could
> >> script that,
> >> but I thought that bioP had a method for using NCBI's external
> >> API's, tho it
> >> may be that my memory is faulty or the approach is no longer
> >> supported due to
> >> overload.
> >>
> >> Does NCBI make such APIs available anymore?  I searched a bit for
> >> docs on them
> >> but couldn't find anything (unless it's buried in the NCBI toolkit,
> >> which I
> >> haven't started to excavate).
> >>
> >> Failing that, would SEALS provide such a service? Any PerlPinipeds
> >> listening?
> >>
> >> Harry
> >>
> >> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
> >>> Harry,
> >>>
> >>> Hope you're doing well. The approach could be based on
> >>> Bio::DB::Fasta. So,
> >>> from its documentation:
> >>>
> >>>   use Bio::DB::Fasta;
> >>>
> >>>   # create database from directory of fasta files
> >>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
> >>>
> >>>   # simple access (for those without Bioperl)
> >>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
> >>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
> >>>   my @ids     = $db->ids;
> >>>   my $length   = $db->length('CHROMOSOME_I');
> >>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
> >>>   my $header   = $db->header('CHROMOSOME_I');
> >>>
> >>>   # Bioperl-style access
> >>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
> >>>
> >>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
> >>>   my $seq     = $obj->seq;
> >>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
> >>>
> >>> Do you already have the offsets?
> >>>
> >>> Brian O.
> >>>
> >>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
> >>>> Hi All,
> >>>>
> >>>> After perusing the tutorial and other docs for an evening, I
> >>>> still
> >>>> can't find the answer to this.  Forgive me if I've missed something
> >>>> obvious.
> >>>>
> >>>> This should not be a novel request, but I've not found it
> >>>> answered.  If
> >>>> bioperl isn't the best way to do this, I'd be grateful to a
> >>>> pointer to a
> >>>> better way, especially if it includes an illuminating bit of code.
> >>>>
> >>>> The problem is to retrieve genomic sequences plus & minus some
> >>>> offset
> >>>> from a locus determined by HUGO keyword or GeneID.  This would be a
> >>>> common followup chore for some extra analysis from a gene
> >>>> expression
> >>>> expt.  Or maybe this is in the DBFetch routines, but I've missed
> >>>> the
> >>>> sequence type to specify...?
> >>>>
> >>>>
> >>>> TIA!
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 
            <>


From cjfields at uiuc.edu  Thu Feb 16 16:37:25 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 15:37:25 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
	version 1.28
In-Reply-To: <43F449E1.80605@esat.kuleuven.be>
Message-ID: <000301c63341$2e015d50$15327e82@pyrimidine>

As an update for those interested, I checked on this today, feeding SearchIO
XML and text output for all NCBI's BLAST flavors.  Basically, all XML parses
fine.  All text output except blastn and tblastx works fine.  The last two
have the extra lines starting with 'Features in this part of subject
sequence:'.  I'll be checking into SearchIO::blast but don't know when I can
get around to posting a fix.
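
A possible stopgap until SearchIO::blast is patched, offered only as a sketch: drop the feature annotation blocks from the text report before handing it to the parser. The helper name strip_feature_blocks is made up for illustration and is not part of Bioperl, and the block layout it assumes (a "Features in/flanking this part of subject sequence:" line, some detail lines, then a blank line) is a guess based on the reports discussed in this thread. The sample report lines are invented as well.

```perl
use strict;
use warnings;

# Hypothetical helper (not in Bioperl): remove "Features ..." blocks from
# the lines of a text BLAST report.
# Assumed layout: the "Features in/flanking this part of subject sequence:"
# line, one or more non-blank detail lines, then a blank line.
sub strip_feature_blocks {
    my @out;
    my $skipping = 0;
    for my $line (@_) {
        if ($line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/) {
            $skipping = 1;
            next;
        }
        if ($skipping) {
            next if $line =~ /\S/;   # detail line, still inside the block
            $skipping = 0;           # the blank line closes the block
            next;                    # and is dropped along with it
        }
        push @out, $line;
    }
    return @out;
}

# Invented report fragment, for demonstration only.
my @report = (
    ">gi|0000|example subject\n",
    "          Length = 100\n",
    "\n",
    " Features in this part of subject sequence:\n",
    "   hypothetical protein\n",
    "\n",
    " Score = 40.1 bits (20), Expect = 0.002\n",
);
print strip_feature_blocks(@report);
```

The filtered lines could then be written to a temporary file and parsed as usual; whether that is enough for every blastn/tblastx report is untested.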

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> Sent: Thursday, February 16, 2006 3:46 AM
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at lists.open-bio.org; Chris Fields
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
> version 1.28
> 
> Hi,
> 
> I have the same problem with the blast.pm-file.
> The people of NCBI added some extra info when giving the Blast-output.
> (see e.g. "Features flanking this part..." or "Features in this part
> ..."), example added.
> The blast.pm module starts looking for the HSP alignment information,
> but it dies when it hits this Feature-information.
> 
> Pieter
> 
> 
......







From osborne1 at optonline.net  Thu Feb 16 17:19:16 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 17:19:16 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: 
Message-ID: 

Chris,

Yes. The question now is where to easily get the coordinates.

Brian O.


On 2/16/06 7:52 AM, "Chris Fields"  wrote:

> I think a method was recently implemented in Bio::DB::GenBank to
> retrieve a segment of DNA given start and end coordinates in GenBank
> format; that should contain the features you need.  I requested it
> ~Nov-Dec in the mailing list but didn't get a chance to test it.
> Would that help?
> 
> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
> 
>> Harry,
>> 
>> It's not clear to me that NCBI's eutils offers this capability
>> directly. You
>> can probably download Entrez Gene entries and parse them for
>> coordinates but
>> I know of no way to remotely retrieve genomic sequences like this
>> from NCBI
>> (ENSEMBL API perhaps?). What I had in mind uses the local approach
>> that some
>> of us favor and to prove to myself that this is simple to do I wrote a
>> script that I just added to examples/tools, it's called
>> extract_genes.pl and
>> it's based on Bio::DB::Fasta. Download the sequence files for a given
>> species to some dir, download Entrez Gene's gene2accession file,
>> and run. It
>> creates and stores a hash for lookups, it won't read gene2accession
>> each
>> time it runs.
>> 
>> Brian O.
>> 
>> 
>> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
>> 
>>> Hi Brian,
>>> 
>>> Thanks very much for the pointers and the speed of your reply and
>>> apologies
>>> for the speed of mine.
>>> 
>>> This looks good, but what I was looking for was a bioP approach
>>> for hooking to
>>> an API at NCBI or EBI so I could get this info and seqs from
>>> them.  In this
>>> case, speed of retrieval is not critical and I'd rather not
>>> download the
>>> entirety of the sequences to a local disk to hack at them.
>>> 
>>> I've determined a screen-scraping approach to get them and could
>>> script that,
>>> but I thought that bioP had a method for using NCBI's external
>>> API's, tho it
>>> may be that my memory is faulty or the approach is no longer
>>> supported due to
>>> overload.
>>> 
>>> Does NCBI make such APIs available anymore?  I searched a bit for
>>> docs on them
>>> but couldn't find anything (unless it's buried in the NCBI toolkit,
>>> which I
>>> haven't started to excavate).
>>> 
>>> Failing that, would SEALS provide such a service? Any PerlPinipeds
>>> listening?
>>> 
>>> Harry
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>>> Harry,
>>>> 
>>>> Hope you're doing well. The approach could be based on
>>>> Bio::DB::Fasta. So,
>>>> from its documentation:
>>>> 
>>>>   use Bio::DB::Fasta;
>>>> 
>>>>   # create database from directory of fasta files
>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>> 
>>>>   # simple access (for those without Bioperl)
>>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>>   my @ids     = $db->ids;
>>>>   my $length   = $db->length('CHROMOSOME_I');
>>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>>   my $header   = $db->header('CHROMOSOME_I');
>>>> 
>>>>   # Bioperl-style access
>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>> 
>>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>>>   my $seq     = $obj->seq;
>>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>>> 
>>>> Do you already have the offsets?
>>>> 
>>>> Brian O.
>>>> 
>>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>>>> Hi All,
>>>>> 
>>>>> After perusing the tutorial and other docs for an evening, I
>>>>> still
>>>>> can't find the answer to this.  Forgive me if I've missed something
>>>>> obvious.
>>>>> 
>>>>> This should not be a novel request, but I've not found it
>>>>> answered.  If
>>>>> bioperl isn't the best way to do this, I'd be grateful to a
>>>>> pointer to a
>>>>> better way, especially if it includes an illuminating bit of code.
>>>>> 
>>>>> The problem is to retrieve genomic sequences plus & minus some
>>>>> offset
>>>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>>>> common followup chore for some extra analysis from a gene
>>>>> expression
>>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed
>>>>> the
>>>>> sequence type to specify...?
>>>>> 
>>>>> 
>>>>> TIA!
>> 
>> 
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 




From cjfields at uiuc.edu  Thu Feb 16 17:29:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 16:29:15 -0600
Subject: [Bioperl-l] Minimal versions requirements/warnings for SearchIO
	text parsing?
Message-ID: <000001c63348$6b8136d0$15327e82@pyrimidine>

I'm floating this to see what people think...

I'm beginning to wonder, especially when I'm wading through the
regex/parsing nightmare in SearchIO::blast, if we should require a minimal
BLAST version number for parsing to work in SearchIO::blast.  I could add a
'$self->throw("Requires BLAST v2.x.x")', or at least give a warning, if the
blast version number is below a minimal version, so at least people will
know what the problem is (not us!).
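
To make the idea concrete, here is a rough standalone sketch of such a check. None of these subs exist in SearchIO::blast; blast_version, version_cmp and check_blast_version are hypothetical names, and the header format ("BLASTN 2.2.12 [Aug-07-2005]") is assumed from typical text reports. Inside the parser the returned message would go to $self->throw or $self->warn rather than being printed.

```perl
use strict;
use warnings;

# Hypothetical: pull the dotted version out of a report header line such as
# "BLASTN 2.2.12 [Aug-07-2005]". Returns undef if nothing looks like one.
sub blast_version {
    my ($header) = @_;
    return $header =~ /^\s*\S*BLAST\S*\s+(\d+(?:\.\d+)+)/i ? $1 : undef;
}

# Compare two dotted versions numerically, field by field (-1, 0 or 1),
# so that 2.2.9 sorts before 2.2.12.
sub version_cmp {
    my ($x, $y) = @_;
    my @x = split /\./, $x;
    my @y = split /\./, $y;
    while (@x || @y) {
        my $cmp = (shift(@x) || 0) <=> (shift(@y) || 0);
        return $cmp if $cmp;
    }
    return 0;
}

# Returns an empty string if the report is new enough, else a message.
sub check_blast_version {
    my ($header, $min) = @_;
    my $found = blast_version($header);
    return "no BLAST version found in header" unless defined $found;
    return version_cmp($found, $min) < 0
        ? "BLAST $found is older than the minimal supported version $min"
        : '';
}

for my $header ("BLASTN 2.2.12 [Aug-07-2005]", "TBLASTX 2.2.6 [Apr-09-2003]") {
    my $problem = check_blast_version($header, '2.2.12');
    print "$header: ", ($problem || 'ok'), "\n";
}
```

Comparing field by field rather than as strings matters here, since a plain string comparison would rank 2.2.9 above 2.2.12.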

The regexes are really piling up, and the latest changes in blastn and
tblastx will require adding a few more.  I also think that this would help
remind everybody running the latest Bioperl that there are also newer
versions of BLAST.  My current thought is to get it working for the latest
text output from NCBI, check it against the last version of BLAST (v.
2.2.12, which, luckily, blastcl3 generates), and not worry too much about
older ones.

Any thoughts on this?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From cjfields at uiuc.edu  Thu Feb 16 17:45:52 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 16:45:52 -0600
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: 
Message-ID: <000101c6334a$bd80a900$15327e82@pyrimidine>

If I know the start, end, and strand info for a list of features (personal
preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
up), couldn't I try pulling out the surrounding region?  My thought is this,
though I haven't coded it yet:

1)  Draw up a list of Seqfeatures, with accession, start, stop coordinates
(array of hashes) based off what I get from RNAMotif objects.
2)  Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream
and downstream, one at a time, using get_Seq_by_id().  I could add a sleep
in there somewhere to not tick off the NCBI curators.
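
The coordinate bookkeeping for those two steps might look roughly like this. It is a sketch only: flanking_window is a made-up helper, the accessions and coordinates are invented, and the actual remote fetch through Bio::DB::GenBank is left as a comment since I haven't tested it.

```perl
use strict;
use warnings;

# Hypothetical helper: widen a feature's coordinates by $flank bp on each
# side, clamping at 1 and, when the record length is known, at $seq_len.
sub flanking_window {
    my ($start, $end, $flank, $seq_len) = @_;
    ($start, $end) = ($end, $start) if $start > $end;
    my $from = $start - $flank;
    $from = 1 if $from < 1;
    my $to = $end + $flank;
    $to = $seq_len if defined $seq_len && $to > $seq_len;
    return ($from, $to);
}

# Step 1: array of hashes, one per feature (all values invented).
my @features = (
    { acc => 'ACC_0001', start => 150,  end => 210,  strand => 1  },
    { acc => 'ACC_0002', start => 5200, end => 5100, strand => -1 },
);

# Step 2: one window per feature, fetched one at a time.
for my $f (@features) {
    my ($from, $to) = flanking_window(@{$f}{qw(start end)}, 500);
    print "$f->{acc}: fetch $from..$to (strand $f->{strand})\n";
    # The remote fetch for $from..$to would go here, followed by a pause
    # (e.g. sleep 3) so the requests are spaced out.
}
```

Clamping at 1 avoids asking for negative coordinates when a motif sits near the start of a record.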

Reason I'm interested in this is b/c I want to know where the RNA motif is
in context to surrounding features. If it is very close to a coding region,
then the motif likely indicates translational regulation.  Further away may
indicate transcriptional termination or another mechanism.

The files returned should have the features included as long as they are in
the full length GenBank record.  I tried it out using the web form but not
through Bio::DB::GenBank yet.  If I can get it to work I'll add it to the
page.  

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Thursday, February 16, 2006 4:19 PM
> To: Chris Fields
> Cc: Harry Mangalam; bioperl-l
> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names or
> GeneIDs
> 
> Chris,
> 
> Yes. The question now is where to easily get the coordinates.
> 
> Brian O.
> 
> 
> On 2/16/06 7:52 AM, "Chris Fields"  wrote:
> 
> > I think a method was recently implemented in Bio::DB::GenBank to
> > retrieve a segment of DNA given start and end coordinates in GenBank
> > format; that should contain the features you need.  I requested it
> > ~Nov-Dec in the mailing list but didn't get a chance to test it.
> > Would that help?
> >
> > On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
> >
> >> Harry,
> >>
> >> It's not clear to me that NCBI's eutils offers this capability
> >> directly. You
> >> can probably download Entrez Gene entries and parse them for
> >> coordinates but
> >> I know of no way to remotely retrieve genomic sequences like this
> >> from NCBI
> >> (ENSEMBL API perhaps?). What I had in mind uses the local approach
> >> that some
> >> of us favor and to prove to myself that this is simple to do I wrote a
> >> script that I just added to examples/tools, it's called
> >> extract_genes.pl and
> >> it's based on Bio::DB::Fasta. Download the sequence files for a given
> >> species to some dir, download Entrez Gene's gene2accession file,
> >> and run. It
> >> creates and stores a hash for lookups, it won't read gene2accession
> >> each
> >> time it runs.
> >>
> >> Brian O.
> >>
> >>
> >> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
> >>
> >>> Hi Brian,
> >>>
> >>> Thanks very much for the pointers and the speed of your reply and
> >>> apologies
> >>> for the speed of mine.
> >>>
> >>> This looks good, but what I was looking for was a bioP approach
> >>> for hooking to
> >>> an API at NCBI or EBI so I could get this info and seqs from
> >>> them.  In this
> >>> case, speed of retrieval is not critical and I'd rather not
> >>> download the
> >>> entirety of the sequences to a local disk to hack at them.
> >>>
> >>> I've determined a screen-scraping approach to get them and could
> >>> script that,
> >>> but I thought that bioP had a method for using NCBI's external
> >>> API's, tho it
> >>> may be that my memory is faulty or the approach is no longer
> >>> supported due to
> >>> overload.
> >>>
> >>> Does NCBI make such APIs available anymore?  I searched a bit for
> >>> docs on them
> >>> but couldn't find anything (unless it's buried in the NCBI toolkit,
> >>> which I
> >>> haven't started to excavate).
> >>>
> >>> Failing that, would SEALS provide such a service? Any PerlPinipeds
> >>> listening?
> >>>
> >>> Harry
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
> >>>> Harry,
> >>>>
> >>>> Hope you're doing well. The approach could be based on
> >>>> Bio::DB::Fasta. So,
> >>>> from its documentation:
> >>>>
> >>>>   use Bio::DB::Fasta;
> >>>>
> >>>>   # create database from directory of fasta files
> >>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
> >>>>
> >>>>   # simple access (for those without Bioperl)
> >>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
> >>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
> >>>>   my @ids     = $db->ids;
> >>>>   my $length   = $db->length('CHROMOSOME_I');
> >>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
> >>>>   my $header   = $db->header('CHROMOSOME_I');
> >>>>
> >>>>   # Bioperl-style access
> >>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
> >>>>
> >>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
> >>>>   my $seq     = $obj->seq;
> >>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
> >>>>
> >>>> Do you already have the offsets?
> >>>>
> >>>> Brian O.
> >>>>
> >>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
> >>>>> Hi All,
> >>>>>
> >>>>> After perusing the tutorial and other docs for an evening, I
> >>>>> still
> >>>>> can't find the answer to this.  Forgive me if I've missed something
> >>>>> obvious.
> >>>>>
> >>>>> This should not be a novel request, but I've not found it
> >>>>> answered.  If
> >>>>> bioperl isn't the best way to do this, I'd be grateful to a
> >>>>> pointer to a
> >>>>> better way, especially if it includes an illuminating bit of code.
> >>>>>
> >>>>> The problem is to retrieve genomic sequences plus & minus some
> >>>>> offset
> >>>>> from a locus determined by HUGO keyword or GeneID.  This would be a
> >>>>> common followup chore for some extra analysis from a gene
> >>>>> expression
> >>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed
> >>>>> the
> >>>>> sequence type to specify...?
> >>>>>
> >>>>>
> >>>>> TIA!
> >>
> >>
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >




From hjm at tacgi.com  Thu Feb 16 18:10:59 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Thu, 16 Feb 2006 15:10:59 -0800
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: <000101c6334a$bd80a900$15327e82@pyrimidine>
References: <000101c6334a$bd80a900$15327e82@pyrimidine>
Message-ID: <200602161510.59679.hjm@tacgi.com>

This is essentially what I want to do and my [only in pseudocode] approach is 
basically what you describe, except that currently I only have HUGO 
descriptors, not Genbank UIDs.  If you know of an index that lists both, that 
would be the entire shot.

I'm also interested in tracking transcriptional control elements and 
cross-correlating them, which is why I wrote the 'rules' chunk of the 
recently (self-promoted) tacg.

Best
Harry


On Thursday 16 February 2006 14:45, Chris Fields wrote:
> If I know the start, end, and strand info for a list of features (personal
> preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
> up), couldn't I try pulling out the surrounding region?  My thought is
> this, though I haven't coded it yet:
>
> 1)  Draw up a list of Seqfeatures, with accession, start, stop coordinates
> (array of hashes) based off what I get from RNAMotif objects.
> 2)  Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream
> and downstream, one at a time, using get_Seq_by_id().  I could add a sleep
> in there somewhere to not tick off the NCBI curators.
>
> Reason I'm interested in this is b/c I want to know where the RNA motif is
> in context to surrounding features. If it is very close to a coding region,
> then the motif likely indicates translational regulation.  Further away may
> indicate transcriptional termination or another mechanism.
>
> The files returned should have the features included as long as they are in
> the full length GenBank record.  I tried it out using the web form but not
> through Bio::DB::GenBank yet.  If I can get it to work I'll add it to the
> page.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> > -----Original Message-----
> > From: Brian Osborne [mailto:osborne1 at optonline.net]
> > Sent: Thursday, February 16, 2006 4:19 PM
> > To: Chris Fields
> > Cc: Harry Mangalam; bioperl-l
> > Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
> > or GeneIDs
> >
> > Chris,
> >
> > Yes. The question now is where to easily get the coordinates.
> >
> > Brian O.
> >
> > On 2/16/06 7:52 AM, "Chris Fields"  wrote:
> > > I think a method was recently implemented in Bio::DB::GenBank to
> > > retrieve a segment of DNA given start and end coordinates in GenBank
> > > format; that should contain the features you need.  I requested it
> > > ~Nov-Dec in the mailing list but didn't get a chance to test it.
> > > Would that help?
> > >
> > > On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
> > >> Harry,
> > >>
> > >> It's not clear to me that NCBI's eutils offers this capability
> > >> directly. You
> > >> can probably download Entrez Gene entries and parse them for
> > >> coordinates but
> > >> I know of no way to remotely retrieve genomic sequences like this
> > >> from NCBI
> > >> (ENSEMBL API perhaps?). What I had in mind uses the local approach
> > >> that some
> > >> of us favor and to prove to myself that this is simple to do I wrote a
> > >> script that I just added to examples/tools, it's called
> > >> extract_genes.pl and
> > >> it's based on Bio::DB::Fasta. Download the sequence files for a given
> > >> species to some dir, download Entrez Gene's gene2accession file,
> > >> and run. It
> > >> creates and stores a hash for lookups, it won't read gene2accession
> > >> each
> > >> time it runs.
> > >>
> > >> Brian O.
> > >>
> > >> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
> > >>> Hi Brian,
> > >>>
> > >>> Thanks very much for the pointers and the speed of your reply and
> > >>> apologies
> > >>> for the speed of mine.
> > >>>
> > >>> This looks good, but what I was looking for was a bioP approach
> > >>> for hooking to
> > >>> an API at NCBI or EBI so I could get this info and seqs from
> > >>> them.  In this
> > >>> case, speed of retrieval is not critical and I'd rather not
> > >>> download the
> > >>> entirety of the sequences to a local disk to hack at them.
> > >>>
> > >>> I've determined a screen-scraping approach to get them and could
> > >>> script that,
> > >>> but I thought that bioP had a method for using NCBI's external
> > >>> API's, tho it
> > >>> may be that my memory is faulty or the approach is no longer
> > >>> supported due to
> > >>> overload.
> > >>>
> > >>> Does NCBI make such APIs available anymore?  I searched a bit for
> > >>> docs on them
> > >>> but couldn't find anything (unless it's buried in the NCBI toolkit,
> > >>> which I
> > >>> haven't started to excavate).
> > >>>
> > >>> Failing that, would SEALS provide such a service? Any PerlPinipeds
> > >>> listening?
> > >>>
> > >>> Harry
> > >>>
> > >>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
> > >>>> Harry,
> > >>>>
> > >>>> Hope you're doing well. The approach could be based on
> > >>>> Bio::DB::Fasta. So,
> > >>>> from its documentation:
> > >>>>
> > >>>>   use Bio::DB::Fasta;
> > >>>>
> > >>>>   # create database from directory of fasta files
> > >>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
> > >>>>
> > >>>>   # simple access (for those without Bioperl)
> > >>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
> > >>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
> > >>>>   my @ids     = $db->ids;
> > >>>>   my $length   = $db->length('CHROMOSOME_I');
> > >>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
> > >>>>   my $header   = $db->header('CHROMOSOME_I');
> > >>>>
> > >>>>   # Bioperl-style access
> > >>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
> > >>>>
> > >>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
> > >>>>   my $seq     = $obj->seq;
> > >>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
> > >>>>
> > >>>> Do you already have the offsets?
> > >>>>
> > >>>> Brian O.
> > >>>>
> > >>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
> > >>>>> Hi All,
> > >>>>>
> > >>>>> After perusing the tutorial and other docs for an evening, I
> > >>>>> still
> > >>>>> can't find the answer to this.  Forgive me if I've missed something
> > >>>>> obvious.
> > >>>>>
> > >>>>> This should not be a novel request, but I've not found it
> > >>>>> answered.  If
> > >>>>> bioperl isn't the best way to do this, I'd be grateful to a
> > >>>>> pointer to a
> > >>>>> better way, especially if it includes an illuminating bit of code.
> > >>>>>
> > >>>>> The problem is to retrieve genomic sequences plus & minus some
> > >>>>> offset
> > >>>>> from a locus determined by HUGO keyword or GeneID.  This would be a
> > >>>>> common followup chore for some extra analysis from a gene
> > >>>>> expression
> > >>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed
> > >>>>> the
> > >>>>> sequence type to specify...?
> > >>>>>
> > >>>>>
> > >>>>> TIA!
> > >>
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher
> > > Lab of Dr. Robert Switzer
> > > Dept of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 
            <>


From anst at kvl.dk  Fri Feb 17 04:18:18 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Fri, 17 Feb 2006 10:18:18 +0100
Subject: [Bioperl-l] another searchIO bug? with blast report
In-Reply-To: <43F45FE60200009B00000ED6@gwia.kvl.dk>
References: <43F45FE60200009B00000ED6@gwia.kvl.dk>
Message-ID: <43F5A2EA0200009B00000F45@gwia.kvl.dk>



>>>Anders Stegmann  02/16/06 11:20 am >>>
Hi!

I am blasting a protein seq (query) against an identical seq with a
deletion of Aa nr 61 (subject).
Then I print out the type of nomatch Aa and its position.
The nomatch for the query seq is Aa G at position 61, which is correct.
The nomatch for the subject seq is V at position 60, which is definitely
not correct!?

Is this a bug?

testblast2.pl is the program to run

Q0045 is the query seq.

Q0045del61 is the subject seq (it has to be formated: formatdb -i
Q0045del61 -p T -o F).

Regards Anders.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045
Type: application/octet-stream
Size: 873 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045del61
Type: application/octet-stream
Size: 872 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testblast2.pl
Type: application/octet-stream
Size: 6109 bytes
Desc: not available
URL: 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From saldroubi at yahoo.com  Fri Feb 17 12:49:40 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Fri, 17 Feb 2006 09:49:40 -0800 (PST)
Subject: [Bioperl-l] Count or weight matrix in bioperl?
In-Reply-To: <43EAAEEF.3000304@infotech.monash.edu.au>
Message-ID: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com>


Torsten and all,
 
 I don't think this will work for me, because it only generates statistics for a single sequence.  What I need is a count matrix for each position across a number of DNA sequences.  In other words, if I pass three sequences to this function, it should return the count of each nucleotide at each position.
 
 For example, if I pass an array of sequences, say ATC, CCC, TTT,
 then I should get back a matrix holding the counts at positions 1, 2, 3 for each of A, C, T and G, like this:
 
 
                 1    2   3
      A        1    0    0
      C        1    1    2
      T        1    2    1     
      G        0    0    0
 
 Any idea of this is already built somewhere in bioperl?
 
 Thank you.
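
The count matrix described above can also be computed in plain Perl, without any BioPerl module. This is only a minimal sketch, assuming equal-length DNA strings; the name count_matrix is made up for illustration and is not a BioPerl function.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): count each nucleotide at
# each position of a list of equal-length DNA sequences.
sub count_matrix {
    my @seqs  = @_;
    my $len   = length $seqs[0];
    my %count = map { $_ => [ (0) x $len ] } qw(A C G T);
    for my $seq (@seqs) {
        die "sequences must all have length $len\n" unless length($seq) == $len;
        my @bases = split //, uc $seq;
        for my $pos (0 .. $len - 1) {
            # Characters other than A, C, G, T are silently skipped.
            $count{ $bases[$pos] }[$pos]++ if exists $count{ $bases[$pos] };
        }
    }
    return \%count;
}

my $matrix = count_matrix(qw(ATC CCC TTT));
for my $base (qw(A C T G)) {
    print join("\t", $base, @{ $matrix->{$base} }), "\n";
}
```

For ATC, CCC, TTT this prints the rows A 1 0 0, C 1 1 2, T 1 2 1 and G 0 0 0, matching the table above.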
 
 
 Torsten Seemann wrote:
> Say I have an array of nucleotide sequences of length N. I want to calculate the count matrix (weight matrix). That is, for each position 1..N, I want to know how many As, Cs, Ts and Gs there are. Is the code to do this already written in bioperl to build this matrix if I pass it those strings?
> Please excuse my lack of knowledge as I am a newcomer to bioinformatics.

Use the Bio::Tools::SeqStats module. The PDoc documentation even has an 
example similar to what you want to do:

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html

--Torsten Seemann




Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From muratem at eng.uah.edu  Fri Feb 17 12:45:30 2006
From: muratem at eng.uah.edu (Mike Muratet)
Date: Fri, 17 Feb 2006 11:45:30 -0600 (CST)
Subject: [Bioperl-l] Minimal versions requirements/warnings for SearchIO
 text parsing?
In-Reply-To: <000001c63348$6b8136d0$15327e82@pyrimidine>
References: <000001c63348$6b8136d0$15327e82@pyrimidine>
Message-ID: 



On Thu, 16 Feb 2006, Chris Fields wrote:

> I'm floating this to see what people think...
>
> I'm beginning to wonder, especially when I'm wading through the
> regex/parsing nightmare in SearchIO::blast, if we should either require a
> minimal BLAST version number for parsing to work in SearchIO::blast.  I
> could add a '$self->throw("Requires BLAST v2.x.x")' or at least give a
> warning if the blast version number is below a minimal version, so at least
> people will know what the problem is (not us!).
>
> The regexes are really piling up, and the latest changes in blastn and
> tblastx will require adding a few more.  I also think that this would help
> remind everybody running the latest Bioperl that there are also newer
> versions of BLAST.  My current thought is to get it working for the latest
> text output from NCBI, check it against the last version of BLAST (v.
> 2.2.12, which, luckily, blastcl3 generates), and not worry too much about
> older ones.
>
> Any thoughts on this?
>

Chris

I could live with it. I think most of the world runs on NCBI or WUBLAST 
and it's easy to download/update either of those.

Thanks for the effort. I use SearchIO a lot.
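
The minimum-version check floated above could look roughly like this. It is only a sketch in plain Perl, not code from SearchIO::blast: the helper names are hypothetical, and it assumes an NCBI-style first line such as "BLASTP 2.2.12 [Aug-07-2005]".

```perl
use strict;
use warnings;

# Hypothetical helper: compare dotted version strings field by field.
sub version_cmp {
    my @a = split /\./, shift;
    my @b = split /\./, shift;
    while (@a || @b) {
        my $cmp = (shift(@a) // 0) <=> (shift(@b) // 0);
        return $cmp if $cmp;
    }
    return 0;
}

# Hypothetical check: pull the version out of a report's first line and
# return a warning message if it is older than the required minimum,
# or an empty string if the version is acceptable.
sub check_blast_version {
    my ($header, $minimum) = @_;
    my ($version) = $header =~ /^\S+\s+(\d+(?:\.\d+)*)/
        or return "cannot find a version in '$header'";
    return version_cmp($version, $minimum) < 0
        ? "BLAST $version is older than the minimum $minimum"
        : '';
}

my $message = check_blast_version('BLASTP 2.2.12 [Aug-07-2005]', '2.2.13');
warn "$message\n" if $message;
```

Returning a message rather than dying leaves it to the caller to choose between a warning and a throw.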

Mike


> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From MEC at stowers-institute.org  Fri Feb 17 13:15:53 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 17 Feb 2006 12:15:53 -0600
Subject: [Bioperl-l] Count or weight matrix in bioperl?
Message-ID: 

http://forkhead.cgb.ki.se/TFBS/ provides the ability to generate a position
frequency matrix from a list of (presumably aligned) sequences as follows:

#!/usr/bin/env perl
use strict;
use warnings;
use TFBS::PatternGen::SimplePFM;

# Read one sequence per line from STDIN or from the files given as arguments.
my @sequences = <>;
chomp @sequences;

# Build the position frequency matrix and print its raw counts.
print TFBS::PatternGen::SimplePFM->new(-seq_list => \@sequences)->pattern->rawprint;
exit 0;

The output when run on your example input shows that the order of the
nucleotides is not the same as you expect (it is alphabetical):

1 0 0
1 1 2
0 0 0
1 2 1
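
If the rows are needed in the A, C, T, G order of the question, they can be labelled and reordered afterwards. A small sketch in plain Perl, assuming the four rows printed above are in alphabetical order (A, C, G, T); the variable names are illustrative only.

```perl
use strict;
use warnings;

# Rows as printed above, in alphabetical order A, C, G, T.
my @raw = ( [1, 0, 0], [1, 1, 2], [0, 0, 0], [1, 2, 1] );

# Attach a nucleotide label to each row.
my %row;
@row{qw(A C G T)} = @raw;

# Emit the rows in the A, C, T, G order used in the question.
my @lines = map { join ' ', $_, @{ $row{$_} } } qw(A C T G);
print "$_\n" for @lines;
```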

Good luck,

TFBS installation requires significant dependencies, including bioperl
and PDL.

Malcolm Cook 

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sam 
>Al-Droubi
>Sent: Friday, February 17, 2006 11:50 AM
>To: Torsten Seemann
>Cc: BioPerl list
>Subject: Re: [Bioperl-l] Count or weight matrix in bioperl?
>
>
>Torsten and all,
> 
> I don't think this will work for me for it only generates 
>statistics for a single sequence.  What I need is a count 
>matrix for each position for a number of DNA sequences.  In 
>other words, if I pass there 3 sequences to this function then 
>it returns the count for each postion for each nucleotide.
> 
> For example if I pass an array of sequences say: ATC,CCC,TTT
> then I should get a matrix back that will have count for 
>postion 1,2,3 for each A,C,T, or G like this:
> 
> 
>                 1    2   3
>      A        1    0    0
>      C        1    1    2
>      T        1    2    1     
>      G        0    0    0
> 
> Any idea of this is already built somewhere in bioperl?
> 
> Thank you.
> 
> 
> Torsten Seemann  
>wrote:> Say I have an array of nucleotide sequences of of 
>length N. I want to calculate the count matrix (weight 
>matrix). That is for each position 1..N, I want to know how 
>many As, Cs ,Ts and Gs there are. Is the code to do this 
>already written in bioperl to build this matrix if I pass it 
>those strings?
>>   Please excuse my lack of knowledge as I am a new comer to 
>bioinformatics.
>
>Use the Bio::Tools::SeqStats module. The PDoc documentation 
>even has an 
>example similar to what you want to do:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/Seq
>Stats.html
>
>--Torsten Seemann
>
>
>
>
>Sincerely, 
>Sam Al-Droubi, M.S.
>saldroubi at yahoo.com
>



From jason.stajich at duke.edu  Fri Feb 17 14:01:45 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 17 Feb 2006 14:01:45 -0500
Subject: [Bioperl-l] another searchIO bug? with blast report
In-Reply-To: <43F5A2EA0200009B00000F45@gwia.kvl.dk>
References: <43F45FE60200009B00000ED6@gwia.kvl.dk>
	<43F5A2EA0200009B00000F45@gwia.kvl.dk>
Message-ID: <69783647-DD43-4A20-84FA-88E5F8A2C535@duke.edu>

In case people on the list think that my speaking up about a
question means they should ignore it...

Hopefully someone else can help debug this - I really don't have time  
I'm afraid.

-jason


On Feb 17, 2006, at 4:18 AM, Anders Stegmann wrote:

>
>
>>>> Anders Stegmann  02/16/06 11:20 am >>>
> Hi!
>
> I am blasting a protein seq (query) against an identical seq with a
> deletion of Aa nr 61 (subject).
> Then I print out the type of nomatch Aa and its position.
> The nomatch for the query seq is Aa G at position 61, which is  
> correct.
> The nomatch for the subject seq is V at position 60, which is  
> definitely
> not correct!?
>
> Is this a bug?
>
> testblast2.pl is the program to run
>
> Q0045 is the query seq.
>
> Q0045del61 is the subject seq (it has to be formated: formatdb -i
> Q0045del61 -p T -o F).
>
> Regards Anders.
>
>
> 
> 
> 
> 

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Fri Feb 17 14:17:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 17 Feb 2006 13:17:32 -0600
Subject: [Bioperl-l] another searchIO bug? with blast report
In-Reply-To: <69783647-DD43-4A20-84FA-88E5F8A2C535@duke.edu>
Message-ID: <000001c633f6$cd391740$15327e82@pyrimidine>

No, haven't ignored it.  Just been busy going through SearchIO::blast again
(I've perltidy'd it) since BLASTN and TBLASTX output (v2.2.13) don't work;
looks like all others should.  Trying to fix one problem at a time.  I'll
look at this next.  Don't worry about it.  ;>

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Friday, February 17, 2006 1:02 PM
> To: Anders Stegmann
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] another searchIO bug? with blast report
> 
> In case people on the list think that by my speaking up about
> question means they should ignore it...
> 
> Hopefully someone else can help debug this - I really don't have time
> I'm afraid.
> 
> -jason
> 
> 
> On Feb 17, 2006, at 4:18 AM, Anders Stegmann wrote:
> 
> >
> >
> >>>> Anders Stegmann  02/16/06 11:20 am >>>
> > Hi!
> >
> > I am blasting a protein seq (query) against an identical seq with a
> > deletion of Aa nr 61 (subject).
> > Then I print out the type of nomatch Aa and its position.
> > The nomatch for the query seq is Aa G at position 61, which is
> > correct.
> > The nomatch for the subject seq is V at position 60, which is
> > definitely
> > not correct!?
> >
> > Is this a bug?
> >
> > testblast2.pl is the program to run
> >
> > Q0045 is the query seq.
> >
> > Q0045del61 is the subject seq (it has to be formated: formatdb -i
> > Q0045del61 -p T -o F).
> >
> > Regards Anders.
> >
> >
> > 
> > 
> > 
> > 
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 



From skirov at utk.edu  Fri Feb 17 13:09:00 2006
From: skirov at utk.edu (Stefan Kirov)
Date: Fri, 17 Feb 2006 13:09:00 -0500
Subject: [Bioperl-l] Count or weight matrix in bioperl?
In-Reply-To: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com>
References: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com>
Message-ID: <43F6113C.6070501@utk.edu>

If you have bioperl-live, write a file:

 >seqgroup1
ATC
CCC
TTT

and then read it like this:

use Bio::Matrix::PSM::IO;

# $filename is the path to the file written above
my $mio = Bio::Matrix::PSM::IO->new(-format => 'masta', -file => $filename);
# next_matrix returns a Bio::Matrix::PSM::SiteMatrix object
while (my $matrix = $mio->next_matrix) {
    # do something with the matrix...
    print $matrix->consensus, "\n";
}

This is not going to give you the raw counts, but it will give you the
frequency for each position/letter. See the docs for Bio::Matrix::PSM::SiteMatrix.
Stefan

Sam Al-Droubi wrote:

>Torsten and all,
> 
> I don't think this will work for me for it only generates statistics for a single sequence.  What I need is a count matrix for each position for a number of DNA sequences.  In other words, if I pass there 3 sequences to this function then it returns the count for each postion for each nucleotide.
> 
> For example if I pass an array of sequences say: ATC,CCC,TTT
> then I should get a matrix back that will have count for postion 1,2,3 for each A,C,T, or G like this:
> 
> 
>                 1    2   3
>      A        1    0    0
>      C        1    1    2
>      T        1    2    1     
>      G        0    0    0
> 
> Any idea of this is already built somewhere in bioperl?
> 
> Thank you.
> 
> 
> Torsten Seemann  wrote:> Say I have an array of nucleotide sequences of of length N. I want to calculate the count matrix (weight matrix). That is for each position 1..N, I want to know how many As, Cs ,Ts and Gs there are. Is the code to do this already written in bioperl to build this matrix if I pass it those strings?
>  
>
>>  Please excuse my lack of knowledge as I am a new comer to bioinformatics.
>>    
>>
>
>Use the Bio::Tools::SeqStats module. The PDoc documentation even has an 
>example similar to what you want to do:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html
>
>--Torsten Seemann
>
>
>
>
>Sincerely, 
>Sam Al-Droubi, M.S.
>saldroubi at yahoo.com
>  
>



From cjfields at uiuc.edu  Fri Feb 17 18:02:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 17 Feb 2006 17:02:02 -0600
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names
	orGeneIDs
In-Reply-To: <000101c6334a$bd80a900$15327e82@pyrimidine>
Message-ID: <000601c63416$2a14aa00$15327e82@pyrimidine>

Brian,

I added some sample code to the page.  See what you think.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Thursday, February 16, 2006 4:46 PM
> To: 'Brian Osborne'
> Cc: 'Harry Mangalam'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
> orGeneIDs
> 
> If I know the start, end, and strand info for a list of features (personal
> preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
> up), couldn't I try pulling out the surrounding region?  My thought is
> this,
> though I haven't coded it yet:
> 
> 1)  Draw up a list of Seqfeatures, with accession, start, stop coordinates
> (array of hashes) based off what I get from RNAMotif objects.
> 2)  Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream
> and downstream, one at a time, using get_Seq_by_ID().  I could add a sleep
> in there somewhere to not tick off the NCBI curators.
> 
> Reason I'm interested in this is b/c I want to know where the RNA motif is
> in context to surrounding features. If it is very close to a coding
> region,
> then the motif likely indicates translational regulation.  Further away
> may
> indicate transcriptional termination or another mechanism.
> 
> The files returned should have the features included as long as they are
> in
> the full length GenBank record.  I tried it out using the web form but not
> through Bio::DB::GenBank yet.  If I can get it to work I'll add it to the
> page.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> > -----Original Message-----
> > From: Brian Osborne [mailto:osborne1 at optonline.net]
> > Sent: Thursday, February 16, 2006 4:19 PM
> > To: Chris Fields
> > Cc: Harry Mangalam; bioperl-l
> > Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
> or
> > GeneIDs
> >
> > Chris,
> >
> > Yes. The question now is where to easily get the coordinates.
> >
> > Brian O.
> >
> >
> > On 2/16/06 7:52 AM, "Chris Fields"  wrote:
> >
> > > I think a method was recently implemented in Bio::DB::GenBank to
> > > retrieve a segment of DNA given start and end coordinates in GenBank
> > > format; that should contain the features you need.  I requested it
> > > ~Nov-Dec in the mailing list but didn't get a chance to test it.
> > > Would that help?
> > >
> > > On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
> > >
> > >> Harry,
> > >>
> > >> It's not clear to me that NCBI's eutils offers this capability
> > >> directly. You
> > >> can probably download Entrez Gene entries and parse them for
> > >> coordinates but
> > >> I know of no way to remotely retrieve genomic sequences like this
> > >> from NCBI
> > >> (ENSEMBL API perhaps?). What I had in mind uses the local approach
> > >> that some
> > >> of us favor and to prove to myself that this is simple to do I wrote
> a
> > >> script that I just added to examples/tools, it's called
> > >> extract_genes.pl and
> > >> it's based on Bio::DB::Fasta. Download the sequence files for a given
> > >> species to some dir, download Entrez Gene's gene2accession file,
> > >> and run. It
> > >> creates and stores a hash for lookups, it won't read gene2accession
> > >> each
> > >> time it runs.
> > >>
> > >> Brian O.
> > >>
> > >>
> > >> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
> > >>
> > >>> Hi Brian,
> > >>>
> > >>> Thanks very much for the pointers and the speed of your reply and
> > >>> apologies
> > >>> for the speed of mine.
> > >>>
> > >>> This looks good, but what I was looking for was a bioP approach
> > >>> for hooking to
> > >>> an API at NCBI or EBI so I could get this info and seqs from
> > >>> them.  In this
> > >>> case, speed of retrieval is not critical and I'd rather not
> > >>> download the
> > >>> entirety of the sequences to a local disk to hack at them.
> > >>>
> > >>> I've determined a screen-scraping approach to get them and could
> > >>> script that,
> > >>> but I thought that bioP had a method for using NCBI's external
> > >>> API's, tho it
> > >>> may be that my memory is faulty or the approach is no longer
> > >>> supported due to
> > >>> overload.
> > >>>
> > >>> Does NCBI make such APIs available anymore?  I searched a bit for
> > >>> docs on them
> > >>> but couldn't find anything (unless it's buried in the NCBI tookit,
> > >>> which I
> > >>> haven't started to excavate).
> > >>>
> > >>> Failing that, would SEALS provide such a service? Any PerlPinipeds
> > >>> listening?
> > >>>
> > >>> Harry
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
> > >>>> Harry,
> > >>>>
> > >>>> Hope you're doing well. The approach could be based on
> > >>>> Bio::DB::Fasta. So,
> > >>>> from its documentation:
> > >>>>
> > >>>>   use Bio::DB::Fasta;
> > >>>>
> > >>>>   # create database from directory of fasta files
> > >>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
> > >>>>
> > >>>>   # simple access (for those without Bioperl)
> > >>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
> > >>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
> > >>>>   my @ids     = $db->ids;
> > >>>>   my $length   = $db->length('CHROMOSOME_I');
> > >>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
> > >>>>   my $header   = $db->header('CHROMOSOME_I');
> > >>>>
> > >>>>   # Bioperl-style access
> > >>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
> > >>>>
> > >>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
> > >>>>   my $seq     = $obj->seq;
> > >>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
> > >>>>
> > >>>> Do you already have the offsets?
> > >>>>
> > >>>> Brian O.
> > >>>>
> > >>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
> > >>>>> Hi All,
> > >>>>>
> > >>>>> After perusing the tutorial and other docs for a an evening, I
> > >>>>> still
> > >>>>> can't find the answer to this.  Forgive me if I've missed
> something
> > >>>>> obvious.
> > >>>>>
> > >>>>> This should not be a novel request, but I've not found it
> > >>>>> answered.  If
> > >>>>> bioperl isn't the best way to do this, I'd be grateful to a
> > >>>>> pointer to a
> > >>>>> better way, especially if it includes an illuminating bit of code.
> > >>>>>
> > >>>>> The problem is to retrieve genomic sequences plus & minus some
> > >>>>> offset
> > >>>>> from a locus determined by HUGO keyword or GeneID.  This would be
> a
> > >>>>> common followup chore for some extra analysis from a gene
> > >>>>> expression
> > >>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed
> > >>>>> the
> > >>>>> sequence type to specify...?
> > >>>>>
> > >>>>>
> > >>>>> TIA!
> > >>
> > >>
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher
> > > Lab of Dr. Robert Switzer
> > > Dept of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > >
> > >
> 
> 



From osborne1 at optonline.net  Fri Feb 17 23:01:14 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 17 Feb 2006 23:01:14 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names
 orGeneIDs
In-Reply-To: <000601c63416$2a14aa00$15327e82@pyrimidine>
Message-ID: 

Chris,

That's nice. Now what I'm puzzling over is how to get the genomic
coordinates given an id, like a Gene id. The raw query is something like:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=2&rettype=xml

This is _something_ like the queries used within Bio::DB::Query::GenBank,
but not exactly. Now taking a look at how the text returned is transformed
into objects...
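
For anyone who wants to try other ids, the raw query can be assembled from its parts. This is only a sketch in plain Perl; efetch_url is a made-up helper, not something in Bio::DB::Query::GenBank, and it assumes the values need no URL escaping.

```perl
use strict;
use warnings;

# Hypothetical helper: build a query URL from a base address and an
# ordered list of key/value pairs. No URL escaping is done.
sub efetch_url {
    my ($base, @params) = @_;
    my @pairs;
    while (my ($key, $value) = splice @params, 0, 2) {
        push @pairs, "$key=$value";
    }
    return $base . '?' . join '&', @pairs;
}

my $url = efetch_url(
    'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi',
    db      => 'gene',
    id      => 2,
    rettype => 'xml',
);
print "$url\n";
```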

Brian O.


On 2/17/06 6:02 PM, "Chris Fields"  wrote:

> Brian,
> 
> I added some sample code to the page.  See what you think.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Thursday, February 16, 2006 4:46 PM
>> To: 'Brian Osborne'
>> Cc: 'Harry Mangalam'; 'bioperl-l'
>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
>> orGeneIDs
>> 
>> If I know the start, end, and strand info for a list of features (personal
>> preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
>> up), couldn't I try pulling out the surrounding region?  My thought is
>> this,
>> though I haven't coded it yet:
>> 
>> 1)  Draw up a list of Seqfeatures, with accession, start, stop coordinates
>> (array of hashes) based off what I get from RNAMotif objects.
>> 2)  Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream
>> and downstream, one at a time, using get_Seq_by_ID().  I could add a sleep
>> in there somewhere to not tick off the NCBI curators.
>> 
>> Reason I'm interested in this is b/c I want to know where the RNA motif is
>> in context to surrounding features. If it is very close to a coding
>> region,
>> then the motif likely indicates translational regulation.  Further away
>> may
>> indicate transcriptional termination or another mechanism.
>> 
>> The files returned should have the features included as long as they are
>> in
>> the full length GenBank record.  I tried it out using the web form but not
>> through Bio::DB::GenBank yet.  If I can get it to work I'll add it to the
>> page.
>> 
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>> 
>> 
>>> -----Original Message-----
>>> From: Brian Osborne [mailto:osborne1 at optonline.net]
>>> Sent: Thursday, February 16, 2006 4:19 PM
>>> To: Chris Fields
>>> Cc: Harry Mangalam; bioperl-l
>>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
>> or
>>> GeneIDs
>>> 
>>> Chris,
>>> 
>>> Yes. The question now is where to easily get the coordinates.
>>> 
>>> Brian O.
>>> 
>>> 
>>> On 2/16/06 7:52 AM, "Chris Fields"  wrote:
>>> 
>>>> I think a method was recently implemented in Bio::DB::GenBank to
>>>> retrieve a segment of DNA given start and end coordinates in GenBank
>>>> format; that should contain the features you need.  I requested it
>>>> ~Nov-Dec in the mailing list but didn't get a chance to test it.
>>>> Would that help?
>>>> 
>>>> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
>>>> 
>>>>> Harry,
>>>>> 
>>>>> It's not clear to me that NCBI's eutils offers this capability
>>>>> directly. You
>>>>> can probably download Entrez Gene entries and parse them for
>>>>> coordinates but
>>>>> I know of no way to remotely retrieve genomic sequences like this
>>>>> from NCBI
>>>>> (ENSEMBL API perhaps?). What I had in mind uses the local approach
>>>>> that some
>>>>> of us favor and to prove to myself that this is simple to do I wrote
>> a
>>>>> script that I just added to examples/tools, it's called
>>>>> extract_genes.pl and
>>>>> it's based on Bio::DB::Fasta. Download the sequence files for a given
>>>>> species to some dir, download Entrez Gene's gene2accession file,
>>>>> and run. It
>>>>> creates and stores a hash for lookups, it won't read gene2accession
>>>>> each
>>>>> time it runs.
>>>>> 
>>>>> Brian O.
>>>>> 
>>>>> 
>>>>> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
>>>>> 
>>>>>> Hi Brian,
>>>>>> 
>>>>>> Thanks very much for the pointers and the speed of your reply and
>>>>>> apologies
>>>>>> for the speed of mine.
>>>>>> 
>>>>>> This looks good, but what I was looking for was a bioP approach
>>>>>> for hooking to
>>>>>> an API at NCBI or EBI so I could get this info and seqs from
>>>>>> them.  In this
>>>>>> case, speed of retrieval is not critical and I'd rather not
>>>>>> download the
>>>>>> entirety of the sequences to a local disk to hack at them.
>>>>>> 
>>>>>> I've determined a screen-scraping approach to get them and could
>>>>>> script that,
>>>>>> but I thought that bioP had a method for using NCBI's external
>>>>>> API's, tho it
>>>>>> may be that my memory is faulty or the approach is no longer
>>>>>> supported due to
>>>>>> overload.
>>>>>> 
>>>>>> Does NCBI make such APIs available anymore?  I searched a bit for
>>>>>> docs on them
>>>>>> but couldn't find anything (unless it's buried in the NCBI tookit,
>>>>>> which I
>>>>>> haven't started to excavate).
>>>>>> 
>>>>>> Failing that, would SEALS provide such a service? Any PerlPinipeds
>>>>>> listening?
>>>>>> 
>>>>>> Harry
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>>>>>> Harry,
>>>>>>> 
>>>>>>> Hope you're doing well. The approach could be based on
>>>>>>> Bio::DB::Fasta. So,
>>>>>>> from its documentation:
>>>>>>> 
>>>>>>>   use Bio::DB::Fasta;
>>>>>>> 
>>>>>>>   # create database from directory of fasta files
>>>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>>>> 
>>>>>>>   # simple access (for those without Bioperl)
>>>>>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>>>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>>>>>   my @ids     = $db->ids;
>>>>>>>   my $length   = $db->length('CHROMOSOME_I');
>>>>>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>>>>>   my $header   = $db->header('CHROMOSOME_I');
>>>>>>> 
>>>>>>>   # Bioperl-style access
>>>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>>>> 
>>>>>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>>>>>>   my $seq     = $obj->seq;
>>>>>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>>>>>> 
>>>>>>> Do you already have the offsets?
>>>>>>> 
>>>>>>> Brian O.
>>>>>>> 
>>>>>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>>>>>>> Hi All,
>>>>>>>> 
>>>>>>>> After perusing the tutorial and other docs for a an evening, I
>>>>>>>> still
>>>>>>>> can't find the answer to this.  Forgive me if I've missed
>> something
>>>>>>>> obvious.
>>>>>>>> 
>>>>>>>> This should not be a novel request, but I've not found it
>>>>>>>> answered.  If
>>>>>>>> bioperl isn't the best way to do this, I'd be grateful to a
>>>>>>>> pointer to a
>>>>>>>> better way, especially if it includes an illuminating bit of code.
>>>>>>>> 
>>>>>>>> The problem is to retrieve genomic sequences plus & minus some
>>>>>>>> offset
>>>>>>>> from a locus determined by HUGO keyword or GeneID.  This would be
>> a
>>>>>>>> common followup chore for some extra analysis from a gene
>>>>>>>> expression
>>>>>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed
>>>>>>>> the
>>>>>>>> sequence type to specify...?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> TIA!
>>>>> 
>>>>> 
>>>> 
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>> 
>>>> 
>>>> 
>> 
>> 
> 




From osborne1 at optonline.net  Fri Feb 17 23:56:08 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 17 Feb 2006 23:56:08 -0500
Subject: [Bioperl-l] CONTIG sequence files from the NCBI
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95030081AC@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: 

Michael,

Yes, BioPerl has done this for you. Essentially what it does is take all the
ids in the CONTIG section and query for each one individually, then use the
sequences and the location data to create the single large sequence. This
sequence is appended to the annotation and feature section of the initial
GenBank entry. If you want to study this yourself, take a look at
Bio::DB::NCBIHelper::postprocess_data.
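
The joining step can be pictured roughly as follows. This is a simplified sketch in plain Perl and not the actual Bio::DB::NCBIHelper code: the helper join_contig, the part ids and the sequences are all hypothetical, and the per-id query is replaced by a lookup in a local hash.

```perl
use strict;
use warnings;

# Hypothetical helper: build one sequence from CONTIG-style parts.
# Each part is either { id, start, end } (1-based, inclusive) or
# { gap => N }, which contributes N unknown bases.
sub join_contig {
    my ($parts, $fetch) = @_;    # $fetch: code ref, id -> sequence string
    my $joined = '';
    for my $part (@$parts) {
        if (exists $part->{gap}) {
            $joined .= 'N' x $part->{gap};
            next;
        }
        my $seq = $fetch->($part->{id});
        $joined .= substr $seq, $part->{start} - 1,
                          $part->{end} - $part->{start} + 1;
    }
    return $joined;
}

# Stand-in for the remote queries: made-up ids and sequences.
my %store = ( 'PART1.1' => 'AAACCC', 'PART2.1' => 'GGGTTT' );

my $genome = join_contig(
    [
        { id => 'PART1.1', start => 1, end => 6 },
        { gap => 2 },
        { id => 'PART2.1', start => 2, end => 4 },
    ],
    sub { $store{ $_[0] } },
);
print "$genome\n";
```

With these made-up parts the result is AAACCCNNGGT: the whole first part, two unknown bases for the gap, then bases 2 to 4 of the second part.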

OK, to answer your first question with my assumption: what NCBI is doing is
simply providing a shorthand rather than an entire large sequence, therefore
no feature coordinates change, whether it's shorthand, CONTIG, or longhand,
ORIGIN. Second, my explanation tells you that all the sequences are the very
latest versions of each sequence, that's how eutils works by default.
However, I don't think I've answered your question because I'm not sure I
understand what you mean by "when I ask bioperl if these sequences have been
updated, I will be told no". All Bioperl does is read the file provided by
GenBank and use its stated version, nothing fancy.

Brian O.


On 2/16/06 5:31 AM, "michael watson (IAH-C)" 
wrote:

> Hi
> 
> I have two questions really.  I fetched bacterial genome sequences from
> the NCBI using Bio::DB::GenBank.
> 
> Some of these sequence entries are CONTIG sequences, ie they just point
> to other sequences that need to be joined together to form the entire
> genome.
> 
> Looking at my downloads, it looks as if bioperl has done all the
> necessary joining for me - or maybe it was the NCBI that did the
> joining?
> 
> OK, so firstly, did bioperl do the joining, and if so, are all the
> co-ordinates of the features updated to reflect their new location on
> the new, joined sequence?
> 
> And secondly, sequence versions... I'm thinking that possibly the
> sequence version of the CONTIG may be 1 (as it hasn't changed) yet the
> versions of the sequences it refers to might have changed, so when I ask
> bioperl if these sequences have been updated, I will be told no because
> the CONTIG sequence version is 1, but I should be told yes because the
> underlying sequences have...?
> 
> Make sense?
> 
> Thanks
> Mick
> 




From pedro.fabre at gmail.com  Fri Feb 17 13:36:37 2006
From: pedro.fabre at gmail.com (pedro fabre)
Date: Fri, 17 Feb 2006 18:36:37 +0000
Subject: [Bioperl-l] Count or weight matrix in bioperl?
Message-ID: 

>Torsten and all,
>
>  I don't think this will work for me for it only generates 
>statistics for a single sequence.  What I need is a count matrix for 
>each position for a number of DNA sequences.  In other words, if I 
>pass there 3 sequences to this function then it returns the count 
>for each postion for each nucleotide.
>
>  For example if I pass an array of sequences say: ATC,CCC,TTT
>  then I should get a matrix back that will have count for postion 
>1,2,3 for each A,C,T, or G like this:
>
>
>                  1    2   3
>       A        1    0    0
>       C        1    1    2
>       T        1    2    1    
>       G        0    0    0
>
>  Any idea of this is already built somewhere in bioperl?
>
>  Thank you.
>
>


Sam,

What about this?

I worked on something like that some time ago for SNP calculation,
and it looks to me like you are heading the same way.

If you have sequences like

   A       C       G       T       C       C       A       -       T
   C       G       G       T       A       G       T       G       C
   C       C       C       C       C       G       T       G       C
   C       G       C       T       C       G       T       G       C

Convert the sequence to numbers (0 for the first value, 1 for the 
first modification (reading by columns), 2 for the second 
modification and so on)
Deletions can be considered as another base if you like

After that:


   0       0       0       0       0       0       0       0       0
   1       1       0       0       1       1       1       1       1
   1       0       1       1       0       1       1       1       1
   1       1       1       0       0       1       1       1       1

Once we have the haplotype converted to numbers we have to generate the
snp type information for the haplotype.


SNP code = SUM ( value * multiplicity ^ position )

     where:
       SUM is the sum of the values for the SNP
       value is the SNP number code (0 [generally for the major allele],
                                     1 [for the minor allele])
       multiplicity is the number of possible codes (2 in the example below)
       position is the position on the block.

For this example the code is:

   0       0       0       0       0       0       0       0       0
   1       1       0       0       1       1       1       1       1
   1       0       1       1       0       1       1       1       1
   1       1       1       0       0       1       1       1       1
  ------------------------------------------------------------------
   14      10      12      4       2       14      14      14      14

   14 = 0*2^0 + 1*2^1 + 1*2^2 + 1*2^3
   10 = 0*2^0 + 1*2^1 + 0*2^2 + 1*2^3
   ....
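
The column sums above can be reproduced with a few lines of plain Perl. This is a rough sketch of the formula only, not the code in HtSNP.pm; the name snp_codes and the array-of-rows layout are assumptions made for the example.

```perl
use strict;
use warnings;

# Rows are the numerically coded haplotypes from the table above;
# columns are the SNP positions.
my @haplotypes = (
    [ 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
    [ 1, 1, 0, 0, 1, 1, 1, 1, 1 ],
    [ 1, 0, 1, 1, 0, 1, 1, 1, 1 ],
    [ 1, 1, 1, 0, 0, 1, 1, 1, 1 ],
);

# code(column) = SUM over rows of value * multiplicity ** row_index
# snp_codes is a hypothetical helper written for this illustration.
sub snp_codes {
    my ( $rows, $multiplicity ) = @_;
    my @codes = (0) x @{ $rows->[0] };
    for my $row ( 0 .. $#$rows ) {
        for my $col ( 0 .. $#{ $rows->[$row] } ) {
            $codes[$col] += $rows->[$row][$col] * $multiplicity**$row;
        }
    }
    return @codes;
}

my @codes = snp_codes( \@haplotypes, 2 );
print "@codes\n";    # 14 10 12 4 2 14 14 14 14

# Keep the first occurrence of each code, dropping repeated columns.
my %seen;
my @unique = grep { !$seen{$_}++ } @codes;
print "@unique\n";    # 14 10 12 4 2
```

Columns that share a code carry the same pattern across the haplotypes, so only one of them needs to be kept.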

Once we have the families classified, we take just the non-redundant SNPs:

   14      10      12      4       2

If you want to look into the code follow this link.


http://users.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/PopGen/HtSNP.pm?rev=1.4&content-type=text/vnd.viewcvs-markup

HTH
Pedro



>  Torsten Seemann  wrote:
>Say I have an array of nucleotide sequences of length N. I want 
>to calculate the count matrix (weight matrix). That is, for each 
>position 1..N, I want to know how many As, Cs, Ts and Gs there are. 
>Is the code to do this already written in bioperl to build this 
>matrix if I pass it those strings?
>>    Please excuse my lack of knowledge as I am a newcomer to bioinformatics.
>
>Use the Bio::Tools::SeqStats module. The PDoc documentation even has an
>example similar to what you want to do:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html
>
>--Torsten Seemann
>
>
>
>
>Sincerely,
>Sam Al-Droubi, M.S.
>saldroubi at yahoo.com
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l



From cjfields at uiuc.edu  Sat Feb 18 18:35:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 18 Feb 2006 17:35:22 -0600
Subject: [Bioperl-l] Bio::SearchIO fix posted in Bugzilla
Message-ID: <97C946BE-8410-4B7F-9FA3-97A01641E20E@uiuc.edu>

Added a fix for the blastn and tblastx problems with Bio::SearchIO  
text parsing of BLAST 2.2.13 output:

http://bugzilla.open-bio.org/show_bug.cgi?id=1934

The extra lines "Features in this part of subject sequence" and the  
following descriptive lines are passed over using a loop.  See the  
bug report for specifics.
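
To illustrate the idea (this is not the patch attached to the bug report, only a sketch under the assumption that the descriptive lines always end at the next "Score =" line), skipping such a block could look like this in plain Perl. The name strip_feature_blocks is made up for the example.

```perl
use strict;
use warnings;

# Drop the "Features flanking/in this part of subject sequence:" block that
# BLAST 2.2.13 prints before an HSP, keeping everything from "Score =" on.
# strip_feature_blocks is a hypothetical helper, not a Bio::SearchIO method.
sub strip_feature_blocks {
    my @lines = @_;
    my @kept;
    my $skipping = 0;
    for my $line (@lines) {
        if ( $line =~ /^\s*Features (?:flanking|in) this part of subject sequence:/ ) {
            $skipping = 1;
            next;
        }
        # The HSP header marks the end of the descriptive lines.
        $skipping = 0 if $line =~ /^\s*Score\s*=/;
        push @kept, $line unless $skipping;
    }
    return @kept;
}

my @report = (
    " Features in this part of subject sequence:",
    "   DHHC zinc finger domain, putative",
    "",
    " Score = 34.2 bits (17),  Expect = 0.87",
    " Identities = 17/17 (100%), Gaps = 0/17 (0%)",
    " Strand=Plus/Plus",
);
print "$_\n" for strip_feature_blocks(@report);
```

Only the Score, Identities and Strand lines are printed; the Features header and its description are passed over.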

Cheers,

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From osborne1 at optonline.net  Sun Feb 19 00:47:44 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 19 Feb 2006 00:47:44 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names
 orGeneIDs
In-Reply-To: <000601c63416$2a14aa00$15327e82@pyrimidine>
Message-ID: 

Chris and Harry,

OK, I've put the missing link in place. This is Bio::DB::EntrezGene, so you
can get NCBI Genes as objects, perfectly analogous to Bio::DB::GenBank and
the related modules:

use Bio::DB::EntrezGene;

# fetch the Entrez Gene record with Gene ID 2 as a sequence object
my $db  = Bio::DB::EntrezGene->new;
my $seq = $db->get_Seq_by_id(2);

So starting with just a Gene id, then using Bio::DB::GenBank as Chris
showed, you can get the sequence. What's a little odd is how Entrez Gene
stores the positional information and sequence identifier. You might have
thought that they'd create a special set of fields for this, but no, it's
only available as part of a URL, as far as I can tell:

Bio::Annotation::DBLink=HASH()
'_root_verbose' => 0

'database' => 'Evidence Viewer'

'primary_id' => 4693

'url' => 
'http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606&contig=NT_079573.2&gene=NDP&lid=4693&from=6657835&to=6682559'
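
Since the coordinates only appear inside that URL, they have to be picked out of its query string. A minimal sketch in plain Perl follows; how the URL string is pulled from the annotation is left out, query_params is a made-up helper, and no URL decoding is attempted.

```perl
use strict;
use warnings;

# URL copied from the DBLink annotation above.
my $url = 'http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606'
        . '&contig=NT_079573.2&gene=NDP&lid=4693&from=6657835&to=6682559';

# Split the query string into key/value pairs.
# query_params is a hypothetical helper written for this illustration.
sub query_params {
    my ($link) = @_;
    my ($query) = $link =~ /\?(.*)\z/s;
    return {} unless defined $query;
    my %param;
    for my $pair ( split /&/, $query ) {
        my ( $key, $value ) = split /=/, $pair, 2;
        $param{$key} = $value;
    }
    return \%param;
}

my $p = query_params($url);
printf "%s:%d-%d (%s)\n", @{$p}{qw(contig from to gene)};
# prints: NT_079573.2:6657835-6682559 (NDP)
```

The contig accession and the from/to values are what a start/stop based fetch would need.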


Question: are NT_* sequences going to be a problem for Bio::DB::GenBank? I
see this in NCBIHelper:

# NT contigs can not be retrieved

$self->throw("NT_ contigs are whole chromosome files which are not part of regular".
"database distributions. Go to ftp://ftp.ncbi.nih.gov/genomes/.")
      if $ids =~ /NT_/;


Perhaps we can modify this so there's no throw() when a seq_start and
seq_stop are specified.

Brian O.

On 2/17/06 6:02 PM, "Chris Fields"  wrote:

> Brian,
> 
> I added some sample code to the page.  See what you think.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Thursday, February 16, 2006 4:46 PM
>> To: 'Brian Osborne'
>> Cc: 'Harry Mangalam'; 'bioperl-l'
>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
>> orGeneIDs
>> 
>> If I know the start, end, and strand info for a list of features (personal
>> preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
>> up), couldn't I try pulling out the surrounding region?  My thought is
>> this,
>> though I haven't coded it yet:
>> 
>> 1)  Draw up a list of Seqfeatures, with accession, start, stop coordinates
>> (array of hashes) based off what I get from RNAMotif objects.
>> 2)  Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream
>> and downstream, one at a time, using get_Seq_by_ID().  I could add a sleep
>> in there somewhere to not tick off the NCBI curators.
>> 
>> Reason I'm interested in this is b/c I want to know where the RNA motif is
>> in context to surrounding features. If it is very close to a coding
>> region,
>> then the motif likely indicates translational regulation.  Further away
>> may
>> indicate transcriptional termination or another mechanism.
>> 
>> The files returned should have the features included as long as they are
>> in
>> the full length GenBank record.  I tried it out using the web form but not
>> through Bio::DB::GenBank yet.  If I can get it to work I'll add it to the
>> page.
>> 
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>> 
>> 
>>> -----Original Message-----
>>> From: Brian Osborne [mailto:osborne1 at optonline.net]
>>> Sent: Thursday, February 16, 2006 4:19 PM
>>> To: Chris Fields
>>> Cc: Harry Mangalam; bioperl-l
>>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
>> or
>>> GeneIDs
>>> 
>>> Chris,
>>> 
>>> Yes. The question now is where to easily get the coordinates.
>>> 
>>> Brian O.
>>> 
>>> 
>>> On 2/16/06 7:52 AM, "Chris Fields"  wrote:
>>> 
>>>> I think a method was recently implemented in Bio::DB::GenBank to
>>>> retrieve a segment of DNA given start and end coordinates in GenBank
>>>> format; that should contain the features you need.  I requested it
>>>> ~Nov-Dec in the mailing list but didn't get a chance to test it.
>>>> Would that help?
>>>> 
>>>> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
>>>> 
>>>>> Harry,
>>>>> 
>>>>> It's not clear to me that NCBI's eutils offers this capability
>>>>> directly. You
>>>>> can probably download Entrez Gene entries and parse them for
>>>>> coordinates but
>>>>> I know of no way to remotely retrieve genomic sequences like this
>>>>> from NCBI
>>>>> (ENSEMBL API perhaps?). What I had in mind uses the local approach
>>>>> that some
>>>>> of us favor and to prove to myself that this is simple to do I wrote
>> a
>>>>> script that I just added to examples/tools, it's called
>>>>> extract_genes.pl and
>>>>> it's based on Bio::DB::Fasta. Download the sequence files for a given
>>>>> species to some dir, download Entrez Gene's gene2accession file,
>>>>> and run. It
>>>>> creates and stores a hash for lookups, it won't read gene2accession
>>>>> each
>>>>> time it runs.
>>>>> 
>>>>> Brian O.
>>>>> 
>>>>> 
>>>>> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
>>>>> 
>>>>>> Hi Brian,
>>>>>> 
>>>>>> Thanks very much for the pointers and the speed of your reply and
>>>>>> apologies
>>>>>> for the speed of mine.
>>>>>> 
>>>>>> This looks good, but what I was looking for was a bioP approach
>>>>>> for hooking to
>>>>>> an API at NCBI or EBI so I could get this info and seqs from
>>>>>> them.  In this
>>>>>> case, speed of retrieval is not critical and I'd rather not
>>>>>> download the
>>>>>> entirety of the sequences to a local disk to hack at them.
>>>>>> 
>>>>>> I've determined a screen-scraping approach to get them and could
>>>>>> script that,
>>>>>> but I thought that bioP had a method for using NCBI's external
>>>>>> API's, tho it
>>>>>> may be that my memory is faulty or the approach is no longer
>>>>>> supported due to
>>>>>> overload.
>>>>>> 
>>>>>> Does NCBI make such APIs available anymore?  I searched a bit for
>>>>>> docs on them
>>>>>> but couldn't find anything (unless it's buried in the NCBI toolkit,
>>>>>> which I
>>>>>> haven't started to excavate).
>>>>>> 
>>>>>> Failing that, would SEALS provide such a service? Any PerlPinipeds
>>>>>> listening?
>>>>>> 
>>>>>> Harry
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>>>>>> Harry,
>>>>>>> 
>>>>>>> Hope you're doing well. The approach could be based on
>>>>>>> Bio::DB::Fasta. So,
>>>>>>> from its documentation:
>>>>>>> 
>>>>>>>   use Bio::DB::Fasta;
>>>>>>> 
>>>>>>>   # create database from directory of fasta files
>>>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>>>> 
>>>>>>>   # simple access (for those without Bioperl)
>>>>>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>>>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>>>>>   my @ids     = $db->ids;
>>>>>>>   my $length   = $db->length('CHROMOSOME_I');
>>>>>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>>>>>   my $header   = $db->header('CHROMOSOME_I');
>>>>>>> 
>>>>>>>   # Bioperl-style access
>>>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>>>> 
>>>>>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>>>>>>   my $seq     = $obj->seq;
>>>>>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>>>>>> 
>>>>>>> Do you already have the offsets?
>>>>>>> 
>>>>>>> Brian O.
>>>>>>> 
>>>>>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>>>>>>> Hi All,
>>>>>>>> 
>>>>>>>> After perusing the tutorial and other docs for an evening, I
>>>>>>>> still
>>>>>>>> can't find the answer to this.  Forgive me if I've missed
>> something
>>>>>>>> obvious.
>>>>>>>> 
>>>>>>>> This should not be a novel request, but I've not found it
>>>>>>>> answered.  If
>>>>>>>> bioperl isn't the best way to do this, I'd be grateful to a
>>>>>>>> pointer to a
>>>>>>>> better way, especially if it includes an illuminating bit of code.
>>>>>>>> 
>>>>>>>> The problem is to retrieve genomic sequences plus & minus some
>>>>>>>> offset
>>>>>>>> from a locus determined by HUGO keyword or GeneID.  This would be
>> a
>>>>>>>> common followup chore for some extra analysis from a gene
>>>>>>>> expression
>>>>>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed
>>>>>>>> the
>>>>>>>> sequence type to specify...?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> TIA!
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>> 
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From maximilianh at gmail.com  Sun Feb 19 08:52:37 2006
From: maximilianh at gmail.com (Maximilian Haeussler)
Date: Sun, 19 Feb 2006 14:52:37 +0100
Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <76f031ae0602190552v5f2542dbv@mail.gmail.com>

Hi bio-mailinglists,

does anyone here know of a tool or a library to display two (or more)
sequences at the same time with coloured features? Possibly with lines,
connecting some features from one sequence to the other (synteny-plot) ?
Or to display two multiple alignments, one on top of each other, with
colored features added?

It's not that it would be difficult to write, but programming visualisation
usually takes a lot of time.
Bio::Graphics seems mainly concerned with one main sequence and features on
it. Well, I could copy together two of these gif-images, but then there
would be no connecting lines. Same applies for the graphics in Biojava or
the gff2ps tool or all the multiple alignment viewers that I know (Bioedit,
ClustalX). There is something called Toucan in Java, which displays at least
several lines of gff-style-features, but no visible sequences and more
importantly, no connecting lines. A recent software, Djinn lite, is using a
similar kind of visualization to compare different spliced genes from
various species, but it's mainly aimed at splicing and written in Visual
Basic.
I guess a good compromise might be the 3D viewer Sockeye, but I haven't seen
any synteny-lines in sockeye yet.

I guess I must have missed something here. I cannot be the first one that
would like to compare, say, two gff files, or two multiple alignments?

Thanks a lot for any idea,
Max



From lutfullah at upesh.edu  Sun Feb 19 12:01:05 2006
From: lutfullah at upesh.edu (Dr. Lutfullah)
Date: Sun, 19 Feb 2006 22:01:05 +0500
Subject: [Bioperl-l] bioperl in jail
Message-ID: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>

Hello,

I am trying to create a situation where users can ssh login to a chrooted
jailed account with limited functionality.
I created the chroot jail on my Fedora Core 4 installation using a script
available at:
http://www.fuschlberger.net/programs/ssh-scp-chroot-jail/
The script has a line:
======================
APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
/bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
/usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
/usr/libexec/openssh/sftp-server"
=======================
to which I added everything I could get with /bin/perl to make it:

APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
/bin/vi /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
/usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
/usr/libexec/openssh/sftp-server /usr/bin/perl /usr/bin/perl5
/usr/bin/perl5.8.6 /usr/bin/perldoc /usr/bin/perlbug /usr/bin/perlivp
/usr/bin/perlcc /usr/bin/foomatic-perl-data /usr/bin/find2perl"

perl becomes available inside the jail but I cannot use the line "use
Bio::Perl" inside the jail.

The script produces an error on including /usr/lib or /usr/lib/perl5:

Copying necessary library-files to jail (may take some time)
cp: omitting directory `/usr/lib'
ldd: /usr/lib: No such file or directory
Copying files from /etc/pam.d/ to jail
Copying PAM-Modules to jail

In the jailed account the little test program:

use Bio::Perl;
print 2+4;

generated this error:

Can't locate Bio/Perl.pm in @INC (@INC contains:
/usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.4/i386-linux-thread
............................................

Any help would be much appreciated. Thanks in advance.

LK



From boris.steipe at utoronto.ca  Sun Feb 19 17:34:52 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Sun, 19 Feb 2006 17:34:52 -0500
Subject: [Bioperl-l] bioperl in jail
In-Reply-To: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>
References: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>
Message-ID: 

The path that perl uses internally to search its modules (@INC) is  
not the same thing as the path your shell uses. You have to modify  
@INC either within running scripts, or by setting the PERL5LIB  
environment variable upon login.

e.g. see http://modperlbook.org/html/ch03_09.html
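
As a small illustration of the first option, a script can extend @INC itself and then check whether Bio/Perl.pm is visible. The directory /opt/bioperl/lib is only a placeholder; inside a chroot jail the module files also have to exist under the jail root, since nothing outside it can be seen. The helper find_module is made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical location of the BioPerl modules inside the jail; adjust it
# to wherever the Bio/ directory was actually copied.
use lib '/opt/bioperl/lib';

# Return the full path of a module file if any @INC directory holds it.
# find_module is a hypothetical helper written for this illustration.
sub find_module {
    my ($relpath) = @_;
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks
        return "$dir/$relpath" if -e "$dir/$relpath";
    }
    return;
}

# Show the search path perl will really use, then look for Bio::Perl.
print "$_\n" for grep { !ref } @INC;
my $found = find_module('Bio/Perl.pm');
print defined $found
    ? "found $found\n"
    : "Bio/Perl.pm is not in \@INC\n";
```

Setting PERL5LIB to the same directory at login has the same effect as the "use lib" line, without editing each script.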

HTH,
B.



On 19 Feb 2006, at 12:01, Dr. Lutfullah wrote:

> Hello,
>
> I am trying to create a situation where users can ssh login to a  
> chrooted
> jailed account with limited functionality.
> I created the chroot jail on my Fedora Core 4 installation using a  
> script
> available at:
> http://www.fuschlberger.net/programs/ssh-scp-chroot-jail/
> The script has a line:
> ======================
> APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
> /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
> /usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
> /usr/libexec/openssh/sftp-server"
> =======================
> to which I added everything I could get with /bin/perl to make it:
>
> APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
> /bin/vi /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
> /usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
> /usr/libexec/openssh/sftp-server /usr/bin/perl /usr/bin/perl5
> /usr/bin/perl5.8.6 /usr/bin/perldoc /usr/bin/perlbug /usr/bin/perlivp
> /usr/bin/perlcc /usr/bin/foomatic-perl-data /usr/bin/find2perl"
>
> perl becomes available inside the jail but I cannot use the line "use
> Bio::Perl" inside the jail.
>
> The script produces an error on including /usr/lib or /usr/lib/perl5:
>
> Copying necessary library-files to jail (may take some time)
> cp: omitting directory `/usr/lib'
> ldd: /usr/lib: No such file or directory
> Copying files from /etc/pam.d/ to jail
> Copying PAM-Modules to jail
>
> In the jailed account the little test program:
>
> use Bio::Perl;
> print 2+4;
>
> generated this error:
>
> Can't locate Bio/Perl.pm in @INC (@INC contains:
> /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread
> ............................................
>
> Any help would be much appreciated. Thanks in advance.
>
> LK
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From khoueiry at ibdm.univ-mrs.fr  Mon Feb 20 04:27:07 2006
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Mon, 20 Feb 2006 10:27:07 +0100
Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence
In-Reply-To: <76f031ae0602190552v5f2542dbv@mail.gmail.com>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<76f031ae0602190552v5f2542dbv@mail.gmail.com>
Message-ID: <1140427628.10569.10.camel@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 

From shameer at ncbs.res.in  Mon Feb 20 01:21:01 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon, 20 Feb 2006 11:51:01 +0530 (IST)
Subject: [Bioperl-l] Matrix Average Code / Module ?
In-Reply-To: <76f031ae0602190552v5f2542dbv@mail.gmail.com>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<76f031ae0602190552v5f2542dbv@mail.gmail.com>
Message-ID: <59825.192.168.1.176.1140416461.squirrel@192.168.1.176>

Hi all,
Is there any program/module to calculate the average of a BLOSUM/PAM or any
other matrix ?

I have a matrix and I need to see the average

for example

11 22 43 54 50
27 87 74 32 10
66 58 98 78 20
22 23 44 16 34

I have gone through Bio::Matrix::MatrixI and Bio::Matrix::GenericMatrix
and other perl modules like Math::Matrix
http://search.cpan.org/~ulpfr/Math-Matrix-0.4/Matrix.pm
and Math::Cephes::Matrix - but none of them has a provision to do a matrix 
average calculation.
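
For a matrix held as a plain array of rows, the average needs nothing beyond List::Util from the Perl core. The sketch below assumes "average" means the arithmetic mean over all cells (row means are shown as well); matrix_mean and row_means are names made up for the example.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# The example matrix from this message, one array reference per row.
my @matrix = (
    [ 11, 22, 43, 54, 50 ],
    [ 27, 87, 74, 32, 10 ],
    [ 66, 58, 98, 78, 20 ],
    [ 22, 23, 44, 16, 34 ],
);

# Mean over every cell of the matrix.
sub matrix_mean {
    my @rows  = @_;
    my @cells = map { @$_ } @rows;
    return sum(@cells) / @cells;
}

# Mean of each row separately.
sub row_means {
    return map { sum(@$_) / @$_ } @_;
}

printf "overall mean: %.2f\n", matrix_mean(@matrix);
printf "row means: %s\n", join ' ', row_means(@matrix);
```

For the matrix above the cells add up to 869 over 20 entries, so the overall mean is 43.45.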

Any help ???
thanks in advance,
Happy biocomputing !!!


-- 
Shameer Khadar
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241
F - 91-080-23636662/23636675
W - http://www.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."
MM




From cjfields at uiuc.edu  Mon Feb 20 12:01:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 20 Feb 2006 11:01:26 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
	version 1.28
In-Reply-To: <43F449E1.80605@esat.kuleuven.be>
Message-ID: <000e01c6363f$494bc5e0$15327e82@pyrimidine>

I have added a preliminary bugfix for the problems seen with nucleotide
blast parsing for BLAST 2.2.13 reports.  I passed SearchIO::blast through
perltidy to space out the blocks (really for my own purposes; it's a pretty
complex module).  The fix bypasses the extra lines output for blastn and
tblastx and now seems to parse the text output for those reports correctly.
I tested it using all NCBI BLAST flavors for the last two versions of BLAST
(2.2.12 and 2.2.13); the fix was simple so shouldn't break any other BLAST
report parsing, such as WU-BLAST, RPS-BLAST, or Paracel.  It has only been
tested on MacOSX at the moment, so I need people out there to test it out on
anything they can to make sure it works before committing.  I'll be trying
it on Windows today.  Report back to me and I'll post anything on bugzilla.

Here it is:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> Sent: Thursday, February 16, 2006 3:46 AM
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at lists.open-bio.org; Chris Fields
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
> version 1.28
> 
> Hi,
> 
> I have the same problem with the blast.pm-file.
> The people of NCBI added some extra info when giving the Blast-output.
> (see e.g. "Features flanking this part..." or "Features in this part
> ..."), example added.
> The blast.pm module starts looking for the hsp-alignement-information,
> but it dies when it hits this Feature-information.
> 
> Pieter
> 
> 
> >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> chromosome 12, complete sequence
> Length=27492551
> 
>  Features flanking this part of subject sequence:
> 
> 3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
>  &from=19251479&to=19253693&view=gbwithparts>
> 
> 2655 bp at 3' side: hypothetical protein
>  &from=19260091&to=19260600&view=gbwithparts>
> 
>  Score = 36.2 bits (18),  Expect = 0.22
>  Identities = 18/18 (100%), Gaps = 0/18 (0%)
>  Strand=Plus/Minus
> 
> Query  4         GTACTACTCTACTCTACT  21
>                  ||||||||||||||||||
> 
> Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> 
> 
>  Features flanking this part of subject sequence:
> 
> 2991 bp at 5' side: hypothetical protein
>  &from=27003164&to=27003907&view=gbwithparts>
>    1131 bp at 3' side: hypothetical protein
> 
>  &from=27008046&to=27010752&view=gbwithparts>
> 
>  Score = 36.2 bits (18),  Expect = 0.22
>  Identities = 18/18 (100%), Gaps = 0/18 (0%)
>  Strand=Plus/Minus
> 
> Query  2         ATGTACTACTCTACTCTA  19
>                  ||||||||||||||||||
> Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> 
> 
> 
>  Features in this part of subject sequence:
>    DHHC zinc finger domain, putative
> 
>  &from=17614825&to=17618687&view=gbwithparts>
> 
>  Score = 34.2 bits (17),  Expect = 0.87
>  Identities = 17/17 (100%), Gaps = 0/17 (0%)
>  Strand=Plus/Plus
> 
> Query  5         TACTACTCTACTCTACT  21
>                  |||||||||||||||||
> Sbjct  17616437  TACTACTCTACTCTACT  17616453
> 
> 
> 
>  Features flanking this part of subject sequence:
>    102 bp at 5' side: bZIP transcription factor, putative
> 
>  &from=2774964&to=2775778&view=gbwithparts>
>    3740 bp at 3' side: yeast dcp1, putative
>  &from=2779635&to=2782508&view=gbwithparts>
> 
>  Score = 32.2 bits (16),  Expect = 3.4
>  Identities = 16/16 (100%), Gaps = 0/16 (0%)
>  Strand=Plus/Plus
> 
> Query  7        CTACTCTACTCTACTC  22
>                 ||||||||||||||||
> Sbjct  2775880  CTACTCTACTCTACTC  2775895
> 
> 
>  Features flanking this part of subject sequence:
> 
>    21 bp at 5' side: peptide transporter T17F3.11, putative
>  &from=27321354&to=27323117&view=gbwithparts>
> 
> 10230 bp at 3' side: transposon protein, putative, unclassified
>  &from=27333383&to=27334285&view=gbwithparts>
> 
>  Score = 32.2 bits (16),  Expect = 3.4
>  Identities = 16/16 (100%), Gaps = 0/16 (0%)
>  Strand=Plus/Minus
> 
> Query  7         CTACTCTACTCTACTC  22
> 
>                  ||||||||||||||||
> Sbjct  27323153  CTACTCTACTCTACTC  27323138
> 
> 
> 
> 
> Guojun Yang wrote:
> 
> >Hi, Chris,
> >Finally the remoteblast test script works for the amino.fa query. but
> when I try a nucleic acid sequence (see below), Error occurs:
> >"
> >waiting........
> >------------- EXCEPTION  -------------
> >MSG: no data for midline  Features flanking this part of subject sequence:
> >STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> >STACK toplevel remoteblast_test:40
> >"
> >The query sequence is:
> >CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> >GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> >AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> >AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> >
> >The script (basically same as the remoteblast test, I only changed
> database to 'nr' and program to 'blastn' and filename to 'ost3'):
> >#!/usr/bin/perl
> >
> >use Bio::SeqIO;
> >use Bio::Seq;
> >use Bio::Tools::Run::RemoteBlast;
> >use Bio::SearchIO;
> >use strict;
> >my $prog='blastn';
> >my $db='nr';
> >my $e_val=1e-10;
> >my @params=( -prog=>$prog,
> >	-data=>$db,
> >	-expect=>$e_val,
> >	-readmethod=>'SearchIO');
> >my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >
> >my $v = 1;
> >
> >my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> >
> >while (my $input = $str->next_seq()){
> >  #Blast a sequence against a database:
> >  #Alternatively, you could  pass in a file with many
> >  #sequences rather than loop through sequence one at a time
> >  #Remove the loop starting 'while (my $input = $str->next_seq())'
> >  #and swap the two lines below for an example of that.
> >  my $r = $factory->submit_blast($input);
> >  #my $r = $factory->submit_blast('amino.fa');
> >  print STDERR "waiting..." if( $v > 0 );
> >  while ( my @rids = $factory->each_rid ) {
> >    foreach my $rid ( @rids ) {
> >      my $rc = $factory->retrieve_blast($rid);
> >      if( !ref($rc) ) {
> >        if( $rc < 0 ) {
> >          $factory->remove_rid($rid);
> >        }
> >        print STDERR "." if ( $v > 0 );
> >        sleep 5;
> >      } else {
> >        my $result = $rc->next_result();
> >        #save the output
> >        my $filename = $result->query_name()."\.out";
> >        $factory->save_output($filename);
> >        $factory->remove_rid($rid);
> >        print "\nQuery Name: ", $result->query_name(), "\n";
> >        while ( my $hit = $result->next_hit ) {
> >          next unless ( $v > 0);
> >          print "\thit name is ", $hit->name, "\n";
> >          while( my $hsp = $hit->next_hsp ) {
> >            print "\t\tscore is ", $hsp->score, "\n";
> >          }
> >        }
> >      }
> >    }
> >  }
> >}
> >
> >
> >Do you think there might still be something in the NCBI output format?
> >
> >Thank you,
> >Guojun
> >
> >
> >
> >
> >Guojun Yang
> >Department of Plant Biology
> >University of Georgia
> >Tel: 706-542-1857
> >Fax: 706-542-1805
> >http://www.arches.uga.edu/~guojun
> >
> >
> >
> >----- Original Message -----
> >From: Chris Fields [mailto:cjfields at uiuc.edu]
> >To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> >Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >
> >
> >
> >
> >>Sorry, forgot to add that I didn't see the regex issue that you
> mentioned.
> >>It could be a perl-related issue.  Try the fixes I mentioned and see
> what
> >>happens.
> >>
> >>
> >>>Christopher Fields
> >>>
> >>>
> >>Postdoctoral Researcher - Switzer Lab
> >>Dept. of Biochemistry
> >>University of Illinois Urbana-Champaign
> >>
> >>
> >>>>>-----Original Message-----
> >>>>>
> >>>>>
> >>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>Sent: Tuesday, February 14, 2006 12:36 PM
> >>>To: 'gyang at plantbio.uga.edu'
> >>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >>>
> >>>
> >>>>>It's a good habit to always add single quotes around words.  The perl
> >>>>>
> >>>>>
> >>>interpreter may think a single bare word is a subroutine or perlfunc
> >>>called with no args so will try to find a subroutine named blastp().
> My
> >>>debugger actually gives the error that the bare word blastp may
> conflict
> >>>with a future reserved word.  Like you said, 'use strict' will point
> that
> >>>out.
> >>>
> >>>
> >>>>>As for the regex, it should match all the blast programs at NCBI
> (blastp,
> >>>>>
> >>>>>
> >>>blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
> >>>else passes through.
> >>>
> >>>
> >>>>>So, if you are using the script below, there are several errors.  The
> bare
> >>>>>
> >>>>>
> >>>words for $prog and $db need quotes, and the flags for your @params
> array
> >>>don't have a dash before them.  I get this after adding quotes but
> before
> >>>adding the dashes to @params:
> >>>
> >>>
> >>>>>C:\Perl\Scripts>test_blast.pl
> >>>>>------------- EXCEPTION: Bio::Root::Exception -------------
> >>>>>
> >>>>>
> >>>MSG:
> >>>STACK: Error::throw
> >>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-
> >>>live/Bio/Root/Root.pm:328
> >>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> >>>C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> >>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-
> >>>live/Bio/Tools/Run/RemoteBlast.pm:256
> >>>STACK: C:\Perl\Scripts\test_blast.pl:15
> >>>-----------------------------------------------------------
> >>>
> >>>
> >>>>>The last line indicates a problem with this line:
> >>>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >>>>>Changing the @params to this:
> >>>>>my @params=( -prog=>$prog,
> >>>>>
> >>>>>
> >>>	-data=>$db,
> >>>	-expect=>$e_val,
> >>>	-readmethod=>'SearchIO');
> >>>
> >>>
> >>>>>fixes it, and I get output as expected.
> >>>>>Christopher Fields
> >>>>>
> >>>>>
> >>>Postdoctoral Researcher - Switzer Lab
> >>>Dept. of Biochemistry
> >>>University of Illinois Urbana-Champaign
> >>>
> >>>
> >>>>>>>>-----Original Message-----
> >>>>>>>>
> >>>>>>>>
> >>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> >>>>Sent: Tuesday, February 14, 2006 11:48 AM
> >>>>To: Chris Fields; bioperl-l at lists.open-bio.org
> >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >>>>
> >>>>Hi, Chris,
> >>>>When I tried with the perldoc script, It did not work either. First it
> >>>>says $prog can not be bare word if I "use strict". I added quotes on
> the
> >>>>words, then it says the value for $prog does not match expression
> >>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The
> >>>>
> >>>>
> >>>script
> >>>
> >>>
> >>>>is shown below. Why is the expression "t?blast[pnx]"?
> >>>>
> >>>>#!/usr/bin/perl
> >>>>
> >>>>use Bio::SeqIO;
> >>>>use Bio::Seq;
> >>>>use Bio::Tools::Run::RemoteBlast;
> >>>>use Bio::SearchIO;
> >>>>
> >>>>
> >>>>my $prog=blastp;
> >>>>my $db=swissprot;
> >>>>my $e_val=1e-10;
> >>>>my @params=( prog=>$prog,
> >>>>	data=>$db,
> >>>>	expect=>$e_val,
> >>>>	readmethod=>'SearchIO');
> >>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >>>>
> >>>>my $v = 1;
> >>>>
> >>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> >>>>
> >>>>while (my $input = $str->next_seq()){
> >>>>  #Blast a sequence against a database:
> >>>>  #Alternatively, you could  pass in a file with many
> >>>>  #sequences rather than loop through sequence one at a time
> >>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> >>>>  #and swap the two lines below for an example of that.
> >>>>  my $r = $factory->submit_blast($input);
> >>>>  #my $r = $factory->submit_blast('amino.fa');
> >>>>  print STDERR "waiting..." if( $v > 0 );
> >>>>  while ( my @rids = $factory->each_rid ) {
> >>>>    foreach my $rid ( @rids ) {
> >>>>      my $rc = $factory->retrieve_blast($rid);
> >>>>      if( !ref($rc) ) {
> >>>>        if( $rc < 0 ) {
> >>>>          $factory->remove_rid($rid);
> >>>>        }
> >>>>        print STDERR "." if ( $v > 0 );
> >>>>        sleep 5;
> >>>>      } else {
> >>>>        my $result = $rc->next_result();
> >>>>        #save the output
> >>>>        my $filename = $result->query_name()."\.out";
> >>>>        $factory->save_output($filename);
> >>>>        $factory->remove_rid($rid);
> >>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>        while ( my $hit = $result->next_hit ) {
> >>>>          next unless ( $v > 0);
> >>>>          print "\thit name is ", $hit->name, "\n";
> >>>>          while( my $hsp = $hit->next_hsp ) {
> >>>>            print "\t\tscore is ", $hsp->score, "\n";
> >>>>          }
> >>>>        }
> >>>>      }
> >>>>    }
> >>>>  }
> >>>>}
> >>>>
> >>>>Thank you for your help!
> >>>>
> >>>>
> >>>>Guojun
> >>>>Department of Plant Biology
> >>>>University of Georgia
> >>>>
> >>>>----- Original Message -----
> >>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>To: gyang at plantbio.uga.edu
> >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>Try two things:
> >>>>>
> >>>>>
> >>>>>>1)  Use a much simpler script, like the one in 'perldoc
> >>>>>>
> >>>>>>
> >>>>>Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something
> >>>>>
> >>>>>
> >>>>wrong
> >>>>
> >>>>
> >>>>>with the logic in your subroutine:
> >>>>>
> >>>>>
> >>>>>>my $v = 1;
> >>>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> >>>>>>while (my $input = $str->next_seq()){
> >>>>>>
> >>>>>>
> >>>>>  #Blast a sequence against a database:
> >>>>>  #Alternatively, you could  pass in a file with many
> >>>>>  #sequences rather than loop through sequence one at a time
> >>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> >>>>>  #and swap the two lines below for an example of that.
> >>>>>  my $r = $factory->submit_blast($input);
> >>>>>  #my $r = $factory->submit_blast('amino.fa');
> >>>>>  print STDERR "waiting..." if( $v > 0 );
> >>>>>  while ( my @rids = $factory->each_rid ) {
> >>>>>    foreach my $rid ( @rids ) {
> >>>>>      my $rc = $factory->retrieve_blast($rid);
> >>>>>      if( !ref($rc) ) {
> >>>>>        if( $rc < 0 ) {
> >>>>>          $factory->remove_rid($rid);
> >>>>>        }
> >>>>>        print STDERR "." if ( $v > 0 );
> >>>>>        sleep 5;
> >>>>>      } else {
> >>>>>        my $result = $rc->next_result();
> >>>>>        #save the output
> >>>>>        my $filename = $result->query_name()."\.out";
> >>>>>        $factory->save_output($filename);
> >>>>>        $factory->remove_rid($rid);
> >>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>>        while ( my $hit = $result->next_hit ) {
> >>>>>          next unless ( $v > 0);
> >>>>>          print "\thit name is ", $hit->name, "\n";
> >>>>>          while( my $hsp = $hit->next_hsp ) {
> >>>>>            print "\t\tscore is ", $hsp->score, "\n";
> >>>>>          }
> >>>>>        }
> >>>>>      }
> >>>>>    }
> >>>>>  }
> >>>>>}
> >>>>>
> >>>>>
> >>>>>>2) Try the RemoteBlast from Bugzilla and see if that works.  It
> >>>>>>
> >>>>>>
> >>>really
> >>>
> >>>
> >>>>>shouldn't make that much of a difference, but I noticed that the CVS
> >>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> >>>>>released; the Bugzilla version is based off CVS.
> >>>>>
> >>>>>
> >>>>>>Christopher Fields
> >>>>>>
> >>>>>>
> >>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>Dept. of Biochemistry
> >>>>>University of Illinois Urbana-Champaign
> >>>>>
> >>>>>
> >>>>>>>-----Original Message-----
> >>>>>>>
> >>>>>>>
> >>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> >>>>>>Sent: Monday, February 13, 2006 3:00 PM
> >>>>>>To: bioperl-l at lists.open-bio.org
> >>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>>>
> >>>>>>
> >>>>>>>>Thanks, Chris,
> >>>>>>>>
> >>>>>>>>
> >>>>>>I installed version 1.5.1 and replaced the blast.pm file with the
> >>>>>>
> >>>>>>
> >>>one
> >>>
> >>>
> >>>>from
> >>>>
> >>>>
> >>>>>>your bug report. The running version is 1.5 when I use the command
> >>>>>>
> >>>>>>
> >>>you
> >>>
> >>>
> >>>>>>sent me. But when I tried the script, it doesn't change much. My
> >>>>>>remoteblast code (portion) is here:
> >>>>>>
> >>>>>>
> >>>>>>>>sub search {
> >>>>>>>>
> >>>>>>>>
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> >>>>>>local
> >>>>>>
> >>>>>>
> >>>>>>
> >>>$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}=
> >>>
> >>>
> >>>>>>'no';
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> >>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> >>>>>>			      -id=>"query",
> >>>>>>			      -desc=>"new seq");
> >>>>>>my $len=$query->length();
> >>>>>>@db=('nr','htgs','wgs');
> >>>>>>foreach my $db (@db) {
> >>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> >>>>>>						'-data' =>"$db",
> >>>>>>
> >>>>>>
> >>>>>>
> >>'-expect'=>"$E_value");
> >>
> >>
> >>>>>>>>>>my $blast_report = $factory->submit_blast($query);
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>my @rids = $factory->each_rid();
> >>>>>>>>
> >>>>>>>>
> >>>>>>foreach my $rid ( @rids ) {
> >>>>>>    print STDERR "$rid\n";
> >>>>>>}
> >>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> >>>>>>print STDERR "waiting...";
> >>>>>>sleep 60;
> >>>>>>
> >>>>>>
> >>>>>>>>foreach my $rid ( @rids ) {
> >>>>>>>>
> >>>>>>>>
> >>>>>>    my $rc = $factory->retrieve_blast($rid);
> >>>>>>    while (!ref($rc) ) {
> >>>>>>	if( $rc < 0 ) {
> >>>>>># retrieve_blast returns -1 on error
> >>>>>>	    $factory->remove_rid($rid);
> >>>>>>	    print "Error!\n";
> >>>>>>	    send_error($email,$function,$seqname,$queryname[$ST]);
> >>>>>>	    die "Can't retrieve $rid";
> >>>>>>	} if ($rc==0) { # retrieve_blast returns 0 on 'job not
> >>>>>>
> >>>>>>
> >>>finished'
> >>>
> >>>
> >>>>>>	    sleep 60;
> >>>>>>	    $rc = $factory->retrieve_blast($rid);
> >>>>>>	}
> >>>>>>    }
> >>>>>>    if (ref($rc)) {
> >>>>>>	print STDERR "Done.\n";
> >>>>>>	 while( my $result = $rc->next_result) {
> >>>>>>	    while( my $hit = $result->next_hit()) {
> >>>>>>	    	$hit_name=$hit->name;
> >>>>>>		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> >>>>>>		$name=$1;
> >>>>>>		@left_plus_start=();
> >>>>>>		@left_plus_end=();
> >>>>>>		@left_minus_start=();
> >>>>>>		@left_minus_end=();
> >>>>>>		@right_plus_start=();
> >>>>>>		@right_plus_end=();
> >>>>>>		@right_minus_start=();
> >>>>>>		@right_minus_end=();
> >>>>>>
> >>>>>>
> >>>>>>>>		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> >>>>>>>>
> >>>>>>>>
> >>>>>>		while( my $hsp = $hit->next_hsp()) {
> >>>>>>......
> >>>>>>
> >>>>>>
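The retrieval loop above waits in fixed 60-second steps with no upper bound, so a job that never completes keeps the CGI hanging. Below is a rough sketch of a bounded polling loop. It assumes only the return convention stated in the comments above (-1 on error, 0 while the job is unfinished, an object when done). MockFactory and wait_for_rid are hypothetical names used so the sketch runs without a network connection; they are not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the remote factory: answers "not finished" (0)
# a fixed number of times, then returns a reference as the finished report.
package MockFactory;

sub new {
    my ( $class, $pending ) = @_;
    return bless { pending => $pending }, $class;
}

sub retrieve_blast {
    my ( $self, $rid ) = @_;
    return -1 if $rid eq 'BAD';             # simulated error
    return 0  if $self->{pending}-- > 0;    # job still running
    return { rid => $rid };                 # finished report (a reference)
}

package main;

# Poll one RID until it finishes, fails, or the attempt budget is used up.
sub wait_for_rid {
    my ( $factory, $rid, $max_tries, $delay ) = @_;
    for my $try ( 1 .. $max_tries ) {
        my $rc = $factory->retrieve_blast($rid);
        return $rc if ref $rc;                     # finished
        die "Can't retrieve $rid\n" if $rc < 0;    # error reported
        sleep $delay if $delay;                    # not finished yet
    }
    return;                                        # gave up: timed out
}

my $report = wait_for_rid( MockFactory->new(2), 'RID123', 5, 0 );
print defined $report ? "got report for $report->{rid}\n" : "timed out\n";
```

With a real factory the same wait_for_rid call would take the RemoteBlast object and a non-zero delay; the cap on attempts is what keeps a stalled job from blocking the caller indefinitely.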
> >>>>>>>>It was working quite well until around October last year, but
> >>>>>>>>
> >>>>>>>>
> >>>>it has
> >>>>
> >>>>
> >>>>>>stopped since then, When a submission is sent via a webpage, the cgi
> >>>>>>starts to work and use a memory of ~20 Mb. Then it hangs there,
> >>>>>>
> >>>>>>
> >>>>finally
> >>>>
> >>>>
> >>>>>>the expected email is received but without real results although it
> >>>>>>
> >>>>>>
> >>>>does
> >>>>
> >>>>
> >>>>>>contain something from other parts of the script. Apparently the
> >>>>>>
> >>>>>>
> >>>>search
> >>>>
> >>>>
> >>>>>>sub did not return anything (I know there is something that should be
> >>>>>>returned.). Is it also possible the format of the NCBI output for
> >>>>>>
> >>>>>>
> >>>each
> >>>
> >>>
> >>>>>>result has changed?
> >>>>>>Thank you,
> >>>>>>Guojun
> >>>>>>
> >>>>>>
> >>>>>>>>>>Department of Plant Biology
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>University of Georgia
> >>>>>>
> >>>>>>
> >>>>>>>>>>>>----- Original Message -----
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> >>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>>>
> >>>>>>
> >>>>>>>>>>>How do you know two versions are installed (i.e. how are
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>you
> >>>
> >>>
> >>>>checking
> >>>>
> >>>>
> >>>>>>the
> >>>>>>
> >>>>>>
> >>>>>>>version)?  Do you have two complete bioperl distributions (in
> >>>>>>>
> >>>>>>>
> >>>>two
> >>>>
> >>>>
> >>>>>>>separate directories) or are you looking in modules?  Here's the
> >>>>>>>
> >>>>>>>
> >>>way
> >>>
> >>>
> >>>>to
> >>>>
> >>>>
> >>>>>>>check the version (from the FAQ):
> >>>>>>>
> >>>>>>>
> >>>>>>>>perl -MBio::Root::Version -e 'print
> >>>>>>>>
> >>>>>>>>
> >>>>$Bio::Root::Version::VERSION,"\n"'
> >>>>
> >>>>
> >>>>>>>>If you have two full bioperl distributions on your computer,
> >>>>>>>>
> >>>>>>>>
> >>>>normally
> >>>>
> >>>>
> >>>>>>only
> >>>>>>
> >>>>>>
> >>>>>>>one will be in use unless you have explicitly set the environment
> >>>>>>>
> >>>>>>>
> >>>>>>variable
> >>>>>>
> >>>>>>
> >>>>>>>PERL5LIB.  The PERL5LIB  directories will be searched first before
> >>>>>>>
> >>>>>>>
> >>>>your
> >>>>
> >>>>
> >>>>>>>normal perl directory list (@INC) is searched.  You MAY get some
> >>>>>>>
> >>>>>>>
> >>>>mixing
> >>>>
> >>>>
> >>>>>>>then, but only if perl can't find a particular module in the path
> >>>>>>>
> >>>>>>>
> >>>>>>designated
> >>>>>>
> >>>>>>
> >>>>>>>in PERL5LIB; then it will progress through the directories listed
> >>>>>>>
> >>>>>>>
> >>>in
> >>>
> >>>
> >>>>>>@INC.
> >>>>>>
> >>>>>>
> >>>>>>>This may happen if a module is unique to a particular release, but
> >>>>>>>
> >>>>>>>
> >>>>>>shouldn't
> >>>>>>
> >>>>>>
> >>>>>>>happen for the majority of modules, including RemoteBlast.  You
> >>>>>>>
> >>>>>>>
> >>>can
> >>>
> >>>
> >>>>>>check
> >>>>>>
> >>>>>>
> >>>>>>>what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will
> >>>>>>>
> >>>>>>>
> >>>>differ
> >>>>
> >>>>
> >>>>>>>depending on your OS, perl build, etc.
> >>>>>>>
> >>>>>>>
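To see which copy of a module perl actually picked up, the %INC hash records the file each loaded module came from. The sketch below uses the core module File::Spec only as a stand-in so it runs anywhere; substituting Bio::Tools::Run::RemoteBlast is the assumed real use. The helper name module_path is made up for illustration.

```perl
use strict;
use warnings;
use File::Spec;    # core module, used here only as a stand-in

# Return the file a loaded module came from, or undef if it is not loaded.
# %INC is keyed by relative path, e.g. File::Spec -> File/Spec.pm.
sub module_path {
    my ($module) = @_;
    ( my $key = "$module.pm" ) =~ s{::}{/}g;
    return $INC{$key};
}

# Directories are searched in this order; PERL5LIB entries come first.
print "Search path:\n";
print "  $_\n" for @INC;

print "File::Spec loaded from: ", module_path('File::Spec'), "\n";
```

If the path printed for the BioPerl module points into the older distribution, that is a sign the two installations are being mixed.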
> >>>>>>>>Regardless, if you follow the directions for installing bioperl
> >>>>>>>>
> >>>>>>>>
> >>>>for
> >>>>
> >>>>
> >>>>>>your
> >>>>>>
> >>>>>>
> >>>>>>>system ('perl Makefile.PL', 'make', 'make test', 'make install',
> >>>>>>>
> >>>>>>>
> >>>>unless
> >>>>
> >>>>
> >>>>>>you
> >>>>>>
> >>>>>>
> >>>>>>>explicitly change the installation directory when using 'perl
> >>>>>>>
> >>>>>>>
> >>>>>>Makefile.PL'),
> >>>>>>
> >>>>>>
> >>>>>>>then 'uninstalling' Bioperl shouldn't be a problem as it will
> >>>>>>>
> >>>>>>>
> >>>>install
> >>>>
> >>>>
> >>>>>>the
> >>>>>>
> >>>>>>
> >>>>>>>Bioperl distribution you downloaded over the old version in @INC.
> >>>>>>>
> >>>>>>>
> >>>>See
> >>>>
> >>>>
> >>>>>>this
> >>>>>>
> >>>>>>
> >>>>>>>page:
> >>>>>>>
> >>>>>>>
> >>>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> >>>>>>>>for more details.
> >>>>>>>>Christopher Fields
> >>>>>>>>
> >>>>>>>>
> >>>>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>>>Dept. of Biochemistry
> >>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>
> >>>>>>>
> >>>>>>>>>>-----Original Message-----
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> >>>>>>>>Sent: Monday, February 13, 2006 12:32 PM
> >>>>>>>>To: bioperl-l at lists.open-bio.org
> >>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>Hi, Chris,
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>I do have different versions of bioperl on my Linux machine
> >>>>>>>>
> >>>>>>>>
> >>>(1.4.
> >>>
> >>>
> >>>>and
> >>>>
> >>>>
> >>>>>>>>1.5.0), this may be the problem. Should I just install bioperl-
> >>>>>>>>
> >>>>>>>>
> >>>>1.5.1
> >>>>
> >>>>
> >>>>>>or I
> >>>>>>
> >>>>>>
> >>>>>>>>need to uninstall and remove the previous versions. I could not
> >>>>>>>>
> >>>>>>>>
> >>>>find
> >>>>
> >>>>
> >>>>>>any
> >>>>>>
> >>>>>>
> >>>>>>>>hint on uninstalling bioperl on linux. Could you please give me
> >>>>>>>>
> >>>>>>>>
> >>>>some
> >>>>
> >>>>
> >>>>>>>>suggestion?
> >>>>>>>>Thanks,
> >>>>>>>>Guojun
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>Department of Plant Biology
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>University of Georgia
> >>>>>>>>      _____
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>  From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> >>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500
> >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> >>>>>>>>
> >>>>>>>>
> >>>>>>version
> >>>>>>
> >>>>>>
> >>>>>>>>1.28
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>>>If you're using RemoteBlast 1.28, then you've likely
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>updated from CVS
> >>>>>>
> >>>>>>
> >>>>>>>>which isn't the latest fix.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>Make sure that you check the following:
> >>>>>>>>>>1) Always post to the mailing list:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>(CVS)
> >>>>
> >>>>
> >>>>>>>>installed first.  Perform a clean installation; do not upgrade
> >>>>>>>>
> >>>>>>>>
> >>>>only
> >>>>
> >>>>
> >>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we
> >>>>>>>>
> >>>>>>>>
> >>>can't
> >>>
> >>>
> >>>>>>>>guarantee that mixing modules from old and new distributions
> >>>>>>>>
> >>>>>>>>
> >>>(1.4
> >>>
> >>>
> >>>>and
> >>>>
> >>>>
> >>>>>>>>1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> >>>>>>>>installation will allow text output from BLAST v.2.2.12 to be
> >>>>>>>>
> >>>>>>>>
> >>>>saved
> >>>>
> >>>>
> >>>>>>and
> >>>>>>
> >>>>>>
> >>>>>>>>parsed; it will not parse the newest BLAST text output from NCBI
> >>>>>>>>
> >>>>>>>>
> >>>>>>(v2.2.13)
> >>>>>>
> >>>>>>
> >>>>>>>>but it should still save it. I believe as long as next_result()
> >>>>>>>>
> >>>>>>>>
> >>>>isn't
> >>>>
> >>>>
> >>>>>>>>called, it will work.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>3) The bug fixes for the above issue with parsing BLAST
> >>>>>>>>>>
> >>>>>>>>>>
> >>>2.2.13
> >>>
> >>>
> >>>>>>text output
> >>>>>>
> >>>>>>
> >>>>>>>>are NOT in CVS; they haven't been cleared and checked in by
> >>>>>>>>
> >>>>>>>>
> >>>Roger
> >>>
> >>>
> >>>>Hall
> >>>>
> >>>>
> >>>>>>>>(who's now taking care of RemoteBlast) and the powers that be
> >>>>>>>>
> >>>>>>>>
> >>>>(Jason
> >>>>
> >>>>
> >>>>>>or
> >>>>>>
> >>>>>>
> >>>>>>>>whomever is in charge of Bio::SearchIO).  They can be found in
> >>>>>>>>
> >>>>>>>>
> >>>>>>Bugzilla:
> >>>>>>
> >>>>>>
> >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>option
> >>>>
> >>>>
> >>>>>>of
> >>>>>>
> >>>>>>
> >>>>>>>>saving XML output, so isn't necessary if you don't plan on using
> >>>>>>>>
> >>>>>>>>
> >>>>this
> >>>>
> >>>>
> >>>>>>>>option.  And, remember, they haven't been committed yet to CVS,
> >>>>>>>>
> >>>>>>>>
> >>>>which
> >>>>
> >>>>
> >>>>>>>>means that the final version will change to reflect the new
> >>>>>>>>
> >>>>>>>>
> >>>version.
> >>>
> >>>
> >>>>>>>>>>>>Christopher Fields
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>>>>Dept. of Biochemistry
> >>>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>    _____
> >>>>>>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>Sent: Monday, February 13, 2006 9:26 AM
> >>>>>>>>To: Chris Fields
> >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> >>>>>>>>
> >>>>>>>>
> >>>>>>version
> >>>>>>
> >>>>>>
> >>>>>>>>1.28
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>Hi, Chris
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>Thanks for your suggestion, however, it doesn't seem to work
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>for
> >>>>
> >>>>
> >>>>>>my cgi
> >>>>>>
> >>>>>>
> >>>>>>>>even after I replace both blast.pm and RemoteBlast.pm. I didn't
> >>>>>>>>
> >>>>>>>>
> >>>>even
> >>>>
> >>>>
> >>>>>>get
> >>>>>>
> >>>>>>
> >>>>>>>>any RID. Is there any suggestion?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>>>Guojun
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>Guojun Yang
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>Department of Plant Biology
> >>>>>>>>University of Georgia
> >>>>>>>>Tel: 706-542-1857
> >>>>>>>>Fax: 706-542-1805
> >>>>>>>>http://www.arches.uga.edu/~guojun
> >>>>>>>>    _____
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> >>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500
> >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> >>>>>>>>
> >>>>>>>>
> >>>>>>version
> >>>>>>
> >>>>>>
> >>>>>>>>1.28
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>I would say give the new code a try, but realize that it
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>hasn't
> >>>>
> >>>>
> >>>>>>been
> >>>>>>
> >>>>>>
> >>>>>>>>checked
> >>>>>>>>in (like I said below). I will try going over the modified
> >>>>>>>>Bio::SearchIO::blast again this weekend to see if there is
> >>>>>>>>
> >>>>>>>>
> >>>>anything I
> >>>>
> >>>>
> >>>>>>>>might
> >>>>>>>>have missed. The changed order in the header of BLAST text
> >>>>>>>>
> >>>>>>>>
> >>>output
> >>>
> >>>
> >>>>has
> >>>>
> >>>>
> >>>>>>me a
> >>>>>>
> >>>>>>
> >>>>>>>>bit worried that it might not catch everything, but it at least
> >>>>>>>>
> >>>>>>>>
> >>>>>>doesn't
> >>>>>>
> >>>>>>
> >>>>>>>>hang
> >>>>>>>>in the while() loop I described in the bug report below (bug
> >>>>>>>>
> >>>>>>>>
> >>>>#1934)
> >>>>
> >>>>
> >>>>>>and
> >>>>>>
> >>>>>>
> >>>>>>>>seems to process everything fine.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>If you want more stability in the code, you might consider
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>changing over
> >>>>>>
> >>>>>>
> >>>>>>>>to
> >>>>>>>>XML output and parsing with Bio::SearchIO::blastxml. There are
> >>>>>>>>
> >>>>>>>>
> >>>>some
> >>>>
> >>>>
> >>>>>>>>changes
> >>>>>>>>in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
> >>>>>>>>
> >>>>>>>>
> >>>>saving
> >>>>
> >>>>
> >>>>>>XML
> >>>>>>
> >>>>>>
> >>>>>>>>output, but I believe it parses everything regardless. If you
> >>>>>>>>
> >>>>>>>>
> >>>look
> >>>
> >>>
> >>>>>>back
> >>>>>>
> >>>>>>
> >>>>>>>>the
> >>>>>>>>last month or so there has been a bit of discussion here about
> >>>>>>>>
> >>>>>>>>
> >>>it.
> >>>
> >>>
> >>>>>>Jason
> >>>>>>
> >>>>>>
> >>>>>>>>describes a bit on how to set up RemoteBlast for XML:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using-
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>remoteblast/
> >>>>>>
> >>>>>>
> >>>>>>>>>>Christopher Fields
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>>>>Dept. of Biochemistry
> >>>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>-----Original Message-----
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> >>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM
> >>>>>>>>>To: bioperl-l at bioperl.org
> >>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> >>>>>>>>>
> >>>>>>>>>
> >>>>version
> >>>>
> >>>>
> >>>>>>1.28
> >>>>>>
> >>>>>>
> >>>>>>>>>Hi, Everybody,
> >>>>>>>>>I see this post and am wondering if this is the reason for the
> >>>>>>>>>malfunctioning of my webserver. We set up a webserver named
> >>>>>>>>>
> >>>>>>>>>
> >>>>MAK,
> >>>>
> >>>>
> >>>>>>for
> >>>>>>
> >>>>>>
> >>>>>>>>MITE
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>sequence analysis. It was working very well until around
> >>>>>>>>>
> >>>>>>>>>
> >>>>November
> >>>>
> >>>>
> >>>>>>2005,
> >>>>>>
> >>>>>>
> >>>>>>>>>when it stopped returning any result (the site is fine and
> >>>>>>>>>
> >>>>>>>>>
> >>>seems
> >>>
> >>>
> >>>>to
> >>>>
> >>>>
> >>>>>>be
> >>>>>>
> >>>>>>
> >>>>>>>>>doing something after submission). In the CGI script, I used
> >>>>>>>>>
> >>>>>>>>>
> >>>>remoteblast
> >>>>
> >>>>
> >>>>>>(that
> >>>>>>
> >>>>>>
> >>>>>>>>>work was done in 2003) to do searches. I currently do not have
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>access to
> >>>>>>
> >>>>>>
> >>>>>>>>>the server because I moved. Quite a few people sent emails
> >>>>>>>>>
> >>>>>>>>>
> >>>to
> >>>
> >>>
> >>>>us
> >>>>
> >>>>
> >>>>>>about
> >>>>>>
> >>>>>>
> >>>>>>>>>its malfunctioning. Is there any suggestion on fixing the
> >>>>>>>>>
> >>>>>>>>>
> >>>>problem?
> >>>>
> >>>>
> >>>>>>>>Should
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>I simply ask that the remoteblast.pm be replaced with the new
> >>>>>>>>>
> >>>>>>>>>
> >>>>version?
> >>>>
> >>>>
> >>>>>>>>>Thanks a lot,
> >>>>>>>>>Guojun
> >>>>>>>>>
> >>>>>>>>>Department of Plant Biology
> >>>>>>>>>University of Georgia
> >>>>>>>>>Tel: 706-542-1857
> >>>>>>>>>Fax: 706-542-1805
> >>>>>>>>>http://www.arches.uga.edu/~guojun
> >>>>>>>>>_____
> >>>>>>>>>
> >>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
> >>>>>>>>>
> >>>>>>>>>
> >>>>Jian'
> >>>>
> >>>>
> >>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> >>>>>>>>>
> >>>>>>>>>
> >>>[mailto:bioperl-
> >>>
> >>>
> >>>>>>>>>l at bioperl.org]
> >>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500
> >>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> >>>>>>>>>
> >>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live
> >>>>>>>>>
> >>>>>>>>>
> >>>>CVS.
> >>>>
> >>>>
> >>>>>>It
> >>>>>>
> >>>>>>
> >>>>>>>>>will
> >>>>>>>>>work for saving text output. However, it will not parse
> >>>>>>>>>
> >>>>>>>>>
> >>>anything
> >>>
> >>>
> >>>>>>using
> >>>>>>
> >>>>>>
> >>>>>>>>>next_result (it will likely hang) and will not save XML
> >>>>>>>>>
> >>>>>>>>>
> >>>format.
> >>>
> >>>
> >>>>See
> >>>>
> >>>>
> >>>>>>>>these
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>bugs:
> >>>>>>>>>
> >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> >>>>>>>>>
> >>>>>>>>>for explanations and possible fixes (changes to RemoteBlast
> >>>>>>>>>
> >>>>>>>>>
> >>>and
> >>>
> >>>
> >>>>>>>>>Bio::SearchIO::blast). Note that these haven't been checked in
> >>>>>>>>>
> >>>>>>>>>
> >>>>yet
> >>>>
> >>>>
> >>>>>>so
> >>>>>>
> >>>>>>
> >>>>>>>>are
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>still not included in bioperl-live; they may be further
> >>>>>>>>>
> >>>>>>>>>
> >>>modified
> >>>
> >>>
> >>>>>>before
> >>>>>>
> >>>>>>
> >>>>>>>>>committing to CVS. If you're not worried about XML, you could
> >>>>>>>>>
> >>>>>>>>>
> >>>>just
> >>>>
> >>>>
> >>>>>>try
> >>>>>>
> >>>>>>
> >>>>>>>>the
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>first fix, which is a change to SearchIO::blast.
> >>>>>>>>>
> >>>>>>>>>Nagesh, I remember you posting to the list a month ago using a
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>script
> >>>>>>
> >>>>>>
> >>>>>>>>>which
> >>>>>>>>>had problems; the script you used saves the output but doesn't
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>actually
> >>>>>>
> >>>>>>
> >>>>>>>>>parse it (i.e. you don't use next_result() to go through the
> >>>>>>>>>
> >>>>>>>>>
> >>>>data).
> >>>>
> >>>>
> >>>>>>Is
> >>>>>>
> >>>>>>
> >>>>>>>>the
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>version of BLAST in your text output 2.2.12 or 2.2.13? Have
> >>>>>>>>>
> >>>>>>>>>
> >>>you
> >>>
> >>>
> >>>>>>tried
> >>>>>>
> >>>>>>
> >>>>>>>>>parsing the output using "-readmethod => SearchIO" or "-
> >>>>>>>>>
> >>>>>>>>>
> >>>>readmethod
> >>>>
> >>>>
> >>>>>>=>
> >>>>>>
> >>>>>>
> >>>>>>>>>blast"
> >>>>>>>>>using your version of RemoteBlast and method next_result()?
> >>>>>>>>>
> >>>>>>>>>
> >>>Like
> >>>
> >>>
> >>>>>>below
> >>>>>>
> >>>>>>
> >>>>>>>>>(from
> >>>>>>>>>perldoc):
> >>>>>>>>>
> >>>>>>>>>while ( my @rids = $factory->each_rid ) {
> >>>>>>>>>  foreach my $rid ( @rids ) {
> >>>>>>>>>    my $rc = $factory->retrieve_blast($rid);
> >>>>>>>>>    if( !ref($rc) ) {
> >>>>>>>>>      if( $rc < 0 ) {
> >>>>>>>>>        $factory->remove_rid($rid);
> >>>>>>>>>      }
> >>>>>>>>>      print STDERR "." if ( $v > 0 );
> >>>>>>>>>      sleep 5;
> >>>>>>>>>    } else { # parsing starts here
> >>>>>>>>>      my $result = $rc->next_result(); # it should hang here
> >>>>>>>>>      #save the output
> >>>>>>>>>      my $filename = $result->query_name()."\.out";
> >>>>>>>>>      $factory->save_output($filename);
> >>>>>>>>>      $factory->remove_rid($rid);
> >>>>>>>>>      print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>>>>>>      while ( my $hit = $result->next_hit ) {
> >>>>>>>>>        next unless ( $v > 0);
> >>>>>>>>>        print "\thit name is ", $hit->name, "\n";
> >>>>>>>>>        while( my $hsp = $hit->next_hsp ) {
> >>>>>>>>>          print "\t\tscore is ", $hsp->score, "\n";
> >>>>>>>>>        }
> >>>>>>>>>      }
> >>>>>>>>>    }
> >>>>>>>>>  }
> >>>>>>>>>}
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>My script hung if I used next_result() in any way prior to
> >>>>>>>>>
> >>>>>>>>>
> >>>the
> >>>
> >>>
> >>>>>>fixes.
> >>>>>>
> >>>>>>
> >>>>>>>>I
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>want to see how many others are having the same issues with
> >>>>>>>>>
> >>>>>>>>>
> >>>>parsing
> >>>>
> >>>>
> >>>>>>>>using
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>the CVS version of bioperl-live.
> >>>>>>>>>
> >>>>>>>>>Christopher Fields
> >>>>>>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>>>>>Dept. of Biochemistry
> >>>>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>-----Original Message-----
> >>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> >>>>>>>>>>
> >>>>>>>>>>
> >>>l-
> >>>
> >>>
> >>>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> >>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM
> >>>>>>>>>>To: Huang Jian; bioperl-l
> >>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> >>>>>>>>>>
> >>>>>>>>>>Hi Huang,
> >>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>works
> >>>>
> >>>>
> >>>>>>on
> >>>>>>
> >>>>>>
> >>>>>>>>the
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>logic of checking the temporary file size to determine
> >>>>>>>>>>
> >>>>>>>>>>
> >>>whether
> >>>
> >>>
> >>>>the
> >>>>
> >>>>
> >>>>>>>>Blast
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>results are ready. This condition is not getting satisfied
> >>>>>>>>>>
> >>>>>>>>>>
> >>>may
> >>>
> >>>
> >>>>be
> >>>>
> >>>>
> >>>>>>due
> >>>>>>
> >>>>>>
> >>>>>>>>to
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>some changes brought about by NCBI. I had this problem
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>recently
> >>>>
> >>>>
> >>>>>>and
> >>>>>>
> >>>>>>
> >>>>>>>>>>figured out that the solution was to use the latest version
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>which
> >>>>
> >>>>
> >>>>>>has
> >>>>>>
> >>>>>>
> >>>>>>>>>>this problem fixed (does not use file size logic any more)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>which
> >>>>
> >>>>
> >>>>>>is
> >>>>>>
> >>>>>>
> >>>>>>>>not
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>yet included in the BioPerl package.
> >>>>>>>>>>Cheers
> >>>>>>>>>>Nagesh
> >>>>>>>>>>
> >>>>>>>>>>Huang Jian wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>Dear Nagesh,
> >>>>>>>>>>>
> >>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>you
> >>>>
> >>>>
> >>>>>>sent
> >>>>>>
> >>>>>>
> >>>>>>>>>>>me. Now it works perfectly!!!
> >>>>>>>>>>>
> >>>>>>>>>>>Thank you!!
> >>>>>>>>>>>
> >>>>>>>>>>>Huang
> >>>>>>>>>>>
> >>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka"
> >>>>>>>>>>>
> >>>>>>>>>>>To: "Huang Jian" ; "bioperl-l"
> >>>>>>>>>>>
> >>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM
> >>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>net,
> >>>
> >>>
> >>>>so
> >>>>
> >>>>
> >>>>>>still
> >>>>>>
> >>>>>>
> >>>>>>>>>>>via email
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>Hi Huang,
> >>>>>>>>>>>>I see that you are submitting a sequence for a remote
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>blast
> >>>
> >>>
> >>>>>>search.
> >>>>>>
> >>>>>>
> >>>>>>>>>Can
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>>you check if the RemoteBlast.pm being used is v 1.28
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>(2005/12/09).
> >>>>>>
> >>>>>>
> >>>>>>>>If
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>not I have attached it with this email, try to replace it
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>with
> >>>>
> >>>>
> >>>>>>the
> >>>>>>
> >>>>>>
> >>>>>>>>>old
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>>one which has a bug.
> >>>>>>>>>>>>Let me know if it works.
> >>>>>>>>>>>>Nagesh
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>_______________________________________________
> >>>>>>>>>>Bioperl-l mailing list
> >>>>>>>>>>Bioperl-l at lists.open-bio.org
> >>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From valiente at lsi.upc.edu  Mon Feb 20 13:51:35 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Mon, 20 Feb 2006 19:51:35 +0100
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <43FA0FB7.6060904@lsi.upc.edu>

The local flat file implementation of Bio::DB::Taxonomy seems to be fine:

use Bio::DB::Taxonomy;
my $nodesfile = "nodes.dmp";
my $namesfile = "names.dmp";
my $db = Bio::DB::Taxonomy->new(-source    => 'flatfile',
                                -nodesfile => $nodesfile,
                                -namesfile => $namesfile);
my $taxonid = $db->get_taxonid('Homo sapiens');

Here, $taxonid is 9606. However,

my $species = $db->get_Taxonomy_Node(-taxonid => $taxonid);

raises:

-------------------- WARNING ---------------------
MSG: can't create a species object for Homo sapiens (human) because it isn't a species but is a '' instead
---------------------------------------------------
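
The empty rank ('') in the warning suggests that no rank was picked up for
this node. One way to narrow this down is to look at nodes.dmp directly. The
following is only a sketch, not part of Bio::DB::Taxonomy: rank_of() is a
hypothetical helper, the sample lines are made up in the nodes.dmp layout,
and it assumes the NCBI taxdump convention of fields separated by
tab, '|', tab with the rank in the third field.

```perl
use strict;
use warnings;

# Hypothetical helper: return the rank recorded for $taxid in
# nodes.dmp-style lines (fields separated by "\t|\t", line ends "\t|").
sub rank_of {
    my ($taxid, @lines) = @_;
    for my $line (@lines) {
        chomp(my $copy = $line);
        $copy =~ s/\t\|$//;    # drop the trailing field terminator
        my ($id, $parent, $rank) = split /\t\|\t/, $copy;
        return $rank if defined $id && $id == $taxid;
    }
    return;    # taxid not present
}

# Made-up sample lines in the nodes.dmp layout (illustration only).
my @nodes = (
    "9605\t|\t207598\t|\tgenus\t|\tHO\t|",
    "9606\t|\t9605\t|\tspecies\t|\tHS\t|",
);
print "rank of 9606: ", rank_of(9606, @nodes), "\n";
```

If the rank in your own nodes.dmp reads 'species' for 9606, the file is
probably fine and the problem is more likely in how the module reads it.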

Thanks,

Gabriel



From boris.steipe at utoronto.ca  Mon Feb 20 13:40:19 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Mon, 20 Feb 2006 13:40:19 -0500
Subject: [Bioperl-l] Matrix Average Code / Module ?
In-Reply-To: <59825.192.168.1.176.1140416461.squirrel@192.168.1.176>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<76f031ae0602190552v5f2542dbv@mail.gmail.com>
	<59825.192.168.1.176.1140416461.squirrel@192.168.1.176>
Message-ID: <92CF0104-0524-4BA3-B039-3CEECF68E20B@utoronto.ca>

Assuming you mean the arithmetic average of all elements in a matrix,  
you could do the following (using your numbers):


#!/usr/bin/perl -w
use strict;

my @matrix;

push(@matrix, [(11,22,43,54,50)]); # [(...)]: a list passed as an anonymous array
push(@matrix, [(27,87,74,32,10)]);
push(@matrix, [(66,58,98,78,20)]);
push(@matrix, [(22,23,44,16,34)]);

my $sum = 0;
my $number = 0;

foreach my $row (@matrix) {
     foreach my $element (@{$row}){
         $sum += $element;
         $number++;
     }
}

print "Average of $number elements = ", $sum/$number,"\n";
exit;
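
The same calculation can be wrapped in a reusable subroutine. This is a
minimal sketch using List::Util (a core module); matrix_mean() is a
hypothetical name, and it assumes the matrix is an array of array
references holding plain numbers, as above.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: arithmetic mean of all elements of a matrix
# stored as an array of array references.
sub matrix_mean {
    my @elements = map { @{$_} } @_;    # flatten all rows into one list
    return undef unless @elements;      # avoid dividing by zero
    return sum(@elements) / @elements;  # @elements in numeric context = count
}

my @matrix = (
    [11, 22, 43, 54, 50],
    [27, 87, 74, 32, 10],
    [66, 58, 98, 78, 20],
    [22, 23, 44, 16, 34],
);
print "Average = ", matrix_mean(@matrix), "\n";
```

Returning undef for an empty matrix is a design choice for the sketch; a
caller could just as well die there.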


HTH,

B.




On 20 Feb 2006, at 01:21, Shameer Khadar wrote:

> Hi all,
> Is there any program/module to calculate the average of a blosum/ 
> pam any
> matrix ?
>
> I have a matrix and I need to see the average
>
> for example
>
> 11 22 43 54 50
> 27 87 74 32 10
> 66 58 98 78 20
> 22 23 44 16 34
>
> I have gone through Bio::Matrix::MatrixI and  
> Bio::Matrix::GenericMatrix
> and other perl modules like Math::Matrix
> http://search.cpan.org/~ulpfr/Math-Matrix-0.4/Matrix.pm
> and Math::Cephes::Matrix - but none of them have a provison to do  
> matrix
> average calculation.
>
> Any help ???
> thanks in advance,
> Happy biocomputing !!!
>
>
> -- 
> Shameer Khadar
> National Centre for Biological Sciences (TIFR)
> UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
> T - 91-080-23636420-32 EXT 4241
> F - 91-080-23636662/23636675
> W - http://www.ncbs.res.in
> --------------------------------------------------
> "Refrain from illusions, insist on work and not words,
>  patiently seek divine and scientific truth."
> MM
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From cjfields at uiuc.edu  Mon Feb 20 17:01:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 20 Feb 2006 16:01:15 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on
	RemoteBlast.pmversion 1.28
In-Reply-To: <000e01c6363f$494bc5e0$15327e82@pyrimidine>
Message-ID: <000001c63669$2bf06a80$15327e82@pyrimidine>

Guojun Yang pointed out that his BLAST output was still not parsed
correctly, so I posted another change:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

The direct link for the module is:

http://bugzilla.bioperl.org/attachment.cgi?id=289&action=view

Note that all caveats (can't sue if computer blows up, this is a very
preliminary bugfix, etc.) apply.

Apparently, NCBI has changed blastn and tblastx output to show features in
the region for each HSP, starting with either one of the following
lines:

 Features in this part of subject sequence:
 Features flanking this part of subject sequence:

If you're using Bio::SearchIO::blast previous to Dec. 2005, BLAST 2.2.13,
most blastn or tblastx report parsing seems to choke on these lines, unless
you are pretty lucky.  This extra little feature was introduced a while back
for large contigs and chromosomes (~BLAST 2.2.10) but was not set by default
and hadn't started affecting web output until this last fall.  The first
fix I posted caught only the first version but not the second.

The fix included a loop with debugging output to bypass this for now.  If
you use SearchIO directly for parsing (not through RemoteBlast) you can see
the bypassed lines by setting the '-verbose' flag to 1.
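
For anyone who wants to see the idea in isolation, here is a minimal
sketch of skipping such blocks. It is not the code from the bug report:
strip_feature_blocks() is a hypothetical helper, and it assumes that each
feature block runs from one of the two "Features ..." header lines up to
the next " Score = " line, as in the reports posted to this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): drop the per-HSP feature
# annotation blocks from the lines of a BLAST text report.
sub strip_feature_blocks {
    my @lines = @_;
    my @kept;
    my $skipping = 0;
    for my $line (@lines) {
        if ($line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/) {
            $skipping = 1;    # start of a feature block
            next;
        }
        if ($skipping && $line =~ /^\s*Score\s*=/) {
            $skipping = 0;    # the HSP header ends the block
        }
        push @kept, $line unless $skipping;
    }
    return @kept;
}

my @report = (
    " Features flanking this part of subject sequence:",
    "   2991 bp at 5' side: hypothetical protein",
    "   1131 bp at 3' side: hypothetical protein",
    "",
    " Score = 36.2 bits (18),  Expect = 0.22",
    " Identities = 18/18 (100%), Gaps = 0/18 (0%)",
);
print "$_\n" for strip_feature_blocks(@report);
```

Filtering before parsing is only a workaround for saved reports; the
bypass inside the parser is the better place for it.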

Thanks to Guojun Yang for pointing this out.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Monday, February 20, 2006 11:01 AM
> To: 'Pieter Monsieurs'; gyang at plantbio.uga.edu
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on
> RemoteBlast.pmversion 1.28
> 
> I have added a preliminary bugfix for the problems seen with nucleotide
> blast parsing for BLAST 2.2.13 reports.  I passed SearchIO::blast through
> perltidy to space out the blocks (really for my own purposes; it's a
> pretty complex module).  The fix bypasses the extra lines output for
> blastn and tblastx and now seems to parse the text output for those
> reports correctly.  I tested it using all NCBI BLAST flavors for the last
> two versions of BLAST (2.2.12 and 2.2.13); the fix was simple so shouldn't
> break any other BLAST report parsing, such as WU-BLAST, RPS-BLAST, or
> Paracel.  It has only been tested on MacOSX at the moment, so I need
> people out there to test it out on anything they can to make sure it
> works before committing.  I'll be trying it on Windows today.  Report
> back to me and I'll post anything on bugzilla.
> 
> Here it is:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> 
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> > Sent: Thursday, February 16, 2006 3:46 AM
> > To: gyang at plantbio.uga.edu
> > Cc: bioperl-l at lists.open-bio.org; Chris Fields
> > Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on
> > RemoteBlast.pm version 1.28
> >
> > Hi,
> >
> > I have the same problem with the blast.pm file.
> > The people at NCBI added some extra info to the BLAST output
> > (see e.g. "Features flanking this part..." or "Features in this part
> > ..."); an example is added below.
> > The blast.pm module starts looking for the HSP alignment information,
> > but it dies when it hits this feature information.
> >
> > Pieter
> >
> >
> > >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> > chromosome 12, complete sequence
> > Length=27492551
> >
> >  Features flanking this part of subject sequence:
> >
> > 3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
> >
> >
> > 2655 bp at 3' side: hypothetical protein
> >
> >
> >  Score = 36.2 bits (18),  Expect = 0.22
> >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> >  Strand=Plus/Minus
> >
> > Query  4         GTACTACTCTACTCTACT  21
> >                  ||||||||||||||||||
> >
> > Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> >
> >
> >  Features flanking this part of subject sequence:
> >
> > 2991 bp at 5' side: hypothetical protein
> >
> >    1131 bp at 3' side: hypothetical protein
> >
> >
> >
> >  Score = 36.2 bits (18),  Expect = 0.22
> >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> >  Strand=Plus/Minus
> >
> > Query  2         ATGTACTACTCTACTCTA  19
> >                  ||||||||||||||||||
> > Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> >
> >
> >
> >  Features in this part of subject sequence:
> >    DHHC zinc finger domain, putative
> >
> >
> >
> >  Score = 34.2 bits (17),  Expect = 0.87
> >  Identities = 17/17 (100%), Gaps = 0/17 (0%)
> >  Strand=Plus/Plus
> >
> > Query  5         TACTACTCTACTCTACT  21
> >                  |||||||||||||||||
> > Sbjct  17616437  TACTACTCTACTCTACT  17616453
> >
> >
> >
> >  Features flanking this part of subject sequence:
> >    102 bp at 5' side: bZIP transcription factor, putative
> >
> >
> >    3740 bp at 3' side: yeast dcp1, putative
> >
> >
> >  Score = 32.2 bits (16),  Expect = 3.4
> >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> >  Strand=Plus/Plus
> >
> > Query  7        CTACTCTACTCTACTC  22
> >                 ||||||||||||||||
> > Sbjct  2775880  CTACTCTACTCTACTC  2775895
> >
> >
> >  Features flanking this part of subject sequence:
> >
> >    21 bp at 5' side: peptide transporter T17F3.11, putative
> >
> >
> > 10230 bp at 3' side: transposon protein, putative, unclassified
> >
> >
> >  Score = 32.2 bits (16),  Expect = 3.4
> >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> >  Strand=Plus/Minus
> >
> > Query  7         CTACTCTACTCTACTC  22
> >
> >                  ||||||||||||||||
> > Sbjct  27323153  CTACTCTACTCTACTC  27323138
> >
> >
> >
> >
> > Guojun Yang wrote:
> >
> > >Hi, Chris,
> > >Finally the remoteblast test script works for the amino.fa query. but
> > >when I try a nucleic acid sequence (see below), Error occurs:
> > >"
> > >waiting........
> > >------------- EXCEPTION  -------------
> > >MSG: no data for midline  Features flanking this part of subject sequence:
> > >STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> > >STACK toplevel remoteblast_test:40
> > >"
> > >The query sequence is:
> > >CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> > >GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> > >AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> > >AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> > >
> > >The script (basically same as the remoteblast test, I only changed
> > >database to 'nr' and program to 'blastn' and filename to 'ost3'):
> > >#!/usr/bin/perl
> > >
> > >use Bio::SeqIO;
> > >use Bio::Seq;
> > >use Bio::Tools::Run::RemoteBlast;
> > >use Bio::SearchIO;
> > >use strict;
> > >my $prog='blastn';
> > >my $db='nr';
> > >my $e_val=1e-10;
> > >my @params=( -prog=>$prog,
> > >	-data=>$db,
> > >	-expect=>$e_val,
> > >	-readmethod=>'SearchIO');
> > >my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >
> > >my $v = 1;
> > >
> > >my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> > >
> > >while (my $input = $str->next_seq()){
> > >  #Blast a sequence against a database:
> > >  #Alternatively, you could  pass in a file with many
> > >  #sequences rather than loop through sequence one at a time
> > >  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >  #and swap the two lines below for an example of that.
> > >  my $r = $factory->submit_blast($input);
> > >  #my $r = $factory->submit_blast('amino.fa');
> > >  print STDERR "waiting..." if( $v > 0 );
> > >  while ( my @rids = $factory->each_rid ) {
> > >    foreach my $rid ( @rids ) {
> > >      my $rc = $factory->retrieve_blast($rid);
> > >      if( !ref($rc) ) {
> > >        if( $rc < 0 ) {
> > >          $factory->remove_rid($rid);
> > >        }
> > >        print STDERR "." if ( $v > 0 );
> > >        sleep 5;
> > >      } else {
> > >        my $result = $rc->next_result();
> > >        #save the output
> > >        my $filename = $result->query_name()."\.out";
> > >        $factory->save_output($filename);
> > >        $factory->remove_rid($rid);
> > >        print "\nQuery Name: ", $result->query_name(), "\n";
> > >        while ( my $hit = $result->next_hit ) {
> > >          next unless ( $v > 0);
> > >          print "\thit name is ", $hit->name, "\n";
> > >          while( my $hsp = $hit->next_hsp ) {
> > >            print "\t\tscore is ", $hsp->score, "\n";
> > >          }
> > >        }
> > >      }
> > >    }
> > >  }
> > >}
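
The retrieval loop in scripts like this one keeps polling for as long as the
server reports "not finished". A bounded version of that polling pattern
can be sketched as below. poll_result() is a hypothetical helper, not a
BioPerl method, and the stub stands in for $factory->retrieve_blast($rid);
the return convention (a reference when done, 0 when not finished, a
negative number on error) is taken from the comments in the scripts quoted
in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch->() until it returns a reference
# (finished), a negative number (error), or $max_tries is exhausted.
sub poll_result {
    my ($fetch, $max_tries, $wait_seconds) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $fetch->();
        return $rc if ref $rc;                  # finished: result object
        die "retrieval failed\n" if $rc < 0;    # error code
        sleep $wait_seconds if $wait_seconds;   # $rc == 0: not ready yet
    }
    die "gave up after $max_tries tries\n";
}

# Stand-in for $factory->retrieve_blast($rid): ready on the third call.
my $calls = 0;
my $result = poll_result(sub { ++$calls < 3 ? 0 : { hits => 2 } }, 10, 0);
print "ready after $calls calls\n";
```

With a real factory the wait would be several seconds rather than 0, and
the caller would still need to call remove_rid() on error.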
> > >
> > >
> > >Do you think there might still be something in the NCBI output format?
> > >
> > >Thank you,
> > >Guojun
> > >
> > >
> > >
> > >
> > >Guojun Yang
> > >Department of Plant Biology
> > >University of Georgia
> > >Tel: 706-542-1857
> > >Fax: 706-542-1805
> > >http://www.arches.uga.edu/~guojun
> > >
> > >
> > >
> > >----- Original Message -----
> > >From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >
> > >
> > >
> > >
> > >>Sorry, forgot to add that I didn't see the regex issue that you
> > >>mentioned.  It could be a perl-related issue.  Try the fixes I
> > >>mentioned and see what happens.
> > >>
> > >>
> > >>>Christopher Fields
> > >>>
> > >>>
> > >>Postdoctoral Researcher - Switzer Lab
> > >>Dept. of Biochemistry
> > >>University of Illinois Urbana-Champaign
> > >>
> > >>
> > >>>>>-----Original Message-----
> > >>>>>
> > >>>>>
> > >>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>Sent: Tuesday, February 14, 2006 12:36 PM
> > >>>To: 'gyang at plantbio.uga.edu'
> > >>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>>
> > >>>
> > >>>It's a good habit to always add single quotes around words.  The perl
> > >>>interpreter may think a single bare word is a subroutine or perlfunc
> > >>>called with no args so will try to find a subroutine named blastp().
> > >>>My debugger actually gives the error that the bare word blastp may
> > >>>conflict with a future reserved word.  Like you said, 'use strict'
> > >>>will point that out.
> > >>>
> > >>>
> > >>>As for the regex, it should match all the blast programs at NCBI
> > >>>(blastp, blastn, blastx, tblastn, tblastx) and is built-in to make
> > >>>sure nothing else passes through.
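
A quick way to see what the pattern from the error message accepts is to
try it on a few names. This is only a sketch: the pattern t?blast[pnx] is
the one quoted in the error, but the ^ and $ anchors and the helper name
is_valid_prog() are assumptions made for the illustration, not the
RemoteBlast source.

```perl
use strict;
use warnings;

# Assumed anchored form of the pattern quoted in the error message.
my $prog_re = qr/^t?blast[pnx]$/;

# Hypothetical helper: 1 if the program name passes the check, else 0.
sub is_valid_prog {
    my ($prog) = @_;
    return defined $prog && $prog =~ $prog_re ? 1 : 0;
}

for my $prog (qw(blastp blastn blastx tblastn tblastx megablast)) {
    printf "%-10s %s\n", $prog, is_valid_prog($prog) ? 'accepted' : 'rejected';
}
```

An empty or undefined value is rejected too, which fits a value that never
reached the constructor because the parameter name lacked its dash.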
> > >>>
> > >>>
> > >>>So, if you are using the script below, there are several errors.  The
> > >>>bare words for $prog and $db need quotes, and the flags for your
> > >>>@params array don't have a dash before them.  I get this after adding
> > >>>quotes but before adding the dashes to @params:
> > >>>
> > >>>
> > >>>>>C:\Perl\Scripts>test_blast.pl
> > >>>>>------------- EXCEPTION: Bio::Root::Exception -------------
> > >>>>>
> > >>>>>
> > >>>MSG:
> > >>>STACK: Error::throw
> > >>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-
> > >>>live/Bio/Root/Root.pm:328
> > >>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> > >>>C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > >>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-
> > >>>live/Bio/Tools/Run/RemoteBlast.pm:256
> > >>>STACK: C:\Perl\Scripts\test_blast.pl:15
> > >>>-----------------------------------------------------------
> > >>>
> > >>>
> > >>>The last line indicates a problem with this line:
> > >>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>>Changing the @params to this:
> > >>>my @params=( -prog=>$prog,
> > >>>	-data=>$db,
> > >>>	-expect=>$e_val,
> > >>>	-readmethod=>'SearchIO');
> > >>>fixes it, and I get output as expected.
> > >>>>>Christopher Fields
> > >>>>>
> > >>>>>
> > >>>Postdoctoral Researcher - Switzer Lab
> > >>>Dept. of Biochemistry
> > >>>University of Illinois Urbana-Champaign
> > >>>
> > >>>
> > >>>>>>>>-----Original Message-----
> > >>>>>>>>
> > >>>>>>>>
> > >>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > >>>>Sent: Tuesday, February 14, 2006 11:48 AM
> > >>>>To: Chris Fields; bioperl-l at lists.open-bio.org
> > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>>>
> > >>>>Hi, Chris,
> > >>>>When I tried with the perldoc script, It did not work either. First it
> > >>>>says $prog can not be bare word if I "use strict". I added quotes on
> > >>>>the words, then it says the value for $prog does not match expression
> > >>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The
> > >>>>script is shown below. Why is the expression "t?blast[pnx]"?
> > >>>>
> > >>>>#!/usr/bin/perl
> > >>>>
> > >>>>use Bio::SeqIO;
> > >>>>use Bio::Seq;
> > >>>>use Bio::Tools::Run::RemoteBlast;
> > >>>>use Bio::SearchIO;
> > >>>>
> > >>>>
> > >>>>my $prog=blastp;
> > >>>>my $db=swissprot;
> > >>>>my $e_val=1e-10;
> > >>>>my @params=( prog=>$prog,
> > >>>>	data=>$db,
> > >>>>	expect=>$e_val,
> > >>>>	readmethod=>'SearchIO');
> > >>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>>>
> > >>>>my $v = 1;
> > >>>>
> > >>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >>>>
> > >>>>while (my $input = $str->next_seq()){
> > >>>>  #Blast a sequence against a database:
> > >>>>  #Alternatively, you could  pass in a file with many
> > >>>>  #sequences rather than loop through sequence one at a time
> > >>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>>>  #and swap the two lines below for an example of that.
> > >>>>  my $r = $factory->submit_blast($input);
> > >>>>  #my $r = $factory->submit_blast('amino.fa');
> > >>>>  print STDERR "waiting..." if( $v > 0 );
> > >>>>  while ( my @rids = $factory->each_rid ) {
> > >>>>    foreach my $rid ( @rids ) {
> > >>>>      my $rc = $factory->retrieve_blast($rid);
> > >>>>      if( !ref($rc) ) {
> > >>>>        if( $rc < 0 ) {
> > >>>>          $factory->remove_rid($rid);
> > >>>>        }
> > >>>>        print STDERR "." if ( $v > 0 );
> > >>>>        sleep 5;
> > >>>>      } else {
> > >>>>        my $result = $rc->next_result();
> > >>>>        #save the output
> > >>>>        my $filename = $result->query_name()."\.out";
> > >>>>        $factory->save_output($filename);
> > >>>>        $factory->remove_rid($rid);
> > >>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>        while ( my $hit = $result->next_hit ) {
> > >>>>          next unless ( $v > 0);
> > >>>>          print "\thit name is ", $hit->name, "\n";
> > >>>>          while( my $hsp = $hit->next_hsp ) {
> > >>>>            print "\t\tscore is ", $hsp->score, "\n";
> > >>>>          }
> > >>>>        }
> > >>>>      }
> > >>>>    }
> > >>>>  }
> > >>>>}
> > >>>>
> > >>>>Thank you for your help!
> > >>>>
> > >>>>
> > >>>>Guojun
> > >>>>Department of Plant Biology
> > >>>>University of Georgia
> > >>>>
> > >>>>----- Original Message -----
> > >>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>To: gyang at plantbio.uga.edu
> > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>>Try two things:
> > >>>>>
> > >>>>>1)  Use a much simpler script, like the one in 'perldoc
> > >>>>>Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something
> > >>>>>wrong with the logic in your subroutine:
> > >>>>>
> > >>>>>my $v = 1;
> > >>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >>>>>while (my $input = $str->next_seq()){
> > >>>>>  #Blast a sequence against a database:
> > >>>>>  #Alternatively, you could  pass in a file with many
> > >>>>>  #sequences rather than loop through sequence one at a time
> > >>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>>>>  #and swap the two lines below for an example of that.
> > >>>>>  my $r = $factory->submit_blast($input);
> > >>>>>  #my $r = $factory->submit_blast('amino.fa');
> > >>>>>  print STDERR "waiting..." if( $v > 0 );
> > >>>>>  while ( my @rids = $factory->each_rid ) {
> > >>>>>    foreach my $rid ( @rids ) {
> > >>>>>      my $rc = $factory->retrieve_blast($rid);
> > >>>>>      if( !ref($rc) ) {
> > >>>>>        if( $rc < 0 ) {
> > >>>>>          $factory->remove_rid($rid);
> > >>>>>        }
> > >>>>>        print STDERR "." if ( $v > 0 );
> > >>>>>        sleep 5;
> > >>>>>      } else {
> > >>>>>        my $result = $rc->next_result();
> > >>>>>        #save the output
> > >>>>>        my $filename = $result->query_name()."\.out";
> > >>>>>        $factory->save_output($filename);
> > >>>>>        $factory->remove_rid($rid);
> > >>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>        while ( my $hit = $result->next_hit ) {
> > >>>>>          next unless ( $v > 0);
> > >>>>>          print "\thit name is ", $hit->name, "\n";
> > >>>>>          while( my $hsp = $hit->next_hsp ) {
> > >>>>>            print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>          }
> > >>>>>        }
> > >>>>>      }
> > >>>>>    }
> > >>>>>  }
> > >>>>>}
> > >>>>>
> > >>>>>
> > >>>>>2) Try the RemoteBlast from Bugzilla and see if that works.  It
> > >>>>>really shouldn't make that much of a difference, but I noticed that
> > >>>>>the CVS RemoteBlast (1.28) was changed in Dec 2005, after
> > >>>>>bioperl-1.5.1 was released; the Bugzilla version is based off CVS.
> > >>>>>
> > >>>>>
> > >>>>>>Christopher Fields
> > >>>>>>
> > >>>>>>
> > >>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>Dept. of Biochemistry
> > >>>>>University of Illinois Urbana-Champaign
> > >>>>>
> > >>>>>
> > >>>>>>>-----Original Message-----
> > >>>>>>>
> > >>>>>>>
> > >>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>Sent: Monday, February 13, 2006 3:00 PM
> > >>>>>>To: bioperl-l at lists.open-bio.org
> > >>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>
> > >>>>>>
> > >>>>>>Thanks, Chris,
> > >>>>>>I installed version 1.5.1 and replaced the blast.pm file with the one
> > >>>>>>from your bug report. The running version is 1.5 when I use the
> > >>>>>>command you sent me. But when I tried the script, it doesn't change
> > >>>>>>much. My remoteblast code (portion) is here:
> > >>>>>>
> > >>>>>>sub search {
> > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
> > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > >>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > >>>>>>			      -id=>"query",
> > >>>>>>			      -desc=>"new seq");
> > >>>>>>my $len=$query->length();
> > >>>>>>@db=('nr','htgs','wgs');
> > >>>>>>foreach my $db (@db) {
> > >>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog'  =>'blastn',
> > >>>>>>						'-data'  =>"$db",
> > >>>>>>						'-expect'=>"$E_value");
> > >>>>>>my $blast_report = $factory->submit_blast($query);
> > >>>>>>my @rids = $factory->each_rid();
> > >>>>>>foreach my $rid ( @rids ) {
> > >>>>>>    print STDERR "$rid\n";
> > >>>>>>}
> > >>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > >>>>>>print STDERR "waiting...";
> > >>>>>>sleep 60;
> > >>>>>>
> > >>>>>>
> > >>>>>>foreach my $rid ( @rids ) {
> > >>>>>>    my $rc = $factory->retrieve_blast($rid);
> > >>>>>>    while (!ref($rc) ) {
> > >>>>>>	if( $rc < 0 ) {
> > >>>>>># retrieve_blast returns -1 on error
> > >>>>>>	    $factory->remove_rid($rid);
> > >>>>>>	    print "Error!\n";
> > >>>>>>	    send_error($email,$function,$seqname,$queryname[$ST]);
> > >>>>>>	    die "Can't retrieve $rid";
> > >>>>>>	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
> > >>>>>>	    sleep 60;
> > >>>>>>	    $rc = $factory->retrieve_blast($rid);
> > >>>>>>	}
> > >>>>>>    }
> > >>>>>>    if (ref($rc)) {
> > >>>>>>	print STDERR "Done.\n";
> > >>>>>>	 while( my $result = $rc->next_result) {
> > >>>>>>	    while( my $hit = $result->next_hit()) {
> > >>>>>>	    	$hit_name=$hit->name;
> > >>>>>>		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > >>>>>>		$name=$1;
> > >>>>>>		@left_plus_start=();
> > >>>>>>		@left_plus_end=();
> > >>>>>>		@left_minus_start=();
> > >>>>>>		@left_minus_end=();
> > >>>>>>		@right_plus_start=();
> > >>>>>>		@right_plus_end=();
> > >>>>>>		@right_minus_start=();
> > >>>>>>		@right_minus_end=();
> > >>>>>>
> > >>>>>>
> > >>>>>>		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > >>>>>>		while( my $hsp = $hit->next_hsp()) {
> > >>>>>>......
> > >>>>>>
> > >>>>>>
> > >>>>>>It was working quite well until around October last year, but it has
> > >>>>>>stopped since then. When a submission is sent via a webpage, the cgi
> > >>>>>>starts to work and uses ~20 Mb of memory. Then it hangs there;
> > >>>>>>finally the expected email is received but without real results,
> > >>>>>>although it does contain something from other parts of the script.
> > >>>>>>Apparently the search sub did not return anything (I know there is
> > >>>>>>something that should be returned). Is it also possible that the
> > >>>>>>format of the NCBI output for each result has changed?
> > >>>>>>Thank you,
> > >>>>>>Guojun
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>Department of Plant Biology
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>University of Georgia
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>>>----- Original Message -----
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>
> > >>>>>>
> > >>>>>>>How do you know two versions are installed (i.e. how are you checking
> > >>>>>>>the version)?  Do you have two complete bioperl distributions (in two
> > >>>>>>>separate directories) or are you looking in modules?  Here's the way
> > >>>>>>>to check the version (from the FAQ):
> > >>>>>>>
> > >>>>>>>perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> > >>>>>>>
> > >>>>>>>If you have two full bioperl distributions on your computer, normally
> > >>>>>>>only one will be in use unless you have explicitly set the environment
> > >>>>>>>variable PERL5LIB.  The PERL5LIB directories will be searched first
> > >>>>>>>before your normal perl directory list (@INC) is searched.  You MAY
> > >>>>>>>get some mixing then, but only if perl can't find a particular module
> > >>>>>>>in the path designated in PERL5LIB; then it will progress through the
> > >>>>>>>directories listed in @INC.  This may happen if a module is unique to
> > >>>>>>>a particular release, but shouldn't happen for the majority of
> > >>>>>>>modules, including RemoteBlast.  You can check what @INC and PERL5LIB
> > >>>>>>>are set to by using 'perl -V'.  @INC will differ depending on your
> > >>>>>>>OS, perl build, etc.
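
To see which copy of a module perl actually picks up when several
installations are around, the search path and the loaded file can be
printed from a script. This is a minimal sketch: module_path() is a
hypothetical helper, and File::Spec is used only because it ships with
perl; on a machine with BioPerl installed one would substitute, for
instance, Bio::Tools::Run::RemoteBlast.

```perl
use strict;
use warnings;

# Hypothetical helper: load a module and report which file perl found.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return undef unless eval { require $file; 1 };
    return $INC{$file};                        # full path of the loaded file
}

print "Search path:\n";
print "  $_\n" for @INC;

# File::Spec ships with perl; substitute e.g. Bio::Tools::Run::RemoteBlast.
for my $module (qw(File::Spec)) {
    my $path = module_path($module);
    print "$module => ", (defined $path ? $path : 'not found'), "\n";
}
```

If the reported path points into an old installation directory, that is
the copy being used, whatever else is on the disk.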
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>Regardless, if you follow the directions for installing bioperl for
> > >>>>>>>your system ('perl Makefile.PL', 'make', 'make test', 'make install',
> > >>>>>>>unless you explicitly change the installation directory when using
> > >>>>>>>'perl Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a
> > >>>>>>>problem as it will install the Bioperl distribution you downloaded
> > >>>>>>>over the old version in @INC.  See this page:
> > >>>>>>>
> > >>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > >>>>>>>
> > >>>>>>>for more details.
> > >>>>>>>>Christopher Fields
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>>>Dept. of Biochemistry
> > >>>>>>>University of Illinois Urbana-Champaign
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>>>>-----Original Message-----
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>>>Sent: Monday, February 13, 2006 12:32 PM
> > >>>>>>>>To: bioperl-l at lists.open-bio.org
> > >>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>Hi, Chris,
> > >>>>>>>>I do have different versions of bioperl on my Linux machine (1.4.
> > >>>>>>>>and 1.5.0), this may be the problem. Should I just install
> > >>>>>>>>bioperl-1.5.1 or I need to uninstall and remove the previous
> > >>>>>>>>versions. I could not find any hint on uninstalling bioperl on
> > >>>>>>>>linux. Could you please give me some suggestion?
> > >>>>>>>>Thanks,
> > >>>>>>>>Guojun
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>Department of Plant Biology
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>University of Georgia
> > >>>>>>>>      _____
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > >>>>>>>>version 1.28
> > >>>>>>>>
> > >>>>>>>>If you're using RemoteBlast 1.28, then you've likely updated from CVS,
> > >>>>>>>>which isn't the latest fix.
> > >>>>>>>>
> > >>>>>>>>Make sure that you check the following:
> > >>>>>>>>
> > >>>>>>>>1) Always post to the mailing list:
> > >>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > >>>>>>>>
> > >>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> > >>>>>>>>installed first.  Perform a clean installation; do not upgrade only
> > >>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> > >>>>>>>>guarantee that mixing modules from old and new distributions (1.4 and
> > >>>>>>>>1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > >>>>>>>>installation will allow text output from BLAST v.2.2.12 to be saved
> > >>>>>>>>and parsed; it will not parse the newest BLAST text output from NCBI
> > >>>>>>>>(v2.2.13) but it should still save it. I believe as long as
> > >>>>>>>>next_result() isn't called, it will work.
> > >>>>>>>>
> > >>>>>>>>3) The bug fixes for the above issue with parsing BLAST 2.2.13 text
> > >>>>>>>>output are NOT in CVS; they haven't been cleared and checked in by
> > >>>>>>>>Roger Hall (who's now taking care of RemoteBlast) and the powers that
> > >>>>>>>>be (Jason or whoever is in charge of Bio::SearchIO).  They can be
> > >>>>>>>>found in Bugzilla:
> > >>>>>>>>
> > >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > >>>>>>>>
> > >>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> > >>>>>>>>saving XML output, so it isn't necessary if you don't plan on using
> > >>>>>>>>this option.  And, remember, they haven't been committed yet to CVS,
> > >>>>>>>>which means that the final version will change to reflect the new
> > >>>>>>>>version.
> > >>>>>>>>
> > >>>>>>>>Christopher Fields
> > >>>>>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>Dept. of Biochemistry
> > >>>>>>>>University of Illinois Urbana-Champaign
> > >>>>>>>>
> > >>>>>>>>    _____
> > >>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > >>>>>>>>Sent: Monday, February 13, 2006 9:26 AM
> > >>>>>>>>To: Chris Fields
> > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > >>>>>>>>version 1.28
> > >>>>>>>>
> > >>>>>>>>Hi, Chris
> > >>>>>>>>
> > >>>>>>>>Thanks for your suggestion, however, it doesn't seem to work for my
> > >>>>>>>>cgi even after I replace both blast.pm and RemoteBlast.pm. I didn't
> > >>>>>>>>even get any RID. Is there any suggestion?
> > >>>>>>>>
> > >>>>>>>>Guojun
> > >>>>>>>>
> > >>>>>>>>Guojun Yang
> > >>>>>>>>Department of Plant Biology
> > >>>>>>>>University of Georgia
> > >>>>>>>>Tel: 706-542-1857
> > >>>>>>>>Fax: 706-542-1805
> > >>>>>>>>http://www.arches.uga.edu/~guojun
> > >>>>>>>>    _____
> > >>>>>>>>
> > >>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > >>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > >>>>>>>>version 1.28
> > >>>>>>>>
> > >>>>>>>>I would say give the new code a try, but realize that it hasn't been
> > >>>>>>>>checked in (like I said below). I will try going over the modified
> > >>>>>>>>Bio::SearchIO::blast again this weekend to see if there is anything I
> > >>>>>>>>might have missed. The changed order in the header of BLAST text
> > >>>>>>>>output has me a bit worried that it might not catch everything, but it
> > >>>>>>>>at least doesn't hang in the while() loop I described in the bug
> > >>>>>>>>report below (bug #1934) and seems to process everything fine.
> > >>>>>>>>
> > >>>>>>>>If you want more stability in the code, you might consider changing
> > >>>>>>>>over to XML output and parsing with Bio::SearchIO::blastxml. There are
> > >>>>>>>>some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that
> > >>>>>>>>accommodate saving XML output, but I believe it parses everything
> > >>>>>>>>regardless. If you look back the last month or so there has been a bit
> > >>>>>>>>of discussion here about it. Jason describes a bit on how to set up
> > >>>>>>>>RemoteBlast for XML:
> > >>>>>>>>
> > >>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> > >>>>>>>>
> > >>>>>>>>Christopher Fields
> > >>>>>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>Dept. of Biochemistry
> > >>>>>>>>University of Illinois Urbana-Champaign
> > >>>>>>>>
> > >>>>>>>>>-----Original Message-----
> > >>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
> > >>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM
> > >>>>>>>>>To: bioperl-l at bioperl.org
> > >>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> > >>>>>>>>>version 1.28
> > >>>>>>>>>
> > >>>>>>>>>Hi, Everybody,
> > >>>>>>>>>I see this post and am wondering if this is the reason for the
> > >>>>>>>>>malfunctioning of my webserver. We set up a webserver named MAK, for
> > >>>>>>>>>MITE sequence analysis. It was working very well until around
> > >>>>>>>>>November 2005, when it stopped returning any result (the site is fine
> > >>>>>>>>>and seems to be doing something after submission). In the CGI script,
> > >>>>>>>>>I used remoteblast (that work was done in 2003) to do searches. I
> > >>>>>>>>>currently do not have access to the server because I moved. Quite a
> > >>>>>>>>>few people have sent emails to us about its malfunctioning. Is there
> > >>>>>>>>>any suggestion on fixing the problem? Should I simply ask that
> > >>>>>>>>>remoteblast.pm be replaced with the new version?
> > >>>>>>>>>Thanks a lot,
> > >>>>>>>>>Guojun
> > >>>>>>>>>
> > >>>>>>>>>Department of Plant Biology
> > >>>>>>>>>University of Georgia
> > >>>>>>>>>Tel: 706-542-1857
> > >>>>>>>>>Fax: 706-542-1805
> > >>>>>>>>>http://www.arches.uga.edu/~guojun
> > >>>>>>>>>_____
> > >>>>>>>>>
> > >>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> > >>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > >>>>>>>>>[mailto:bioperl-l at bioperl.org]
> > >>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > >>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >>>>>>>>>
> > >>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS.
> > >>>>>>>>>It will work for saving text output. However, it will not parse
> > >>>>>>>>>anything using next_result (it will likely hang) and will not save
> > >>>>>>>>>XML format. See these bugs:
> > >>>>>>>>>
> > >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > >>>>>>>>>
> > >>>>>>>>>for explanations and possible fixes (changes to RemoteBlast and
> > >>>>>>>>>Bio::SearchIO::blast). Note that these haven't been checked in yet so
> > >>>>>>>>>are still not included in bioperl-live; they may be further modified
> > >>>>>>>>>before committing to CVS. If you're not worried about XML, you could
> > >>>>>>>>>just try the first fix, which is a change to SearchIO::blast.
> > >>>>>>>>>
> > >>>>>>>>>Nagesh, I remember you posting to the list a month ago using a script
> > >>>>>>>>>which had problems; the script you used saves the output but doesn't
> > >>>>>>>>>actually parse it (i.e. you don't use next_result() to go through the
> > >>>>>>>>>data). Is the version of BLAST in your text output 2.2.12 or 2.2.13?
> > >>>>>>>>>Have you tried parsing the output using "-readmethod => SearchIO" or
> > >>>>>>>>>"-readmethod => blast" using your version of RemoteBlast and method
> > >>>>>>>>>next_result()? Like below (from perldoc):
> > >>>>>>>>>
> > >>>>>>>>>while ( my @rids = $factory->each_rid ) {
> > >>>>>>>>>  foreach my $rid ( @rids ) {
> > >>>>>>>>>    my $rc = $factory->retrieve_blast($rid);
> > >>>>>>>>>    if( !ref($rc) ) {
> > >>>>>>>>>      if( $rc < 0 ) {
> > >>>>>>>>>        $factory->remove_rid($rid);
> > >>>>>>>>>      }
> > >>>>>>>>>      print STDERR "." if ( $v > 0 );
> > >>>>>>>>>      sleep 5;
> > >>>>>>>>>    } else { # parsing starts here
> > >>>>>>>>>      my $result = $rc->next_result(); # it should hang here
> > >>>>>>>>>      #save the output
> > >>>>>>>>>      my $filename = $result->query_name()."\.out";
> > >>>>>>>>>      $factory->save_output($filename);
> > >>>>>>>>>      $factory->remove_rid($rid);
> > >>>>>>>>>      print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>>>>>      while ( my $hit = $result->next_hit ) {
> > >>>>>>>>>        next unless ( $v > 0);
> > >>>>>>>>>        print "\thit name is ", $hit->name, "\n";
> > >>>>>>>>>        while( my $hsp = $hit->next_hsp ) {
> > >>>>>>>>>          print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>>>>>        }
> > >>>>>>>>>      }
> > >>>>>>>>>    }
> > >>>>>>>>>  }
> > >>>>>>>>>}
> > >>>>>>>>>
> > >>>>>>>>>My script hung if I used next_result() in any way prior to the fixes.
> > >>>>>>>>>I want to see how many others are having the same issues with parsing
> > >>>>>>>>>using the CVS version of bioperl-live.
> > >>>>>>>>>
> > >>>>>>>>>Christopher Fields
> > >>>>>>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>>Dept. of Biochemistry
> > >>>>>>>>>University of Illinois Urbana-Champaign
> > >>>>>>>>>
> > >>>>>>>>>>-----Original Message-----
> > >>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
> > >>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > >>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM
> > >>>>>>>>>>To: Huang Jian; bioperl-l
> > >>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >>>>>>>>>>
> > >>>>>>>>>>Hi Huang,
> > >>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm works on
> > >>>>>>>>>>the logic of checking the temporary file size to determine whether
> > >>>>>>>>>>the Blast results are ready. This condition is not being satisfied,
> > >>>>>>>>>>maybe due to some changes brought about by NCBI. I had this problem
> > >>>>>>>>>>recently and figured out that the solution was to use the latest
> > >>>>>>>>>>version, which has this problem fixed (it does not use file size
> > >>>>>>>>>>logic any more) but is not yet included in the BioPerl package.
> > >>>>>>>>>>Cheers
> > >>>>>>>>>>Nagesh
> > >>>>>>>>>>
> > >>>>>>>>>>Huang Jian wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>>Dear Nagesh,
> > >>>>>>>>>>>
> > >>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you
> > >>>>>>>>>>>sent me. Now it works perfectly!!!
> > >>>>>>>>>>>
> > >>>>>>>>>>>Thank you!!
> > >>>>>>>>>>>
> > >>>>>>>>>>>Huang
> > >>>>>>>>>>>
> > >>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka"
> > >>>>>>>>>>>To: "Huang Jian" ; "bioperl-l"
> > >>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM
> > >>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so
> > >>>>>>>>>>>still via email
> > >>>>>>>>>>>
> > >>>>>>>>>>>>Hi Huang,
> > >>>>>>>>>>>>I see that you are submitting a sequence for a remote blast
> > >>>>>>>>>>>>search. Can you check if the RemoteBlast.pm being used is v 1.28
> > >>>>>>>>>>>>(2005/12/09)? If not, I have attached it to this email; try using
> > >>>>>>>>>>>>it to replace the old one, which has a bug.
> > >>>>>>>>>>>>Let me know if it works.
> > >>>>>>>>>>>>Nagesh
> > >>>>>>>>>>>>
> > >>>>>>>>>>_______________________________________________
> > >>>>>>>>>>Bioperl-l mailing list
> > >>>>>>>>>>Bioperl-l at lists.open-bio.org
> > >>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



From gyang at plantbio.uga.edu  Mon Feb 20 17:22:28 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 20 Feb 2006 17:22:28 -0500
Subject: [Bioperl-l] Tested-OK
Message-ID: <20060220172228.f7d22947@dogwood.plantbio.uga.edu>

Chris, I tested the latest fix for blast.pm on my Linux machine with blastn. It worked very well. My CGI script is still not returning what I need, but I think that is unrelated to the parsing of BLAST results. Thanks for your great efforts.

Guojun 

----- Original Message -----
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: 'Chris Fields' [mailto:cjfields at uiuc.edu], 'Pieter Monsieurs' [mailto:Pieter.Monsieurs at esat.kuleuven.be], gyang at plantbio.uga.edu
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pmversion 1.28


> Guojun Yang pointed out that his BLAST output was still not parsed
> correctly, so I posted another change:
>
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>
> The direct link for the module is:
>
> http://bugzilla.bioperl.org/attachment.cgi?id=289&action=view
>
> Note that all caveats (can't sue if computer blows up, this is a very
> preliminary bugfix, etc.) apply.
>
> Apparently, NCBI has changed blastn and tblastx output to show features in
> the region for each HSP, starting with either one of the following
> lines:
>
>  Features in this part of subject sequence:
>  Features flanking this part of subject sequence:
>
> If you're using a Bio::SearchIO::blast from before Dec. 2005 on BLAST
> 2.2.13 output, most blastn or tblastx report parsing seems to choke on
> these lines, unless you are pretty lucky.  This extra little feature was
> introduced a while back for large contigs and chromosomes (~BLAST 2.2.10)
> but was not set by default and hadn't started affecting web output until
> this last fall.  The first fix I posted caught only the first version but
> not the second.
>
> The fix included a loop with debugging output to bypass this for now.  If
> you use SearchIO directly for parsing (not through RemoteBlast) you can see
> the bypassed lines by setting the '-verbose' flag to 1.
>
> Thanks to Guojun Yang for pointing this out.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
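
For readers following along, the bypass described in the quoted message can be sketched roughly as below. This is an illustration only and is not the patch attached to bug #1934: the helper name strip_feature_blocks is hypothetical, and the sketch assumes that each annotation block runs from its "Features ..." header up to the next " Score =" line, as in the sample output quoted later in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): drop the annotation blocks
# that start with "Features in this part of subject sequence:" or
# "Features flanking this part of subject sequence:".
sub strip_feature_blocks {
    my ($report) = @_;
    my @kept;
    my $skipping = 0;
    for my $line ( split /\n/, $report ) {
        if ( $line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/ ) {
            $skipping = 1;    # start of an annotation block
            next;
        }
        # Assumption: the HSP statistics line ends the annotation block.
        $skipping = 0 if $line =~ /^\s*Score\s*=/;
        push @kept, $line unless $skipping;
    }
    return join "\n", @kept;
}

my $report = <<'END';
 Features in this part of subject sequence:
   DHHC zinc finger domain, putative

 Score = 34.2 bits (17),  Expect = 0.87
 Identities = 17/17 (100%), Gaps = 0/17 (0%)
END

print strip_feature_blocks($report), "\n";
```

Filtering the text before it reaches the parser is a different design from patching Bio::SearchIO::blast itself; it is shown here only because it needs no BioPerl installation to try out.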
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> > Sent: Monday, February 20, 2006 11:01 AM
> > To: 'Pieter Monsieurs'; gyang at plantbio.uga.edu
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on
> > RemoteBlast.pmversion 1.28
> >
> > I have added a preliminary bugfix for the problems seen with nucleotide
> > blast parsing for BLAST 2.2.13 reports.  I passed SearchIO::blast through
> > perltidy to space out the blocks (really for my own purposes; it's a
> > pretty complex module).  The fix bypasses the extra lines output for
> > blastn and tblastx and now seems to parse the text output for those
> > reports correctly.
> > I tested it using all NCBI BLAST flavors for the last two versions of
> > BLAST (2.2.12 and 2.2.13); the fix was simple so shouldn't break any other
> > BLAST report parsing, such as WU-BLAST, RPS-BLAST, or Paracel.  It has
> > only been tested on MacOSX at the moment, so I need people out there to
> > test it out on anything they can to make sure it works before committing.
> > I'll be trying it on Windows today.  Report back to me and I'll post
> > anything on bugzilla.
> >
> > Here it is:
> >
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > > > > > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> > > Sent: Thursday, February 16, 2006 3:46 AM
> > > To: gyang at plantbio.uga.edu
> > > Cc: bioperl-l at lists.open-bio.org; Chris Fields
> > > Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on
> > > RemoteBlast.pm version 1.28
> > >
> > > Hi,
> > >
> > > I have the same problem with the blast.pm-file.
> > > The people of NCBI added some extra info when giving the Blast-output.
> > > (see e.g. "Features flanking this part..." or "Features in this part
> > > ..."), example added.
> > > The blast.pm module starts looking for the HSP alignment information,
> > > but it dies when it hits this Feature-information.
> > >
> > > Pieter
> > >
> > >
> > > >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> > > chromosome 12, complete sequence
> > > Length=27492551
> > >
> > >  Features flanking this part of subject sequence:
> > >    3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
> > >    2655 bp at 3' side: hypothetical protein
> > >
> > >  Score = 36.2 bits (18),  Expect = 0.22
> > >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  4         GTACTACTCTACTCTACT  21
> > >                  ||||||||||||||||||
> > > Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> > >
> > >
> > >  Features flanking this part of subject sequence:
> > >    2991 bp at 5' side: hypothetical protein
> > >    1131 bp at 3' side: hypothetical protein
> > >
> > >  Score = 36.2 bits (18),  Expect = 0.22
> > >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  2         ATGTACTACTCTACTCTA  19
> > >                  ||||||||||||||||||
> > > Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> > >
> > >
> > >  Features in this part of subject sequence:
> > >    DHHC zinc finger domain, putative
> > >
> > >  Score = 34.2 bits (17),  Expect = 0.87
> > >  Identities = 17/17 (100%), Gaps = 0/17 (0%)
> > >  Strand=Plus/Plus
> > >
> > > Query  5         TACTACTCTACTCTACT  21
> > >                  |||||||||||||||||
> > > Sbjct  17616437  TACTACTCTACTCTACT  17616453
> > >
> > >
> > >  Features flanking this part of subject sequence:
> > >    102 bp at 5' side: bZIP transcription factor, putative
> > >    3740 bp at 3' side: yeast dcp1, putative
> > >
> > >  Score = 32.2 bits (16),  Expect = 3.4
> > >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > >  Strand=Plus/Plus
> > >
> > > Query  7        CTACTCTACTCTACTC  22
> > >                 ||||||||||||||||
> > > Sbjct  2775880  CTACTCTACTCTACTC  2775895
> > >
> > >
> > >  Features flanking this part of subject sequence:
> > >    21 bp at 5' side: peptide transporter T17F3.11, putative
> > >    10230 bp at 3' side: transposon protein, putative, unclassified
> > >
> > >  Score = 32.2 bits (16),  Expect = 3.4
> > >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  7         CTACTCTACTCTACTC  22
> > >                  ||||||||||||||||
> > > Sbjct  27323153  CTACTCTACTCTACTC  27323138
> > >
> > >
> > >
> > >
> > > Guojun Yang wrote:
> > >
> > > >Hi, Chris,
> > > >Finally the remoteblast test script works for the amino.fa query, but
> > > >when I try a nucleic acid sequence (see below), an error occurs:
> > > >"
> > > >waiting........
> > > >------------- EXCEPTION  -------------
> > > >MSG: no data for midline  Features flanking this part of subject sequence:
> > > >STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> > > >STACK toplevel remoteblast_test:40
> > > >"
> > > >The query sequence is:
> > > >CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> > > >GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> > > >AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> > > >AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> > > >
> > > >The script (basically the same as the remoteblast test; I only changed
> > > >database to 'nr' and program to 'blastn' and filename to 'ost3'):
> > > >#!/usr/bin/perl
> > > >
> > > >use Bio::SeqIO;
> > > >use Bio::Seq;
> > > >use Bio::Tools::Run::RemoteBlast;
> > > >use Bio::SearchIO;
> > > >use strict;
> > > >my $prog='blastn';
> > > >my $db='nr';
> > > >my $e_val=1e-10;
> > > >my @params=( -prog=>$prog,
> > > >	-data=>$db,
> > > >	-expect=>$e_val,
> > > >	-readmethod=>'SearchIO');
> > > >my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > >
> > > >my $v = 1;
> > > >
> > > >my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> > > >
> > > >while (my $input = $str->next_seq()){
> > > >  #Blast a sequence against a database:
> > > >  #Alternatively, you could  pass in a file with many
> > > >  #sequences rather than loop through sequence one at a time
> > > >  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >  #and swap the two lines below for an example of that.
> > > >  my $r = $factory->submit_blast($input);
> > > >  #my $r = $factory->submit_blast('amino.fa');
> > > >  print STDERR "waiting..." if( $v > 0 );
> > > >  while ( my @rids = $factory->each_rid ) {
> > > >    foreach my $rid ( @rids ) {
> > > >      my $rc = $factory->retrieve_blast($rid);
> > > >      if( !ref($rc) ) {
> > > >        if( $rc < 0 ) {
> > > >          $factory->remove_rid($rid);
> > > >        }
> > > >        print STDERR "." if ( $v > 0 );
> > > >        sleep 5;
> > > >      } else {
> > > >        my $result = $rc->next_result();
> > > >        #save the output
> > > >        my $filename = $result->query_name()."\.out";
> > > >        $factory->save_output($filename);
> > > >        $factory->remove_rid($rid);
> > > >        print "\nQuery Name: ", $result->query_name(), "\n";
> > > >        while ( my $hit = $result->next_hit ) {
> > > >          next unless ( $v > 0);
> > > >          print "\thit name is ", $hit->name, "\n";
> > > >          while( my $hsp = $hit->next_hsp ) {
> > > >            print "\t\tscore is ", $hsp->score, "\n";
> > > >          }
> > > >        }
> > > >      }
> > > >    }
> > > >  }
> > > >}
> > > >
> > > >
> > > >Do you think there might still be something in the NCBI output format?
> > > >
> > > >Thank you,
> > > >Guojun
> > > >
> > > >
> > > >
> > > >
> > > >Guojun Yang
> > > >Department of Plant Biology
> > > >University of Georgia
> > > >Tel: 706-542-1857
> > > >Fax: 706-542-1805
> > > >http://www.arches.uga.edu/~guojun
> > > >
> > > >
> > > >
> > > >----- Original Message -----
> > > >From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > >Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > >
> > > >
> > > >
> > > >
> > > > >>Sorry, forgot to add that I didn't see the regex issue that you mentioned.
> > > > >>It could be a perl-related issue.  Try the fixes I mentioned and see what
> > > > >>happens.
> > > > >>
> > > > >>Christopher Fields
> > > > >>Postdoctoral Researcher - Switzer Lab
> > > > >>Dept. of Biochemistry
> > > > >>University of Illinois Urbana-Champaign
> > > > >>
> > > > >>>-----Original Message-----
> > > > >>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > >>>Sent: Tuesday, February 14, 2006 12:36 PM
> > > > >>>To: 'gyang at plantbio.uga.edu'
> > > > >>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > > >>>
> > > > >>>It's a good habit to always add single quotes around words.  The perl
> > > > >>>interpreter may think a single bare word is a subroutine or perlfunc
> > > > >>>called with no args so will try to find a subroutine named blastp().  My
> > > > >>>debugger actually gives the error that the bare word blastp may conflict
> > > > >>>with a future reserved word.  Like you said, 'use strict' will point that
> > > > >>>out.
> > > >>>
> > > >>>
> > > >>>>>As for the regex, it should match all the blast programs at NCBI
> > > (blastp,
> > > >>>>>
> > > >>>>>
> > > >>>blastn, blastx, tblastn, tblastx) and is built-in to make sure
> > nothing
> > > >>>else passes through.
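
To illustrate the regex point above, here is a minimal sketch of how a pattern like t?blast[pnx] filters program names. The helper name is_valid_blast_program and the ^...$ anchoring are assumptions made for this example, not RemoteBlast's actual code. Note that, as written, the pattern would also accept "tblastp", which is not one of the five programs listed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns 1 if the name
# matches the pattern quoted in the error message, 0 otherwise.
sub is_valid_blast_program {
    my ($prog) = @_;
    return ( defined($prog) && $prog =~ /^t?blast[pnx]$/ ) ? 1 : 0;
}

# The five NCBI program names, followed by two strings that should fail.
for my $prog (qw(blastp blastn blastx tblastn tblastx prog megablast)) {
    printf "%-10s %s\n", $prog,
        is_valid_blast_program($prog) ? 'accepted' : 'rejected';
}
```

A value such as 'prog' (a parameter name rather than a program name) is rejected, which is consistent with the "does not match expression" message reported in this thread.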
> > > >>>
> > > >>>
> > > >>>>>So, if you are using the script below, there are several errors.
> > The
> > > bare
> > > >>>>>
> > > >>>>>
> > > > >>>words for $prog and $db need quotes, and the flags for your @params
> > > array
> > > >>>don't have a dash before them.  I get this after adding quotes but
> > > before
> > > >>>adding the dashes to @params:
> > > >>>
> > > >>>
> > > >>>>>C:\Perl\Scripts>test_blast.pl
> > > >>>>>------------- EXCEPTION: Bio::Root::Exception -------------
> > > >>>>>
> > > >>>>>
> > > >>>MSG:
> > > >>>STACK: Error::throw
> > > >>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-
> > > >>>live/Bio/Root/Root.pm:328
> > > >>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> > > >>>C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > > >>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-
> > > >>>live/Bio/Tools/Run/RemoteBlast.pm:256
> > > >>>STACK: C:\Perl\Scripts\test_blast.pl:15
> > > >>>-----------------------------------------------------------
> > > >>>
> > > >>>
> > > >>>>>The last line indicates a problem with this line:
> > > >>>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > >>>>>Changing the @params to this:
> > > >>>>>my @params=( -prog=>$prog,
> > > >>>>>
> > > >>>>>
> > > >>>	-data=>$db,
> > > >>>	-expect=>$e_val,
> > > >>>	-readmethod=>'SearchIO');
> > > >>>
> > > >>>
> > > >>>>>fixes it, and I get output as expected.
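
Why the leading dashes matter can be shown with a toy named-argument parser. This is a simplified stand-in written for illustration only; parse_named_args is a hypothetical function, and the assumption that keys without a dash are simply not recognized is mine, not a description of BioPerl's own argument handling.

```perl
use strict;
use warnings;

# Hypothetical parser: walks the list in key/value pairs and keeps
# only keys that start with a dash, storing them without the dash.
sub parse_named_args {
    my (@args) = @_;
    my %named;
    while ( @args >= 2 ) {
        my ( $key, $value ) = splice( @args, 0, 2 );
        next unless $key =~ /^-(\w+)$/;
        $named{ lc $1 } = $value;
    }
    return %named;
}

my %with_dash    = parse_named_args( -prog => 'blastp', -data => 'swissprot' );
my %without_dash = parse_named_args(  prog => 'blastp',  data => 'swissprot' );

print "with dashes:    prog=",
    ( defined $with_dash{prog} ? $with_dash{prog} : 'undef' ), "\n";
print "without dashes: prog=",
    ( defined $without_dash{prog} ? $without_dash{prog} : 'undef' ), "\n";
```

In this sketch the dash-less call leaves prog undefined, so any later validation of the program name would fail even though the caller believes it was set.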
> > > >>>>>Christopher Fields
> > > >>>>>
> > > >>>>>
> > > >>>Postdoctoral Researcher - Switzer Lab
> > > >>>Dept. of Biochemistry
> > > >>>University of Illinois Urbana-Champaign
> > > >>>
> > > >>>
> > > >>>>>>>>-----Original Message-----
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > >>>>Sent: Tuesday, February 14, 2006 11:48 AM
> > > >>>>To: Chris Fields; bioperl-l at lists.open-bio.org
> > > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > >>>>
> > > >>>>Hi, Chris,
> > > > >>>>When I tried with the perldoc script, it did not work either. First
> > it
> > > >>>>says $prog can not be bare word if I "use strict". I added quotes on
> > > the
> > > >>>>words, then it says the value for $prog does not match expression
> > > >>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The
> > > >>>>
> > > >>>>
> > > >>>script
> > > >>>
> > > >>>
> > > >>>>is shown below. Why is the expression "t?blast[pnx]"?
> > > >>>>
> > > >>>>#!/usr/bin/perl
> > > >>>>
> > > >>>>use Bio::SeqIO;
> > > >>>>use Bio::Seq;
> > > >>>>use Bio::Tools::Run::RemoteBlast;
> > > >>>>use Bio::SearchIO;
> > > >>>>
> > > >>>>
> > > >>>>my $prog=blastp;
> > > >>>>my $db=swissprot;
> > > >>>>my $e_val=1e-10;
> > > >>>>my @params=( prog=>$prog,
> > > >>>>	data=>$db,
> > > >>>>	expect=>$e_val,
> > > >>>>	readmethod=>'SearchIO');
> > > >>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > >>>>
> > > >>>>my $v = 1;
> > > >>>>
> > > >>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > > >>>>
> > > >>>>while (my $input = $str->next_seq()){
> > > >>>>  #Blast a sequence against a database:
> > > >>>>  #Alternatively, you could  pass in a file with many
> > > >>>>  #sequences rather than loop through sequence one at a time
> > > >>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >>>>  #and swap the two lines below for an example of that.
> > > >>>>  my $r = $factory->submit_blast($input);
> > > >>>>  #my $r = $factory->submit_blast('amino.fa');
> > > >>>>  print STDERR "waiting..." if( $v > 0 );
> > > >>>>  while ( my @rids = $factory->each_rid ) {
> > > >>>>    foreach my $rid ( @rids ) {
> > > >>>>      my $rc = $factory->retrieve_blast($rid);
> > > >>>>      if( !ref($rc) ) {
> > > >>>>        if( $rc < 0 ) {
> > > >>>>          $factory->remove_rid($rid);
> > > >>>>        }
> > > >>>>        print STDERR "." if ( $v > 0 );
> > > >>>>        sleep 5;
> > > >>>>      } else {
> > > >>>>        my $result = $rc->next_result();
> > > >>>>        #save the output
> > > >>>>        my $filename = $result->query_name()."\.out";
> > > >>>>        $factory->save_output($filename);
> > > >>>>        $factory->remove_rid($rid);
> > > >>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > > >>>>        while ( my $hit = $result->next_hit ) {
> > > >>>>          next unless ( $v > 0);
> > > >>>>          print "\thit name is ", $hit->name, "\n";
> > > >>>>          while( my $hsp = $hit->next_hsp ) {
> > > >>>>            print "\t\tscore is ", $hsp->score, "\n";
> > > >>>>          }
> > > >>>>        }
> > > >>>>      }
> > > >>>>    }
> > > >>>>  }
> > > >>>>}
> > > >>>>
> > > >>>>Thank you for your help!
> > > >>>>
> > > >>>>
> > > >>>>Guojun
> > > >>>>Department of Plant Biology
> > > >>>>University of Georgia
> > > >>>>
> > > >>>>----- Original Message -----
> > > >>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>To: gyang at plantbio.uga.edu
> > > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>>Try two things:
> > > >>>>>
> > > >>>>>
> > > >>>>>>1)  Use a much simpler script, like the one in 'perldoc
> > > >>>>>>
> > > >>>>>>
> > > >>>>>Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something
> > > >>>>>
> > > >>>>>
> > > >>>>wrong
> > > >>>>
> > > >>>>
> > > >>>>>with the logic in your subroutine:
> > > >>>>>
> > > >>>>>
> > > >>>>>>my $v = 1;
> > > >>>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta'
> > );
> > > >>>>>>while (my $input = $str->next_seq()){
> > > >>>>>>
> > > >>>>>>
> > > >>>>>  #Blast a sequence against a database:
> > > >>>>>  #Alternatively, you could  pass in a file with many
> > > >>>>>  #sequences rather than loop through sequence one at a time
> > > >>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >>>>>  #and swap the two lines below for an example of that.
> > > >>>>>  my $r = $factory->submit_blast($input);
> > > >>>>>  #my $r = $factory->submit_blast('amino.fa');
> > > >>>>>  print STDERR "waiting..." if( $v > 0 );
> > > >>>>>  while ( my @rids = $factory->each_rid ) {
> > > >>>>>    foreach my $rid ( @rids ) {
> > > >>>>>      my $rc = $factory->retrieve_blast($rid);
> > > >>>>>      if( !ref($rc) ) {
> > > >>>>>        if( $rc < 0 ) {
> > > >>>>>          $factory->remove_rid($rid);
> > > >>>>>        }
> > > >>>>>        print STDERR "." if ( $v > 0 );
> > > >>>>>        sleep 5;
> > > >>>>>      } else {
> > > >>>>>        my $result = $rc->next_result();
> > > >>>>>        #save the output
> > > >>>>>        my $filename = $result->query_name()."\.out";
> > > >>>>>        $factory->save_output($filename);
> > > >>>>>        $factory->remove_rid($rid);
> > > >>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > > >>>>>        while ( my $hit = $result->next_hit ) {
> > > >>>>>          next unless ( $v > 0);
> > > >>>>>          print "\thit name is ", $hit->name, "\n";
> > > >>>>>          while( my $hsp = $hit->next_hsp ) {
> > > >>>>>            print "\t\tscore is ", $hsp->score, "\n";
> > > >>>>>          }
> > > >>>>>        }
> > > >>>>>      }
> > > >>>>>    }
> > > >>>>>  }
> > > >>>>>}
> > > >>>>>
> > > >>>>>
> > > >>>>>>2) Try the RemoteBlast from Bugzilla and see if that works.  It
> > > >>>>>>
> > > >>>>>>
> > > >>>really
> > > >>>
> > > >>>
> > > >>>>>shouldn't make that much of a difference, but I noticed that the
> > CVS
> > > >>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> > > >>>>>released; the Bugzilla version is based off CVS.
> > > >>>>>
> > > >>>>>
> > > >>>>>>Christopher Fields
> > > >>>>>>
> > > >>>>>>
> > > >>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>Dept. of Biochemistry
> > > >>>>>University of Illinois Urbana-Champaign
> > > >>>>>
> > > >>>>>
> > > >>>>>>>-----Original Message-----
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > >>>>>>Sent: Monday, February 13, 2006 3:00 PM
> > > >>>>>>To: bioperl-l at lists.open-bio.org
> > > >>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>Thanks, Chris,
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>I installed version 1.5.1 and replaced the blast.pm file with the
> > > >>>>>>
> > > >>>>>>
> > > >>>one
> > > >>>
> > > >>>
> > > >>>>from
> > > >>>>
> > > >>>>
> > > >>>>>>your bug report. The running version is 1.5 when I use the command
> > > >>>>>>
> > > >>>>>>
> > > >>>you
> > > >>>
> > > >>>
> > > >>>>>>sent me. But when I tried the script, it doesn't change much. My
> > > >>>>>>remoteblast code (portion) is here:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>sub search {
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>local
> > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > > >>>>>>local
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > >
> > >>>$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}=
> > > >>>
> > > >>>
> > > >>>>>>'no';
> > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > > >>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > > >>>>>>			      -id=>"query",
> > > >>>>>>			      -desc=>"new seq");
> > > >>>>>>my $len=$query->length();
> > > >>>>>>@db=('nr','htgs','wgs');
> > > >>>>>>foreach my $db (@db) {
> > > >>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog'
> > =>'blastn',
> > > >>>>>>						'-data' =>"$db",
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>'-expect'=>"$E_value");
> > > >>
> > > >>
> > > >>>>>>>>>>my $blast_report = $factory->submit_blast($query);
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>my @rids = $factory->each_rid();
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>foreach my $rid ( @rids ) {
> > > >>>>>>    print STDERR "$rid\n";
> > > >>>>>>}
> > > >>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > > >>>>>>print STDERR "waiting...";
> > > >>>>>>sleep 60;
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>foreach my $rid ( @rids ) {
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>    my $rc = $factory->retrieve_blast($rid);
> > > >>>>>>    while (!ref($rc) ) {
> > > >>>>>>	if( $rc < 0 ) {
> > > >>>>>># retrieve_blast returns -1 on error
> > > >>>>>>	    $factory->remove_rid($rid);
> > > >>>>>>	    print "Error!\n";
> > > >>>>>>	    send_error($email,$function,$seqname,$queryname[$ST]);
> > > >>>>>>	    die "Can't retrieve $rid";
> > > >>>>>>	} if ($rc==0) { # retrieve_blast returns 0 on 'job not
> > > >>>>>>
> > > >>>>>>
> > > >>>finished'
> > > >>>
> > > >>>
> > > >>>>>>	    sleep 60;
> > > >>>>>>	    $rc = $factory->retrieve_blast($rid);
> > > >>>>>>	}
> > > >>>>>>    }
> > > >>>>>>    if (ref($rc)) {
> > > >>>>>>	print STDERR "Done.\n";
> > > >>>>>>	 while( my $result = $rc->next_result) {
> > > >>>>>>	    while( my $hit = $result->next_hit()) {
> > > >>>>>>	    	$hit_name=$hit->name;
> > > >>>>>>		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > > >>>>>>		$name=$1;
> > > >>>>>>		@left_plus_start=();
> > > >>>>>>		@left_plus_end=();
> > > >>>>>>		@left_minus_start=();
> > > >>>>>>		@left_minus_end=();
> > > >>>>>>		@right_plus_start=();
> > > >>>>>>		@right_plus_end=();
> > > >>>>>>		@right_minus_start=();
> > > >>>>>>		@right_minus_end=();
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>		while( my $hsp = $hit->next_hsp()) {
> > > >>>>>>......
> > > >>>>>>
> > > >>>>>>
> > > > >>>>>>>>It was working quite well until around October last year, but
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>it has
> > > >>>>
> > > >>>>
> > > > >>>>>>stopped since then. When a submission is sent via a webpage, the
> > cgi
> > > >>>>>>starts to work and use a memory of ~20 Mb. Then it hangs there,
> > > >>>>>>
> > > >>>>>>
> > > >>>>finally
> > > >>>>
> > > >>>>
> > > >>>>>>the expected email is received but without real results although
> > it
> > > >>>>>>
> > > >>>>>>
> > > >>>>does
> > > >>>>
> > > >>>>
> > > >>>>>>contain something from other parts of the script. Apparently the
> > > >>>>>>
> > > >>>>>>
> > > >>>>search
> > > >>>>
> > > >>>>
> > > > >>>>>>sub did not return anything (I know there is something that should be
> > > > >>>>>>returned). Is it also possible that the format of the NCBI output for
> > > >>>>>>
> > > >>>>>>
> > > >>>each
> > > >>>
> > > >>>
> > > >>>>>>result has changed?
> > > >>>>>>Thank you,
> > > >>>>>>Guojun
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>>Department of Plant Biology
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>University of Georgia
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>>>>----- Original Message -----
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > >>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>>>How do you know two versions are installed (i.e. how are
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>you
> > > >>>
> > > >>>
> > > >>>>checking
> > > >>>>
> > > >>>>
> > > >>>>>>the
> > > >>>>>>
> > > >>>>>>
> > > > >>>>>>>version)?  Do you have two complete bioperl distributions (in
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>two
> > > >>>>
> > > >>>>
> > > >>>>>>>separate directories) or are you looking in modules?  Here's the
> > > >>>>>>>
> > > >>>>>>>
> > > >>>way
> > > >>>
> > > >>>
> > > >>>>to
> > > >>>>
> > > >>>>
> > > >>>>>>>check the version (from the FAQ):
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>perl -MBio::Root::Version -e 'print
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>$Bio::Root::Version::VERSION,"\n"'
> > > >>>>
> > > >>>>
> > > >>>>>>>>If you have two full bioperl distributions on your computer,
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>normally
> > > >>>>
> > > >>>>
> > > >>>>>>only
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>one will be in use unless you have explicitly set the environment
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>variable
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>PERL5LIB.  The PERL5LIB  directories will be searched first
> > before
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>your
> > > >>>>
> > > >>>>
> > > >>>>>>>normal perl directory list (@INC) is searched.  You MAY get some
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>mixing
> > > >>>>
> > > >>>>
> > > >>>>>>>then, but only if perl can't find a particular module in the path
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>designated
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>in PERL5LIB; then it will progress through the directories listed
> > > >>>>>>>
> > > >>>>>>>
> > > >>>in
> > > >>>
> > > >>>
> > > >>>>>>@INC.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>This may happen if a module is unique to a particular release,
> > but
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>shouldn't
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>happen for the majority of modules, including RemoteBlast.  You
> > > >>>>>>>
> > > >>>>>>>
> > > >>>can
> > > >>>
> > > >>>
> > > >>>>>>check
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>differ
> > > >>>>
> > > >>>>
> > > >>>>>>>depending on your OS, perl build, etc.
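
As a complement to 'perl -V', a short script can report which file Perl actually loaded for a given module, via the %INC hash. This is a minimal sketch: it uses the core module File::Spec as a stand-in so that it runs without BioPerl installed. For the case discussed here, the assumption is that one would load Bio::Tools::Run::RemoteBlast instead and look up the key 'Bio/Tools/Run/RemoteBlast.pm'.

```perl
use strict;
use warnings;
use File::Spec;    # core module, used here only as a stand-in

# %INC maps each loaded module's relative path to the file it came from.
my $module_key = 'File/Spec.pm';
print "Loaded from: $INC{$module_key}\n";

# PERL5LIB entries, if any, are searched before the built-in directories.
print "PERL5LIB: ",
    ( defined $ENV{PERL5LIB} ? $ENV{PERL5LIB} : '(not set)' ), "\n";

# @INC is the full search path, in the order Perl consults it.
print "Search path (\@INC), in order:\n";
print "  $_\n" for @INC;
```

If the reported path points into an older distribution's directory, that is a sign the wrong copy is being picked up first.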
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>Regardless, if you follow the directions for installing bioperl
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>for
> > > >>>>
> > > >>>>
> > > >>>>>>your
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>system ('perl Makefile.PL', 'make', 'make test', 'make install',
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>unless
> > > >>>>
> > > >>>>
> > > >>>>>>you
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>explicitly change the installation directory when using 'perl
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>Makefile.PL'),
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>then 'uninstalling' Bioperl shouldn't be a problem as it will
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>install
> > > >>>>
> > > >>>>
> > > >>>>>>the
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>Bioperl distribution you downloaded over the old version in @INC.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>See
> > > >>>>
> > > >>>>
> > > >>>>>>this
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>page:
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > > >>>>>>>>for more details.
> > > >>>>>>>>Christopher Fields
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>>>Dept. of Biochemistry
> > > >>>>>>>University of Illinois Urbana-Champaign
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>>>-----Original Message-----
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > >>>>>>>>Sent: Monday, February 13, 2006 12:32 PM
> > > >>>>>>>>To: bioperl-l at lists.open-bio.org
> > > >>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>Hi, Chris,
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>I do have different versions of bioperl on my Linux machine
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>(1.4.
> > > >>>
> > > >>>
> > > >>>>and
> > > >>>>
> > > >>>>
> > > >>>>>>>>1.5.0), this may be the problem. Should I just install bioperl-
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>1.5.1
> > > >>>>
> > > >>>>
> > > >>>>>>or I
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>need to uninstall and remove the previous versions. I could not
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>find
> > > >>>>
> > > >>>>
> > > >>>>>>any
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>hint on uninstalling bioperl on linux. Could you please give me
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>some
> > > >>>>
> > > >>>>
> > > >>>>>>>>suggestion?
> > > >>>>>>>>Thanks,
> > > >>>>>>>>Guojun
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>Department of Plant Biology
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>University of Georgia
> > > >>>>>>>>      _____
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>  From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > >>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>version
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>1.28
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>>>>>If you're using RemoteBlast 1.28, then you've likely
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>updated from CVS
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>which isn't the latest fix.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>Make sure that you check the following:
> > > >>>>>>>>>>1) Always post to the mailing list:
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>(CVS)
> > > >>>>
> > > >>>>
> > > >>>>>>>>installed first.  Perform a clean installation; do not upgrade
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>only
> > > >>>>
> > > >>>>
> > > >>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>can't
> > > >>>
> > > >>>
> > > >>>>>>>>guarantee that mixing modules from old and new distributions
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>(1.4
> > > >>>
> > > >>>
> > > >>>>and
> > > >>>>
> > > >>>>
> > > >>>>>>>>1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > > >>>>>>>>installation will allow text output from BLAST v.2.2.12 to be
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>saved
> > > >>>>
> > > >>>>
> > > >>>>>>and
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>parsed; it will not parse the newest BLAST text output from NCBI
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>(v2.2.13)
> > > >>>>>>
> > > >>>>>>
> > > > >>>>>>>>but it should still save it. I believe as long as next_result()
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>isn't
> > > >>>>
> > > >>>>
> > > >>>>>>>>called, it will work.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>3) The bug fixes for the above issue with parsing BLAST
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>2.2.13
> > > >>>
> > > >>>
> > > >>>>>>text output
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>are NOT in CVS; they haven't been cleared and checked in by
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>Roger
> > > >>>
> > > >>>
> > > >>>>Hall
> > > >>>>
> > > >>>>
> > > >>>>>>>>(who's now taking care of RemoteBlast) and the powers that be
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>(Jason
> > > >>>>
> > > >>>>
> > > >>>>>>or
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>whomever is in charge of Bio::SearchIO).  They can be found in
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>Bugzilla:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>option
> > > >>>>
> > > >>>>
> > > >>>>>>of
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>saving XML output, so isn't necessary if you don't plan on using
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>this
> > > >>>>
> > > >>>>
> > > >>>>>>>>option.  And, remember, they haven't been committed yet to CVS,
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>which
> > > >>>>
> > > >>>>
> > > > >>>>>>>>means that the final version will change to reflect the new
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>version.
> > > >>>
> > > >>>
> > > >>>>>>>>>>>>Christopher Fields
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>>>>Dept. of Biochemistry
> > > >>>>>>>>University of Illinois Urbana-Champaign
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>>>    _____
> > > >>>>>>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>Sent: Monday, February 13, 2006 9:26 AM
> > > >>>>>>>>To: Chris Fields
> > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>version
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>1.28
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>>>Hi, Chris
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>Thanks for your suggestion, however, it doesn't seem to work
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>for
> > > >>>>
> > > >>>>
> > > >>>>>>my cgi
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>even after I replace both blast.pm and RemoteBlast.pm. I didn't
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>even
> > > >>>>
> > > >>>>
> > > >>>>>>get
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>any RID. Is there any suggestion?
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>>>>>Guojun
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>Guojun Yang
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>Department of Plant Biology
> > > >>>>>>>>University of Georgia
> > > >>>>>>>>Tel: 706-542-1857
> > > >>>>>>>>Fax: 706-542-1805
> > > >>>>>>>>http://www.arches.uga.edu/~guojun
> > > >>>>>>>>    _____
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > > >>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>version
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>1.28
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>I would say give the new code a try, but realize that it
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>hasn't
> > > >>>>
> > > >>>>
> > > >>>>>>been
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>checked
> > > >>>>>>>>in (like I said below). I will try going over the modified
> > > >>>>>>>>Bio::SearchIO::blast again this weekend to see if there is
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>anything I
> > > >>>>
> > > >>>>
> > > >>>>>>>>might
> > > >>>>>>>>have missed. The changed order in the header of BLAST text
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>output
> > > >>>
> > > >>>
> > > >>>>has
> > > >>>>
> > > >>>>
> > > >>>>>>me a
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>bit worried that it might not catch everything, but it at least
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>doesn't
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>hang
> > > >>>>>>>>in the while() loop I described in the bug report below (bug
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>#1934)
> > > >>>>
> > > >>>>
> > > >>>>>>and
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>seems to process everything fine.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>If you want more stability in the code, you might consider
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>changing over
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>to
> > > >>>>>>>>XML output and parsing with Bio::SearchIO::blastxml. There are
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>some
> > > >>>>
> > > >>>>
> > > >>>>>>>>changes
> > > >>>>>>>>in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>saving
> > > >>>>
> > > >>>>
> > > >>>>>>XML
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>output, but I believe it parses everything regardless. If you
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>look
> > > >>>
> > > >>>
> > > >>>>>>back
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>the
> > > >>>>>>>>last month or so there has been a bit of discussion here about
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>it.
> > > >>>
> > > >>>
> > > >>>>>>Jason
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>describes a bit on how to set up RemoteBlast for XML:
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using-
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>remoteblast/
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>>Christopher Fields
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>>>>Dept. of Biochemistry
> > > >>>>>>>>University of Illinois Urbana-Champaign
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>>-----Original Message-----
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > >>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM
> > > >>>>>>>>>To: bioperl-l at bioperl.org
> > > >>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>version
> > > >>>>
> > > >>>>
> > > >>>>>>1.28
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>Hi, Everybody,
> > > >>>>>>>>>I see this post and am wondering if this is the reason for the
> > > > >>>>>>>>>malfunctioning of my webserver. We set up a webserver named
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>MAK,
> > > >>>>
> > > >>>>
> > > >>>>>>for
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>MITE
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>sequence analysis. It was working very well until around
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>November
> > > >>>>
> > > >>>>
> > > >>>>>>2005,
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>when it stopped returning any result (the site is fine and
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>seems
> > > >>>
> > > >>>
> > > >>>>to
> > > >>>>
> > > >>>>
> > > >>>>>>be
> > > >>>>>>
> > > >>>>>>
> > > > >>>>>>>>>doing something after submission). In the CGI script, I used
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>remoteblast
> > > >>>>
> > > >>>>
> > > >>>>>>(that
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>work was done in 2003) to do searches. I currently do not have
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>access to
> > > >>>>>>
> > > >>>>>>
> > > > >>>>>>>>>the server because I moved. Quite a few people sent emails
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>to
> > > >>>
> > > >>>
> > > >>>>us
> > > >>>>
> > > >>>>
> > > >>>>>>about
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>its malfunctioning. Is there any suggestion on fixing the
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>problem?
> > > >>>>
> > > >>>>
> > > >>>>>>>>Should
> > > >>>>>>>>
> > > >>>>>>>>
> > > > >>>>>>>>>I simply ask that RemoteBlast.pm be replaced with the new
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>version?
> > > >>>>
> > > >>>>
> > > >>>>>>>>>Thanks a lot,
> > > >>>>>>>>>Guojun
> > > >>>>>>>>>
> > > >>>>>>>>>Department of Plant Biology
> > > >>>>>>>>>University of Georgia
> > > >>>>>>>>>Tel: 706-542-1857
> > > >>>>>>>>>Fax: 706-542-1805
> > > >>>>>>>>>http://www.arches.uga.edu/~guojun
> > > >>>>>>>>>_____
> > > >>>>>>>>>
> > > >>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>Jian'
> > > >>>>
> > > >>>>
> > > >>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>[mailto:bioperl-
> > > >>>
> > > >>>
> > > >>>>>>>>>l at bioperl.org]
> > > >>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > > >>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > >>>>>>>>>
> > > >>>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live
> > > >>>>>>>>>>CVS. It will work for saving text output. However, it will not
> > > >>>>>>>>>>parse anything using next_result (it will likely hang) and will
> > > >>>>>>>>>>not save XML format. See these bugs:
> > > >>>>>>>>>>
> > > >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > >>>>>>>>>>
> > > >>>>>>>>>>for explanations and possible fixes (changes to RemoteBlast and
> > > >>>>>>>>>>Bio::SearchIO::blast). Note that these haven't been checked in
> > > >>>>>>>>>>yet so are still not included in bioperl-live; they may be
> > > >>>>>>>>>>further modified before committing to CVS. If you're not worried
> > > >>>>>>>>>>about XML, you could just try the first fix, which is a change
> > > >>>>>>>>>>to SearchIO::blast.
> > > >>>>>>>>>>
> > > >>>>>>>>>>Nagesh, I remember you posting to the list a month ago using a
> > > >>>>>>>>>>script which had problems; the script you used saves the output
> > > >>>>>>>>>>but doesn't actually parse it (i.e. you don't use next_result()
> > > >>>>>>>>>>to go through the data). Is the version of BLAST in your text
> > > >>>>>>>>>>output 2.2.12 or 2.2.13? Have you tried parsing the output using
> > > >>>>>>>>>>"-readmethod => SearchIO" or "-readmethod => blast" using your
> > > >>>>>>>>>>version of RemoteBlast and method next_result()? Like below
> > > >>>>>>>>>>(from perldoc):
> > > >>>>>>>>>>
> > > >>>>>>>>>>while ( my @rids = $factory->each_rid ) {
> > > >>>>>>>>>>    foreach my $rid ( @rids ) {
> > > >>>>>>>>>>        my $rc = $factory->retrieve_blast($rid);
> > > >>>>>>>>>>        if( !ref($rc) ) {
> > > >>>>>>>>>>            if( $rc < 0 ) {
> > > >>>>>>>>>>                $factory->remove_rid($rid);
> > > >>>>>>>>>>            }
> > > >>>>>>>>>>            print STDERR "." if ( $v > 0 );
> > > >>>>>>>>>>            sleep 5;
> > > >>>>>>>>>>        } else { # parsing starts here
> > > >>>>>>>>>>            my $result = $rc->next_result(); # it should hang here
> > > >>>>>>>>>>            #save the output
> > > >>>>>>>>>>            my $filename = $result->query_name()."\.out";
> > > >>>>>>>>>>            $factory->save_output($filename);
> > > >>>>>>>>>>            $factory->remove_rid($rid);
> > > >>>>>>>>>>            print "\nQuery Name: ", $result->query_name(), "\n";
> > > >>>>>>>>>>            while ( my $hit = $result->next_hit ) {
> > > >>>>>>>>>>                next unless ( $v > 0);
> > > >>>>>>>>>>                print "\thit name is ", $hit->name, "\n";
> > > >>>>>>>>>>                while( my $hsp = $hit->next_hsp ) {
> > > >>>>>>>>>>                    print "\t\tscore is ", $hsp->score, "\n";
> > > >>>>>>>>>>                }
> > > >>>>>>>>>>            }
> > > >>>>>>>>>>        }
> > > >>>>>>>>>>    }
> > > >>>>>>>>>>}
> > > >>>>>>>>>>
> > > >>>>>>>>>>My script hung if I used next_result() in any way prior to the
> > > >>>>>>>>>>fixes. I want to see how many others are having the same issues
> > > >>>>>>>>>>with parsing using the CVS version of bioperl-live.
> > > >>>>>>>>>
> > > >>>>>>>>>Christopher Fields
> > > >>>>>>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>>>>>Dept. of Biochemistry
> > > >>>>>>>>>University of Illinois Urbana-Champaign
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>>>-----Original Message-----
> > > >>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
> > > >>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> > > >>>>>>>>>>>Nagesh Chakka
> > > >>>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM
> > > >>>>>>>>>>>To: Huang Jian; bioperl-l
> > > >>>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > >>>>>>>>>>
> > > >>>>>>>>>>Hi Huang,
> > > >>>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm
> > > >>>>>>>>>>>works on the logic of checking the temporary file size to
> > > >>>>>>>>>>>determine whether the Blast results are ready. This condition
> > > >>>>>>>>>>>is not getting satisfied may be due to some changes brought
> > > >>>>>>>>>>>about by NCBI. I had this problem recently and figured out
> > > >>>>>>>>>>>that the solution was to use the latest version which has
> > > >>>>>>>>>>>this problem fixed (does not use file size logic any more)
> > > >>>>>>>>>>>which is not yet included in the BioPerl package.
> > > >>>>>>>>>>Cheers
> > > >>>>>>>>>>Nagesh
> > > >>>>>>>>>>
> > > >>>>>>>>>>Huang Jian wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>>Dear Nagesh,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28
> > > >>>>>>>>>>>>you send me. Now it works perfectly!!!
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>Thank you!!
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>Huang
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka"
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>To: "Huang Jian" ; "bioperl-l"
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM
> > > >>>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the net,
> > > >>>>>>>>>>>>so still via email
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>>Hi Huang,
> > > >>>>>>>>>>>>>I see that you are submitting a sequence for a remote blast
> > > >>>>>>>>>>>>>search. Can you check if the RemoteBlast.pm being used is
> > > >>>>>>>>>>>>>v 1.28 (2005/12/09). If not I have attached it with this
> > > >>>>>>>>>>>>>email, try to replace it with the old one which has a bug.
> > > >>>>>>>>>>>>Let me know if it works.
> > > >>>>>>>>>>>>Nagesh
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>_______________________________________________
> > > >>>>>>>>>>Bioperl-l mailing list
> > > >>>>>>>>>>Bioperl-l at lists.open-bio.org
> > > >>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
> > >



From cjm at fruitfly.org  Mon Feb 20 20:48:57 2006
From: cjm at fruitfly.org (chris mungall)
Date: Mon, 20 Feb 2006 17:48:57 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <3666b00b7322d2bfe4d82129b047e5ce@gmx.net>
References: <000001c62e9a$4f82eee0$c2987ca5@pc13>
	<3666b00b7322d2bfe4d82129b047e5ce@gmx.net>
Message-ID: <930b0083193357df7d43cc7a3111c938@fruitfly.org>


I like the idea of using an ontology to describe the ontology.

Note that the proposed structure:
OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI

will lead to cycles in the object graph when the metadata ontology 
describes itself.

Actually, I think the ontology module already has object reference
cycles: TermI->OntologyI->TermI.

When I brought this up originally people didn't seem to care much - so 
long as you're only parsing GO then it's not a big issue, people have 
enough memory they won't notice a big chunk of memory that refuses to 
be garbage collected way after it's used. Of course, if you want to use
bioperl to cycle through all of OBO + SnoMed + UMLS then it's a
different story.

I think it's best if Sohel concentrates on getting obo.pm working, then
we can start thinking as a group about the best way to capture ontology 
metadata. This includes metadata on the whole ontology, and metadata on 
the terms (eg synonyms).

To what extent are the current modules already in use? I think the 
object cycle is a serious flaw, will it be possible to fix this without 
a major overhaul?
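
To illustrate the kind of fix that would avoid a major overhaul, here is a minimal sketch using weak references. The `Toy::Ontology` and `Toy::Term` classes are hypothetical stand-ins, not the real Bio::Ontology modules; the only assumption is that a term keeps a back-reference to its ontology while the ontology keeps a list of its terms. Weakening the back-reference with `Scalar::Util::weaken` lets both sides be freed once the ontology goes out of scope.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Hypothetical toy classes; they only mimic the Term <-> Ontology cycle.
package Toy::Ontology;
sub new {
    my ($class, $name) = @_;
    return bless { name => $name, terms => [] }, $class;
}
sub add_term {
    my ($self, $term) = @_;
    push @{ $self->{terms} }, $term;   # strong reference: ontology owns its terms
    $term->ontology($self);            # back-reference is weakened in the setter
    return $term;
}

package Toy::Term;
our $destroyed = 0;                    # counts terms that were really freed
sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}
sub ontology {
    my ($self, $ont) = @_;
    if (defined $ont) {
        $self->{ontology} = $ont;
        Scalar::Util::weaken($self->{ontology});   # break the cycle here
    }
    return $self->{ontology};
}
sub DESTROY { $destroyed++ }

package main;
{
    my $ont = Toy::Ontology->new('toy');
    $ont->add_term(Toy::Term->new('term1'));
}   # $ont goes out of scope; with the weak back-reference the term is freed too

print "terms destroyed: $Toy::Term::destroyed\n";
```

With the `weaken` call in place this prints `terms destroyed: 1`. Without it the term and the ontology keep each other alive until the interpreter exits, which is the unreclaimed memory described above.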


On Feb 11, 2006, at 9:10 PM, Hilmar Lapp wrote:

> Sohel, please do keep the discussion on the list, in your own interest
> as there's a multitude of people who can respond to you.
>
> SimpleValue would probably be what I'd use too. As Heikki hinted you
> might even create an ontology for annotating ontologies, which would
> allow you to use Annotation::OntologyTerm for annotation, but then
> there's no qualifier value ...
>
> Bioperl 1.5.1 has been released last year, please check the website.
>
> 	-hilmar
>
> On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote:
>
>> Hi Hilmar,
>>   I really like your suggestion of implementing the Bio::AnnotatableI
>> interface in the Bio::Ontology::Ontology class. I am going to implement
>> this and play around a little with it. I am planning to use
>> Bio::Annotation::SimpleValue for annotating the header as it provides a
>> good way of specifying the Tag/value pair. What are your thoughts on
>> using this?
>>
>>   Also, I was wondering if you have any idea about the scheduled date
>> for the Bioperl 1.51 release. I would like to contribute some stuff in
>> the next release.
>>
>> Thanks,
>> Sohel.
>>
>> -----Original Message-----
>> From: Hilmar Lapp [mailto:hlapp at gmx.net]
>> Sent: Friday, February 10, 2006 3:40 PM
>> To: Sohel Merchant
>> Cc: Bioperl
>> Subject: Re: Bio::Ontology::Ontology
>>
>> Sohel,
>>
>> please allow me to copy the list in my response. There's many good and
>> insightful people on the list who may have something to add or
>> different ideas.
>>
>> I've come across that problem myself, for instance with InterPro. What
>> I've done so far simply is to stick it unstructured into the definition
>> slot, which is not helpful if your purpose goes further than just
>> displaying it in an unstructured fashion.
>>
>> I'm not sure you would want to create another class for this (like
>> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
>> implementation, probably not the interface) annotatable (i.e.,
>> implement Bio::Annotatable), which supposedly would be simple to do
>> (AnnotationCollection is already implemented, you'd just return an
>> instance of it).
>>
>> Even though tag/value pairs sound like quick&fast way to go I'm leaning
>> against it; in essence we're moving away from that elsewhere
>> (SeqFeatureI) and hence I don't think we should restart it here.
>>
>> I'm not giving a definitive answer here, just my (initial) thoughts.
>> Hope that helps nonetheless. Can you fancy yourself trying the
>> Annotatable approach and let us know how it goes?
>>
>> 	-hilmar
>>
>>
>> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
>>
>>> Hi Hilmar,
>>>   How are you doing? I am Sohel Merchant, a programmer at dictyBase,
>>> Northwestern University. I am working on a parser for an ontology
>>> file. I really like the ontology object model which you have
>>> contributed to Bioperl. I think it's just awesome! One of the things
>>> which I thought would be great to capture is the ontology headers.
>>> Right now one can specify only the name and authority information. I
>>> was wondering if there is any way I could also capture other ontology
>>> file headers, like the version of the file and the date when that
>>> ontology file was made. I was thinking of making a header class, or
>>> alternatively it could go as a hash of values in the
>>> Bio::Ontology::Ontology class itself. I wanted to know what your
>>> thoughts are on this.
>>>
>>> Thanks,
>>> Sohel Merchant
>>> dictyBase
>>>
>> -- 
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>>
>>
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From osborne1 at optonline.net  Mon Feb 20 23:42:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 20 Feb 2006 23:42:18 -0500
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
In-Reply-To: <43FA0FB7.6060904@lsi.upc.edu>
Message-ID: 

Gabriel,

You had a couple of little errors in your script but once fixed it worked
fine:

#!/usr/bin/perl -w

use strict;
use lib "/Users/bosborne/bioperl-live";
use Bio::DB::Taxonomy;

my $nodesfile = "nodes.dmp";
my $namefile = "names.dmp";
my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
                               -nodesfile => $nodesfile,
                               -namesfile => $namefile);

my $taxonid = $db->get_taxonid('Homo sapiens');

# Here, $taxonid is 9606. However,

my $species = $db->get_Taxonomy_Node(-taxonid => $taxonid);

print $species->common_name;

This is using bioperl-live on Mac OSX, Perl 5.8. Are you on Windows? If so
then do "-directory => C:/temp", see what happens.

Brian O.

On 2/20/06 1:51 PM, "Gabriel Valiente"  wrote:

> use Bio::DB::Taxonomy;
> my $nodesfile = "nodes.dmp";
> my $namesfile = "names.dmp";
> my $db = new Bio::DB::Taxonomy(-source => 'flatfile'
>                                -nodesfile => $nodesfile,
>                                -namesfile => $namefile);
> my $taxonid = $db->get_taxonid('Homo sapiens');
> 
> Here, $taxonid is 9606. However,
> 
> my $species = $db->get_Taxonomy_Node(-taxonid => $taxonid);




From valiente at lsi.upc.edu  Tue Feb 21 07:19:04 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Tue, 21 Feb 2006 13:19:04 +0100 (MET)
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <1125313334valiente@lsi.upc.es>

Thanks. There's still a problem with Bio::DB::Taxonomy:

use strict;
use Bio::DB::Taxonomy;

my $nodesfile = "nodes.dmp";
my $namesfile = "names.dmp";
my $db = new Bio::DB::Taxonomy(-source => 'flatfile'
                              -nodesfile => $nodesfile,
                              -namesfile => $namesfile);

my $taxonid = $db->get_taxonid('Homo sapiens');
my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid); 

So far so good. Now, access to the parent node via

my $parent = $node->get_Parent_Node;

is alright, but access to the children nodes via

my @childrenids = $db->get_Children_Taxids($taxonid);

raises:

------------- EXCEPTION  -------------
MSG: Abstract method "Bio::DB::Taxonomy::get_Children_Taxids" is not
implemented by package Bio::DB::Taxonomy::entrez.
This is not your fault - author of Bio::DB::Taxonomy::entrez should be
blamed!

STACK Bio::Root::RootI::throw_not_implemented
/home/valiente/bioperl-live/Bio/Root/RootI.pm:523
STACK Bio::DB::Taxonomy::get_Children_Taxids
/home/valiente/bioperl-live/Bio/DB/Taxonomy.pm:162
STACK toplevel fetch.pl:17

Perhaps there could be a $node->get_Children_Nodes() method in
Bio::DB::Taxonomy, instead of relying on Bio::DB::Taxonomy::entrez.
You know, efficient access to the children of a node is a quite
important method for almost any interesting use of the NCBI Taxonomy.
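
As a rough illustration of what efficient children access amounts to, the sketch below builds a parent-to-children index from nodes.dmp style rows in plain Perl. It is not how Bio::DB::Taxonomy::flatfile is implemented; it only assumes the NCBI taxdump layout in which fields are separated by "tab, pipe, tab" and the first two fields are the tax id and the parent tax id. The sample rows are illustrative and held in memory so the example runs without the real dump file.

```perl
use strict;
use warnings;

# Illustrative rows in nodes.dmp layout: tax_id | parent tax_id | rank | ...
my $sample = join "", map { join("\t|\t", @$_) . "\t|\n" } (
    [ 1,     1,      'no rank'    ],   # the root points to itself
    [ 9605,  207598, 'genus'      ],
    [ 9606,  9605,   'species'    ],
    [ 63221, 9606,   'subspecies' ],
);

# Build the index once: parent tax id => list of child tax ids.
my %children;
open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
while (my $line = <$fh>) {
    chomp $line;
    $line =~ s/\t\|$//;                            # drop the trailing field terminator
    my ($taxid, $parent) = split /\t\|\t/, $line;
    next if $taxid == $parent;                     # skip the self-referencing root
    push @{ $children{$parent} }, $taxid;
}
close $fh;

print "children of 9606: @{ $children{9606} || [] }\n";
```

After the single pass over the file, each children lookup is one hash access, which is what makes repeated tree traversal cheap.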

Gabriel




From dhoworth at mrc-lmb.cam.ac.uk  Tue Feb 21 05:47:41 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Tue, 21 Feb 2006 10:47:41 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
Message-ID: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>

I'm drawing a simple graphic and seeing something I didn't expect. I'm 
not sure whether I've misunderstood the docs or found a bug. If I run a 
program containing:

     my $name   = 'O68601';
     my $length = 44;
     my $panel  = Bio::Graphics::Panel->new(
                 -length    => $length,
                 -width     => 800,
                 -pad_left  => 10,
                 -pad_right => 10,
                 -key_style => 'between',
                 );

     my $feature = new Bio::SeqFeature::Generic(
                 -start  => 1,
                 -end    => $length,
                 -display_name => $name . " ($length)",
                 );

     $panel->add_track($feature,
                 -glyph   => 'arrow',
                 -tick    =>  1,
                 -fgcolor => 'black',
                 -double  => 1,
                 -label   => 1,
                 );

Then I see a tick strip labelled at its left end with '1' and at its 
right end with '45'. I expected to see '44'. Should I be looking for a 
bug in Bio::Graphics or fixing my program?

Thanks, Dave


From gbazykin at Princeton.EDU  Tue Feb 21 09:37:32 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Tue, 21 Feb 2006 09:37:32 -0500
Subject: [Bioperl-l] planning sequence mutating modules
Message-ID: <922343764.20060221093732@princeton.edu>

Heikki:

Let me explain what I need more clearly, and perhaps you guys can tell
me how this can be done best in Bioperl.

I'd like to marry the trees and the sequences, so that I could get a
sequence corresponding to each of the nodes (including internal nodes)
on the tree. The sequences of the nodes can be either generated by
some evolution process, or loaded; PAUP, for example, can reconstruct
the sequences of the internal nodes. I am dealing with coding
sequence, and for my purposes, I need to look at individual codons
rather than nucleotides. Then I answer questions such as this:

- for this codon (position), when (before which nodes of the tree) did
all (synonymous or non-synonymous) mutations occur?

- for this node and for this codon, when (before which node) did the
preceding (synonymous or non-synonymous) mutation occur? Preceding
means that it occurred in the line of direct ancestors, i.e. between
some two sequences on the path from this node to the root.

- infer position-specific "substitution matrix" from the tree, i.e. in
this position, what fraction of nucleotides "A" that were present at the
beginning of each branch turned into nucleotide "C" by the end of the
branch, possibly weighting with branch lengths.

Further, I need to simulate sequence evolution along the tree,
e.g., like this:

- mutate specified codon along the tree, perhaps with given
substitution matrix (and, possibly, with given
non-synonymous/synonymous substitutions rate). In the process, the
codons for all nodes will be generated.

I need to do all this for large trees (with hundreds of leaves) and
long sequences. So far, I have been using a huge hash to store all my
sequences for each of the nodes:

my %codons;
my $node;                 # set to some tree::node object
my $posit = 0;
$codons{$posit}->{$node} = 'AAA';

etc. But there should be a better way to do it? How can I integrate
all this into Bioperl? (I am new to object-oriented programming).
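
One possible way to organise this, offered only as a minimal sketch in plain Perl and not as an existing Bioperl API: key the codon data by a node identifier string instead of by the node object (an object used as a hash key is stringified, so it can no longer be called as an object), and keep a child-to-parent map for walking toward the root. The tree, node names, and codons below are made up for illustration, and `last_change` is a hypothetical helper.

```perl
use strict;
use warnings;

# Toy rooted tree as child => parent; the root has no parent.
my %parent = (root => undef, n1 => 'root', A => 'n1', B => 'n1', C => 'root');

# Codons per node id; the array index is the codon position.
my %codons = (
    root => [qw(AAA GGT)],
    n1   => [qw(AAG GGT)],
    A    => [qw(AAG GGC)],
    B    => [qw(AGG GGT)],
    C    => [qw(AAA GGT)],
);

# Walk from $node toward the root and return the first node whose codon at
# $pos differs from its parent's codon, together with the old and new codon.
# Returns an empty list if nothing changed between $node and the root.
sub last_change {
    my ($node, $pos) = @_;
    while (defined(my $p = $parent{$node})) {
        return ($node, $codons{$p}[$pos], $codons{$node}[$pos])
            if $codons{$p}[$pos] ne $codons{$node}[$pos];
        $node = $p;
    }
    return;
}

my ($where, $from, $to) = last_change('A', 0);
print "position 0: $from -> $to on the branch leading to $where\n";
```

For leaf A and position 0 this reports the AAA to AAG change on the branch leading to n1. Classifying a change as synonymous or non-synonymous would additionally need a codon table, which is left out here.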

I'll be thankful for any feedback.

Yegor



------------------------------
Tuesday, February 14, 2006, 11:09:27 AM, you wrote:

> Yegor,

> Like you said, there are examples how it is done.. It should be possible to
> evolve sequences based on a rooted tree. You just walk the tree and evolve
> each sequence from its parent.  If there is  an agreement how the branch
> lengths get translated to  mutations, even that could be done. Do you have
> any suggestions?

>         -Heikki



> On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
>> Hi,
>>
>> Just a thought: I really think that in perspective, it would be nice
>> to be able to evolve the sequence along a tree of given shape. I think
>> PAML's "evolver" has this functionality. I've already been doing this
>> in my scripts, but I am not sure how to couple the tree and the
>> sequence data properly.
>>
>> Yegor (George) Bazykin
>>
>>
>> ------------------------------
>>
>> Tuesday, February 14, 2006, 1:59:29 AM, you wrote:
>> > I've committed an interim solution to the sequence evolution problem:
>> >
>> >     $newseq = Bio::SeqUtils-> evolve
>> >         ($seq, $similarity, $transition_transversion_rate);
>> >
>> > I will go on to transform this code to fully OO, extensible solution.
>> >
>> >    -Heikki
>> >
>> > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
>> >> Ryan Golhar's mail got me thinking that we should have a simple
>> >> framework for mutating sequences to a desired level. The model can then
>> >> be extended to necessary complexity when needed by subclassing.
>> >>
>> >> To start with, I have been planning:
>> >>
>> >>
>> >> Bio::SeqEvolution::EvolutionI - interface file
>> >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
>> >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>> >>         (defaults to Bio::PrimarySeq)
>> >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
>> >> Bio::SeqEvolution::EvolutionI::each_seqs($count)
>> >>        - returns an array of $count seqs
>> >> Bio::SeqEvolution::EvolutionI::_generate_seq()
>> >> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
>> >>       converted to probabilities of change internally
>> >>
>> >>   various methods to define the extent of divergence:
>> >>   only one to start with:
>> >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
>> >>    (= 100% - identity)
>> >>
>> >> Bio::SeqEvolution::Factory - core class to call,
>> >>          instantiates subclasses, Bio::SeqEvolution::DNASimple for
>> >> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model,
>> >>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
>> >>
>> >>
>> >> Bio::SeqEvolution::DNASimple - default for nucleotides
>> >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>> >>         e.g. 5 => 5:1, defaults to 1:1
>> >>         simple alternative to a scoring matrix
>> >>
>> >>
>> >> I am soliciting usual comments and suggestions about naming and minimal
>> >> functionality.
>> >>
>> >>
>> >>    -Heikki
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From osborne1 at optonline.net  Tue Feb 21 09:46:56 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 21 Feb 2006 09:46:56 -0500
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
In-Reply-To: <1125313334valiente@lsi.upc.es>
Message-ID: 

Gabriel,

I don't think so, this works:

#!/usr/bin/perl -w

use strict;
use lib "/Users/bosborne/bioperl-live";
use Bio::DB::Taxonomy;

my $nodesfile = "nodes.dmp";
my $namefile = "names.dmp";
my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
                               -nodesfile => $nodesfile,
                               -namesfile => $namefile);

my $taxonid = $db->get_taxonid('Homo sapiens');

my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid);

# Here, $taxonid is 9606. However,

my $parent = $node->get_Parent_Node;

# is alright, but access to the children nodes via

my @childrenids = $db->get_Children_Taxids($taxonid);

print "@childrenids";


What Bioperl version are you using?

Brian O.


On 2/21/06 7:19 AM, "Gabriel Valiente"  wrote:

> my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid); 




From gbazykin at Princeton.EDU  Mon Feb 20 18:21:03 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Mon, 20 Feb 2006 18:21:03 -0500
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602141809.28057.heikki@sanbi.ac.za>
References: <200602100906.11885.heikki@sanbi.ac.za>
	<200602140859.30136.heikki@sanbi.ac.za>
	<214316262.20060214093454@princeton.edu>
	<200602141809.28057.heikki@sanbi.ac.za>
Message-ID: <158747055.20060220182103@princeton.edu>

Heikki:

Let me explain what I need more clearly, and perhaps you guys can tell
me how this can be done best in Bioperl.

I'd like to marry the trees and the sequences, so that I could get a
sequence corresponding to each of the nodes (including internal nodes)
on the tree. The sequences of the nodes can be either generated by
some evolution process, or loaded; PAUP, for example, can reconstruct
the sequences of the internal nodes. I am dealing with coding
sequence, and for my purposes, I need to look at individual codons
rather than nucleotides. Then I answer questions such as this:

- for this codon (position), when (before which nodes of the tree) did
all (synonymous or non-synonymous) mutations occur?

- for this node and for this codon, when (before which node) did the
preceding (synonymous or non-synonymous) mutation occur? Preceding
means that it occurred in the line of direct ancestors, i.e. between
some two sequences on the path from this node to the root.

- infer position-specific "substitution matrix" from the tree, i.e. in
this position, what fraction of nucleotides "A" that were present at the
beginning of each branch turned into nucleotide "C" by the end of the
branch, possibly weighting with branch lengths.

Further, I need to simulate sequence evolution along the tree,
e.g., like this:

- mutate specified codon along the tree, perhaps with given
substitution matrix (and, possibly, with given
non-synonymous/synonymous substitutions rate). In the process, the
codons for all nodes will be generated.

I need to do all this for large trees (with hundreds of leaves) and
long sequences. So far, I have been using a huge hash to store all my
sequences for each of the nodes:

my %codons;
my $node;                 # set to some tree::node object
my $posit = 0;
$codons{$posit}->{$node} = 'AAA';

etc. But there should be a better way to do it? How can I integrate
all this into Bioperl? (I am new to object-oriented programming).

I'll be thankful for any feedback.

Yegor



------------------------------
Tuesday, February 14, 2006, 11:09:27 AM, you wrote:

> Yegor,

> Like you said, there are examples how it is done.. It should be possible to
> evolve sequences based on a rooted tree. You just walk the tree and evolve
> each sequence from its parent.  If there is  an agreement how the branch
> lengths get translated to  mutations, even that could be done. Do you have
> any suggestions?

>         -Heikki



> On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
>> Hi,
>>
>> Just a thought: I really think that in perspective, it would be nice
>> to be able to evolve the sequence along a tree of given shape. I think
>> PAML's "evolver" has this functionality. I've already been doing this
>> in my scripts, but I am not sure how to couple the tree and the
>> sequence data properly.
>>
>> Yegor (George) Bazykin
>>
>>
>> ------------------------------
>>
>> Tuesday, February 14, 2006, 1:59:29 AM, you wrote:
>> > I've committed an interim solution to the sequence evolution problem:
>> >
>> >     $newseq = Bio::SeqUtils-> evolve
>> >         ($seq, $similarity, $transition_transversion_rate);
>> >
>> > I will go on to transform this code to fully OO, extensible solution.
>> >
>> >    -Heikki
>> >
>> > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
>> >> Ryan Golhar's mail got me thinking that we should have a simple
>> >> framework for mutating sequences to a desired level. The model can then
>> >> be extended to necessary complexity when needed by subclassing.
>> >>
>> >> To start with, I have been planning:
>> >>
>> >>
>> >> Bio::SeqEvolution::EvolutionI - interface file
>> >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
>> >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>> >>         (defaults to Bio::PrimarySeq)
>> >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
>> >> Bio::SeqEvolution::EvolutionI::each_seqs($count)
>> >>        - returns an array of $count seqs
>> >> Bio::SeqEvolution::EvolutionI::_generate_seq()
>> >> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
>> >>       converted to probabilities of change internally
>> >>
>> >>   various methods to define the extent of divergence:
>> >>   only one to start with:
>> >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
>> >>    (= 100% - identity)
>> >>
>> >> Bio::SeqEvolution::Factory - core class to call,
>> >>          instantiates subclasses, Bio::SeqEvolution::DNASimple for
>> >> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model,
>> >>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
>> >>
>> >>
>> >> Bio::SeqEvolution::DNASimple - default for nucleotides
>> >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>> >>         e.g. 5 => 5:1, defaults to 1:1
>> >>         simple alternative to a scoring matrix
>> >>
>> >>
>> >> I am soliciting usual comments and suggestions about naming and minimal
>> >> functionality.
>> >>
>> >>
>> >>    -Heikki
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l





From jason.stajich at duke.edu  Tue Feb 21 09:51:39 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 21 Feb 2006 09:51:39 -0500
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
In-Reply-To: <1125313334valiente@lsi.upc.es>
References: <1125313334valiente@lsi.upc.es>
Message-ID: <16B69355-A7EC-4FA6-B0F3-A473C705B921@duke.edu>

Of course it should, and it does support this. A children query
definitely exists for the flatfile implementation; I don't understand
why you are getting entrez errors when you are requesting the
flatfile handle.
I can't investigate, but it definitely worked for me to get children
nodes. Did you actually try running the script that already should
work - scripts/taxa/local_taxonomydb_query ?
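
One possible explanation for the entrez error, offered as a sketch and not a confirmed diagnosis: the constructor call in the quoted script has no comma after 'flatfile'. Perl then reads the following "-" as a binary minus, so the value passed for -source is the number 0 and not the string 'flatfile'. That a false -source makes Bio::DB::Taxonomy fall back to its entrez default is an assumption, though it would be consistent with the exception shown. The snippet below reproduces only the argument list, in plain Perl.

```perl
use strict;
use warnings;

my ($nodesfile, $namesfile) = ('nodes.dmp', 'names.dmp');

# As posted: no comma after 'flatfile'. The next "-" is parsed as a binary
# minus, so the value of -source becomes 'flatfile' - 'nodesfile', i.e. 0.
my @as_posted = do {
    no warnings 'numeric';    # silence the "isn't numeric" warnings for the demo
    ( -source => 'flatfile'
      -nodesfile => $nodesfile,
      -namesfile => $namesfile );
};

# With the comma restored, every key keeps its intended value.
my @fixed = ( -source    => 'flatfile',
              -nodesfile => $nodesfile,
              -namesfile => $namesfile );

print "as posted: @as_posted\n";
print "fixed:     @fixed\n";
```

The first line printed is `as posted: -source 0 nodes.dmp -namesfile names.dmp`: the source is 0 and the -nodesfile key has disappeared. Running the original script with `-w` would surface this as "isn't numeric" warnings.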

You definitely can't request children nodes via the entrez
implementation because NCBI doesn't (or didn't when this was written,
I don't know about now) provide children id access, so it is pretty
useful for that - although the eutils support may have expanded, I'm
not sure. If someone has the itch, please scratch it and work on this.

I think you need to pass in $parent instead of $taxonid to
get_Children_Taxids -- although I guess I wrote the method to accept
either.

-jason

On Feb 21, 2006, at 7:19 AM, Gabriel Valiente wrote:

> Thanks. There's still a problem with Bio::DB::Taxonomy:
>
> use strict;
> use Bio::DB::Taxonomy;
>
> my $nodesfile = "nodes.dmp";
> my $namesfile = "names.dmp";
> my $db = new Bio::DB::Taxonomy(-source => 'flatfile'
>                               -nodesfile => $nodesfile,
>                               -namesfile => $namesfile);
>
> my $taxonid = $db->get_taxonid('Homo sapiens');
> my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid);
>
> So far so good. Now, access to the parent node via
>
> my $parent = $node->get_Parent_Node;
>
> is alright, but access to the children nodes via
>
> my @childrenids = $db->get_Children_Taxids($taxonid);
>
> raises:
>
> ------------- EXCEPTION  -------------
> MSG: Abstract method "Bio::DB::Taxonomy::get_Children_Taxids" is not
> implemented by package Bio::DB::Taxonomy::entrez.
> This is not your fault - author of Bio::DB::Taxonomy::entrez should be
> blamed!
>
> STACK Bio::Root::RootI::throw_not_implemented
> /home/valiente/bioperl-live/Bio/Root/RootI.pm:523
> STACK Bio::DB::Taxonomy::get_Children_Taxids
> /home/valiente/bioperl-live/Bio/DB/Taxonomy.pm:162
> STACK toplevel fetch.pl:17
>
> Perhaps there could be a $node->get_Children_Nodes() method in
> Bio::DB::Taxonomy, instead of relying on Bio::DB::Taxonomy::entrez.
> You know, efficient access to the children of a node is a quite
> important method for almost any interesting use of the NCBI Taxonomy.
>
> Gabriel
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From hlapp at gmx.net  Mon Feb 20 21:52:34 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 20 Feb 2006 18:52:34 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <930b0083193357df7d43cc7a3111c938@fruitfly.org>
References: <000001c62e9a$4f82eee0$c2987ca5@pc13>
	<3666b00b7322d2bfe4d82129b047e5ce@gmx.net>
	<930b0083193357df7d43cc7a3111c938@fruitfly.org>
Message-ID: 

On 2/20/06, chris mungall  wrote:
>
> I like the idea of using an ontology to describe the ontology.
>
> Note that the proposed structure:
> OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI
>
> will lead to cycles in the object graph when the metadata ontology
> describes itself.

Yes I know, that's why I didn't want to be too vocal about it ...

>
> actually, I think the ontology module already has object reference
> cycles. TermI->OntologyI->TermI
>
> When I brought this up originally people didn't seem to care much - so
> long as you're only parsing GO then it's not a big issue, people have
> enough memory they won't notice a big chunk of memory that refuses to
> be garbage collected way after it's used.

There is a method that destroys the cycle: $ontology->close()
(this is also an interface method)

Essentially, the cycle is not in OntologyI itself but in OntologyI
HAS-A OntologyEngineI; i.e., the latter holds (may hold) references to
terms which (may) hold a reference to an OntologyI which holds a
reference to the OntologyEngineI.

I say 'may' in parentheses because an implementation may use tricks
like late instantiation, stringified references (handles), and weak
references. It's possible to avoid the cycle altogether using such
tricks but it remains questionable how much this then affects
performance, and how ugly and incomprehensible the code would become.
Since there is the close() method I haven't bothered yet trying a
fully de-cycled implementation.
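
As an illustration of the weak-reference trick mentioned above (a minimal
sketch with hypothetical Demo::Ontology and Demo::Term classes, not the
actual Bio::Ontology code), the back-reference from a term to its ontology
can be weakened with Scalar::Util::weaken, so that the cycle no longer
keeps the ontology alive:

```perl
use strict;
use warnings;
use Scalar::Util ();

my $destroyed = 0;    # counts ontologies that were actually freed

# Hypothetical classes for illustration only.
package Demo::Ontology;
sub new {
    my ($class, $name) = @_;
    return bless { name => $name, terms => [] }, $class;
}
sub add_term {
    my ($self, $term) = @_;
    push @{ $self->{terms} }, $term;    # strong reference: ontology -> term
    $term->ontology($self);
    return $term;
}
sub DESTROY { $destroyed++ }

package Demo::Term;
sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}
sub ontology {
    my ($self, $ontology) = @_;
    if (defined $ontology) {
        $self->{ontology} = $ontology;
        # Weak back-reference: term -> ontology does not keep it alive.
        Scalar::Util::weaken($self->{ontology});
    }
    return $self->{ontology};
}

package main;

{
    my $ont = Demo::Ontology->new('demo');
    $ont->add_term(Demo::Term->new('term A'));
}   # $ont goes out of scope here; no close() call is needed

print "ontologies destroyed: $destroyed\n";
```

The price is that a term's ontology() returns undef once the ontology
itself is gone, so callers have to keep the ontology in scope for as long
as they use its terms.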

> Of course, if you want to use
> bioperl to cycle through all of OBO + SnoMed + UMLS then it's a
> different story.

Well if you want to keep all three in memory for some kind of
cross-reasoning then yes you are in trouble. But if you do one
ontology after another, you'd just have to make sure to call close() on
an ontology once you're done with it.

>
> I think it's best if Sohel concentrates on getting obo.pm working, then
> we can start thinking as a group about the best way to capture ontology
> metadata. This includes metadata on the whole ontology, and metadata on
> the terms (eg synonyms).
>
> To what extent are the current modules already in use?

I don't know about others but I use them often.

> I think the object cycle is a serious flaw, will it be possible to fix this without
> a major overhaul?

If I recall correctly the way go-perl circumvents this is by having
the ontology of a term as a flat attribute. This also means that when
having a term alone, you cannot ask for its connected terms. It's been
a while, so Chris, set me straight where this is not true.

It should be possible to come up with an implementation of OntologyI
that for all intents and purposes behaves like a flat scalar giving
the name until you call one of its graph traversal methods. At that
point it would instantiate the engine from persistent storage (file,
or a database connection), or retrieve one from a 'store'. The latter
is I believe what Allen started with the OntologyStore, but again I
would need to check the details.

    -hilmar

>
>
> On Feb 11, 2006, at 9:10 PM, Hilmar Lapp wrote:
>
> > Sohel, please do keep the discussion on the list, in your own interest
> > as there's a multitude of people who can respond to you.
> >
> > SimpleValue would probably be what I'd use too. As Heikki hinted you
> > might even create an ontology for annotating ontologies, which would
> > allow you to use Annotation::OntologyTerm for annotation, but then
> > there's no qualifier value ...
> >
> > Bioperl 1.5.1 has been released last year, please check the website.
> >
> >       -hilmar
> >
> > On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote:
> >
> >> Hi Hilmar,
> >>   I really like your suggestion of implementing the Bio::AnnotatableI
> >> interface in the Bio::Ontology::Ontology class. I am going to
> >> implement this and play around a little with it. I am planning to use
> >> Bio::Annotation::SimpleValue for annotating the header as it provides
> >> a good way of specifying the Tag/value pair. What are your thoughts on
> >> using this?
> >>
> >>   Also, I was wondering if you have any idea about the scheduled date
> >> for the Bioperl 1.51 release. I would like to contribute some stuff in
> >> the next release.
> >>
> >> Thanks,
> >> Sohel.
> >>
> >> -----Original Message-----
> >> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >> Sent: Friday, February 10, 2006 3:40 PM
> >> To: Sohel Merchant
> >> Cc: Bioperl
> >> Subject: Re: Bio::Ontology::Ontology
> >>
> >> Sohel,
> >>
> >> please allow me to copy the list in my response. There's many good and
> >> insightful people on the list who may have something to add or
> >> different ideas.
> >>
> >> I've come across that problem myself, for instance with InterPro. What
> >> I've done so far simply is to stick it unstructured into the
> >> definition slot, which is not helpful if your purpose goes further
> >> than just displaying it in an unstructured fashion.
> >>
> >> I'm not sure you would want to create another class for this (like
> >> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
> >> implementation, probably not the interface) annotatable (i.e.,
> >> implement Bio::Annotatable), which supposedly would be simple to do
> >> (AnnotationCollection is already implemented, you'd just return an
> >> instance of it).
> >>
> >> Even though tag/value pairs sound like quick&fast way to go I'm
> >> leaning against it; in essence we're moving away from that elsewhere
> >> (SeqFeatureI) and hence I don't think we should restart it here.
> >>
> >> I'm not giving a definitive answer here, just my (initial) thoughts.
> >> Hope that helps nonetheless. Can you fancy yourself trying the
> >> Annotatable approach and let us know how it goes?
> >>
> >>      -hilmar
> >>
> >>
> >> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
> >>
> >>> Hi Hilmar,
> >>> How are you doing? I am Sohel Merchant, a programmer at dictyBase,
> >>> Northwestern University. I am working on a parser for an ontology
> >>> file. I really like the ontology object model which you have
> >>> contributed to Bioperl. I think it's just Awesome!! One of the things
> >>> which I thought would be great to capture is the ontology headers.
> >>> Right now one can specify only the name and authority information. I
> >>> was wondering if there is any way I could also capture other ontology
> >>> file headers, like the version of the file and the date when that
> >>> ontology file was made. I was thinking of making a header class, or
> >>> alternatively it could go as a hash of values in the
> >>> Bio::Ontology::Ontology class itself. I wanted to know what your
> >>> thoughts are on this.
> >>>
> >>> Thanks,
> >>> Sohel Merchant
> >>> dictyBase
> >>>
> >> --
> >> -------------------------------------------------------------
> >> Hilmar Lapp                            email: lapp at gnf.org
> >> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> >> -------------------------------------------------------------
> >>
> >>
> >>
> >>
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp                            email: lapp at gnf.org
> > GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > -------------------------------------------------------------
> >
> >
> >
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From valiente at lsi.upc.edu  Tue Feb 21 11:10:05 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Tue, 21 Feb 2006 17:10:05 +0100 (MET)
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <1783551242valiente@lsi.upc.es>

It works now, with the #!/usr/bin/perl -w switch. Sorry about that.
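
A likely explanation for why -w made the difference (an assumption drawn
from the constructor call quoted earlier in this thread, not something
confirmed on the list): there is no comma after 'flatfile', so Perl parses
'flatfile' -nodesfile as a subtraction. The result is 0, -source ends up
false, and a constructor that falls back to a default source would then
pick that default, which is consistent with the entrez exception. The
sketch below is plain Perl without BioPerl; effective_source is a
hypothetical helper that only mimics such a fallback.

```perl
use strict;
use warnings;

my ($nodesfile, $namesfile) = ('nodes.dmp', 'names.dmp');

my @bad;
{
    # -w would print "isn't numeric" warnings for the next statement;
    # they are silenced here only to keep the demo output clean.
    no warnings 'numeric';
    # No comma after 'flatfile', so this is 'flatfile' - 'nodesfile' (= 0).
    @bad = (-source => 'flatfile'
            -nodesfile => $nodesfile,
            -namesfile => $namesfile);
}

my @good = (-source => 'flatfile',
            -nodesfile => $nodesfile,
            -namesfile => $namesfile);

# Hypothetical helper: what a constructor with a default source would see.
sub effective_source {
    my @args = @_;
    my $source = $args[1];          # the value that follows -source
    return $source || 'entrez';     # a false value falls back to the default
}

print "without comma: ", effective_source(@bad),  "\n";
print "with comma:    ", effective_source(@good), "\n";
```

With warnings enabled and not silenced, the "isn't numeric" messages point
straight at the offending line, which is presumably how the typo surfaced.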

I'd like to contribute a couple of additional methods to
Bio::DB::Taxonomy. The first one returns a reference to an array with
the full lineage of a given node.

sub lineage {
  my $node = shift;
  my @PATH;
  while ($node->node_name ne "root") {
    $node = $node->get_Parent_Node;
    unshift @PATH, $node;
  }
  return \@PATH;
}

The second one uses the lineage method to return the most recent common
ancestor of two given nodes.

sub LCA {
  my $node1 = shift;
  my $node2 = shift;
  my @PATH1 = @{lineage($node1)};
  my @PATH2 = @{lineage($node2)};
  my $root1 = shift @PATH1;
  my $root2 = shift @PATH2;
  while ($root1->node_name eq $root2->node_name) {
    $root1 = shift @PATH1;
    $root2 = shift @PATH2;
  }
  return $root1;
}
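
For comparison, here is a hedged sketch of the same idea that stops on the
last shared node rather than the first differing one, and that also copes
with identical lineages or one node being an ancestor of the other. It is
only an illustration: Demo::Node is a hypothetical stand-in offering the
two methods used above (node_name and get_Parent_Node), the path here
includes the node itself, and nodes are compared by name as in the subs
above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a taxonomy node.
package Demo::Node;
sub new {
    my ($class, $name, $parent) = @_;
    return bless { name => $name, parent => $parent }, $class;
}
sub node_name       { $_[0]{name} }
sub get_Parent_Node { $_[0]{parent} }

package main;

# Root-to-node path, including the node itself; stops when no parent is left.
sub lineage_with_self {
    my $node = shift;
    my @path;
    while (defined $node) {
        unshift @path, $node;
        $node = $node->get_Parent_Node;
    }
    return \@path;
}

# Last node shared by both root-to-node paths, or undef if nothing is shared.
sub lca {
    my ($node1, $node2) = @_;
    my @path1 = @{ lineage_with_self($node1) };
    my @path2 = @{ lineage_with_self($node2) };
    my $lca;
    while (@path1 && @path2
           && $path1[0]->node_name eq $path2[0]->node_name) {
        $lca = shift @path1;
        shift @path2;
    }
    return $lca;
}

my $root     = Demo::Node->new('root');
my $primates = Demo::Node->new('Primates', $root);
my $homo     = Demo::Node->new('Homo', $primates);
my $pan      = Demo::Node->new('Pan',  $primates);

print lca($homo, $pan)->node_name, "\n";        # Primates
print lca($homo, $homo)->node_name, "\n";       # Homo
print lca($homo, $primates)->node_name, "\n";   # Primates
```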

Jason, shall I include them myself in Bio::DB::Taxonomy or can you take
care of this? I think, the right place for these methods might be
Bio::Taxonomy or Bio::Taxonomy::Node rather than Bio::DB::Taxonomy.

Thanks,

Gabriel




From lstein at cshl.edu  Tue Feb 21 10:55:30 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 21 Feb 2006 10:55:30 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
Message-ID: <200602211055.31221.lstein@cshl.edu>

Hi,

When you are looking at the resolution of individual bases, a base pair at 
position one occupies the half-open interval from 1->2, meaning that it comes 
up to, but doesn't quite touch, the 2. For the purposes of display, 
Bio::Graphics draws the end of the half-open interval.

Lincoln

On Tuesday 21 February 2006 05:47, Dave Howorth wrote:
> I'm drawing a simple graphic and seeing something I didn't expect. I'm
> not sure whether I've misunderstood the docs or found a bug. If I run a
> program containing:
>
>      my $name   = 'O68601';
>      my $length = 44;
>      my $panel  = Bio::Graphics::Panel->new(
>                  -length    => $length,
>                  -width     => 800,
>                  -pad_left  => 10,
>                  -pad_right => 10,
>                  -key_style => 'between',
>                  );
>
>      my $feature = new Bio::SeqFeature::Generic(
>                  -start  => 1,
>                  -end    => $length,
>                  -display_name => $name . " ($length)",
>                  );
>
>      $panel->add_track($feature,
>                  -glyph   => 'arrow',
>                  -tick    =>  1,
>                  -fgcolor => 'black',
>                  -double  => 1,
>                  -label   => 1,
>                  );
>
> Then I see a tick strip labelled at its left end with '1' and at its
> right end with '45'. I expected to see '44'. Should I be looking for a
> bug in Bio::Graphics or fixing my program?
>
> Thanks, Dave

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From jason.stajich at duke.edu  Tue Feb 21 11:28:22 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 21 Feb 2006 11:28:22 -0500
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
In-Reply-To: <1783551242valiente@lsi.upc.es>
References: <1783551242valiente@lsi.upc.es>
Message-ID: <1C38DDCF-9312-42D3-923F-C0DD4CE7E9AA@duke.edu>

You'll have to do it - I don't have time. I thought there was
something like this already, but I guess not, so please put it in. I
must do this when we initialize the classification array when
building a node.


On Feb 21, 2006, at 11:10 AM, Gabriel Valiente wrote:

> It works now, with the #!/usr/bin/perl -w switch. Sorry about that.
>
> I'd like to contribute a couple of additional methods to
> Bio::DB::Taxonomy. The first one returns a reference to an array with
> the full lineage of a given node.
>
> sub lineage {
>   my $node = shift;
>   my @PATH;
>   while ($node->node_name ne "root") {
>     $node = $node->get_Parent_Node;
>     unshift @PATH, $node;
>   }
>   return \@PATH;
> }
>
> The second one uses the lineage method to return the most recent  
> common
> ancestor of two given nodes.
>
> sub LCA {
>   my $node1 = shift;
>   my $node2 = shift;
>   my @PATH1 = @{lineage($node1)};
>   my @PATH2 = @{lineage($node2)};
>   my $root1 = shift @PATH1;
>   my $root2 = shift @PATH2;
>   while ($root1->node_name eq $root2->node_name) {
>     $root1 = shift @PATH1;
>     $root2 = shift @PATH2;
>   }
>   return $root1;
> }
>
> Jason, shall I include them myself in Bio::DB::Taxonomy or can you  
> take
> care of this? I think, the right place for these methods might be
> Bio::Taxonomy or Bio::Taxonomy::Node rather than Bio::DB::Taxonomy.
>
> Thanks,
>
> Gabriel
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From dhoworth at mrc-lmb.cam.ac.uk  Tue Feb 21 11:50:37 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Tue, 21 Feb 2006 16:50:37 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <200602211055.31221.lstein@cshl.edu>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211055.31221.lstein@cshl.edu>
Message-ID: <43FB44DD.4090504@mrc-lmb.cam.ac.uk>

Lincoln Stein wrote:
> When you are looking at the resolution of individual bases, a base pair at 
> position one occupies the half-open interval from 1->2, meaning that it comes 
> up to, but doesn't quite touch, the 2. For the purposes of display, 
> Bio::Graphics draws the end of the half-open interval.

I think I understand the description of what it's doing but I don't 
understand why. What is the purpose of labelling the [44,45) interval 
45, when that interval is representing the 44th discrete mer?

I'm working with proteins and domains, so I'm always at the level of 
individual residues and people frequently care about the exact residue 
boundaries, especially when the regions are short. So I need to make 
pictures that match the data.

The displayed track seems more consistent with an interpretation that 
the residues are represented by the discrete integer points along the 
line but I don't know if I'm buying myself trouble later if I try to 
adopt that interpretation.

Alternatively, is there some way to get a track with 44 intervals, 
labelled 1 to 44?

Or will I need to patch my copy of bioperl to achieve that?

Thanks, Dave


From cjfields at uiuc.edu  Tue Feb 21 12:30:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 11:30:58 -0600
Subject: [Bioperl-l] another searchIO bug? with blast report
In-Reply-To: <43F5A2EA0200009B00000F45@gwia.kvl.dk>
Message-ID: <000301c6370c$93b07c70$15327e82@pyrimidine>

Anders,

I think you should look through the mail list archives for an answer,
specifically:

http://portal.open-bio.org/pipermail/bioperl-l/2004-November/017285.html

Look up the other methods in Bio::Search::HSP::BlastHSP as well. They may be
more helpful.  I can't help but think there is something wrong with the
logic in your subroutines since they don't call other methods built in to
HSP objects.  It may be an off-by-one error.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Anders Stegmann
> Sent: Friday, February 17, 2006 3:18 AM
> To: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] another searchIO bug? with blast report
> 
> 
> 
> >>>Anders Stegmann  02/16/06 11:20 am >>>
> Hi!
> 
> I am blasting a protein seq (query) against an identical seq with a
> deletion of Aa nr 61 (subject).
> Then I print out the type of nomatch Aa and its position.
> The nomatch for the query seq is Aa G at position 61, which is correct.
> The nomatch for the subject seq is V at position 60, which is definitely
> not correct!?
> 
> Is this a bug?
> 
> testblast2.pl is the program to run
> 
> Q0045 is the query seq.
> 
> Q0045del61 is the subject seq (it has to be formatted: formatdb -i
> Q0045del61 -p T -o F).
> 
> Regards Anders.
> 




From staffa at niehs.nih.gov  Tue Feb 21 12:24:39 2006
From: staffa at niehs.nih.gov (staffa)
Date: Tue, 21 Feb 2006 12:24:39 -0500
Subject: [Bioperl-l] Pattern Density
Message-ID: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>

Good Friends,
I have an important client who wants a histogram display of the density 
of "ccgg" along any chromosome of the mouse genome in 1000 bp windows.

I'm thinking that maybe there is a bio-perl module that could help with 
this.
That'd probably beat having to write something from scratch.
Any help that you give would be greatly appreciated.
I am more concerned about the reading and analysis of the sequence than 
actual plotting of the histogram, but anything you can offer will be 
appreciated.

Thank you.

Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina
From lstein at cshl.edu  Tue Feb 21 13:25:59 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 21 Feb 2006 13:25:59 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FB44DD.4090504@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211055.31221.lstein@cshl.edu>
	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>
Message-ID: <200602211326.00021.lstein@cshl.edu>

Hi Dave,

Well, when you are using 1-based coordinates, a line that contains 44 
intervals will have 45 ticks. If you move to 0-based coordinates, then the 
first tick will be labeled 0 and the last tick will be labeled 44. An 
alternative is to make each base dimensionless, but that becomes a problem 
when dealing with single base features, such as SNPs. These issues are why I 
have long advocated for interbase coordinates in which you number the 
positions between bases rather than the bases themselves.

Draw me the picture of what you expect to see. I think of it this way:

    1   2   3   4   5   6
      A>  G>  C>  T>  A>
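
The tick arithmetic can be sketched in a few lines of plain Perl. This is
only an illustration of the counting argument, not the code Bio::Graphics
uses; tick_labels is a hypothetical helper.

```perl
use strict;
use warnings;

# Tick labels for $n residues drawn as $n unit intervals.
# $first is the label of the leftmost tick (1 for 1-based numbering,
# 0 for 0-based or interbase numbering).
sub tick_labels {
    my ($n, $first) = @_;
    return map { $first + $_ } 0 .. $n;    # $n intervals need $n + 1 ticks
}

my @one_based  = tick_labels(44, 1);
my @zero_based = tick_labels(44, 0);

printf "1-based: %d ticks, %d .. %d\n",
    scalar @one_based, $one_based[0], $one_based[-1];
printf "0-based: %d ticks, %d .. %d\n",
    scalar @zero_based, $zero_based[0], $zero_based[-1];
```

For 44 residues this gives 45 ticks either way: labelled 1 to 45 in the
1-based case and 0 to 44 in the 0-based case.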

Lincoln

On Tuesday 21 February 2006 11:50, Dave Howorth wrote:
> Lincoln Stein wrote:
> > When you are looking at the resolution of individual bases, a base pair
> > at position one occupies the half-open interval from 1->2, meaning that
> > it comes up to, but doesn't quite touch, the 2. For the purposes of
> > display, Bio::Graphics draws the end of the half-open interval.
>
> I think I understand the description of what it's doing but I don't
> understand why. What is the purpose of labelling the [44,45) interval
> 45, when that interval is representing the 44th discrete mer?
>
> I'm working with proteins and domains, so I'm always at the level of
> individual residues and people frequently care about the exact residue
> boundaries, especially when the regions are short. So I need to make
> pictures that match the data.
>
> The displayed track seems more consistent with an interpretation that
> the residues are represented by the discrete integer points along the
> line but I don't know if I'm buying myself trouble later if I try to
> adopt that interpretation.
>
> Alternatively, is there some way to get a track with 44 intervals,
> labelled 1 to 44?
>
> Or will I need to patch my copy of bioperl to achieve that?
>
> Thanks, Dave

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From osborne1 at optonline.net  Tue Feb 21 13:25:35 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 21 Feb 2006 13:25:35 -0500
Subject: [Bioperl-l] Pattern Density
In-Reply-To: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
Message-ID: 

Nick,

Right, BioPerl really can't help you with the histogram itself but there are
probably multiple solutions to the problem of iterating over the sequence.
Here's one idea, untested, it assumes your sequence is in fasta format:

use strict;
use warnings;
use Bio::DB::Fasta;
use Bio::Seq;
use Bio::Tools::SeqWords;

my $db  = Bio::DB::Fasta->new('/path/to/fasta/files');
my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
my $start = 1;            # sequence coordinates are 1-based
my $windowsize = 1000;
my $str = 'ccgg';
my $len = $obj->length;
my $overlap = 250;        # step between window starts

while (1) {
    my $end = $start + $windowsize - 1;
    last if ( $end > $len);
    my $subseq  = $obj->subseq($start,$end);
    my $count = get_count($str,$subseq);
    print "$start\t$end\t$count\n";
    $start += $overlap;
}

sub get_count {
    my ($str,$subseq) = @_;
    # uppercase both sides so the lookup does not depend on the case
    # used in the fasta file
    my $seqobj = Bio::Seq->new(-seq => uc $subseq);
    my $seq_word = Bio::Tools::SeqWords->new(-seq => $seqobj);
    my $ref = $seq_word->count_overlap_words(length($str));
    return $ref->{uc $str} || 0;
}

Note this skips the very last window, debugging needed.
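
If pulling in the BioPerl word counter is not needed, the counting step can
also be sketched in plain Perl once the chromosome is available as a
string. This is a hedged alternative, not part of BioPerl: window_counts is
a hypothetical helper, it uses non-overlapping windows, assigns each match
to the window in which it starts, and keeps the last (possibly shorter)
window.

```perl
use strict;
use warnings;

# Count occurrences of $pattern (case-insensitive, overlapping matches
# allowed) per non-overlapping window of $window bases. Each match is
# assigned to the window that contains its first base.
sub window_counts {
    my ($seq, $pattern, $window) = @_;
    my $nwindows = int((length($seq) + $window - 1) / $window);
    my @counts   = (0) x $nwindows;
    my $quoted   = quotemeta $pattern;
    # The lookahead matches are zero-length, so pos() is the match start.
    while ($seq =~ /(?=$quoted)/gi) {
        $counts[ int(pos($seq) / $window) ]++;
    }
    return @counts;
}

# Toy input: 'ccgg' starts at offsets 0, 6 and 10.
my @counts = window_counts('ccggaaccggccggtt', 'ccgg', 8);
print join(',', @counts), "\n";    # 2,1
```

For the real data one would call it with the chromosome string and a
window of 1000, then feed the resulting list to whatever draws the
histogram.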

Brian O.


On 2/21/06 12:24 PM, "staffa"  wrote:

> I am more concerned about the reading and analysis of the sequence than actual
> plotting of the histogram, but anything you can offer will be appreciated.





From gyang at plantbio.uga.edu  Tue Feb 21 13:45:50 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Tue, 21 Feb 2006 13:45:50 -0500
Subject: [Bioperl-l] full chromosome accesscion number mess
In-Reply-To: <000001c63669$2bf06a80$15327e82@pyrimidine>
Message-ID: <20060221184550.6557851b@dogwood.plantbio.uga.edu>

Hi, everybody,
In the process of repairing my CGI script after the NCBI blast output
format change, I noticed that the accession numbers for the rice
pseudochromosomes are very confusing and cause trouble for sequence
retrieval. My script uses remoteblast to search for similar sequences,
and then retrieves the hit sequence with a bit of flanking region from
GenBank. The rice pseudochromosomes have accession numbers similar to
those of the individual clones, like AP00XXX. I do not want the sequence
retrieval to involve these accessions because it takes forever. Can
anybody give some suggestions on how to deal with it?
Thanks,
 

Guojun Yang
Department of Plant Biology
University of Georgia


From valiente at lsi.upc.edu  Tue Feb 21 13:46:10 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Tue, 21 Feb 2006 19:46:10 +0100 (MET)
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <3193394449valiente@lsi.upc.es>

> you'll have to do it - I don't have time, I thought there was  
> something like this already, but I guess not, so please put it in.

Done. I've added methods get_Lineage_Nodes and get_LCA_Node to
Bio::Taxonomy::Node.

> Uhm, does that return the LCA or one of the first divergent ancestors?
> And what does it do if lineage($node1) is the same as lineage($node2)?

Thanks, I've already taken this into account.

Cheers

Gabriel




From s-merchant at northwestern.edu  Tue Feb 21 13:47:54 2006
From: s-merchant at northwestern.edu (Sohel Merchant)
Date: Tue, 21 Feb 2006 12:47:54 -0600
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: 
Message-ID: <000001c63717$5314ded0$c2987ca5@pc13>

Hi Hilmar and Chris,
  I have played around a bit using Bio::Annotation::Collection to
capture the headers of an ontology file. It behaves pretty well and
avoids the cycle issue which might arise by using an ontology to describe
the ontology. I have an initial version of a working parser for the obo
flat file format.

Chris, I was able to model any kind of relationship by using some of the
functionality in the Bio::Ontology::SimpleGoEngine which, I had
initially overlooked. 

I would like to commit this code to the Bioperl CVS, but I don't have
write access to it I believe. Can I send the stuff to either of you
guys?

Hilmar, I would like your feedback on the code base and would be happy
to make any changes required before we commit it to the CVS.

Thanks,
Sohel Merchant.
dictyBase

-----Original Message-----
From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar
Lapp
Sent: Monday, February 20, 2006 8:53 PM
To: chris mungall
Cc: Bioperl; Sohel Merchant
Subject: Re: [Bioperl-l] Bio::Ontology::Ontology

On 2/20/06, chris mungall  wrote:
>
> I like the idea of using an ontology to describe the ontology.
>
> Note that the proposed structure:
> OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI
>
> will lead to cycles in the object graph when the metadata ontology
> describes itself.

Yes I know, that's why I didn't want to be too vocal about it ...

>
> actually, I think the ontology module already has object reference
> cycles. TermI->OntologyI->TermI
>
> When I brought this up originally people didn't seem to care much - so
> long as you're only parsing GO then it's not a big issue, people have
> enough memory they won't notice a big chunk of memory that refuses to
> be garbage collected way after it's used.

There is a method that destroys the cycle: $ontology->close()
(this is also an interface method)

Essentially, the cycle is not in OntologyI itself but in OntologyI
HAS-A OntologyEngineI; i.e., the latter holds (may hold) references to
terms which (may) hold a reference to an OntologyI which holds a
reference to the OntologyEngineI.

I say 'may' in parentheses because an implementation may use tricks
like late instantiation, stringified references (handles), and weak
references. It's possible to avoid the cycle altogether using such
tricks but it remains questionable how much this then affects
performance, and how ugly and incomprehensible the code would become.
Since there is the close() method I haven't bothered yet trying a
fully de-cycled implementation.

> Of course, if you want to use
> bioperl to cycle through all of OBO + SnoMed + UMLS then it's a
> different story.

Well if you want to keep all three in memory for some kind of
cross-reasoning then yes you are in trouble. But if you do one
ontology after another, you'd just have to make sure to call close() on
an ontology once you're done with it.

>
> I think it's best if Sohel concentrates on getting obo.pm working, then
> we can start thinking as a group about the best way to capture ontology
> metadata. This includes metadata on the whole ontology, and metadata on
> the terms (eg synonyms).
>
> To what extent are the current modules already in use?

I don't know about others but I use them often.

> I think the object cycle is a serious flaw, will it be possible to fix this without
> a major overhaul?

If I recall correctly the way go-perl circumvents this is by having
the ontology of a term as a flat attribute. This also means that when
having a term alone, you cannot ask for its connected terms. It's been
a while, so Chris, set me straight where this is not true.

It should be possible to come up with an implementation of OntologyI
that for all intents and purposes behaves like a flat scalar giving
the name until you call one of its graph traversal methods. At that
point it would instantiate the engine from persistent storage (file,
or a database connection), or retrieve one from a 'store'. The latter
is I believe what Allen started with the OntologyStore, but again I
would need to check the details.

    -hilmar

>
>
> On Feb 11, 2006, at 9:10 PM, Hilmar Lapp wrote:
>
> > Sohel, please do keep the discussion on the list, in your own interest
> > as there's a multitude of people who can respond to you.
> >
> > SimpleValue would probably be what I'd use too. As Heikki hinted you
> > might even create an ontology for annotating ontologies, which would
> > allow you to use Annotation::OntologyTerm for annotation, but then
> > there's no qualifier value ...
> >
> > Bioperl 1.5.1 has been released last year, please check the website.
> >
> >       -hilmar
> >
> > On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote:
> >
> >> Hi Hilmar,
> >>   I really like your suggestion of implementing the Bio::AnnotatableI
> >> interface in the Bio::Ontology::Ontology class. I am going to
> >> implement this and play around a little with it. I am planning to use
> >> Bio::Annotation::SimpleValue for annotating the header as it provides
> >> a good way of specifying the Tag/value pair. What are your thoughts on
> >> using this?
> >>
> >>   Also, I was wondering if you have any idea about the scheduled date
> >> for the Bioperl 1.51 release. I would like to contribute some stuff in
> >> the next release.
> >>
> >> Thanks,
> >> Sohel.
> >>
> >> -----Original Message-----
> >> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >> Sent: Friday, February 10, 2006 3:40 PM
> >> To: Sohel Merchant
> >> Cc: Bioperl
> >> Subject: Re: Bio::Ontology::Ontology
> >>
> >> Sohel,
> >>
> >> please allow me to copy the list in my response. There's many good and
> >> insightful people on the list who may have something to add or
> >> different ideas.
> >>
> >> I've come across that problem myself, for instance with InterPro. What
> >> I've done so far simply is to stick it unstructured into the
> >> definition slot, which is not helpful if your purpose goes further
> >> than just displaying it in an unstructured fashion.
> >>
> >> I'm not sure you would want to create another class for this (like
> >> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
> >> implementation, probably not the interface) annotatable (i.e.,
> >> implement Bio::Annotatable), which supposedly would be simple to do
> >> (AnnotationCollection is already implemented, you'd just return an
> >> instance of it).
> >>
> >> Even though tag/value pairs sound like quick&fast way to go I'm
> >> leaning against it; in essence we're moving away from that elsewhere
> >> (SeqFeatureI) and hence I don't think we should restart it here.
> >>
> >> I'm not giving a definitive answer here, just my (initial) thoughts.
> >> Hope that helps nonetheless. Can you fancy yourself trying the
> >> Annotatable approach and let us know how it goes?
> >>
> >>      -hilmar
> >>
> >>
> >> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
> >>
> >>> Hi Hilmar,
> >>> How are you doing? I am Sohel Merchant, a programmer at dictyBase,
> >>> Northwestern University. I am working on a parser for an ontology
> >>> file. I really like the ontology object model which you have
> >>> contributed to Bioperl. I think it's just awesome! One of the things
> >>> which I thought would be great to capture is the ontology headers.
> >>> Right now one can specify only the name and authority information. I
> >>> was wondering if there is any way I could also capture other ontology
> >>> file headers, like the version of the file and the date when that
> >>> ontology file was made. I was thinking of making a header class, or
> >>> alternatively it could go as a hash of values in the
> >>> Bio::Ontology::Ontology class itself. I wanted to know what your
> >>> thoughts are on this.
> >>>
> >>> Thanks,
> >>> Sohel Merchant
> >>> dictyBase
> >>>
> >> --
> >> -------------------------------------------------------------
> >> Hilmar Lapp                            email: lapp at gnf.org
> >> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> >> -------------------------------------------------------------
> >>
> >>
> >>
> >>
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp                            email: lapp at gnf.org
> > GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > -------------------------------------------------------------
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------




From cjfields at uiuc.edu  Tue Feb 21 14:25:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 13:25:02 -0600
Subject: [Bioperl-l] full chromosome accesscion number mess
In-Reply-To: <20060221184550.6557851b@dogwood.plantbio.uga.edu>
Message-ID: <000001c6371c$83bf92a0$15327e82@pyrimidine>

What is the accession you're having problems with?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Tuesday, February 21, 2006 12:46 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] full chromosome accesscion number mess
> 
> Hi, everybody,
> In the process of repairing my CGI script after the NCBI BLAST output format
> change, I noticed that the accession numbers for the rice pseudochromosomes
> are very confusing and cause trouble for sequence retrieval. My script uses
> RemoteBlast to search for similar sequences, and then retrieves the hit
> sequence with a bit of flanking region from GenBank. The rice
> pseudochromosomes have accession numbers similar to those of the individual
> clones, like AP00XXX. I do not want the sequence retrieval to involve
> these accessions because it takes forever. Can anybody give some
> suggestions on how to deal with this?
> Thanks,
> 
> 
> Guojun Yang
> Department of Plant Biology
> University of Georgia
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From hlapp at gmx.net  Tue Feb 21 14:31:31 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 21 Feb 2006 11:31:31 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <000001c63717$5314ded0$c2987ca5@pc13>
References: 
	<000001c63717$5314ded0$c2987ca5@pc13>
Message-ID: 

Send it to me. I'll review and check it in if appropriate. You should
also write a test and include it in what you send to me (see t/*.t
for examples of how to write a test). Obviously, the test should
succeed.

Chris, I suppose this is the time to object - I would conceptually
like the ontology-based annotation too but now we are up against a
(hopefully) working implementation which can only be beaten by another
working implementation, and frankly I don't have time to attempt one
now.

   -hilmar

On 2/21/06, Sohel Merchant  wrote:
> Hi Hilmar and Chris,
>   I have played around a bit using Bio::Annotation::Collection to
> capture the headers of an ontology file. It behaves pretty well and
> avoids the cycle issue which might arise by using an ontology to describe
> the ontology. I have an initial version of a working parser for the OBO
> flat file format.
>
> Chris, I was able to model any kind of relationship by using some of the
> functionality in the Bio::Ontology::SimpleGoEngine which, I had
> initially overlooked.
>
> I would like to commit this code to the Bioperl CVS, but I don't have
> write access to it I believe. Can I send the stuff to either of you
> guys?
>
> Hilmar, I would like your feedback on the code base and would be happy
> to make any changes required before we commit it to the CVS.
>
> Thanks,
> Sohel Merchant.
> dictyBase
>
> -----Original Message-----
> From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar
> Lapp
> Sent: Monday, February 20, 2006 8:53 PM
> To: chris mungall
> Cc: Bioperl; Sohel Merchant
> Subject: Re: [Bioperl-l] Bio::Ontology::Ontology
>
> On 2/20/06, chris mungall  wrote:
> >
> > I like the idea of using an ontology to describe the ontology.
> >
> > Note that the proposed structure:
> > OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI
> >
> > will lead to cycles in the object graph when the metadata ontology
> > describes itself.
>
> Yes I know, that's why I didn't want to be too vocal about it ...
>
> >
> > actually, I think the ontology module already has object reference
> > cycles. TermI->OntologyI->TermI
> >
> > When I brought this up originally people didn't seem to care much - so
> > long as you're only parsing GO then it's not a big issue, people have
> > enough memory they won't notice a big chunk of memory that refuses to
> > be garbage collected way after it's used.
>
> There is a method that destroys the cycle: $ontology->close()
> (this is also an interface method)
>
> Essentially, the cycle is not in OntologyI itself but in OntologyI
> HAS-A OntologyEngineI; i.e., the latter holds (may hold) references to
> terms which (may) hold a reference to an OntologyI which holds a
> reference to the OntologyEngineI.
>
> I say 'may' in parentheses because an implementation may use tricks
> like late instantiation, stringified references (handles), and weak
> references. It's possible to avoid the cycle altogether using such
> tricks but it remains questionable how much this then affects
> performance, and how ugly and incomprehensible the code would become.
> Since there is the close() method I haven't bothered yet trying a
> fully de-cycled implementation.
>
> > Of course, if you want to use
> > bioperl to cycle though all of OBO + SnoMed + UMLS then it's a
> > different story.
>
> Well if you want to keep all three in memory for some kind of
> cross-reasoning then yes you are in trouble. But if you do one
> ontology after another, you'd just have to make sure to call close()
> on an ontology once you're done with it.
>
> >
> > I think it's best if Sohel concentrates on getting obo.pm working, then
> > we can start thinking as a group about the best way to capture ontology
> > metadata. This includes metadata on the whole ontology, and metadata on
> > the terms (eg synonyms).
> >
> > To what extent are the current modules already in use?
>
> I don't know about others but I use them often.
>
> > I think the object cycle is a serious flaw, will it be possible to fix
> > this without a major overhaul?
>
> If I recall correctly the way go-perl circumvents this is by having
> the ontology of a term as a flat attribute. This also means that when
> having a term alone, you cannot ask for its connected terms. It's been
> a while, so Chris set me straight where this is not true.
>
> It should be possible to come up with an implementation of OntologyI
> that for all intents and purposes behaves like a flat scalar giving
> the name until you call one of its graph traversal methods. At that
> point it would instantiate the engine from persistent storage (file,
> or a database connection), or retrieve one from a 'store'. The latter
> is I believe what Allen started with the OntologyStore, but again I
> would need to check the details.
>
>     -hilmar
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From MEC at stowers-institute.org  Tue Feb 21 15:38:55 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 21 Feb 2006 14:38:55 -0600
Subject: [Bioperl-l] Pattern Density
Message-ID: 

 
You might consider displaying ccgg content as a track in mouse genome
browser at
http://gbrowse.informatics.jax.org/cgi-bin/gbrowse/mouse_build_34
 
For example, the following track causes it to display 3 proportionally
sized red boxes in the first 3K of mouse Chr1 

[MotifContent]
glyph = xyplot
graph_type = boxes
fgcolor = black
bgcolor = red
height=100
min_score=0
max_score=100
label=1
key="Motif Content"

reference=Chr1
MotifContent CCGG   1..1000    score=20
MotifContent CCGG   1001..2000    score=50
MotifContent CCGG   2001..3000    score=30


There are many ways for computing the score.  I myself would begin with:

#!/usr/bin/env perl
use strict;

use Bio::SeqIO; # for reading sequence to scan
use TFBS::Word::Consensus; # for the pattern matching.  cf. http://forkhead.cgb.ki.se/TFBS/
use PDL::Basic; # if you have it installed, for the histogram binning statistics
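
As a rough sketch of the scoring step, the per-window counts can also be computed with core Perl alone. This is a minimal illustration rather than a finished tool: it assumes the sequence is already in a string (in practice Bio::SeqIO would supply it), the helper names window_counts and track_lines are made up for this example, and the score is the raw motif count per window rather than a value scaled to the 0..100 range used in the track above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Count motif occurrences in consecutive fixed-size windows.
# A hit is assigned to the window in which it starts; the last window
# may be shorter than $window.
sub window_counts {
    my ($seq, $motif, $window) = @_;
    my $n_windows = int((length($seq) + $window - 1) / $window);
    my @counts = (0) x $n_windows;
    # The zero-width lookahead makes the engine visit every start position,
    # so overlapping occurrences of a motif would be counted as well.
    while ($seq =~ /(?=\Q$motif\E)/gi) {
        $counts[ int($-[0] / $window) ]++;
    }
    return @counts;
}

# Format the counts as annotation lines in the style of the track example.
sub track_lines {
    my ($counts, $window, $motif, $seq_len) = @_;
    my @lines;
    for my $i (0 .. $#$counts) {
        my $start = $i * $window + 1;
        my $end   = ($i + 1) * $window;
        $end = $seq_len if $end > $seq_len;
        push @lines, sprintf 'MotifContent %s   %d..%d    score=%d',
            uc $motif, $start, $end, $counts->[$i];
    }
    return @lines;
}

# Tiny demonstration on a made-up 20 bp sequence with 10 bp windows.
my $demo   = 'CCGGAACCGGTTTTCCGGAA';
my @counts = window_counts($demo, 'ccgg', 10);
print "$_\n" for track_lines(\@counts, 10, 'ccgg', length $demo);
```

For a real chromosome one would call window_counts with a window of 1000 and paste the resulting lines under a reference= line, as in the example above.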

 
 



________________________________

	From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of staffa
	Sent: Tuesday, February 21, 2006 11:25 AM
	To: bioperl-l at lists.open-bio.org
	Subject: [Bioperl-l] Pattern Density
	
	
	Good Friends, 
	I have an important client who wants a histogram display of the
density of "ccgg" along any chromosome of the mouse genome in 1000 bp
windows. 

	I'm thinking that maybe there is a bio-perl module that could
help with this. 
	That'd probably beat having to write something from scratch. 
	Any help that you give would be greatly appreciated. 
	I am more concerned about the reading and analysis of the
sequence than actual plotting of the histogram, but anything you can
offer will be appreciated. 

	Thank you. 

	Nick Staffa 
	Telephone: 919-316-4569 (NIEHS: 6-4569) 
	Scientific Computing Support Group 
	NIEHS Information Technology Support Services Contract 
	(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) 
	National Institute of Environmental Health Sciences 
	National Institutes of Health 
	Research Triangle Park, North Carolina 




From cjfields at uiuc.edu  Tue Feb 21 16:15:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 15:15:18 -0600
Subject: [Bioperl-l] bioperl maillist searches not updated
Message-ID: <000801c6372b$eae00870$15327e82@pyrimidine>

Seems that using Google to search through the mailing list will only get
mail up to the beginning of August 2005.  I went back to look up Hilmar's
email on bioperl-db recently and can't find it.  So I tried anything in
2006:

http://www.google.com/search?hl=en&lr=&safe=off&as_qdr=all&q=site%3Abioperl.org+inurl%3Apipermail+inurl%3Abioperl-l+2006&btnG=Search

And got nothin'!

The Open-Bio form has some mail from 2006, but only up to 1-24-2006.
Luckily, the mailing list archives seem to be fine:



Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From osborne1 at optonline.net  Tue Feb 21 16:13:44 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 21 Feb 2006 16:13:44 -0500
Subject: [Bioperl-l] Pattern Density
In-Reply-To: 
Message-ID: 

Nick,

I was mistaken previously when I hinted that you couldn't create histograms
using Bioperl:

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Graphics/Glyph/xyplot.html

This could do exactly what you want.

Brian O.






From cjfields at uiuc.edu  Tue Feb 21 16:58:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 15:58:07 -0600
Subject: [Bioperl-l] bioperl-db issues
Message-ID: <000d01c63731$e61be1f0$15327e82@pyrimidine>

Sorry about the huge delay in this response, got caught up with other
things.

> > Bad News:  There's a new problem now. I updated from CVS yesterday; I
> > walked through the steps and ran 'nmake test', with everything passing
> > fine. However, load_seqdatabase.pl is extremely slow; it's loading a
> > sequence every 5 minutes or so.  I noticed (when using '-debug') that it
> > is hanging up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create
> > a database, load the biosql schema, and load sequences w/o loading
> > taxonomy, the problem goes away.
> >
> > Here's the debugging output (I cut it off at the point it hangs up):
> > [...]
> 
> > preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
> > taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
> > taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND
> > ncbi_taxon_id = ?
> > SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> > SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)
> 
> I'm a bit surprised if this is the query where it hangs. Are the
> indexes all there? There should be a primary key index on
> taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on taxon_name
> over (taxon_id,name,name_class). Also, there should be separate indexes
> on taxon_name.taxon_id and taxon_name.name. Are they all there? If you
> reinstantiated the schema from the DDL then it seems unlikely that
> somehow the indexes have vanished except if you messed with the schema
> or the DDL.

So far everything looks like you mentioned (see below for the ANALYZE
stuff).  The only thing that I wasn't sure about was that taxon_name indexes
were all primary keys.  That's really it.

> Putting an index on taxon_name.name_class really can't make sense, so
> let's assume it can't be that.
> 
> So really I suspect this has something to do with the state of the
> database and the version of MySQL. In particular, from some 4.x version
> of MySQL under certain circumstances you have to analyze the statistics
> of the tables in order to get the optimizer pick up the indexes
> properly. Are you on MySQL 4.x and if so, have you done that?
> 
> There's the ANALYZE TABLE command:
> http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html
> 
> Note the comment: "This statement works with MyISAM, BDB, and (as of
> MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?
>
> Also, you can check the execution plan for the query using EXPLAIN.
> http://dev.mysql.com/doc/refman/4.1/en/explain.html
> 
> This should show you whether the index would be picked up for the query
> or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to
> the db using the mysql shell (mysql).
> 
> I believe something similarly strange was encountered by someone using
> DB::GFF (or Chado) under MySQL, and if I recall correctly the solution
> was to optimize (analyze) the tables. Maybe someone who was in that
> thread reads this and can comment?

I find it odd that it worked well back in December and doesn't work now.  I
updated bioperl and bioperl-db from CVS since then, so have there been any
changes that may have caused this?  I noticed a few changes here and there.

Here's what I have tried thus far:

1) I reinstalled MySQL.  I thought it might be that I had my database on a
partitioned drive, so I reinstalled on the main drive.

2) I rebuilt the database from scratch, loading taxonomy fresh, loaded the
schema, and got the same error when loading (hanging on SpeciesAdaptor).
Tried ANALYZE:
------------------------------------
mysql> ANALYZE TABLE taxon;
+----------------+---------+----------+----------+
| Table          | Op      | Msg_type | Msg_text |
+----------------+---------+----------+----------+
| bioseqdb.taxon | analyze | status   | OK       |
+----------------+---------+----------+----------+
1 row in set (0.42 sec)

mysql> ANALYZE TABLE taxon_name;
+---------------------+---------+----------+----------+
| Table               | Op      | Msg_type | Msg_text |
+---------------------+---------+----------+----------+
| bioseqdb.taxon_name | analyze | status   | OK       |
+---------------------+---------+----------+----------+
1 row in set (0.36 sec)

mysql>
------------------------------------
so that's fine.  

3) Using EXPLAIN table:
------------------------------------
mysql> EXPLAIN taxon;
+-------------------+---------------------+------+-----+---------+----------------+
| Field             | Type                | Null | Key | Default | Extra          |
+-------------------+---------------------+------+-----+---------+----------------+
| taxon_id          | int(10) unsigned    | NO   | PRI | NULL    | auto_increment |
| ncbi_taxon_id     | int(10)             | YES  | UNI | NULL    |                |
| parent_taxon_id   | int(10) unsigned    | YES  | MUL | NULL    |                |
| node_rank         | varchar(32)         | YES  |     | NULL    |                |
| genetic_code      | tinyint(3) unsigned | YES  |     | NULL    |                |
| mito_genetic_code | tinyint(3) unsigned | YES  |     | NULL    |                |
| left_value        | int(10) unsigned    | YES  | UNI | NULL    |                |
| right_value       | int(10) unsigned    | YES  | UNI | NULL    |                |
+-------------------+---------------------+------+-----+---------+----------------+
8 rows in set (0.02 sec)

mysql> EXPLAIN taxon_name;
+------------+------------------+------+-----+---------+-------+
| Field      | Type             | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| taxon_id   | int(10) unsigned | NO   | PRI |         |       |
| name       | varchar(255)     | NO   | PRI |         |       |
| name_class | varchar(32)      | NO   | PRI |         |       |
+------------+------------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

------------------------------------
Does taxon_name need three primary keys?

4) So I tried reloading the sequences:
------------------------------------
C:\Perl\src\bioperl\bioperl-db\scripts\biosql>load_seqdatabase.pl -format genbank -dbname bioseqdb -dbuser root -dbpass ********** -testonly -safe -debug NP_249092.gpt

And got this:

Loading NP_249092.gpt ...
attempting to load adaptor class for Bio::Seq::RichSeq
        attempting to load module Bio::DB::BioSQL::RichSeqAdaptor
......
SimpleValueAdaptor::add_assoc: binding column 4 to "1" (rank)
SimpleValueAdaptor::add_assoc: binding column 1 to "21" (FK to
Bio::SeqFeature::Generic)
SimpleValueAdaptor::add_assoc: binding column 2 to "34" (FK to
Bio::Annotation::SimpleValue)
SimpleValueAdaptor::add_assoc: binding column 3 to "11" (value)
SimpleValueAdaptor::add_assoc: binding column 4 to "1" (rank)
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
BioNamespaceAdaptor: binding UK column 1 to "bioperl" (namespace)
SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
SpeciesAdaptor: binding UK column 2 to "208963" (ncbi_taxid)
------------------------------------
Which is where it hangs, as before, usually about 2 minutes for each
sequence.  It seems there's a timeout happening in there somewhere...  It
definitely has something to do with the lookup, but like I said it did run
much faster last Nov-Dec.
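
One way to test the lookup in isolation, following the earlier suggestion to check the execution plan with EXPLAIN, is to take the statement from the debug log, fill in the bound values, and paste the result into the mysql shell. The snippet below is only a sketch of that substitution step: explain_statement and _sql_literal are hypothetical helper names, the quoting is deliberately simplistic, and the query text and bind values are copied from the log output above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: replace each '?' placeholder with the corresponding
# logged bind value and prefix the statement with EXPLAIN.
sub explain_statement {
    my ($sql, @binds) = @_;
    $sql =~ s/\?/_sql_literal(shift @binds)/ge;
    return "EXPLAIN $sql;";
}

# Render one bind value as an SQL literal (simplified quoting rules).
sub _sql_literal {
    my ($value) = @_;
    return 'NULL' unless defined $value;
    return $value if $value =~ /\A-?\d+\z/;    # integers go in bare
    (my $escaped = $value) =~ s/'/''/g;        # double embedded single quotes
    return "'$escaped'";
}

# Statement and bind values as they appear in the -debug output.
my $query = 'SELECT taxon_name.taxon_id, NULL, NULL, taxon.ncbi_taxon_id, '
          . 'taxon_name.name, NULL FROM taxon, taxon_name '
          . 'WHERE taxon.taxon_id = taxon_name.taxon_id '
          . 'AND name_class = ? AND ncbi_taxon_id = ?';

print explain_statement($query, 'scientific name', 208963), "\n";
```

If the plan shows a full table scan on taxon or taxon_name, the indexes are not being picked up; if the statement returns quickly in the shell, the delay is more likely elsewhere.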

So I'm a bit lost now.  Any ideas?  

I may try re-optimizing tables to see if it helps any.

I'm also really thinking of giving postgresql a shot but I have used mysql
for a while now; I'd like to stay with it if I can.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From cjfields at uiuc.edu  Tue Feb 21 23:09:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 22:09:18 -0600
Subject: [Bioperl-l] bioperl-db issues
In-Reply-To: 
Message-ID: <000001c63765$c0472370$15327e82@pyrimidine>

I got it worked out.  The Windows installer had picked out lower memory
settings (key buffer 10M, for instance) when I reinstalled, which
drastically slowed everything down.  I reset the settings for a server
environment and it's fine now.  Well, as fine as it will likely get since
I'm running this on a 1.8 GHz P4 with 756 MB RAM, so I'm not expecting it to
actually fly.  It's loading at about two sequences/second.  I'll have to see
if I get a speed improvement when optimizing tables.  I'll add this to the
wiki for installing bioperl-db under Windows.  

Are there optimal settings for using bioperl-db, such as key buffer and sort
buffer size, buffer pool size, etc?  Or do you think I'm likely to run into
a processor speed limit?  Just trying to get a fix on how much memory I
could push towards getting a smaller sequence database loaded, nothing like
swissprot.  I saw something in the mail list about setting
max_allowed_packet and a few other settings but that was about four years
ago.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar
> Lapp
> Sent: Tuesday, February 21, 2006 6:44 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: bioperl-db issues
> 
> On 2/21/06, Chris Fields  wrote:
> > [...]
> > I find it odd that it worked well back in December and doesn't work now.
> > I updated bioperl and bioperl-db from CVS since then, so have there been
> > any changes that may have caused this?  I noticed a few changes here and
> > there.
> 
> The changes were fixes to retrieve the rank on persistent annotation
> objects (it was only stored before, but never retrieved). Neither the
> SpeciesAdaptor nor any of the taxonomy queries was affected by this.
> 
> >
> > Here's what I have tried thus far:
> >
> > 1) I reinstalled MySQL.  I thought it might be that I had my database on
> > a partitioned drive, so I reinstalled on the main drive.
> >
> > 2) I rebuilt the database from scratch, loading taxonomy fresh, loaded
> > the schema, and got the same error when loading (hanging on
> > SpeciesAdaptor). Tried ANALYZE:
> > ------------------------------------
> > mysql> ANALYZE TABLE taxon;
> > +----------------+---------+----------+----------+
> > | Table          | Op      | Msg_type | Msg_text |
> > +----------------+---------+----------+----------+
> > | bioseqdb.taxon | analyze | status   | OK       |
> > +----------------+---------+----------+----------+
> > 1 row in set (0.42 sec)
> >
> > mysql> ANALYZE TABLE taxon_name;
> > +---------------------+---------+----------+----------+
> > | Table               | Op      | Msg_type | Msg_text |
> > +---------------------+---------+----------+----------+
> > | bioseqdb.taxon_name | analyze | status   | OK       |
> > +---------------------+---------+----------+----------+
> > 1 row in set (0.36 sec)
> 
> I'm not sure but you may have to analyze all tables.
> 
> >
> > mysql>
> > ------------------------------------
> > so that's fine.
> >
> > 3) Using EXPLAIN table:
> > ------------------------------------
> > mysql> EXPLAIN taxon;
> 
> Note that you wouldn't use EXPLAIN on a table but on a query instead.
> I.e., copy&paste the offending query into the mysql editor, prefix it
> with EXPLAIN and then see what the results are. It should show whether
> the indexes are being used properly.
> 
> Most likely it doesn't use one of the indexes that it should be using
> but does a full table scan instead. The explain plan should pinpoint
> that.
> 
> BTW you can also use this to reconfirm the command line observation
> about the query being slow - it should 'hang' in the mysql shell as
> well. If it doesn't then there is something else going on. (if the
> placeholders pose a problem replace them with the actual values as
> given in the log)
> 
> > [..]
> > SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> > SpeciesAdaptor: binding UK column 2 to "208963" (ncbi_taxid)
> > ------------------------------------
> > Which is where it hangs, as before, usually about 2 minutes for each
> > sequence.
> 
> Do you also see a SELECT CLASSIFICATION query succeeding the one above
> (e.g., if you wait)? I'm asking because I'm surprised that that isn't
> the one you're seeing as taking too long, because it has been reported
> earlier to cause such problems with mysql. Alex Zelensky posted what
> he found worked as a fix.
> 
>   -hilmar
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------



From hlapp at gmx.net  Tue Feb 21 19:43:42 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 21 Feb 2006 16:43:42 -0800
Subject: [Bioperl-l] bioperl-db issues
In-Reply-To: <000d01c63731$e61be1f0$15327e82@pyrimidine>
References: <000d01c63731$e61be1f0$15327e82@pyrimidine>
Message-ID: 

On 2/21/06, Chris Fields  wrote:
> [...]
> I find it odd that it worked well back in December and doesn't work now.  I
> updated bioperl and bioperl-db from CVS since then, so have there been any
> changes that may have caused this?  I noticed a few changes here and there.

The changes were fixes to retrieve the rank on persistent annotation
objects (it was only stored before, but never retrieved). Neither the
SpeciesAdaptor nor any of the taxonomy queries was affected by this.

>
> Here's what I have tried thus far:
>
> 1) I reinstalled MySQL.  I thought it might be that I had my database on a
> partitioned drive, so I reinstalled on the main drive.
>
> 2) I rebuilt the database from scratch, loading taxonomy fresh, loaded the
> schema, and got the same error when loading (hanging on SpeciesAdaptor).
> Tried ANALYZE:
> ------------------------------------
> mysql> ANALYZE TABLE taxon;
> +----------------+---------+----------+----------+
> | Table          | Op      | Msg_type | Msg_text |
> +----------------+---------+----------+----------+
> | bioseqdb.taxon | analyze | status   | OK       |
> +----------------+---------+----------+----------+
> 1 row in set (0.42 sec)
>
> mysql> ANALYZE TABLE taxon_name;
> +---------------------+---------+----------+----------+
> | Table               | Op      | Msg_type | Msg_text |
> +---------------------+---------+----------+----------+
> | bioseqdb.taxon_name | analyze | status   | OK       |
> +---------------------+---------+----------+----------+
> 1 row in set (0.36 sec)

I'm not sure but you may have to analyze all tables.

>
> mysql>
> ------------------------------------
> so that's fine.
>
> 3) Using EXPLAIN table:
> ------------------------------------
> mysql> EXPLAIN taxon;

Note that you wouldn't use EXPLAIN on a table but on a query instead.
I.e., copy&paste the offending query into the mysql editor, prefix it
with EXPLAIN and then see what the results are. It should show whether
the indexes are being used properly.

Most likely it doesn't use one of the indexes that it should be using
but does a full table scan instead. The explain plan should pinpoint
that.

BTW you can also use this to reconfirm the command line observation
about the query being slow - it should 'hang' in the mysql shell as
well. If it doesn't then there is something else going on. (if the
placeholders pose a problem replace them with the actual values as
given in the log)
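
A minimal sketch of the EXPLAIN check described above, run through Perl DBI. The user, password and host are placeholders, and the SELECT is only an assumed shape for the SpeciesAdaptor lookup, reconstructed from the two bound values in the log (name_class and the NCBI taxon ID) and the standard BioSQL taxon/taxon_name tables. Substitute the real statement from your own log.

```perl
use strict;
use warnings;
use DBI;

# Placeholders: adjust database, host, user and password to your setup.
# The database name 'bioseqdb' is taken from the ANALYZE output in this thread.
my $dbh = DBI->connect('DBI:mysql:database=bioseqdb;host=localhost',
                       'biosql_user', 'secret', { RaiseError => 1 });

# Assumed query shape, with the placeholders replaced by the logged values.
my $query = <<'SQL';
SELECT taxon.taxon_id
FROM   taxon, taxon_name
WHERE  taxon.taxon_id        = taxon_name.taxon_id
AND    taxon_name.name_class = 'scientific name'
AND    taxon.ncbi_taxon_id   = 208963
SQL

# Prefix the statement with EXPLAIN and print one line per table in the plan.
my $plan = $dbh->selectall_arrayref("EXPLAIN $query", { Slice => {} });
for my $row (@$plan) {
    printf "%-12s type=%-8s key=%-24s rows=%s\n",
        $row->{table},
        defined $row->{type} ? $row->{type} : '?',
        defined $row->{key}  ? $row->{key}  : 'NONE',
        defined $row->{rows} ? $row->{rows} : '?';
}

$dbh->disconnect;
```

If key comes back as NONE or type is ALL for either table, MySQL is scanning the whole table for that step rather than using an index.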

> [..]
> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> SpeciesAdaptor: binding UK column 2 to "208963" (ncbi_taxid)
> ------------------------------------
> Which is where it hangs, as before, usually about 2 minutes for each
> sequence.

Do you also see a SELECT CLASSIFICATION query succeeding the one above
(e.g., if you wait)? I'm asking because I'm surprised that that isn't
the one you're seeing as taking too long, because it has been reported
earlier to cause such problems with mysql. Alex Zelensky posted what
he found worked as a fix.

  -hilmar
--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From cjfields at uiuc.edu  Wed Feb 22 00:13:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 23:13:18 -0600
Subject: [Bioperl-l] removing sequences from a database?
Message-ID: <000001c6376e$b113c170$15327e82@pyrimidine>

I think this has been posed once but I couldn't find a straight answer on
the mailing list; is there a way to remove sequences in a BioSQL database
using bioperl-db?  This is the last I heard about it:

http://portal.open-bio.org/pipermail/bioperl-l/2001-November/006570.html

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From hlapp at gmx.net  Wed Feb 22 00:20:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 21 Feb 2006 21:20:05 -0800
Subject: [Bioperl-l] removing sequences from a database?
In-Reply-To: <000001c6376e$b113c170$15327e82@pyrimidine>
References: <000001c6376e$b113c170$15327e82@pyrimidine>
Message-ID: 

This is a pretty old posting :-) Sure you can remove sequences. In
fact you can remove any persistent object by calling $pobj->remove().
I.e., for a persistent sequence (which is what you get from the
adaptors): $pseq->remove()

Do not forget to call commit() on the persistence adaptor or the
persistent object itself or otherwise the operation is rolled back
when you disconnect.
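
The remove-then-commit sequence can be sketched roughly as follows. This assumes a bioperl-db installation and a BioSQL database on MySQL. The connection parameters, accession and namespace are hypothetical placeholders, and looking the sequence up with find_by_unique_key() is just one way to obtain the persistent object.

```perl
use strict;
use warnings;
use Bio::DB::BioDB;
use Bio::Seq;

# Placeholder connection settings; replace with your own.
my $db = Bio::DB::BioDB->new(
    -database => 'biosql',
    -driver   => 'mysql',
    -dbname   => 'bioseqdb',
    -host     => 'localhost',
    -user     => 'biosql_user',
    -pass     => 'secret',
);

# Template object carrying the unique key of the sequence to delete.
# Accession and namespace here are made up for illustration.
my $template = Bio::Seq->new(
    -accession_number => 'X12345',
    -namespace        => 'genbank',
);

my $adaptor = $db->get_object_adaptor('Bio::SeqI');
my $pseq    = $adaptor->find_by_unique_key($template);

if ($pseq) {
    $pseq->remove();   # delete the persistent sequence
    $pseq->commit();   # without this the delete is rolled back on disconnect
    print "removed\n";
}
else {
    print "no matching sequence found\n";
}
```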

BTW there are examples for objects other than the sequence object
itself (say you want to remove only the features) in the
scripts/biosql directory; some of the --mergeobjs closure examples do
this.

    -hilmar

On 2/21/06, Chris Fields  wrote:
> I think this has been posed once but I couldn't find a straight answer on
> the mailing list; is there a way to remove sequences in a BioSQL database
> using bioperl-db?  This is the last I heard about it:
>
> http://portal.open-bio.org/pipermail/bioperl-l/2001-November/006570.html
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From dhoworth at mrc-lmb.cam.ac.uk  Wed Feb 22 05:20:10 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 22 Feb 2006 10:20:10 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <200602211326.00021.lstein@cshl.edu>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
Message-ID: <43FC3ADA.4090203@mrc-lmb.cam.ac.uk>

Lincoln Stein wrote:
> Hi Dave,
> 
> Well, when you are using 1-based coordinates, a line that contains 44 
> intervals will have 45 ticks. If you move to 0-based coordinates, then the 
> first tick will be labeled 0 and the last tick will be labeled 44. An 
> alternative is to make each base dimensionless, but that becomes a problem 
> when dealing with single base features, such as SNPs.
>
> These issues are why I have long advocated for interbase coordinates
> in which you number the positions between bases rather than the bases
> themselves.

I see your point but I need to work with the coordinates that the users 
expect and are familiar with. (Things get much worse with PDB residue 
numbering :)

> Draw me the picture of what you expect to see. I think of it this way:
> 
> 	1    2  3  4   5   6
>          A>G>C>T>A>

I guess something went wrong with your ASCII art :(

OK, consider a 44-residue entry from SwissProt (P12239):

   TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR

The first T is numbered 1 and the last R is numbered 44.

So I expect to see a line with 44 positions indicated somehow (whether 
these are half-open intervals or points on the line), with the number 1 
at the left end and the number 44 at the right end.

An important point is that if I then place other tracks below this one 
that start at say 27 and go to say 42, representing VPTIFFLGAIAAMQFI, 
they should align properly (according to whatever convention is used to 
represent a residue).

For a short sequence like this it would be possible to use letters to 
represent the residue but I'd like to use the same convention for longer 
sequences as well and have everything be consistent.

I'm hoping Bio::Graphics will make this easy.

Thanks, Dave


From khoueiry at ibdm.univ-mrs.fr  Wed Feb 22 04:12:20 2006
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Wed, 22 Feb 2006 10:12:20 +0100
Subject: [Bioperl-l] [Fwd: Re:  Pattern Density]
Message-ID: <1140599541.19981.26.camel@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 
-------------- next part --------------
An embedded message was scrubbed...
From: khoueiry 
Subject: Re: [Bioperl-l] Pattern Density
Date: Tue, 21 Feb 2006 19:47:54 +0100
Size: 3812
URL: 

From dhoworth at mrc-lmb.cam.ac.uk  Wed Feb 22 10:13:10 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 22 Feb 2006 15:13:10 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <1140619014.3142.81.camel@localhost.localdomain>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>	
	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>	
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
	<1140619014.3142.81.camel@localhost.localdomain>
Message-ID: <43FC7F86.6060901@mrc-lmb.cam.ac.uk>

Scott Cain wrote:
> I don't know if this helps at all, but you could think of that 45 tick
> mark as the termination, since the space between the 44th and the 45th
> tick mark corresponds to your 44th residue.

Yes, that's the way I do think of it and that's the way I expect 
everybody else to think of it.

But the numbers need to match the residues in any case. ie. the numbers 
need to match the spaces not the tick marks, if the spaces match the 
residues.

> I suppose it is a matter of correctly training your users :-)

The important thing is to have a consistent model, then it's easy to 
explain to users.

Cheers, Dave


From lstein at cshl.edu  Wed Feb 22 11:22:02 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 22 Feb 2006 11:22:02 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
Message-ID: <200602221122.02707.lstein@cshl.edu>

The base starts at the tickmark and extends to (but doesn't touch) the next 
one. If you are down at the resolution at which you see residue letters, then 
lines drawn underneath the letters will line up like this:

 1  2  3  4  5  6  7  8  9 10    ticks
 T  S  N  T  P  N  Q  E  P       residues
    =========   ===========      domains

Right?

Lincoln

On Wednesday 22 February 2006 05:20, Dave Howorth wrote:
> Lincoln Stein wrote:
> > Hi Dave,
> >
> > Well, when you are using 1-based coordinates, a line that contains 44
> > intervals will have 45 ticks. If you move to 0-based coordinates, then
> > the first tick will be labeled 0 and the last tick will be labeled 44. An
> > alternative is to make each base dimensionless, but that becomes a
> > problem when dealing with single base features, such as SNPs.
> >
> > These issues are why I have long advocated for interbase coordinates
> > in which you number the positions between bases rather than the bases
> > themselves.
>
> I see your point but I need to work with the coordinates that the users
> expect and are familiar with. (Things get much worse with PDB residue
> numbering :)
>
> > Draw me the picture of what you expect to see. I think of it this way:
> >
> > 	1    2  3  4   5   6
> >          A>G>C>T>A>
>
> I guess something went wrong with your ASCII art :(
>
> OK, consider a 44-residue entry from SwissProt (P12239):
>
>    TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR
>
> The first T is numbered 1 and the last R is numbered 44.
>
> So I expect to see a line with 44 positions indicated somehow (whether
> these are half-open intervals or points on the line), with the number 1
> at the left end and the number 44 at the right end.
>
> An important point is that if I then place other tracks below this one
> that start at say 27 and go to say 42, representing VPTIFFLGAIAAMQFI,
> they should align properly (according to whatever convention is used to
> represent a residue).
>
> For a short sequence like this it would be possible to use letters to
> represent the residue but I'd like to use the same convention for longer
> sequences as well and have everything be consistent.
>
> I'm hoping Bio::Graphics will make this easy.
>
> Thanks, Dave

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From dhoworth at mrc-lmb.cam.ac.uk  Wed Feb 22 11:34:08 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 22 Feb 2006 16:34:08 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <200602221122.02707.lstein@cshl.edu>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
	<200602221122.02707.lstein@cshl.edu>
Message-ID: <43FC9280.1020008@mrc-lmb.cam.ac.uk>

Lincoln Stein wrote:
> The base starts at the tickmark and extends to (but doesn't touch) the next 
> one. If you are down at the resolution at which you see residue letters, then 
> lines drawn underneath the letters will line up like this:
> 
>  1  2  3  4  5  6  7  8  9 10    ticks
>  T  S  N  T  P  N  Q  E  P       residues
>     =========   ===========      domains
> 
> Right?

Yes. What's your point?

Dave


From cain at cshl.edu  Wed Feb 22 11:29:21 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 22 Feb 2006 11:29:21 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC7F86.6060901@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
	<1140619014.3142.81.camel@localhost.localdomain>
	<43FC7F86.6060901@mrc-lmb.cam.ac.uk>
Message-ID: <1140625762.3142.107.camel@localhost.localdomain>

Hi Dave,

I took the example code you posted a few days ago and added a few
motifs, one that goes from 1 to 10, and one that goes from 35 to 44 (the
last residue), which results in the attached graphic.

As Lincoln pointed out, the features are drawn from the beginning (1 and
35), and through the last residue (up to but not touching 11 and 45).
So the space between 35 and 36 corresponds to residue 35.  That's the
way it works.

Scott


On Wed, 2006-02-22 at 15:13 +0000, Dave Howorth wrote:
> Scott Cain wrote:
> > I don't know if this helps at all, but you could think of that 45 tick
> > mark as the termination, since the space between the 44th and the 45th
> > tick mark corresponds to your 44th residue.
> 
> Yes, that's the way I do think of it and that's the way I expect 
> everybody else to think of it.
> 
> But the numbers need to match the residues in any case. ie. the numbers 
> need to match the spaces not the tick marks, if the spaces match the 
> residues.
> 
> > I suppose it is a matter of correctly training your users :-)
> 
> The important thing is to have a consistent model, then it's easy to 
> explain to users.
> 
> Cheers, Dave
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: motifs.png
Type: image/png
Size: 1879 bytes
Desc: not available
URL: 

From dhoworth at mrc-lmb.cam.ac.uk  Wed Feb 22 11:45:00 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 22 Feb 2006 16:45:00 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <1140625762.3142.107.camel@localhost.localdomain>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>	
	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>	
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>	
	<1140619014.3142.81.camel@localhost.localdomain>	
	<43FC7F86.6060901@mrc-lmb.cam.ac.uk>
	<1140625762.3142.107.camel@localhost.localdomain>
Message-ID: <43FC950C.7080007@mrc-lmb.cam.ac.uk>

Scott Cain wrote:
> Hi Dave,
> 
> I took the example code you posted a few days ago and added a few
> motifs, one that goes from 1 to 10, and one that goes from 35 to 44 (the
> last residue), which results in the attached graphic.

Yes, that's the same sort of graphic I'm getting.

> As Lincoln pointed out, the features are drawn from the beginning (1 and
> 35), and through the last residue (up to but not touching 11 and 45).
> So the space between 35 and 36 corresponds to residue 35.

But there is no residue 45!  So there should be no number 45 anywhere on 
the picture.

I think the problem is that the tick strip is displaying numbers for the 
ticks instead of the intervals. The intervals are what corresponds to 
users' models of physical reality and my graphics need to match that.
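
The difference between numbering the ticks and numbering the intervals can be made concrete with a small pure-Perl sketch. It does not use Bio::Graphics at all. The three helper names are made up for illustration and only model the two conventions discussed in this thread.

```perl
use strict;
use warnings;

# Labels when each tick marks the left edge of a residue (half-open
# intervals): residues $start..$end need one extra closing tick.
sub tick_labels {
    my ($start, $end) = @_;
    return ($start .. $end + 1);
}

# Labels when each residue (interval) carries its own number.
sub interval_labels {
    my ($start, $end) = @_;
    return ($start .. $end);
}

# Horizontal extent of a 1-based, inclusive feature, in residue widths
# measured from the left edge of a track that begins at $track_start.
sub feature_span {
    my ($track_start, $feat_start, $feat_end) = @_;
    return ($feat_start - $track_start, $feat_end - $track_start + 1);
}

my @ticks = tick_labels(1, 44);
my @cells = interval_labels(1, 44);
my ($left, $right) = feature_span(1, 27, 42);

printf "%d tick labels, last one %d\n",     scalar @ticks, $ticks[-1];
printf "%d interval labels, last one %d\n", scalar @cells, $cells[-1];
printf "feature 27..42 covers offsets %d..%d (%d residues)\n",
    $left, $right, $right - $left;
```

For the 44-residue example this gives 45 tick labels ending in 45 but 44 interval labels ending in 44, while the 27..42 feature covers 16 residue widths under either convention. Only the labelling differs, not the alignment of the tracks.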

> That's the way it works.

I guess I'll have to experiment and patch until it does what I want 
then, if nobody knows how to do it.

Cheers, Dave


From iamvela at yahoo.com  Wed Feb 22 12:21:59 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 09:21:59 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
Message-ID: <20060222172159.73370.qmail@web34404.mail.mud.yahoo.com>

Hi All:

I am new to the Perl/BioPerl world.

I am debugging a program that used to work fine before.
Blast works fine and returns results, but I am unable
to get any hits from the results.

Here is the relevant code:

$blastObj = Bio::SearchIO->new(-file   => $resultsFile,
                               -format => 'blast');
while (my $result = $blastObj->next_result()) {
    while (my $bioPerlHit = $result->next_hit()) {
        .......


The first while condition returns true, but the second
while condition returns false. So it looks like there is
a result, but the parser is unable to identify the hits in
it. I printed the $result (pasted below).

Any ideas/comments to resolve this? Thanks in advance.

I am using Perl 5.8.7, BioPerl 1.2.3, Apache 1.3.34 on
Windows XP platform. 

Like I said before, this application was running fine
on a different Windows machine with a similar
environment, so it looks like some change in the
products/versions is causing the problem.
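
One way to narrow this down is to have the parser report what it recognised in the file. The sketch below is an assumption-laden diagnostic, not a fix: it requires BioPerl to be installed, takes the report path on the command line, and only prints the program name and version the parser saw plus the number of hits it managed to read.

```perl
use strict;
use warnings;
use Bio::SearchIO;

my $results_file = shift @ARGV
    or die "usage: $0 blast_report.txt\n";

my $searchio = Bio::SearchIO->new(-file   => $results_file,
                                  -format => 'blast');

while (my $result = $searchio->next_result) {
    # What the parser believes produced this report.
    my $program = $result->algorithm;
    my $version = $result->algorithm_version;
    printf "report: %s %s\n",
        defined $program ? $program : '?',
        defined $version ? $version : '?';

    # Count the hits the parser can actually see.
    my $count = 0;
    while (my $hit = $result->next_hit) {
        $count++;
        my $evalue = $hit->significance;
        printf "  hit %d: %s (E = %s)\n",
            $count, $hit->name, defined $evalue ? $evalue : '?';
    }
    print "hits parsed: $count\n";
}
```

If this prints a program and version but zero hits for a report that clearly lists alignments, comparing the BLAST version in the report with what the installed BioPerl release was written for would be a reasonable next step.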

thanks again,
Raghu




Blast result (I can send the complete result if you need
it):

BLASTP 2.2.13 [Nov-27-2005]
Reference: Altschul, Stephen F., Thomas L. Madden,
Alejandro A. Schäffer, 
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman 
(1997), "Gapped BLAST and PSI-BLAST: a new generation
of 
protein database search programs", Nucleic Acids Res.
25:3389-3402.

RID: 1140573059-19990-140117828872.BLASTQ1


Database: All non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF excluding
environmental samples
           3,297,000 sequences; 1,129,354,045 total
letters
Query=  
Length=360


                                                      
            Score     E
Sequences producing significant alignments:           
            (Bits)  Value

ref|XP_534770.2|  PREDICTED: similar to
Mitogen-activated prot...   739    0.0   
gb|AAX36107.1|  mitogen-activated protein kinase 1
[synthetic con   739    0.0   
pdb|1WZY|A  Chain A, Crystal Structure Of Human Erk2
Complexed...   739    0.0   
pdb|1TVO|A  Chain A, The Structure Of Erk2 In Complex
With A S...   739    0.0   
ref|NP_786987.1|  mitogen-activated protein kinase 1
[Bos taur...   739    0.0   
emb|CAA77752.1|  41kD protein kinase [Homo sapiens]
>prf||1813...   738    0.0   
gb|AAQ02541.1|  mitogen-activated protein kinase 1
[synthetic con   736    0.0   
gb|AAH99905.1|  Mitogen-activated protein kinase 1
[Homo sapiens]   735    0.0   
emb|CAI29602.1|  hypothetical protein [Pongo pygmaeus]
             734    0.0   
gb|AAH58258.1|  Mitogen activated protein kinase 1
[Mus muscul...   731    0.0   
pdb|4ERK|   The Complex Structure Of The Map Kinase
Erk2OLOMOU...   731    0.0   
pdb|1GOL|   Coordinates Of Rat Map Kinase Erk2 With An
Arginin...   730    0.0   
ref|XP_860750.1|  PREDICTED: similar to
Mitogen-activated prot...   729    0.0   
gb|AAK56503.1|  extracellular signal-regulated kinase
2 [Gallu...   726    0.0   
ref|XP_860716.1|  PREDICTED: similar to
Mitogen-activated prot...   726    0.0   
pdb|2ERK|   Phosphorylated Map Kinase Erk2            
             726    0.0   
pdb|1PME|   Structure Of Penta Mutant Human Erk2 Map
Kinase Co...   725    0.0   
ref|XP_860682.1|  PREDICTED: similar to
Mitogen-activated prot...   720    0.0   
ref|XP_860651.1|  PREDICTED: similar to
Mitogen-activated prot...   720    0.0   
emb|CAA77753.1|  40kDa protein kinase [Homo sapiens]
>prf||181...   717    0.0   
ref|NP_001017127.1|  mitogen-activated protein kinase
1 [Xenopus    715    0.0   
dbj|BAE28679.1|  unnamed protein product [Mus
musculus]             713    0.0   
emb|CAA42482.1|  MAP kinase [Xenopus laevis]
>gb|AAH60748.1| M...   711    0.0   
sp|P26696|MK01_XENLA  Mitogen-activated protein kinase
1 (Myel...   711    0.0   
gb|AAH76730.1|  Xp42 protein [Xenopus laevis]         
             706    0.0   
gb|AAH65868.1|  Mitogen-activated protein kinase 1
[Danio rerio]    696    0.0   
dbj|BAD23843.1|  extracellular signal regulated
protein kinase...   694    0.0   
ref|NP_878308.2|  mitogen-activated protein kinase 1
[Danio re...   694    0.0   
emb|CAG07778.1|  unnamed protein product [Tetraodon
nigroviridis]   692    0.0   
dbj|BAB11813.1|  ERK2 [Danio rerio]                   
             689    0.0   
gb|AAY57805.1|  extracellular signal-regulated kinase
2 [Danio re   687    0.0   
gb|AAH45505.1|  Mitogen-activated protein kinase 3
[Danio reri...   654    0.0   
dbj|BAB11812.1|  ERK1 [Danio rerio]                   
             654    0.0   
ref|XP_609884.2|  PREDICTED: similar to mitogen
activated prot...   653    0.0   
dbj|BAD23842.1|  extracellular signal regulated
protein kinase...   650    0.0   
gb|AAH29712.1|  Mitogen activated protein kinase 3
[Mus muscul...   644    0.0   
ref|XP_885698.1|  PREDICTED: similar to mitogen
activated prot...   644    0.0   
gb|AAA20009.1|  microtubule-associated protein-2
kinase             643    0.0   
emb|CAA46318.1|  MAP kinase [Rattus norvegicus]
>ref|NP_059043...   641    0.0   
gb|AAH13992.1|  Mitogen-activated protein kinase 3
[Homo sapie...   641    0.0   
gb|AAQ02422.1|  mitogen-activated protein kinase 3
[synthetic ...   641    0.0   
gb|AAA41123.1|  extracellular signal-regulated kinase
1             640    0.0   
ref|XP_854045.1|  PREDICTED: similar to mitogen
activated prot...   640    0.0   
gb|AAA63486.1|  extracellular-signal-regulated kinase
1 [Rattus n   640    0.0   
emb|CAG02655.1|  unnamed protein product [Tetraodon
nigroviridis]   640    0.0   
emb|CAA42744.1|  protein serine/threonine kinase [Homo
sapiens...   639    0.0   
gb|AAA36142.1|  kinase 1                              
             639    0.0   
emb|CAA77754.1|  44kDa protein kinase [Homo sapiens]
>prf||181...   639    0.0   
ref|XP_885840.1|  PREDICTED: similar to mitogen
activated prot...   632    5e-180
ref|XP_885818.1|  PREDICTED: similar to mitogen
activated prot...   630    3e-179
ref|XP_860621.1|  PREDICTED: similar to
Mitogen-activated prot...   627    2e-178
gb|AAF71666.1|  extracellular signal-regulated kinase
1b [Rattus    627    2e-178
ref|XP_393029.1|  PREDICTED: similar to MAP kinase
[Apis mellifer   621    1e-176
gb|AAA83210.1|  MAP kinase                            
             619    4e-176
dbj|BAE46741.1|  Extracellular regulated MAP kinase
[Bombyx mori]   618    1e-175
gb|AAH13754.1|  Mapk3 protein [Mus musculus]          
             612    9e-174
dbj|BAE06412.1|  mitogen-activated protein kinase
[Ciona intestin   607    2e-172
dbj|BAE33167.1|  unnamed protein product [Mus
musculus]             600    3e-170
gb|AAN46679.1|  MAP kinase [Strongylocentrotus
purpuratus] >re...   598    1e-169
dbj|BAC02940.1|  mitogen-activated protein kinase
[Halocynthia ro   592    6e-168
gb|AAL48618.1|  RE08694p [Drosophila melanogaster]
>gb|EAA4631...   590    2e-167
emb|CAD97888.1|  hypothetical protein [Homo sapiens]  
             589    5e-167
emb|CAD60453.1|  extracellular signal-regulated
protein kinase...   589    5e-167
emb|CAD56894.1|  mitogen-activated protein kinase 1
[Meloidogyne    589    6e-167
ref|XP_536917.2|  PREDICTED: similar to mitogen
activated prot...   588    1e-166
gb|AAN40736.1|  mitogen-activated protein kinase
[Paralichthys ol   586    4e-166
emb|CAE73725.1|  Hypothetical protein CBG21247
[Caenorhabditis br   583    3e-165
emb|CAA87057.1|  Hypothetical protein F43C1.2a
[Caenorhabditis...   581    2e-164
gb|AAA18956.1|  Sur-1 MAP kinase                      
             581    2e-164
emb|CAB60996.1|  Hypothetical protein F43C1.2b
[Caenorhabditis...   581    2e-164
gb|AAK52329.1|  extracellular signal-related kinase 1b
[Homo sapi   580    4e-164
ref|XP_885794.1|  PREDICTED: similar to mitogen
activated prot...   553    4e-156
ref|XP_868146.1|  PREDICTED: similar to mitogen
activated prot...   548    2e-154
gb|AAK52330.1|  extracellular signal-related kinase 1c
[Homo sapi   546    4e-154
dbj|BAA22620.1|  ERK2 [Mus musculus]                  
             544    2e-153
ref|XP_510921.1|  PREDICTED: mitogen-activated protein
kinase 3 [   529    8e-149
gb|AAT02418.1|  MAP kinase [Schistosoma japonicum]    
             496    7e-139
emb|CAJ44437.1|  MAP kinase [Echinococcus
multilocularis]           491    1e-137
ref|XP_885774.1|  PREDICTED: similar to mitogen
activated prot...   444    3e-123
gb|EAA14714.3|  ENSANGP00000016639 [Anopheles gambiae
str. PES...   431    2e-119
gb|AAZ38881.1|  extracellular regulated kinase
[Littorina littore   431    2e-119
emb|CAD60723.1|  unnamed protein product [Podospora
anserina]       411    2e-113
gb|AAK25816.1|  MAP kinase [Neurospora crassa]
>ref|XP_959713....   411    2e-113
gb|EAL89122.1|  MAP kinase (FUS3/KSS1), putative
[Aspergillus ...   409    1e-112
gb|EAA74589.1|  hypothetical protein FG06385.1
[Gibberella zea...   409    1e-112
ref|XP_504312.1|  hypothetical protein [Yarrowia
lipolytica] >...   408    2e-112
gb|AAG01162.1|  mitogen-activated protein kinase
[Fusarium oxy...   408    2e-112
gb|AAS20192.1|  AMK1 [Alternaria brassicicola]
>gb|AAK52840.1|...   408    2e-112
dbj|BAE57584.1|  unnamed protein product [Aspergillus
oryzae]       408    2e-112
dbj|BAD42855.1|  mitogen-activated protein kinase
[Bipolaris oryz   407    3e-112
gb|AAD50496.1|  mitogen activated protein kinase
[Colletotrichum    407    3e-112
gb|AAF05913.1|  mitogen-activated protein kinase
[Cochliobolus he   407    3e-112
gb|AAM89501.1|  mitogen-activated protein kinase
[Leptosphaeria m   407    3e-112
dbj|BAB21569.1|  mitogen-activated protein kinase
[Glomerella cin   407    3e-112
gb|AAB72017.1|  mitogen-activated protein kinase
[Nectria haem...   407    3e-112
emb|CAC36428.1|  mitogen activated protein kinase
[Gibberella fuj   406    6e-112
ref|XP_364720.1|  hypothetical protein MG09565.4
[Magnaporthe gri   406    6e-112
gb|AAG23132.1|  MAP kinase [Botryotinia fuckeliana]   
             406    6e-112
gb|AAO63561.1|  mitogen activated protein kinase
[Verticillium fu   406    8e-112
dbj|BAE53432.1|  MAP kinase Pmk1 [Hypocrea lixii]     
             405    1e-111

ALIGNMENTS
>ref|XP_534770.2| PREDICTED: similar to
Mitogen-activated protein kinase 1 (Extracellular 
signal-regulated kinase 2) (ERK-2) (Mitogen-activated 
protein kinase 2) (MAP kinase 2) (MAPK 2) (p42-MAPK)
(ERT1) 
isoform 1 [Canis familiaris]
 ref|NP_620407.1| mitogen-activated protein kinase 1
[Homo sapiens]
 ref|NP_002736.3| mitogen-activated protein kinase 1
[Homo sapiens]
 gb|AAH17832.1| Mitogen-activated protein kinase 1
[Homo sapiens]
 sp|P28482|MK01_HUMAN Mitogen-activated protein kinase
1 (Extracellular signal-regulated 
kinase 2) (ERK-2) (Mitogen-activated protein kinase 2)

(MAP kinase 2) (MAPK 2) (p42-MAPK) (ERT1)
 gb|AAA58459.1| protein kinase 2
Length=360

 Score =  739 bits (1909),  Expect = 0.0
 Identities = 360/360 (100%), Positives = 360/360
(100%), Gaps = 0/360 (0%)

Query  1   
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
 60
           
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
Sbjct  1   
MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
 60

Query  61  
HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
 120
           
HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
Sbjct  61  
HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
 120

Query  121 
LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
 180
           
LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
Sbjct  121 
LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
 180

Query  181 
TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
 240
           
TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
Sbjct  181 
TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
 240

Query  241 
LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
 300
           
LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
Sbjct  241 
LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
 300

Query  301 
RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
 360
           
RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
Sbjct  301 
RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
 360


>gb|AAX36107.1| mitogen-activated protein kinase 1 [synthetic construct]
Length=361

 Score =  739 bits (1909),  Expect = 0.0
 Identities = 360/360 (100%), Positives = 360/360 (100%), Gaps = 0/360 (0%)

Query  1   MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  60
           MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
Sbjct  1   MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  60

Query  61  HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH  120
           HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
Sbjct  61  HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH  120

Query  121 LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH  180
           LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
Sbjct  121 LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH  180

Query  181 TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI  240
           TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
Sbjct  181 TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI  240

Query  241 LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK  300
           LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
Sbjct  241 LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK  300

Query  301 RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS  360
           RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
Sbjct  301 RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS  360


>pdb|1WZY|A Chain A, Crystal Structure Of Human Erk2 Complexed With A Pyrazolopyridazine Derivative
Length=368

 Score =  739 bits (1909),  Expect = 0.0
 Identities = 360/360 (100%), Positives = 360/360 (100%), Gaps = 0/360 (0%)

Query  1   MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  60
           MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
Sbjct  9   MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  68

Query  61  HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH  120







From lstein at cshl.edu  Wed Feb 22 13:23:09 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 22 Feb 2006 13:23:09 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC950C.7080007@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<1140625762.3142.107.camel@localhost.localdomain>
	<43FC950C.7080007@mrc-lmb.cam.ac.uk>
Message-ID: <200602221323.09872.lstein@cshl.edu>

Hi Dave,

If you want to adjust the way that the arrow.pm module draws the ticks, please 
make it a user-configurable option with the default being the current method. 
It should be easy enough to do this -- you just offset the position of the 
labels by 0.5 interval and inhibit drawing of the last one.
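
The idea can be sketched in plain Perl, independent of the actual arrow.pm code. The helper name label_positions and its arguments are hypothetical and chosen only for illustration; the sketch computes where labels would go either on the ticks (the current behaviour) or centred over the intervals, with the final label suppressed.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Graphics: return [x, label] pairs
# for a scale running from $start to $end with a tick every $step units.
# With $label_intervals false, each label sits on its tick.
# With it true, each label is shifted by half an interval so that it sits
# over the interval it names, and the final tick gets no label.
sub label_positions {
    my ($start, $end, $step, $label_intervals) = @_;
    my @labels;
    for (my $pos = $start; $pos <= $end; $pos += $step) {
        if ($label_intervals) {
            last if $pos + $step > $end;      # inhibit the last label
            push @labels, [ $pos + $step / 2, $pos ];
        }
        else {
            push @labels, [ $pos, $pos ];
        }
    }
    return @labels;
}

# A 44-residue protein drawn on a scale whose ticks run from 1 to 45.
for my $mode (0, 1) {
    my @labels = label_positions(1, 45, 1, $mode);
    printf "%s: %d labels, last label %d at x=%.1f\n",
        ($mode ? 'intervals' : 'ticks'), scalar @labels,
        $labels[-1][1], $labels[-1][0];
}
```

In interval mode the number 45 never appears for a 44-residue sequence, which is the behaviour Dave asked for.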

Lincoln

On Wednesday 22 February 2006 11:45, Dave Howorth wrote:
> Scott Cain wrote:
> > Hi Dave,
> >
> > I took the example code you posted a few days ago and added a few
> > motifs, one that goes from 1 to 10, and one that goes from 35 to 44 (the
> > last residue), which results in the attached graphic.
>
> Yes, that's the same sort of graphic I'm getting.
>
> > As Lincoln pointed out, the features are drawn from the beginning (1 and
> > 35), and through the last residue (up to but not touching 11 and 45).
> > So the space between 35 and 36 corresponds to residue 35.
>
> But there is no residue 45!  So there should be no number 45 anywhere on
> the picture.
>
> I think the problem is that the tick strip is displaying numbers for the
> ticks instead of the intervals. The intervals are what corresponds to
> users' models of physical reality and my graphics need to match that.
>
>  > That's the way it works.
>
> I guess I'll have to experiment and patch until it does what I want
> then, if nobody knows how to do it.
>
> Cheers, Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Wed Feb 22 13:40:27 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 22 Feb 2006 13:40:27 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC950C.7080007@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<1140625762.3142.107.camel@localhost.localdomain>
	<43FC950C.7080007@mrc-lmb.cam.ac.uk>
Message-ID: <200602221340.28573.lstein@cshl.edu>

I have just committed a version of the arrow.pm glyph that has a 
-label_intervals flag.

Lincoln



From cjfields at uiuc.edu  Wed Feb 22 14:45:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 22 Feb 2006 13:45:54 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060222172159.73370.qmail@web34404.mail.mud.yahoo.com>
Message-ID: <000c01c637e8$980c6f90$15327e82@pyrimidine>

Upgrade bioperl from CVS using nmake. 

Installation instructions for using nmake:

http://www.bioperl.org/wiki/INSTALL.WIN#Beyond_the_Core

You can download a tarball using anonymous CVS (link at bottom):

http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/

or use CVS directly:

http://www.bioperl.org/wiki/Using_CVS

Then make sure to grab the latest SearchIO::blast bugfix, which is not in CVS
yet:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

Replace the blast.pm in \site\lib\Bio\SearchIO in your Perl directory.

Does that fix it?
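
A quick way to check whether the report itself contains hits, independent of which BioPerl parser version is installed, is to scan the raw text. The following is a minimal sketch in plain Perl; the function name scan_blast_report and the script name in the usage comment are hypothetical, and it assumes a plain-text NCBI BLAST report in which each hit in the alignments section starts with a '>' line.

```perl
use strict;
use warnings;

# Hypothetical diagnostic, independent of BioPerl: read a plain-text BLAST
# report from a filehandle and return the program/version banner plus the
# number of alignment headers (lines beginning with '>').
sub scan_blast_report {
    my ($fh) = @_;
    my ($banner, $hits) = ('', 0);
    while (my $line = <$fh>) {
        if (!$banner && $line =~ /^(T?BLAST[NPX]?\s+\S+)/) {
            $banner = $1;             # e.g. "BLASTP 2.2.13"
        }
        $hits++ if $line =~ /^>/;     # one header per database hit
    }
    return ($banner, $hits);
}

# Usage: perl scan_report.pl results.blast
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my ($banner, $hits) = scan_blast_report($fh);
    close $fh;
    print "Report banner: $banner\n";
    print "Alignment headers found: $hits\n";
}
```

If the count is non-zero while next_hit() returns nothing, the report does contain hits and the parser is the likely culprit.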

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Raghunath Verabelli
> Sent: Wednesday, February 22, 2006 11:22 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Blast returns result, but does not return hits
> 
> Hi All:
> 
> I am new to Perl/BioPerl world.
> 
> I am debugging a program that used to work fine
> before.
> Blast works fine and returns results, but I am unable
> to get any hits from the results.
> 
> Here is the relevant code:
> 
> $blastObj = new Bio::SearchIO (-file=>$resultsFile,
> -format=>'blast');
>   while (my $result = $blastObj->next_result()) {
>      while (my $bioPerlHit = $result->next_hit()) {
>          .......
> 
> 
> The first while condition returns true, but the second
> while condition returns false. So looks like there is
> some result, but it is unable to identify the hits in
> the result. I printed the $result (pasted below).
> 
> Any ideas/comments to resolve this? Thanks in advance.
> 
> I am using Perl 5.8.7, BioPerl 1.2.3, Apache 1.3.34 on
> Windows XP platform.
> 
> Like I said before, this application was running fine
> on a different windows machine with similar
> environment,so looks like there is some change in the
> products/versions that is causing the problem.
> 
> thanks again,
> Raghu




From iamvela at yahoo.com  Wed Feb 22 16:06:54 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 13:06:54 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000c01c637e8$980c6f90$15327e82@pyrimidine>
Message-ID: <20060222210654.53827.qmail@web34404.mail.mud.yahoo.com>

Thanks Chris. I am getting the errors shown below with
nmake.

As suggested, I downloaded the nmake utility from
Microsoft website and the bioperl-live tarball.

After untarring, I replaced the blast.pm file (under
bioperl-live\Bio\SearchIO) with the blast.pm (86 KB
size) attached to the bug report 1934.

I then did the following to install packages using
nmake:

1) 'perl Makefile.PL' ran successfully without any errors.


2) 'c:\nmake' results in the following errors:

        pl2bat.bat blib\script\bp_unflatten_seq.pl
        C:\mod_perl\Perl\bin\perl.exe -MExtUtils::Command -e cp ./scripts_temp/bp_taxid4species.pl blib\script\bp_taxid4species.pl
        pl2bat.bat blib\script\bp_taxid4species.pl
        C:\mod_perl\Perl\bin\perl.exe -MExtUtils::Command -e cp ./scripts_temp/bp_seqret.pl blib\script\bp_seqret.pl
        pl2bat.bat blib\script\bp_seqret.pl
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch" "-Iblib\lib" doc/makedoc.PL bioscripts.pod
Can't open bioscripts.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch" "-Iblib\lib" doc/makedoc.PL biodatabases.pod
Can't open biodatabases.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch" "-Iblib\lib" doc/makedoc.PL biodesign.pod
Can't open biodesign.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch" "-Iblib\lib" doc/makedoc.PL bioperl.pod
Can't open bioperl.pod: No such file or directory.


3) 'c:\nmake test' fails with the following errors:

NMAKE : fatal error U1095: expanded command line
'C:\mod_perl\Perl\bin\perl.exe
"-MExtUtils::Command::MM" "-e" "test_harness(0,
'blib\lib', 'blib\arch')" t\AACh
ange.t t\AAReverseMutate.t t\abi.t t\ace.t t\AlignIO.t
t\AlignStats.t t\AlignUti
l.t t\alignUtilities.t t\Allele.t t\Alphabet.t
t\Annotation.t t\AnnotationAdapto
r.t t\asciitree.t t\Assembly.t t\Biblio.t
t\Biblio_biofetch.t t\Biblio_eutils.t
t\BiblioReferences.t t\BioDBGFF.t t\BioFetch_DB.t
t\BioGraphics.t t\BlastIndex.t
 t\BPbl2seq.t t\BPlite.t t\BPpsilite.t t\bsml_sax.t
t\Chain.t t\chaosxml.t t\cig
arstring.t t\ClusterIO.t t\Coalescent.t t\CodonTable.t
t\Compatible.t t\consed.t
 t\CoordinateGraph.t t\CoordinateMapper.t
t\Correlate.t t\ctf.t t\CytoMap.t t\DB
.t t\DBCUTG.t t\DBFasta.t t\DNAMutation.t t\Domcut.t
t\ECnumber.t t\ELM.t t\embl
.t t\EMBL_DB.t t\EMBOSS_Tools.t t\EncodedSeq.t
t\entrezgene.t t\ePCR.t t\ESEfind
er.t t\est2genome.t t\Exception.t t\Exonerate.t
t\exp.t t\fasta.t t\FeatureIO.t
t\flat.t t\FootPrinter.t t\game.t t\GbrowseGFF.t
t\gcg.t t\GDB.t t\Gel.t t\genba
nk.t t\GeneCoordinateMapper.t t\Geneid.t t\Genewise.t
t\Genomewise.t t\Genpred.t
 t\GFF.t t\GOR4.t t\GOterm.t t\GraphAdaptor.t
t\GuessSeqFormat.t t\hmmer.t t\HNN
.t t\HtSNP.t t\Index.t t\InstanceSite.t t\interpro.t
t\InterProParser.t t\IUPAC.
t t\kegg.t t\largefasta.t t\LargeLocatableSeq.t
t\largepseq.t t\LinkageMap.t t\L
iveSeq.t t\LocatableSeq.t t\Location.t
t\LocationFactory.t t\LocusLink.t t\lucy.
t t\Map.t t\MapIO.t t\masta.t t\Matrix.t t\Measure.t
t\MeSH.t t\metafasta.t t\Me
taSeq.t t\MicrosatelliteMarker.t t\MiniMIMentry.t
t\MitoProt.t t\Molphy.t t\Mult
iFile.t t\multiple_fasta.t t\Mutation.t t\Mutator.t
t\NetPhos.t t\Node.t t\OddCo
des.t t\OMIMentry.t t\OMIMentryAllelicVariant.t
t\OMIMparser.t t\Ontology.t t\On
tologyEngine.t t\OntologyStore.t t\PAML.t t\Perl.t
t\phd.t t\Phenotype.t t\Phyli
pDist.t t\PhysicalMap.t t\pICalculator.t t\Pictogram.t
t\pir.t t\pln.t t\PopGen.
t t\PopGenSims.t t\primaryqual.t t\PrimarySeq.t
t\primedseq.t t\Primer.t t\prime
r3.t t\Promoterwise.t t\ProtDist.t t\protgraph.t
t\ProtMatrix.t t\ProtPsm.t t\Ps
eudowise.t t\psm.t t\QRNA.t t\qual.t
t\RandDistFunctions.t t\RandomTreeFactory.t
 t\Range.t t\RangeI.t t\raw.t t\RefSeq.t t\Registry.t
t\Relationship.t t\Relatio
nshipType.t t\RemoteBlast.t t\RepeatMasker.t
t\RestrictionAnalysis.t t\Restricti
onEnzyme.t t\RestrictionIO.t t\RNAChange.t
t\Root-Utilities.t t\RootI.t t\RootIO
.t t\RootStorable.t t\Scansite.t t\scf.t
t\SearchDist.t t\SearchIO.t t\Seq.t t\s
eq_quality.t t\SeqAnalysisParser.t t\SeqBuilder.t
t\SeqDiff.t t\SeqFeatCollectio
n.t t\SeqFeature.t t\seqfeaturePrimer.t
t\SeqHound_DB.t t\SeqIO.t t\SeqPattern.t
 t\seqread_fail.t t\SeqStats.t t\SequenceFamily.t
t\sequencetrace.t t\SeqUtils.t
 t\SeqVersion.t t\seqwithquality.t t\SeqWords.t
t\Sigcleave.t t\Sim4.t t\Similar
ityPair.t t\SimpleAlign.t t\simpleGOparser.t
t\singlet.t t\sirna.t t\SiteMatrix.
t t\SNP.t t\Sopma.t t\Species.t t\Spidey.t
t\splicedseq.t t\StandAloneBlast.t t\
StructIO.t t\Structure.t t\swiss.t t\Symbol.t t\tab.t
t\TagHaplotype.t t\Taxonom
y.t t\TaxonTree.t t\Tempfile.t t\Term.t t\tigrxml.t
t\tinyseq.t t\Tools.t t\Tree
.t t\TreeBuild.t t\TreeIO.t t\trim.t t\tRNAscanSE.t
t\tutorial.t t\UCSCParsers.t
 t\Unflattener.t t\Unflattener2.t t\UniGene.t
t\Variation_IO.t t\WABA.t t\XEMBL_
DB.t t\ztr.t' too long
Stop.

C:\bioperl-live\bioperl-live>
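
The U1095 message above says only that the expanded command line is too long; it does not report any failing test. One possible workaround, sketched here as an assumption rather than a documented BioPerl procedure, is to bypass nmake for the test step and hand the t\*.t scripts to Test::Harness in small batches. The helper names batches and run_in_batches are hypothetical, and the sketch assumes it is run from the unpacked distribution directory after 'nmake' has populated blib\lib.

```perl
use strict;
use warnings;
use Config;
use Test::Harness qw(runtests);

# Split a list into consecutive batches of at most $size items.
sub batches {
    my ($size, @items) = @_;
    die "batch size must be at least 1\n" if $size < 1;
    my @out;
    push @out, [ splice(@items, 0, $size) ] while @items;
    return @out;
}

# Run every t/*.t script in small batches against the built blib tree.
# Returns the number of batches in which runtests() reported a failure.
sub run_in_batches {
    my ($size) = @_;
    # Make the freshly built modules visible to the test subprocesses.
    local $ENV{PERL5LIB} = join $Config{path_sep},
        'blib/lib', 'blib/arch', grep { defined $_ } $ENV{PERL5LIB};
    my $failed = 0;
    my @tests  = glob('t/*.t');
    for my $batch (batches($size, @tests)) {
        # runtests() dies when a batch contains failures; keep going.
        eval { runtests(@$batch); 1 } or $failed++;
    }
    return $failed;
}

# Only run when both the test directory and the build tree are present.
if (-d 't' && -d 'blib') {
    my $failed = run_in_batches(20);
    print "Batches with failures: $failed\n";
}
```

The batch size of 20 is arbitrary; any value that keeps each batch's command line short should do.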



4) 'c:\nmake install' results in the following errors:

        C:\mod_perl\Perl\bin\perl.exe -MExtUtils::Command -e cp ./scripts_temp/bp_taxid4species.pl blib\script\bp_taxid4species.pl
        pl2bat.bat blib\script\bp_taxid4species.pl
        C:\mod_perl\Perl\bin\perl.exe -MExtUtils::Command -e cp ./scripts_temp/bp_seqret.pl blib\script\bp_seqret.pl
        pl2bat.bat blib\script\bp_seqret.pl
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch" "-Iblib\lib" doc/makedoc.PL bioscripts.pod
Can't open bioscripts.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch" "-Iblib\lib" doc/makedoc.PL biodatabases.pod
Can't open biodatabases.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch" "-Iblib\lib" doc/makedoc.PL biodesign.pod
Can't open biodesign.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch" "-Iblib\lib" doc/makedoc.PL bioperl.pod
Can't open bioperl.pod: No such file or directory.
Appending installation info to C:\mod_perl\Perl\lib/perllocal.pod
NMAKE : fatal error U1095: expanded command line '@
C:\mod_perl\Perl\bin\perl.ex
e "-MExtUtils::Command::MM" -e perllocal_install 
"Module" "Bio"  "installed int
o" "C:\mod_perl\Perl\site\lib"  LINKTYPE "dynamic" 
VERSION "1.5"  EXE_FILES "./
scripts_temp/bp_biblio.pl
./scripts_temp/bp_genbank2gff.pl ./scripts_temp/bp_bul
k_load_gff.pl ./scripts_temp/bp_fast_load_gff.pl
./scripts_temp/bp_genbank2gff3.
pl ./scripts_temp/bp_generate_histogram.pl
./scripts_temp/bp_load_gff.pl ./scrip
ts_temp/bp_meta_gff.pl
./scripts_temp/bp_process_gadfly.pl
./scripts_temp/bp_pro
cess_sgd.pl ./scripts_temp/bp_process_wormbase.pl
./scripts_temp/bp_embl2picture
.pl ./scripts_temp/bp_glyphs1-demo.pl
./scripts_temp/bp_glyphs2-demo.pl ./script
s_temp/bp_biofetch_genbank_proxy.pl
./scripts_temp/bp_bioflat_index.pl ./scripts
_temp/bp_biogetseq.pl ./scripts_temp/bp_flanks.pl
./scripts_temp/bp_contig_draw.
pl ./scripts_temp/bp_feature_draw.pl
./scripts_temp/bp_frend.pl ./scripts_temp/b
p_search_overview.pl ./scripts_temp/bp_fetch.pl
./scripts_temp/bp_index.pl ./scr
ipts_temp/bp_seqret.pl
./scripts_temp/bp_composite_LD.pl
./scripts_temp/bp_heter
ogeneity_test.pl ./scripts_temp/bp_fastam9_to_table.pl
./scripts_temp/bp_filter_
search.pl ./scripts_temp/bp_hmmer_to_table.pl
./scripts_temp/bp_search2table.pl
./scripts_temp/bp_extract_feature_seq.pl
./scripts_temp/bp_make_mrna_protein.pl
./scripts_temp/bp_seqconvert.pl
./scripts_temp/bp_split_seq.pl ./scripts_temp/bp
_translate_seq.pl ./scripts_temp/bp_unflatten_seq.pl
./scripts_temp/bp_aacomp.pl
 ./scripts_temp/bp_chaos_plot.pl
./scripts_temp/bp_gccalc.pl ./scripts_temp/bp_o
ligo_count.pl
./scripts_temp/bp_classify_hits_kingdom.pl
./scripts_temp/bp_local
_taxonomydb_query.pl
./scripts_temp/bp_query_entrez_taxa.pl
./scripts_temp/bp_ta
xid4species.pl ./scripts_temp/bp_blast2tree.pl
./scripts_temp/bp_nexus2nh.pl ./s
cripts_temp/bp_tree2pag.pl
./scripts_temp/bp_mrtrans.pl ./scripts_temp/bp_nrdb.p
l ./scripts_temp/bp_sreformat.pl
./scripts_temp/bp_dbsplit.pl ./scripts_temp/bp_
mask_by_search.pl ./scripts_temp/bp_mutate.pl
./scripts_temp/bp_pairwise_kaks.pl
 ./scripts_temp/bp_remote_blast.pl
./scripts_temp/bp_search2alnblocks.pl ./scrip
ts_temp/bp_search2BSML.pl
./scripts_temp/bp_search2gff.pl ./scripts_temp/bp_sear
ch2tribe.pl ./scripts_temp/bp_seq_length.pl"  >>
C:\mod_perl\Perl\lib\perllocal.
pod' too long
Stop.

C:\bioperl-live\bioperl-live>

--- Chris Fields  wrote:

> Upgrade bioperl from CVS using nmake. 
> 
> Installation instructions for using nmake:
> 
> http://www.bioperl.org/wiki/INSTALL.WIN#Beyond_the_Core
> 
> You can download a tarball using anonymous CVS (link
> at bottom):
> 
> http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/
> 
> or use CVS directly:
> 
> http://www.bioperl.org/wiki/Using_CVS
> 
> Then make sure to grab the latest SearchIO::blast
> bugfix, which is not in CVS
> yet:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> 
> Replace the blast.pm in \site\lib\Bio\SearchIO in
> your Perl directory.
> 
> Does that fix it?
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Raghunath Verabelli
> > Sent: Wednesday, February 22, 2006 11:22 AM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Blast returns result, but does not return hits
> > 
> > Hi All:
> > 
> > I am new to the Perl/BioPerl world.
> > 
> > I am debugging a program that used to work fine before.
> > Blast works fine and returns results, but I am unable
> > to get any hits from the results.
> > 
> > Here is the relevant code:
> > 
> > $blastObj = new Bio::SearchIO(-file=>$resultsFile, -format=>'blast');
> >   while (my $result = $blastObj->next_result()) {
> >      while (my $bioPerlHit = $result->next_hit()) {
> >          .......
> > 
> > 
> > The first while condition returns true, but the second
> > while condition returns false. So it looks like there is
> > some result, but it is unable to identify the hits in
> > the result. I printed the $result (pasted below).
> > 
> > Any ideas/comments to resolve this? Thanks in advance.
> > 
> > I am using Perl 5.8.7, BioPerl 1.2.3, Apache 1.3.34 on
> > the Windows XP platform.
> > 
> > As I said before, this application was running fine
> > on a different Windows machine with a similar
> > environment, so it looks like some change in the
> > products/versions is causing the problem.
> > 
> > thanks again,
> > Raghu
> > 
> > 
> > 
> > 
> > Blast result (I can send the complete result if you need it):
> > 
> > 

> > BLASTP 2.2.13 [Nov-27-2005]
> > Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
> > Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
> > (1997), "Gapped BLAST and PSI-BLAST: a new generation of
> > protein database search programs", Nucleic Acids Res. 25:3389-3402.
> > 
> > RID: 1140573059-19990-140117828872.BLASTQ1
> > 
> > 
> > Database: All non-redundant GenBank CDS
> > translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> >            3,297,000 sequences; 1,129,354,045 total letters
> > Query=
> > Length=360
> > 
> > 
> > 
> >                                                                    Score     E
> > Sequences producing significant alignments:                       (Bits)  Value
> > 
> > ref|XP_534770.2|  PREDICTED: similar to Mitogen-activated prot...   739    0.0
> > gb|AAX36107.1|  mitogen-activated protein kinase 1 [synthetic con   739    0.0
> > pdb|1WZY|A  Chain A, Crystal Structure Of Human Erk2 Complexed...   739    0.0
> > pdb|1TVO|A  Chain A, The Structure Of Erk2 In Complex With A S...   739    0.0
> > ref|NP_786987.1|  mitogen-activated protein kinase 1 [Bos taur...   739    0.0
> > emb|CAA77752.1|  41kD protein kinase [Homo sapiens] >prf||1813...   738    0.0
> > gb|AAQ02541.1|  mitogen-activated protein kinase 1 [synthetic con   736    0.0
> > gb|AAH99905.1|  Mitogen-activated protein kinase 1 [Homo sapiens]   735    0.0
> > emb|CAI29602.1|  hypothetical protein [Pongo pygmaeus]              734    0.0
> > gb|AAH58258.1|  Mitogen activated protein kinase 1 [Mus muscul...   731    0.0
> > pdb|4ERK|   The Complex Structure Of The Map Kinase Erk2OLOMOU...   731    0.0
> > pdb|1GOL|   Coordinates Of Rat Map Kinase Erk2 With An Arginin...   730    0.0
> > ref|XP_860750.1|  PREDICTED: similar to Mitogen-activated prot...   729    0.0
> > gb|AAK56503.1|  extracellular signal-regulated kinase 2 [Gallu...   726    0.0
> > ref|XP_860716.1|  PREDICTED: similar to Mitogen-activated prot...   726    0.0
> > pdb|2ERK|   Phosphorylated Map Kinase Erk2                          726    0.0
> > pdb|1PME|   Structure Of Penta Mutant Human Erk2 Map Kinase Co...   725    0.0
> > ref|XP_860682.1|  PREDICTED: similar to Mitogen-activated prot...   720    0.0
> > ref|XP_860651.1|  PREDICTED: similar to Mitogen-activated prot...   720    0.0
> > emb|CAA77753.1|  40kDa protein kinase [Homo sapiens] >prf||181...   717    0.0
> > ref|NP_001017127.1|  mitogen-activated protein kinase 1 [Xenopus    715    0.0
> > dbj|BAE28679.1|  unnamed protein product [Mus musculus]             713    0.0
> > emb|CAA42482.1|  MAP kinase [Xenopus laevis] >gb|AAH60748.1| M...   711    0.0
> > sp|P26696|MK01_XENLA  Mitogen-activated protein kinase 1 (Myel...   711    0.0
> > gb|AAH76730.1|  Xp42 protein [Xenopus laevis]                       706    0.0
> > gb|AAH65868.1|  Mitogen-activated protein kinase 1 [Danio rerio]    696    0.0
> > dbj|BAD23843.1|  extracellular signal regulated protein kinase...   694    0.0
> > ref|NP_878308.2|  mitogen-activated protein kinase 1 [Danio re...   694    0.0
> > emb|CAG07778.1|  unnamed protein product [Tetraodon
> 
=== message truncated ===


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 


From cjfields at uiuc.edu  Wed Feb 22 16:55:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 22 Feb 2006 15:55:34 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060222210654.53827.qmail@web34404.mail.mud.yahoo.com>
Message-ID: <001701c637fa$b5110120$15327e82@pyrimidine>

You know, I assumed you were using ActivePerl b/c of the older version of
Bioperl (and since it's the most commonly used Perl for Windows build).  My
goof.  It looks like you're using Apache/mod_perl/perl, right?  The only
Perl/Apache/mod_perl combos for Windows I know of are listed here:

http://perl.apache.org/docs/2.0/os/win32/install.html

The only Perl for Windows we have actively supported is ActivePerl AFAIK,
but maybe we can walk through this.  Anything learned here can be added to
the installation instructions in case this comes up again.

To start, what mod_perl/Perl version are you using, and from what
distributor (IndigoStar, Apache, etc)?  Each distribution should have some
documentation for installing CPAN modules or prebuilt/pretested packages,
like ActiveState's PPM or IndigoStar's GPM.  I think Apache's Perl build is
from ActiveState's source code so should come with PPM.

Next: you obviously have installed Bioperl before (v1.2.3); did you use
'make' or 'nmake', or was it from a repository (like IndigoPerl's GPM)?
AFAIK, you would install it like you would any other perl module; there
should be no problem with 'make/nmake', though 'make/nmake test' will not
pass completely (it should pass most tests, though, otherwise something is
seriously wrong).

The other option, though not as nice, is setting the PERL5LIB variable to
include the bioperl-live directory; it works for me while I'm developing.  I
don't know how this may affect other mod_perl-related functions, though.
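
For illustration only, a minimal sketch of the per-script counterpart to
PERL5LIB: 'use lib' puts a directory at the front of @INC for one script.
The path below is an assumption based on the C:\bioperl-live\bioperl-live
prompt shown in this thread; adjust it to wherever the checkout actually
lives.

```perl
use strict;
use warnings;

# Assumed location of the unpacked bioperl-live checkout; change as needed.
use lib 'C:/bioperl-live/bioperl-live';

# Directories are searched in the order printed, so the checkout is tried
# before the site and core library directories.
print "$_\n" for @INC;
```

Unlike PERL5LIB, this affects only the script that contains the line, which
may be easier to reason about when other applications share the same Perl.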

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: Raghunath Verabelli [mailto:iamvela at yahoo.com]
> Sent: Wednesday, February 22, 2006 3:07 PM
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> 
> Thanks Chris. I am getting below mentioned errors with
> nmake.
> 
> As suggested, I downloaded the nmake utility from
> Microsoft website and the bioperl-live tarball.
> 
> After untaring, I replaced the blast.pm file (under
> bioperl-live\Bio\SearchIO) with the blast.pm (86 KB
> size) attached to the bug report 1934.
> 
> I then did the following to install packages using
> nmake:
> 
> 1) perl Makefile.pl was successful without any errors.
> 
> 
> 2) 'c:\nmake' results in following errors
> 
>         pl2bat.bat blib\script\bp_unflatten_seq.pl
>         C:\mod_perl\Perl\bin\perl.exe
> -MExtUtils::Command -e cp ./scripts_temp/b
> p_taxid4species.pl blib\script\bp_taxid4species.pl
>         pl2bat.bat blib\script\bp_taxid4species.pl
>         C:\mod_perl\Perl\bin\perl.exe
> -MExtUtils::Command -e cp ./scripts_temp/b
> p_seqret.pl blib\script\bp_seqret.pl
>         pl2bat.bat blib\script\bp_seqret.pl
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> bioscripts.pod
> Can't open bioscripts.pod: No such file or directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> biodatabases.pod
> Can't open biodatabases.pod: No such file or
> directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> biodesign.pod
> Can't open biodesign.pod: No such file or directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> bioperl.pod
> Can't open bioperl.pod: No such file or directory.
> 
> 
> 3) 'c:\nmake test' fails with following errors:
> 
> NMAKE : fatal error U1095: expanded command line
> 'C:\mod_perl\Perl\bin\perl.exe
> "-MExtUtils::Command::MM" "-e" "test_harness(0,
> 'blib\lib', 'blib\arch')" t\AACh
> ange.t t\AAReverseMutate.t t\abi.t t\ace.t t\AlignIO.t
> t\AlignStats.t t\AlignUti
> l.t t\alignUtilities.t t\Allele.t t\Alphabet.t
> t\Annotation.t t\AnnotationAdapto
> r.t t\asciitree.t t\Assembly.t t\Biblio.t
> t\Biblio_biofetch.t t\Biblio_eutils.t
> t\BiblioReferences.t t\BioDBGFF.t t\BioFetch_DB.t
> t\BioGraphics.t t\BlastIndex.t
>  t\BPbl2seq.t t\BPlite.t t\BPpsilite.t t\bsml_sax.t
> t\Chain.t t\chaosxml.t t\cig
> arstring.t t\ClusterIO.t t\Coalescent.t t\CodonTable.t
> t\Compatible.t t\consed.t
>  t\CoordinateGraph.t t\CoordinateMapper.t
> t\Correlate.t t\ctf.t t\CytoMap.t t\DB
> .t t\DBCUTG.t t\DBFasta.t t\DNAMutation.t t\Domcut.t
> t\ECnumber.t t\ELM.t t\embl
> .t t\EMBL_DB.t t\EMBOSS_Tools.t t\EncodedSeq.t
> t\entrezgene.t t\ePCR.t t\ESEfind
> er.t t\est2genome.t t\Exception.t t\Exonerate.t
> t\exp.t t\fasta.t t\FeatureIO.t
> t\flat.t t\FootPrinter.t t\game.t t\GbrowseGFF.t
> t\gcg.t t\GDB.t t\Gel.t t\genba
> nk.t t\GeneCoordinateMapper.t t\Geneid.t t\Genewise.t
> t\Genomewise.t t\Genpred.t
>  t\GFF.t t\GOR4.t t\GOterm.t t\GraphAdaptor.t
> t\GuessSeqFormat.t t\hmmer.t t\HNN
> .t t\HtSNP.t t\Index.t t\InstanceSite.t t\interpro.t
> t\InterProParser.t t\IUPAC.
> t t\kegg.t t\largefasta.t t\LargeLocatableSeq.t
> t\largepseq.t t\LinkageMap.t t\L
> iveSeq.t t\LocatableSeq.t t\Location.t
> t\LocationFactory.t t\LocusLink.t t\lucy.
> t t\Map.t t\MapIO.t t\masta.t t\Matrix.t t\Measure.t
> t\MeSH.t t\metafasta.t t\Me
> taSeq.t t\MicrosatelliteMarker.t t\MiniMIMentry.t
> t\MitoProt.t t\Molphy.t t\Mult
> iFile.t t\multiple_fasta.t t\Mutation.t t\Mutator.t
> t\NetPhos.t t\Node.t t\OddCo
> des.t t\OMIMentry.t t\OMIMentryAllelicVariant.t
> t\OMIMparser.t t\Ontology.t t\On
> tologyEngine.t t\OntologyStore.t t\PAML.t t\Perl.t
> t\phd.t t\Phenotype.t t\Phyli
> pDist.t t\PhysicalMap.t t\pICalculator.t t\Pictogram.t
> t\pir.t t\pln.t t\PopGen.
> t t\PopGenSims.t t\primaryqual.t t\PrimarySeq.t
> t\primedseq.t t\Primer.t t\prime
> r3.t t\Promoterwise.t t\ProtDist.t t\protgraph.t
> t\ProtMatrix.t t\ProtPsm.t t\Ps
> eudowise.t t\psm.t t\QRNA.t t\qual.t
> t\RandDistFunctions.t t\RandomTreeFactory.t
>  t\Range.t t\RangeI.t t\raw.t t\RefSeq.t t\Registry.t
> t\Relationship.t t\Relatio
> nshipType.t t\RemoteBlast.t t\RepeatMasker.t
> t\RestrictionAnalysis.t t\Restricti
> onEnzyme.t t\RestrictionIO.t t\RNAChange.t
> t\Root-Utilities.t t\RootI.t t\RootIO
> .t t\RootStorable.t t\Scansite.t t\scf.t
> t\SearchDist.t t\SearchIO.t t\Seq.t t\s
> eq_quality.t t\SeqAnalysisParser.t t\SeqBuilder.t
> t\SeqDiff.t t\SeqFeatCollectio
> n.t t\SeqFeature.t t\seqfeaturePrimer.t
> t\SeqHound_DB.t t\SeqIO.t t\SeqPattern.t
>  t\seqread_fail.t t\SeqStats.t t\SequenceFamily.t
> t\sequencetrace.t t\SeqUtils.t
>  t\SeqVersion.t t\seqwithquality.t t\SeqWords.t
> t\Sigcleave.t t\Sim4.t t\Similar
> ityPair.t t\SimpleAlign.t t\simpleGOparser.t
> t\singlet.t t\sirna.t t\SiteMatrix.
> t t\SNP.t t\Sopma.t t\Species.t t\Spidey.t
> t\splicedseq.t t\StandAloneBlast.t t\
> StructIO.t t\Structure.t t\swiss.t t\Symbol.t t\tab.t
> t\TagHaplotype.t t\Taxonom
> y.t t\TaxonTree.t t\Tempfile.t t\Term.t t\tigrxml.t
> t\tinyseq.t t\Tools.t t\Tree
> .t t\TreeBuild.t t\TreeIO.t t\trim.t t\tRNAscanSE.t
> t\tutorial.t t\UCSCParsers.t
>  t\Unflattener.t t\Unflattener2.t t\UniGene.t
> t\Variation_IO.t t\WABA.t t\XEMBL_
> DB.t t\ztr.t' too long
> Stop.
> 
> C:\bioperl-live\bioperl-live>
> 
> 
> 
> 4) 'c:\nmake install' results in following errors:
> 
>         C:\mod_perl\Perl\bin\perl.exe
> -MExtUtils::Command -e cp ./scripts_temp/b
> p_taxid4species.pl blib\script\bp_taxid4species.pl
>         pl2bat.bat blib\script\bp_taxid4species.pl
>         C:\mod_perl\Perl\bin\perl.exe
> -MExtUtils::Command -e cp ./scripts_temp/b
> p_seqret.pl blib\script\bp_seqret.pl
>         pl2bat.bat blib\script\bp_seqret.pl
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> bioscripts.pod
> Can't open bioscripts.pod: No such file or directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> biodatabases.pod
> Can't open biodatabases.pod: No such file or
> directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> biodesign.pod
> Can't open biodesign.pod: No such file or directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> bioperl.pod
> Can't open bioperl.pod: No such file or directory.
> Appending installation info to
> C:\mod_perl\Perl\lib/perllocal.pod
> NMAKE : fatal error U1095: expanded command line '@
> C:\mod_perl\Perl\bin\perl.ex
> e "-MExtUtils::Command::MM" -e perllocal_install
> "Module" "Bio"  "installed int
> o" "C:\mod_perl\Perl\site\lib"  LINKTYPE "dynamic"
> VERSION "1.5"  EXE_FILES "./
> scripts_temp/bp_biblio.pl
> ./scripts_temp/bp_genbank2gff.pl ./scripts_temp/bp_bul
> k_load_gff.pl ./scripts_temp/bp_fast_load_gff.pl
> ./scripts_temp/bp_genbank2gff3.
> pl ./scripts_temp/bp_generate_histogram.pl
> ./scripts_temp/bp_load_gff.pl ./scrip
> ts_temp/bp_meta_gff.pl
> ./scripts_temp/bp_process_gadfly.pl
> ./scripts_temp/bp_pro
> cess_sgd.pl ./scripts_temp/bp_process_wormbase.pl
> ./scripts_temp/bp_embl2picture
> .pl ./scripts_temp/bp_glyphs1-demo.pl
> ./scripts_temp/bp_glyphs2-demo.pl ./script
> s_temp/bp_biofetch_genbank_proxy.pl
> ./scripts_temp/bp_bioflat_index.pl ./scripts
> _temp/bp_biogetseq.pl ./scripts_temp/bp_flanks.pl
> ./scripts_temp/bp_contig_draw.
> pl ./scripts_temp/bp_feature_draw.pl
> ./scripts_temp/bp_frend.pl ./scripts_temp/b
> p_search_overview.pl ./scripts_temp/bp_fetch.pl
> ./scripts_temp/bp_index.pl ./scr
> ipts_temp/bp_seqret.pl
> ./scripts_temp/bp_composite_LD.pl
> ./scripts_temp/bp_heter
> ogeneity_test.pl ./scripts_temp/bp_fastam9_to_table.pl
> ./scripts_temp/bp_filter_
> search.pl ./scripts_temp/bp_hmmer_to_table.pl
> ./scripts_temp/bp_search2table.pl
> ./scripts_temp/bp_extract_feature_seq.pl
> ./scripts_temp/bp_make_mrna_protein.pl
> ./scripts_temp/bp_seqconvert.pl
> ./scripts_temp/bp_split_seq.pl ./scripts_temp/bp
> _translate_seq.pl ./scripts_temp/bp_unflatten_seq.pl
> ./scripts_temp/bp_aacomp.pl
>  ./scripts_temp/bp_chaos_plot.pl
> ./scripts_temp/bp_gccalc.pl ./scripts_temp/bp_o
> ligo_count.pl
> ./scripts_temp/bp_classify_hits_kingdom.pl
> ./scripts_temp/bp_local
> _taxonomydb_query.pl
> ./scripts_temp/bp_query_entrez_taxa.pl
> ./scripts_temp/bp_ta
> xid4species.pl ./scripts_temp/bp_blast2tree.pl
> ./scripts_temp/bp_nexus2nh.pl ./s
> cripts_temp/bp_tree2pag.pl
> ./scripts_temp/bp_mrtrans.pl ./scripts_temp/bp_nrdb.p
> l ./scripts_temp/bp_sreformat.pl
> ./scripts_temp/bp_dbsplit.pl ./scripts_temp/bp_
> mask_by_search.pl ./scripts_temp/bp_mutate.pl
> ./scripts_temp/bp_pairwise_kaks.pl
>  ./scripts_temp/bp_remote_blast.pl
> ./scripts_temp/bp_search2alnblocks.pl ./scrip
> ts_temp/bp_search2BSML.pl
> ./scripts_temp/bp_search2gff.pl ./scripts_temp/bp_sear
> ch2tribe.pl ./scripts_temp/bp_seq_length.pl"  >>
> C:\mod_perl\Perl\lib\perllocal.
> pod' too long
> Stop.
> 
> C:\bioperl-live\bioperl-live>
> 




From iamvela at yahoo.com  Wed Feb 22 17:32:08 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 14:32:08 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <001701c637fa$b5110120$15327e82@pyrimidine>
Message-ID: <20060222223208.45715.qmail@web34409.mail.mud.yahoo.com>

Chris,

Please see my response below.

--- Chris Fields  wrote:

> You know, I assumed you were using ActivePerl b/c of the older
> version of Bioperl (and since it's the most commonly used Perl
> for Windows build).  My goof.  It looks like you're using
> Apache/mod_perl/perl, right?  The only Perl/Apache/mod_perl
> combos for Windows I know of are listed here:


I am using ActivePerl 5.8.7 downloaded from
activeperl.com. I just happened to install it under
c:\mod_perl\Perl directory (application has hardcoded
dependencies for this directory). I am not using
apache/mod_perl/perl.

Please see below the version string returned by the perl
executable.

 
C:\bioperl-live\bioperl-live>perl -version

This is perl, v5.8.7 built for
MSWin32-x86-multi-thread
(with 14 registered patches, see perl -V for more
detail)

Copyright 1987-2005, Larry Wall

Binary build 815 [211909] provided by ActiveState
http://www.ActiveState.com
ActiveState is a division of Sophos.
Built Nov  2 2005 08:44:52


> 
> http://perl.apache.org/docs/2.0/os/win32/install.html
> 
> The only Perl for Windows we have actively supported
> is ActivePerl AFAIK,
> but maybe we can walk through this.  Anything
> learned here can be added to
> the installation instructions in case this comes up
> again.
> 
> To start, what mod_perl/Perl version are you using,
> and from what
> distributor (IndigoStar, Apache, etc)?  Each
> distribution should have some
> documentation for installing CPAN modules or
> prebuilt/pretested packages,
> like ActiveState's PPM or IndigoStar's GPM.  I think
> Apache's Perl build is
> from ActiveState's source code so should come with
> PPM.
> 



I used 'ppm' to install packages (DBI, Oracle-DBD,
bioperl etc) before, so this is the first time I tried
to install it using 'nmake' utility.

After downloading the latest bioperl tar ball and
replacing the blast.pm file, can I just do ppm install
bioperl instead of doing nmake?


> Next: you obviously have installed Bioperl before
> (v1.2.3); did you use
> 'make' or 'nmake', or was it from a repository (like
> IndigoPerl's GPM)?
> AFAIK, you would install it like you would any other
> perl module; there
> should be no problem with 'make/nmake', though
> 'make/nmake test' will not
> pass completely (it should pass most tests, though,
> otherwise something is
> seriously wrong).
> 
> The other option, though not as nice, is setting the
> PERL5LIB variable to
> include the bioperl-live directory; it works for me
> while I'm developing. 

I tried setting PERL5LIB, but it did not make any
difference. I am still getting the same errors.


I wanted to do a clean install, so I tried 'nmake clean',
but it looks like there is no 'rm' utility installed on
my machine.

thanks for all your help,
Raghu

> I don't know how this may affect other
> mod_perl-related functions, though.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> 


From cjfields at uiuc.edu  Wed Feb 22 19:02:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 22 Feb 2006 18:02:38 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060222223208.45715.qmail@web34409.mail.mud.yahoo.com>
Message-ID: <002101c6380c$75910880$15327e82@pyrimidine>

> 
> I am using ActivePerl 5.8.7 downloaded from
> activeperl.com. I just happened to install it under
> c:\mod_perl\Perl directory (application has hardcoded
> dependencies for this directory). I am not using
> apache/mod_perl/perl.
> 
> Please see below the version string returned by the perl
> executable.
> 
> 
> C:\bioperl-live\bioperl-live>perl -version
> 
> This is perl, v5.8.7 built for
> MSWin32-x86-multi-thread
> (with 14 registered patches, see perl -V for more
> detail)
> 
> Copyright 1987-2005, Larry Wall
> 
> Binary build 815 [211909] provided by ActiveState
> http://www.ActiveState.com
> ActiveState is a division of Sophos.
> Built Nov  2 2005 08:44:52
 
When you type 'perl -V', what do you see?  (Make sure it is a capital 'V', not
lower case.)
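
As a complement to 'perl -V', here is a minimal sketch of a script that
reports which copy of a module the running perl actually picks up.  The
helper name module_report is made up for this example; the two module names
are the ones relevant to this thread, and the script only reports that they
could not be loaded if BioPerl is not installed.

```perl
use strict;
use warnings;

# Report whether a module loads and, if so, its version and file location.
sub module_report {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::SearchIO -> Bio/SearchIO.pm
    return "$module: could not be loaded" unless eval { require $file; 1 };
    my $version = eval { $module->VERSION };
    $version = 'unknown' unless defined $version;
    return "$module: version $version loaded from $INC{$file}";
}

# Show which perl binary is running, then where each module comes from.
printf "perl %vd at %s\n", $^V, $^X;
print module_report($_), "\n" for qw(Bio::SearchIO Bio::SearchIO::blast);
```

If the path printed for Bio::SearchIO::blast is not the directory where the
patched blast.pm was copied, the old parser is still the one being used.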

> http://perl.apache.org/docs/2.0/os/win32/install.html
> >
> > The only Perl for Windows we have actively supported
> > is ActivePerl AFAIK,
> > but maybe we can walk through this.  Anything
> > learned here can be added to
> > the installation instructions in case this comes up
> > again.
> >
> I used 'ppm' to install packages (DBI, Oracle-DBD,
> bioperl etc) before, so this is the first time I tried
> to install it using 'nmake' utility.
>
> After downloading the latest bioperl tar ball and
> replacing the blast.pm file, can I just do ppm install
> bioperl instead of doing nmake?

Okay, so I know you're using PPM now.  No, you can't do that.  I'm adding a
section to this page:

http://bioperl.open-bio.org/wiki/Making_a_BioPerl_release

about building your own PPM; it will explain everything.  It isn't up yet
but should be up tonight or tomorrow.  BTW, you'll still need a working
nmake for this.  Again, make sure nmake is in your PATH environment
variable, or at least in the directory where you plan to run 'nmake' and
'nmake install'.  Although nmake is buggy, I haven't had a problem with it
yet.
 
> > Next: you obviously have installed Bioperl before
> > (v1.2.3); did you use
> > 'make' or 'nmake', or was it from a repository (like
> > IndigoPerl's GPM)?
> > AFAIK, you would install it like you would any other
> > perl module; there
> > should be no problem with 'make/nmake', though
> > 'make/nmake test' will not
> > pass completely (it should pass most tests, though,
> > otherwise something is
> > seriously wrong).
> >
> > The other option, though not as nice, is setting the
> > PERL5LIB variable to
> > include the bioperl-live directory; it works for me
> > while I'm developing.
> 
> I tried setting PERL5LIB, but it did not make any
> difference. I am still getting the same errors.
 
Do you mean the errors from nmake or errors from your scripts?  If PERL5LIB
is set properly, perl searches those directories for modules before it
checks the rest of @INC (i.e. you will not need to make and install them
using nmake).

I don't recommend this as a habit: dropping the entire Bioperl distribution
into a folder and pointing PERL5LIB at it is not the best way to work, but
some are forced to do it this way, so it's there if you need it.  A
direct installation is recommended if possible.
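
The search order described above can be checked with a few lines of plain
Perl.  This is only a sketch: first_inc_dir_for is a made-up helper name, not
part of Perl or BioPerl, and it only looks for the module file on disk
without loading it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not a BioPerl function): return the first directory
# in @INC that contains the file for the given module name, or nothing.
# PERL5LIB entries sit at the front of @INC, so they are tried first.
sub first_inc_dir_for {
    my ($module) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks that may live in @INC
        my $file = File::Spec->catfile($dir, @parts);
        return $dir if -f $file;
    }
    return;
}

my $module = 'Bio::Root::Version';
my $dir    = first_inc_dir_for($module);
print defined $dir
    ? "$module would be loaded from $dir\n"
    : "$module was not found in any \@INC directory\n";
```

If the directory printed is not the one named in PERL5LIB, either the
variable is not reaching perl or that directory does not contain a Bio
folder.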

The PERL5LIB I use below only contains modules I'm working on or
modifications of current modules (like SearchIO::blast, RemoteBlast, etc).
Bioperl from CVS is installed via PPM (custom-built PPM, BTW, using the
instructions I mentioned).  

The following 'perl -V' output shows what my PERL5LIB is set to.  Note that
it also shows the resulting @INC:

C:\Perl\src\bioperl\bioperl-live>perl -V
Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=MSWin32, osvers=5.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef useithreads=define 



  Compiled at Nov  2 2005 08:44:52
  %ENV:
    PERL5LIB="C:/Perl/src/bioperl/bioperl-live;C:/Perl/src/bioperl/bioperl-db"
  @INC:
    C:/Perl/src/bioperl/bioperl-live
    C:/Perl/src/bioperl/bioperl-db
    C:/Perl/lib
    C:/Perl/site/lib
    .



From iamvela at yahoo.com  Wed Feb 22 21:25:02 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 18:25:02 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <002101c6380c$75910880$15327e82@pyrimidine>
Message-ID: <20060223022502.31234.qmail@web34406.mail.mud.yahoo.com>


Thanks very much, Chris, for your time.
Please see below the output that you requested. The only
difference I saw between your output and mine is the @INC
value: I have only 2 directories under c:\mod_perl\perl,
where I installed ActivePerl, and I see two additional
directories in your @INC path.

>  
> When you type 'perl -V' what do you see (make sure
> it is a capital 'V', not
> lower case).

C:\Documents and Settings\Administrator>perl  -V
Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=MSWin32, osvers=5.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -Gf -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DNO_HASH_SEED -DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='12.00.8804', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf  -libpath:"C:\mod_perl\Perl\lib\CORE"  -machine:x86'
    libpth=\lib
    libs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib msvcrt.lib
    perllibs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
    gnulibc_version='undef'
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf  -libpath:"C:\mod_perl\Perl\lib\CORE"  -machine:x86'


Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES
                        USE_SITECUSTOMIZE PERL_IMPLICIT_CONTEXT
                        PERL_IMPLICIT_SYS
  Locally applied patches:
        ActivePerl Build 815 [211909]
        Iin_load_module moved for compatibility with build 806
        PerlEx support in CGI::Carp
        Less verbose ExtUtils::Install and Pod::Find
        instmodsh upgraded from ExtUtils-MakeMaker-6.25
        Patch for CAN-2005-0448 from Debian with modifications
        Upgrade to Time-HiRes-1.76
        25774 Keys of %INC always use forward slashes
        25747 Accidental interpolation of $@ in Pod::Html
        25362 File::Path::mkpath resets errno
        25181 Incorrect (X)HTML generated by Pod::Html
        24999 Avoid redefinition warning for MinGW
        24699 ICMP_UNREACHABLE handling in Net::Ping
        21540 Fix backward-compatibility issues in if.pm
  Built under MSWin32
  Compiled at Nov  2 2005 08:44:52
  %ENV:
    PERL5LIB="c:\bioperl-live"
  @INC:
    c:\bioperl-live
    C:/mod_perl/Perl/lib
    C:/mod_perl/Perl/site/lib
    .





From michael.watson at bbsrc.ac.uk  Thu Feb 23 05:17:39 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 23 Feb 2006 10:17:39 -0000
Subject: [Bioperl-l] CONTIG sequence files from the NCBI
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503008306@iahce2ksrv1.iah.bbsrc.ac.uk>

What I mean is, you have accession1, which is a contig file referring to
n other sequence files.  Accession1 has a version number.  Is that
version number increased when one of the sequences that constitute it is
updated? 

-----Original Message-----
From: Brian Osborne [mailto:osborne1 at optonline.net] 
Sent: 18 February 2006 04:56
To: michael watson (IAH-C); bioperl-l
Subject: Re: [Bioperl-l] CONTIG sequence files from the NCBI

Michael,

Yes, BioPerl has done this for you. Essentially what it does is take all
the ids in the CONTIG section and query for each individually, then use
the sequences and the location data to create the single large sequence.
This sequence is appended to the annotation and feature section of the
initial Genbank entry. If you want to study this yourself take a look at
Bio::DB::NCBIHelper::postprocess_data.
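
As a rough illustration of the joining step described above, the sketch
below assembles one sequence from a simplified CONTIG join() string. It is
not the code in Bio::DB::NCBIHelper: the accessions and sequences are
invented, the lookup is an in-memory hash standing in for one database query
per id, and only plain ranges, complement() and fixed-length gap() elements
are handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse-complement a DNA string.
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Build one sequence from a simplified CONTIG join() string.
# $fetch is a code ref that returns the full sequence for an accession.
sub assemble_contig {
    my ($contig, $fetch) = @_;
    my ($inner) = $contig =~ /^\s*join\((.*)\)\s*$/s
        or die "not a join(): $contig";
    my $seq = '';
    for my $part (split /\s*,\s*/, $inner) {
        if ($part =~ /^gap\((\d+)\)$/) {
            $seq .= 'N' x $1;    # gap of known length, filled with N
        }
        elsif ($part =~ /^(complement\()?([\w.]+):(\d+)\.\.(\d+)\)?$/) {
            my ($is_rc, $acc, $start, $end) = ($1, $2, $3, $4);
            # CONTIG coordinates are 1-based and inclusive.
            my $piece = substr($fetch->($acc), $start - 1, $end - $start + 1);
            $seq .= $is_rc ? revcom($piece) : $piece;
        }
        else {
            die "unsupported CONTIG element: $part";
        }
    }
    return $seq;
}

# Invented example data (hypothetical accessions).
my %db = (
    'FAKE01.1' => 'AAAACCCC',
    'FAKE02.1' => 'GGGGTTAC',
);
my $contig    = 'join(FAKE01.1:1..4,gap(3),complement(FAKE02.1:5..8))';
my $assembled = assemble_contig($contig, sub { $db{ $_[0] } });
print "$assembled\n";
```

Because the pieces are concatenated in the order and orientation the CONTIG
line gives, feature coordinates that refer to the full-length sequence stay
valid after assembly.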

OK, to answer your first question with my assumption: what NCBI is doing
is simply providing a shorthand rather than an entire large sequence, so
no feature coordinates change whether the entry uses the shorthand (CONTIG)
or the longhand (ORIGIN). Second, my explanation tells you that all the
sequences are the very latest versions of each sequence; that's how
eutils works by default.
However, I don't think I've answered your question because I'm not sure
I understand what you mean by "when I ask bioperl if these sequences
have been updated, I will be told no". All Bioperl does is read the file
provided by GenBank and use its stated version, nothing fancy.

Brian O.


On 2/16/06 5:31 AM, "michael watson (IAH-C)" wrote:

> Hi
> 
> I have two questions really.  I fetched bacterial genome sequences 
> from the NCBI using Bio::DB::GenBank.
> 
> Some of these sequence entries are CONTIG sequences, ie they just 
> point to other sequences that need to be joined together to form the 
> entire genome.
> 
> Looking at my downloads, it looks as if bioperl has done all the 
> necessary joining for me - or maybe it was the NCBI that did the 
> joining?
> 
> OK, so firstly, did bioperl do the joining, and if so, are all the 
> co-ordinates of the features updated to reflect their new location on 
> the new, joined sequence?
> 
> And secondly, sequence versions... I'm thinking that possibly the 
> sequence version of the CONTIG may be 1 (as it hasn't changed) yet the 
> versions of the sequences it refers to might have changed, so when I 
> ask bioperl if these sequences have been updated, I will be told no 
> because the CONTIG sequence version is 1, but I should be told yes 
> because the underlying sequences have...?
> 
> Make sense?
> 
> Thanks
> Mick
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l





From neetisomaiya at gmail.com  Thu Feb 23 05:26:23 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 23 Feb 2006 15:56:23 +0530
Subject: [Bioperl-l] urgent help required - syntax for using paramaters
	different from default in standalone blast
In-Reply-To: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>
References: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>
Message-ID: <764978cf0602230226vb907821x5407599bf9accf44@mail.gmail.com>

Hi,

I am running standalone blast and I wanna use a particular e value, gap open
and extension cost and matrix. Is the following the correct syntax for the
same :

    my $Seq_in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
    my $query = $Seq_in->next_seq();
    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        'program'   => 'blastn',
        'database'  => 'human.rna.fna',
        _READMETHOD => "Blast"
    );
    $factory->e(0.0001);
    $factory->G(-11);
    $factory->E(-1);
    $factory->M('BLOSUM80');

    my $blast_report = $factory->blastall($query);
    my $result = $blast_report->next_result;

--
-Neeti
Even my blood says, B positive



From neetisomaiya at gmail.com  Thu Feb 23 05:45:19 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 23 Feb 2006 16:15:19 +0530
Subject: [Bioperl-l] using parameters other than default in standalone blast
Message-ID: <764978cf0602230245m45747fexbb42074a98515177@mail.gmail.com>

Hi,

I am running standalone blast and I wanna use a particular e value, gap open
and extension cost and matrix. Is the following the correct syntax for the
same :

    my $Seq_in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
    my $query = $Seq_in->next_seq();
    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        'program'   => 'blastn',
        'database'  => 'human.rna.fna',
        _READMETHOD => "Blast"
    );
    $factory->e(0.0001);
    $factory->G(-11);
    $factory->E(-1);
    $factory->M('BLOSUM80');

    my $blast_report = $factory->blastall($query);
    my $result = $blast_report->next_result;

--
-Neeti
Even my blood says, B positive



From neetisomaiya at gmail.com  Thu Feb 23 05:14:46 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 23 Feb 2006 15:44:46 +0530
Subject: [Bioperl-l] urgent help required - syntax for using paramaters
	different from default in standalone blast
Message-ID: <764978cf0602230214r4b2a5efcl69ac207789379416@mail.gmail.com>

Hi,

I am running standalone blast and I wanna use a particular e value, gap open
and extension cost and matrix. Is the following the correct syntax for the
same :

    my $Seq_in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
    my $query = $Seq_in->next_seq();
    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        'program'   => 'blastn',
        'database'  => 'human.rna.fna',
        _READMETHOD => "Blast"
    );
    $factory->e(0.0001);
    $factory->G(-11);
    $factory->E(-1);
    $factory->M('BLOSUM80');

    my $blast_report = $factory->blastall($query);
    my $result = $blast_report->next_result;

--
-Neeti
Even my blood says, B positive




From neetisomaiya at gmail.com  Thu Feb 23 05:13:10 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 23 Feb 2006 15:43:10 +0530
Subject: [Bioperl-l] urgent help required - syntax for using paramaters
	different from default in standalone blast
Message-ID: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>

Hi,

I am running standalone blast and I wanna use a particular e value, gap open
and extension cost and matrix. Is the following the correct syntax for the
same :

    my $Seq_in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
    my $query = $Seq_in->next_seq();
    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        'program'   => 'blastn',
        'database'  => 'human.rna.fna',
        _READMETHOD => "Blast"
    );
    $factory->e(0.0001);
    $factory->G(-11);
    $factory->E(-1);
    $factory->M('BLOSUM80');

    my $blast_report = $factory->blastall($query);
    my $result = $blast_report->next_result;

--
-Neeti
Even my blood says, B positive




From cjfields at uiuc.edu  Thu Feb 23 09:39:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 08:39:40 -0600
Subject: [Bioperl-l] urgent help required - syntax for using
	paramatersdifferent from default in standalone blast
In-Reply-To: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>
Message-ID: <000301c63886$fa95eb20$15327e82@pyrimidine>

Have you tried this to see if it works?  The blast report itself should tell
you if everything is set correctly.  Use 'perldoc
Bio::Tools::Run::StandAloneBlast', which explains everything.  I don't
know if the example script works but the test script StandAloneBlast.t (in
/t) should; that will give you plenty of examples for setting parameters.
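
For reference, the single-letter setters in the question line up with
switches of the legacy blastall program. The sketch below is a hand-written
illustration of that mapping, not code from StandAloneBlast; the table and
the helper name blastall_args are assumptions made for the example, and the
gap costs shown are example values. As far as I know, blastall expects gap
costs as positive integers (-1 asks for the default), and -M selects a
protein scoring matrix, so it matters for blastp-style searches rather than
blastn.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustration only: a stand-in table showing how parameter names line up
# with legacy blastall command-line switches.
my %switch_for = (
    program  => '-p',
    database => '-d',
    e        => '-e',    # expectation value cutoff
    G        => '-G',    # cost to open a gap
    E        => '-E',    # cost to extend a gap
    M        => '-M',    # scoring matrix (protein searches)
);

# Hypothetical helper: turn name/value pairs into a blastall argument list,
# sorted by parameter name so the result is reproducible.
sub blastall_args {
    my (%param) = @_;
    my @args;
    for my $name (sort keys %param) {
        my $switch = $switch_for{$name}
            or die "no switch known for parameter '$name'";
        push @args, $switch, $param{$name};
    }
    return @args;
}

my @args = blastall_args(
    program  => 'blastn',
    database => 'human.rna.fna',
    e        => '0.0001',
    G        => 5,
    E        => 2,
);
print join(' ', 'blastall', @args), "\n";
```

Comparing a line like this with the parameter summary at the end of the
BLAST report is one way to confirm that the settings were picked up.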

And please, don't spam the bioperl-l list with repeated emails (four at last
count over 2 1/2 hours).
 
Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of neeti somaiya
> Sent: Thursday, February 23, 2006 4:13 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] urgent help required - syntax for using
> paramatersdifferent from default in standalone blast
> 
> Hi,
> 
> I am running standalone blast and I wanna use a particular e value, gap
> open
> and extension cost and matrix. Is the following the correct syntax for the
> same :
> 
>     my $Seq_in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
>     my $query = $Seq_in->next_seq();
>     my $factory = Bio::Tools::Run::StandAloneBlast->new(
>         'program'   => 'blastn',
>         'database'  => 'human.rna.fna',
>         _READMETHOD => "Blast"
>     );
>     $factory->e(0.0001);
>     $factory->G(-11);
>     $factory->E(-1);
>     $factory->M('BLOSUM80');
> 
>     my $blast_report = $factory->blastall($query);
>     my $result = $blast_report->next_result;
> 
> --
> -Neeti
> Even my blood says, B positive
> 
> 



From cjfields at uiuc.edu  Thu Feb 23 10:23:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 09:23:53 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060223022502.31234.qmail@web34406.mail.mud.yahoo.com>
Message-ID: <000a01c6388d$281ed010$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Raghunath Verabelli
> Sent: Wednesday, February 22, 2006 8:25 PM
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> 
> 
> Thanks very much Chris for your time.
> Please see below output that you requested (the only
> difference i saw between your output and mine is @INC
> value. I have only 2 directories c:\mod_perl\perl
> where i installed activeperl. I see two additional
> directories in your @INC path).
> 
> >
> > When you type 'perl -V' what do you see (make sure
> > it is a capital 'V', not
> > lower case).
> 
> C:\Documents and Settings\Administrator>perl  -V
> Summary of my perl5 (revision 5 version 8 subversion
> 7) configuration:
>   Platform:
>     osname=MSWin32, osvers=5.0,
> archname=MSWin32-x86-multi-thread

[....]

> if.pm
>   Built under MSWin32
>   Compiled at Nov  2 2005 08:44:52
>   %ENV:
>     PERL5LIB="c:\bioperl-live"
>   @INC:
>     c:\bioperl-live
>     C:/mod_perl/Perl/lib
>     C:/mod_perl/Perl/site/lib
>     .

Personally I wouldn't place the bioperl-live folder in the root
directory; this shouldn't make a difference, but you can try moving it to
the perl directory in a separate folder to see if that helps.  Can't see why
it would make a difference, but it is Windows... Main reason I'm switching
over to Mac OS X!

Make sure that the Bio directory is in the bioperl-live directory,
regardless (i.e. if PERL5LIB is set to
C:\mod_perl\Perl\bioperl\bioperl-live, then there should be a directory like
C:\mod_perl\Perl\bioperl\bioperl-live\Bio).  Otherwise it won't work.

What do you get with this?

perl -MBio::Root::Version -e "print $Bio::Root::Version::VERSION"

If everything is working (PERL5LIB, etc) then it should be 1.5 for CVS
bioperl; otherwise it will either find the old version (1.2.3) or fail
completely.
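
A script version of the same check can help when shell quoting gets in the
way (the double-quoted one-liner above suits the Windows command prompt; a
Unix shell would need single quotes). This is a sketch: loaded_from is a
hypothetical helper name, and it uses only core Perl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: try to load a module, then report the version it
# declares and the file it was loaded from. Returns an empty list on failure.
sub loaded_from {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    my $version = $module->VERSION;
    return (defined $version ? $version : 'unknown', $INC{$file});
}

my $module = 'Bio::Root::Version';
if (my ($version, $path) = loaded_from($module)) {
    print "$module $version loaded from $path\n";
}
else {
    print "$module could not be loaded; check PERL5LIB and \@INC\n";
}
```

The path in the output shows which copy of Bioperl perl actually picked up,
which is useful when an old installation and a CVS checkout coexist.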

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From iamvela at yahoo.com  Thu Feb 23 11:23:56 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Thu, 23 Feb 2006 08:23:56 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000a01c6388d$281ed010$15327e82@pyrimidine>
Message-ID: <20060223162356.97964.qmail@web34405.mail.mud.yahoo.com>

Thanks Chris for all your help.

The patch for blast.pm worked. I was able to parse the
hits from the raw file. I uninstalled previous
versions of bioperl using ppm and then I installed
bioperl 1.4.x using nmake, and applied your fix. I am
getting hits the way I wanted.

However, I noticed that the p-value for each hit
doesn't seem to be parsed
correctly. It sets it to 0 for all hits. Not sure if
this is a known issue. Any suggestions/comments,
please let me know.

Thanks,
Raghu

--- Chris Fields  wrote:

> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Raghunath
> Verabelli
> > Sent: Wednesday, February 22, 2006 8:25 PM
> > To: Chris Fields; bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Blast returns result, but
> does not return hits
> > 
> > 
> > Thanks very much Chris for your time.
> > Please see below output that you requested (the
> only
> > difference i saw between your output and mine is
> @INC
> > value. I have only 2 directories c:\mod_perl\perl
> > where i installed activeperl. I see two additional
> > directories in your @INC path).
> > 
> > >
> > > When you type 'perl -V' what do you see (make
> sure
> > > it is a capital 'V', not
> > > lower case).
> > 
> > C:\Documents and Settings\Administrator>perl  -V
> > Summary of my perl5 (revision 5 version 8
> subversion
> > 7) configuration:
> >   Platform:
> >     osname=MSWin32, osvers=5.0,
> > archname=MSWin32-x86-multi-thread
> 
> [....]
> 
> > if.pm
> >   Built under MSWin32
> >   Compiled at Nov  2 2005 08:44:52
> >   %ENV:
> >     PERL5LIB="c:\bioperl-live"
> >   @INC:
> >     c:\bioperl-live
> >     C:/mod_perl/Perl/lib
> >     C:/mod_perl/Perl/site/lib
> >     .
> 
> Personally I wouldn't place the bioperl-live
> folder in the root
> directory; this shouldn't make a difference, but you
> can try moving it to
> the perl directory in a separate folder to see if
> that helps.  Can't see why
> it would make a difference, but it is Windows...
> Main reason I'm switching
> over to Mac OS X!
> 
> Make sure that the Bio directory is in the
> bioperl-live directory,
> regardless (i.e. if PERL5LIB is set to
> C:\mod_perl\Perl\bioperl\bioperl-live, then there
> should be a directory like
> C:\mod_perl\Perl\bioperl\bioperl-live\Bio).  Otherwise it
> won't work.
> 
> What do you get with this?
> 
> perl -MBio::Root::Version -e "print
> $Bio::Root::Version::VERSION"
> 
> If everything is working (PERL5LIB, etc) then it
> should be 1.5 for CVS
> bioperl; otherwise it will either find the old
> version (1.2.3) or fail
> completely.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> 
> 




From cjfields at uiuc.edu  Thu Feb 23 12:41:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 11:41:07 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060223162356.97964.qmail@web34405.mail.mud.yahoo.com>
Message-ID: <000301c638a0$53eb9a30$15327e82@pyrimidine>

Yes, that's a potential issue.  I'll try to replicate that here; please send
a code example so I can see how you're calling for the p-value.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Raghunath Verabelli
> Sent: Thursday, February 23, 2006 10:24 AM
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> 
> Thanks Chris for all your help.
> 
> The patch for blast.pm worked. I was able to parse the
> hits from the raw file. I uninstalled previous
> versions of bioperl using ppm and then I installed
> bioperl 1.4.x using nmake, and applied your fix. I am
> getting hits the way I wanted.
> 
> However, I noticed that the p-value for each hit
> doesn't seem to be parsed
> correctly. It sets it to 0 for all hits. Not sure if
> this is a known issue. Any suggestions/comments,
> please let me know.
> 
> Thanks,
> Raghu
> 
> --- Chris Fields  wrote:
> 
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Raghunath
> > Verabelli
> > > Sent: Wednesday, February 22, 2006 8:25 PM
> > > To: Chris Fields; bioperl-l at lists.open-bio.org
> > > Subject: Re: [Bioperl-l] Blast returns result, but
> > does not return hits
> > >
> > >
> > > Thanks very much Chris for your time.
> > > Please see below output that you requested (the
> > only
> > > difference i saw between your output and mine is
> > @INC
> > > value. I have only 2 directories c:\mod_perl\perl
> > > where i installed activeperl. I see two additional
> > > directories in your @INC path).
> > >
> > > >
> > > > When you type 'perl -V' what do you see (make
> > sure
> > > > it is a capital 'V', not
> > > > lower case).
> > >
> > > C:\Documents and Settings\Administrator>perl  -V
> > > Summary of my perl5 (revision 5 version 8
> > subversion
> > > 7) configuration:
> > >   Platform:
> > >     osname=MSWin32, osvers=5.0,
> > > archname=MSWin32-x86-multi-thread
> >
> > [....]
> >
> > > if.pm
> > >   Built under MSWin32
> > >   Compiled at Nov  2 2005 08:44:52
> > >   %ENV:
> > >     PERL5LIB="c:\bioperl-live"
> > >   @INC:
> > >     c:\bioperl-live
> > >     C:/mod_perl/Perl/lib
> > >     C:/mod_perl/Perl/site/lib
> > >     .
> >
> > Personally I wouldn't place the bioperl-live
> > folder in the root
> > directory; this shouldn't make a difference, but you
> > can try moving it to
> > the perl directory in a separate folder to see if
> > that helps.  Can't see why
> > it would make a difference, but it is Windows...
> > Main reason I'm switching
> > over to Mac OS X!
> >
> > Make sure that the Bio directory is in the
> > bioperl-live directory,
> > regardless (i.e. if PERL5LIB is set to
> > C:\mod_perl\Perl\bioperl\bioperl-live, then there
> > should be a directory like
> > C:\mod_perl\Perl\bioperl\bioperl-live\Bio).  Otherwise it
> > won't work.
> >
> > What do you get with this?
> >
> > perl -MBio::Root::Version -e "print
> > $Bio::Root::Version::VERSION"
> >
> > If everything is working (PERL5LIB, etc) then it
> > should be 1.5 for CVS
> > bioperl; otherwise it will either find the old
> > version (1.2.3) or fail
> > completely.
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> 
> 



From cjfields at uiuc.edu  Thu Feb 23 13:06:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 12:06:37 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000301c638a0$53eb9a30$15327e82@pyrimidine>
Message-ID: <000401c638a3$e37fb520$15327e82@pyrimidine>

Hold up a second.  Do you mean e-value, or p-value?  A run-of-the-mill NCBI
blast report these days gives e-values (expectation values), NOT p-values.  I
think they changed over to using only e-values with BLAST v2.  Make sure you
didn't mix these up; look at the text output to check whether P values are
present.  If they aren't, that would explain why you're getting 0.

From the BLAST tutorial:

The BLAST programs report E-value rather than P-values because it is easier
to understand the difference between, for example, E-value of 5 and 10 than
P-values of 0.993 and 0.99995. However, when E < 0.01, P-values and E-value
are nearly identical.
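
The relationship behind the tutorial's numbers is P = 1 - exp(-E), which
follows from treating the number of chance hits as Poisson distributed. A
small sketch, assuming that relationship, shows why the two values converge
for small E:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# P-value implied by an E-value, assuming the number of chance hits
# follows a Poisson distribution: P = 1 - exp(-E).
sub p_from_e {
    my ($e) = @_;
    return 1 - exp(-$e);
}

# Print a few E-values next to the P-values they imply.
for my $e (10, 5, 0.01, 0.001) {
    printf "E = %-6g  P = %.6g\n", $e, p_from_e($e);
}
```

For E of 5 and 10 this gives roughly 0.993 and 0.99995, the figures quoted
in the tutorial, while for E of 0.01 the P-value differs from E only in the
fifth decimal place.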

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Thursday, February 23, 2006 11:41 AM
> To: 'Raghunath Verabelli'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> 
> Yes that's a potential issue.  I'll try to replicate that here; please
> send
> a code example so I can see how you're calling for the p-value.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Raghunath Verabelli
> > Sent: Thursday, February 23, 2006 10:24 AM
> > To: Chris Fields; bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> >
> > Thanks Chris for all your help.
> >
> > The patch for blast.pm worked. I was able to parse the
> > hits from the raw file. I uninstalled previous
> > versions of bioperl using ppm and then I installed
> > bioperl 1.4.x using nmake, and applied your fix. I am
> > getting hits the way I wanted.
> >
> > However, I noticed that the p-value for each hit
> > doesn't seem to be parsed
> > correctly. It sets it to 0 for all hits. Not sure if
> > this is a known issue. Any suggestions/comments,
> > please let me know.
> >
> > Thanks,
> > Raghu
> >
> > --- Chris Fields  wrote:
> >
> > > > -----Original Message-----
> > > > From: bioperl-l-bounces at lists.open-bio.org
> > > [mailto:bioperl-l-
> > > > bounces at lists.open-bio.org] On Behalf Of Raghunath
> > > Verabelli
> > > > Sent: Wednesday, February 22, 2006 8:25 PM
> > > > To: Chris Fields; bioperl-l at lists.open-bio.org
> > > > Subject: Re: [Bioperl-l] Blast returns result, but
> > > does not return hits
> > > >
> > > >
> > > > Thanks very much Chris for your time.
> > > > Please see below the output that you requested (the only
> > > > difference I saw between your output and mine is the @INC
> > > > value. I have only 2 directories, under c:\mod_perl\perl
> > > > where I installed ActivePerl. I see two additional
> > > > directories in your @INC path).
> > > >
> > > > >
> > > > > When you type 'perl -V' what do you see (make
> > > sure
> > > > > it is a capital 'V', not
> > > > > lower case).
> > > >
> > > > C:\Documents and Settings\Administrator>perl  -V
> > > > Summary of my perl5 (revision 5 version 8
> > > subversion
> > > > 7) configuration:
> > > >   Platform:
> > > >     osname=MSWin32, osvers=5.0,
> > > > archname=MSWin32-x86-multi-thread
> > >
> > > [....]
> > >
> > > > if.pm
> > > >   Built under MSWin32
> > > >   Compiled at Nov  2 2005 08:44:52
> > > >   %ENV:
> > > >     PERL5LIB="c:\bioperl-live"
> > > >   @INC:
> > > >     c:\bioperl-live
> > > >     C:/mod_perl/Perl/lib
> > > >     C:/mod_perl/Perl/site/lib
> > > >     .
> > >
> > > Personally I wouldn't place the bioperl-live
> > > folder in the root
> > > directory; this shouldn't make a difference, but you
> > > can try moving it to
> > > the perl directory in a separate folder to see if
> > > that helps.  Can't see why
> > > it would make a difference, but it is Windows...
> > > Main reason I'm switching
> > > over to Mac OS X!
> > >
> > > Make sure that the Bio directory is in the
> > > bioperl-live directory,
> > > regardless (i.e. if PERL5LIB is set to
> > > C:\mod_perl\Perl\bioperl\bioperl-live, then there
> > > should be a directory like
> > > C:\mod_perl\Perl\bioperl\bioperl-live\Bio).  Otherwise it
> > > won't work.
> > >
> > > What do you get with this?
> > >
> > > perl -MBio::Root::Version -e "print $Bio::Root::Version::VERSION"
> > >
> > > If everything is working (PERL5LIB, etc) then it
> > > should be 1.5 for CVS
> > > bioperl; otherwise it will either find the old
> > > version (1.2.3) or fail
> > > completely.
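
A slightly longer form of the same check, as a sketch only: besides the version it prints the file the module was loaded from, which should show whether PERL5LIB is picking up bioperl-live or an older install. It assumes nothing beyond Bio::Root::Version being somewhere on @INC; the script name is made up.

```perl
#!perl
# whichbioperl.pl (hypothetical name): report the BioPerl version and where it was found.
use strict;
use warnings;

# Load at run time so a missing install gives a readable message.
eval { require Bio::Root::Version; 1 }
    or die "Bio::Root::Version not found on \@INC:\n  " . join("\n  ", @INC) . "\n";

# %INC maps each loaded module file to the path it was read from.
print "BioPerl version: ", Bio::Root::Version->VERSION, "\n";
print "Loaded from:     $INC{'Bio/Root/Version.pm'}\n";
```

If the path printed is not under the bioperl-live directory, PERL5LIB is probably not being honoured.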
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l



From jason.stajich at duke.edu  Thu Feb 23 13:29:57 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 23 Feb 2006 13:29:57 -0500
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000401c638a3$e37fb520$15327e82@pyrimidine>
References: <000401c638a3$e37fb520$15327e82@pyrimidine>
Message-ID: <698C1C9B-BB22-44C5-A277-4CE9930C1BCD@duke.edu>

p-values do show up in WU-BLAST reports, so that is why we have a p-value
function.


On Feb 23, 2006, at 1:06 PM, Chris Fields wrote:

> Hold up a second.  Do you mean e-value, or p-value?  A run-of-the-mill NCBI
> blast report these days gives e-values (expectation value), NOT p-values.  I
> think they changed over to using only e-values with BLAST v2.  Make sure you
> didn't mix these up; look at the text output to make sure that P values are
> present.  That would explain why you're getting 0, since they don't exist.
>
> From the BLAST tutorial:
>
> The BLAST programs report E-value rather than P-values because it is easier
> to understand the difference between, for example, E-value of 5 and 10 than
> P-values of 0.993 and 0.99995. However, when E < 0.01, P-values and E-value
> are nearly identical.
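
For reference, the two statistics are related by P = 1 - exp(-E): E is the expected number of chance hits, P the probability of at least one. A small Perl sketch of that conversion follows; it is not part of BioPerl, and the helper name p_from_e is made up for illustration.

```perl
use strict;
use warnings;

# Convert an E-value (expected number of chance hits) to a P-value
# (probability of at least one chance hit): P = 1 - exp(-E).
sub p_from_e {
    my ($e) = @_;
    return 1 - exp(-$e);
}

printf "E = %-6g P = %.5f\n", $_, p_from_e($_) for (0.001, 0.01, 5, 10);
```

E-values of 5 and 10 come out at roughly 0.993 and 0.99995, matching the tutorial, while E = 0.01 gives about 0.00995, which is why the two are nearly interchangeable for small values.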
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Thu Feb 23 13:34:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 12:34:19 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <698C1C9B-BB22-44C5-A277-4CE9930C1BCD@duke.edu>
Message-ID: <000501c638a7$c2802630$15327e82@pyrimidine>

I think Raghu's running NCBI BLAST, though.  Am I right? 

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> Sent: Thursday, February 23, 2006 12:30 PM
> To: Chris Fields
> Cc: 'Raghunath Verabelli'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> 
> p-values do show up in WU-BLAST reports so that is why we have a p-value
> function.
> 




From iamvela at yahoo.com  Thu Feb 23 14:33:50 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Thu, 23 Feb 2006 11:33:50 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000501c638a7$c2802630$15327e82@pyrimidine>
Message-ID: <20060223193350.1917.qmail@web34410.mail.mud.yahoo.com>

Chris, you are right. I am using NCBI BLAST.

Here is my http query:

my $urltext =
"http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=$seq&DATABASE=nr&PROGRAM=blastp";

This is my code for populating p-value:

my $pValue = $bioPerlHit->significance;


I looked at the text output and could not find any p-value
column; the only 'value' column in the output is
'E value'. I will try that.

Thanks,
Raghu
 
--- Chris Fields  wrote:

> I think Raghu's running NCBI BLAST, though.  Am I
> right? 
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 


From cjfields at uiuc.edu  Thu Feb 23 16:11:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 15:11:38 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060223193350.1917.qmail@web34410.mail.mud.yahoo.com>
Message-ID: <000301c638bd$bc9eb590$15327e82@pyrimidine>

I think you want $hit->expect (for hits) or $hsp->evalue (for HSPs).
$hit->significance (for NCBI blast) gives the values from the descriptions
(the score and expect) for each hit.
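
A minimal sketch of where those calls sit in a parsing loop, assuming the report was saved as plain text and is read with Bio::SearchIO (the file name report.bls is only a placeholder):

```perl
use strict;
use warnings;
use Bio::SearchIO;

# 'report.bls' is a placeholder for a saved plain-text BLAST report.
my $file = shift || 'report.bls';
my $in   = Bio::SearchIO->new(-format => 'blast', -file => $file);

while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        # Hit-level value; per the note above, for NCBI BLAST this comes
        # from the one-line descriptions at the top of the report.
        print join("\t", $hit->name, $hit->significance), "\n";
        while (my $hsp = $hit->next_hsp) {
            # HSP-level expect value for each alignment.
            print "\tHSP e-value: ", $hsp->evalue, "\n";
        }
    }
}
```

Swapping $hit->significance for $hit->expect in the loop is the change suggested above for NCBI reports.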

If you want to see what methods are available for any given object (in this
case Bio::Search::Hit::BlastHit or Bio::Search::HSP::BlastHSP), use the
script below from the bioperl FAQ (use PPM to install Class::Inspector
first) and pass the object module name on the command line.  Be careful as
many of these are get/sets (so don't pass any args).
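----------------------------------
#!perl
# List the public methods of a class, including inherited ones.
use strict;
use warnings;
use Class::Inspector;

my $class = shift || die "Usage: methods perl_class_name\n";
eval "require $class; 1" or die "Cannot load $class: $@";

# 'full' gives fully qualified names; 'public' skips names starting with an underscore.
print join("\n", sort @{ Class::Inspector->methods($class, 'full', 'public') }), "\n";
----------------------------------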
You should get something like this:

C:\Perl\Scripts>methods.pl Bio::Search::Hit::BlastHit
Bio::Root::Root::DESTROY
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented
Bio::Search::Hit::BlastHit::expect
Bio::Search::Hit::BlastHit::found_again
Bio::Search::Hit::BlastHit::iteration
Bio::Search::Hit::BlastHit::new
Bio::Search::Hit::GenericHit::accession
Bio::Search::Hit::GenericHit::add_hsp
Bio::Search::Hit::GenericHit::algorithm
Bio::Search::Hit::GenericHit::ambiguous_aln
Bio::Search::Hit::GenericHit::bits
Bio::Search::Hit::GenericHit::description
Bio::Search::Hit::GenericHit::each_accession_number
Bio::Search::Hit::GenericHit::end
Bio::Search::Hit::GenericHit::frac_aligned_hit
Bio::Search::Hit::GenericHit::frac_aligned_query
Bio::Search::Hit::GenericHit::frac_conserved
Bio::Search::Hit::GenericHit::frac_identical
Bio::Search::Hit::GenericHit::frame
Bio::Search::Hit::GenericHit::gaps
Bio::Search::Hit::GenericHit::hsp
Bio::Search::Hit::GenericHit::hsps
Bio::Search::Hit::GenericHit::length
Bio::Search::Hit::GenericHit::length_aln
Bio::Search::Hit::GenericHit::locus
Bio::Search::Hit::GenericHit::logical_length
Bio::Search::Hit::GenericHit::matches
Bio::Search::Hit::GenericHit::n
Bio::Search::Hit::GenericHit::name
Bio::Search::Hit::GenericHit::next_hsp
Bio::Search::Hit::GenericHit::num_hsps
Bio::Search::Hit::GenericHit::num_unaligned_hit
Bio::Search::Hit::GenericHit::num_unaligned_query
Bio::Search::Hit::GenericHit::num_unaligned_sbjct
Bio::Search::Hit::GenericHit::overlap
Bio::Search::Hit::GenericHit::p
Bio::Search::Hit::GenericHit::query_length
Bio::Search::Hit::GenericHit::range
Bio::Search::Hit::GenericHit::rank
Bio::Search::Hit::GenericHit::raw_score
Bio::Search::Hit::GenericHit::rewind
Bio::Search::Hit::GenericHit::score
Bio::Search::Hit::GenericHit::seq_inds
Bio::Search::Hit::GenericHit::significance
Bio::Search::Hit::GenericHit::start
Bio::Search::Hit::GenericHit::strand
Bio::Search::Hit::GenericHit::tiled_hsps
Bio::Search::Hit::HitI::hit_description
Bio::Search::Hit::HitI::hit_length

Nice, huh?
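
If Class::Inspector isn't installed, a rough alternative is to ask the class directly with Perl's built-in can(). This is just a sketch; the script name and the has_method helper are made up, and it only answers yes/no for the method names you give it rather than listing everything.

```perl
#!perl
# hasmethod.pl (hypothetical name): check whether a class provides given methods.
# Usage: hasmethod.pl perl_class_name method [method ...]
use strict;
use warnings;

# True if $class can be loaded and responds to $method (own or inherited).
sub has_method {
    my ($class, $method) = @_;
    eval "require $class; 1" or return 0;
    return $class->can($method) ? 1 : 0;
}

my ($class, @methods) = @ARGV;
warn "Usage: hasmethod.pl perl_class_name method [method ...]\n" unless @methods;
printf "%-15s %s\n", $_, has_method($class, $_) ? 'yes' : 'no' for @methods;
```

For example, "hasmethod.pl Bio::Search::Hit::BlastHit expect significance p"; going by the listing above, all three should come back 'yes'.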

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: Raghunath Verabelli [mailto:iamvela at yahoo.com]
> Sent: Thursday, February 23, 2006 1:34 PM
> To: Chris Fields; 'Jason Stajich'
> Cc: bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Blast returns result, but does not return hits
> 
> Chris, you are right. I am using NCBI BLAST.
> 
> Here is my http query:
> 
> my $urltext =
> "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=$seq&DATABASE=nr&PROGRAM=blastp";
> 
> This is my code for populating p-value:
> 
> my $pValue = $bioPerlHit->significance;
> 
> 
> I looked at the text output, could not find any p
> value column, the only 'value' column in the output is
> 'E value'. I will try that.
> 
> Thanks,
> Raghu
> 
> --- Chris Fields  wrote:
> 
> > I think Raghu's running NCBI BLAST, though.  Am I
> > right?
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> > > -----Original Message-----
> > > From: Jason Stajich
> > [mailto:jason.stajich at duke.edu]
> > > Sent: Thursday, February 23, 2006 12:30 PM
> > > To: Chris Fields
> > > Cc: 'Raghunath Verabelli';
> > bioperl-l at lists.open-bio.org
> > > Subject: Re: [Bioperl-l] Blast returns result, but
> > does not return hits
> > >
> > > p-values do show up in WU-BLAST reports so that is
> > why we have a p-
> > > value function.
> > >
> > >
> > > On Feb 23, 2006, at 1:06 PM, Chris Fields wrote:
> > >
> > > > Hold up a second.  Do you mean e-value, or
> > p-value?  A run-of-the-
> > > > mill NCBI
> > > > blast report these days gives e-values
> > (expectation value), NOT p-
> > > > values.  I
> > > > think they changed over to using only e-values
> > with BLAST v2.  Make
> > > > sure you
> > > > didn't mix these up; look out the text output to
> > make sure that P
> > > > values are
> > > > present.  That would explain why you're getting
> > 0, since they don't
> > > > exist.
> > > >
> > > >> From the BLAST tutorial:
> > > >
> > > > The BLAST programs report E-value rather than
> > P-values because it
> > > > is easier
> > > > to understand the difference between, for
> > example, E-value of 5 and
> > > > 10 than
> > > > P-values of 0.993 and 0.99995. However, when E <
> > 0.01, P-values and
> > > > E-value
> > > > are nearly identical.
> > > >
> > > > Christopher Fields
> > > > Postdoctoral Researcher - Switzer Lab
> > > > Dept. of Biochemistry
> > > > University of Illinois Urbana-Champaign
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-
> > > >> bounces at lists.open-bio.org] On Behalf Of Chris
> > Fields
> > > >> Sent: Thursday, February 23, 2006 11:41 AM
> > > >> To: 'Raghunath Verabelli';
> > bioperl-l at lists.open-bio.org
> > > >> Subject: Re: [Bioperl-l] Blast returns result,
> > but does not return
> > > >> hits
> > > >>
> > > >> Yes that's a potential issue.  I'll try to
> > replicate that here;
> > > >> please
> > > >> send
> > > >> a code example so I can see how you're calling
> > for the p-value.
> > > >>
> > > >> Christopher Fields
> > > >> Postdoctoral Researcher - Switzer Lab
> > > >> Dept. of Biochemistry
> > > >> University of Illinois Urbana-Champaign
> > > >>
> > > >>> -----Original Message-----
> > > >>> From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-
> > > >>> bounces at lists.open-bio.org] On Behalf Of
> > Raghunath Verabelli
> > > >>> Sent: Thursday, February 23, 2006 10:24 AM
> > > >>> To: Chris Fields; bioperl-l at lists.open-bio.org
> > > >>> Subject: Re: [Bioperl-l] Blast returns result,
> > but does not
> > > >>> return hits
> > > >>>
> > > >>> Thanks Chris for all your help.
> > > >>>
> > > >>> The patch for blast.pm worked. I was able to
> > parse the
> > > >>> hits from the raw file. I uninstalled previous
> > > >>> versions of bioperl using ppm and then I
> > installed
> > > >>> bioperl 1.4.x using nmake, and applied your
> > fix. I am
> > > >>> getting hits the way I wanted.
> > > >>>
> > > >>> However, I noticed that the p-value for each
> > hit
> > > >>> doesn't seem to be parsed
> > > >>> correctly. It sets it to 0 for all hits. Not
> > sure if
> > > >>> this is a known issue. Any
> > suggestions/comments,
> > > >>> please let me know.
> > > >>>
> > > >>> Thanks,
> > > >>> Raghu
> > > >>>
> > > >>> --- Chris Fields  wrote:
> > > >>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: bioperl-l-bounces at lists.open-bio.org
> > > >>>> [mailto:bioperl-l-
> > > >>>>> bounces at lists.open-bio.org] On Behalf Of
> > Raghunath
> > > >>>> Verabelli
> > > >>>>> Sent: Wednesday, February 22, 2006 8:25 PM
> > > >>>>> To: Chris Fields;
> > bioperl-l at lists.open-bio.org
> > > >>>>> Subject: Re: [Bioperl-l] Blast returns
> > result, but
> > > >>>> does not return hits
> > > >>>>>
> > > >>>>>
> > > >>>>> Thanks very much Chris for your time.
> > > >>>>> Please see below output that you requested
> > (the
> > > >>>> only
> > > >>>>> difference i saw between your output and
> > mine is
> > > >>>> @INC
> > > >>>>> value. I have only 2 directories
> > c:\mod_perl\perl
> > > >>>>> where i installed activeperl. I see two
> > additional
> > > >>>>> directories in your @INC path).
> > > >>>>>
> > > >>>>>>
> > > >>>>>> When you type 'perl -V' what do you see
> > (make
> > > >>>> sure
> > > >>>>>> it is a capital 'V', not
> > > >>>>>> lower case).
> > > >>>>>
> > > >>>>> C:\Documents and Settings\Administrator>perl
> >  -V
> > > >>>>> Summary of my perl5 (revision 5 version 8
> > > >>>> subversion
> > > >>>>> 7) configuration:
> > > >>>>>   Platform:
> > > >>>>>     osname=MSWin32, osvers=5.0,
> > > >>>>> archname=MSWin32-x86-multi-thread
> > > >>>>
> > > >>>> [....]
> > > >>>>
> > > >>>>> if.pm
> > > >>>>>   Built under MSWin32
> > > >>>>>   Compiled at Nov  2 2005 08:44:52
> > > >>>>>   %ENV:
> > > >>>>>     PERL5LIB="c:\bioperl-live"
> > > >>>>>   @INC:
> > > >>>>>     c:\bioperl-live
> > > >>>>>     C:/mod_perl/Perl/lib
> > > >>>>>     C:/mod_perl/Perl/site/lib
> > > >>>>>     .
> > > >>>>
> > > >>>> Personally I wouldn't place the bioperl-live folder in the root
> > > >>>> directory; this shouldn't make a difference, but you can try
> > > >>>> moving it to the perl directory in a separate folder to see if
> > > >>>> that helps.  Can't see why it would make a difference, but it is
> > > >>>> Windows... Main reason I'll be switching over to Mac OS X!
> > > >>>>
> > > >>>> Make sure that the Bio directory is in the bioperl-live
> > > >>>> directory, regardless (i.e. if PERL5LIB is set to
> > > >>>> C:\mod_perl\Perl\bioperl\bioperl-live, then there should be a
> > > >>>> directory like C:\mod_perl\Perl\bioperl\bioperl-live\Bio).
> > > >>>> Otherwise it won't work.
> > > >>>>
> >
> === message truncated ===
> 
> 



From cain at cshl.edu  Wed Feb 22 09:36:54 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 22 Feb 2006 09:36:54 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
Message-ID: <1140619014.3142.81.camel@localhost.localdomain>

Hi Dave,

I don't know if this helps at all, but you could think of that 45th tick
mark as the termination, since the space between the 44th and the 45th
tick mark corresponds to your 44th residue.  I suppose it is a matter of
correctly training your users :-)
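
To make the tick arithmetic concrete, here is a minimal Perl sketch, not
anything taken from Bio::Graphics. It uses the 44-residue sequence Dave
quotes below; the helper names to_interbase and tick_count are made up for
illustration. A 1-based residue i sits between boundaries i-1 and i, so 44
residues need 45 boundaries (ticks).

```perl
use strict;
use warnings;

# Sequence quoted in Dave's message below (SwissProt P12239, 44 residues).
my $seq = 'TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR';

# Convert a 1-based, inclusive residue range into interbase coordinates,
# i.e. the boundary positions on either side of the range.
# (Hypothetical helper, for illustration only.)
sub to_interbase {
    my ($start, $end) = @_;
    return ($start - 1, $end);
}

# Number of tick marks needed to delimit $n residues.
# (Hypothetical helper, for illustration only.)
sub tick_count {
    my ($n) = @_;
    return $n + 1;
}

my ($left, $right) = to_interbase(27, 42);
my $sub = substr($seq, $left, $right - $left);

printf "residues: %d, ticks: %d\n", length($seq), tick_count(length($seq));
printf "27..42 -> interbase %d..%d: %s\n", $left, $right, $sub;
```

With interbase numbering the length of a range is simply right minus left,
which is why a track for 27..42 lines up with the boundaries 26 and 42.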

Scott


On Wed, 2006-02-22 at 10:20 +0000, Dave Howorth wrote:
> Lincoln Stein wrote:
> > Hi Dave,
> > 
> > Well, when you are using 1-based coordinates, a line that contains 44
> > intervals will have 45 ticks. If you move to 0-based coordinates, then the
> > first tick will be labeled 0 and the last tick will be labeled 44. An
> > alternative is to make each base dimensionless, but that becomes a problem
> > when dealing with single base features, such as SNPs.
> >
> > These issues are why I have long advocated for interbase coordinates
> > in which you number the positions between bases rather than the bases
> > themselves.
> 
> I see your point but I need to work with the coordinates that the users 
> expect and are familiar with. (Things get much worse with PDB residue 
> numbering :)
> 
> > Draw me the picture of what you expect to see. I think of it this way:
> > 
> > 	1    2  3  4   5   6
> >          A>G>C>T>A>
> 
> I guess something went wrong with your ASCII art :(
> 
> OK, consider a 44-residue entry from SwissProt (P12239):
> 
>    TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR
> 
> The first T is numbered 1 and the last R is numbered 44.
> 
> So I expect to see a line with 44 positions indicated somehow (whether 
> these are half-open intervals or points on the line), with the number 1 
> at the left end and the number 44 at the right end.
> 
> An important point is that if I then place other tracks below this one 
> that start at say 27 and go to say 42, representing VPTIFFLGAIAAMQFI, 
> they should align properly (according to whatever convention is used to 
> represent a residue).
> 
> For a short sequence like this it would be possible to use letters to 
> represent the residue but I'd like to use the same convention for longer 
> sequences as well and have everything be consistent.
> 
> I'm hoping Bio::Graphics will make this easy.
> 
> Thanks, Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From hlapp at gnf.org  Thu Feb 23 21:10:13 2006
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu, 23 Feb 2006 18:10:13 -0800
Subject: [Bioperl-l] [BioSQL-l] Load seqfeature from biosql database
	with perl
In-Reply-To: <1140744561.2888.19.camel@alien>
Message-ID: 

Yes, kudos to you for figuring this out yourself, and you actually figured
out the more difficult way. I apologize for my delay in responding, I was
tied up this morning and last night.

You got the first key step right, namely obtaining the right persistence
adaptor. This step determines which object you get back.

Your query will work, and in fact will be just as fast as the simple
solution (which is simple only because it is simpler to code, not because
the internally executed query is simpler). The simple solution is that every
object implementing Bio::DB::PersistenceAdaptorI (i.e., any object you get
back from $db->get_object_adaptor(..)) has a method
$adp->find_by_primary_key(). So, using that method:

    $feature = $adaptor->find_by_primary_key($seqfeature_id);

You can also control the type of object to be created (so long as it is a
Bio::SeqFeatureI) by passing in an object factory in addition.

As an aside, the finder method will also consult the object cache first
if the cache is enabled. That doesn't matter for seq features, though:
because of the potentially large number of objects, the cache is not
enabled by default for this adaptor.

    -hilmar  

On 2/23/06 5:29 PM, "Michael Cipriano"  wrote:

> Ah, I think I figured it out.
> 
> my $seqfeature_id = '401138';
> my $adaptor = $seqdb_obj->get_object_adaptor("Bio::SeqFeatureI");
> 
> my $query = Bio::DB::Query::BioQuery->new(
>     -datacollections => ["Bio::SeqFeatureI t1"],
>     -where           => ["t1.Bio::SeqFeatureI = ?"]);
>
> my $qres = $adaptor->find_by_query($query,
>     -name   => 'FIND FEATURE BY SEQ',
>     -values => [$seqfeature_id]);
> 
> while(my $loc = $qres->next_object())
> {
>         my $obj = $loc;
> 
>         print $obj->primary_key() . "\n";
>         print 'location:' . $obj->location->to_FTstring() . "\n";
>         $obj->add_tag_value("test", "moretest");
>         foreach my $tag ($obj->get_all_tags())
>         {
>                 print " Values for tag $tag: ";
>                 print join(' ',$obj->get_tag_values($tag));
>                 print "\n";
>         }
>         print "------------------\n";
> 
> }
> 
> 
> 
> This seems to work
> On Wed, 2006-02-22 at 16:58 -0800, Michael Cipriano wrote:
>> Hello BioSQLers,
>> 
>> I have a simple question (I hope), Can I easily load a seqfeature from a
>> biosql database into a perl Bio::SeqFeatureI object?  I have the
>> database value for the  seqfeature.seqfeature_id and would like to load
>> it using this alone.
>> 
>> I do not want to have to load the whole bioentry object then search for
>> the feature, I just want the feature object since the bioentry is a
>> whole genome and loading that will take more time than necessary.
>> 
>> I have searched the documentation and have even tried looking through
>> the code for the modules, but could not find an easy fast method.
>> 
>> Please reply directly to me as well as the list as I am not a list
>> member.
>> 
>> Thanks for your help,
>> 
>> 
>> Michael Cipriano
>> 
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
> 

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------




From praveecbt at yahoo.co.in  Fri Feb 24 00:57:22 2006
From: praveecbt at yahoo.co.in (Praveen Raj)
Date: Fri, 24 Feb 2006 05:57:22 +0000 (GMT)
Subject: [Bioperl-l] Problem in BioPerl. Help!
Message-ID: <20060224055722.13351.qmail@web8701.mail.in.yahoo.com>

Dear sir,
   
I have a problem using the BioPerl module 'Clustalw.pm'. Clustalw creates
a SimpleAlign object as output, doesn't it? I can successfully convert the
object into 'clustal' and 'phylip' format using a file handle.

I want to produce Newick format (for a phylogenetic tree) from the object
itself. I know that standalone ClustalW creates a Newick file (.dnd
extension) as output along with the .aln file. When I created 'clustal'
format output and printed it to a web page, it looked like this:

  CLUSTAL W (1.83) Multiple Sequence Alignments
Sequence format is Pearson
Sequence 1: >gi|dengue2|           13 aa
Sequence 2: >gi|yellowfever|       13 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score:  15
Guide tree        file created:   [\tXGgJDIuZZ\jmIerlkHz7.dnd]
Start of Multiple Alignment
There are 1 groups
...............
   
I don't know where the .dnd file (which is in Newick format) is created.
It is not in the current directory. Is there any way to specify the path
for the .dnd file? I have gone through all the documentation provided with
BioPerl and ClustalW.

How can I create 'newick' output (a .dnd file) from a SimpleAlign object
created by Clustalw.pm?

It would be a great help if you could provide a solution. I can't move
forward without one, so please reply.
   
                                    Thanking you,
                                                   Praveen Raj(student).
                                                   National Institute of Virology,   
                                                   Pune. India

				


From roy at colibase.bham.ac.uk  Fri Feb 24 10:51:46 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Fri, 24 Feb 2006 15:51:46 +0000
Subject: [Bioperl-l] Problem in BioPerl. Help!
In-Reply-To: <20060224055722.13351.qmail@web8701.mail.in.yahoo.com>
References: <20060224055722.13351.qmail@web8701.mail.in.yahoo.com>
Message-ID: <43FF2B92.9090801@colibase.bham.ac.uk>

Praveen Raj wrote:
> Sir, I want to make a newick format( for phylogenetic tree ) from the
> object itself. But I know that Standalone Clustalw creates a newick
> file(.dnd extension) as an output along with the .aln file.

Be careful with this. The .dnd files produced by ClustalW contain a
Newick-format guide tree, produced from pairwise-aligned sequences to
guide the multiple alignment process. This should not be confused with a
phylogenetic analysis, and the .dnd file is usually best ignored.

ClustalW can be used to produce a true phylogenetic tree from the 
alignment using the Neighbor-joining method (see the menus and 
documentation for details). This method produces files with a .ph or 
.phb extension (.phb if the tree is bootstrapped). I'm not sure if this 
process can be done using BioPerl, but it is possible to do using 
ClustalW's command line flags, so if you need to automate the process 
you could use Perl's system command. If you want to use BioPerl you can 
use the Phylip program neighbor to generate your tree directly from a 
SimpleAlign object, using the module 
Bio::Tools::Run::Phylo::Phylip::Neighbor.
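
As a rough sketch of the system-command route, the Perl below only builds
the ClustalW argument list and prints it. It assumes a Unix-style
'clustalw' executable on the PATH and the -infile, -tree and -bootstrap
flags as described in the ClustalW 1.83 help; the file name aligned.aln
and the helper nj_tree_command are made up for illustration, so check the
flags against your own ClustalW version before relying on them.

```perl
use strict;
use warnings;

# Build the argument list for a ClustalW neighbor-joining tree run.
# $clustalw   - path to the executable (assumed to be on the PATH here)
# $alignment  - an existing alignment file
# $bootstraps - number of bootstrap replicates, or 0 for a plain NJ tree
# (Hypothetical helper, for illustration only.)
sub nj_tree_command {
    my ($clustalw, $alignment, $bootstraps) = @_;
    my @cmd = ($clustalw, "-infile=$alignment");
    push @cmd, $bootstraps ? "-bootstrap=$bootstraps" : "-tree";
    return @cmd;
}

my @cmd = nj_tree_command('clustalw', 'aligned.aln', 0);
print "would run: @cmd\n";

# Uncomment to actually run it once the path and file name are real:
# system(@cmd) == 0 or die "clustalw failed: $?";
```

Passing the command to system() as a list avoids shell quoting problems
with unusual file names.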

Cheers.
Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk




From perlmails at gmail.com  Sun Feb 26 06:51:37 2006
From: perlmails at gmail.com (perlmails at gmail.com)
Date: Sun, 26 Feb 2006 17:21:37 +0530
Subject: [Bioperl-l] extract ncDNA
Message-ID: <235f7dbe0602260351j7d195d73r2a8801b29e105098@mail.gmail.com>

Dear Bioperl group,

I have been working on extracting non-coding DNA (ncDNA) sequences
from an organism.

I tried extracting the intergenic sequences from the sense strand
after filtering the features (CDS, gene, mRNA, tRNA, rRNA etc.) from
the EMBL feature table entries, using BioPerl and the additional
script shown below.

Now I realise that there is a problem extracting the ncDNA sequences
from the negative strand. Any ideas?

To extract the ncDNAs from the negative strand, I thought of converting
the negative-strand co-ordinates to sense-strand co-ordinates and
adding these to the sense-strand co-ordinates, then filtering all the
features (selecting the ncDNAs after discarding the features from the
EMBL FT) to get all the ncDNAs.

Is there anything in the BioPerl toolkit that I am missing?

##<<>
use strict;

my $EMBL_cord_file = "Organism.feature.cords";  # feature co-ordinates: start \t end
my $RAW_file = "Organism.raw";
my $ncDNA_file = "Organism.ncDNA";

open(EMBLCORD, $EMBL_cord_file) or die "Cannot open EMBL_cord_file";
open(RAW, $RAW_file) or die "Cannot open RAW_file";
open(OUT, ">$ncDNA_file") or die;

my @dna=<RAW>;
my $dna = join('',@dna);

while($dna){
	$dna=~s/\s//g;
	while(<EMBLCORD>){
		my @cords = split /\t/;
		my	$start = $cords[0];
		my	$end = $cords[1];
		my $replaceString = "\n>$start..$end";
		substr($dna, $start-1, $end-$start+1, $replaceString);
}
	print OUT $dna,"\n";
	exit;
}
##<<>

Another thing is, since I am reading the whole file in a scalar the
script does not complete the extraction of all ncDNAs from the
sense-strand. Obviously, the features are parsed first before the
flattening of the 266,000 nt sequence into a single string.

Any help would be appreciated.

-PO



From cjfields at uiuc.edu  Sun Feb 26 09:12:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 26 Feb 2006 08:12:57 -0600
Subject: [Bioperl-l] extract ncDNA
In-Reply-To: <235f7dbe0602260351j7d195d73r2a8801b29e105098@mail.gmail.com>
References: <235f7dbe0602260351j7d195d73r2a8801b29e105098@mail.gmail.com>
Message-ID: 

You're not using bioperl.  See:

http://www.bioperl.org/wiki/HOWTO:Beginners

then go to:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
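
For the strand question itself, a plain-Perl sketch may help (this is an
illustration, not the BioPerl way described in the HOWTOs). In an EMBL
feature table a minus-strand feature is written complement(a..b) with a
and b already in forward-strand coordinates, so no coordinate conversion
is needed: collect the start/end pairs from both strands, take the gaps,
and reverse-complement a gap if you want it as read from the other strand.
The function names and the toy data are assumptions made for this example.

```perl
use strict;
use warnings;

# Return the gaps between features as 1-based inclusive [start, end] pairs.
# @$features holds [start, end] pairs in forward-strand coordinates;
# overlapping features are handled by tracking the furthest end seen so far.
sub intergenic_regions {
    my ($seq_length, $features) = @_;
    my @sorted = sort { $a->[0] <=> $b->[0] } @$features;
    my @gaps;
    my $next_free = 1;    # first position not yet covered by a feature
    for my $f (@sorted) {
        my ($start, $end) = @$f;
        push @gaps, [$next_free, $start - 1] if $start > $next_free;
        $next_free = $end + 1 if $end + 1 > $next_free;
    }
    push @gaps, [$next_free, $seq_length] if $next_free <= $seq_length;
    return @gaps;
}

# Reverse-complement a DNA string (to read a gap off the other strand).
sub revcom {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $dna      = 'AAACCCGGGTTTAAACCCGG';        # 20 nt toy sequence
my @features = ([4, 6], [5, 9], [13, 15]);    # features from either strand
my @gaps     = intergenic_regions(length($dna), \@features);

for my $g (@gaps) {
    my ($s, $e) = @$g;
    my $fwd = substr($dna, $s - 1, $e - $s + 1);
    print ">$s..$e\n$fwd\n>complement($s..$e)\n", revcom($fwd), "\n";
}
```

Because the gaps are cut from an unmodified copy of the sequence, the
coordinates never shift, unlike replacing substrings in place.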

Chris


On Feb 26, 2006, at 5:51 AM, perlmails at gmail.com wrote:

> Dear Bioperl group,
>
> I have been working on extracting non-coding DNA (ncDNA) sequences
> from an organism.
>
> I tried extracting the intergenic sequences from the sense strand
> after filtering the features (CDS, gene, mRNA, tRNA, rRNA etc.) from
> the EMBL feature table entries, using BioPerl and the additional
> script shown below.
>
> Now I realise that there is a problem extracting the ncDNA sequences
> from the negative strand. Any ideas?
>
> To extract the ncDNAs from the negative strand, I thought of converting
> the negative-strand co-ordinates to sense-strand co-ordinates and
> adding these to the sense-strand co-ordinates, then filtering all the
> features (selecting the ncDNAs after discarding the features from the
> EMBL FT) to get all the ncDNAs.
>
> Is there anything in the BioPerl toolkit that I am missing?
>
> ##<<>
> use strict;
>
> my $EMBL_cord_file = "Organism.feature.cords";  # feature co-ordinates: start \t end
> my $RAW_file = "Organism.raw";
> my $ncDNA_file = "Organism.ncDNA";
>
> open(EMBLCORD, $EMBL_cord_file) or die "Cannot open EMBL_cord_file";
> open(RAW, $RAW_file) or die "Cannot open RAW_file";
> open(OUT, ">$ncDNA_file") or die;
>
> my @dna=<RAW>;
> my $dna = join('',@dna);
>
> while($dna){
> 	$dna=~s/\s//g;
> 	while(<EMBLCORD>){
> 		my @cords = split /\t/;
> 		my	$start = $cords[0];
> 		my	$end = $cords[1];
> 		my $replaceString = "\n>$start..$end";
> 		substr($dna, $start-1, $end-$start+1, $replaceString);
> }
> 	print OUT $dna,"\n";
> 	exit;
> }
> ##<<>
>
> Another thing is, since I am reading the whole file in a scalar the
> script does not complete the extraction of all ncDNAs from the
> sense-strand. Obviously, the features are parsed first before the
> flattening of the 266,000 nt sequence into a single string.
>
> Any help would be appreciated.
>
> -PO
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From saldroubi at yahoo.com  Sun Feb 26 15:15:14 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Sun, 26 Feb 2006 12:15:14 -0800 (PST)
Subject: [Bioperl-l] Is it worth it?
Message-ID: <20060226201514.14549.qmail@web34312.mail.mud.yahoo.com>

Hello everyone,
   
  Please forgive me for posting my questions on this list since they are not directly related to bioperl, but since most of you are doing bioinformatics, I thought I could ask for some advice.  Also, please point me to other lists or websites if more appropriate.

  Basically I am wondering if it is worth getting a Master's or PhD degree in bioinformatics with funding?  I already have an MS degree in Software Engineering, I've taken a few bioinformatics courses, and I like the field.  Additionally, I am almost 40 years old and have a stable job.  If I am to get a PhD in 3 to 4 years, what job opportunities will be out there for me?  And is it better to work in academia or the private sector?  What is the average salary like?

  Thank you very much, and please respond to me directly instead of to the list since my questions are off topic.
   
   


Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From joel at macresearcher.com  Sun Feb 26 22:12:12 2006
From: joel at macresearcher.com (Joel Dudley)
Date: Sun, 26 Feb 2006 20:12:12 -0700
Subject: [Bioperl-l] Is it worth it?
In-Reply-To: <20060226201514.14549.qmail@web34312.mail.mud.yahoo.com>
References: <20060226201514.14549.qmail@web34312.mail.mud.yahoo.com>
Message-ID: 

It seems to me that your mind is already made up. By asking such a  
question I think it's safe to say a PhD program in Bioinformatics  
would not be your cup of tea. This is not to be negative. If you like  
bioinformatics, do bioinformatics. Join an open-source project, or  
start one of your own. If you live in a town with a University, find  
a lab that needs bioinformatics work and volunteer your time. If you  
really have a passion for bioinformatics, just do bioinformatics and  
your path will become clear, opportunities will arise, your salary  
will be what you need. Just my two shekels of course.

- Joel

On Feb 26, 2006, at 1:15 PM, Sam Al-Droubi wrote:

> Hello everyone,
>
>   Please forgive me for posting my questions on this list since
> they are not directly related to bioperl, but since most of you are
> doing bioinformatics, I thought I could ask for some advice.  Also,
> please point me to other lists or websites if more appropriate.
>
>   Basically I am wondering if it is worth getting a Master's or
> PhD degree in bioinformatics with funding?  I already have an MS
> degree in Software Engineering, I've taken a few bioinformatics
> courses, and I like the field.  Additionally, I am almost 40 years
> old and have a stable job.  If I am to get a PhD in 3 to 4 years,
> what job opportunities will be out there for me?  And is it better
> to work in academia or the private sector?  What is the average
> salary like?
>
>   Thank you very much, and please respond to me directly instead of
> to the list since my questions are off topic.
>
>
>
>
> Sincerely,
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com



From sdavis2 at mail.nih.gov  Mon Feb 27 06:39:27 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 27 Feb 2006 06:39:27 -0500
Subject: [Bioperl-l] Is it worth it?
In-Reply-To: 
Message-ID: 




On 2/26/06 10:12 PM, "Joel Dudley"  wrote:

> It seems to me that your mind is already made up. By asking such a
> question I think it's safe to say a PhD program in Bioinformatics
> would not be your cup of tea. This is not to be negative. If you like
> bioinformatics, do bioinformatics. Join an open-source project, or
> start one of your own. If you live in a town with a University, find
> a lab that needs bioinformatics work and volunteer your time. If you
> really have a passion for bioinformatics, just do bioinformatics and
> your path will become clear, opportunities will arise, your salary
> will be what you need. Just my two shekels of course.

I would second this sentiment.  Most of the folks that I know that are doing
bioinformatics are doing it WITHOUT a degree in it.  The trick is to have
both computational skills AND domain-specific knowledge.  Just find a
project that will require you to gain some domain-specific knowledge (which
can actually happen pretty quickly) and go for it.  As Joel said, there are
dozens of open source projects that would love a helping hand.  If you need
more face-time, do as Joel suggests and work with a local university (or
even high school) to design some web-based tools or something like that to
do things that would be either educational or novel.

Sean




From dhoworth at mrc-lmb.cam.ac.uk  Mon Feb 27 05:40:19 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Mon, 27 Feb 2006 10:40:19 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <200602221340.28573.lstein@cshl.edu>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>	<1140625762.3142.107.camel@localhost.localdomain>	<43FC950C.7080007@mrc-lmb.cam.ac.uk>
	<200602221340.28573.lstein@cshl.edu>
Message-ID: <4402D713.2050007@mrc-lmb.cam.ac.uk>

Lincoln Stein wrote:
> I have just committed a version of the arrow.pm glyph that has a 
> -label_intervals flag.

Thanks Lincoln,

I've edited your new version so it displays the tick labels pretty much 
as I need. My changes were to the first and last label and to move the 
position of the others a little. I hope that it behaves exactly like 
your version unless label_intervals is set. I've attached my edited version.

There's still an oddity with the number of minor ticks at the start and 
end of the line (I've seen 7, 8, and 9 minor intervals at the start of 
the line as well as 10) but I'll probably ignore that for now.

Thanks, Dave
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arrow.pm
Type: application/x-perl
Size: 16357 bytes
Desc: not available
URL: 

From boris.steipe at utoronto.ca  Mon Feb 27 10:42:54 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Mon, 27 Feb 2006 10:42:54 -0500
Subject: [Bioperl-l] Is it worth it?
In-Reply-To: 
References: 
Message-ID: <56C842D6-18AD-40B0-AE9A-47A29AE83F1D@utoronto.ca>

I'd put a slightly different emphasis on this: obviously most of
those in the field can't have a degree in bioinformatics because such
degree programs haven't been around for all that long. One shouldn't
conclude that graduate programs are therefore somehow less relevant.
To successfully apply for a paid job, you need credentials for your
ability to be productive.

Credentials can come from open source projects IF you can document  
the scope and quality of your contributions.

Credentials can come from a graduate degree IF your thesis appears  
relevant, original and well executed.

Credentials can come from peer-reviewed publications.

Credentials can come from personal references of collaborators.



Regards,
B.

On 27 Feb 2006, at 06:39, Sean Davis wrote:

>
>
>
> On 2/26/06 10:12 PM, "Joel Dudley"  wrote:
>
>> It seems to me that your mind is already made up. By asking such a
>> question I think it's safe to say a PhD program in Bioinformatics
>> would not be your cup of tea. This is not to be negative. If you like
>> bioinformatics, do bioinformatics. Join an open-source project, or
>> start one of your own. If you live in a town with a University, find
>> a lab that needs bioinformatics work and volunteer your time. If you
>> really have a passion for bioinformatics, just do bioinformatics and
>> your path will become clear, opportunities will arise, your salary
>> will be what you need. Just my two shekels of course.
>
> I would second this sentiment.  Most of the folks that I know that  
> are doing
> bioinformatics are doing it WITHOUT a degree in it.  The trick is  
> to have
> both computational skills AND domain-specific knowledge.  Just find a
> project that will require you to gain some domain-specific  
> knowledge (which
> can actually happen pretty quickly) and go for it.  As Joel said,  
> there are
> dozens of open source projects that would love a helping hand.  If  
> you need
> more face-time, do as Joel suggests and work with a local  
> university (or
> even high school) to design some web-based tools or something like  
> that to
> do things that would be either educational or novel.
>
> Sean
>
>



From slenk at emich.edu  Mon Feb 27 16:07:38 2006
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Mon, 27 Feb 2006 16:07:38 -0500
Subject: [Bioperl-l] Is it worth it?
Message-ID: <556d070556f727.556f727556d070@emich.edu>

Gee golly ollie, this is good advice. I face the same issues, but am much older (53). I am taking a Sloan MS in 
Bioinformatics while working full time at the car parts company. I bring what I have newly learned at school to 
work (Perl especially, in which I build and share tools even as far away as exotic India (smile)). I take what I have 
from work (discipline, experience, work ethic) and apply it to open source and shared school projects. The 
world has given me a lot; I enjoy giving back. Why not take an MS in Biology/Bioinformatics at your pace and 
see where it leads. I have no idea if I will EVER have a JOB in Bioinformatics, so I just live it day by day. Plug 
follows - see MCPrimers at CPAN for PCR primer design for molecular cloning with site-directed mutagenesis. I 
did this as an outgrowth of a Rectech class I took. 



----- Original Message -----
From: Sean Davis 
Date: Monday, February 27, 2006 6:39 am
Subject: Re: [Bioperl-l] Is it worth it?

> 
> 
> 
> On 2/26/06 10:12 PM, "Joel Dudley"  wrote:
> 
> > It seems to me that your mind is already made up. By asking such a
> > question I think it's safe to say a PhD program in Bioinformatics
> > would not be your cup of tea. This is not to be negative. If you like
> > bioinformatics, do bioinformatics. Join an open-source project, or
> > start one of your own. If you live in a town with a University, find
> > a lab that needs bioinformatics work and volunteer your time. If you
> > really have a passion for bioinformatics, just do bioinformatics and
> > your path will become clear, opportunities will arise, your salary
> > will be what you need. Just my two shekels of course.
> 
> I would second this sentiment.  Most of the folks that I know that 
> are doing
> bioinformatics are doing it WITHOUT a degree in it.  The trick is 
> to have
> both computational skills AND domain-specific knowledge.  Just find a
> project that will require you to gain some domain-specific 
> knowledge (which
> can actually happen pretty quickly) and go for it.  As Joel said, 
> there are
> dozens of open source projects that would love a helping hand.  If 
> you need
> more face-time, do as Joel suggests and work with a local 
> university (or
> even high school) to design some web-based tools or something like 
> that to
> do things that would be either educational or novel.
> 
> Sean
> 
> 
> 


From joel at macresearcher.com  Mon Feb 27 20:56:13 2006
From: joel at macresearcher.com (Joel Dudley)
Date: Mon, 27 Feb 2006 18:56:13 -0700
Subject: [Bioperl-l] BioPerlers Represent!
Message-ID: 

Hey list,
	The contest to fill the script repository at MacResearch.org is
ending very soon. Thus far we've received only a paltry three
submissions with Perl scripts. The contest take-home prize is a black
iPod nano (2GB), so if you've got anything lying around that you'd
like to share I'd suggest zipping it up and adding it to the script
repository. Full contest details can be viewed here:

http://www.macresearch.org/ipod_contest

Now before you get ready to smack me with your anti-spam cudgel, or shake
your fist in my general direction, please note that MacResearch.org
is completely non-profit, existing only to aid and foster community
for scientists using OS X. I gain nothing personally by attracting
BioPerl scripts to the repository but I'd love to see Perl well
represented. Thanks for understanding.

- Joel


From jforment at ibmcp.upv.es  Tue Feb 28 07:17:59 2006
From: jforment at ibmcp.upv.es (Javier Forment)
Date: Tue, 28 Feb 2006 13:17:59 +0100
Subject: [Bioperl-l] Are frac_identical and frac_conserved methods for hit
 or for hsp objects?
Message-ID: <44043F77.1010901@ibmcp.upv.es>

Hi bioperlers... I have some questions when parsing BLAST results.

As far as I know, bioperl documentation for Bio::SearchIO states that 
frac_identical and frac_conserved are methods for hsp objects (e.g., 
$hsp->frac_identical). I have found that it is also possible to use 
these methods for hit objects (e.g., $hit->frac_identical), since it 
does not give an error, but in this case they don't work properly (I 
think that they work fine with blastn, but not with blastx). So my 
questions are:

1.- is it right to use $hit->frac_identical and $hit->frac_conserved?
2.- if so, how do they get the frac_identical for a hit when it has more
than one HSP (maybe by averaging the values for all the hsps)?
3.- if so, why do they sometimes not work properly, for example with blastx?
4.- if not, is there any method to get the fraction of identical or
conserved residues for a hit, other than averaging the corresponding
values for all the hsps of this hit?

Thanks a lot in advance,

Javier.

-- 
Javier Forment Millet
Unidad de Bioinformatica del Laboratorio de Genomica
Instituto de Biologia Molecular y Celular de Plantas
Universidad Politecnica de Valencia
Avenida de los Naranjos, s/n
46022 Valencia (Spain)
Tlf.(1): +34-963877885
Tlf.(2): 685142553
FAX: +34-963877859
e-mail: jforment at ibmcp.upv.es


From jason.stajich at duke.edu  Tue Feb 28 08:31:00 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 28 Feb 2006 08:31:00 -0500
Subject: [Bioperl-l] Are frac_identical and frac_conserved methods for
	hit or for hsp objects?
In-Reply-To: <44043F77.1010901@ibmcp.upv.es>
References: <44043F77.1010901@ibmcp.upv.es>
Message-ID: 

Personally, I only use these values from HSPs - the Hit methods
require HSPs to be tiled to summarize the bases and I'm not convinced
the method works for all situations.

If you want it summarized to a single value for a query/hit pair, I
would use FASTA or, if you must use BLAST, use WU-BLAST, get the
links path out, and summarize over the set of HSPs in that path.
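
To illustrate why a single per-hit value is delicate, here is a small
plain-Perl sketch that pools identities over several HSPs, weighting by
alignment length. It is not how the Hit methods work internally; the hash
keys 'identical' and 'length' and the function name are placeholders
invented for this example, not Bio::SearchIO accessors. It also makes no
attempt to tile overlapping HSPs, so overlapping regions would be counted
twice.

```perl
use strict;
use warnings;

# Length-weighted fraction of identical positions over a set of HSPs.
# Each HSP is a hash with 'identical' (number of identical positions) and
# 'length' (alignment length). Both keys are placeholders for illustration.
sub pooled_frac_identical {
    my @hsps = @_;
    my ($ident, $len) = (0, 0);
    for my $hsp (@hsps) {
        $ident += $hsp->{identical};
        $len   += $hsp->{length};
    }
    return $len ? $ident / $len : 0;
}

my @hsps = (
    { identical => 90, length => 100 },    # long, well-conserved HSP
    { identical => 10, length => 50  },    # short, poorly conserved HSP
);

# Unweighted mean of the per-HSP fractions, for comparison.
my $plain_mean = 0;
$plain_mean += $_->{identical} / $_->{length} for @hsps;
$plain_mean /= @hsps;

printf "pooled: %.3f  plain mean: %.3f\n",
    pooled_frac_identical(@hsps), $plain_mean;
```

The two summaries differ whenever HSPs have different lengths, which is one
reason to state explicitly which one you report.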

-jason
On Feb 28, 2006, at 7:17 AM, Javier Forment wrote:

> Hi bioperlers... I have some questions when parsing BLAST results.
>
> As far as I know, bioperl documentation for Bio::SearchIO states that
> frac_identical and frac_conserved are methods for hsp objects (e.g.,
> $hsp->frac_identical). I have found that it is also possible to use
> these methods for hit objects (e.g., $hit->frac_identical), since it
> does not give an error, but in this case they don't work properly (I
> think that they work fine with blastn, but not with blastx). So my
> questions are:
>
> 1.- is it right to use $hit->frac_identical and $hit->frac_conserved?
> 2.- if so, how they get the frac_identical for a hit when it has more
> than one HSP (maybe getting the average value for all the hsps)?
> 3.- if so, why they don't work fine sometimes, for example, with  
> blastx?
> 4.- if not, is there any method to get the fraction of identical or
> conserved residues for a hit, other than averaging the corresponding
> values for all the hsps of this hit?
>
> Thanks a lot in advance,
>
> Javier.
>
> -- 
> Javier Forment Millet
> Unidad de Bioinformatica del Laboratorio de Genomica
> Instituto de Biologia Molecular y Celular de Plantas
> Universidad Politecnica de Valencia
> Avenida de los Naranjos, s/n
> 46022 Valencia (Spain)
> Tlf.(1): +34-963877885
> Tlf.(2): 685142553
> FAX: +34-963877859
> e-mail: jforment at ibmcp.upv.es
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From julioallen at hotmail.com  Tue Feb 28 08:22:14 2006
From: julioallen at hotmail.com (James Allen)
Date: Tue, 28 Feb 2006 13:22:14 +0000
Subject: [Bioperl-l] GFF feature start and end points on reverse strand
Message-ID: 

Hello,
I'm retrieving data using the 'features' method of Bio::DB::GFF, and when 
the feature is on the reverse strand (ie = -1) the start and end points are 
flipped, so that 'feature->end' is the smaller number (ie what I consider 
the start point) and 'feature->start' is the larger number.
Is there any way to prevent this behaviour, so that the start value of my 
feature is the same as the start value in my database, regardless of the 
strand?

Thanks,
Julio




From ewijaya at singnet.com.sg  Tue Feb 28 05:01:23 2006
From: ewijaya at singnet.com.sg (Edward WIJAYA)
Date: Tue, 28 Feb 2006 18:01:23 +0800
Subject: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence file (Fasta)
	into Array
Message-ID: 

Hi,

Does Bio::SeqIO have a method specially designed for
reading all the sequences from a FASTA file into an array?

What I currently have is this subroutine, which seems to me
__very inefficient__. I was wondering whether
there is a better way to achieve it.


sub get_sequence_from_fasta {
     my $file = shift;
     my @seqs= ();

     open INFILE, "<$file" or die "$0:  Can't open file $file: $!";
     my $in = Bio::SeqIO->new(-format => 'fasta',
                              -noclose => 1 ,
                              -fh => \*INFILE);

     while ( my $seq = $in->next_seq() ) {
        push @seqs, $seq->seq();
     }
     return @seqs;
}


BTW, I also have tried to do this. I thought
this might be a better way to do the above job.
but it doesn't work.

sub get_sequence_from_fasta_that_doesnot_work {
     my $file = shift;
     open my $fh, '<', $file or die "$0:  Can't open file $file: $!";
     my $in = Bio::SeqIO->newFh( -format => 'fasta', -fh => $fh );
     return <$in>;
}

Hope to hear from you again.

--
Regards,
Edward WIJAYA
SINGAPORE


From lstein at cshl.edu  Tue Feb 28 10:08:27 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 28 Feb 2006 10:08:27 -0500
Subject: [Bioperl-l] GFF feature start and end points on reverse strand
In-Reply-To: 
References: 
Message-ID: <200602281008.28373.lstein@cshl.edu>

Call the absolute(1) method, which turns off relative addressing.

Lincoln
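
A minimal sketch of a fallback for scripts that only need the lower and higher coordinate of a feature, whichever order start and end arrive in. The helper low_high is hypothetical; the commented lines assume a Bio::DB::GFF handle $db and a feature $feature, and the absolute(1) call is the one suggested in the reply above.

```perl
use strict;
use warnings;

# Hypothetical helper: return (low, high) whatever order the two
# coordinates arrive in.
sub low_high {
    my ( $start, $end ) = @_;
    return $start <= $end ? ( $start, $end ) : ( $end, $start );
}

# In a Bio::DB::GFF script this might be applied as
#   $db->absolute(1);    # turn off relative addressing, as suggested above
#   my ( $low, $high ) = low_high( $feature->start, $feature->end );

# Reverse-strand style input, with start larger than end.
my ( $low, $high ) = low_high( 5000, 4200 );
print "low=$low high=$high\n";
```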

On Tuesday 28 February 2006 08:22, James Allen wrote:
> Hello,
> I'm retrieving data using the 'features' method of Bio::DB::GFF, and when
> the feature is on the reverse strand (ie = -1) the start and end points are
> flipped, so that 'feature->end' is the smaller number (ie what I consider
> the start point) and 'feature->start' is the larger number.
> Is there anyway to prevent this behaviour, so that the start value of my
> feature is the same as the start value in my database, regardless of the
> strand?
>
> Thanks,
> Julio
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From jason.stajich at duke.edu  Tue Feb 28 12:36:34 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 28 Feb 2006 12:36:34 -0500
Subject: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence file
	(Fasta) into Array
In-Reply-To: 
References: 
Message-ID: <1207BA95-F0E8-4049-8AAE-4B84964D147B@duke.edu>


On Feb 28, 2006, at 5:01 AM, Edward WIJAYA wrote:

> Hi,
>
> Does Bio::SeqIO has a method  specially designed for
> reading all the sequences from a fasta file into array.
>
no but feel free to contribute one.
> What I have currently is this subroutine, it seems to me
> __very inefficient__. I was wondering
> is there a better way to achieve it.
>
Do you have a reason to think this is the slow part of your algorithm  
or are you just going on a gut reaction?  There is certainly overhead  
in calling a method but I am pretty sure that it isn't that  
significant, depends on how many sequences you are reading in I guess.

Just write a next_seq_array method and have it put the seqs onto an  
array within the method and do a benchmark test to show that it is  
faster.

-jason
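
A minimal sketch of the kind of benchmark suggested above, using the core Benchmark module. To stay dependency-free it compares two hypothetical stand-in readers (read_by_line and read_by_split) on made-up in-memory data rather than Bio::SeqIO itself; for a real test the two subs would wrap the next_seq loop and the proposed array method.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Build a small in-memory FASTA data set (made-up sequences).
my $fasta = join '', map { ">seq$_\nACGTACGTAC\nGGCCTTAA\n" } 1 .. 200;

# Variant A: walk the data line by line, one record at a time.
sub read_by_line {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my @seqs;
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>/ ) { push @seqs, '' }
        elsif (@seqs)        { $seqs[-1] .= $line }
    }
    close $fh;
    return @seqs;
}

# Variant B: split the whole text on record boundaries in one go.
sub read_by_split {
    my ($text) = @_;
    my @seqs;
    for my $record ( split /^>/m, $text ) {
        next unless length $record;
        my ( undef, @lines ) = split /\n/, $record;    # drop the header line
        push @seqs, join '', @lines;
    }
    return @seqs;
}

# Run each variant 2000 times and print a comparison table.
cmpthese( 2000, {
    by_line  => sub { my @s = read_by_line($fasta) },
    by_split => sub { my @s = read_by_split($fasta) },
} );
```

Numbers from such a run only show which variant is faster on this data and machine; whether the difference matters depends on how much of the total run time is spent reading.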
>
> sub get_sequence_from_fasta {
>      my $file = shift;
>      my @seqs= ();
>
>      open INFILE, "<$file" or die "$0:  Can't open file $file: $!";
>      my $in = Bio::SeqIO->new(-format => 'fasta',
>                               -noclose => 1 ,
>                               -fh => \*INFILE);
>
>      while ( my $seq = $in->next_seq() ) {
>         push @seqs, $seq->seq();
>      }
>      return @seqs;
> }
>
>
> BTW, I also have tried to do this. I thought
> this might be a better way to do the above job.
> but it doesn't work.
>
> sub get_sequence_from_fasta_that_doesnot_work {
>      my $file = shift;
>       open my fh, "<$file" or die "$0:  Can't open file $file: $!";
>      my $in = Bio::SeqIO->newFh( -format => 'fasta', -fh => $fh );
>      return <$in>;
> }
>
> Hope to hear from you again.
>
> --
> Regards,
> Edward WIJAYA
> SINGAPORE
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Tue Feb 28 13:50:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 28 Feb 2006 12:50:50 -0600
Subject: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence file(Fasta)
	into Array
In-Reply-To: <1207BA95-F0E8-4049-8AAE-4B84964D147B@duke.edu>
Message-ID: <002001c63c97$e57f20c0$15327e82@pyrimidine>

Is there any particular reason why you aren't opening the file directly with
Bio::SeqIO?  

 sub get_sequence_from_fasta {
      my $file = shift;
      my @seqs= ();
      my $in = Bio::SeqIO->new(-format => 'fasta',
                               -file => "<$file");
      while ( my $seq = $in->next_seq() ) {
         push @seqs, $seq->seq();
      }
      return @seqs;
 }

I'm not completely sure of your intent here, but I think if you want to use
a globbed filehandle this way you need to open the file before entering the
sub then pass the filehandle to the sub.  I'm not sure why you pass the file
name, open the file, attach the file handle, parse the seqs, then return an
array?  Or am I missing something here?

Also, read:

http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples

which explains that loading arrays can be memory-intensive if the seqs are
big.
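
A minimal sketch of the alternative to loading everything into an array: an iterator that hands back one record per call, so only the current sequence is held in memory. The helper fasta_iterator is hypothetical and dependency-free; it is not part of Bio::SeqIO, whose next_seq loop plays the same role.

```perl
use strict;
use warnings;

# Hypothetical helper: return a closure that yields one [id, sequence]
# pair per call and undef at end of input.
sub fasta_iterator {
    my ($fh) = @_;
    my $pending;    # id of the next record, read while finishing the previous one
    return sub {
        my ( $id, $seq ) = ( $pending, '' );
        undef $pending;
        while ( my $line = <$fh> ) {
            chomp $line;
            if ( $line =~ /^>(\S*)/ ) {
                if ( defined $id ) { $pending = $1; return [ $id, $seq ] }
                $id = $1;
            }
            elsif ( defined $id ) {
                $seq .= $line;
            }
        }
        return defined $id ? [ $id, $seq ] : undef;
    };
}

# Made-up data read from an in-memory filehandle.
my $data = ">a desc\nACGT\nAC\n>b\nGGTT\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $next = fasta_iterator($fh);
while ( my $rec = $next->() ) {
    printf "%s\t%d\n", $rec->[0], length $rec->[1];
}
```

A caller that really does need every sequence at once can still push each record onto an array inside the loop.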

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Tuesday, February 28, 2006 11:37 AM
> To: Edward WIJAYA
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence
> file(Fasta) into Array
> 
> 
> On Feb 28, 2006, at 5:01 AM, Edward WIJAYA wrote:
> 
> > Hi,
> >
> > Does Bio::SeqIO has a method  specially designed for
> > reading all the sequences from a fasta file into array.
> >
> no but feel free to contribute one.
> > What I have currently is this subroutine, it seems to me
> > __very inefficient__. I was wondering
> > is there a better way to achieve it.
> >
> Do you have a reason to think this is the slow part of your algorithm
> or are you just going on a gut reaction?  There is certainly overhead
> in calling a method but I am pretty sure that it isn't that
> significant, depends on how many sequences you are reading in I guess.
> 
> Just write a next_seq_array method and have it put the seqs onto an
> array within the method and do a benchmark test to show that it is
> faster.
> 
> -jason
> >
> > sub get_sequence_from_fasta {
> >      my $file = shift;
> >      my @seqs= ();
> >
> >      open INFILE, "<$file" or die "$0:  Can't open file $file: $!";
> >      my $in = Bio::SeqIO->new(-format => 'fasta',
> >                               -noclose => 1 ,
> >                               -fh => \*INFILE);
> >
> >      while ( my $seq = $in->next_seq() ) {
> >         push @seqs, $seq->seq();
> >      }
> >      return @seqs;
> > }
> >
> >
> > BTW, I also have tried to do this. I thought
> > this might be a better way to do the above job.
> > but it doesn't work.
> >
> > sub get_sequence_from_fasta_that_doesnot_work {
> >      my $file = shift;
> >       open my fh, "<$file" or die "$0:  Can't open file $file: $!";
> >      my $in = Bio::SeqIO->newFh( -format => 'fasta', -fh => $fh );
> >      return <$in>;
> > }
> >
> > Hope to hear from you again.
> >
> > --
> > Regards,
> > Edward WIJAYA
> > SINGAPORE
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From pterry2 at unlnotes.unl.edu  Tue Feb 28 13:53:11 2006
From: pterry2 at unlnotes.unl.edu (Philip M Terry)
Date: Tue, 28 Feb 2006 12:53:11 -0600
Subject: [Bioperl-l] Bioperl use question
Message-ID: 


Hello,

Is this an appropriate mailing list for this question?

I am trying Test 4 from the Tisdale book, p-299, "Mastering Perl for
Bioinformatics".

Comparing screen output from p-303 of the Tisdale book for bp1.pl with
mine:

philip-terrys-power-mac-g5:~/perl_prac/pgms/source mterry$ ./bp1.pl
Sequence name is AI129902
Sequence acc  is AI129902
First 5 bases is CTCCG

-------------------- WARNING ---------------------
MSG: acc (gb|3598416) does not exist
---------------------------------------------------
Submitted Blast for [ROA1_HUMAN]
philip-terrys-power-mac-g5:~/perl_prac/pgms/source mterry$

Two questions:
i. why the warning message in my screen output?
ii. my Blast fails, that is,
--I don't see "dots" on the output line on screen following "Submitted
Blast for [ROA1_HUMAN]"?
--my output file, blast.out has 0 KB in it?

My computer system:
Power Mac G5, OS X 10.4.5, installed "core" bioperl, that is,
sudo perl -MCPAN -e shell;
cpan> force install B/BI/BIRNEY/bioperl-1.4.tar.gz

Can you comment?

Thanks,
Philip M. Terry, Ph.D.
University of Nebraska-Lincoln



From staffa at niehs.nih.gov  Tue Feb 28 15:01:42 2006
From: staffa at niehs.nih.gov (staffa)
Date: Tue, 28 Feb 2006 15:01:42 -0500
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
References: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
Message-ID: 

Hello,
Does anyone know if Bio::Tools::SeqWords
count_words
or
count_overlap_words
will do DNA pattern searches and honor ambiguity symbols
like exist in some restriction enzyme pattern definitions,
e.g. GGnnCC


> Thank you.
>
> Nick Staffa
> Telephone: 919-316-4569  (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina

From torsten.seemann at infotech.monash.edu.au  Tue Feb 28 16:45:16 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 01 Mar 2006 08:45:16 +1100
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: 
References: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
	
Message-ID: <4404C46C.4010005@infotech.monash.edu.au>

Nick

> Does anyone know if Bio::Tools::SeqWords
> *count_words
> or
> count_overlap_words
> will do DNA pattern searches and honor ambiguity symbols
> like exist in some restriction enzyme pattern definitions,
> e.g. GGnnCC*

Examination of the code

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqWords.html#CODE4

suggests that all it does is count N-mers of any set of letters,
and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
the same N-mer.

So no it does not handle ambiguity symbols in any special manner.

What would you like it to do?
If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
it could be?
If it has 2 "N"s in it, does it count toward all 16 possible 
non-ambiguous N-mers?
And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From torsten.seemann at infotech.monash.edu.au  Tue Feb 28 17:01:38 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 01 Mar 2006 09:01:38 +1100
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D08507@NIHCESMLBX6.nih.gov>
References: <7930EE6CD7CA354D93B444D0433C061101D08507@NIHCESMLBX6.nih.gov>
Message-ID: <4404C842.2050608@infotech.monash.edu.au>

Staffa, Nick (NIH/NIEHS) [C] wrote:
> Yes 
> N matches any of the four bases.

It's still not clear to me what you want.

For simplicity, let's say we are counting words of length 1,
(which means overlapping and non-overlapping are the same)
and our sequence is "AGTN" (ie. 4 letters long)

The module would return the following
{ A=>1, G=>1, T=>1, N=>1 }    # sum of counts = 4

But you want it to return this?
{ A=>2, G=>2, T=>2, C=>1 }    # sum of counts = 7
ie. the N contributes 1 A, 1 G, 1 T and 1 C (and 0 N)

And correspondingly for all the possible ambiguity codes?

And if the word length was 2, then if we encountered a "NN"
it would add 16 to the total count ie. 1 AA, 1 AT, 1 AC etc?
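
A minimal sketch of the counting scheme described above, in which each ambiguous word adds 1 to every unambiguous word it could stand for. The helper count_expanded_words and the %IUPAC table are hypothetical and written in plain Perl; this is not what Bio::Tools::SeqWords does. For "AGTN" with word length 1 it gives the second tally shown above.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => ['A'], C => ['C'], G => ['G'], T => ['T'],
    R => [qw(A G)], Y => [qw(C T)], S => [qw(C G)], W => [qw(A T)],
    K => [qw(G T)], M => [qw(A C)],
    B => [qw(C G T)], D => [qw(A G T)], H => [qw(A C T)], V => [qw(A C G)],
    N => [qw(A C G T)],
);

# Hypothetical helper: count overlapping words of length $k, letting each
# ambiguous word add 1 to every unambiguous word it could represent.
sub count_expanded_words {
    my ( $seq, $k ) = @_;
    my %count;
    $seq = uc $seq;
    for my $i ( 0 .. length($seq) - $k ) {
        my @words = ('');
        for my $code ( split //, substr( $seq, $i, $k ) ) {
            my $bases = $IUPAC{$code} or die "Unknown symbol '$code'";
            # extend every partial word by every base this code allows
            @words = map { my $w = $_; map { $w . $_ } @$bases } @words;
        }
        $count{$_}++ for @words;
    }
    return \%count;
}

my $counts = count_expanded_words( 'AGTN', 1 );
print join( ', ', map { "$_=>$counts->{$_}" } sort keys %$counts ), "\n";
```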

>>Does anyone know if Bio::Tools::SeqWords
>>*count_words
>>or
>>count_overlap_words
>>will do DNA pattern searches and honor ambiguity symbols
>>like exist in some restriction enzyme pattern definitions,
>>e.g. GGnnCC*

> suggests that all it does is count N-mers of any set of letters,
> and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
> the same N-mer.
> So no it does not handle ambiguity symbols in any special manner.
> What would you like it to do?
> If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
> it could be?
> If it has 2 "N"s in it, does it count toward all 16 possible 
> non-ambiguous N-mers?
> And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From staffa at niehs.nih.gov  Tue Feb 28 16:46:30 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Tue, 28 Feb 2006 16:46:30 -0500
Subject: [Bioperl-l] seq_word and pattern counts
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D08507@NIHCESMLBX6.nih.gov>

Yes 
N matches any of the four bases.

Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field (field1 at niehs.nih.gov ))
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina




-----Original Message-----
From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
Sent: Tuesday, February 28, 2006 4:45 PM
To: Staffa, Nick (NIH/NIEHS) [C]
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] seq_word and pattern counts


Nick

> Does anyone know if Bio::Tools::SeqWords
> *count_words
> or
> count_overlap_words
> will do DNA pattern searches and honor ambiguity symbols
> like exist in some restriction enzyme pattern definitions,
> e.g. GGnnCC*

Examination of the code

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqWords.html#CODE4

suggests that all it does is count N-mers of any set of letters,
and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
the same N-mer.

So no it does not handle ambiguity symbols in any special manner.

What would you like it to do?
If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
it could be?
If it has 2 "N"s in it, does it count toward all 16 possible 
non-ambiguous N-mers?
And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From staffa at niehs.nih.gov  Tue Feb 28 17:08:40 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Tue, 28 Feb 2006 17:08:40 -0500
Subject: [Bioperl-l] seq_word and pattern counts
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D08508@NIHCESMLBX6.nih.gov>

The real problem is this:
We want to count sites in a long sequence where a restriction enzyme would cut.
This restriction enzyme, in the example I gave will recognize GGnnCC,
that is two G separated by two of any bases followed by two C.

The GCG program findpatterns will do this, but bioperl makes certain statistics easy.
I'm sure there is some module somewhere for this purpose. 
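
A minimal sketch of counting such sites with a plain Perl regular expression, assuming it is enough to translate the IUPAC pattern into character classes and count every position where it matches, including overlapping ones. The helper count_sites and the %CLASS table are hypothetical, not a BioPerl API. GGNNCC is its own reverse complement, so one strand suffices here; a non-palindromic pattern would also need its reverse complement scanned.

```perl
use strict;
use warnings;

# IUPAC codes written as regex character classes.
my %CLASS = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    R => '[AG]', Y => '[CT]', S => '[CG]', W => '[AT]',
    K => '[GT]', M => '[AC]',
    B => '[CGT]', D => '[AGT]', H => '[ACT]', V => '[ACG]',
    N => '[ACGT]',
);

# Hypothetical helper: count (possibly overlapping) occurrences of an
# IUPAC pattern such as GGNNCC on the given strand of $seq.
sub count_sites {
    my ( $pattern, $seq ) = @_;
    my $regex = '';
    for my $code ( split //, uc $pattern ) {
        die "Unknown symbol '$code' in pattern" unless exists $CLASS{$code};
        $regex .= $CLASS{$code};
    }
    # The zero-width lookahead lets matches overlap; the empty-list
    # assignment counts them.
    my $count = () = $seq =~ /(?=$regex)/gi;
    return $count;
}

# Made-up sequence for illustration.
print count_sites( 'GGNNCC', 'ttGGATCCaaGGGGCCCCtt' ), "\n";
```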





Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field (field1 at niehs.nih.gov ))
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina




-----Original Message-----
From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
Sent: Tuesday, February 28, 2006 5:02 PM
To: Staffa, Nick (NIH/NIEHS) [C]
Cc: bioperl-l
Subject: Re: [Bioperl-l] seq_word and pattern counts


Staffa, Nick (NIH/NIEHS) [C] wrote:
> Yes 
> N matches any of the four bases.

It's still not clear what you want to me.

For simplicity, let's say we are counting words of length 1,
(which means overlapping and non-overlapping are the same)
and our sequence is "AGTN" (ie. 4 letters long)

The module would return the following
{ A=>1, G=>1, T=>1, N=>1 }    # sum of counts = 4

But you want it to return this?
{ A=>2, G=>2, T=>2, C=>1 }    # sum of counts = 7
ie. the N contributes 1 A, 1 G, 1 T and 1 C (and 0 N)

And correspondingly for all the possible ambiguity codes?

And if the word length was 2, then if we encoutered a "NN"
it would add 16 to the total count ie. 1 AA, 1 AT, 1 AC etc?

>>Does anyone know if Bio::Tools::SeqWords
>>*count_words
>>or
>>count_overlap_words
>>will do DNA pattern searches and honor ambiguity symbols
>>like exist in some restriction enzyme pattern definitions,
>>e.g. GGnnCC*

> suggests that all it does is count N-mers of any set of letters,
> and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
> the same N-mer.
> So no it does not handle ambiguity symbols in any special manner.
> What would you like it to do?
> If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
> it could be?
> If it has 2 "N"s in it, does it count toward all 16 possible 
> non-ambiguous N-mers?
> And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From torsten.seemann at infotech.monash.edu.au  Tue Feb 28 17:47:01 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 01 Mar 2006 09:47:01 +1100
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D08508@NIHCESMLBX6.nih.gov>
References: <7930EE6CD7CA354D93B444D0433C061101D08508@NIHCESMLBX6.nih.gov>
Message-ID: <4404D2E5.4090405@infotech.monash.edu.au>

Staffa, Nick (NIH/NIEHS) [C] wrote:
> The real problem is this:
> We want to count sites in a long sequence where a restriction enzyme would cut.
> This restriction enzyme, in the example I gave will recognize GGnnCC,
> that is two G separated by two of any bases followed by two C.
> The GCG program findpatterns will do this, but bioperl makes certain statistics easy.
> I'm sure there is some module somewhere for this purpose. 

(Nick - please respond to me AND the bioperl-l at bioperl.org mailing list 
ie. "Reply All", so others can benefit from the Q&A - I've re-sent your 
past responses already).

Perhaps this module?

http://doc.bioperl.org/bioperl-live/Bio/Tools/RestrictionEnzyme.html

With this code?

use Bio::Tools::RestrictionEnzyme;

my $enz = "GGNNCC";
my $re = Bio::Tools::RestrictionEnzyme->new(-NAME => "NicksResEnz--$enz",
                                            -MAKE => 'custom');
my @fragments = $re->cut_seq($seqobj);
# a linear sequence cut N times yields N+1 fragments
print "$enz cuts ", $seqobj->display_id, " ", scalar(@fragments) - 1,
      " times.\n";

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From cjfields at uiuc.edu  Tue Feb 28 21:41:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 28 Feb 2006 20:41:08 -0600
Subject: [Bioperl-l] WGS sequences through Bio::DB::GenBank
Message-ID: <000001c63cd9$98988520$15327e82@pyrimidine>

I know that a recent post showed that you could retrieve CONTIG sequences
from GenBank files fairly easily:

http://bioperl.org/pipermail/bioperl-l/2006-February/020891.html

I'm driving myself a bit buggy looking for this, and I may be blind to it,
but can the same be done with WGS files?  I've tried Bio::DB::GenBank and a
few other Bio::DB* modules to see if it's been implemented but haven't had
any luck yet.  I may try getting around it using Bio::DB::Query::GenBank,
but just trying to find a more direct route.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From chandan.kr.singh at gmail.com  Thu Feb  2 02:26:09 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Thu, 2 Feb 2006 12:56:09 +0530
Subject: [Bioperl-l] Sorry, failure in post on the net,
	so still via email
In-Reply-To: <001001c62793$bef08f70$93656785@zhur>
References: <001001c62793$bef08f70$93656785@zhur>
Message-ID: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>

Hi
It seems that it's not a proxy problem. I tried today and faced the same
problem. It has been months since my last try and therefore something might
have changed.
Try reading more on this problem.
I myself will try to do it.
Regards
Chandan

On 2/2/06, Huang Jian  wrote:
>
> I tried  some "Quick getting started scripts" in bptutorial.
>
> use Bio::Perl;
>   $seq = get_sequence('swiss',"ROA1_HUMAN");
>   # uses the default database - nr in this case
>   $blast_result = blast_sequence($seq);
>   write_blast(">roa1.blast",$blast_result);
>
> It returns "Submitted Blast for [ROA1_HUMAN] "
> It does not return me any error after I run the script.  However, it does
> not
> return me any result either.  The file "roa1.blast" is created but is
> always
> empty.
>
> I found the return is like the code below in function "blast_sequence"
>  if( $verbose ) {
>  print STDERR "Submitted Blast for [".$seq->id."] ";
>     }
>     sleep 5;
> ....
> I have tested "( env_proxy => 1 )" ...The problem remains the same...
>
> Help! By the way, could you send me an invitation letter of gmail, I want
> to have a gmail account too... :-)
>
> Best Regards!
> Jian Huang
>
>



From osborne1 at optonline.net  Thu Feb  2 17:06:25 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 02 Feb 2006 17:06:25 -0500
Subject: [Bioperl-l] Sorry, failure in post on the net,
	so still via email
In-Reply-To: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
Message-ID: 

Chandan,

I'd be interested in what you find. This is not a new problem, this same
code snippet has been mentioned many times, but for many others, like me,
the code always works.

Brian O.


On 2/2/06 2:26 AM, "CHANDAN SINGH"  wrote:

> Hi
> It seems that its not a proxy problem. I tried today and faced the same
> problem. It has been months since my last try and therefore something might
> have changed.
> Try reading more on this problem.
> I myself will try to do it.
> Regards
> Chandan
> 
> On 2/2/06, Huang Jian  wrote:
>> 
>> I tried  some "Quick getting started scripts" in bptutorial.
>> 
>> use Bio::Perl;
>>   $seq = get_sequence('swiss',"ROA1_HUMAN");
>>   # uses the default database - nr in this case
>>   $blast_result = blast_sequence($seq);
>>   write_blast(">roa1.blast",$blast_result);
>> 
>> It returns "Submitted Blast for [ROA1_HUMAN] "
>> It does not return me any error after I run the script.  However, it does
>> not
>> return me any result either.  The file "roa1.blast" is created but is
>> always
>> empty.
>> 
>> I found the return is like the code below in function "blast_sequence"
>>  if( $verbose ) {
>>  print STDERR "Submitted Blast for [".$seq->id."] ";
>>     }
>>     sleep 5;
>> ....
>> I have tested "( env_proxy => 1 )" ...The problem remains the same...
>> 
>> Help! By the way, could you send me an invitation letter of gmail, I want
>> to have a gmail account too... :-)
>> 
>> Best Regards!
>> Jian Huang
>> 
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From nagesh.chakka at anu.edu.au  Thu Feb  2 20:23:50 2006
From: nagesh.chakka at anu.edu.au (Nagesh Chakka)
Date: Fri, 03 Feb 2006 12:23:50 +1100
Subject: [Bioperl-l] RemoteBlast.pm version 1.28
In-Reply-To: <003901c6285e$d1b36670$93656785@zhur>
References: 
	<43E28C39.2060308@anu.edu.au> <003901c6285e$d1b36670$93656785@zhur>
Message-ID: <43E2B0A6.7000307@anu.edu.au>

Hi Huang,
Thanks for the message. The older version of RemoteBlast.pm checks the
size of the temporary file to determine whether the Blast results are
ready. This condition is no longer being satisfied, maybe due to some
changes made by NCBI. I had this problem recently and figured out that
the solution was to use the latest version, which has this problem fixed
(it no longer uses the file-size logic) but is not yet included in the
BioPerl package.
Cheers
Nagesh
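
A minimal sketch of a bounded polling loop, assuming the aim is for a script to give up with a message rather than wait silently forever when results never become ready. The helper poll_until_ready is hypothetical and is not how RemoteBlast.pm is implemented; the check callback stands in for whatever test decides that a report has arrived.

```perl
use strict;
use warnings;

# Hypothetical helper: call $check->() until it returns a defined value or
# $max_tries is reached, sleeping $delay seconds between attempts.
sub poll_until_ready {
    my ( $check, $max_tries, $delay ) = @_;
    for my $try ( 1 .. $max_tries ) {
        my $result = $check->();
        return ( $result, $try ) if defined $result;
        sleep $delay if $delay && $try < $max_tries;
    }
    return ( undef, $max_tries );
}

# Stand-in for a remote job that becomes ready on the third check.
my $calls = 0;
my ( $report, $tries ) =
  poll_until_ready( sub { ++$calls >= 3 ? 'report text' : undef }, 10, 0 );
print defined $report ? "ready after $tries tries\n" : "gave up\n";
```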

Huang Jian wrote:

> Dear Nagesh,
>
> I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send 
> me. Now it works perfectly!!!
>
> Thank you!!
>
> Huang
>
> ----- Original Message ----- From: "Nagesh Chakka" 
> 
> To: "Huang Jian" ; "bioperl-l" 
> 
> Sent: Friday, February 03, 2006 7:48 AM
> Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still 
> via email
>
>
>> Hi Huang,
>> I see that you are submitting a sequence for a remote blast search. Can
>> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If
>> not, I have attached it to this email; try replacing the old one,
>> which has a bug, with it.
>> Let me know if it works.
>> Nagesh
>
>
>
   


From cjfields at uiuc.edu  Fri Feb  3 10:45:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 3 Feb 2006 09:45:23 -0600
Subject: [Bioperl-l] RemoteBlast.pm version 1.28
In-Reply-To: <43E2B0A6.7000307@anu.edu.au>
Message-ID: <001501c628d8$d91cd430$15327e82@pyrimidine>

Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS.  It will
work for saving text output.  However, it will not parse anything using
next_result (it will likely hang) and will not save XML format.  See these
bugs:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

for explanations and possible fixes (changes to RemoteBlast and
Bio::SearchIO::blast).  Note that these haven't been checked in yet so are
still not included in bioperl-live; they may be further modified before
committing to CVS.  If you're not worried about XML, you could just try the
first fix, which is a change to SearchIO::blast.

Nagesh, I remember you posting to the list a month ago using a script which
had problems; the script you used saves the output but doesn't actually
parse it (i.e. you don't use next_result() to go through the data).  Is the
version of BLAST in your text output 2.2.12 or 2.2.13?  Have you tried
parsing the output using "-readmethod => SearchIO" or "-readmethod => blast"
using your version of RemoteBlast and method next_result()? Like below (from
perldoc):  

        while ( my @rids = $factory->each_rid ) {
          foreach my $rid ( @rids ) {
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
              if( $rc < 0 ) {
                $factory->remove_rid($rid);
              }
              print STDERR "." if ( $v > 0 );
              sleep 5;
            } else {                              # parsing starts here
              my $result = $rc->next_result();   # it should hang here
              #save the output
              my $filename = $result->query_name()."\.out";
              $factory->save_output($filename);
              $factory->remove_rid($rid);
              print "\nQuery Name: ", $result->query_name(), "\n";
              while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0);
                print "\thit name is ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                  print "\t\tscore is ", $hsp->score, "\n";
                }
              }
            }
          }
        }


My script hung if I used next_result() in any way prior to the fixes.  I
want to see how many others are having the same issues with parsing using
the CVS version of bioperl-live.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> Sent: Thursday, February 02, 2006 7:24 PM
> To: Huang Jian; bioperl-l
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> 
> Hi Huang,
> Thanks for the message. The older version of RemoteBlast.pm works on the
> logic of checking the temporary file size to determine whether the Blast
> results are ready. This condition is not getting satisfied may be due to
> some changes brought about by NCBI. I had this problem recently and
> figured out that the solution was to use the latest version which has
> this problem fixed (does not use file size logic any more) which is not
> yet included in the BioPerl package.
> Cheers
> Nagesh
> 
> Huang Jian wrote:
> 
> > Dear Nagesh,
> >
> > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send
> > me. Now it works perfectly!!!
> >
> > Thank you!!
> >
> > Huang
> >
> > ----- Original Message ----- From: "Nagesh Chakka"
> > 
> > To: "Huang Jian" ; "bioperl-l"
> > 
> > Sent: Friday, February 03, 2006 7:48 AM
> > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > via email
> >
> >
> >> Hi Huang,
> >> I see that you are submitting a sequence for a remote blast search. Can
> >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If
> >> not I have attached it with this email, try to replace it with the old
> >> one which has a bug.
> >> Let me know if it works.
> >> Nagesh
> >
> >
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From osborne1 at optonline.net  Fri Feb  3 13:05:44 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 03 Feb 2006 13:05:44 -0500
Subject: [Bioperl-l] Documentation in the Bioperl package
Message-ID: 

bioperl-l,

The recent work on the Bioperl Wiki moved much of the Bioperl documentation
online. Since we cannot maintain 2 locations for all of this we'll be
removing a number of files from the package, specifically:

biodatabases.pod   
biodesign.pod    
bioperl.pod   
bioscripts.pod
doc/howto/*
doc/faq/*
FAQ

Rest assured that all of these files have been gone over in detail to make
sure that no important information was lost during the migration. All of
this will be replaced by a single file, such as "README.docs", that explains
where all the documentation is. It's not entirely clear what will happen to
bptutorial.pl. Moving its content to different online locations is possible
but in this case we lose its functionality as a script.

Are there any comments or questions or concerns?

Brian O.




From saldroubi at yahoo.com  Fri Feb  3 13:38:26 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Fri, 3 Feb 2006 10:38:26 -0800 (PST)
Subject: [Bioperl-l] Gibbs sampling algorithm?
Message-ID: <20060203183826.41696.qmail@web34306.mail.mud.yahoo.com>

Hi everyone,

I am wondering if anyone has implemented the Gibbs sampling algorithm for finding motifs, in BioPerl or otherwise.  I saw Bio::Tools::Run::PiseApplication::gibbs, which I believe calls a Gibbs program that is not free open source software.  I would prefer not to write my own Gibbs sampling algorithm if one is already out there.  Any comments are appreciated.

Thank you

Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From cjfields at uiuc.edu  Fri Feb  3 14:34:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 3 Feb 2006 13:34:27 -0600
Subject: [Bioperl-l] Gibbs sampling algorithm?
In-Reply-To: <20060203183826.41696.qmail@web34306.mail.mud.yahoo.com>
Message-ID: <001901c628f8$d89917b0$15327e82@pyrimidine>

Do you mean this Gibbs program?

ftp://ncbi.nlm.nih.gov/pub/neuwald/ 

You can also request a license from the Gibbs Motif Sampler homepage, which
is more up to date:

http://bayesweb.wadsworth.org/gibbs/gibbs.html

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sam Al-Droubi
> Sent: Friday, February 03, 2006 12:38 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Gibbs sampling algorithm?
> 
> Hi everyone,
> 
> I am wondering if anyone has implemented the Gibbs sampling algorithm in
> BioPerl or otherwise for finding motifs.  I saw
> Bio::Tools::Run::PiseApplication::gibbs which I believe calls a Gibbs
> program which is not free open source, I think.   I prefer not to write my
> own Gibbs sampling algorithm if it is already out there.  Any comments are
> appreciated.
> 
> Thank you
> 
> Sincerely,
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From gyang at plantbio.uga.edu  Fri Feb  3 14:44:50 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Fri, 03 Feb 2006 14:44:50 -0500
Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
In-Reply-To: <001501c628d8$d91cd430$15327e82@pyrimidine>
Message-ID: <20060203194450.792e8d4e@dogwood.plantbio.uga.edu>

Hi, Everybody,  
I see this post and am wondering if this is the reason for the malfunctioning of my web server. We set up a web server named MAK for MITE sequence analysis. It was working very well until around November 2005, when it stopped returning any results (the site is up and seems to be doing something after submission).  In the CGI script, I used RemoteBlast (that work was done in 2003) to do the searches. I currently do not have access to the server because I moved. Quite a few people have sent us emails about its malfunctioning. Are there any suggestions for fixing the problem?  Should I simply ask for RemoteBlast.pm to be replaced with the new version?  
Thanks a lot,  
Guojun

Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun
      _____  

  From: Chris Fields [mailto:cjfields at uiuc.edu]
To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-l at bioperl.org]
Sent: Fri, 03 Feb 2006 10:45:23 -0500
Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28

Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It will
work for saving text output. However, it will not parse anything using
next_result (it will likely hang) and will not save XML format. See these
bugs:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

for explanations and possible fixes (changes to RemoteBlast and
Bio::SearchIO::blast). Note that these haven't been checked in yet so are
still not included in bioperl-live; they may be further modified before
committing to CVS. If you're not worried about XML, you could just try the
first fix, which is a change to SearchIO::blast.

Nagesh, I remember you posting to the list a month ago using a script which
had problems; the script you used saves the output but doesn't actually
parse it (i.e. you don't use next_result() to go through the data). Is the
version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
parsing the output using "-readmethod => SearchIO" or "-readmethod => blast"
using your version of RemoteBlast and method next_result()? Like below (from
perldoc): 

while ( my @rids = $factory->each_rid ) {
  foreach my $rid ( @rids ) {
    my $rc = $factory->retrieve_blast($rid);
    if( !ref($rc) ) {
      if( $rc < 0 ) {
        $factory->remove_rid($rid);
      }
      print STDERR "." if ( $v > 0 );
      sleep 5;
    } else {                              # parsing starts here
      my $result = $rc->next_result();    # it should hang here
      #save the output
      my $filename = $result->query_name()."\.out";
      $factory->save_output($filename);
      $factory->remove_rid($rid);
      print "\nQuery Name: ", $result->query_name(), "\n";
      while ( my $hit = $result->next_hit ) {
        next unless ( $v > 0);
        print "\thit name is ", $hit->name, "\n";
        while( my $hsp = $hit->next_hsp ) {
          print "\t\tscore is ", $hsp->score, "\n";
        }
      }
    }
  }
}


My script hung whenever I used next_result() prior to the fixes. I
want to see how many others are having the same issues with parsing using
the CVS version of bioperl-live.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 



From gbazykin at Princeton.EDU  Fri Feb  3 15:38:04 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Fri, 3 Feb 2006 15:38:04 -0500
Subject: [Bioperl-l] proposed additions to Tree and cladogram
In-Reply-To: <148174979677.20051026172707@princeton.edu>
References: <148174979677.20051026172707@princeton.edu>
Message-ID: <8010525745.20060203153804@princeton.edu>

Hi all,

a while ago, I mailed to bioperl-l some proposed additions to
phylogeny-related modules (see below). I am doing a project on HIV
phylogeny now, and rely on these additions heavily. They expand on
what was already present in the corresponding modules. I expected them
to be also of general usage (at least the first one).

However, I never got any answer, so I assumed that these additions
were considered superfluous by most.

I am now working on an addition to the Tree::Draw::Cladogram module. For
my project, I need to color individual tree edges (including internal ones)
on a scale from red to blue (according to the nonsynonymous/synonymous
ratios of these branches). This should be technically easy (I guess I
will add -Rcolor, -Gcolor and -Bcolor tags to nodes and use them in
Cladogram to color the preceding edges), but I have two questions:

    - will this add-on be of general interest - should I try to do it
    "the right way", updating the pods etc.;
    
    - in general, are there any guidelines about how specific an issue
    a method should address to be included in bioperl distribution?

Thanks,
Yegor Bazykin
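
The red-to-blue coloring described above could be sketched roughly as
follows. This is only an illustration in plain Perl, not code from
Tree::Draw::Cladogram: ratio_to_rgb, the bounds passed to it, and the choice
of blue for low ratios and red for high ones are all assumptions. The three
returned values are what one would presumably store in the proposed -Rcolor,
-Gcolor and -Bcolor tags.

```perl
use strict;
use warnings;

# Map a value onto a blue-to-red gradient and return (R, G, B) in 0..255.
# $min and $max are hypothetical bounds chosen by the caller; values outside
# that range are clamped to the nearest end of the gradient.
sub ratio_to_rgb {
    my ($value, $min, $max) = @_;
    die "max must be greater than min\n" unless $max > $min;
    my $t = ($value - $min) / ($max - $min);
    $t = 0 if $t < 0;
    $t = 1 if $t > 1;
    my $red  = int(255 * $t + 0.5);    # round to the nearest integer
    my $blue = 255 - $red;
    return ($red, 0, $blue);
}

for my $ratio (0, 0.5, 1, 2) {
    my ($red, $green, $blue) = ratio_to_rgb($ratio, 0, 2);
    printf "ratio %.1f -> R=%d G=%d B=%d\n", $ratio, $red, $green, $blue;
}
```

A linear scale is the simplest choice; since the ratios can span orders of
magnitude, mapping log(ratio) instead may give a more readable picture.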



This is a forwarded message
From: Georgii Bazykin 
To: bioperl-l at bioperl.org
Date: Wednesday, October 26, 2005, 4:27:07 PM
Subject: suggestions for additions to Tree

===8<==============Original message text===============
Hi,

here are some tree-related methods I needed and added to my bioperl.
Hope someone else finds any of them useful as well.

Yegor Bazykin



=============================================
To NodeI:


# modified from total_branch_length in the Bio::Tree::Tree module
# gets the sum of branch lengths in the subtree below the given node

=head2 children_branch_length

 Title   : children_branch_length
 Usage   : my $size = $node->children_branch_length
 Function: Returns the sum of the lengths of all branches of the subtree which starts at the given node
 Returns : number
 Args    : none

=cut

sub children_branch_length {
   my ($self) = @_;
   
   return 0 if($self -> is_Leaf) ;

   my $sum = 0;

   for ($self -> get_all_Descendents) {
       $sum += $_->branch_length || 0;
   }

   return $sum;
}


-----------------------------------

=head2 height_nodes

 Title   : height_nodes
 Usage   : my $len = $node->height_nodes
 Function: Returns the height of the tree starting at this
           node, counted in nodes rather than in branch lengths.  Height is
           the maximum number of branches (steps between nodes) to a tip.
 Returns : The longest path to a leaf, in nodes
 Args    : none

=cut

sub height_nodes{
   my ($self) = @_;
   
   return 0 if( $self->is_Leaf );

   my $max = 0;
   foreach my $subnode ( $self->each_Descendent ) { 
       my $s = $subnode->height_nodes + 1;
       if( $s > $max ) { $max = $s; }
   }
   return $max;
}



----------------------------------

=head2 get_all_Descendent_Leaves

 Title   : get_all_Descendent_Leaves($sortby)
 Usage   : my @nodes = $node->get_all_Descendent_Leaves;
 Function: Recursively fetch all the descendents of this node, keeping only the leaves
           *NOTE* This is different from each_Descendent
 Returns : Array of Bio::Tree::NodeI objects
 Args    : $sortby [optional] "height", "creation" or coderef to be used
           to sort the order of children nodes.

=cut

sub get_all_Descendent_Leaves{
   my ($self, $sortby) = @_;
   $sortby ||= 'height';   
   my @nodes;
   foreach my $node ( $self->each_Descendent($sortby) ) {
       if ($node->is_Leaf) {
           push @nodes, $node;
       }
       else {
           # recurse with this same method so that only leaves are collected
           push @nodes, ($node->get_all_Descendent_Leaves($sortby));
       }
   }
   return @nodes;
} 

=====================================================
To Tree:

=head2 total_internal_branch_length

 Title   : total_internal_branch_length
 Usage   : my $size = $tree->total_internal_branch_length
 Function: Returns the sum of the lengths of all branches, excluding branches leading to leaves
 Returns : number
 Args    : none

=cut

sub total_internal_branch_length {
   my ($self) = @_;
   my $sum = 0;
   if( defined $self->get_root_node ) {
       for ( $self->get_root_node->get_Descendents() ) {
           unless ($_->is_Leaf) {       # YB: THIS IS ALL I ADDED
               $sum += $_->branch_length || 0;
           }
       }
   }
   return $sum;
} 


=================================================

To TreeFunctionsI:

=head2 distance_nodes

 Title   : distance_nodes
 Usage   : distance_nodes(-nodes => \@nodes )
 Function: returns the distance between two given nodes in numbers of nodes
 Returns : numerical distance
 Args    : -nodes => arrayref of nodes to test

=cut


# YB: distance_nodes is very similar to distance method in TreeFunctionsI except that 
# it estimates distances between nodes in numbers of nodes (e.g., 1 between mother and 
# daughter, 2 between two sisters, etc.)


sub distance_nodes {
    my ($self,@args) = @_;
    my ($nodes) = $self->_rearrange([qw(NODES)],@args);
    if( ! defined $nodes ) {
        $self->warn("Must supply -nodes parameter to distance_nodes() method");
        return undef;
    }
    my ($node1,$node2) = $self->_check_two_nodes($nodes);
    # algorithm:

    # Find lca: Start with first node, find and save every node from it
    # to root, saving cumulative distance. Then start with second node;
    # for it and each of its ancestor nodes, check to see if it's in
    # the first node's ancestor list - if so it is the lca. Return sum
    # of (cumul. distance from node1 to lca) and (cumul. distance from
    # node2 to lca)

    # find and save every ancestor of node1 (including itself)

    my %node1_ancestors;        # keys are internal ids, values are objects
    my %node1_cumul_dist;       # keys are internal ids, values 
    # are cumulative distance from node1 to given node
    my $place = $node1;         # start at node1
    my $cumul_dist = 0;

    while ( $place ){
        $node1_ancestors{$place->internal_id} = $place;
        $node1_cumul_dist{$place->internal_id} = $cumul_dist;
        $cumul_dist++;                                                # YB
#YB     if ($place->branch_length) {
#YB         $cumul_dist += $place->branch_length; # include current branch
#YB                                               # length in next iteration
#YB     }
        $place = $place->ancestor;
    }

    # now climb up node2, for each node checking whether 
    # it's in node1_ancestors
    $place = $node2;  # start at node2
    $cumul_dist = 0;
    while ( $place ){
        foreach my $key ( keys %node1_ancestors ){ # ugh
            if ( $place->internal_id == $key){ # we're at lca
                return $node1_cumul_dist{$key} + $cumul_dist;
            }
        }
        # include current branch length in next iteration
#YB     $cumul_dist += $place->branch_length || 0; 
        $cumul_dist++;                                                 # YB
        $place = $place->ancestor;
    }
    $self->warn("Could not find distance!"); # should never execute, 
    # if so, there's a problem
    return undef;
}
===8<===========End of original message text===========
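
The two ideas in the forwarded message that are easiest to get wrong
(collecting only the leaves below a node, and counting the distance between
two nodes in steps rather than in branch lengths) can be tried out on a toy
tree without BioPerl. The sketch below is an illustration only: it uses a
plain hash of parent pointers instead of Bio::Tree::NodeI objects, and
descendent_leaves and distance_in_nodes are hypothetical names, not the
proposed methods themselves.

```perl
use strict;
use warnings;

# Toy tree, stored as child => parent (the root has no parent):
#
#        root
#       /    \
#      n1     C
#     /  \
#    A    B
my %parent = (n1 => 'root', C => 'root', A => 'n1', B => 'n1');

# Derive the children lists from the parent map.
my %children;
push @{ $children{ $parent{$_} } }, $_ for sort keys %parent;

# Collect only the leaves below a node, recursing with the same function.
sub descendent_leaves {
    my ($node) = @_;
    my @leaves;
    for my $child ( @{ $children{$node} || [] } ) {
        if ( $children{$child} ) {
            push @leaves, descendent_leaves($child);
        }
        else {
            push @leaves, $child;
        }
    }
    return @leaves;
}

# Number of steps between two nodes, via their lowest common ancestor.
sub distance_in_nodes {
    my ($node1, $node2) = @_;

    # Record the step count from node1 to itself and to each ancestor.
    my %steps_from_node1;
    my $steps = 0;
    for ( my $place = $node1; defined $place; $place = $parent{$place} ) {
        $steps_from_node1{$place} = $steps++;
    }

    # Climb from node2 until we meet one of those ancestors.
    $steps = 0;
    for ( my $place = $node2; defined $place; $place = $parent{$place} ) {
        return $steps_from_node1{$place} + $steps
            if exists $steps_from_node1{$place};
        $steps++;
    }
    return;    # the nodes are not in the same tree
}

print join( ',', descendent_leaves('root') ), "\n";
print distance_in_nodes( 'A', 'n1' ), "\n";    # mother and daughter
print distance_in_nodes( 'A', 'B' ),  "\n";    # two sisters
```

Looking each ancestor up directly in the hash avoids the inner loop over
all keys that the original code marks with "ugh".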





From cjfields at uiuc.edu  Fri Feb  3 16:07:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 3 Feb 2006 15:07:29 -0600
Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
In-Reply-To: <20060203194450.792e8d4e@dogwood.plantbio.uga.edu>
Message-ID: <001a01c62905$d7ef0920$15327e82@pyrimidine>

I would say give the new code a try, but realize that it hasn't been checked
in (like I said below).  I will try going over the modified
Bio::SearchIO::blast again this weekend to see if there is anything I might
have missed.  The changed order in the header of BLAST text output has me a
bit worried that it might not catch everything, but it at least doesn't hang
in the while() loop I described in the bug report below (bug #1934) and
seems to process everything fine.

If you want more stability in the code, you might consider changing over to
XML output and parsing with Bio::SearchIO::blastxml.  There are some changes
in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML
output, but I believe it parses everything regardless.  If you look back the
last month or so there has been a bit of discussion here about it.  Jason
describes a bit on how to set up RemoteBlast for XML:

http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Friday, February 03, 2006 1:45 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> 
> Hi, Everybody,
> I see this post and am wondering if this is the reason for the
> malfunctioning of my web server. We set up a web server named MAK for MITE
> sequence analysis. It was working very well until around November 2005,
> when it stopped returning any results (the site is up and seems to be
> doing something after submission).  In the CGI script, I used RemoteBlast
> (that work was done in 2003) to do the searches. I currently do not have
> access to the server because I moved. Quite a few people have sent us
> emails about its malfunctioning. Are there any suggestions for fixing the
> problem?  Should I simply ask for RemoteBlast.pm to be replaced with the
> new version?
> Thanks a lot,
> Guojun
> 
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
>       _____
> 
>   From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-
> l at bioperl.org]
> Sent: Fri, 03 Feb 2006 10:45:23 -0500
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> 
> Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> will
> work for saving text output. However, it will not parse anything using
> next_result (it will likely hang) and will not save XML format. See these
> bugs:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> 
> for explanations and possible fixes (changes to RemoteBlast and
> Bio::SearchIO::blast). Note that these haven't been checked in yet so are
> still not included in bioperl-live; they may be further modified before
> committing to CVS. If you're not worried about XML, you could just try the
> first fix, which is a change to SearchIO::blast.
> 
> Nagesh, I remember you posting to the list a month ago using a script
> which
> had problems; the script you used saves the output but doesn't actually
> parse it (i.e. you don't use next_result() to go through the data). Is the
> version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
> parsing the output using "-readmethod => SearchIO" or "-readmethod =>
> blast"
> using your version of RemoteBlast and method next_result()? Like below
> (from
> perldoc):
> 
> while ( my @rids = $factory->each_rid ) {
>   foreach my $rid ( @rids ) {
>     my $rc = $factory->retrieve_blast($rid);
>     if( !ref($rc) ) {
>       if( $rc < 0 ) {
>         $factory->remove_rid($rid);
>       }
>       print STDERR "." if ( $v > 0 );
>       sleep 5;
>     } else {                              # parsing starts here
>       my $result = $rc->next_result();    # it should hang here
>       #save the output
>       my $filename = $result->query_name()."\.out";
>       $factory->save_output($filename);
>       $factory->remove_rid($rid);
>       print "\nQuery Name: ", $result->query_name(), "\n";
>       while ( my $hit = $result->next_hit ) {
>         next unless ( $v > 0);
>         print "\thit name is ", $hit->name, "\n";
>         while( my $hsp = $hit->next_hsp ) {
>           print "\t\tscore is ", $hsp->score, "\n";
>         }
>       }
>     }
>   }
> }
> 
> 
> My script hung whenever I used next_result() prior to the fixes. I
> want to see how many others are having the same issues with parsing using
> the CVS version of bioperl-live.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 



From hlapp at gmx.net  Fri Feb  3 18:11:03 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 3 Feb 2006 15:11:03 -0800
Subject: [Bioperl-l] Documentation in the Bioperl package
In-Reply-To: 
References: 
Message-ID: 

Just to be sure, the wiki will be able to handle versions (releases)?
(documentation and APIs may change between releases and hence a more
recent doc page may not apply to an earlier release)

  -hilmar

On 2/3/06, Brian Osborne  wrote:
> bioperl-l,
>
> The recent work on the Bioperl Wiki moved much of the Bioperl documentation
> online. Since we cannot maintain 2 locations for all of this we'll be
> removing a number of files from the package, specifically:
>
> biodatabases.pod
> biodesign.pod
> bioperl.pod
> bioscripts.pod
> doc/howto/*
> doc/faq/*
> FAQ
>
> Rest assured that all of these files have been gone over in detail to make
> sure that no important information was lost during the migration. All of
> this will be replaced by a single file, such as "README.docs", that explains
> where all the documentation is. It's not entirely clear what will happen to
> bptutorial.pl. Moving its content to different online locations is possible
> but in this case we lose its functionality as a script.
>
> Are there any comments or questions or concerns?
>
> Brian O.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From hubert.prielinger at gmx.at  Fri Feb  3 17:47:37 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 03 Feb 2006 16:47:37 -0600
Subject: [Bioperl-l] standalone blast composition based statistics parameter
Message-ID: <43E3DD89.7080903@gmx.at>

Hi,
Does anybody know whether it is possible to perform a database search
with the standalone blast where the composition based statistics
parameter is on, and what the abbreviation for that parameter is?

thanks
Hubert


From osborne1 at optonline.net  Fri Feb  3 22:32:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 03 Feb 2006 22:32:18 -0500
Subject: [Bioperl-l] Documentation in the Bioperl package
In-Reply-To: 
Message-ID: 

Hilmar,

MediaWiki supports such things as rollback based on date, but it is not CVS,
where an entire set of pages is tagged by version. It is also scriptable, so
it may be possible to emulate this type of tagging by script, but I'm not
entirely sure (see WWW::Mediawiki::Client; Jason pointed this out to me).

So the simple answer is probably "no". But let's be honest: synchrony
between code and documentation wasn't achieved using the previous approach,
CVS, either. 

What Jason, Torsten, and I appreciated when adding content to this new site
was that it was relatively easy; our hope is that this approach will get
more people involved. The assumption is that more involvement will lead to
better documentation - Jason made this assumption when electing to move the
site to MediaWiki and I have to say that I completely agree with this
assumption.

Jason, any thoughts on this question? An interesting one...

Brian O.



On 2/3/06 6:11 PM, "Hilmar Lapp"  wrote:

> Just to be sure, the wiki will be able to handle versions (releases)?
> (documentation and APIs may change between releases and hence a more
> recent doc page may not apply to an earlier release)
> 
>   -hilmar
> 
> On 2/3/06, Brian Osborne  wrote:
>> bioperl-l,
>> 
>> The recent work on the Bioperl Wiki moved much of the Bioperl documentation
>> online. Since we cannot maintain 2 locations for all of this we'll be
>> removing a number of files from the package, specifically:
>> 
>> biodatabases.pod
>> biodesign.pod
>> bioperl.pod
>> bioscripts.pod
>> doc/howto/*
>> doc/faq/*
>> FAQ
>> 
>> Rest assured that all of these files have been gone over in detail to make
>> sure that no important information was lost during the migration. All of
>> this will be replaced by a single file, such as "README.docs", that explains
>> where all the documentation is. It's not entirely clear what will happen to
>> bptutorial.pl. Moving its content to different online locations is possible
>> but in this case we lose its functionality as a script.
>> 
>> Are there any comments or questions or concerns?
>> 
>> Brian O.
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
> 
> 
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l





From shameer at ncbs.res.in  Sat Feb  4 05:15:33 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Sat, 4 Feb 2006 15:45:33 +0530 (IST)
Subject: [Bioperl-l] Calpha to Co-ordinates Program
In-Reply-To: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
References: <001001c62793$bef08f70$93656785@zhur>
	<2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
Message-ID: <47205.192.168.1.176.1139048133.squirrel@192.168.1.176>

Dear All,

Is anyone aware of a Perl script or BioPerl module that can be used to
construct the full atomic coordinates of a protein from a given C(alpha)
trace and optimize the side chain geometry?

I tried the original program, MaxSprout from the Holm group, but it is not
giving me proper results (I am getting errors like segmentation fault -
backbonchain failed etc.)

Since I need to use this as part of a web server, I would appreciate it if
anyone could let me know about a Perl script for the same.

Thanks and cheers in advance,
-- 
Mr. Shameer Khadar (JRF)
Dr. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241
F - 91-080-23636662/23636675
W - http://www.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."
MM




From torsten.seemann at infotech.monash.edu.au  Sat Feb  4 22:34:35 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 05 Feb 2006 14:34:35 +1100
Subject: [Bioperl-l] standalone blast composition based statistics
	parameter
In-Reply-To: <43E3DD89.7080903@gmx.at>
References: <43E3DD89.7080903@gmx.at>
Message-ID: <43E5724B.5070007@infotech.monash.edu.au>

Hubert,

> Does anybody know whether it is possible to perform a database search
> with the standalone blast where the composition based statistics
> parameter is on, and what the abbreviation for that parameter is?

The StandAloneBlast only runs the "blastall" binary on your system. It 
accepts all the command line options (like "-d" etc.) that "blastall" 
does but just passes them as-is; it doesn't do anything special.

On a Unix system, type "blastall -" to list all the options that your 
BLAST binary supports.

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From fernan at iib.unsam.edu.ar  Sat Feb  4 23:34:27 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Sun, 5 Feb 2006 01:34:27 -0300
Subject: [Bioperl-l] standalone blast composition based statistics
	parameter
In-Reply-To: <43E3DD89.7080903@gmx.at>
References: <43E3DD89.7080903@gmx.at>
Message-ID: <20060205043427.GB39264@iib.unsam.edu.ar>

+----[ Hubert Prielinger  (03.Feb.2006 21:06):
|
| Hi,
| Does anybody know whether it is possible to perform a database search 
| with the standalone blast where the composition based statistics 
| parameter is on, and what's the abbreviation for the parameter?
| 
| thanks
| Hubert
|
+----]

Yes, but only for tblastn.

As Torsten said, 'blastall' with no arguments would have
revealed it: 

[ ... ]
  -C  Use composition-based statistics for tblastn:
      D or d: default (equivalent to F)
      0 or F or f: no composition-based statistics
      1 or T or t: Composition-based statistics as in NAR 29:2994-3005, 2001
      2: Composition-based score adjustment as in Bioinformatics 21:902-911,
          2005, conditioned on sequence properties
      3: Composition-based score adjustment as in Bioinformatics 21:902-911,
          2005, unconditionally
      For programs other than tblastn, must either be absent or be D, F or 0.
      [String]
    default = D

Fernan

PS: this is using the latest BLAST 2.2.13 (ncbi-toolkit 20051206)


From hubert.prielinger at gmx.at  Sun Feb  5 21:56:07 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sun, 05 Feb 2006 20:56:07 -0600
Subject: [Bioperl-l] standalone blast composition based
	statistics	parameter
In-Reply-To: <20060205043427.GB39264@iib.unsam.edu.ar>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar>
Message-ID: <43E6BAC7.5050707@gmx.at>

Hi,
Thank you very much. If I use tblastn instead of blastp, I get the 
following error message:

[blastall] WARNING: : Unable to open nr.00.nin

I looked in the folder, but I don't have that file, and if I download 
the database and extract it, the file isn't there either...

thanks

Hubert




From torsten.seemann at infotech.monash.edu.au  Sun Feb  5 23:29:11 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 06 Feb 2006 15:29:11 +1100
Subject: [Bioperl-l] standalone blast composition
	based	statistics	parameter
In-Reply-To: <43E6BAC7.5050707@gmx.at>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar>
	<43E6BAC7.5050707@gmx.at>
Message-ID: <43E6D097.7080304@infotech.monash.edu.au>

Hubert

> thank you very much, If I use the tblastn instead of blastp, I get the 
> following error message
> [blastall] WARNING: : Unable to open nr.00.nin
> I looked up in the folder, but I don't have that file, and if I download 
> the database and extract the file, it isn't there either...

"tblastn" requires a NUCLEOTIDE database to search. It appears that you 
have specified a PROTEIN database with "-d nr" ("nr" is protein). You 
probably want to install the "nt" blast database and use that instead.
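
The pairing of query type and database type for each program can be written
down as a small lookup table. The Perl sketch below is only an illustration:
the hash and the helper names (db_type_for, index_extension_for) are made up
here and are not part of BioPerl or BLAST. It also shows why tblastn went
looking for a ".nin" file: formatted nucleotide databases use the
.nhr/.nin/.nsq extensions, protein databases use .phr/.pin/.psq.

```perl
use strict;
use warnings;

# What each blastall program (-p) expects as query and as database.
my %BLAST_TYPES = (
    blastn  => { query => 'nucleotide', db => 'nucleotide' },
    blastp  => { query => 'protein',    db => 'protein'    },
    blastx  => { query => 'nucleotide', db => 'protein'    },
    tblastn => { query => 'protein',    db => 'nucleotide' },
    tblastx => { query => 'nucleotide', db => 'nucleotide' },
);

# Hypothetical helper: database type required by a program, or undef
# if the program name is not in the table.
sub db_type_for {
    my ($program) = @_;
    my $entry = $BLAST_TYPES{ lc $program } or return;
    return $entry->{db};
}

# Hypothetical helper: index file extension blastall will look for,
# e.g. "nin" as in "nt.00.nin".
sub index_extension_for {
    my ($program) = @_;
    my $type = db_type_for($program) or return;
    return $type eq 'nucleotide' ? 'nin' : 'pin';
}

for my $program (qw(blastp tblastn)) {
    printf "%-8s needs a %s database (*.%s)\n",
        $program, db_type_for($program), index_extension_for($program);
}
```

So "-p tblastn -d nr" asks for nr.00.nin, which a protein database never
contains.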

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/


From hubert.prielinger at gmx.at  Sun Feb  5 23:12:27 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sun, 05 Feb 2006 22:12:27 -0600
Subject: [Bioperl-l] standalone blast
	composition	based	statistics	parameter
In-Reply-To: <43E6D097.7080304@infotech.monash.edu.au>
References: <43E3DD89.7080903@gmx.at>
	<20060205043427.GB39264@iib.unsam.edu.ar>	<43E6BAC7.5050707@gmx.at>
	<43E6D097.7080304@infotech.monash.edu.au>
Message-ID: <43E6CCAB.2060107@gmx.at>

Dear Torsten,
thanks for your quick reply. I have looked on the FTP server and 
there are nt.00 to nt.04. Do I have to download all of them? Are there 
differences?

thanks
Hubert





From torsten.seemann at infotech.monash.edu.au  Mon Feb  6 00:22:09 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 06 Feb 2006 16:22:09 +1100
Subject: [Bioperl-l] standalone blast
	composition	based	statistics	parameter
In-Reply-To: <43E6CCAB.2060107@gmx.at>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar>
	<43E6BAC7.5050707@gmx.at> <43E6D097.7080304@infotech.monash.edu.au>
	<43E6CCAB.2060107@gmx.at>
Message-ID: <43E6DD01.2010600@infotech.monash.edu.au>

Hubert

> thanks for your quick reply, I have looked up at the ftp server and 
> there are nt.00 to nt.04. Do I have to download all of them, are there 
> differences?

You have to download them all. The "nt" database (actually the index 
files) is very big, and it is split up into gigabyte (?) parts. Although 
they are called "nt.00" "nt.01" etc, you still pass "-d nt" to 
"blastall", because together these parts are one "nt" database. The 
"blastall" program will automatically use the separate parts; you do not 
have to join them.
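
As a rough check that all parts have arrived, something like the following
Perl sketch could list the volumes present in a directory. The helper name
database_volumes is hypothetical, and the sketch assumes the volumes are
named "<db>.NN.<ext>" as described above ("nin" for nucleotide indices,
"pin" for protein ones).

```perl
use strict;
use warnings;

# Hypothetical helper: list the volumes of a multi-part BLAST database
# found in $dir, e.g. ("nt.00", "nt.01") for $db = "nt".
# $ext is the index extension: "nin" (nucleotide) or "pin" (protein).
sub database_volumes {
    my ($dir, $db, $ext) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @volumes = map { /^(\Q$db\E\.\d+)\.\Q$ext\E$/ ? $1 : () } readdir $dh;
    closedir $dh;
    @volumes = sort @volumes;
    return @volumes;
}

# Directory to inspect: first command line argument, else the current one.
my $dir = shift(@ARGV) || '.';
my @volumes = database_volumes($dir, 'nt', 'nin');
print @volumes
    ? "Found volumes: @volumes (still use '-d nt')\n"
    : "No nt volumes found in $dir\n";
```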

You should read http://www.ncbi.nlm.nih.gov/BLAST/ to make sure you are 
using the correct BLAST search for your problem.

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From shameer at ncbs.res.in  Mon Feb  6 03:27:50 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon, 6 Feb 2006 13:57:50 +0530 (IST)
Subject: [Bioperl-l] Need a  slogan for OBF
In-Reply-To: <47205.192.168.1.176.1139048133.squirrel@192.168.1.176>
References: <001001c62793$bef08f70$93656785@zhur>
	<2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
	<47205.192.168.1.176.1139048133.squirrel@192.168.1.176>
Message-ID: <2888.192.168.4.38.1139214470.squirrel@192.168.4.38>

Dear All,

As we are moving to the all-new-look wiki-style web site, why don't we think
about a unique logo + slogan that can express our spirit and excitement?

For example, we can have a logo with O|B|F, its full form and the slogan.
If anybody is interested, I would be happy to design logos once we have
settled on the slogan.

I have a couple of suggestions. I hope all OBF members can send much more
powerful slogans than mine:

'Let's Code for Life'
'Let's Decode Life'
'Let's Recode Life'
'Code your Life '

Happy O|B|!!!
-- 
Mr. Shameer Khadar (JRF)
Dr. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241
F - 91-080-23636662/23636675
W - http://www.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."
MM




From olsonbr2 at msu.edu  Fri Feb  3 15:54:22 2006
From: olsonbr2 at msu.edu (Bradley J. S. C. Olson)
Date: Fri, 3 Feb 2006 15:54:22 -0500
Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter the
	method?
Message-ID: <005e01c62904$02b2ad30$db4c0a23@dihedral>

I have been working with the RemoteBlast.pm module and have found that it is
a bit clunky to use loops to keep checking to see if your RID has finished.

For example, every time you write a script, you need to add a code block
(see example in the documentation) in order to keep checking if @rid is
finished.

Would it be better to write this in as a method in the RemoteBlast
module?  It seems like it would be better for RemoteBlast to have a method
we could call, say retrieve_when_done, that would return the blast report when
the value of retrieve_blast is no longer 0.

The only issue may be report parsing, but I wonder if it might be better to
separate out submittal/retrieval of BLAST requests from the parsing step and
make these more discrete processes?  Since NCBI seems to be not supporting
text results as a standard, maybe the module should work exclusively with
XML and we could change report handling away from the headaches of text
processing and just allow Bio::SeqIO or blastxml to handle the task of making
blast reports into different forms (such as HTML, text etc).

This would definitely simplify coding using the RemoteBlast.pm module, as
then you could treat the report retrieval process as an object and just wait
for the object to return its value, instead of coding in a bunch of test
loops to see if it is done.  This may also help keep bugs out of the module
and make the module longer lasting and not require module users to rewrite
their code every time NCBI makes changes.

Any thoughts or ideas?

Is anyone working on this?

Thanks

Brad Olson



From cjfields at uiuc.edu  Mon Feb  6 12:27:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 6 Feb 2006 11:27:56 -0600
Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter
	themethod?
In-Reply-To: <005e01c62904$02b2ad30$db4c0a23@dihedral>
Message-ID: <002c01c62b42$ab7671a0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Bradley J. S. C. Olson
> Sent: Friday, February 03, 2006 2:54 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter
> themethod?
> 
> I have been working with the RemoteBlast.pm module and have found that it
> is
> a bit clunky to use loops to keep checking to see if you RID has finished.
> 
> 
> 
> For example, every time you write a script, you need to add a code block
> (see example in the documentation) in order to keep checking if @rid is
> finished.
> 
> Would it be better to maybe write this in as a method in the RemoteBlast
> module?  It seems like it would be better for remoteblast to have a method
> we could call say retrieve_when_done that would return the blast report
> when
> the value of retrieve_blast is no longer 0.

Sounds reasonable, though I'm not sure how easy it would be to implement.
Why not drop by Bugzilla (http://bugzilla.bioperl.org/) and submit this as
an enhancement?
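
For the sake of discussion, here is a minimal sketch of what such a helper
might look like. It is not part of RemoteBlast; the name retrieve_when_done
is simply the one proposed above, and the return convention (0 while the job
is running, a negative number on failure, a reference once the report is
ready) is an assumption modelled on the polling loop in the module
documentation. The callback stands in for a call to retrieve_blast so that
the sketch runs without a network connection.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of RemoteBlast: call $poll->() until it
# returns something other than 0, sleeping $interval seconds in between.
# Assumed convention:
#   0        = still running
#   negative = failed
#   a ref    = finished report
sub retrieve_when_done {
    my ($poll, %opt) = @_;
    my $interval = defined $opt{interval}  ? $opt{interval}  : 5;
    my $max_try  = defined $opt{max_tries} ? $opt{max_tries} : 60;

    for my $try (1 .. $max_try) {
        my $rc = $poll->();
        return $rc if ref $rc;                          # finished
        die "request failed (code $rc)\n" if $rc < 0;   # server-side error
        sleep $interval if $interval > 0 && $try < $max_try;
    }
    die "gave up after $max_try attempts\n";
}

# Stand-in for a real retrieval call: reports "ready" on the third poll.
my $calls = 0;
my $fake_poll = sub { return ++$calls < 3 ? 0 : { report => 'done' } };

my $result = retrieve_when_done($fake_poll, interval => 0);
print "Finished after $calls polls: $result->{report}\n";
```

With the real module the callback would presumably wrap the retrieval call,
along the lines of sub { $factory->retrieve_blast($rid) }, but that is
untested here.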

> The only issue may be report parsing, but I wonder if it might be better
> to
> separate out submittal/retrieval of BLAST requests from the parsing step
> and
> make these more discrete processes?  Since NCBI seems to be not supporting
> text results as a standard, maybe the module should work exclusively with
> XML and we could change report handling away from the headaches of text
> processing and just allow Bio::SeqIO or blastxml handle the task of making
> a
> blast reports into different forms (such as HTML, text etc).

They are separated.  RemoteBlast executes BLAST remotely (via HTTP).
Results are parsed via various Bio::SearchIO modules depending on what you
set '-readmethod' to.  This is from perldoc:

From Bio::Tools::Run::RemoteBlast
________________________________________________________

DESCRIPTION
    Class for remote execution of the NCBI Blast via HTTP.

    For a description of the many CGI parameters see:
    http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html

    Various additional options and input formats are available.

________________________________________________________

From Bio::SearchIO
________________________________________________________

DESCRIPTION
    This is a driver for instantiating a parser for report files from
    sequence database searches. This object serves as a wrapper for the
    format parsers in Bio::SearchIO::* - you should not need to ever use
    those format parsers directly. (For people used to the SeqIO system it,
    we are deliberately using the same pattern).

    Once you get a SearchIO object, calling next_result() gives you back a
    Bio::Search::Result::ResultI compliant object, which is an object that
    represents one Blast/Fasta/HMMER whatever report.

    A list of module names and formats is below:

      blast      BLAST (WUBLAST, NCBIBLAST,bl2seq)
      fasta      FASTA -m9 and -m0
      blasttable BLAST -m9 or -m8 output (NCBI not WUBLAST tabular)
      megablast  MEGABLAST
      psl        UCSC PSL format
      waba       WABA output
      axt        AXT format
      sim4       Sim4
      hmmer      HMMER hmmpfam and hmmsearch
      exonerate  Exonerate CIGAR and VULGAR format
      blastxml   NCBI BLAST XML
      wise       Genewise -genesf format

    See the SearchIO HOWTO linked from http://bioperl.org/HOWTOs/

________________________________________________________

This is also in the wiki online now:

http://www.bioperl.org/wiki/Module:Bio::SearchIO 
http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast

I think the current line of thought is to make XML the default, but I also
know you would irritate a LOT of people out there by cutting off text output
parsing completely.  Roger Hall or Jason pointed out that doing so will
break many scripts out there.  

Furthermore, the problems with text output parsing are usually minimal.  For
instance, the last one was a small change which broke a regex, causing an
infinite loop; the actual bug was in Bio::SearchIO::blast and not in
RemoteBlast.  A simple addition to the regex fixed it.  The only change to
RemoteBlast was to implement the option of saving XML formatted BLAST
output.

I do like the idea of using XML output to build custom (bioperl-specific)
BLAST reports, but that also requires more work, likely a lot more work.
Again, maybe add that as an enhancement in Bugzilla or, better yet, submit
some sample code maybe as an example.  

> This would definitely simplifying coding using the RemoteBlast.pm module
> as
> then you could treat the report retrieval process as an object and just
> wait
> for the object to return its value, instead of coding in a bunch of test
> loops to see if it is done.  This may also help keep bugs out of the
> module
> and make the module longer lasting and not require module users to rewrite
> their code every time NCBI makes changes.

I think the most stable way of submitting jobs is by using the netblast
client (blastcl3) and parsing the results from that.  No CGI, no HTML, just
saving to a temp file and parsing through SearchIO.

RemoteBlast was designed, I believe, with the idea of letting researchers
with some basic knowledge of perl use an interface familiar to them (i.e.
the BLAST interface at NCBI) and retrieve results on a regular basis.  The
results are parsed via SearchIO::blast/blastxml/blasttable.  The problem is,
though convenient, RemoteBlast is also reliant on the powers that be at NCBI
not changing anything dramatically.  It is possible that NCBI could modify
the HTML code from the BLAST retrieval process, thus breaking RemoteBlast.
Text output could change again, even more dramatically, thus severely
breaking Bio::SearchIO::blast.  Thus, we adapt to those changes by modifying
the broken modules.  It's evolution at its finest.  It's also a fact of life
that code breaks and needs to be fixed every once in a while to stay
current.

Okay, I'm waxing philosophical now so I know I've definitely had too much
coffee.  Must get back to work...



Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign



From roger at iosea.com  Mon Feb  6 13:14:11 2006
From: roger at iosea.com (Roger Hall)
Date: Mon, 6 Feb 2006 12:14:11 -0600
Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter
	the	method?
In-Reply-To: <005e01c62904$02b2ad30$db4c0a23@dihedral>
Message-ID: <000f01c62b49$25732d30$4301a8c0@LIBERAL>

Brad,

I decided to fix this module about ten days ago, and then was out all of
last week with Strep plus a virus or two - it's one of the advantages of
having young kids.

I see that there have been quite a few messages about this module in just
the last week. I am sitting down now to read through them.

I'll get back to you (and the list) ASAP.

If you have any other questions or suggestions about RemoteBlast, feel free
to bug me with 'em. 

Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock





From barry.m.dancis at gsk.com  Mon Feb  6 12:17:13 2006
From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com)
Date: Mon, 6 Feb 2006 12:17:13 -0500
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: <003701c625c4$5527d790$2f01a8c0@GOLHARMOBILE1>
Message-ID: 

Hi --

        Are there any classes for manipulating miRNA's with functions such 
as parsing the name, storing and interlinking pri/pre/mat sequences, etc?

Thanks,

Barry


From hubert.prielinger at gmx.at  Mon Feb  6 18:16:01 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 06 Feb 2006 17:16:01 -0600
Subject: [Bioperl-l] no results with standalone tblastn
In-Reply-To: <43E6DD01.2010600@infotech.monash.edu.au>
References: <43E3DD89.7080903@gmx.at>
	<20060205043427.GB39264@iib.unsam.edu.ar>	<43E6BAC7.5050707@gmx.at>
	<43E6D097.7080304@infotech.monash.edu.au>	<43E6CCAB.2060107@gmx.at>
	<43E6DD01.2010600@infotech.monash.edu.au>
Message-ID: <43E7D8B1.5030307@gmx.at>

Dear Torsten,
I have downloaded all the databases, as you recommended. It is 
working, but I don't get any results; if I try it online it works fine.
My result file looks like this:

TBLASTN 2.2.13 [Nov-27-2005]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query=
         (8 letters)

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           3,749,503 sequences; 16,556,997,203 total letters

Searching..................................................done

                                                                
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value



The program code for it looks like this:

#!/usr/local/bin/perl -w
BEGIN
{
    $ENV{BLASTDIR}     = "/home/Hubert/blast/blast-2.2.13/bin";
    $ENV{BLASTDATADIR} = "/home/Hubert/blast/blast-2.2.13/data";
}

use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;
use Bio::SeqIO;
use strict;

print "Please insert matrix:\t";
my $matrix_STD = <STDIN>;
chomp $matrix_STD;

print "Please insert count:\t";
my $count_STD = <STDIN>;
chomp $count_STD;



# parameters
my $expect_value = 20000;
#my $filter_query_sequence = 'T';
my $one_line_description = 1000;
my $alignments = 1000;
#my $matrix = 'BLOSUM80';
my $gapcost = 10;
my $gapextend = 1;
my $wordsize = 2;
#my $compbasedStat = '1';
#my $count = 1;
# my $strands = 1;

my @params = ('program' => 'tblastn','database' => 'nt');
#my $progress_interval = 100;


my $seqio_obj = Bio::SeqIO->new(
  -file   => "aloneblosum62.txt",
  -format => "raw",
);

# create factory object and set parameters

my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
print "submitted parameters successfully \n";

$factory->e($expect_value);
#$factory->F($filter_query_sequence);
$factory->v($one_line_description);
$factory->b($alignments);
$factory->M($matrix_STD);
$factory->G($gapcost);
$factory->E($gapextend);
$factory->W($wordsize);
#$factory->C($compbasedStat);
#$factory->S($strands);

print "changed parameters successfully \n";
print "\n";


# get query

while ( my $query = $seqio_obj->next_seq) {
      print "entered while loop \n";
      # set the output file before running, so each report gets its own file
      $factory->outfile("nucleo80$count_STD.txt");
      my $blast_report = $factory->blastall($query);
#      print "$blast_report\n";
      $count_STD++;
      print $query->seq;
      print "\n";
}



thanks
Hubert






From torsten.seemann at infotech.monash.edu.au  Mon Feb  6 21:17:40 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 07 Feb 2006 13:17:40 +1100
Subject: [Bioperl-l] no results with standalone tblastn
In-Reply-To: <43E7D8B1.5030307@gmx.at>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar>
	<43E6BAC7.5050707@gmx.at> <43E6D097.7080304@infotech.monash.edu.au>
	<43E6CCAB.2060107@gmx.at> <43E6DD01.2010600@infotech.monash.edu.au>
	<43E7D8B1.5030307@gmx.at>
Message-ID: <43E80344.5090207@infotech.monash.edu.au>


> I have downloaded all the databases, as you recommended me. And it is 
> working, but I don't get any results, if I try it online it works fine.
> my result file looks like that:
> 
> TBLASTN 2.2.13 [Nov-27-2005]
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
> Query=
>          (8 letters)
> Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
> GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
>            3,749,503 sequences; 16,556,997,203 total letters
> Searching..................................................done
>                                                                  Score    E
> Sequences producing significant alignments:                      (bits) Value

Is your query only 8 amino acids long?

This report looks like it did have alignments that were not displayed, 
otherwise it would print "**** No hits ****".

This mailing list is not here to solve your BLAST problems unless it is 
a problem with the Perl module running BLAST.

You first need to try and get your problem working on the command line 
*without* Perl. eg.

/home/Hubert/blast/blast-2.2.13/bin/blastall -p tblastn -d nt -i 
YOUR_FASTA_FILE_WITH_SEQUENCE_IN_IT -o OUTPUT_FILE.txt -e 0.001
...

where "..." is the rest of the options you are setting in your Perl 
script. If it doesn't work that way, it will never work in Perl.
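
To make the comparison with the command line easier, the settings from the
script can be spelled out as the equivalent blastall arguments. The Perl
sketch below only builds and prints the command; the helper name
blastall_command and the two file names are placeholders, and the matrix
option (-M) is left out because the script reads it from the keyboard.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of single-letter blastall options into
# an argument list, in the order given. Option letters are passed through
# unchanged, so "e" and "E" stay distinct.
sub blastall_command {
    my ($blastall, @options) = @_;
    my @cmd = ($blastall);
    while (my ($flag, $value) = splice(@options, 0, 2)) {
        push @cmd, "-$flag", $value;
    }
    return @cmd;
}

my @cmd = blastall_command(
    '/home/Hubert/blast/blast-2.2.13/bin/blastall',
    p => 'tblastn',
    d => 'nt',
    i => 'query.fasta',      # placeholder file name
    o => 'output.txt',       # placeholder file name
    e => 20000,              # expect value, as set in the script
    v => 1000,               # one-line descriptions
    b => 1000,               # alignments
    G => 10,                 # gap open cost
    E => 1,                  # gap extension cost
    W => 2,                  # word size
);

print "@cmd\n";
# To actually run it: system(@cmd) == 0 or die "blastall failed: $?";
```

If the printed command gives the same empty report when pasted into a shell,
the problem lies with the BLAST settings rather than with the Perl module.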

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/


From rahall2 at ualr.edu  Mon Feb  6 21:46:44 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Mon, 6 Feb 2006 20:46:44 -0600
Subject: [Bioperl-l] RemoteBlast users - potentially major changes - please
	reply
Message-ID: <002001c62b90$bb9dbe00$4301a8c0@LIBERAL>

To everyone who uses RemoteBlast.pm:

Would anyone object to RemoteBlast being rewritten in a way that requires
NCBI's blastcl3 executable?

Binary downloads of blastcl3 (column "netblast") are available for numerous
platforms at: http://ncbi.nih.gov/BLAST/download.shtml

Does anyone require or desire a "pure perl" implementation? If so, please
explain the advantage you see with such an implementation.

Thanks!

Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock
(501) 569-8074



From osborne1 at optonline.net  Tue Feb  7 12:05:56 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 07 Feb 2006 12:05:56 -0500
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: 

Barry,

If the sequence information is in one of the formats that Bioperl
understands (Genbank, Swissprot flat, and so on) then the answer is yes.
This assumes that the details on sequence that you mentioned are found in
some sequence feature section in the file. But it looks to me like there's
no specialized parser for miRNA sequence per se, I'll be corrected if I'm
wrong.
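
In the meantime, name parsing alone can be done with a regular expression.
The following is a rough sketch, not a Bioperl class: the function name
parse_mirna_name and the returned keys are made up for illustration, and the
pattern assumes miRBase-style names (species prefix, "mir" for a precursor
or "miR" for a mature sequence, a number, an optional letter for a closely
related variant, an optional numeric suffix for separate loci, and an
optional trailing "*").

```perl
use strict;
use warnings;

# Hypothetical helper, not a Bioperl class: split a miRBase-style name
# such as "hsa-mir-30c-2" into its parts. Returns a hash ref, or undef
# if the name does not fit the assumed pattern.
sub parse_mirna_name {
    my ($name) = @_;
    return unless defined $name;
    my ($species, $type, $number, $variant, $copy, $star) =
        $name =~ /^([a-z]{3,4})-(mir|miR|let)-(\d+)([a-z]?)(?:-(\d+))?(\*?)$/
        or return;
    return {
        species => $species,
        # by convention "mir" marks a precursor and "miR" a mature form
        form    => $type eq 'mir' ? 'precursor'
                 : $type eq 'miR' ? 'mature'
                 :                  'unspecified',
        family  => ($type eq 'let' ? 'let' : 'mir') . "-$number",
        variant => $variant,            # e.g. "c" in mir-30c, else ""
        copy    => $copy,               # e.g. 2 in mir-30c-2, else undef
        minor   => $star eq '*' ? 1 : 0,
    };
}

for my $name (qw(hsa-mir-30c-2 hsa-miR-21 cel-let-7)) {
    my $p = parse_mirna_name($name) or next;
    print join("\t", $name, @{$p}{qw(species form family)}), "\n";
}
```

Linking pri/pre/mature sequences would still need a data structure on top of
this, for example a hash keyed on the family field.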

Brian O.


On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com"  wrote:

> Hi --
> 
>         Are there any classes for manipulating miRNA's with functions such
> as parsing the name, storing and interlinking pri/pre/mat sequences, etc?
> 
> Thanks,
> 
> Barry




From barry.m.dancis at gsk.com  Tue Feb  7 15:26:27 2006
From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com)
Date: Tue, 7 Feb 2006 15:26:27 -0500
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: 

It's the parser in particular that I need








From deep.raman at gmail.com  Tue Feb  7 15:16:48 2006
From: deep.raman at gmail.com (Raman Deep Singh)
Date: Wed, 8 Feb 2006 01:46:48 +0530
Subject: [Bioperl-l] Needed help
Message-ID: 

Hi all,
     I have a huge task of retrieving a number of sequences from the
Swiss-Prot database based on some fixed criteria. For that I want to index
the Swiss-Prot database on my local disk. I have downloaded the whole
Swiss-Prot database (the January 2006 release) to my local disk.

  I am currently using Bioperl on a Linux machine. I am using the
code listed below.

=======================

    use Bio::Index::Swissprot;

    my $Index_File_Name = shift;
    my $inx = Bio::Index::Swissprot->new('-filename' => $Index_File_Name,
 '-write_flag' => 'WRITE');
    $inx->make_index(@ARGV);
-----------------------------------------
    # Print out several sequences present in the index
    # in gcg format
    use Bio::Index::Swissprot;
    use Bio::SeqIO;

    my $out = Bio::SeqIO->new( '-format' => 'gcg', '-fh' => \*STDOUT );
    my $Index_File_Name = shift;
    my $inx = Bio::Index::Swissprot->new('-filename' => $Index_File_Name);

    foreach my $id (@ARGV) {
        my $seq = $inx->fetch($id); # Returns Bio::Seq object
        $out->write_seq($seq);
    }

    # alternatively

    my $seq1 = $inx->get_Seq_by_id($id);
    my $seq2 = $inx->get_Seq_by_acc($acc);


-- -------------------------------
I am running the script as

 perl getseqfromid.pl sample.dat

from the shell

and I am getting this error repeatedly

------------- EXCEPTION  -------------
MSG: Can't open 'DB_File' dbm file 'swiss100.dat' : No such file or directory
STACK Bio::Index::Abstract::open_dbm
/usr/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:389
STACK Bio::Index::Abstract::new
/usr/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150
STACK Bio::Index::AbstractSeq::new
/usr/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91
STACK toplevel i.pl:6


--------------------------
Somewhere online I also found a document saying that some variables
need to be exported. I did that as well but still got the same errors.

Kindly help.




Ramandeep Singh
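
One possible reading of the exception above, offered as a guess rather
than a diagnosis: Bio::Index::Swissprot->new was called without a write
flag on an index file that had not been created yet, so the dbm file
could not be opened read-only. The sketch below separates the two steps
(build once, then fetch). The script name (swiss_index.pl), the
build/fetch mode argument and the file names are placeholders invented
for this illustration; only the Bio::Index::Swissprot and Bio::SeqIO
calls come from the code quoted in the message.

```perl
#!/usr/bin/perl
# Sketch only: build a Swiss-Prot index once, then fetch entries from it.
# Usage (hypothetical script name):
#   perl swiss_index.pl build swiss.idx /data/uniprot_sprot.dat
#   perl swiss_index.pl fetch swiss.idx ROA1_HUMAN
use strict;
use warnings;
use File::Spec;
use Bio::Index::Swissprot;
use Bio::SeqIO;

my ($mode, $index_file, @rest) = @ARGV;
die "usage: $0 build INDEX FLATFILE...\n       $0 fetch INDEX ID...\n"
    unless defined $mode && defined $index_file && @rest;

if ($mode eq 'build') {
    # A true -write_flag lets the dbm file be created if it is missing.
    my $inx = Bio::Index::Swissprot->new(
        -filename   => $index_file,
        -write_flag => 1,
    );
    # Absolute paths keep the index usable from any working directory.
    $inx->make_index(map { File::Spec->rel2abs($_) } @rest);
}
elsif ($mode eq 'fetch') {
    # Read-only open: this fails if the index file does not exist yet.
    my $inx = Bio::Index::Swissprot->new(-filename => $index_file);
    my $out = Bio::SeqIO->new(-format => 'gcg', -fh => \*STDOUT);
    for my $id (@rest) {
        my $seq = $inx->fetch($id);
        if ($seq) { $out->write_seq($seq) }
        else      { warn "No entry found for '$id'\n" }
    }
}
else {
    die "unknown mode '$mode'\n";
}
```

Note that the first argument after the mode is always the index file,
not the Swiss-Prot flat file; the flat files follow it in build mode.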



From cjfields at uiuc.edu  Tue Feb  7 17:40:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Feb 2006 16:40:15 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: <007701c62c37$7914af60$15327e82@pyrimidine>

Are you talking about sequences or text output from a specific program?  If
you are talking about sequences in a particular format, then listen to
Brian.  If you are talking about output, then we need to know which program
you're using, as a parser may exist or could be built.  

There are a few modules in Bio::Tools that handle RNA (like QRNA,
tRNAscan-SE), so check those out first.  I'm currently finishing up a
Bio::Tools module for RNAMotif and have plans for making an ERPIN parser.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> barry.m.dancis at gsk.com
> Sent: Tuesday, February 07, 2006 2:26 PM
> To: bioperl-l; bioperl-l-bounces at lists.open-bio.org
> Subject: Re: [Bioperl-l] Handling miRNA's
> 
> It's the parser in particular that I need
> 
> 
> 
> 
> "Brian Osborne"  Sent by: 
> bioperl-l-bounces at lists.open-bio.org
> 07-Feb-2006 12:05
>  
> To
> barry.m.dancis at gsk.com, "bioperl-l" , 
> bioperl-l-bounces at lists.open-bio.org
> cc
> 
> Subject
> Re: [Bioperl-l] Handling miRNA's
> 
> 
> 
> 
> 
> 
> Barry,
> 
> If the sequence information is in one of the formats that 
> Bioperl understands (Genbank, Swissprot flat, and so on) then 
> the answer is yes.
> This assumes that the details on sequence that you mentioned 
> are found in some sequence feature section in the file. But 
> it looks to me like there's no specialized parser for miRNA 
> sequence per se, I'll be corrected if I'm wrong.
> 
> Brian O.
> 
> 
> On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" 
> wrote:
> 
> > Hi --
> > 
> >         Are there any classes for manipulating miRNA's with 
> functions
> such
> > as parsing the name, storing and interlinking pri/pre/mat sequences,
> etc?
> > 
> > Thanks,
> > 
> > Barry
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 



From cjfields at uiuc.edu  Tue Feb  7 18:06:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Feb 2006 17:06:21 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: <000001c62c3b$1c6017b0$15327e82@pyrimidine>

Sorry if this gets posted twice.

Are you talking about sequences or text output from a specific program?  If
you are talking about sequences in a particular format, then Brian's right.
If you are talking about output, then we need to know which program you're
using, as a parser may exist, or probably could be built from an existing
one.

There are a few modules in Bio::Tools that handle RNA (like QRNA,
tRNAscan-SE), so check those out first.  I'm currently finishing up a
Bio::Tools module for RNAMotif and have plans for making an ERPIN parser.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> barry.m.dancis at gsk.com
> Sent: Tuesday, February 07, 2006 2:26 PM
> To: bioperl-l; bioperl-l-bounces at lists.open-bio.org
> Subject: Re: [Bioperl-l] Handling miRNA's
> 
> It's the parser in particular that I need
> 
> 
> 
> 
> "Brian Osborne"  Sent by: 
> bioperl-l-bounces at lists.open-bio.org
> 07-Feb-2006 12:05
>  
> To
> barry.m.dancis at gsk.com, "bioperl-l" , 
> bioperl-l-bounces at lists.open-bio.org
> cc
> 
> Subject
> Re: [Bioperl-l] Handling miRNA's
> 
> 
> 
> 
> 
> 
> Barry,
> 
> If the sequence information is in one of the formats that 
> Bioperl understands (Genbank, Swissprot flat, and so on) then 
> the answer is yes.
> This assumes that the details on sequence that you mentioned 
> are found in some sequence feature section in the file. But 
> it looks to me like there's no specialized parser for miRNA 
> sequence per se, I'll be corrected if I'm wrong.
> 
> Brian O.
> 
> 
> On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" 
> wrote:
> 
> > Hi --
> > 
> >         Are there any classes for manipulating miRNA's with 
> functions
> such
> > as parsing the name, storing and interlinking pri/pre/mat sequences,
> etc?
> > 
> > Thanks,
> > 
> > Barry
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 



From paul.boutros at utoronto.ca  Tue Feb  7 20:38:42 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Tue,  7 Feb 2006 20:38:42 -0500
Subject: [Bioperl-l] (no subject)
Message-ID: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>

Hi Roger,

I would definitely prefer a fully Perl-based implementation.  For starters, I have not 
been successful in compiling the Toolkit that contains netblast for some platforms (e.g. 
AIX 5.2 w/gcc 4.0).

I haven't been following the discussion: is there some compelling reason to prefer a 
netblast-based system that's come up recently?  I'm guessing that adding a new non-perl 
dependency would only be done if there was considerable justification for this type of 
change, but I'm not clear from your message what that justification is.

Paul



------------------------------ 

Message: 12 
Date: Mon, 6 Feb 2006 20:46:44 -0600 
From: "Roger Hall"  
Subject: [Bioperl-l] RemoteBlast users - potentially major changes - 
        please        reply 
To:  
Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL> 
Content-Type: text/plain;        charset="us-ascii" 

To everyone who uses RemoteBlast.pm: 

Would anyone object to RemoteBlast being rewritten in a way that requires 
NCBI's blastcl3 executable? 

Binary downloads of blastcl3 (column "netblast") are available for numerous 
platforms at: http://ncbi.nih.gov/BLAST/download.shtml 

Does anyone require or desire a "pure perl" implementation? If so, please 
explain the advantage you see with such an implementation. 

Thanks! 
 

Roger Hall 

Technical Director 

MidSouth Bioinformatics Center 

University of Arkansas at Little Rock 

(501) 569-8074 

  





From cjfields at uiuc.edu  Tue Feb  7 23:52:36 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Feb 2006 22:52:36 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
Message-ID: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>

I want to submit a module for parsing RNAMotif output  
(Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning  
output and returning Bio::SeqFeature::Generic objects with added tags  
for descriptors/sequences/file info.  I'm in the process of writing  
up tests and going through biodesign to make sure everything's  
kosher, but the module itself is essentially ready-to-go.  What  
should I do next?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From rahall2 at ualr.edu  Wed Feb  8 00:16:44 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Tue, 7 Feb 2006 23:16:44 -0600
Subject: [Bioperl-l] RemoteBlast  [was: (no subject)]
In-Reply-To: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>
Message-ID: <004401c62c6e$da906a40$4301a8c0@LIBERAL>

Paul,

I think that most core Bioperl folks have long since moved away from
RemoteBlast and are using the functionality in StandAloneBlast to run their
own local servers. More importantly, they are, in general, researchers who
are coming to Bioinformatics from the life sciences side, and are
particularly tired of dealing with the technical issues that RemoteBlast
consistently generates due to changes in the text-formatted BLAST reports. 

They aren't code-for-code-sake geeks like me. ;}

When RemoteBlast was written, XML was barely on the technology radar, and
XML-formatted BLAST reports weren't even available. It seems that everyone
recognizes that the XML reports now generated by NCBI's blast server are the
wave of the future, but I think there is still some concern that not every
flavor of BLAST produces XML yet. Even so, the XML parser is considered to
be very strong, and only helps hasten the end of text-formatted support,
since parsing text-formatted reports is the primary source of pain. 

In discussing the shift from old to new, I think the idea of relying on
NCBI's application (and NCBI's issue system and NCBI's developers) entered
the realm of possibility, so as the guy who just showed up to adopt
RemoteBlast, I am trying to air all options and beg for all requirements. 

Personally, I am okay with the idea of maintaining text-formatted report
parsing, but like I said, I'm pound foolish about code sometimes. Additional
foolishness arises from the fact that the first money I earned in
Bioinformatics was on a contract gig where I relied on RemoteBlast (and the
related text parsers).

For my money, I just needed anyone, anywhere, to say they desired a pure
perl implementation to meet my personal threshold. So far, you're the
second. ;}

I do, however, see the advantage in shifting to XML-formatted reporting and
parsing *only* as soon as every BLAST flavor supports it, if not before.
(Anyone - is this still an issue? Please educate me.)

At the moment, I'm leaning towards adding an option to RemoteBlast. The
default (no option) would use a "pure perl" implementation, and the
enhancement (with explicit option) would merely wrap the NCBI executable.
However, there are other issues (queuing, batches) that I don't fully
understand in context, so I haven't zeroed in on a complete recommendation
yet. Additionally, the end of text-formatted reports, while drawing near, is
not yet agreed, although it is pretty clear that the only way text support
will be continued is if I insist on it and then deliver the support myself.
:}

In any case, I am very interested in a pure perl implementation for exactly
the two reasons stated thus far: it's one less thing for a newbie to worry
about, and it will run on every platform that runs perl. 

Thanks much for the input!

Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock
(501) 569-8074




-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Paul Boutros
Sent: Tuesday, February 07, 2006 7:39 PM
To: BioPerl Mailing List
Cc: Roger Hall
Subject: [Bioperl-l] (no subject)

Hi Roger,

I would definitely prefer a fully Perl-based implementation.  For starters,
I have not been successful in compiling the Toolkit that contains netblast
for some platforms (e.g. AIX 5.2 w/gcc 4.0).

I haven't been following the discussion: is there some compelling reason to
prefer a netblast-based system that's come up recently?  I'm guessing that
adding a new non-perl dependency would only be done if there was
considerable justification for this type of change, but I'm not clear from
your message what that justification is.

Paul



------------------------------ 

Message: 12 
Date: Mon, 6 Feb 2006 20:46:44 -0600 
From: "Roger Hall"  
Subject: [Bioperl-l] RemoteBlast users - potentially major changes - 
        please        reply 
To:  
Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL> 
Content-Type: text/plain;        charset="us-ascii" 

To everyone who uses RemoteBlast.pm: 

Would anyone object to RemoteBlast being rewritten in a way that requires 
NCBI's blastcl3 executable? 

Binary downloads of blastcl3 (column "netblast") are available for numerous 
platforms at: http://ncbi.nih.gov/BLAST/download.shtml 

Does anyone require or desire a "pure perl" implementation? If so, please 
explain the advantage you see with such an implementation. 

Thanks! 
 

Roger Hall 

Technical Director 

MidSouth Bioinformatics Center 

University of Arkansas at Little Rock 

(501) 569-8074 

  



_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l




From heikki at sanbi.ac.za  Wed Feb  8 01:53:58 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 8 Feb 2006 08:53:58 +0200
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
Message-ID: <200602080853.58889.heikki@sanbi.ac.za>

Chris,

Post your files to bugzilla (ticket type enhancement, add files to ticket 
after creation)  and someone with commit ability will add them to CVS once 
the code is in satisfactory condition. 

Thanks,

	-Heikki

On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> I want to submit a module for parsing RNAMotif output
> (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning
> output and returning Bio::SeqFeature::Generic objects with added tags
> for descriptors/sequences/file info.  I'm in the process of writing
> up tests and going through biodesign to make sure everything's
> kosher, but the module itself is essentially ready-to-go.  What
> should I do next?
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From hlapp at gmx.net  Wed Feb  8 00:48:40 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 7 Feb 2006 21:48:40 -0800
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
Message-ID: 

I presume you don't have a cvs write account yet - if you do just add
and commit the module and test. Otherwise could you post the POD to
the list please; either somebody with an account will hopefully
volunteer or Jason or I or Heikki or Aaron will assume mentorship and
commit the code with feedback to you. Unless you completely refuse to
heed any and all advice ;) that person will then soon try to absolve
him/herself of having to do this again for you and support you for
receiving a cvs write account of your own.

   -hilmar

On 2/7/06, Chris Fields  wrote:
> I want to submit a module for parsing RNAMotif output
> (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning
> output and returning Bio::SeqFeature::Generic objects with added tags
> for descriptors/sequences/file info.  I'm in the process of writing
> up tests and going through biodesign to make sure everything's
> kosher, but the module itself is essentially ready-to-go.  What
> should I do next?
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From cjfields at uiuc.edu  Wed Feb  8 07:57:46 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 06:57:46 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: 
References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
	
Message-ID: 

I'll probably go with Heikki's advice and post the module (with  
POD, tests, and test file) to bugzilla as an enhancement.  That way  
it can be looked through before committing.  I will likely have a few  
more modules for ERPIN and maybe Infernal in the next few months (if  
I can get it up and running).

Also, completely off-topic, I'll post what I have written up for  
installing bioperl-db on WinXP here soon.  I think it should probably  
be included in the wiki in some way, maybe as a link from the bioperl- 
db wiki page.

Thanks Hilmar, Heikki!

Chris


On Feb 7, 2006, at 11:48 PM, Hilmar Lapp wrote:

> I presume you don't have a cvs write account yet - if you do just add
> and commit the module and test. Otherwise could you post the POD to
> the list please; either somebody with an account will hopefully
> volunteer or Jason or I or Heikki or Aaron will assume mentorship and
> commit the code with feedback to you. Unless you completely refuse to
> heed any and all advice ;) that person will then soon try to absolve
> him/herself of having to do this again for you and support you for
> receiving a cvs write account of your own.
>
>    -hilmar
>
> On 2/7/06, Chris Fields  wrote:
>> I want to submit a module for parsing RNAMotif output
>> (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning
>> output and returning Bio::SeqFeature::Generic objects with added tags
>> for descriptors/sequences/file info.  I'm in the process of writing
>> up tests and going through biodesign to make sure everything's
>> kosher, but the module itself is essentially ready-to-go.  What
>> should I do next?
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
>
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From cjfields at uiuc.edu  Wed Feb  8 10:32:25 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 09:32:25 -0600
Subject: [Bioperl-l] RemoteBlast  [was: (no subject)]
In-Reply-To: <004401c62c6e$da906a40$4301a8c0@LIBERAL>
Message-ID: <000401c62cc4$de0cc9b0$15327e82@pyrimidine>

Roger, 

It might be better to build a wrapper for the blastcl3 and make it a
separate Bio::Tools::Run module, maybe branch it off from RemoteBlast or,
better yet, StandAloneBlast.  All the put/get parameters in the BEGIN{}
block for RemoteBlast look like they are configured for NCBI's HTTP
submission via CGI; I don't think you can use these for blastcl3.  Ergo,
you'll have to create a whole new set of hashes or parameter arrays inside
RemoteBlast just for blastcl3 since everything is passed via command-line
flags, like so (from http://www.ncbi.nlm.nih.gov/blast/docs/netblast.html):

blastcl3 -p blastp -d nr -i MY_QUERY -o MY_QUERY.out

However, StandAloneBlast looks like it has all the parameters mapped out in
the BEGIN{} block.  And it looks like the command line options support just
about everything you get via the web version.  It probably wouldn't take
much modification from StandAloneBlast to get it to run blastcl3.
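
To make the point about command-line flags concrete, here is a rough
sketch of how a wrapper might turn named parameters into a blastcl3
argument list. It is not code from RemoteBlast or StandAloneBlast: the
function name blastcl3_command and the parameter names are made up for
this example, and only the four flags from the netblast command quoted
above (-p, -d, -i, -o) are covered.

```perl
use strict;
use warnings;

# Map friendly parameter names (hypothetical) to the blastcl3 flags:
# -p program, -d database, -i input file, -o output file.
my %FLAG_FOR = (
    program  => '-p',
    database => '-d',
    input    => '-i',
    output   => '-o',
);

# Build the argument list for blastcl3 without running it.
# Returning a list (not a string) lets the caller use the list form of
# system(), which avoids shell quoting problems.
sub blastcl3_command {
    my (%param) = @_;
    my @cmd = ('blastcl3');
    for my $name (qw(program database input output)) {
        next unless defined $param{$name};
        push @cmd, $FLAG_FOR{$name}, $param{$name};
    }
    return @cmd;
}

my @cmd = blastcl3_command(
    program  => 'blastp',
    database => 'nr',
    input    => 'MY_QUERY',
    output   => 'MY_QUERY.out',
);
print join(' ', @cmd), "\n";

# To actually run it (assumes blastcl3 is installed and on the PATH):
# system(@cmd) == 0 or die "blastcl3 failed: $?";
```

Keeping the flag table in one hash is what would let a module like this
grow to the full option set without touching the calling code.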

As for queueing, I don't think it's supported, though you can send in a
FASTA file with multiple sequences for multiple BLAST queries (I tried this
and it works).  You could also create a queue using a sequence factory,
sending them to the netblast client one at a time, though I'd suggest
putting a delay in between cycles in that case so as not to make the guys at
NCBI cranky.
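
The "one at a time, with a delay" idea could look roughly like the
following. This is only a sketch: run_queue is a name invented here, the
submit step is a caller-supplied code reference standing in for "send
one sequence to the netblast client", and the one-second pause is an
arbitrary value, not a figure recommended by NCBI.

```perl
use strict;
use warnings;

# Run $submit on each query in turn, pausing $delay seconds between
# submissions (but not after the last one). Returns whatever $submit
# returned for each query, in order.
sub run_queue {
    my ($queries, $submit, $delay) = @_;
    my @results;
    for my $i (0 .. $#{$queries}) {
        push @results, $submit->($queries->[$i]);
        sleep $delay if $delay && $i < $#{$queries};
    }
    return @results;
}

# Stand-in submitter that only reports what it would send.
my @done = run_queue(
    [ 'seq1.fa', 'seq2.fa', 'seq3.fa' ],
    sub { my ($file) = @_; print "submitting $file\n"; return "$file.out" },
    1,    # seconds between submissions; choose something polite
);
print "results: @done\n";
```

Passing the submit step in as a code reference keeps the throttling
logic independent of whether the work is done by blastcl3 or by an HTTP
request.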

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Roger Hall
> Sent: Tuesday, February 07, 2006 11:17 PM
> To: Paul.Boutros at utoronto.ca; 'BioPerl Mailing List'
> Subject: Re: [Bioperl-l] RemoteBlast [was: (no subject)]
> 
> Paul,
> 
> I think that most core Bioperl folks have long since moved 
> away from RemoteBlast and are using the functionality in 
> StandAloneBlast to run their own local servers. More 
> importantly, they are, in general, researchers who are coming 
> to Bioinformatics from the life sciences side, and are 
> particularly tired of dealing with the technical issues that 
> RemoteBlast consistently generates due to changes in the 
> text-formatted BLAST reports. 
> 
> They aren't code-for-code-sake geeks like me. ;}
> 
> When RemoteBlast was written, XML was barely on the 
> technology radar, and XML-formatted BLAST reports weren't 
> even available. It seems that everyone recognizes that the 
> XML reports now generated by NCBI's blast server are the wave 
> of the future, but I think there is still some concern that 
> not every flavor of BLAST produces XML yet. Even so, the XML 
> parser is considered to be very strong, and only helps hasten 
> the end of text-formatted support, since parsing 
> text-formatted reports is the primary source of pain. 
> 
> In discussing the shift from old to new, I think the idea of 
> relying on NCBI's application (and NCBI's issue system and 
> NCBI's developers) entered the realm of possibility, so as 
> the guy who just showed up to adopt RemoteBlast, I am trying 
> to air all options and beg for all requirements. 
> 
> Personally, I am okay with the idea of maintaining 
> text-formatted report parsing, but like I said, I'm pound 
> foolish about code sometimes. Additional foolishness arises 
> from the fact that the first money I earned in Bioinformatics 
> was on a contract gig where I relied on RemoteBlast (and the 
> related text parsers).
> 
> For my money, I just needed anyone, anywhere, to say they 
> desired a pure perl implementation to meet my personal 
> threshold. So far, you're the second. ;}
> 
> I do, however, see the advantage in shifting to XML-formatted 
> reporting and parsing *only* as soon as every BLAST flavor 
> supports it, if not before.
> (Anyone - is this still an issue? Please educate me.)
> 
> At the moment, I'm leaning towards adding an option to 
> RemoteBlast. The default (no option) would use a "pure perl" 
> implementation, and the enhancement (with explicit option) 
> would merely wrap the NCBI executable.
> However, there are other issues (queuing, batches) that I 
> don't fully understand in context, so I haven't zeroed in on 
> a complete recommendation yet. Additionally, the end of 
> text-formatted reports, while drawing near, is not yet 
> agreed, although it is pretty clear that the only way text 
> support will be continued is if I insist on it and then 
> deliver the support myself.
> :}
> 
> In any case, I am very interested in a pure perl 
> implementation for exactly the two reasons stated thus far: 
> it's one less thing for a newbie to worry about, and it will 
> run on every platform that runs perl. 
> 
> Thanks much for the input!
> 
> Roger Hall
> Technical Director
> MidSouth Bioinformatics Center
> University of Arkansas at Little Rock
> (501) 569-8074
> 
> 
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Paul Boutros
> Sent: Tuesday, February 07, 2006 7:39 PM
> To: BioPerl Mailing List
> Cc: Roger Hall
> Subject: [Bioperl-l] (no subject)
> 
> Hi Roger,
> 
> I would definitely prefer a fully Perl-based implementation.  
> For starters, I have not been successful in compiling the 
> Toolkit that contains netblast for some platforms (e.g. 
> AIX 5.2 w/gcc 4.0).
> 
> I haven't been following the discussion: is there some 
> compelling reason to prefer a netblast-based system that's 
> come up recently?  I'm guessing that adding a new non-perl 
> dependency would only be done if there was considerable 
> justification for this type of change, but I'm not clear from 
> your message what that justification is.
> 
> Paul
> 
> 
> 
> ------------------------------ 
> 
> Message: 12
> Date: Mon, 6 Feb 2006 20:46:44 -0600
> From: "Roger Hall" 
> Subject: [Bioperl-l] RemoteBlast users - potentially major changes - 
>         please        reply 
> To: 
> Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL> 
> Content-Type: text/plain;        charset="us-ascii" 
> 
> To everyone who uses RemoteBlast.pm: 
> 
> Would anyone object to RemoteBlast being rewritten in a way 
> that requires NCBI's blastcl3 executable? 
> 
> Binary downloads of blastcl3 (column "netblast") are 
> available for numerous platforms at: 
> http://ncbi.nih.gov/BLAST/download.shtml 
> 
> Does anyone require or desire a "pure perl" implementation? 
> If so, please explain the advantage you see with such an 
> implementation. 
> 
> Thanks! 
>  
> 
> Roger Hall 
> 
> Technical Director 
> 
> MidSouth Bioinformatics Center 
> 
> University of Arkansas at Little Rock 
> 
> (501) 569-8074 
> 
>   
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 



From hubert.prielinger at gmx.at  Wed Feb  8 15:51:41 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Wed, 08 Feb 2006 14:51:41 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
Message-ID: <43EA59DD.1030608@gmx.at>

Hi,
If I want to parse a Blast Output (Version 2.2.12) with Bio::SearchIO,  
I get the following error message:

MSG: no data for midline Query  1   WWWKWRW  7
STACK Bio::SearchIO::blast::next_result 
/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
STACK toplevel 
/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21

Is that a bug?

If I want to parse Blast output (version 2.2.13), I don't get anything.
I'm using bioperl 1.4.

Before I installed bioperl 1.4, parsing Blast output (version 2.2.12)
worked fine, but I don't remember which bioperl version I had installed
then.

thanks in advance

Hubert





From cjfields at uiuc.edu  Wed Feb  8 17:15:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 16:15:23 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <43EA59DD.1030608@gmx.at>
Message-ID: <001101c62cfd$28605df0$15327e82@pyrimidine>

My guess is you're running into text parsing problems in
Bio::SearchIO::blast.  Upgrade to the latest developer version (1.5.1) or
bioperl-live (CVS), then see the bug below. 

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

I think the first problem you ran into is solved in bioperl 1.5.1, the last
problem (more recent, not related to the first) has been fixed but hasn't
been committed to bioperl-live yet.  The fixed SearchIO::blast is available
in the link above, but realize it hasn't been committed yet and may change.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Hubert Prielinger
> Sent: Wednesday, February 08, 2006 2:52 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing Blast output
> 
> Hi,
> If I want to parse a Blast Output (Version 2.2.12) with 
> Bio::SearchIO, I get the following error message:
> 
> MSG: no data for midline Query  1   WWWKWRW  7
> STACK Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> STACK toplevel
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> 
> is that a bug......
> 
> If I want to parse Blast Output (version 2.2.13), I don't get 
> anything.....
> I'm using bioperl 1.4
> 
> before, I have installed bioperl 1.4, it worked fine parsing 
> Blast Output (version 2.2.12), but I don't remember which 
> bioperl version I had installed
> 
> thanks in advance
> 
> Hubert
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From hubert.prielinger at gmx.at  Wed Feb  8 16:41:04 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Wed, 08 Feb 2006 15:41:04 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <001101c62cfd$28605df0$15327e82@pyrimidine>
References: <001101c62cfd$28605df0$15327e82@pyrimidine>
Message-ID: <43EA6570.9070909@gmx.at>

Hi Chris,
thanks. I have upgraded to version 1.5.1, but it still isn't working. Do
you have any other idea? The problem is that I have to parse a lot of
text files. Or shall I look for another option to parse those files?

regards
Hubert



Chris Fields wrote:

>My guess is you're running into text parsing problems in
>Bio::SearchIO::blast.  Upgrade to the latest developer version (1.5.1) or
>bioperl-live (CVS), then see the bug below. 
>
>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>
>I think the first problem you ran into is solved in bioperl 1.5.1, the last
>problem (more recent, not related to the first) has been fixed but hasn't
>been committed to bioperl-live yet.  The fixed SearchIO::blast is available
>in the link above, but realize it hasn't been committed yet and may change.
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept. of Biochemistry
>University of Illinois Urbana-Champaign  
>
>  
>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org 
>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>>Hubert Prielinger
>>Sent: Wednesday, February 08, 2006 2:52 PM
>>To: bioperl-l at bioperl.org
>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>parsing Blast output
>>
>>Hi,
>>If I want to parse a Blast Output (Version 2.2.12) with 
>>Bio::SearchIO, I get the following error message:
>>
>>MSG: no data for midline Query  1   WWWKWRW  7
>>STACK Bio::SearchIO::blast::next_result
>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>STACK toplevel
>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>
>>is that a bug......
>>
>>If I want to parse Blast Output (version 2.2.13), I don't get 
>>anything.....
>>I'm using bioperl 1.4
>>
>>before, I have installed bioperl 1.4, it worked fine parsing 
>>Blast Output (version 2.2.12), but I don't remember which 
>>bioperl version I had installed
>>
>>thanks in advance
>>
>>Hubert
>>
>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l



From cjfields at uiuc.edu  Wed Feb  8 18:00:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 17:00:21 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <43EA6570.9070909@gmx.at>
Message-ID: <001201c62d03$703178c0$15327e82@pyrimidine>

Make sure you ran a full installation of bioperl-1.5.1 or bioperl-live (not
just the modules you want; mixing bioperl versions might work, but you might
run into interoperability problems).  Then replace the Bio::SearchIO::blast
with the one in Bugzilla.  The 'other option' you mentioned might be trying
XML instead of text, which is more stable in the long run.  You will still
need to run a full upgrade to bioperl 1.5.1 for that; make sure you read
this:

http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast

If you're using SearchIO directly instead of RemoteBlast, you should be able
to set the '-format' flag to 'blastxml'.
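
(A minimal sketch of the XML route, not part of the original message. It
assumes the reports were saved as BLAST XML, e.g. with blastall -m 7, and that
the XML parser's dependencies are installed. Bio::SearchIO chooses its parser
from the -format argument; the sub name and the file argument are
hypothetical.)

```perl
use strict;
use warnings;

# Print one tab-delimited line per HSP from a BLAST XML report.
# $path is a hypothetical file name supplied by the caller.
sub print_hsps_from_xml {
    my ($path) = @_;
    require Bio::SearchIO;    # loaded lazily, only when a report is parsed
    my $in = Bio::SearchIO->new(-format => 'blastxml', -file => $path);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print join("\t", $result->query_name, $hit->name, $hsp->evalue), "\n";
            }
        }
    }
    return;
}

# Usage: perl parse_xml.pl report.xml
print_hsps_from_xml($ARGV[0]) if @ARGV;
```

The result/hit/HSP loops are the same as for text reports, so switching formats
should only change the constructor call.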

It also wouldn't hurt to know what OS you're using or see some code.  Roger
is out there somewhere (I think) and may also have some input.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  




From hubert.prielinger at gmx.at  Wed Feb  8 17:22:44 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Wed, 08 Feb 2006 16:22:44 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <001201c62d03$703178c0$15327e82@pyrimidine>
References: <001201c62d03$703178c0$15327e82@pyrimidine>
Message-ID: <43EA6F34.4090007@gmx.at>

hi,
I installed the Core, Run and Ext packages from the following page:
http://news.open-bio.org/archives/2005_10.html
I'm using only SearchIO, without the RemoteBlast module, because I
already have all my BLAST output files.
My operating system is Fedora Core 4.

Code:

#!/usr/bin/perl -w

use Bio::SearchIO;

print "start program\n";
my $directory = "/home/Hubert/installed/eclipse/workspace/Database_Search/result_4";
opendir(DIR, $directory) || die("Cannot open directory");
print "opened directory\n";

foreach my $file (readdir(DIR))  {
    print "read file\n";

    my $search = new Bio::SearchIO (-format => 'blast',
                                    -file => $file);

    my $cutoff_len = 10;

    #iterate over each query sequence
    while (my $result = $search->next_result) {
        print "entered 1st while loop\n";

        #iterate over each hit on the query sequence
        while (my $hit = $result->next_hit) {

            #iterate over each HSP in the hit
            while (my $hsp = $hit->next_hsp) {

                if ($hsp->length('sbjct') <= $cutoff_len) {
                    #print $hsp->hit_string, "\n";
                    for ($hsp->hit_string) {

                        if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 || tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {

                            # Print some tab-delimited data about this HSP
                            open (bigShot, ">>BlastOutputTrial.txt") || die ("Could not open file. $!");
                            #print $result->query_name, "\t";
                            #print $hit->significance, "\t";
                            print bigShot $hit->name, "-->";
                            print bigShot $hit->description, "\n";
                            #print bigShot "Query:   ", $hsp->start('query'), "  ", $hsp->query_string, "  ", $hsp->end('query'), "\n";
                            print bigShot "Seq:     ", $hsp->start('hit'), "  ", $hsp->hit_string, "  ", $hsp->end('hit'), "\n";

                            #print $hsp->rank, "\t";
                            #print $hsp->percent_identity, "\t";
                            #print $hsp->evalue, "\t";
                            #print $hsp->hsp_length, "\n";

                            close (bigShot);
                        };
                    }
                }
            }
        }
    }
}

closedir(DIR);
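
(An illustrative sketch, not part of the original script: the K/R/W test above
mixes || and && without parentheses, and Perl evaluates && first. The sub
below spells out that grouping explicitly and counts each residue once. The
sub name is hypothetical, and whether this grouping is the intended one is an
open question.)

```perl
use strict;
use warnings;

# Returns 1 when a hit string passes the K/R/W filter, grouped the way
# Perl parses the original condition (&& binds tighter than ||).
sub passes_krw_filter {
    my ($s) = @_;
    my $k = ($s =~ tr/K//);    # tr with an empty replacement only counts
    my $r = ($s =~ tr/R//);
    my $w = ($s =~ tr/W//);
    return ( $k >= 2
          || ( $r >= 2 && $w >= 2 )
          || ( $k == 1 && $r == 1 && $w >= 2 ) ) ? 1 : 0;
}

print passes_krw_filter('WWWKWRW') ? "keep\n" : "skip\n";
```

Note that under this grouping a string with two or more K passes even when it
contains no W at all; if two W are meant to be required in every case, the
parentheses would have to be moved.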





From rahall2 at ualr.edu  Wed Feb  8 18:34:45 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Wed, 8 Feb 2006 17:34:45 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <43EA6F34.4090007@gmx.at>
Message-ID: <000401c62d08$3ede6b70$4301a8c0@LIBERAL>

Hubert,

Give me a bit to look over your code and think this through. I am still
re-familiarizing myself with the relevant modules, so I can't give an answer
off the top of my head.

Also, please send me one or more of your blast reports (zipped) if you don't
mind (and maybe avoid including the list in your reply). Let's take this
"offline" relative to the list - we'll include the list again if there is a
Bioperl issue and solution. (In case you are concerned at all, I promise not
to share or study the actual BLAST results.)

I'm not particularly familiar with the Fedora distributions, but I'm sure I
can either chase down the perl problem or at least eliminate everything else
but Fedora as the culprit. ;}

(Chris - I'm not quite paying attention on an hourly basis yet, but I do
intend to help support these issues for the foreseeable future. Thanks as
always for the assist.)

Thanks!

Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock
(501) 569-8074






From injunjoel at hotmail.com  Wed Feb  8 19:54:26 2006
From: injunjoel at hotmail.com (Joel Steele)
Date: Wed, 08 Feb 2006 16:54:26 -0800
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	Blastoutput
In-Reply-To: <43EA6F34.4090007@gmx.at>
Message-ID: 

Greetings,
I'm not well versed in Bio::SearchIO but there are a few comments about your 
code that may or may not be relevant...

first thing:

=-=-=-=-=code snippet=-=-=-=-=

#!/usr/bin/perl -w
use strict;   #save yourself the headaches and force yourself to write clean code.

=-=-=-=-=code snippet=-=-=-=-=

next thing:
when you are reading the files from the directory you are not doing any sort 
of filtering as to what is returned. If you are on a Unix flavored system 
you may be getting the '.' and '..' entries from your readdir(DIR) call. I 
would suggest placing a grep in there somewhere to get only blast files.
something like:

=-=-=-=-=code snippet=-=-=-=-=

#assuming the file extension for blast files is .bls
#the -e and -f are filetests; you could probably get away with just
#-f. Here is a link for reference on the filetests available in Perl.
#
# http://www.perlmonks.org/?node_id=370
#
#readdir returns bare file names, so test them relative to $directory

my @files_to_parse = grep { /\w+\.bls$/ && -e "$directory/$_" && -f "$directory/$_" } readdir(DIR);
closedir(DIR);

#then proceed with your foreach but over @files_to_parse

foreach my $file (@files_to_parse) {
     #do cool stuff here, opening "$directory/$file"...
}

=-=-=-=-=code snippet=-=-=-=-=
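
(A self-contained sketch of the same filtering idea, using only core Perl.
readdir returns bare names, so each name is joined to the directory before it
is tested or opened. The sub name and the ".bls" default are assumptions for
illustration.)

```perl
use strict;
use warnings;
use File::Spec;

# Return the full paths of the regular files in $dir whose names end in $ext.
# The ".bls" default is an assumption carried over from the snippet above.
sub blast_files_in {
    my ($dir, $ext) = @_;
    $ext = '.bls' unless defined $ext;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @paths = map  { File::Spec->catfile($dir, $_) }
                grep { /\Q$ext\E$/ } readdir($dh);
    closedir($dh);
    my @files = grep { -f } @paths;    # drops '.', '..' and subdirectories
    @files = sort @files;
    return @files;
}
```

Each returned path can be passed straight to the -file argument of
Bio::SearchIO, whatever the current working directory is.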

Hope that helps.
-Joel Steele


"The surest way to corrupt a youth is to instruct him to hold in higher 
regard those who think alike than those who think differently." -Nietzsche

"I do not feel obliged to believe that the same God who endowed us with 
sense, reason and intellect has intended us to forego their use." -Galileo








From saldroubi at yahoo.com  Wed Feb  8 20:12:16 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Wed, 8 Feb 2006 17:12:16 -0800 (PST)
Subject: [Bioperl-l] Documentation link?
Message-ID: <20060209011216.39949.qmail@web34311.mail.mud.yahoo.com>

All,
  
 Forgive me but I don't see the documentation link on the new website. I only see a link to the HOWTOs. I think I am looking for the Pdoc link.
  
  Thank you. 
  


Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From saldroubi at yahoo.com  Wed Feb  8 20:24:23 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Wed, 8 Feb 2006 17:24:23 -0800 (PST)
Subject: [Bioperl-l] Count or weight matrix in bioperl?
Message-ID: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com>

All,
  
  Say I have an array of nucleotide sequences, each of length N. I want to calculate the count matrix (weight matrix). That is, for each position 1..N, I want to know how many As, Cs, Ts and Gs there are. Is there code already written in bioperl to build this matrix if I pass it those strings?
  
  Please excuse my lack of knowledge as I am a newcomer to bioinformatics.
  
  Thank you. 
  
  
  
  

Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From osborne1 at optonline.net  Wed Feb  8 20:44:56 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 08 Feb 2006 20:44:56 -0500
Subject: [Bioperl-l] Documentation link?
In-Reply-To: <20060209011216.39949.qmail@web34311.mail.mud.yahoo.com>
Message-ID: 

Sam,

http://bioperl.open-bio.org/wiki/Main_Page

Look for the API Docs under "main links".

Brian O.


On 2/8/06 8:12 PM, "Sam Al-Droubi"  wrote:

> All,
>   
>  Forgive me but I don't see the documentation link on the  new website.  I
> only see a link to the HOWTO's. I think I am  looking for the Pdoc link.
>   
>   Thank you. 
>   
> 
> 
> Sincerely, 
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From torsten.seemann at infotech.monash.edu.au  Wed Feb  8 21:54:39 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 09 Feb 2006 13:54:39 +1100
Subject: [Bioperl-l] Count or weight matrix in bioperl?
In-Reply-To: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com>
References: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com>
Message-ID: <43EAAEEF.3000304@infotech.monash.edu.au>

>   Say I have an array of nucleotide sequences, each of length N. I want to calculate the count matrix (weight matrix). That is, for each position 1..N, I want to know how many As, Cs, Ts and Gs there are. Is there code already written in bioperl to build this matrix if I pass it those strings?
>   Please excuse my lack of knowledge as I am a newcomer to bioinformatics.

Use the Bio::Tools::SeqStats module. The PDoc documentation even has an 
example similar to what you want to do:

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html
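
For a per-position count across many equal-length sequences, plain Perl
may also be enough. Here is a minimal sketch, assuming the sequences are
plain strings of equal length. The sub name count_matrix is hypothetical
and is not a BioPerl function:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build a per-position
# count matrix from equal-length nucleotide strings.
# Returns a reference to an array with one hash per position,
# e.g. $matrix->[0]{A} is the number of As in column 1.
sub count_matrix {
    my @seqs = @_;
    return [] unless @seqs;

    my $len    = length $seqs[0];
    my @matrix = map { +{ A => 0, C => 0, G => 0, T => 0 } } 1 .. $len;

    for my $seq (@seqs) {
        die "sequences must all have length $len\n"
            unless length($seq) == $len;
        my @bases = split //, uc $seq;
        for my $pos (0 .. $len - 1) {
            # Symbols other than A, C, G, T (e.g. N) get their own key
            $matrix[$pos]{ $bases[$pos] }++;
        }
    }
    return \@matrix;
}

my $m = count_matrix(qw(ACGT ACGA TCGT));
for my $pos (0 .. $#$m) {
    printf "%d\t%s\n", $pos + 1,
        join "\t", map { "$_=" . $m->[$pos]{$_} } qw(A C G T);
}
```

Dividing each count by the number of sequences would turn the counts
into per-position frequencies.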

--Torsten Seemann



From cjfields at uiuc.edu  Thu Feb  9 00:07:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 23:07:15 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	Blastoutput
In-Reply-To: 
References: 
Message-ID: 


On Feb 8, 2006, at 6:54 PM, Joel Steele wrote:

> Greetings,
> Im not well versed in Bio::SearchIO but there are a few comments  
> about your
> code that may or may not be relevant...
>
> first thing:
>
> =-=-=-=-=code snippet=-=-=-=-=
>
> #!/usr/bin/perl -w
> use strict;   #save yourself the headaches and force yourself to  
> write clean
> code.
>
> =-=-=-=-=code snippet=-=-=-=-=
>

Tread very carefully here.  Just about every book on perl suggests  
'use strict' and adding warnings for code development (ex. the Camel,  
the Llama, and others); in fact, these are the very books most  
beginners start from.  Some would consider NOT using -w or 'use  
strict' a bad habit; everybody has an opinion (I would repeat an oft- 
heard Texas saying, but I'll refrain).  Just remember: try to be a  
little more constructive in your critique and insert a little less  
about your personal coding style.  If you hit the wrong person, you  
might get flamed.

Here's a link that may help a bit here:

http://bioperl.org/Core/Latest/biodesign.html#respect_people_s_code__in_particular_if_it_works_

> next thing:
> when you are reading the files from the directory you are not doing  
> any sort
> of filtering as to what is returned. If you are on a Unix flavored  
> system
> you may be getting the '.' and '..' entries from your readdir(DIR)  
> call. I
> would suggest placing a grep in there somewhere to get only blast  
> files.
> something like:
>

I agree here.  You could probably also use something like File::Find  
here to make things a bit easier with the file names as well; works  
wonderfully, esp. when traversing a directory tree.

> =-=-=-=-=code snippet=-=-=-=-=
>
> #assuming the file extension for blast files is .bls
> #the -e and -f are filetests; you could probably get away with just
> #-f. Here is a link for reference on the filetests available in Perl.
> #
> # http://www.perlmonks.org/?node_id=370
>
> my @files_to_parse = grep{/\w+\.bls/ && -e && -f} readdir(DIR);
> closedir(DIR);
>
> #then proceed with your foreach but over @files_to_parse
>
> foreach my $file(@files_to_parse){
>      #do cool stuff here...
> }
>

Again, agreed.  But, does it really solve the main problem, which is  
an issue with SearchIO::blast?  It seemed to try parsing a blast file...
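
One related detail: readdir returns bare file names, so the directory
has to be prepended before a name is handed to a file test or to -file.
A sketch of that, assuming the reports end in .bls as in Joel's example.
The sub name blast_files is hypothetical:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full paths of the regular files in
# $dir whose names end in .bls (the extension is an assumption).
sub blast_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @names = readdir($dh);    # bare names, including '.' and '..'
    closedir($dh);

    return grep { -f $_ }                           # regular files only
           map  { File::Spec->catfile($dir, $_) }   # name -> full path
           grep { /\.bls\z/ }
           sort @names;
}

# List the reports in the directory given on the command line
# (current directory if none is given).
print "$_\n" for blast_files(@ARGV ? $ARGV[0] : '.');
```

Each returned path could then be passed to Bio::SearchIO via -file
without depending on the current working directory.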

> =-=-=-=-=code snippet=-=-=-=-=
>
> Hope that helps.
> -Joel Steele
>
>
> "The surest way to corrupt a youth is to instruct him to hold in  
> higher
> regard those who think alike than those who think differently." - 
> Nietzsche
>
> "I do not feel obliged to believe that the same God who endowed us  
> with
> sense, reason and intellect has intended us to forego their use." - 
> Galileo
>
>
>
>
>> From: Hubert Prielinger 
>> To: Chris Fields , bioperl-l at bioperl.org,
>> rahall2 at ualr.edu
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>> Blastoutput
>> Date: Wed, 08 Feb 2006 16:22:44 -0600
>>
>> hi,
>> I have installed from the following page:
>> http://news.open-bio.org/archives/2005_10.html,  the Core, Run and  
>> Ext.
>> I'm using only the SearchIO without remoteblast module, because I  
>> have
>> already all my Blast output files.
>> My operating system is fedora core 4.
>>
>> Code:
>>
>> #!/usr/bin/perl -w
>>
>> use Bio::SearchIO;
>>
>> print "start program\n";
>> my $directory =
>> "/home/Hubert/installed/eclipse/workspace/Database_Search/result_4";
>> opendir(DIR, $directory) || die("Cannot open directory");
>> print "opened directory\n";
>>
>> foreach my $file (readdir(DIR))  {
>> print "read file\n";
>>
>> my $search = new Bio::SearchIO (-format => 'blast',
>>                                 -file => $file);
>>
>> my $cutoff_len = 10;
>>
>>
>>
>> #iterate over each query sequence
>> while (my $result = $search->next_result) {
>> print "entered 1st while loop\n";
>>
>>     #iterate over each hit on the query sequence
>>     while (my $hit = $result->next_hit) {
>>
>>         #iterate over each HSP in the hit
>>         while (my $hsp = $hit->next_hsp) {
>>
>>             if ($hsp->length('sbjct') <= $cutoff_len) {
>>                 #print $hsp->hit_string, "\n";
>>                 for ($hsp->hit_string) {
>>
>>
>>                     if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 || tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {
>>
>>                         # Print some tab-delimited data about this HSP
>>
>>                            open (bigShot, ">>BlastOutputTrial.txt") || die ("Could not open file. $!");
>>                                 #print $result->query_name, "\t";
>>
>> #                        print $hit->significance, "\t";
>>                          print bigShot $hit->name, "-->";
>>                          print bigShot $hit->description, "\n";
>>                          #print bigShot "Query:   ", $hsp->start('query'), "  ", $hsp->query_string, "  ", $hsp->end('query'), "\n";
>>                          print bigShot "Seq:     ", $hsp->start('hit'), "  ", $hsp->hit_string, "  ", $hsp->end('hit'), "\n";
>>
>> #                        print $hsp->rank, "\t";
>> #                        print $hsp->percent_identity, "\t";
>> #                        print $hsp->evalue, "\t";
>> #                        print $hsp->hsp_length, "\n";
>>
>>                         close (bigShot);
>>
>>                     };
>>
>>
>>             }
>>         }
>>         }
>>     }
>> }
>>
>> }
>>
>> closedir(DIR);
>>
>>
>> Chris Fields wrote:
>>
>>> Make sure you ran a full installation of bioperl-1.5.1 or bioperl- 
>>> live
>> (not
>>> just the modules you want; mixing bioperl versions might work,  
>>> but you
>> might
>>> run into interoperability problems).  Then replace the
>> Bio::SearchIO::blast
>>> with the one in Bugzilla.  The 'other option' you mentioned might be
>> trying
>>> XML instead of text, which is more stable in the long run.  You will
>> still
>>> need to run a full upgrade to bioperl 1.5.1 for that; make sure  
>>> you read
>>> this:
>>>
>>> http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast
>>>
>>> If you're using SearchIO directly instead of Remoteblast, you  
>>> should be
>> able
> >>> to set the '-format' flag to 'blastxml'.
>>>
>>> It also wouldn't hurt to know what OS you're using or see some code.
>> Roger
>>> is out there somewhere (I think) and may also have some input.
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept. of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at]
>>>> Sent: Wednesday, February 08, 2006 3:41 PM
>>>> To: Chris Fields; bioperl-l at bioperl.org
>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>> parsing Blast output
>>>>
>>>> hi chris,
>>>> thanks, I have upgraded to version 1.5.1 but it isn't still
>>>> working, do you have any ohter idea, the problem I have is
>>>> that I have to parse a lot of textfiles....
>>>> or shall I look for another option to parse those files...
>>>>
>>>> regards
>>>> Hubert
>>>>
>>>>
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>
>>>>
>>>>> My guess is you're running into text parsing problems in
>>>>> Bio::SearchIO::blast.  Upgrade to the latest developer
>>>>>
>>>>>
>>>> version (1.5.1)
>>>>
>>>>
>>>>> or bioperl-live (CVS), then see the bug below.
>>>>>
>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>
>>>>> I think the first problem you ran into is solved in bioperl
>>>>>
>>>>>
>>>> 1.5.1, the
>>>>
>>>>
>>>>> last problem (more recent, not related to the first) has
>>>>>
>>>>>
>>>> been fixed but
>>>>
>>>>
>>>>> hasn't been committed to bioperl-live yet.  The fixed
>>>>>
>>>>>
>>>> SearchIO::blast
>>>>
>>>>
>>>>> is available in the link above, but realize it hasn't been
>>>>>
>>>>>
>>>> committed yet and may change.
>>>>
>>>>
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher - Switzer Lab
>>>>> Dept. of Biochemistry
>>>>> University of Illinois Urbana-Champaign
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>> Prielinger
>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>> To: bioperl-l at bioperl.org
>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>>
>>>>>>
>>>> parsing Blast
>>>>
>>>>
>>>>>> output
>>>>>>
>>>>>> Hi,
>>>>>> If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>
>>>>>>
>>>> Bio::SearchIO,
>>>>
>>>>
>>>>>> I get the following error message:
>>>>>>
>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>> STACK toplevel
>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ 
>>>>>> Blast.pl:21
>>>>>>
>>>>>> is that a bug......
>>>>>>
>>>>>> If I want to parse Blast Output (version 2.2.13), I don't get
>>>>>> anything.....
>>>>>> I'm using bioperl 1.4
>>>>>>
>>>>>> before, I have installed bioperl 1.4, it worked fine parsing  
>>>>>> Blast
>>>>>> Output (version 2.2.12), but I don't remember which bioperl
>>>>>>
>>>>>>
>>>> version I
>>>>
>>>>
>>>>>> had installed
>>>>>>
>>>>>> thanks in advance
>>>>>>
>>>>>> Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>>
>>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From golharam at umdnj.edu  Wed Feb  8 23:46:43 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 08 Feb 2006 23:46:43 -0500
Subject: [Bioperl-l] Tool to mutate DNA sequence
Message-ID: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>

Does anyone know of a tool to mutate a DNA sequence by a specified amount?
For instance, say I have a DNA sequence 1000 bases long, and I want to
simulate mutations to make it 75% (or 80%, etc) similar to the original.


Ryan



From torsten.seemann at infotech.monash.edu.au  Thu Feb  9 06:15:28 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 09 Feb 2006 22:15:28 +1100
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <43EB2450.6000606@infotech.monash.edu.au>

Ryan,

> Does anyone know of tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.

The EMBOSS suite comes with a tool called "msbar" which can controllably 
mutate sequences:

http://emboss.sourceforge.net/apps/msbar.html

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/


From cjfields at uiuc.edu  Thu Feb  9 11:16:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 9 Feb 2006 10:16:28 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <57361AED-AFEA-4927-AF29-9944E7F0895B@duke.edu>
Message-ID: <001b01c62d94$2e8bee50$15327e82@pyrimidine>


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
> Sent: Thursday, February 09, 2006 9:13 AM
> To: Hubert Prielinger
> Cc: Chris Fields; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing Blast output
> 
> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> > hi chris,
> > thanks, I have upgraded to version 1.5.1 but it isn't still 
> working, 
> > do you have any ohter idea, the problem I have is that I 
> have to parse 
> > a lot of textfiles....
> > or shall I look for another option to parse those files...
> >
> > regards
> > Hubert
> 
> 
> The code from Bioperl 1.5.1 works fine for me for blast 
> 2.2.13 reports but unless you post your blast report we can't 
> really determine the problem.
> 
> If you are still getting the same error like this I am not 
> convinced you have upgraded to 1.5.1 which includes a fix in 
> the fact that NCBI changed the HSP result format to remove 
> the ':' from the Query/Sbjct prefixes.  We fixed this as soon 
> as it was apparent sometime in September.
> 
> >>> MSG: no data for midline Query  1   WWWKWRW  7
> >>> STACK Bio::SearchIO::blast::next_result
> >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>> STACK toplevel
> >>> 
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> 
> If you are just getting no results but also no warnings wrt 
> parsing, are you sure your logic is correct?
> 
> If you remove your filters do you see all the HSPS?
> 
> 
> while (my $result = $search->next_result) {
>     print $result->query_name, "\n";
>     # iterate over each hit on the query sequence
>     while (my $hit = $result->next_hit) {
>         print $hit->name, "\n";
>         # iterate over each HSP in the hit
>         while (my $hsp = $hit->next_hsp) {
>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>                   $hsp->hit_string, "\n";
>         }
>     }
> }

I tested some of the BLAST results that Hubert sent Roger and me with a
similar script to the above.  I removed the file parsing logic and it seemed
to work just fine.  It may very well be a logic issue or that he hasn't
installed the latest fix.
    
It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), even
though the returned output was from nr, the top of the blast output showed
that it was v2.2.12:  

BLASTP 2.2.12 [Aug-07-2005]

I double-checked my local version and it's definitely v.2.2.13:
-------------------------------------
C:\Perl\Scripts>blastcl3 -

blastcl3 2.2.13   arguments:...
-------------------------------------

If you use RemoteBlast using the same settings, the version in the header
looks like this:

BLASTP 2.2.13 [Nov-27-2005]

I'm wondering if all the blast executables (blast and netblast) from NCBI
have text output like v.2.2.12, while the wwwblast outputs a new format
(2.2.13).  I'll ask blast-help at NCBI about this.

> 
> To clarify some stuff -
> Chris I don't necessarily think the XML is best way forward 
> for BLAST reports generated locally, it isn't as detailed as 
> the Text format and it is what most people expect to be able 
> to scroll through and parse -- it is also harder for the 
> format to change dramatically if you have a static binary on 
> your machine =).  I think for remoteblast the XML format 
> should be the way forward but I expect Bioperl to maintain 
> support of any plain text BLAST report format that people use 
> on a regular basis.
> 

Does XML lack some specific info that text output has?  Didn't know that.  I
believe that XML should be default in RemoteBlast since it will not break,
but I agree with you about text output.  I also agree that it will need
somebody to maintain it constantly, much like RemoteBlast.

> -jason
> >
> >
> > Chris Fields wrote:
> >
> >> My guess is you're running into text parsing problems in 
> >> Bio::SearchIO::blast.  Upgrade to the latest developer version
> >> (1.5.1) or
> >> bioperl-live (CVS), then see the bug below.
> >>
> >> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>
> >> I think the first problem you ran into is solved in bioperl 1.5.1, 
> >> the last problem (more recent, not related to the first) has been 
> >> fixed but hasn't been committed to bioperl-live yet.  The fixed 
> >> SearchIO::blast is available in the link above, but 
> realize it hasn't 
> >> been committed yet and may change.
> >>
> >> Christopher Fields
> >> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
> >> University of Illinois Urbana-Champaign
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: bioperl-l-bounces at lists.open-bio.org
> >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert 
> >>> Prielinger
> >>> Sent: Wednesday, February 08, 2006 2:52 PM
> >>> To: bioperl-l at bioperl.org
> >>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing Blast 
> >>> output
> >>>
> >>> Hi,
> >>> If I want to parse a Blast Output (Version 2.2.12) with 
> >>> Bio::SearchIO, I get the following error message:
> >>>
> >>> MSG: no data for midline Query  1   WWWKWRW  7
> >>> STACK Bio::SearchIO::blast::next_result
> >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>> STACK toplevel
> >>> 
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>>
> >>> is that a bug......
> >>>
> >>> If I want to parse Blast Output (version 2.2.13), I don't get 
> >>> anything.....
> >>> I'm using bioperl 1.4
> >>>
> >>> before, I have installed bioperl 1.4, it worked fine 
> parsing Blast 
> >>> Output (version 2.2.12), but I don't remember which 
> bioperl version 
> >>> I had installed
> >>>
> >>> thanks in advance
> >>>
> >>> Hubert
> >>>
> >>>
> >>>
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  



From cjfields at uiuc.edu  Thu Feb  9 12:53:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 9 Feb 2006 11:53:24 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <200602080853.58889.heikki@sanbi.ac.za>
Message-ID: <000001c62da1$ba346ba0$15327e82@pyrimidine>

Heikki, 

I've added the Bio::Tools::RNAMotif module with test suite (24 tests) and
two test data files to bugzilla.  The first data file is needed for normal
tests, the second is for testing parsing with modified data in the score tag
(using sprintf() in the RNAMotif descriptor).  I ran 'perl t\RNAMotif.t' and
they all passed.

Thanks!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Heikki Lehvaslaiho
> Sent: Wednesday, February 08, 2006 12:54 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Chris Fields
> Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> 
> Chris,
> 
> Post your files to bugzilla (ticket type enhancement, add 
> files to ticket after creation)  and someone with commit 
> ability will add them to CVS once the code is in satisfactory 
> condition. 
> 
> Thanks,
> 
> 	-Heikki
> 
> On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> > I want to submit a module for parsing RNAMotif output 
> > (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning 
> > output and returning Bio::SeqFeature::Generic objects with 
> added tags 
> > for descriptors/sequences/file info.  I'm in the process of 
> writing up 
> > tests and going through biodesign to make sure everything's kosher, 
> > but the module itself is essentially ready-to-go.  What should I do 
> > next?
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> 
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________



From jason.stajich at duke.edu  Thu Feb  9 10:13:09 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 9 Feb 2006 10:13:09 -0500
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <43EA6570.9070909@gmx.at>
References: <001101c62cfd$28605df0$15327e82@pyrimidine>
	<43EA6570.9070909@gmx.at>
Message-ID: <57361AED-AFEA-4927-AF29-9944E7F0895B@duke.edu>

On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> hi chris,
> thanks, I have upgraded to version 1.5.1 but it isn't still  
> working, do
> you have any ohter idea, the problem I have is that I have to parse a
> lot of textfiles....
> or shall I look for another option to parse those files...
>
> regards
> Hubert


The code from Bioperl 1.5.1 works fine for me for blast 2.2.13  
reports but unless you post your blast report we can't really  
determine the problem.

If you are still getting the same error as this one, I am not convinced  
you have upgraded to 1.5.1, which includes a fix for NCBI's change to  
the HSP result format that removed the ':' from the Query/Sbjct  
prefixes.  We fixed this as soon as it became apparent, sometime in  
September.
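
To illustrate the format change, here is a sketch of a pattern that
accepts alignment lines with or without the colon. This is not the code
in Bio::SearchIO::blast. The sub name parse_aln_line is hypothetical, and
the old-style sample line is constructed from the one in the error
message rather than taken from a real report:

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from blast.pm: split a BLAST alignment
# line into (label, start, residues, end). The colon after Query/Sbjct
# is optional, so both the older and the newer layout match.
sub parse_aln_line {
    my ($line) = @_;
    return $line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;
}

for my $line ('Query: 1   WWWKWRW  7', 'Query  1   WWWKWRW  7') {
    my @fields = parse_aln_line($line);
    print @fields ? join("\t", @fields) : 'no match', "\n";
}
```

A parser that insists on the colon would see no alignment data in the
newer layout, which is consistent with the "no data for midline" message.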

>>> MSG: no data for midline Query  1   WWWKWRW  7
>>> STACK Bio::SearchIO::blast::next_result
>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>> STACK toplevel
>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21

If you are just getting no results but also no warnings wrt parsing,  
are you sure your logic is correct?

If you remove your filters do you see all the HSPS?


while (my $result = $search->next_result) {
    print $result->query_name, "\n";
    # iterate over each hit on the query sequence
    while (my $hit = $result->next_hit) {
        print $hit->name, "\n";
        # iterate over each HSP in the hit
        while (my $hsp = $hit->next_hsp) {
            print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
                  $hsp->hit_string, "\n";
        }
    }
}
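
On the filter logic itself: in Perl, && binds more tightly than ||, so
the K/R/W condition in the posted script groups as
A || (B && C) || (D && E && F). The sketch below spells that grouping
out. Whether it is the grouping Hubert intended is an open question, and
the sub name passes_filter is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: the posted condition with its implicit grouping
# made explicit. Counting with tr/// does not modify the string.
sub passes_filter {
    my ($s) = @_;
    my $k = ($s =~ tr/K//);
    my $r = ($s =~ tr/R//);
    my $w = ($s =~ tr/W//);
    return ( $k >= 2
          || ( $r >= 2 && $w >= 2 )
          || ( $k == 1 && $r == 1 && $w >= 2 ) ) ? 1 : 0;
}

print "$_\t", passes_filter($_), "\n" for qw(WWWKWRW KKAAA RRAAA);
```

Note that under this grouping a hit string with two Ks passes even if it
contains no W at all.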


To clarify some stuff -
Chris, I don't necessarily think XML is the best way forward for BLAST  
reports generated locally: it isn't as detailed as the text format,  
and text is what most people expect to be able to scroll through and  
parse. It is also harder for the format to change dramatically if  
you have a static binary on your machine =).  I think for RemoteBlast  
the XML format should be the way forward, but I expect Bioperl to  
maintain support for any plain text BLAST report format that people  
use on a regular basis.

-jason
>
>
> Chris Fields wrote:
>
>> My guess is you're running into text parsing problems in
>> Bio::SearchIO::blast.  Upgrade to the latest developer version  
>> (1.5.1) or
>> bioperl-live (CVS), then see the bug below.
>>
>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>
>> I think the first problem you ran into is solved in bioperl 1.5.1,  
>> the last
>> problem (more recent, not related to the first) has been fixed but  
>> hasn't
>> been committed to bioperl-live yet.  The fixed SearchIO::blast is  
>> available
>> in the link above, but realize it hasn't been committed yet and  
>> may change.
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Hubert Prielinger
>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>> To: bioperl-l at bioperl.org
>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>> parsing Blast output
>>>
>>> Hi,
>>> If I want to parse a Blast Output (Version 2.2.12) with
>>> Bio::SearchIO, I get the following error message:
>>>
>>> MSG: no data for midline Query  1   WWWKWRW  7
>>> STACK Bio::SearchIO::blast::next_result
>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>> STACK toplevel
>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>> is that a bug......
>>>
>>> If I want to parse Blast Output (version 2.2.13), I don't get
>>> anything.....
>>> I'm using bioperl 1.4
>>>
>>> before, I have installed bioperl 1.4, it worked fine parsing
>>> Blast Output (version 2.2.12), but I don't remember which
>>> bioperl version I had installed
>>>
>>> thanks in advance
>>>
>>> Hubert
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From barry.m.dancis at gsk.com  Wed Feb  8 16:44:55 2006
From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com)
Date: Wed, 8 Feb 2006 16:44:55 -0500
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: <007701c62c37$7914af60$15327e82@pyrimidine>
Message-ID: 

Hi Chris--

        The problem I am solving is: given a mature miRNA name, how do I 
use it to search for its pre/pri miRNA, and vice versa? For example, how do I 
go from mir-102a* to hsa-mir-102a-1*? Yes, I can write a parser for it, 
but I'm hoping that someone else has already done it and has some bells 
and whistles to go with it.  Below is a hierarchy chart of a data 
structure to hold the naming information. The parsing is not trivial, and 
given data in that structure there could be all kinds of neat functions 
that return various aspects of the names.
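
A rough sketch of the kind of name parsing meant here, assuming names of
the form [species-]mir-NUMBER[letter][-copy][*]. Real miRNA nomenclature
has more cases than this pattern covers, and the sub name
parse_mirna_name is hypothetical rather than an existing BioPerl
function:

```perl
use strict;
use warnings;

# Hypothetical helper: split a miRNA name into its parts.
# Assumed layout: optional species prefix, 'mir' or 'miR', a number,
# an optional family letter, an optional copy number, an optional '*'.
sub parse_mirna_name {
    my ($name) = @_;
    my ($species, $type, $number, $letter, $copy, $star) =
        $name =~ /^(?:([a-z]{3,4})-)?(mir|miR)-(\d+)([a-z])?(?:-(\d+))?(\*)?$/
        or return;
    return {
        species => $species,    # e.g. 'hsa'; undef if absent
        type    => $type,
        number  => $number,
        letter  => $letter,     # undef if absent
        copy    => $copy,       # undef if absent
        star    => $star ? 1 : 0,
    };
}

for my $name ('mir-102a*', 'hsa-mir-102a-1*') {
    my $p = parse_mirna_name($name) or next;
    printf "%-16s number=%s letter=%s copy=%s star=%d\n", $name,
        $p->{number}, $p->{letter} || '-', $p->{copy} || '-', $p->{star};
}
```

Two names could then be linked by comparing the number, letter and star
fields while ignoring species and copy number.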

Barry












"Chris Fields"  
Sent by: bioperl-l-bounces at lists.open-bio.org
Date: 07-Feb-2006 17:40
To: barry.m.dancis at gsk.com, "'bioperl-l'"
Subject: Re: [Bioperl-l] Handling miRNA's

Are you talking about sequences or text output from a specific program? If
you are talking about sequences in a particular format, then listen to
Brian.  If you are talking about output, then we need to know which 
program
you're using, as a parser may exist or could be built. 

There are a few modules in Bio::Tools that handle RNA (like QRNA,
tRNAscan-SE), so check those out first.  I'm currently finishing up a
Bio::Tools module for RNAMotif and have plans for making an ERPIN parser.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> barry.m.dancis at gsk.com
> Sent: Tuesday, February 07, 2006 2:26 PM
> To: bioperl-l; bioperl-l-bounces at lists.open-bio.org
> Subject: Re: [Bioperl-l] Handling miRNA's
> 
> It's the parser in particular that I need
> 
> 
> 
> 
> "Brian Osborne"  Sent by: 
> bioperl-l-bounces at lists.open-bio.org
> 07-Feb-2006 12:05
> 
> To
> barry.m.dancis at gsk.com, "bioperl-l" , 
> bioperl-l-bounces at lists.open-bio.org
> cc
> 
> Subject
> Re: [Bioperl-l] Handling miRNA's
> 
> 
> 
> 
> 
> 
> Barry,
> 
> If the sequence information is in one of the formats that 
> Bioperl understands (Genbank, Swissprot flat, and so on) then 
> the answer is yes.
> This assumes that the details on sequence that you mentioned 
> are found in some sequence feature section in the file. But 
> it looks to me like there's no specialized parser for miRNA 
> sequence per se, I'll be corrected if I'm wrong.
> 
> Brian O.
> 
> 
> On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" 
> wrote:
> 
> > Hi --
> > 
> >         Are there any classes for manipulating miRNA's with 
> functions
> such
> > as parsing the name, storing and interlinking pri/pre/mat sequences,
> etc?
> > 
> > Thanks,
> > 
> > Barry
> 
> 
> 
> 



-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 8775 bytes
Desc: not available
URL: 

From pmr at ebi.ac.uk  Thu Feb  9 03:25:24 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Thu, 9 Feb 2006 08:25:24 -0000 (GMT)
Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <2714.86.132.216.50.1139473524.squirrel@webmail.ebi.ac.uk>

Ryan Golhar writes:

> Does anyone know of tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.

EMBOSS has the msbar program ("mutate sequence beyond all recognition")
which allows you to select the number and type of changes.

With some tuning of options to match the sequence length you should be
able to get results that match whatever your definition of 75% similar
might be (amazing how much more similarity you can get by adding gaps in
an alignment :-)

If you can specify a clear and generally useful way to define what you
need we could of course add a "percent change" option to the msbar program
for a future release.

Hope that helps,

Peter



From sofia at neuro.utah.edu  Thu Feb  9 13:00:05 2006
From: sofia at neuro.utah.edu (Sofia Robb)
Date: Thu, 09 Feb 2006 11:00:05 -0700
Subject: [Bioperl-l] Bio::Assembly::IO::phrap and Bio::Assembly::IO::ace
	with large files
Message-ID: <43EB8325.6050501@neuro.utah.edu>

I am having trouble parsing large (2030 contigs) phrap.out and ace.1 
files.  I have no problem with small files (1 contig).  Below are the 
errors I get when I try the code at the end of this email.  My 
script fails on this line:  my $assembly = $in->next_assembly;  I think 
it may have something to do with BTREE in Collection.pm, but I have been 
unable to correct the errors.

-------

file with 2030 contigs
Bio::Assembly::IO::ace
Can't call method "get_dup" on an undefined value at 
/Library/Perl/5.8.6/Bio/SeqFeature/Collection.pm line 359,  line 
17699.

line 17699 of my ace file is the last line of the record for Contig253

------

file with 2030 contigs
Bio::Assembly::IO::phrap
Can't call method "put" on an undefined value at 
/Library/Perl/5.8.6/Bio/SeqFeature/Collection.pm line 225,  line 
39839. 

line 39839 of my phrap.out file is first line of the record for Contig253

------

use Bio::Assembly::IO;

my $filename = $ARGV[0];

my $in = Bio::Assembly::IO->new(-file   => $filename,
                                -format => "phrap"   # or -format => "ace" for ace.1 files
                                );
my $assembly = $in->next_assembly;
my @contigs = $assembly->all_contigs();
foreach my $contig ($assembly->all_contigs){
        my $id = $contig->id();
        print "contig id = $id ";
        my $seqObj = $contig->get_consensus_sequence();
        my $seq = $seqObj->seq();
        print "is $seq\n";
}
my $id = $assembly->id();
print "$id\n";       
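When a parser dies at a particular input line, it can help to confirm which contig record that line belongs to before digging into Collection.pm. The following is only a rough sketch: it assumes that each contig in the ace file starts with a header line beginning with "CO " followed by the contig name, and contig_at_line is a hypothetical helper name, not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): return the name of the
# contig whose record contains line number $target (1-based) of an
# ace-style file read from the open filehandle $fh.
# Assumption: each contig record starts with a line "CO <name> ...".
sub contig_at_line {
    my ($fh, $target) = @_;
    my $current;          # name of the contig we are currently inside
    my $line_no = 0;
    while (my $line = <$fh>) {
        $line_no++;
        $current = $1 if $line =~ /^CO\s+(\S+)/;
        return $current if $line_no == $target;
    }
    return;               # file is shorter than $target lines
}

# Example on a tiny in-memory "file"
my $ace = "AS 2 2\n\nCO Contig1 4 1 1 U\nACGT\n\nCO Contig2 4 1 1 U\nTTTT\n";
open my $fh, '<', \$ace or die "cannot open in-memory file: $!";
my $name = contig_at_line($fh, 7);
close $fh;
print defined $name ? "line 7 is in $name\n" : "line 7 not found\n";
```

For a real file, open it with open my $fh, '<', $filename and pass the line number from the error message (17699 for the ace file above).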

-----

Thanks for any input,
Sofia

Sofia Robb
Molecular Biology Ph.D Program
Sanchez Laboratory
Department of Neurobiology and Anatomy
University of Utah
http://planaria.neuro.utah.edu





From hubert.prielinger at gmx.at  Thu Feb  9 12:32:39 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 09 Feb 2006 11:32:39 -0600
Subject: [Bioperl-l] zip file
In-Reply-To: 
References: <43EA75FF.7010504@gmx.at>
	
Message-ID: <43EB7CB7.7040602@gmx.at>

Hi Chris,
It doesn't work with the simple input line either. I have tried my 
script on the command line with the file scanning part, and it runs, 
but it takes more than 10 minutes to read one file and it doesn't 
create the output file, so there is no output. Before, I ran the 
script in the Eclipse IDE.
I'm trying to upgrade to bioperl 1.5.1 once more; hopefully that's the 
problem. I have installed the core, run and ext parts from bioperl.org.
The output as you got it is just fine, but I still need the script 
with the file scanning part, because I have a lot of files.

To Roger: I have tried it with different files, but always with the 
same result: it reads the files, but takes a very long time, and there 
is no output file.


Hubert




Chris Fields wrote:

> Hubert,
>
> I tried this script out it and it managed to parse your reports.  I  
> removed the file scanning and replaced it with a simple arg line  
> input (i.e. script.pl blast_file).   I attached one of the output files.
>
> Chris
>
>
>
> #!perl
>
> $file = shift @ARGV;
>
> use Bio::SearchIO;
> my $cutoff_len = 10;
> my $searchio = Bio::SearchIO->new( -format => 'blast',
>                                    -file   =>  $file );
> while ( my $result = $searchio->next_result() ) {
>       while( my $hit = $result->next_hit ) {
>           while(my $hsp = $hit->next_hsp) {
>             if ($hsp->length('sbjct') <= $cutoff_len) {
>                 for ($hsp->hit_string) {
>                     if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 ||
>                         tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {
>                          #Print some tab-delimited data about this HSP
>                            open (bigShot, ">>BlastOutputTrial.txt") ||
>                                  die ("Could not open file. $!");
>                          #print $result->query_name, "\t";
>                          #print $hit->significance, "\t";
>                          print bigShot $hit->name, "-->";
>                          print bigShot $hit->description, "\n";
>                          print bigShot "Query:   ",
>                             $hsp->start('query'), "  ", $hsp->query_string, "  ",
>                             $hsp->end('query'), "\n";
>                          print bigShot "Seq:     ", $hsp->start('hit'),
>                             "  ", $hsp->hit_string, "  ", $hsp->end('hit'), "\n";
> #                        print $hsp->rank, "\t";
> #                        print $hsp->percent_identity, "\t";
> #                        print $hsp->evalue, "\t";
> #                        print $hsp->hsp_length, "\n";
>
>                         close (bigShot);
>
>                     };
>
>
>             }
>         }
>         }
>     }
> }
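
The residue filter in the script above mixes || and && without parentheses. Since && binds more tightly than || in Perl, it reads as three alternatives. A small standalone sketch makes that grouping explicit so it can be tested on plain strings without a BLAST report; passes_filter is a hypothetical name used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the K/R/W count filter from the script above
# to a plain string. tr/X// with an empty replacement list counts
# occurrences of X without modifying the string.
sub passes_filter {
    my ($hit_string) = @_;
    my $k = ($hit_string =~ tr/K//);
    my $r = ($hit_string =~ tr/R//);
    my $w = ($hit_string =~ tr/W//);

    # Same logic as the original condition, with the grouping that
    # Perl's precedence rules imply written out:
    return ( $k >= 2
          || ($r >= 2 && $w >= 2)
          || ($k == 1 && $r == 1 && $w >= 2) ) ? 1 : 0;
}

for my $s (qw(WWWKWRW KK RRA AAAA)) {
    printf "%-8s %s\n", $s, passes_filter($s) ? "kept" : "skipped";
}
```

If a different grouping was intended, for example requiring two W in every case, the parentheses are the place to change it.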
>
>------------------------------------------------------------------------
>
>  
>



From heikki at sanbi.ac.za  Thu Feb  9 09:54:30 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 9 Feb 2006 16:54:30 +0200
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <200602091654.30890.heikki@sanbi.ac.za>

Ryan,

I should have made this very clear in my first reply:

You have to plan very carefully what rules you use when you mutate your 
sequence because it will affect directly the resulting sequences. Of course,  
all that depends on what you will be using the sequences for. If you are 
going to draw evolutionary conclusions from those sequences, you must mutate 
them in a way that simulates evolutionary principles.

My earlier pseudocode, for instance, should really allow mutations at every 
location, including ones that have already changed. Mutations do occur 
multiple times in the same places as sequences get saturated by mutations. 
Also, you should decide the relative occurrence of transversions versus 
transitions. Then there are indels; do you want to take those into account?
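
The points about repeated hits and transitions versus transversions can be sketched in plain Perl as follows. This is only an illustration under simple assumptions: substitutions only (no indels), uniform choice of position, and a single probability that a hit is a transition. The function name and parameters are hypothetical, not from BioPerl or EMBOSS.

```perl
use strict;
use warnings;

# Transition partners (purine<->purine, pyrimidine<->pyrimidine)
my %TRANSITION    = (A => 'G', G => 'A', C => 'T', T => 'C');
# Transversion partners (purine<->pyrimidine)
my %TRANSVERSIONS = (A => [qw(C T)], G => [qw(C T)],
                     C => [qw(A G)], T => [qw(A G)]);

# Hypothetical helper: apply $hits substitutions to $seq, sampling
# positions WITH replacement, so the same site may be hit repeatedly.
# $p_transition is the probability that a given hit is a transition.
sub mutate_with_replacement {
    my ($seq, $hits, $p_transition) = @_;
    my $len = length $seq;
    return $seq unless $len;
    for (1 .. $hits) {
        my $pos = int(rand($len));
        my $old = uc substr($seq, $pos, 1);
        next unless exists $TRANSITION{$old};   # leave N etc. untouched
        my $new = rand() < $p_transition
                ? $TRANSITION{$old}
                : $TRANSVERSIONS{$old}[ int(rand(2)) ];
        substr($seq, $pos, 1) = $new;
    }
    return $seq;
}

my $original = 'ACGT' x 250;                          # 1000 bases
my $mutated  = mutate_with_replacement($original, 250, 2/3);
my $same = grep { substr($original, $_, 1) eq substr($mutated, $_, 1) }
           0 .. length($original) - 1;
printf "identity after 250 hits: %.1f%%\n", 100 * $same / length($original);
```

Because a site can be hit twice, and can even change back, the observed identity will usually be somewhat higher than 1 - hits/length, which is the saturation effect mentioned above.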

Also, check the EMBOSS program 'msbar'.

You did not ask this, but... I remember that during the early days of Celera, 
one of the tools that enabled them to estimate the feasibility of the whole 
genome shotgun sequence assembly, was a very complete program to 'synthesize' 
in-silico the whole complexity of the human genome. I have no idea if that 
program is generally available now.

Yours,

    -Heikki

On Thursday 09 February 2006 06:46, Ryan Golhar wrote:
> Does anyone know of tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Thu Feb  9 06:31:20 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 9 Feb 2006 13:31:20 +0200
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <200602091331.21690.heikki@sanbi.ac.za>

Ryan,

Instructions in pseudo code:

take the sequence string out of the object
use a hash to store changed locations
repeat 
    pick a location in the string randomly
    if the location is not in the hash, i.e. not changed already,
        change it into something else
        add the changed location into the hash
    if enough locations have been changed (scalar keys hash), exit loop
put the sequence string back into the seq object
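
A minimal Perl sketch of the pseudocode above, assuming substitutions only and working on a plain string. mutate_distinct is a hypothetical helper name, not a BioPerl method.

```perl
use strict;
use warnings;

# Hypothetical helper: change exactly $n distinct positions of a DNA
# string into a different base (A, C, G or T).
sub mutate_distinct {
    my ($seq, $n) = @_;
    my $len = length $seq;
    die "cannot change $n of $len positions\n" if $n > $len;

    my %changed;                              # locations changed so far
    while (keys(%changed) < $n) {
        my $pos = int(rand($len));            # pick a location at random
        next if $changed{$pos};               # changed already, pick again
        my $old = uc substr($seq, $pos, 1);
        my @alt = grep { $_ ne $old } qw(A C G T);
        substr($seq, $pos, 1) = $alt[ int(rand(@alt)) ];
        $changed{$pos} = 1;                   # remember the location
    }
    return $seq;
}

my $original = 'ACGT' x 250;                      # 1000 bases
my $mutated  = mutate_distinct($original, 250);   # 250 changed sites
my $diff = grep { substr($original, $_, 1) ne substr($mutated, $_, 1) }
           0 .. length($original) - 1;
print "$diff of ", length($original), " positions differ\n";
```

With a Bio::Seq object, the string can be taken out with $seqobj->seq and put back with $seqobj->seq($mutated). Each site is changed at most once here, so 250 changes in 1000 bases leave the ungapped sequence 75% identical to the original.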

   -Heikki   

On Thursday 09 February 2006 06:46, Ryan Golhar wrote:
> Does anyone know of tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From jason.stajich at duke.edu  Thu Feb  9 14:10:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 9 Feb 2006 14:10:54 -0500
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>

Depending on whether or not you want to use evolutionarily realistic
models...
* evolver, which comes with PAML, lets you evolve sequences on a tree
* SeqGen from Andrew Rambaut
  http://evolve.zoo.ox.ac.uk/software.html?id=seqgen
  also lets you do this
I believe there are PISE interfaces to both of these at the pasteur
bioweb site - http://bioweb.pasteur.fr/

-jason
On Feb 8, 2006, at 11:46 PM, Ryan Golhar wrote:

> Does anyone know of tool to mutate a DNA sequence by a specified  
> amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the  
> original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From heikki at sanbi.ac.za  Thu Feb  9 14:41:33 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 9 Feb 2006 21:41:33 +0200
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <000001c62da1$ba346ba0$15327e82@pyrimidine>
References: <000001c62da1$ba346ba0$15327e82@pyrimidine>
Message-ID: <200602092141.34401.heikki@sanbi.ac.za>

Chris,

I committed your file. All tests pass; the code looks like it was written 
by a long-term bioperl contributor! Impressive.

I truncated the larger test file from 270K to 20K (200 lines), to not bloat 
the distribution unnecessarily. Tests pass, which is the main thing. Shout 
if you disagree.

Great job!

	-Heikki
 

On Thursday 09 February 2006 19:53, Chris Fields wrote:
> Heikki,
>
> I've added the Bio::Tools::RNAMotif module with test suite (24 tests) and
> two test data files to bugzilla.  The first data file is needed for normal
> tests, the second is for testing parsing with modified data in the score
> tag (using sprintf() in the RNAMotif descriptor).  I ran 'perl
> t\RNAMotif.t' and they all passed.
>
> Thanks!
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> > Heikki Lehvaslaiho
> > Sent: Wednesday, February 08, 2006 12:54 AM
> > To: bioperl-l at lists.open-bio.org
> > Cc: Chris Fields
> > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> >
> > Chris,
> >
> > Post your files to bugzilla (ticket type enhancement, add
> > files to ticket after creation)  and someone with commit
> > ability will add them to CVS once the code is in satisfactory
> > condition.
> >
> > Thanks,
> >
> > 	-Heikki
> >
> > On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> > > I want to submit a module for parsing RNAMotif output
> > > (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning
> > > output and returning Bio::SeqFeature::Generic objects with
> >
> > added tags
> >
> > > for descriptors/sequences/file info.  I'm in the process of
> >
> > writing up
> >
> > > tests and going through biodesign to make sure everything's kosher,
> > > but the module itself is essentially ready-to-go.  What should I do
> > > next?
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher
> > > Lab of Dr. Robert Switzer
> > > Dept of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > ______ _/      _/_____________________________________________________
> >       _/      _/
> >      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
> >    _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >   _/  _/  _/  University of Western Cape, South Africa
> >      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___ _/_/_/_/_/________________________________________________________
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From hubert.prielinger at gmx.at  Thu Feb  9 15:13:31 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 09 Feb 2006 14:13:31 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	Blast	output
In-Reply-To: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
References: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
Message-ID: <43EBA26B.4010907@gmx.at>

Dear Roger,
I got this error message when I tried to parse Blast output (version 
2.2.12) with bioperl 1.5.1. But it doesn't matter, because I have a lot 
of Blast output files from version 2.2.13, and for those I don't get any 
error message; it just doesn't work.

Hubert



Roger Hall wrote:

>Guys - I'm looking at the error message:
>
>MSG: no data for midline Query  1   WWWKWRW  7
>STACK Bio::SearchIO::blast::next_result
>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>STACK toplevel
>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>
>This is my line of thought:
>1. "no data for midline $_" is a unique message generated by blast.pm in one
>location only at the point of a. reading three lines b. dropping lines with
>spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3)
>2. There is a regexp match that fails in order to reach that error message
>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>4. It does anyway
>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>reports
>
>I suspect a newline/chomp/metacharacter issue. Not finding the string
>anywhere has me thoroughly confused - I asked Hubert for the additional
>file, assuming that I didn't have it.
>
>My next thought is to write a quick script to test perl behavior on "Fedora
>Core 9".
>
>Thoughts?
>
>Did I misread the issue entirely? :}
>
>Roger
>
>
>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>Sent: Thursday, February 09, 2006 10:16 AM
>To: 'Jason Stajich'; 'Hubert Prielinger'
>Cc: bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>output
>
>
>  
>
>>-----Original Message-----
>>From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>>Sent: Thursday, February 09, 2006 9:13 AM
>>To: Hubert Prielinger
>>Cc: Chris Fields; bioperl-l at bioperl.org
>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>parsing Blast output
>>
>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>
>>>hi chris,
>>>thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>do you have any other idea, the problem I have is that I have to parse
>>>a lot of textfiles....
>>>or shall I look for another option to parse those files...
>>>
>>>regards
>>>Hubert
>>
>>The code from Bioperl 1.5.1 works fine for me for blast 
>>2.2.13 reports but unless you post your blast report we can't 
>>really determine the problem.
>>
>>If you are still getting the same error like this I am not 
>>convinced you have upgraded to 1.5.1 which includes a fix in 
>>the fact that NCBI changed the HSP result format to remove 
>>the ':' from the Query/Sbjct prefixes.  We fixed this as soon 
>>as it was apparent sometime in September.
>>
>>    
>>
>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>STACK toplevel
>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>
>>If you are just getting no results but also no warnings wrt 
>>parsing, are you sure your logic is correct?
>>
>>If you remove your filters do you see all the HSPS?
>>
>>
>>while (my $result = $search->next_result) {
>>    print $result->query_name, "\n";
>>    # iterate over each hit on the query sequence
>>    while (my $hit = $result->next_hit) {
>>        print $hit->name, "\n";
>>        # iterate over each HSP in the hit
>>        while (my $hsp = $hit->next_hsp) {
>>            print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>                $hsp->hit_string, "\n";
>>        }
>>    }
>>}
>>    
>>
>
>I tested some of the BLAST results that Hubert sent Roger and me with a
>similar script to the above.  I removed the file parsing logic and it seemed
>to work just fine.  It may very well be a logic issue or that he hasn't
>installed the latest fix.
>    
>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), even
>though the returned output was from nr, the top of the blast output showed
>that it was v2.2.12:  
>
>BLASTP 2.2.12 [Aug-07-2005]
>
>I double-checked my local version and it's definitely v.2.2.13:
>-------------------------------------
>C:\Perl\Scripts>blastcl3 -
>
>blastcl3 2.2.13   arguments:...
>-------------------------------------
>
>If you use RemoteBlast using the same settings, the version in the header
>looks like this:
>
>BLASTP 2.2.13 [Nov-27-2005]
>
>I'm wondering if all the blast executables (blast and netblast) from NCBI
>have text output like v.2.2.12, while the wwwblast outputs a new format
>(2.2.13).  I'll ask blast-help at NCBI about this.
>
>  
>
>>To clarify some stuff -
>>Chris I don't necessarily think the XML is best way forward 
>>for BLAST reports generated locally, it isn't as detailed as 
>>the Text format and it is what most people expect to be able 
>>to scroll through and parse -- it is also harder for the 
>>format to change dramatically if you have a static binary on 
>>your machine =).  I think for remoteblast the XML format 
>>should be the way forward but I expect Bioperl to maintain 
>>support of any plain text BLAST report format that people use 
>>on a regular basis.
>>
>>    
>>
>
>Does XML lack some specific info that text output has?  Didn't know that.  I
>believe that XML should be default in RemoteBlast since it will not break,
>but I agree with you about text output.  I also agree that it will need
>somebody to maintain it constantly, much like RemoteBlast.
>
>  
>
>>-jason
>>    
>>
>>>Chris Fields wrote:
>>>
>>>      
>>>
>>>>My guess is you're running into text parsing problems in 
>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>(1.5.1) or
>>>>bioperl-live (CVS), then see the bug below.
>>>>
>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>
>>>>I think the first problem you ran into is solved in bioperl 1.5.1, 
>>>>the last problem (more recent, not related to the first) has been 
>>>>fixed but hasn't been committed to bioperl-live yet.  The fixed 
>>>>SearchIO::blast is available in the link above, but realize it hasn't
>>>>been committed yet and may change.
>>>>
>>>>Christopher Fields
>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
>>>>University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>        
>>>>
>>>>>-----Original Message-----
>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert 
>>>>>Prielinger
>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>To: bioperl-l at bioperl.org
>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>>>>          
>>>>>
>>parsing Blast 
>>    
>>
>>>>>output
>>>>>
>>>>>Hi,
>>>>>If I want to parse a Blast Output (Version 2.2.12) with 
>>>>>Bio::SearchIO, I get the following error message:
>>>>>
>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>STACK toplevel
>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>
>>>>>is that a bug......
>>>>>
>>>>>If I want to parse Blast Output (version 2.2.13), I don't get 
>>>>>anything.....
>>>>>I'm using bioperl 1.4
>>>>>
>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>I had installed
>>>>>
>>>>>thanks in advance
>>>>>
>>>>>Hubert
>>>>>
>>>>>
>>>>>
>>>>>_______________________________________________
>>>>>Bioperl-l mailing list
>>>>>Bioperl-l at lists.open-bio.org
>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>
>>>>
>>>>        
>>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l at lists.open-bio.org
>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>      
>>>
>>--
>>Jason Stajich
>>Duke University
>>http://www.duke.edu/~jes12
>>
>>    
>>
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept. of Biochemistry
>University of Illinois Urbana-Champaign  
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>  
>



From rahall2 at ualr.edu  Thu Feb  9 15:09:52 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Thu, 09 Feb 2006 14:09:52 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	Blast	output
In-Reply-To: <001b01c62d94$2e8bee50$15327e82@pyrimidine>
Message-ID: <004301c62db4$c9bcbab0$d416a790@LIBERAL>

Guys - I'm looking at the error message:

MSG: no data for midline Query  1   WWWKWRW  7
STACK Bio::SearchIO::blast::next_result
/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
STACK toplevel
/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21

This is my line of thought:
1. "no data for midline $_" is a unique message generated by blast.pm in one
location only at the point of a. reading three lines b. dropping lines with
spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3)
2. There is a regexp match that fails in order to reach that error message
3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
4. It does anyway
5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
reports

I suspect a newline/chomp/metacharacter issue. Not finding the string
anywhere has me thoroughly confused - I asked Hubert for the additional
file, assuming that I didn't have it.
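
To test the newline/format suspicion outside of blast.pm, something like the following standalone check could be used. It is a sketch only: the pattern is my own guess at what an alignment line looks like, not the expression used in Bio::SearchIO::blast, and parse_align_line is a hypothetical name. It accepts the label with or without the trailing ':' and strips either LF or CRLF line endings first.

```perl
use strict;
use warnings;

# Hypothetical pattern, NOT the regexp from Bio::SearchIO::blast:
# label (with optional ':'), start coordinate, aligned string, end coordinate.
my $ALIGN_LINE = qr/^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;

# Hypothetical helper: parse one Query/Sbjct alignment line into a hash
# reference, or return nothing if the line does not look like one.
sub parse_align_line {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;                 # strip LF or CRLF line ending
    return unless $line =~ $ALIGN_LINE;
    return { label => $1, start => $2, string => $3, end => $4 };
}

for my $line ("Query  1   WWWKWRW  7\n", "Query: 1   WWWKWRW  7\r\n") {
    my $p = parse_align_line($line);
    print $p ? "$p->{label} $p->{start}-$p->{end} $p->{string}\n"
             : "no match\n";
}
```

If both forms parse here but the installed blast.pm still fails on the same line, that would point toward an older blast.pm being picked up rather than a problem with the report itself.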

My next thought is to write a quick script to test perl behavior on "Fedora
Core 9".

Thoughts?

Did I misread the issue entirely? :}

Roger


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Thursday, February 09, 2006 10:16 AM
To: 'Jason Stajich'; 'Hubert Prielinger'
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
output


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
> Sent: Thursday, February 09, 2006 9:13 AM
> To: Hubert Prielinger
> Cc: Chris Fields; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing Blast output
> 
> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> > hi chris,
> > thanks, I have upgraded to version 1.5.1 but it still isn't working,
> > do you have any other idea, the problem I have is that I have to parse
> > a lot of textfiles....
> > or shall I look for another option to parse those files...
> >
> > regards
> > Hubert
> 
> 
> The code from Bioperl 1.5.1 works fine for me for blast 
> 2.2.13 reports but unless you post your blast report we can't 
> really determine the problem.
> 
> If you are still getting the same error like this I am not 
> convinced you have upgraded to 1.5.1 which includes a fix in 
> the fact that NCBI changed the HSP result format to remove 
> the ':' from the Query/Sbjct prefixes.  We fixed this as soon 
> as it was apparent sometime in September.
> 
> >>> MSG: no data for midline Query  1   WWWKWRW  7
> >>> STACK Bio::SearchIO::blast::next_result
> >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>> STACK toplevel
> >>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> 
> If you are just getting no results but also no warnings wrt 
> parsing, are you sure your logic is correct?
> 
> If you remove your filters do you see all the HSPS?
> 
> 
> while (my $result = $search->next_result) {
>     print $result->query_name, "\n";
>     # iterate over each hit on the query sequence
>     while (my $hit = $result->next_hit) {
>         print $hit->name, "\n";
>         # iterate over each HSP in the hit
>         while (my $hsp = $hit->next_hsp) {
>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>                 $hsp->hit_string, "\n";
>         }
>     }
> }

I tested some of the BLAST results that Hubert sent Roger and me with a
similar script to the above.  I removed the file parsing logic and it seemed
to work just fine.  It may very well be a logic issue or that he hasn't
installed the latest fix.
    
It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), even
though the returned output was from nr, the top of the blast output showed
that it was v2.2.12:  

BLASTP 2.2.12 [Aug-07-2005]

I double-checked my local version and it's definitely v.2.2.13:
-------------------------------------
C:\Perl\Scripts>blastcl3 -

blastcl3 2.2.13   arguments:...
-------------------------------------

If you use RemoteBlast using the same settings, the version in the header
looks like this:

BLASTP 2.2.13 [Nov-27-2005]

I'm wondering if all the blast executables (blast and netblast) from NCBI
have text output like v.2.2.12, while the wwwblast outputs a new format
(2.2.13).  I'll ask blast-help at NCBI about this.
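
To compare the header versions across many report files, something like the
following could be used. It is only a rough sketch: the sub name
parse_blast_header is made up for illustration, it is not part of Bioperl, and
it assumes the first line always has the "PROGRAM version [date]" layout shown
above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a Bioperl function): parse the first line of a
# plain-text BLAST report, e.g. "BLASTP 2.2.13 [Nov-27-2005]", into program,
# version and build date.  Returns an empty list when the line does not look
# like a report header.
sub parse_blast_header {
    my ($line) = @_;
    return unless defined $line;
    if ( $line =~ /^\s*(T?BLAST[NPX])\s+(\d+(?:\.\d+)*)\s+\[([^\]]+)\]/ ) {
        return ( program => $1, version => $2, date => $3 );
    }
    return;
}

# The two header lines quoted in this message.
for my $header ( 'BLASTP 2.2.12 [Aug-07-2005]', 'BLASTP 2.2.13 [Nov-27-2005]' ) {
    my %info = parse_blast_header($header);
    print "$info{program} version $info{version} built $info{date}\n";
}
```

Running this over the first line of each report would show quickly which
files claim to be 2.2.12 and which 2.2.13, whatever executable produced them.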

> 
> To clarify some stuff -
> Chris I don't necessarily think the XML is best way forward 
> for BLAST reports generated locally, it isn't as detailed as 
> the Text format and it is what most people expect to be able 
> to scroll through and parse -- it is also harder for the 
> format to change dramatically if you have a static binary on 
> your machine =).  I think for remoteblast the XML format 
> should be the way forward but I expect Bioperl to maintain 
> support of any plain text BLAST report format that people use 
> on a regular basis.
> 

Does XML lack some specific info that text output has?  Didn't know that.  I
believe that XML should be default in RemoteBlast since it will not break,
but I agree with you about text output.  I also agree that it will need
somebody to maintain it constantly, much like RemoteBlast.

> -jason
> >
> >
> > Chris Fields wrote:
> >
> >> My guess is you're running into text parsing problems in 
> >> Bio::SearchIO::blast.  Upgrade to the latest developer version
> >> (1.5.1) or
> >> bioperl-live (CVS), then see the bug below.
> >>
> >> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>
> >> I think the first problem you ran into is solved in bioperl 1.5.1, 
> >> the last problem (more recent, not related to the first) has been 
> >> fixed but hasn't been committed to bioperl-live yet.  The fixed 
> >> SearchIO::blast is available in the link above, but realize it hasn't 
> >> been committed yet and may change.
> >>
> >> Christopher Fields
> >> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
> >> University of Illinois Urbana-Champaign
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: bioperl-l-bounces at lists.open-bio.org
> >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert 
> >>> Prielinger
> >>> Sent: Wednesday, February 08, 2006 2:52 PM
> >>> To: bioperl-l at bioperl.org
> >>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing Blast 
> >>> output
> >>>
> >>> Hi,
> >>> If I want to parse a Blast Output (Version 2.2.12) with 
> >>> Bio::SearchIO, I get the following error message:
> >>>
> >>> MSG: no data for midline Query  1   WWWKWRW  7
> >>> STACK Bio::SearchIO::blast::next_result
> >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>> STACK toplevel
> >>> 
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>>
> >>> is that a bug......
> >>>
> >>> If I want to parse Blast Output (version 2.2.13), I don't get 
> >>> anything.....
> >>> I'm using bioperl 1.4
> >>>
> >>> Before I installed bioperl 1.4, it worked fine parsing Blast 
> >>> Output (version 2.2.12), but I don't remember which bioperl version 
> >>> I had installed
> >>>
> >>> thanks in advance
> >>>
> >>> Hubert
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l



From Lalancettec at AGR.GC.CA  Thu Feb  9 15:53:10 2006
From: Lalancettec at AGR.GC.CA (Lalancette, Claudia)
Date: Thu, 9 Feb 2006 15:53:10 -0500
Subject: [Bioperl-l] module for finding restriction site in batch of
	sequences?
Message-ID: 

Greetings,

 

I need to find a way to look for a specific restriction enzyme site in
hundreds of sequences.  Been looking at Bio::Restriction, but not sure
if will work...  Any suggestions?

 

Thanks,

Claudia

 

 




From cjfields at uiuc.edu  Thu Feb  9 16:25:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 9 Feb 2006 15:25:01 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <200602092141.34401.heikki@sanbi.ac.za>
Message-ID: <000901c62dbf$49bfae20$15327e82@pyrimidine>

Thanks!  I think, as long as the tests pass everything is fine with me.  I
may be submitting another module or two in the next few weeks; just depends
on how much time I can spend on them.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: Heikki Lehvaslaiho [mailto:heikki at sanbi.ac.za] 
> Sent: Thursday, February 09, 2006 1:42 PM
> To: bioperl-l at lists.open-bio.org
> Cc: Chris Fields
> Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> 
> Chris,
> 
> I committed your file. All tests pass; the code looks like it was 
> written by a long-term bioperl contributor! Impressive.
> 
> I truncated the larger test file from 270K to 20K (200 
> lines), to not bloat the distribution unnecessarily. Tests 
> pass, which is the main thing. Shout if you disagree.
> 
> Great job!
> 
> 	-Heikki
>  
> 
> On Thursday 09 February 2006 19:53, Chris Fields wrote:
> > Heikki,
> >
> > I've added the Bio::Tools::RNAMotif module with test suite (24 tests) 
> > and two test data files to bugzilla.  The first data file is needed 
> > for normal tests, the second is for testing parsing with modified data 
> > in the score tag (using sprintf() in the RNAMotif descriptor).  I ran 
> > 'perl t\RNAMotif.t' and they all passed.
> >
> > Thanks!
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org
> > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki 
> > > Lehvaslaiho
> > > Sent: Wednesday, February 08, 2006 12:54 AM
> > > To: bioperl-l at lists.open-bio.org
> > > Cc: Chris Fields
> > > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> > >
> > > Chris,
> > >
> > > Post your files to bugzilla (ticket type enhancement, add files to 
> > > the ticket after creation) and someone with commit ability will add 
> > > them to CVS once the code is in satisfactory condition.
> > >
> > > Thanks,
> > >
> > > 	-Heikki
> > >
> > > On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> > > > I want to submit a module for parsing RNAMotif output 
> > > > (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning 
> > > > output and returning Bio::SeqFeature::Generic objects with added tags
> > > > for descriptors/sequences/file info.  I'm in the process of writing up
> > > > tests and going through biodesign to make sure everything's 
> > > > kosher, but the module itself is essentially ready-to-go.  What 
> > > > should I do next?
> > > >
> > > > Christopher Fields
> > > > Postdoctoral Researcher
> > > > Lab of Dr. Robert Switzer
> > > > Dept of Biochemistry
> > > > University of Illinois Urbana-Champaign
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > --
> > > ______ _/      _/_____________________________________________________
> > >       _/      _/
> > >      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> > >     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
> > >    _/  _/  _/  SANBI, South African National Bioinformatics Institute
> > >   _/  _/  _/  University of Western Cape, South Africa
> > >      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > > ___ _/_/_/_/_/________________________________________________________
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________



From golharam at umdnj.edu  Thu Feb  9 16:19:46 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 09 Feb 2006 16:19:46 -0500
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <200602091654.30890.heikki@sanbi.ac.za>
Message-ID: <002801c62dbe$8d4d7e20$e6028a0a@GOLHARMOBILE1>

Thanks all.  The responses I got were definitely more than helpful.  FYI
- I did initially look at msbar.  I overlooked the "Number of times to
perform mutation operations" option, which is what I was looking for.  

I'm looking to statistically test some simple scoring matrices.  I think
msbar will do.

Ryan

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki
Lehvaslaiho
Sent: Thursday, February 09, 2006 9:55 AM
To: bioperl-l at lists.open-bio.org; golharam at umdnj.edu
Cc: 'The general forum at Bioinformatics.Org'; 'bioperl-l';
emboss at emboss.open-bio.org
Subject: Re: [Bioperl-l] Tool to mutate DNA sequence


Ryan,

I should have made this very clear in my first reply:

You have to plan very carefully what rules you use when you mutate your
sequence because it will directly affect the resulting sequences. Of course,
all that depends on what you will be using the sequences for. If you are
going to draw evolutionary conclusions from those sequences, you must mutate
them in a way that simulates evolutionary principles.

My earlier pseudocode example, for example, should allow mutations in every
location. Mutations do occur multiple times in the same places as sequences get
saturated by mutations. Also, you should decide the relative occurrence of
transversions versus transitions. Then there are indels; do you want to take
those into account?
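
A rough Perl sketch of these points follows. It is only an illustration under
stated assumptions, not the earlier pseudocode and not how msbar works: the
sub names mutate_sequence and identity and the $ts_fraction parameter are
invented here, only point substitutions are modelled (no indels), and every
site is equally likely to be hit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Transition partner for each base (purine<->purine, pyrimidine<->pyrimidine).
my %transition = ( A => 'G', G => 'A', C => 'T', T => 'C' );

# The two possible transversion targets for each base.
my %transversion = (
    A => [qw(C T)], G => [qw(C T)],
    C => [qw(A G)], T => [qw(A G)],
);

# Hypothetical helper: apply $n_events point substitutions to $seq.
# Positions are drawn with replacement, so one site can be hit several
# times (saturation).  $ts_fraction is the probability that an event is
# a transition rather than a transversion.
sub mutate_sequence {
    my ( $seq, $n_events, $ts_fraction ) = @_;
    my @bases = split //, uc $seq;
    for ( 1 .. $n_events ) {
        my $pos  = int rand @bases;
        my $base = $bases[$pos];
        next unless exists $transition{$base};    # skip N and other symbols
        if ( rand() < $ts_fraction ) {
            $bases[$pos] = $transition{$base};
        }
        else {
            my $choices = $transversion{$base};
            $bases[$pos] = $choices->[ int rand @$choices ];
        }
    }
    return join '', @bases;
}

# Fraction of positions that are identical between two equal-length strings.
sub identity {
    my ( $s1, $s2 ) = @_;
    my $same = grep { substr( $s1, $_, 1 ) eq substr( $s2, $_, 1 ) }
        0 .. length($s1) - 1;
    return $same / length($s1);
}

# Random 1000-base sequence, then 250 substitution events, two thirds of
# them transitions (an arbitrary choice for the example).
my $original = join '', map { (qw(A C G T))[ int rand 4 ] } 1 .. 1000;
my $mutant   = mutate_sequence( $original, 250, 2 / 3 );
printf "identity after 250 events: %.3f\n", identity( $original, $mutant );
```

Because sites can be hit more than once and a second transition restores the
original base, 250 events on 1000 bases will generally leave the sequences
more than 75% identical; reaching an exact target means mutating until the
measured identity drops to the value you want.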

Also, check the EMBOSS program 'msbar'.

You did not ask this, but... I remember that during the early days of Celera,
one of the tools that enabled them to estimate the feasibility of the whole
genome shotgun sequence assembly was a very complete program to 'synthesize'
in silico the whole complexity of the human genome. I have no idea if that
program is generally available now.

Yours,

    -Heikki

On Thursday 09 February 2006 06:46, Ryan Golhar wrote:
> Does anyone know of a tool to mutate a DNA sequence by a specified 
> amount? For instance, say I have a DNA sequence 1000 bases long, and I
> want to simulate mutations to make it 75% (or 80%, etc.) similar to the
> original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org 
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l



From injunjoel at hotmail.com  Thu Feb  9 16:33:45 2006
From: injunjoel at hotmail.com (Joel Steele)
Date: Thu, 09 Feb 2006 13:33:45 -0800
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsingBlast
	output
In-Reply-To: <43EBA26B.4010907@gmx.at>
Message-ID: 

Greetings again,
It's the colon...
Observe:

-=Code Snippet=-
#!/usr/bin/perl -w
use strict;

#the string as reported from your error.
my $string1 = 'Query  1   WWWKWRW  7';

#your string with a colon thrown in for testing.
my $string2 = 'Query:  1   WWWKWRW  7';

foreach ($string1, $string2){
	if(/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/){
		print "Match Found in $_\n";
		print $1."\n";
		print $2."\n";
		print $3."\n";
		print $4."\n";
		print $5."\n";
	}else{
		print "no Match for $_\n";
	}
}

-=End Code=-

The Output

-=Code Snippet=-
no Match for Query  1   WWWKWRW  7
Match Found in Query:  1   WWWKWRW  7
Query:  1
Query
1
WWWKWRW
7

-=End Code=-


Now I would suggest changing the regexp

From:
/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/

To:
/^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/

in Bio::SearchIO::blast.
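
As a quick standalone check that the pattern with the optional colon accepts
both layouts, here is a small sketch. The sub parse_alignment_line is a
made-up name for this example; it is not the code in blast.pm, and the third
test line is an invented Sbjct line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one Query/Sbjct alignment line into its parts.
# The ':?' makes the colon optional, so lines with and without it match.
sub parse_alignment_line {
    my ($line) = @_;
    if ( $line =~ /^((Query|Sbjct):?\s+(-?\d+)\s*)(\S+)\s+(-?\d+)/ ) {
        return ( label => $2, start => $3, seq => $4, end => $5 );
    }
    return;
}

for my $line (
    'Query  1   WWWKWRW  7',       # layout without the colon
    'Query:  1   WWWKWRW  7',      # layout with the colon
    'Sbjct: 12  WWWRWRW  18',      # invented subject line
    )
{
    my %field = parse_alignment_line($line);
    if (%field) {
        print "$field{label} $field{start}-$field{end} $field{seq}\n";
    }
    else {
        print "no match for $line\n";
    }
}
```

All three lines should be reported as matches, since the only difference
between the first two is the colon that ':?' now tolerates.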

General suggestion:
Again I would like to suggest that everyone get used to using the strict 
pragma. Though it may not be applicable to this particular problem, it becomes 
essential if you wish to progress in your use of Perl.
It is a core module so there is nothing to download from CPAN. It helps with 
development and once your code can run without warnings and errors you can 
remove it. This is not a targeted attack as some may interpret it, rather a 
general FYI for those out there new to Perl or programming in general. 
Better to start learning the rules early before bad habits creep in.
One more thing. There is a wonderfully supportive Perl community available 
to anyone who wants to join at PerlMonks.org. Check it out; who knows, you may 
even catch a glimpse of Larry Wall while you're there.

-Joel Steele

"The surest way to corrupt a youth is to instruct him to hold in higher 
regard those who think alike than those who think differently." -Nietzsche

"I do not feel obliged to believe that the same God who endowed us with 
sense, reason and intellect has intended us to forego their use." -Galileo




>From: Hubert Prielinger
>To: rahall2 at ualr.edu, bioperl-l at bioperl.org, Chris Fields, Jason Stajich
>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
>Date: Thu, 09 Feb 2006 14:13:31 -0600
>
>dear roger,
>this error message I got, when I tried to parse Blast output (version
>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have a lot
>of Blast output files
>with version 2.2.13 and for that I don't get any error message.....it
>just doesn't work
>
>Hubert
>
>
>
>Roger Hall wrote:
>
> >Guys - I'm looking at the error message:
> >
> >MSG: no data for midline Query  1   WWWKWRW  7
> >STACK Bio::SearchIO::blast::next_result
> >/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >STACK toplevel
> >/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >
> >This is my line of thought:
> >1. "no data for midline $_" is a unique message generated by blast.pm in 
>one
> >location only at the point of a. reading three lines b. dropping lines 
>with
> >spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 
>3)
> >2. There is a regexp match that fails in order to reach that error 
>message
> >3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
> >4. It does anyway
> >5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
> >reports
> >
> >I suspect a newline/chomp/metacharacter issue. Not finding the string
> >anywhere has me thoroughly confused - I asked Hubert for the additional
> >file, assuming that I didn't have it.
> >
> >My next thought is to write a quick script to test perl behavior on 
>"Fedora
> >Core 9".
> >
> >Thoughts?
> >
> >Did I misread the issue entirely? :}
> >
> >Roger
> >
> >
> >-----Original Message-----
> >From: bioperl-l-bounces at lists.open-bio.org
> >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >Sent: Thursday, February 09, 2006 10:16 AM
> >To: 'Jason Stajich'; 'Hubert Prielinger'
> >Cc: bioperl-l at bioperl.org
> >Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
> >output
> >
> >
> >
> >
> >>-----Original Message-----
> >>From: Jason Stajich [mailto:jason.stajich at duke.edu]
> >>Sent: Thursday, February 09, 2006 9:13 AM
> >>To: Hubert Prielinger
> >>Cc: Chris Fields; bioperl-l at bioperl.org
> >>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
> >>parsing Blast output
> >>
> >>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> >>
> >>
> >>>hi chris,
> >>>thanks, I have upgraded to version 1.5.1 but it isn't still
> >>>
> >>>
> >>working,
> >>
> >>
> >>>do you have any ohter idea, the problem I have is that I
> >>>
> >>>
> >>have to parse
> >>
> >>
> >>>a lot of textfiles....
> >>>or shall I look for another option to parse those files...
> >>>
> >>>regards
> >>>Hubert
> >>>
> >>>
> >>The code from Bioperl 1.5.1 works fine for me for blast
> >>2.2.13 reports but unless you post your blast report we can't
> >>really determine the problem.
> >>
> >>If you are still getting the same error like this I am not
> >>convinced you have upgraded to 1.5.1 which includes a fix in
> >>the fact that NCBI changed the HSP result format to remove
> >>the ':' from the Query/Sbjct prefixes.  We fixed this as soon
> >>as it was apparent sometime in September.
> >>
> >>
> >>
> >>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>STACK toplevel
> >>>>>
> >>>>>
> >>>>>
> >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>
> >>If you are just getting no results but also no warnings wrt
> >>parsing, are you sure your logic is correct?
> >>
> >>If you remove your filters do you see all the HSPS?
> >>
> >>
> >>while (my $result = $search->next_result) {
> >>     print $result->query_name, "\n";
> >>     #iterate over each hit on the query sequence
> >>     while (my $hit = $result->next_hit) {
> >>	print $hit->name, "\n";
> >>         #iterate over each HSP in the hit
> >>         while (my $hsp = $hit->next_hsp) {
> >>	 print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp-
> >> >hit_string, "\n";
> >>        }
> >>    }
> >>}
> >>
> >>
> >
> >I tested some of the BLAST results that Hubert sent Roger and me with a
> >similar script to the above.  I removed the file parsing logic and it 
>seemed
> >to work just fine.  It may very well be a logic issue or that he hasn't
> >installed the latest fix.
> >
> >It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), 
>even
> >though the returned output was from nr, the top of the blast output 
>showed
> >that it was v2.2.12:
> >
> >BLASTP 2.2.12 [Aug-07-2005]
> >
> >I double-checked my local version and it's definitely v.2.2.13:
> >-------------------------------------
> >C:\Perl\Scripts>blastcl3 -
> >
> >blastcl3 2.2.13   arguments:...
> >-------------------------------------
> >
> >If you use RemoteBlast using the same settings, the version in the header
> >looks like this:
> >
> >BLASTP 2.2.13 [Nov-27-2005]
> >
> >I'm wondering if all the blast executables (blast and netblast) from NCBI
> >have text output like v.2.2.12, while the wwwblast outputs a new format
> >(2.2.13).  I'll ask blast-help at NCBI about this.
> >
> >
> >
> >>To clarify some stuff -
> >>Chris I don't necessarily think the XML is best way forward
> >>for BLAST reports generated locally, it isn't as detailed as
> >>the Text format and it is what most people expect to be able
> >>to scroll through and parse -- it is also harder for the
> >>format to change dramatically if you have a static binary on
> >>your machine =).  I think for remoteblast the XML format
> >>should be the way forward but I expect Bioperl to maintain
> >>support of any plain text BLAST report format that people use
> >>on a regular basis.
> >>
> >>
> >>
> >
> >Does XML lack some specific info that text output has?  Didn't know that. 
>  I
> >believe that XML should be default in RemoteBlast since it will not 
>break,
> >but I agree with you about text output.  I also agree that it will need
> >somebody to maintain it constantly, much like RemoteBlast.
> >
> >
> >
> >>-jason
> >>
> >>
> >>>Chris Fields wrote:
> >>>
> >>>
> >>>
> >>>>My guess is you're running into text parsing problems in
> >>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
> >>>>(1.5.1) or
> >>>>bioperl-live (CVS), then see the bug below.
> >>>>
> >>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>
> >>>>I think the first problem you ran into is solved in bioperl 1.5.1,
> >>>>the last problem (more recent, not related to the first) has been
> >>>>fixed but hasn't been committed to bioperl-live yet.  The fixed
> >>>>SearchIO::blast is available in the link above, but
> >>>>
> >>>>
> >>realize it hasn't
> >>
> >>
> >>>>been committed yet and may change.
> >>>>
> >>>>Christopher Fields
> >>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
> >>>>University of Illinois Urbana-Champaign
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>-----Original Message-----
> >>>>>From: bioperl-l-bounces at lists.open-bio.org
> >>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
> >>>>>Prielinger
> >>>>>Sent: Wednesday, February 08, 2006 2:52 PM
> >>>>>To: bioperl-l at bioperl.org
> >>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
> >>>>>
> >>>>>
> >>parsing Blast
> >>
> >>
> >>>>>output
> >>>>>
> >>>>>Hi,
> >>>>>If I want to parse a Blast Output (Version 2.2.12) with
> >>>>>Bio::SearchIO, I get the following error message:
> >>>>>
> >>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>STACK toplevel
> >>>>>
> >>>>>
> >>>>>
> >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>
> >>
> >>>>>is that a bug......
> >>>>>
> >>>>>If I want to parse Blast Output (version 2.2.13), I don't get
> >>>>>anything.....
> >>>>>I'm using bioperl 1.4
> >>>>>
> >>>>>before, I have installed bioperl 1.4, it worked fine
> >>>>>
> >>>>>
> >>parsing Blast
> >>
> >>
> >>>>>Output (version 2.2.12), but I don't remember which
> >>>>>
> >>>>>
> >>bioperl version
> >>
> >>
> >>>>>I had installed
> >>>>>
> >>>>>thanks in advance
> >>>>>
> >>>>>Hubert
> >>>>>
> >>>>>
> >>>>>
> >>>>>_______________________________________________
> >>>>>Bioperl-l mailing list
> >>>>>Bioperl-l at lists.open-bio.org
> >>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>_______________________________________________
> >>>Bioperl-l mailing list
> >>>Bioperl-l at lists.open-bio.org
> >>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> >>--
> >>Jason Stajich
> >>Duke University
> >>http://www.duke.edu/~jes12
> >>
> >>
> >>
> >
> >Christopher Fields
> >Postdoctoral Researcher - Switzer Lab
> >Dept. of Biochemistry
> >University of Illinois Urbana-Champaign
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at lists.open-bio.org
> >http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> >
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l




From jason.stajich at duke.edu  Thu Feb  9 17:13:16 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 9 Feb 2006 17:13:16 -0500
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsingBlast
	output
In-Reply-To: 
References: 
Message-ID: 

Uh, that was done in sept see the CVS log...

On Feb 9, 2006, at 4:33 PM, Joel Steele wrote:

> Greetings again,
> Its the colon...
> observe.
>
> -=Code Snippet=-
> #!/usr/bin/perl -w
> use strict;
>
> #the string as reported from your error.
> my $string1 = 'Query  1   WWWKWRW  7';
>
> #your string with a colon thrown in for testing.
> my $string2 = 'Query:  1   WWWKWRW  7';
>
> foreach ($string1, $string2){
> 	if(/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/){
> 		print "Match Found in $_\n";
> 		print $1."\n";
> 		print $2."\n";
> 		print $3."\n";
> 		print $4."\n";
> 		print $5."\n";
> 	}else{
> 		print "no Match for $_\n";
> 	}
> }
>
> -=End Code=-
>
> The Output
>
> -=Code Snippet=-
> no Match for Query  1   WWWKWRW  7
> Match Found in Query:  1   WWWKWRW  7
> Query:  1
> Query
> 1
> WWWKWRW
> 7
>
> -=End Code=-
>
>
> Now I would suggest changing the regexp
>
> From:
> /^((Query|Sbjct)\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/
>
> To:
> /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/
>
> in SearchIO::Blast.
>
> General suggestion:
> Again I would like to suggest that everyone get used to using the strict
> pragma. Though it may not be applicable to this particular problem, it
> becomes essential if you wish to progress in your use of Perl.
> It is a core module so there is nothing to download from CPAN. It helps
> with development and once your code can run without warnings and errors
> you can remove it. This is not a targeted attack as some may interpret it,
> rather a general FYI for those out there new to Perl or programming in
> general. Better to start learning the rules early before bad habits creep in.
> One more thing. There is a wonderfully supportive Perl community available
> to anyone who wants to join at PerlMonks.org; check it out, who knows, you
> may even catch a glimpse of Larry Wall while you're there.
>
> -Joel Steele
>
> "The surest way to corrupt a youth is to instruct him to hold in higher
> regard those who think alike than those who think differently." - Nietzsche
>
> "I do not feel obliged to believe that the same God who endowed us with
> sense, reason and intellect has intended us to forego their use." - Galileo
>
>
>
>
>> From: Hubert Prielinger 
>> To: rahall2 at ualr.edu, bioperl-l at bioperl.org, Chris Fields
>> ,        Jason Stajich 
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>> parsingBlast	output
>> Date: Thu, 09 Feb 2006 14:13:31 -0600
>>
>> dear roger,
>> this error message I got, when I tried to parse Blast output (version
>> 2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have  
>> a lot
>> of Blast output files
>> with version 2.2.13 and for that I don't get any error message.....it
>> just doesn't work
>>
>> Hubert
>>
>>
>>
>> Roger Hall wrote:
>>
>>> Guys - I'm looking at the error message:
>>>
>>> MSG: no data for midline Query  1   WWWKWRW  7
>>> STACK Bio::SearchIO::blast::next_result
>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>> STACK toplevel
>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>> This is my line of thought:
>>> 1. "no data for midline $_" is a unique message generated by blast.pm in
>>> one location only at the point of a. reading three lines b. dropping
>>> lines with spaces only c. identifying the Query, Midline, and Match
>>> lines (0 <= $i < 3)
>>> 2. There is a regexp match that fails in order to reach that error message
>>> 3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>> 4. It does anyway
>>> 5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>> reports
>>>
>>> I suspect a newline/chomp/metacharacter issue. Not finding the string
>>> anywhere has me thoroughly confused - I asked Hubert for the additional
>>> file, assuming that I didn't have it.
>>>
>>> My next thought is to write a quick script to test perl behavior on
>>> "Fedora Core 9".
>>>
>>> Thoughts?
>>>
>>> Did I misread the issue entirely? :}
>>>
>>> Roger
>>>
>>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris  
>>> Fields
>>> Sent: Thursday, February 09, 2006 10:16 AM
>>> To: 'Jason Stajich'; 'Hubert Prielinger'
>>> Cc: bioperl-l at bioperl.org
>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>> parsing Blast
>>> output
>>>
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>> Sent: Thursday, February 09, 2006 9:13 AM
>>>> To: Hubert Prielinger
>>>> Cc: Chris Fields; bioperl-l at bioperl.org
>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>> parsing Blast output
>>>>
>>>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>
>>>>> hi chris,
>>>>> thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>>> do you have any other idea, the problem I have is that I have to parse
>>>>> a lot of textfiles....
>>>>> or shall I look for another option to parse those files...
>>>>>
>>>>> regards
>>>>> Hubert
>>>>>
>>>>>
>>>> The code from Bioperl 1.5.1 works fine for me for blast
>>>> 2.2.13 reports but unless you post your blast report we can't
>>>> really determine the problem.
>>>>
>>>> If you are still getting the same error like this I am not
>>>> convinced you have upgraded to 1.5.1 which includes a fix in
>>>> the fact that NCBI changed the HSP result format to remove
>>>> the ':' from the Query/Sbjct prefixes.  We fixed this as soon
>>>> as it was apparent sometime in September.
>>>>
>>>>
>>>>
>>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>> STACK toplevel
>>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>
>>>> If you are just getting no results but also no warnings wrt
>>>> parsing, are you sure your logic is correct?
>>>>
>>>> If you remove your filters do you see all the HSPS?
>>>>
>>>>
>>>> while (my $result = $search->next_result) {
>>>>     print $result->query_name, "\n";
>>>>     # iterate over each hit on the query sequence
>>>>     while (my $hit = $result->next_hit) {
>>>>         print $hit->name, "\n";
>>>>         # iterate over each HSP in the hit
>>>>         while (my $hsp = $hit->next_hsp) {
>>>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>                 $hsp->hit_string, "\n";
>>>>         }
>>>>     }
>>>> }
>>>>
>>>>
>>>
>>> I tested some of the BLAST results that Hubert sent Roger and me with a
>>> similar script to the above.  I removed the file parsing logic and it
>>> seemed to work just fine.  It may very well be a logic issue or that he
>>> hasn't installed the latest fix.
>>>
>>> It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>> even though the returned output was from nr, the top of the blast output
>>> showed that it was v2.2.12:
>>>
>>> BLASTP 2.2.12 [Aug-07-2005]
>>>
>>> I double-checked my local version and it's definitely v.2.2.13:
>>> -------------------------------------
>>> C:\Perl\Scripts>blastcl3 -
>>>
>>> blastcl3 2.2.13   arguments:...
>>> -------------------------------------
>>>
>>> If you use RemoteBlast using the same settings, the version in  
>>> the header
>>> looks like this:
>>>
>>> BLASTP 2.2.13 [Nov-27-2005]
>>>
>>> I'm wondering if all the blast executables (blast and netblast)  
>>> from NCBI
>>> have text output like v.2.2.12, while the wwwblast outputs a new  
>>> format
>>> (2.2.13).  I'll ask blast-help at NCBI about this.
>>>
>>>
>>>
>>>> To clarify some stuff -
>>>> Chris I don't necessarily think the XML is best way forward
>>>> for BLAST reports generated locally, it isn't as detailed as
>>>> the Text format and it is what most people expect to be able
>>>> to scroll through and parse -- it is also harder for the
>>>> format to change dramatically if you have a static binary on
>>>> your machine =).  I think for remoteblast the XML format
>>>> should be the way forward but I expect Bioperl to maintain
>>>> support of any plain text BLAST report format that people use
>>>> on a regular basis.
>>>>
>>>>
>>>>
>>>
>>> Does XML lack some specific info that text output has?  Didn't know that.
>>> I believe that XML should be default in RemoteBlast since it will not
>>> break, but I agree with you about text output.  I also agree that it
>>> will need somebody to maintain it constantly, much like RemoteBlast.
>>>
>>>
>>>
>>>> -jason
>>>>
>>>>
>>>>> Chris Fields wrote:
>>>>>
>>>>>
>>>>>
>>>>>> My guess is you're running into text parsing problems in
>>>>>> Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>> (1.5.1) or
>>>>>> bioperl-live (CVS), then see the bug below.
>>>>>>
>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>
>>>>>> I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>> the last problem (more recent, not related to the first) has been
>>>>>> fixed but hasn't been committed to bioperl-live yet.  The fixed
>>>>>> SearchIO::blast is available in the link above, but realize it hasn't
>>>>>> been committed yet and may change.
>>>>>>
>>>>>> Christopher Fields
>>>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
>>>>>> University of Illinois Urbana-Champaign
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>> Prielinger
>>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>> To: bioperl-l at bioperl.org
>>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>> output
>>>>>>>
>>>>>>> Hi,
>>>>>>> If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>> Bio::SearchIO, I get the following error message:
>>>>>>>
>>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>> STACK toplevel
>>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>> is that a bug......
>>>>>>>
>>>>>>> If I want to parse Blast Output (version 2.2.13), I don't get
>>>>>>> anything.....
>>>>>>> I'm using bioperl 1.4
>>>>>>>
>>>>>>> before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>> Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>> I had installed
>>>>>>>
>>>>>>> thanks in advance
>>>>>>>
>>>>>>> Hubert
>>>>
>>>> --
>>>> Jason Stajich
>>>> Duke University
>>>> http://www.duke.edu/~jes12
>>>>
>>>>
>>>>
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept. of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12
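
For anyone following along, the regexp change Joel describes above can
be tried in isolation. This is only a minimal sketch: the sub name
parse_hsp_line and the hash keys it returns are made up for illustration,
and the actual code in Bio::SearchIO::blast may well look different.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): parse one alignment line.
# The ':?' makes the colon optional, so both the older "Query: 1 ..."
# style and the newer "Query  1 ..." style are accepted.
sub parse_hsp_line {
    my ($line) = @_;
    if ($line =~ /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/) {
        return { type => $2, start => $3, seq => $4, end => $5 };
    }
    return;    # not an alignment line
}

# Try the line from the error message, with and without the colon.
for my $line ('Query  1   WWWKWRW  7', 'Query:  1   WWWKWRW  7') {
    my $hsp = parse_hsp_line($line);
    print $hsp
        ? "$hsp->{type} $hsp->{start}-$hsp->{end} $hsp->{seq}\n"
        : "no match for $line\n";
}
```

With the optional colon both test lines should parse to the same fields,
which is the behaviour the fix is after.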




From boris.steipe at utoronto.ca  Thu Feb  9 16:54:53 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Thu, 9 Feb 2006 16:54:53 -0500
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
Message-ID: <1B7E8DA9-86F5-4411-B16C-E6573E5E8C36@utoronto.ca>

Golf, anyone?


#!/usr/bin/perl -nl
for(split//){push@a,$_}
END{
   while($n/@a<0.5) {
     $p=rand(@a);
     if($a[$p]=~/[A-Z]/){$a[$p]=lc((grep!/$a[$p]/,split//,"ACGT")[rand 
(3)]);
       $n++;
     }
   }
print @a;
}

(144, not counting \s and the # !line )

:-)


B.



>> Does anyone know of tool to mutate a DNA sequence by a specified
>> amount?
>> For instance, say I have a DNA sequence 1000 bases long, and I  
>> want to
>> simulate mutations to make it 75% (or 80%, etc) similar to the
>> original.
>>
>>
>> Ryan
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
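
For readers who prefer it un-golfed, the same idea as Boris's script can
be written out longhand. This is a rough sketch rather than his code: the
sub name mutate_dna and the fraction argument are made up for
illustration, and it only does substitutions (no insertions or deletions).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: substitute a given fraction of positions in a DNA
# string. Each chosen position gets a different base, written in lower
# case so that the changes are easy to spot (as in the golfed version).
sub mutate_dna {
    my ($dna, $fraction) = @_;
    my @bases = split //, uc $dna;

    # Never ask for more substitutions than there are A/C/G/T positions.
    my $available = grep { /^[ACGT]$/ } @bases;
    my $target    = int($fraction * @bases);
    $target = $available if $target > $available;

    my $done = 0;
    while ($done < $target) {
        my $pos = int rand @bases;
        # Skip positions already mutated (lower case) or not A/C/G/T.
        next unless $bases[$pos] =~ /^[ACGT]$/;
        my @choices = grep { $_ ne $bases[$pos] } qw(A C G T);
        $bases[$pos] = lc $choices[ int rand @choices ];
        $done++;
    }
    return join '', @bases;
}

# 25% substitutions leave the result 75% identical to the input.
print mutate_dna('ACGT' x 10, 0.25), "\n";
```

Because a position is never hit twice, the number of differences is
exact, so a fraction of 0.25 gives a sequence 75% identical to the
original.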



From hubert.prielinger at gmx.at  Thu Feb  9 17:20:46 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 09 Feb 2006 16:20:46 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast
	output
In-Reply-To: <000e01c62dca$bc66df60$15327e82@pyrimidine>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>
Message-ID: <43EBC03E.4040900@gmx.at>

Hi Chris,
I'm incredibly sorry for causing so much inconvenience. Yes, you are
right, I only had to change the blast.pm file, and it is working fine now.
Thank you very much; you are also right that you mentioned earlier that I
should change the file... ;)

but I have another question: does it work with the WU-Blast output too? 

regards
Hubert


Chris Fields wrote:

>Ha!  I come back from meeting and there's a billion emails!  What have we
>started? ;p .  Sorry about this Jason; I know you're busy.
>
>Hubert, if you're out there, I sent you an email with an attachment.  You
>said the output looks like what you were expecting.  So I think we have two
>problems:
>
>1)  I haven't delved into the file scanning, but the fact that it takes so
>long should tell you something's seriously wrong there.  Strip that part out
>and start with a simple script, say, like the one Jason or that I sent you;
>the script I used to generate that output works fine (on two OS's, WinXP and
>Mac OS X).  Use it on one file at a time.  Do everything on command line
>(not through Eclipse).  IDE's can be notoriously flaky about running
>scripts, esp. when they run debugging.  
>
>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still
>not work whenever the text blast output has the following header, which
>comes from the new web version of BLAST:
>
>-----------------------------------------------------
>BLASTP 2.2.13 [Nov-27-2005]
>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, 
>Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman 
>(1997), "Gapped BLAST and PSI-BLAST: a new generation of 
>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>
>RID: 1139501210-857-165793005128.BLASTQ1
>
>
>Database: All non-redundant GenBank CDS
>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>           3,292,813 sequences; 1,128,164,434 total letters
>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>tuberculosis 
>H37Rv].
>Length=193
>.......
>-----------------------------------------------------
>
>It will work if the text output has the following header (or is an older
>version of BLAST):
>
>-----------------------------------------------------
>BLASTP 2.2.12 [Aug-07-2005]
>
>
>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
>Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
>"Gapped BLAST and PSI-BLAST: a new generation of protein database search
>programs",  Nucleic Acids Res. 25:3389-3402.
>
>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>tuberculosis H37Rv].
>         (193 letters)
>
>Database: All non-redundant GenBank CDS
>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>           2,895,325 sequences; 997,103,285 total letters
>-----------------------------------------------------
>You have the former (2.2.13) version.  I know b/c I have your BLAST files.
>Therefore, even bioperl-1.5.1 will not work!
>
>If you want the really gory details on why this is a problem, look here:
>
>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>
>So, any text output with the above header will not work; it will either hang
>or end abruptly (depending on OS, perl version, memory, patience).  If you
>look in the above, I have added a preliminary fix for this.  I'll reiterate
>for the billionth time, it hasn't been committed yet, so don't kill me if it
>blows your computer up ;>
>
>Here's the direct link:
>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>This is a modified version of Bio::SearchIO::blast.pm (it says it's version
>1.90, but it's lying, I didn't change the version, only the regex; sorry
>Jason).  From what you've been posting it doesn't sound like you've tried
>this, and I believe I've suggested this fix before.
>
>Replace the one in your Bio/SearchIO directory (which looks like
>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev.
>message) with this file.  Make sure the filename stays the same (blast.pm).
>
>Run everything again, one file at a time.  Make sure you use Jason's script
>as well as the one I sent you.  Do NOT rely on running through multiple
>files yet.  Fix one bug at a time.  And heed Joel's words about file checks.
>
>
>Here's a small chunk of output from one of your blast files using the
>modified script I sent you:
>
>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>Query:   1  RWKWKRKK  8
>Seq:     542  RWAWRRKK  549
>
>Look familiar?
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept. of Biochemistry
>University of Illinois Urbana-Champaign  
>
>  
>
>>-----Original Message-----
>>From: Roger Hall [mailto:rahall2 at ualr.edu] 
>>Sent: Thursday, February 09, 2006 3:24 PM
>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>parsing Blast output
>>
>>In other words, yes, I'm on the wrong trail. :}
>>
>>Sorry - I'll look at the output issue this evening (or 
>>realize that Chris already solved the issue).  ;}
>>
>>Thanks!
>>
>>Roger
>>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org
>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>>Hubert Prielinger
>>Sent: Thursday, February 09, 2006 2:14 PM
>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; 
>>Jason Stajich
>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>parsing Blast output
>>
>>dear roger,
>>this error message I got, when I tried to parse Blast output (version
>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I 
>>have a lot of Blast output files with version 2.2.13 and for 
>>that I don't get any error message.....it just doesn't work
>>
>>Hubert
>>
>>
>>
>>Roger Hall wrote:
>>
>>>Guys - I'm looking at the error message:
>>>
>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>STACK Bio::SearchIO::blast::next_result
>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>STACK toplevel
>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>>This is my line of thought:
>>>1. "no data for midline $_" is a unique message generated by blast.pm in
>>>one location only at the point of a. reading three lines b. dropping
>>>lines with spaces only c. identifying the Query, Midline, and Match
>>>lines (0 <= $i < 3)
>>>2. There is a regexp match that fails in order to reach that error message
>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>4. It does anyway
>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>>reports
>>>
>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>anywhere has me thoroughly confused - I asked Hubert for the additional
>>>file, assuming that I didn't have it.
>>>
>>>My next thought is to write a quick script to test perl behavior on 
>>>"Fedora Core 9".
>>>
>>>Thoughts?
>>>
>>>Did I misread the issue entirely? :}
>>>
>>>Roger
>>>
>>>
>>>-----Original Message-----
>>>From: bioperl-l-bounces at lists.open-bio.org
>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>Cc: bioperl-l at bioperl.org
>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing 
>>>Blast output
>>>
>>>
>>> 
>>>
>>>      
>>>
>>>>-----Original Message-----
>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>To: Hubert Prielinger
>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing 
>>>>Blast output
>>>>
>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>
>>>>>hi chris,
>>>>>thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>>>do you have any other idea, the problem I have is that I have to parse
>>>>>a lot of textfiles....
>>>>>or shall I look for another option to parse those files...
>>>>>
>>>>>regards
>>>>>Hubert
>>>>>
>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>2.2.13 reports but unless you post your blast report we can't really
>>>>determine the problem.
>>>>
>>>>If you are still getting the same error like this I am not convinced
>>>>you have upgraded to 1.5.1 which includes a fix in the fact that NCBI
>>>>changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>prefixes.  We fixed this as soon as it was apparent sometime in
>>>>September.
>>>>
>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>
>>>>If you are just getting no results but also no warnings wrt parsing,
>>>>are you sure your logic is correct?
>>>>
>>>>If you remove your filters do you see all the HSPS?
>>>>
>>>>
>>>>while (my $result = $search->next_result) {
>>>>    print $result->query_name, "\n";
>>>>    # iterate over each hit on the query sequence
>>>>    while (my $hit = $result->next_hit) {
>>>>        print $hit->name, "\n";
>>>>        # iterate over each HSP in the hit
>>>>        while (my $hsp = $hit->next_hsp) {
>>>>            print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>                $hsp->hit_string, "\n";
>>>>        }
>>>>    }
>>>>}
>>>>
>>>I tested some of the BLAST results that Hubert sent Roger and me with a
>>>similar script to the above.  I removed the file parsing logic and it
>>>seemed to work just fine.  It may very well be a logic issue or that he
>>>hasn't installed the latest fix.
>>>
>>>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>>even though the returned output was from nr, the top of the blast
>>>output showed that it was v2.2.12:
>>>
>>>BLASTP 2.2.12 [Aug-07-2005]
>>>
>>>I double-checked my local version and it's definitely v.2.2.13:
>>>-------------------------------------
>>>C:\Perl\Scripts>blastcl3 -
>>>
>>>blastcl3 2.2.13   arguments:...
>>>-------------------------------------
>>>
>>>If you use RemoteBlast using the same settings, the version in the 
>>>header looks like this:
>>>
>>>BLASTP 2.2.13 [Nov-27-2005]
>>>
>>>I'm wondering if all the blast executables (blast and netblast) from
>>>NCBI have text output like v.2.2.12, while the wwwblast outputs a new
>>>format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>
>>>>To clarify some stuff -
>>>>Chris I don't necessarily think the XML is best way forward for BLAST
>>>>reports generated locally, it isn't as detailed as the Text format and
>>>>it is what most people expect to be able to scroll through and parse
>>>>-- it is also harder for the format to change dramatically if you have
>>>>a static binary on your machine =).  I think for remoteblast the XML
>>>>format should be the way forward but I expect Bioperl to maintain
>>>>support of any plain text BLAST report format that people use on a
>>>>regular basis.
>>>>
>>>Does XML lack some specific info that text output has?  Didn't know that.
>>>I believe that XML should be default in RemoteBlast since it will not
>>>break, but I agree with you about text output.  I also agree that it
>>>will need somebody to maintain it constantly, much like RemoteBlast.
>>>
>>>>-jason
>>>>
>>>>>Chris Fields wrote:
>>>>>
>>>>>>My guess is you're running into text parsing problems in 
>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>(1.5.1) or
>>>>>>bioperl-live (CVS), then see the bug below.
>>>>>>
>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>
>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>the last problem (more recent, not related to the first) has been
>>>>>>fixed but hasn't been committed to bioperl-live yet.  The fixed
>>>>>>SearchIO::blast is available in the link above, but realize it hasn't
>>>>>>been committed yet and may change.
>>>>>>
>>>>>>Christopher Fields
>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
>>>>>>University of Illinois Urbana-Champaign
>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>Prielinger
>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>output
>>>>>>>
>>>>>>>Hi,
>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>
>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>>is that a bug......
>>>>>>>
>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't get
>>>>>>>anything.....
>>>>>>>I'm using bioperl 1.4
>>>>>>>
>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>I had installed
>>>>>>>
>>>>>>>thanks in advance
>>>>>>>
>>>>>>>Hubert
>>>>
>>>>--
>>>>Jason Stajich
>>>>Duke University
>>>>http://www.duke.edu/~jes12
>>>>
>>>>   
>>>>
>>>>        
>>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>
>
>  
>
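
Since the two report headers Chris quotes differ in a visible way, one
could check which layout a file uses before handing it to the parser.
The following is only a sketch based on those two quoted headers; the
sub name blast_header_style is made up, and other BLAST versions may
well use different layouts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: guess the report layout from the header text.
# Based only on the two headers quoted in this thread:
#   older layout: "(193 letters)" on its own line after "Query="
#   newer layout: "Length=193" on its own line after "Query="
sub blast_header_style {
    my ($text) = @_;
    my ($version) = $text =~ /^BLAST\w*\s+(\d+\.\d+\.\d+)/m;
    my $style = $text =~ /^Length=\d+/m                ? 'new'
              : $text =~ /^\s*\(\d[\d,]*\s+letters\)/m ? 'old'
              :                                          'unknown';
    return ($version, $style);
}

# Abbreviated header in the newer layout, for demonstration only.
my $header = join "\n",
    'BLASTP 2.2.13 [Nov-27-2005]',
    'Query=  NP_215895 pyrimidine regulatory protein PyrR',
    'Length=193',
    '';
my ($version, $style) = blast_header_style($header);
print "version $version, layout $style\n";
```

A report that comes back as 'new' would be one of those that needs the
patched blast.pm discussed above.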



From olenka.m at gmail.com  Thu Feb  9 17:49:48 2006
From: olenka.m at gmail.com (Olena Morozova)
Date: Thu, 9 Feb 2006 17:49:48 -0500
Subject: [Bioperl-l] Bio::TreeIO
Message-ID: <259a224c0602091449u353e4bf1g5a3cfbb46297217a@mail.gmail.com>

Hi all,

Probably a very stupid question, but the get_lca function does not
work for unrooted trees, does it?
I am trying to get the LCA for a set of nodes in a phylip tree, and I
am using the script in the HOWTO.
Thanks,
Olena

On 2/8/06, Hubert Prielinger  wrote:
> Hi,
> If I want to parse a Blast Output (Version 2.2.12) with Bio::SearchIO,
> I get the following error message:
>
> MSG: no data for midline Query  1   WWWKWRW  7
> STACK Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> STACK toplevel
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>
> is that a bug......
>
> If I want to parse Blast Output (version 2.2.13), I don't get anything.....
> I'm using bioperl 1.4
>
> before, I have installed bioperl 1.4, it worked fine parsing Blast
> Output (version 2.2.12), but I don't remember which bioperl version I
> had installed
>
> thanks in advance
>
> Hubert



From victor.ruotti at gmail.com  Thu Feb  9 18:22:11 2006
From: victor.ruotti at gmail.com (Victor)
Date: Thu, 9 Feb 2006 17:22:11 -0600
Subject: [Bioperl-l] Running BLAT with BioPerl
Message-ID: <36d7e5550602091522g114728a2w57f2a1cb7c1383ee@mail.gmail.com>

Hi,
Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module is up to date
in the latest bioperl release?



use Bio::Tools::Run::Alignment::Blat;
my $factory = Bio::Tools::Run::Alignment::Blat->new();
my $seq =
"TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA";

my @feats = $factory->run( $seq);

Here is what I get when trying to use it:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Blat call (/usr/local/bin/blat/blat -out=blast  TGAAATAAAACTCAGTA
/tmp/fB09bp5F76) crashed: -1

Notice that it is using "blat" twice in the path. The way that I fixed this
is by going to the Blat.pm module and changing the following lines:
#my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
my $str= Bio::Root::IO->catfile($self->program_name);
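
One possible explanation for the doubled "blat", sketched in plain Perl.
This assumes that Bio::Root::IO->catfile behaves like File::Spec->catfile
and that executable() already returns the full path to the binary; both
are assumptions, and the two values below are hypothetical, chosen only to
mirror the path in the error message.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical values, chosen to mirror the path in the error message.
my $executable   = '/usr/local/bin/blat';   # assumed: already the full path to the binary
my $program_name = 'blat';

# Joining a full executable path with the program name again appends
# the program name as one more path component.
my $joined = File::Spec->catfile($executable, $program_name);
print "$joined\n";
```

On a Unix-like system this prints a path ending in blat/blat, which matches
the command shown in the exception. If that is what happens inside the
module, the program name only needs to appear once in the call.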

Any ideas, maybe I'm missing the $ENV variable somewhere?
I'd like to avoid making this change. Also does anyone have a known synopsis
of this blat module (where to set the parameters, and whether it allows you
to have a config file).
I'll be happy to add a better synopsis to the module if needed.

Thanks in advance,
Victor



From osborne1 at optonline.net  Thu Feb  9 20:37:39 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 09 Feb 2006 20:37:39 -0500
Subject: [Bioperl-l] module for finding restriction site in batch of
 sequences?
In-Reply-To: 
Message-ID: 

Claudia,

Yes, Bio::Restriction does this, see bptutorial.pl for code examples. Note
the statement "@fragments = $analysis->fragments($enzyme)". If the array
@fragments has more than 1 element that means your sequence has a site for
the enzyme in question.

Alternatively it sounds like you could use some kind of regular expression.
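
A minimal sketch of the regular expression route, in plain Perl without
BioPerl. The sub name find_sites and the two test sequences are made up
for illustration, and the site used is EcoRI's GAATTC. Because that site
is palindromic, scanning one strand is enough; a non-palindromic site
would also need a scan of the reverse complement.

```perl
use strict;
use warnings;

# Return the 1-based start positions of every occurrence of $site in $seq.
# A zero-width lookahead is used so that overlapping sites are also found.
sub find_sites {
    my ($seq, $site) = @_;
    my @positions;
    while ($seq =~ /(?=\Q$site\E)/gi) {
        push @positions, pos($seq) + 1;
    }
    return @positions;
}

# Hypothetical input: sequence id => sequence string.
my %seqs = (
    seq1 => 'AAGAATTCGGGAATTCTT',
    seq2 => 'ACGTACGTACGT',
);

for my $id (sort keys %seqs) {
    my @hits = find_sites($seqs{$id}, 'GAATTC');   # EcoRI recognition site
    printf "%s: %s\n", $id, @hits ? "site at @hits" : 'no site';
}
```

For the two sequences above this reports sites at positions 3 and 11 in
seq1 and no site in seq2. For hundreds of sequences the hash would be
filled from a file instead.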

Brian O.


On 2/9/06 3:53 PM, "Lalancette, Claudia"  wrote:

> Greetings,
> 
>  
> 
> I need to find a way to look for a specific restriction enzyme site in
> hundreds of sequences.  Been looking at Bio::Restriction, but not sure
> if will work...  Any suggestions?
> 
>  
> 
> Thanks,
> 
> Claudia




From cjfields at uiuc.edu  Thu Feb  9 20:52:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 9 Feb 2006 19:52:34 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast
	output
In-Reply-To: <43EBC03E.4040900@gmx.at>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>
	<43EBC03E.4040900@gmx.at>
Message-ID: 

 From 'perldoc Bio::SearchIO::blast':

DESCRIPTION
        This object encapsulated the necessary methods for generating events
        suitable for building Bio::Search objects from a BLAST report file.
        Read the Bio::SearchIO for more information about how to use this.

        This driver can parse:

        o   NCBI produced plain text BLAST reports from blastall, this also
            includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.  NCBI
            XML BLAST output is parsed with the blastxml SearchIO driver

        o   WU-BLAST all reports

        o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)

        o   BLAST-like output from Paracel BTK output

So, it should.  Let us know if it doesn't.

On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:

> Hi Chris,
> I'm incredibly sorry for causing so much inconvenience, yes you are
> right, I had only to change the blast.pm file, it is working very
> fine, thank you very much, and you are right, you have mentioned it
> earlier either to change the file... ;)
>
> but I have another question: does it work with the WU-Blast output too?
> regards
> Hubert
>
>
> Chris Fields wrote:
>
>> Ha!  I come back from meeting and there's a billion emails!  What have we
>> started? ;p .  Sorry about this Jason; I know you're busy.
>>
>> Hubert, if you're out there, I sent you an email with an attachment.  You
>> said the output looks like what you were expecting.  So I think we have two
>> problems:
>>
>> 1)  I haven't delved into the file scanning, but the fact that it takes so
>> long should tell you something's seriously wrong there.  Strip that part out
>> and start with a simple script, say, like the one Jason or that I sent you;
>> the script I used to generate that output works fine (on two OS's, WinXP and
>> Mac OS X).  Use it on one file at a time.  Do everything on command line
>> (not through Eclipse).  IDE's can be notoriously flaky about running
>> scripts, esp. when they run debugging.
>>
>> 2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still
>> not work whenever the text blast output has the following header, which
>> comes from the new web version of BLAST:
>>
>> -----------------------------------------------------
>> BLASTP 2.2.13 [Nov-27-2005]
>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>> Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>> Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>> protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>
>> RID: 1139501210-857-165793005128.BLASTQ1
>>
>>
>> Database: All non-redundant GenBank CDS
>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>           3,292,813 sequences; 1,128,164,434 total letters
>> Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>> tuberculosis H37Rv].
>> Length=193
>> .......
>> -----------------------------------------------------
>>
>> It will work if the text output has the following header (or is an older
>> version of BLAST):
>>
>> -----------------------------------------------------
>> BLASTP 2.2.12 [Aug-07-2005]
>>
>>
>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>> Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>> Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>> protein database search
>> programs",  Nucleic Acids Res. 25:3389-3402.
>>
>> Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>> tuberculosis H37Rv].
>>         (193 letters)
>>
>> Database: All non-redundant GenBank CDS
>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>           2,895,325 sequences; 997,103,285 total letters
>> -----------------------------------------------------
>> You have the former (2.2.13) version.  I know b/c I have your BLAST files.
>> Therefore, even bioperl-1.5.1 will not work!
>>
>> If you want the really gory details on why this is a problem, look here:
>>
>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>
>> So, any text output with the above header will not work; it will either hang
>> or end abruptly (depending on OS, perl version, memory, patience).  If you
>> look in the above, I have added a preliminary fix for this.  I'll reiterate
>> for the billionth time, it hasn't been committed yet, so don't kill me if
>> it blows your computer up ;>
>>
>> Here's the direct link:
>> http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>
>> This is a modified version of Bio::SearchIO::blast.pm (it says it's version
>> 1.90, but it's lying, I didn't change the version, only the regex; sorry
>> Jason).  From what you've been posting it doesn't sound like you've tried
>> this, and I believe I've suggested this fix before.
>>
>> Replace the one in your Bio/SearchIO directory (which looks like
>> '/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev.
>> message) with this file.  Make sure the filename stays the same (blast.pm).
>>
>> Run everything again, one file at a time.  Make sure you use Jason's script
>> as well as the one I sent you.  Do NOT rely on running through multiple
>> files yet.  Fix one bug at a time.  And heed Joel's words about file checks.
>>
>>
>> Here's a small chunk of output from one of your blast files using the
>> modified script I sent you:
>>
>> sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>> Query:   1  RWKWKRKK  8
>> Seq:     542  RWAWRRKK  549
>>
>> Look familiar?
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>> -----Original Message-----
>>> From: Roger Hall [mailto:rahall2 at ualr.edu]
>>> Sent: Thursday, February 09, 2006 3:24 PM
>>> To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>> Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>>> Blast output
>>>
>>> In other words, yes, I'm on the wrong trail. :}
>>>
>>> Sorry - I'll look at the output issue this evening (or realize  
>>> that Chris already solved the issue).  ;}
>>>
>>> Thanks!
>>>
>>> Roger
>>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert  
>>> Prielinger
>>> Sent: Thursday, February 09, 2006 2:14 PM
>>> To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason  
>>> Stajich
>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>> parsing Blast output
>>>
>>> dear roger,
>>> this error message I got, when I tried to parse Blast output  
>>> (version
>>> 2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have  
>>> a lot of Blast output files with version 2.2.13 and for that I  
>>> don't get any error message.....it just doesn't work
>>>
>>> Hubert
>>>
>>>
>>>
>>> Roger Hall wrote:
>>>
>>>
>>>> Guys - I'm looking at the error message:
>>>>
>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>> STACK Bio::SearchIO::blast::next_result
>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>> STACK toplevel
>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ 
>>>> Blast.pl:21
>>>>
>>>> This is my line of thought:
>>>> 1. "no data for midline $_" is a unique message generated by blast.pm
>>>>    in one location only at the point of a. reading three lines
>>>>    b. dropping lines with spaces only c. identifying the Query,
>>>>    Midline, and Match lines (0 <= $i < 3)
>>>> 2. There is a regexp match that fails in order to reach that error
>>>>    message
>>>> 3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>> 4. It does anyway
>>>> 5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the
>>>>    blast reports
>>>>
>>>> I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>> anywhere has me thoroughly confused - I asked Hubert for the additional
>>>> file, assuming that I didn't have it.
>>>>
>>>> My next thought is to write a quick script to test perl behavior
>>>> on "Fedora Core 4".
>>>>
>>>> Thoughts?
>>>>
>>>> Did I misread the issue entirely? :}
>>>>
>>>> Roger
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Chris Fields
>>>
>>>> Sent: Thursday, February 09, 2006 10:16 AM
>>>> To: 'Jason Stajich'; 'Hubert Prielinger'
>>>> Cc: bioperl-l at bioperl.org
>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>> parsing Blast output
>>>>
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>> Sent: Thursday, February 09, 2006 9:13 AM
>>>>> To: Hubert Prielinger
>>>>> Cc: Chris Fields; bioperl-l at bioperl.org
>>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>> parsing Blast output
>>>>>
>>>>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>
>>>>>
>>>>>> hi chris,
>>>>>> thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>>>> do you have any other idea, the problem I have is that I have to parse
>>>>>> a lot of textfiles....
>>>>>> or shall I look for another option to parse those files...
>>>>>>
>>>>>> regards
>>>>>> Hubert
>>>>>>
>>>>>>
>>>>> The code from Bioperl 1.5.1 works fine for me for blast 2.2.13
>>>>> reports but unless you post your blast report we can't really
>>>>> determine the problem.
>>>>>
>>>>> If you are still getting the same error like this I am not convinced
>>>>> you have upgraded to 1.5.1 which includes a fix for the fact that NCBI
>>>>> changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>> prefixes.  We fixed this as soon as it was apparent sometime in
>>>>> September.
>>>>>
>>>>>
>>>>>
>>>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>> STACK toplevel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ 
>>>>> Blast.pl:21
>>>>>
>>>>> If you are just getting no results but also no warnings wrt parsing,
>>>>> are you sure your logic is correct?
>>>>>
>>>>> If you remove your filters do you see all the HSPS?
>>>>>
>>>>> while (my $result = $search->next_result) {
>>>>>     print $result->query_name, "\n";
>>>>>     # iterate over each hit on the query sequence
>>>>>     while (my $hit = $result->next_hit) {
>>>>>         print $hit->name, "\n";
>>>>>         # iterate over each HSP in the hit
>>>>>         while (my $hsp = $hit->next_hsp) {
>>>>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>>                   $hsp->hit_string, "\n";
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>> I tested some of the BLAST results that Hubert sent Roger and me with a
>>>> similar script to the above.  I removed the file parsing logic and it
>>>> seemed to work just fine.  It may very well be a logic issue or that he
>>>> hasn't installed the latest fix.
>>>>
>>>> It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>>> even though the returned output was from nr, the top of the blast
>>>> output showed that it was v2.2.12:
>>>>
>>>> BLASTP 2.2.12 [Aug-07-2005]
>>>>
>>>> I double-checked my local version and it's definitely v.2.2.13:
>>>> -------------------------------------
>>>> C:\Perl\Scripts>blastcl3 -
>>>>
>>>> blastcl3 2.2.13   arguments:...
>>>> -------------------------------------
>>>>
>>>> If you use RemoteBlast using the same settings, the version in  
>>>> the header looks like this:
>>>>
>>>> BLASTP 2.2.13 [Nov-27-2005]
>>>>
>>>> I'm wondering if all the blast executables (blast and netblast) from
>>>> NCBI have text output like v.2.2.12, while the wwwblast outputs a new
>>>> format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>>
>>>>
>>>>
>>>>> To clarify some stuff -
>>>>> Chris I don't necessarily think the XML is best way forward for BLAST
>>>>> reports generated locally, it isn't as detailed as the Text format and
>>>>> it is what most people expect to be able to scroll through and parse
>>>>> -- it is also harder for the format to change dramatically if you have
>>>>> a static binary on your machine =).  I think for remoteblast the XML
>>>>> format should be the way forward but I expect Bioperl to maintain
>>>>> support of any plain text BLAST report format that people use on a
>>>>> regular basis.
>>>>>
>>>>>
>>>>>
>>>> Does XML lack some specific info that text output has?  Didn't know that.
>>>> I believe that XML should be default in RemoteBlast since it will not
>>>> break, but I agree with you about text output.  I also agree that it
>>>> will need somebody to maintain it constantly, much like RemoteBlast.
>>>>
>>>>
>>>>
>>>>> -jason
>>>>>
>>>>>
>>>>>> Chris Fields wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> My guess is you're running into text parsing problems in  
>>>>>>> Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>> (1.5.1) or
>>>>>>> bioperl-live (CVS), then see the bug below.
>>>>>>>
>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>
>>>>>>> I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>> the last problem (more recent, not related to the first) has been
>>>>>>> fixed but hasn't been committed to bioperl-live yet.  The fixed
>>>>>>> SearchIO::blast is available in the link above, but realize it
>>>>>>> hasn't been committed yet and may change.
>>>>>>>
>>>>>>> Christopher Fields
>>>>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry  
>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf
>>> Of Hubert
>>>>>>>> Prielinger
>>>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>> To: bioperl-l at bioperl.org
>>>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>> output
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> If I want to parse a Blast Output (Version 2.2.12) with  
>>>>>>>> Bio::SearchIO, I get the following error message:
>>>>>>>>
>>>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>> STACK toplevel
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ 
>>>>> Blast.pl:21
>>>>>
>>>>>
>>>>>>>> is that a bug......
>>>>>>>>
>>>>>>>> If I want to parse Blast Output (version 2.2.13), I don't  
>>>>>>>> get anything.....
>>>>>>>> I'm using bioperl 1.4
>>>>>>>>
>>>>>>>> before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>> Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>> I had installed
>>>>>>>>
>>>>>>>> thanks in advance
>>>>>>>>
>>>>>>>> Hubert
>>>>>>>>
>>>>>>>>
>>>>> --
>>>>> Jason Stajich
>>>>> Duke University
>>>>> http://www.duke.edu/~jes12
>>>>>
>>>>>
>>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher - Switzer Lab
>>>> Dept. of Biochemistry
>>>> University of Illinois Urbana-Champaign

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign






From heikki at sanbi.ac.za  Thu Feb  9 23:47:42 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 10 Feb 2006 06:47:42 +0200
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <000901c62dbf$49bfae20$15327e82@pyrimidine>
References: <000901c62dbf$49bfae20$15327e82@pyrimidine>
Message-ID: <200602100647.43173.heikki@sanbi.ac.za>

On Thursday 09 February 2006 23:25, Chris Fields wrote:
> Thanks!  I think, as long as the tests pass everything is fine with me.  I
> may be submitting another module or two in the next few weeks; just depends
> on how much time I can spend on them.

Looking forward to them!

	-Heikki

> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> > -----Original Message-----
> > From: Heikki Lehvaslaiho [mailto:heikki at sanbi.ac.za]
> > Sent: Thursday, February 09, 2006 1:42 PM
> > To: bioperl-l at lists.open-bio.org
> > Cc: Chris Fields
> > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> >
> > Chris,
> >
> > I committed your file. All tests pass; code looks like
> > written by a long term bioperl contributor! Impressive.
> >
> > I truncated the larger test file from 270K to 20K (200
> > lines), to not bloat the distribution unnecessarily. Tests
> > pass which is the main thing. Shout if if you disagree.
> >
> > Great job!
> >
> > 	-Heikki
> >
> > On Thursday 09 February 2006 19:53, Chris Fields wrote:
> > > Heikki,
> > >
> > > I've added the Bio::Tools::RNAMotif module with test suite
> >
> > (24 tests)
> >
> > > and two test data files to bugzilla.  The first data file is needed
> > > for normal tests, the second is for testing parsing with
> >
> > modified data
> >
> > > in the score tag (using sprintf() in the RNAMotif
> >
> > descriptor).  I ran
> >
> > > 'perl t\RNAMotif.t' and they all passed.
> > >
> > > Thanks!
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > > > -----Original Message-----
> > > > From: bioperl-l-bounces at lists.open-bio.org
> > > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki
> > > > Lehvaslaiho
> > > > Sent: Wednesday, February 08, 2006 12:54 AM
> > > > To: bioperl-l at lists.open-bio.org
> > > > Cc: Chris Fields
> > > > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> > > >
> > > > Chris,
> > > >
> > > > Post your files to bugzilla (ticket type enhancement, add
> >
> > files to
> >
> > > > ticket after creation)  and someone with commit ability will add
> > > > them to CVS once the code is in satisfactory condition.
> > > >
> > > > Thanks,
> > > >
> > > > 	-Heikki
> > > >
> > > > On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> > > > > I want to submit a module for parsing RNAMotif output
> > > > > (Bio::Tools::RNAMotif).  It is capable, at the moment,
> >
> > of scanning
> >
> > > > > output and returning Bio::SeqFeature::Generic objects with
> > > >
> > > > added tags
> > > >
> > > > > for descriptors/sequences/file info.  I'm in the process of
> > > >
> > > > writing up
> > > >
> > > > > tests and going through biodesign to make sure everything's
> > > > > kosher, but the module itself is essentially ready-to-go.  What
> > > > > should I do next?
> > > > >
> > > > > Christopher Fields
> > > > > Postdoctoral Researcher
> > > > > Lab of Dr. Robert Switzer
> > > > > Dept of Biochemistry
> > > > > University of Illinois Urbana-Champaign
> > > > >
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > > > --
> > > > ______ _/
> >
> > _/_____________________________________________________
> >
> > > >       _/      _/
> > > >      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> > > >     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
> > > >    _/  _/  _/  SANBI, South African National
> >
> > Bioinformatics Institute
> >
> > > >   _/  _/  _/  University of Western Cape, South Africa
> > > >      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > > > ___
> > > > _/_/_/_/_/________________________________________________________
> >
> > --
> > ______ _/      _/_____________________________________________________
> >       _/      _/
> >      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
> >    _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >   _/  _/  _/  University of Western Cape, South Africa
> >      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___
> > _/_/_/_/_/________________________________________________________

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Thu Feb  9 23:51:11 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 10 Feb 2006 06:51:11 +0200
Subject: [Bioperl-l] module for finding restriction site in batch of
	sequences?
In-Reply-To: 
References: 
Message-ID: <200602100651.12028.heikki@sanbi.ac.za>


It should:

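use Bio::Restriction::Analysis;
foreach my $seq (@seqs) {    # loop over each seq; @seqs holds your sequence objects
    my $ra   = Bio::Restriction::Analysis->new(-seq => $seq);
    my @cuts = $ra->fragments('EcoRI'); # or call some other method
}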

or is it something else you are trying to do?

Yours,
	-Heikki


On Thursday 09 February 2006 22:53, Lalancette, Claudia wrote:
> Greetings,
>
>
>
> I need to find a way to look for a specific restriction enzyme site in
> hundreds of sequences.  Been looking at Bio::Restriction, but not sure
> if will work...  Any suggestions?
>
>
>
> Thanks,
>
> Claudia

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Fri Feb 10 02:06:11 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 10 Feb 2006 09:06:11 +0200
Subject: [Bioperl-l] planning sequence mutating modules
Message-ID: <200602100906.11885.heikki@sanbi.ac.za>


Ryan Golhar's mail got me thinking that we should have a simple framework for 
mutating sequences to a desired level. The model can then be extended to 
necessary complexity when needed by subclassing.

To start with, I have been planning:


Bio::SeqEvolution::EvolutionI - interface file
Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
        (defaults to Bio::PrimarySeq)
Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
Bio::SeqEvolution::EvolutionI::each_seqs($count) 
       - returns an array of $count seqs
Bio::SeqEvolution::EvolutionI::_generate_seq() 
Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
      converted to probabilities of change internally

  various methods to define the extent of divergence:
  only one to start with:
Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
   (= 100% - identity)

Bio::SeqEvolution::Factory - core class to call,
         instantiates subclasses, Bio::SeqEvolution::DNASimple for nucleotides
Bio::SeqEvolution::EvolutionI::type() - evolution model,
      defaults to Bio::SeqEvolution::DNASimple for nucleotides


Bio::SeqEvolution::DNASimple - default for nucleotides
Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
        e.g. 5 => 5:1, defaults to 1:1
        simple alternative to a scoring matrix
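
To make the intended behaviour of the simple nucleotide model concrete,
here is a rough standalone sketch in plain Perl. It is not the planned
module: the sub name mutate_dna is made up, it only handles A, C, G and T,
and reading the "5 => 5:1" setting as a transition-to-transversion ratio
is an assumption. The pam argument is treated as the percentage of
positions that end up changed, so identity comes out as 100% minus pam
(rounded to whole positions).

```perl
use strict;
use warnings;

my %transition    = (A => 'G', G => 'A', C => 'T', T => 'C');
my %transversions = (A => ['C', 'T'], G => ['C', 'T'],
                     C => ['A', 'G'], T => ['A', 'G']);

# Mutate $pam percent of the positions in $seq, each position at most once.
# $ratio is the assumed transition:transversion ratio, e.g. 5 means 5:1.
# Only the letters A, C, G and T are supported in this sketch.
sub mutate_dna {
    my ($seq, $pam, $ratio) = @_;
    my @bases = split //, uc $seq;
    my $n_mut = int(@bases * $pam / 100 + 0.5);

    # Choose distinct positions with a partial Fisher-Yates shuffle.
    my @idx = (0 .. $#bases);
    for my $i (0 .. $n_mut - 1) {
        my $j = $i + int(rand(scalar(@idx) - $i));
        @idx[$i, $j] = @idx[$j, $i];
        my $pos  = $idx[$i];
        my $base = $bases[$pos];
        if (rand() < $ratio / ($ratio + 1)) {
            $bases[$pos] = $transition{$base};
        }
        else {
            $bases[$pos] = $transversions{$base}[int(rand(2))];
        }
    }
    return join '', @bases;
}

my $original = 'ACGT' x 10;
my $mutated  = mutate_dna($original, 10, 5);   # 10% divergence, 5:1 ratio
print "$original\n$mutated\n";
```

Every chosen position is forced to change, which keeps the divergence
exact; a model driven by a scoring matrix would presumably draw each
replacement from per-base probabilities instead.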


I am soliciting usual comments and suggestions about naming and minimal 
functionality.


   -Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From Pieter.Monsieurs at esat.kuleuven.be  Fri Feb 10 03:53:43 2006
From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs)
Date: Fri, 10 Feb 2006 09:53:43 +0100
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	blast	output
In-Reply-To: 
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>	<43EBC03E.4040900@gmx.at>
	
Message-ID: <43EC5497.3050505@esat.kuleuven.be>

Hi Chris,

The parsing of the Blast output still doesn't work for me with the bug 
fix download of blast.pm.
The module keeps turning around in the while loop at line 487 looking 
for a database or query-size:

while( defined ($_) ) {
	if( /^Database:/ ) {
		$self->_pushback($_);
		last;
	}
	chomp;               
	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
		$size = $1;
		$size =~ s/,//g;
		last;
	} else {
		$q .= " $_";
		$q =~ s/ +/ /g;
		$q =~ s/^ | $//g;
	}
	$_ = $self->_readline;
}


The code keeps looking for the database information, however - as you 
mentioned - this information is given before the query line in the new 
Blast output format.
This way, all hits and hsps are stored in the query_description
($result->query_description), no hits are found and query_length is 0.
Because you already adapted the module to retrieve database information 
at another position in the module, deleting the while loop and adding 
the following lines after $_ = $self->_readline (line 486), worked fine 
for me (using blastn and blastp):

if (/Length=([\d,]+)/) {
	$size = $1;
	$size =~ s/,//g;
}
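
For anyone who wants to check the two header styles outside the module,
here is a small standalone sketch in plain Perl (no BioPerl needed). It
reuses the two patterns from the while loop quoted above; the sub name
query_length and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Return the query length found on one header line, or undef if there is none.
# Handles the older "(193 letters)" form and the newer "Length=193" form.
sub query_length {
    my ($line) = @_;
    if ($line =~ /\((-?[\d,]+)\s+letters.*\)/ || $line =~ /^Length=(-?[\d,]+)/) {
        (my $size = $1) =~ s/,//g;   # drop thousands separators
        return $size;
    }
    return;
}

for my $line ('        (193 letters)', 'Length=193', 'Length=1,234', 'Query= NP_215895') {
    my $len = query_length($line);
    printf "%-24s => %s\n", "'$line'", defined $len ? $len : 'no length';
}
```

The first three sample lines yield 193, 193 and 1234; the Query= line
yields no length, which is the case where the original loop keeps
appending lines to the query description.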


Regards,
Pieter



Chris Fields wrote:

> From 'perldoc Bio::SearchIO::blast':
>
>DESCRIPTION
>        This object encapsulated the necessary methods for generating  
>events
>        suitable for building Bio::Search objects from a BLAST report  
>file.
>        Read the Bio::SearchIO for more information about how to use  
>this.
>
>        This driver can parse:
>
>        o   NCBI produced plain text BLAST reports from blastall,  
>this also
>            includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq  
>reports.  NCBI
>            XML BLAST output is parsed with the blastxml SearchIO driver
>
>        o   WU-BLAST all reports
>
>        o   Jim Kent's BLAST-like output from his programs (BLASTZ,  
>BLAT)
>
>        o   BLAST-like output from Paracel BTK output
>
>So, it should.  Let us know if it doesn't.
>
>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>
>  
>
>>Hi Chris,
>>I'm incredibly sorry for causing so much inconvenience, yes you are
>>right, I had only to change the blast.pm file, it is working very
>>fine, thank you very much, and you are right, you have mentioned it
>>earlier either to change the file... ;)
>>
>>but I have another question: does it work with the WU-Blast output  
>>too?
>>regards
>>Hubert
>>
>>
>>Chris Fields wrote:
>>
>>    
>>
>>>Ha!  I come back from meeting and there's a billion emails!  What  
>>>have we
>>>started? ;p .  Sorry about this Jason; I know you're busy.
>>>
>>>Hubert, if you're out there, I sent you an email with an  
>>>attachment.  You
>>>said the output looks like what you were expecting.  So I think we  
>>>have two
>>>problems:
>>>
>>>1)  I haven't delved into the file scanning, but the fact that it  
>>>takes so
>>>long should tell you something's seriously wrong there.  Strip  
>>>that part out
>>>and start with a simple script, say, like the one Jason or that I  
>>>sent you;
>>>the script I used to generate that output works fine (on two OS's,  
>>>WinXP and
>>>Mac OS X).  Use it on one file at a time.  Do everything on  
>>>command line
>>>(not through Eclipse).  IDE's can be notoriously flaky about running
>>>scripts, esp. when they run debugging.
>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast  
>>>will still
>>>not work whenever the text blast output has the following header,  
>>>which
>>>comes from the new web version of BLAST:
>>>
>>>-----------------------------------------------------
>>>BLASTP 2.2.13 [Nov-27-2005]
>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>
>>>RID: 1139501210-857-165793005128.BLASTQ1
>>>
>>>
>>>Database: All non-redundant GenBank CDS
>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>          3,292,813 sequences; 1,128,164,434 total letters
>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>tuberculosis H37Rv].
>>>Length=193
>>>.......
>>>-----------------------------------------------------
>>>
>>>It will work if the text output has the following header (or is an  
>>>older
>>>version of BLAST):
>>>
>>>-----------------------------------------------------
>>>BLASTP 2.2.12 [Aug-07-2005]
>>>
>>>
>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>protein database search
>>>programs",  Nucleic Acids Res. 25:3389-3402.
>>>
>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>tuberculosis H37Rv].
>>>        (193 letters)
>>>
>>>Database: All non-redundant GenBank CDS
>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>          2,895,325 sequences; 997,103,285 total letters
>>>-----------------------------------------------------
>>>You have the former (2.2.13) version.  I know b/c I have your  
>>>BLAST files.
>>>Therefore, even bioperl-1.5.1 will not work!
>>>
>>>If you want the really gory details on why this is a problem, look  
>>>here:
>>>
>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>
>>>So, any text output with the above header will not work; it will  
>>>either hang
>>>or end abruptly (depending on OS, perl version, memory,  
>>>patience).  If you
>>>look in the above, I have added a preliminary fix for this.  I'll  
>>>reiterate
>>>for the billionth time, it hasn't been committed yet, so don't  
>>>kill me if
>>>it blows your computer up ;>
>>>Here's the direct link:
>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>>This is a modified version of Bio::SearchIO::blast.pm (it says  
>>>it's version
>>>1.90, but it's lying, I didn't change the version, only the regex;  
>>>sorry
>>>Jason).  From what you've been posting it doesn't sound like  
>>>you've tried
>>>this, and I believe I've suggested this fix before.
>>>
>>>Replace the one in your Bio/SearchIO directory (which looks like
>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your  
>>>prev.
>>>message) with this file.  Make sure the filename stays the same  
>>>(blast.pm).
>>>
>>>Run everything again, one file at a time.  Make sure you use  
>>>Jason's script
>>>as well as the one I sent you.  Do NOT rely on running through  
>>>multiple
>>>files yet.  Fix one bug at a time.  And heed Joel's words about  
>>>file checks.
>>>
>>>
>>>Here's a small chunk of output from one of your blast files using the
>>>modified script I sent you:
>>>
>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>>>Query:   1  RWKWKRKK  8
>>>Seq:     542  RWAWRRKK  549
>>>
>>>Look familiar?
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>      
>>>
>>>>-----Original Message-----
>>>>From: Roger Hall [mailto:rahall2 at ualr.edu]
>>>>Sent: Thursday, February 09, 2006 3:24 PM
>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>parsing Blast output
>>>>
>>>>In other words, yes, I'm on the wrong trail. :}
>>>>
>>>>Sorry - I'll look at the output issue this evening (or realize  
>>>>that Chris already solved the issue).  ;}
>>>>
>>>>Thanks!
>>>>
>>>>Roger
>>>>
>>>>-----Original Message-----
>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert  
>>>>Prielinger
>>>>Sent: Thursday, February 09, 2006 2:14 PM
>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason  
>>>>Stajich
>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>parsing Blast output
>>>>
>>>>dear roger,
>>>>this error message I got, when I tried to parse Blast output  
>>>>(version
>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have  
>>>>a lot of Blast output files with version 2.2.13 and for that I  
>>>>don't get any error message.....it just doesn't work
>>>>
>>>>Hubert
>>>>
>>>>
>>>>
>>>>Roger Hall wrote:
>>>>
>>>>
>>>>        
>>>>
>>>>>Guys - I'm looking at the error message:
>>>>>
>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>STACK toplevel
>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>
>>>>>This is my line of thought:
>>>>>1. "no data for midline $_" is a unique message generated by blast.pm
>>>>>in one location only at the point of a. reading three lines b.
>>>>>dropping lines with spaces only c. identifying the Query, Midline, and
>>>>>Match lines (0 <= $i < 3)
>>>>>2. There is a regexp match that fails in order to reach that error message
>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>>>4. It does anyway
>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>>>>reports
>>>>>
>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional
>>>>>file, assuming that I didn't have it.
>>>>>
>>>>>My next thought is to write a quick script to test perl behavior  
>>>>>on "Fedora Core 9".
>>>>>
>>>>>Thoughts?
>>>>>
>>>>>Did I misread the issue entirely? :}
>>>>>
>>>>>Roger
>>>>>
>>>>>
>>>>>-----Original Message-----
>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>>>Cc: bioperl-l at bioperl.org
>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>parsing Blast output
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>>>To: Hubert Prielinger
>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>parsing Blast output
>>>>>>
>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>hi chris,
>>>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still working,
>>>>>>>do you have any other idea, the problem I have is that I have to parse
>>>>>>>a lot of textfiles....
>>>>>>>or shall I look for another option to parse those files...
>>>>>>>
>>>>>>>regards
>>>>>>>Hubert
>>>>>>
>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>>>2.2.13 reports but unless you post your blast report we can't really
>>>>>>determine the problem.
>>>>>>
>>>>>>If you are still getting the same error like this I am not convinced
>>>>>>you have upgraded to 1.5.1 which includes a fix for the fact that NCBI
>>>>>>changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>>>prefixes.  We fixed this as soon as it was apparent sometime in
>>>>>>September.
>>>>>>
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>STACK toplevel
>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>
>>>>>>If you are just getting no results but also no warnings wrt parsing,
>>>>>>are you sure your logic is correct?
>>>>>>
>>>>>>If you remove your filters do you see all the HSPS?
>>>>>>
>>>>>>
>>>>>>while (my $result = $search->next_result) {
>>>>>>    print $result->query_name, "\n";
>>>>>>    # iterate over each hit on the query sequence
>>>>>>    while (my $hit = $result->next_hit) {
>>>>>>        print $hit->name, "\n";
>>>>>>        # iterate over each HSP in the hit
>>>>>>        while (my $hsp = $hit->next_hsp) {
>>>>>>            print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>>>                $hsp->hit_string, "\n";
>>>>>>        }
>>>>>>    }
>>>>>>}
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a
>>>>>similar script to the above.  I removed the file parsing logic and it
>>>>>seemed to work just fine.  It may very well be a logic issue or that he
>>>>>hasn't installed the latest fix.
>>>>>
>>>>>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>>>>even though the returned output was from nr, the top of the
>>>>>blast output showed that it was v2.2.12:
>>>>>
>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>
>>>>>I double-checked my local version and it's definitely v.2.2.13:
>>>>>-------------------------------------
>>>>>C:\Perl\Scripts>blastcl3 -
>>>>>
>>>>>blastcl3 2.2.13   arguments:...
>>>>>-------------------------------------
>>>>>
>>>>>If you use RemoteBlast using the same settings, the version in  
>>>>>the header looks like this:
>>>>>
>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>
>>>>>I'm wondering if all the blast executables (blast and netblast)
>>>>>from NCBI have text output like v.2.2.12, while the wwwblast outputs a new
>>>>>format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>>>
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>>>To clarify some stuff -
>>>>>>Chris I don't necessarily think the XML is best way forward for BLAST
>>>>>>reports generated locally, it isn't as detailed as the Text format and
>>>>>>it is what most people expect to be able to scroll through and parse
>>>>>>-- it is also harder for the format to change dramatically if you have
>>>>>>a static binary on your machine =).  I think for remoteblast the XML
>>>>>>format should be the way forward but I expect Bioperl to
>>>>>>maintain support of any plain text BLAST report format that
>>>>>>people use on a regular basis.
>>>>>>
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>Does XML lack some specific info that text output has?  Didn't know that.
>>>>>I believe that XML should be default in RemoteBlast since it will
>>>>>not break, but I agree with you about text output.  I also agree
>>>>>that it will need somebody to maintain it constantly, much like
>>>>>RemoteBlast.
>>>>>
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>>>-jason
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>Chris Fields wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>My guess is you're running into text parsing problems in  
>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>>>(1.5.1) or
>>>>>>>>bioperl-live (CVS), then see the bug below.
>>>>>>>>
>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>
>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>>>the last problem (more recent, not related to the first) has
>>>>>>>>been fixed but hasn't been committed to bioperl-live yet.
>>>>>>>>The fixed SearchIO::blast is available in the link above, but
>>>>>>>>realize it hasn't been committed yet and may change.
>>>>>>>>
>>>>>>>>Christopher Fields
>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry  
>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                
>>>>>>>>
>>>>>>>>>-----Original Message-----
>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>>>Prielinger
>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>>>output
>>>>>>>>>
>>>>>>>>>Hi,
>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with  
>>>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>>>
>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>STACK toplevel
>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>>
>>>>>>>>>is that a bug......
>>>>>>>>>
>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't
>>>>>>>>>get anything.....
>>>>>>>>>I'm using bioperl 1.4
>>>>>>>>>
>>>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>>>I had installed
>>>>>>>>>
>>>>>>>>>thanks in advance
>>>>>>>>>
>>>>>>>>>Hubert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>_______________________________________________
>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>
>>>>>>--
>>>>>>Jason Stajich
>>>>>>Duke University
>>>>>>http://www.duke.edu/~jes12
>>>>>>
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>Christopher Fields
>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>Dept. of Biochemistry
>>>>>University of Illinois Urbana-Champaign
>>>>>
>
>Christopher Fields
>Postdoctoral Researcher
>Lab of Dr. Robert Switzer
>Dept of Biochemistry
>University of Illinois Urbana-Champaign
>


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



From Pieter.Monsieurs at esat.kuleuven.be  Fri Feb 10 04:44:10 2006
From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs)
Date: Fri, 10 Feb 2006 10:44:10 +0100
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
	parsing	blast	output
In-Reply-To: <43EC5497.3050505@esat.kuleuven.be>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>	<43EBC03E.4040900@gmx.at>	
	<43EC5497.3050505@esat.kuleuven.be>
Message-ID: <43EC606A.20003@esat.kuleuven.be>

Sorry for the disturbance. It now works correctly with Chris's bug fix. 
Thanx,
Pieter

Pieter Monsieurs wrote:

>Hi Chris,
>
>The parsing of the Blast output still doesn't work for me with the bug 
>fix download of blast.pm.
>The module keeps turning around in the while loop at line 487 looking 
>for a database or query-size:
>
>while( defined ($_) ) {
>	if( /^Database:/ ) {
>		$self->_pushback($_);
>		last;
>	}
>	chomp;               
>	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
>		$size = $1;
>		$size =~ s/,//g;
>		last;
>	} else {
>		$q .= " $_";
>		$q =~ s/ +/ /g;
>		$q =~ s/^ | $//g;
>	}
>	$_ = $self->_readline;
>}
>
>
>The code keeps looking for the database information; however, as you 
>mentioned, in the new Blast output format this information is given 
>before the query line.
>As a result, the text of all hits and HSPs ends up in the query description 
>($hit->query_description), no hits are found, and query_length is 0.
>Because you already adapted the module to retrieve the database information 
>at another position in the module, deleting the while loop and adding 
>the following lines after $_ = $self->_readline (line 486) worked fine 
>for me (using blastn and blastp):
>
>if (/Length=([\d,]+)/) {
>	$size = $1;
>	$size =~ s/,//g;
>}
>
>
>Regards,
>Pieter
>
>
>
>Chris Fields wrote:
>
>  
>
>>From 'perldoc Bio::SearchIO::blast':
>>
>>DESCRIPTION
>>       This object encapsulated the necessary methods for generating  
>>events
>>       suitable for building Bio::Search objects from a BLAST report  
>>file.
>>       Read the Bio::SearchIO for more information about how to use  
>>this.
>>
>>       This driver can parse:
>>
>>       o   NCBI produced plain text BLAST reports from blastall,  
>>this also
>>           includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq  
>>reports.  NCBI
>>           XML BLAST output is parsed with the blastxml SearchIO driver
>>
>>       o   WU-BLAST all reports
>>
>>       o   Jim Kent's BLAST-like output from his programs (BLASTZ,  
>>BLAT)
>>
>>       o   BLAST-like output from Paracel BTK output
>>
>>So, it should.  Let us know if it doesn't.
>>
>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>>
>> 
>>
>>    
>>
>>>Hi Chris,
>>>I'm incredibly sorry for causing so much inconvenience, yes you are  
>>>right, I had only to change the blast.pm file, it is working very  
>>>fine, thank you very much, and you are right, you have mentioned it  
>>>earlier either to change the file... ;)
>>>
>>>but I have another question: does it work with the WU-Blast output  
>>>too?
>>>regards
>>>Hubert
>>>
>>>
>>>Chris Fields wrote:
>>>
>>>   
>>>
>>>      
>>>
>>>>Ha!  I come back from meeting and there's a billion emails!  What  
>>>>have we
>>>>started? ;p .  Sorry about this Jason; I know you're busy.
>>>>
>>>>Hubert, if you're out there, I sent you an email with an  
>>>>attachment.  You
>>>>said the output looks like what you were expecting.  So I think we  
>>>>have two
>>>>problems:
>>>>
>>>>1)  I haven't delved into the file scanning, but the fact that it  
>>>>takes so
>>>>long should tell you something's seriously wrong there.  Strip  
>>>>that part out
>>>>and start with a simple script, say, like the one Jason or that I  
>>>>sent you;
>>>>the script I used to generate that output works fine (on two OS's,  
>>>>WinXP and
>>>>Mac OS X).  Use it on one file at a time.  Do everything on  
>>>>command line
>>>>(not through Eclipse).  IDE's can be notoriously flaky about running
>>>>scripts, esp. when they run debugging.
>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast  
>>>>will still
>>>>not work whenever the text blast output has the following header,  
>>>>which
>>>>comes from the new web version of BLAST:
>>>>
>>>>-----------------------------------------------------
>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>>
>>>>RID: 1139501210-857-165793005128.BLASTQ1
>>>>
>>>>
>>>>Database: All non-redundant GenBank CDS
>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>         3,292,813 sequences; 1,128,164,434 total letters
>>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>tuberculosis H37Rv].
>>>>Length=193
>>>>.......
>>>>-----------------------------------------------------
>>>>
>>>>It will work if the text output has the following header (or is an  
>>>>older
>>>>version of BLAST):
>>>>
>>>>-----------------------------------------------------
>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>
>>>>
>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>protein database search
>>>>programs",  Nucleic Acids Res. 25:3389-3402.
>>>>
>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>tuberculosis H37Rv].
>>>>       (193 letters)
>>>>
>>>>Database: All non-redundant GenBank CDS
>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>         2,895,325 sequences; 997,103,285 total letters
>>>>-----------------------------------------------------
>>>>You have the former (2.2.13) version.  I know b/c I have your  
>>>>BLAST files.
>>>>Therefore, even bioperl-1.5.1 will not work!
>>>>
>>>>If you want the really gory details on why this is a problem, look  
>>>>here:
>>>>
>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>
>>>>So, any text output with the above header will not work; it will  
>>>>either hang
>>>>or end abruptly (depending on OS, perl version, memory,  
>>>>patience).  If you
>>>>look in the above, I have added a preliminary fix for this.  I'll  
>>>>reiterate
>>>>for the billionth time, it hasn't been committed yet, so don't  
>>>>kill me if
>>>>it blows your computer up ;>
>>>>Here's the direct link:
>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>>>This is a modified version of Bio::SearchIO::blast.pm (it says  
>>>>it's version
>>>>1.90, but it's lying, I didn't change the version, only the regex;  
>>>>sorry
>>>>Jason).  From what you've been posting it doesn't sound like  
>>>>you've tried
>>>>this, and I believe I've suggested this fix before.
>>>>
>>>>Replace the one in your Bio/SearchIO directory (which looks like
>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your  
>>>>prev.
>>>>message) with this file.  Make sure the filename stays the same  
>>>>(blast.pm).
>>>>
>>>>Run everything again, one file at a time.  Make sure you use  
>>>>Jason's script
>>>>as well as the one I sent you.  Do NOT rely on running through  
>>>>multiple
>>>>files yet.  Fix one bug at a time.  And heed Joel's words about  
>>>>file checks.
>>>>
>>>>
>>>>Here's a small chunk of output from one of your blast files using the
>>>>modified script I sent you:
>>>>
>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>>>>Query:   1  RWKWKRKK  8
>>>>Seq:     542  RWAWRRKK  549
>>>>
>>>>Look familiar?
>>>>
>>>>Christopher Fields
>>>>Postdoctoral Researcher - Switzer Lab
>>>>Dept. of Biochemistry
>>>>University of Illinois Urbana-Champaign
>>>>
>>>>     
>>>>
>>>>        
>>>>
>>>>>-----Original Message-----
>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu]
>>>>>Sent: Thursday, February 09, 2006 3:24 PM
>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>parsing Blast output
>>>>>
>>>>>In other words, yes, I'm on the wrong trail. :}
>>>>>
>>>>>Sorry - I'll look at the output issue this evening (or realize  
>>>>>that Chris already solved the issue).  ;}
>>>>>
>>>>>Thanks!
>>>>>
>>>>>Roger
>>>>>
>>>>>-----Original Message-----
>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert  
>>>>>Prielinger
>>>>>Sent: Thursday, February 09, 2006 2:14 PM
>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason  
>>>>>Stajich
>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>parsing Blast output
>>>>>
>>>>>dear roger,
>>>>>this error message I got, when I tried to parse Blast output  
>>>>>(version
>>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have  
>>>>>a lot of Blast output files with version 2.2.13 and for that I  
>>>>>don't get any error message.....it just doesn't work
>>>>>
>>>>>Hubert
>>>>>
>>>>>
>>>>>
>>>>>Roger Hall wrote:
>>>>>
>>>>>
>>>>>       
>>>>>
>>>>>          
>>>>>
>>>>>>Guys - I'm looking at the error message:
>>>>>>
>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>STACK toplevel
>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>
>>>>>>This is my line of thought:
>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm
>>>>>>in one location only at the point of a. reading three lines b.
>>>>>>dropping lines with spaces only c. identifying the Query, Midline, and
>>>>>>Match lines (0 <= $i < 3)
>>>>>>2. There is a regexp match that fails in order to reach that error message
>>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>>>>4. It does anyway
>>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>>>>>reports
>>>>>>
>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional
>>>>>>file, assuming that I didn't have it.
>>>>>>
>>>>>>My next thought is to write a quick script to test perl behavior  
>>>>>>on "Fedora Core 9".
>>>>>>
>>>>>>Thoughts?
>>>>>>
>>>>>>Did I misread the issue entirely? :}
>>>>>>
>>>>>>Roger
>>>>>>
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>>>>Cc: bioperl-l at bioperl.org
>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>parsing Blast output
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>         
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>>>>To: Hubert Prielinger
>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>>parsing Blast output
>>>>>>>
>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>hi chris,
>>>>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still working,
>>>>>>>>do you have any other idea, the problem I have is that I have to parse
>>>>>>>>a lot of textfiles....
>>>>>>>>or shall I look for another option to parse those files...
>>>>>>>>
>>>>>>>>regards
>>>>>>>>Hubert
>>>>>>>
>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>>>>2.2.13 reports but unless you post your blast report we can't really
>>>>>>>determine the problem.
>>>>>>>
>>>>>>>If you are still getting the same error like this I am not convinced
>>>>>>>you have upgraded to 1.5.1 which includes a fix for the fact that NCBI
>>>>>>>changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>>>>prefixes.  We fixed this as soon as it was apparent sometime in
>>>>>>>September.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
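The format change described here (the ':' dropped from the Query/Sbjct prefixes) can be illustrated with a pattern that accepts both styles. This is only a sketch: `parse_alignment_line` is a hypothetical helper written for this note, not the regex that Bio::SearchIO::blast actually uses.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one alignment line from a text BLAST report.
# Accepts both the older "Query: 1 ..." and the newer "Query  1 ..." prefix.
sub parse_alignment_line {
    my ($line) = @_;
    return unless defined $line;
    return unless $line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;
    return { type => $1, start => $2, seq => $3, end => $4 };
}
```

A parser that insists on the colon rejects the newer lines, which would be consistent with the "no data for midline" message quoted in this thread.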
>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>STACK toplevel
>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>>If you are just getting no results but also no warnings wrt parsing,
>>>>>>>are you sure your logic is correct?
>>>>>>>
>>>>>>>If you remove your filters do you see all the HSPS?
>>>>>>>
>>>>>>>while (my $result = $search->next_result) {
>>>>>>>  print $result->query_name, "\n";
>>>>>>>  # iterate over each hit on the query sequence
>>>>>>>  while (my $hit = $result->next_hit) {
>>>>>>>    print $hit->name, "\n";
>>>>>>>    # iterate over each HSP in the hit
>>>>>>>    while (my $hsp = $hit->next_hsp) {
>>>>>>>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>>>>            $hsp->hit_string, "\n";
>>>>>>>    }
>>>>>>>  }
>>>>>>>}
>>>>>>>
>>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a
>>>>>>similar script to the above.  I removed the file parsing logic and it
>>>>>>seemed to work just fine.  It may very well be a logic issue or that he
>>>>>>hasn't installed the latest fix.
>>>>>>
>>>>>>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>>>>>even though the returned output was from nr, the top of the blast
>>>>>>output showed that it was v2.2.12:
>>>>>>
>>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>>
>>>>>>I double-checked my local version and it's definitely v.2.2.13:
>>>>>>-------------------------------------
>>>>>>C:\Perl\Scripts>blastcl3 -
>>>>>>
>>>>>>blastcl3 2.2.13   arguments:...
>>>>>>-------------------------------------
>>>>>>
>>>>>>If you use RemoteBlast using the same settings, the version in the
>>>>>>header looks like this:
>>>>>>
>>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>>
>>>>>>I'm wondering if all the blast executables (blast and netblast) from
>>>>>>NCBI have text output like v.2.2.12, while the wwwblast outputs a new
>>>>>>format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>>>>
>>>>>>>To clarify some stuff -
>>>>>>>Chris I don't necessarily think the XML is best way forward for BLAST
>>>>>>>reports generated locally, it isn't as detailed as the Text format and
>>>>>>>it is what most people expect to be able to scroll through and parse
>>>>>>>-- it is also harder for the format to change dramatically if you have
>>>>>>>a static binary on your machine =).  I think for remoteblast the XML
>>>>>>>format should be the way forward but I expect Bioperl to maintain
>>>>>>>support of any plain text BLAST report format that people use on a
>>>>>>>regular basis.
>>>>>>>
>>>>>>Does XML lack some specific info that text output has?  Didn't know
>>>>>>that.  I believe that XML should be default in RemoteBlast since it
>>>>>>will not break, but I agree with you about text output.  I also agree
>>>>>>that it will need somebody to maintain it constantly, much like
>>>>>>RemoteBlast.
>>>>>>
>>>>>>>-jason
>>>>>>>
>>>>>>>>Chris Fields wrote:
>>>>>>>>
>>>>>>>>>My guess is you're running into text parsing problems in  
>>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>>>>(1.5.1) or
>>>>>>>>>bioperl-live (CVS), then see the bug below.
>>>>>>>>>
>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>
>>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1;
>>>>>>>>>the last problem (more recent, not related to the first) has been
>>>>>>>>>fixed but hasn't been committed to bioperl-live yet.  The fixed
>>>>>>>>>SearchIO::blast is available in the link above, but realize it
>>>>>>>>>hasn't been committed yet and may change.
>>>>>>>>>
>>>>>>>>>Christopher Fields
>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry  
>>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>               
>>>>>>>>>
>>>>>>>>>                  
>>>>>>>>>
>>>>>>>>>>-----Original Message-----
>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>>>>Prielinger
>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>>>>output
>>>>>>>>>>
>>>>>>>>>>Hi,
>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with  
>>>>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>>>>
>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>STACK toplevel
>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>>>
>>>>>>>>>>Is that a bug?
>>>>>>>>>>
>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't get
>>>>>>>>>>anything.....
>>>>>>>>>>I'm using bioperl 1.4
>>>>>>>>>>
>>>>>>>>>>Before I installed bioperl 1.4, parsing Blast Output (version
>>>>>>>>>>2.2.12) worked fine, but I don't remember which bioperl version
>>>>>>>>>>I had installed
>>>>>>>>>>
>>>>>>>>>>thanks in advance
>>>>>>>>>>
>>>>>>>>>>Hubert
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>_______________________________________________
>>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>--
>>>>>>>Jason Stajich
>>>>>>>Duke University
>>>>>>>http://www.duke.edu/~jes12
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>Christopher Fields
>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>Dept. of Biochemistry
>>>>>>University of Illinois Urbana-Champaign
>>Christopher Fields
>>Postdoctoral Researcher
>>Lab of Dr. Robert Switzer
>>Dept of Biochemistry
>>University of Illinois Urbana-Champaign
>
>
>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>  
>


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



From andrej.kastrin at guest.arnes.si  Fri Feb 10 09:28:19 2006
From: andrej.kastrin at guest.arnes.si (Andrej Kastrin)
Date: Fri, 10 Feb 2006 15:28:19 +0100
Subject: [Bioperl-l] Medline to XML
Message-ID: <43ECA303.8090904@guest.arnes.si>

Dear users,

my problem is not directly related to this list, but I hope you can help 
me. Is there any tool to convert a standard Medline record to XML format? 
I know there is a built-in function (med2xml) in Pubmed, but I'm looking 
for some independent perl script.

Thanks in advance for any suggestions or pointers.

Cheers, Andrej


From cjfields at uiuc.edu  Fri Feb 10 12:01:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 10 Feb 2006 11:01:27 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: <001801c62e63$a4a71090$15327e82@pyrimidine>

I don't think there's anything like this in Bioperl, and I'm unfamiliar with
the naming scheme you're using.  If you're searching for specific miRNA's, a
good resource looks like the miRNA database, which seems to be updated
regularly (http://microrna.sanger.ac.uk/sequences/) and uses the same system
for RNA annotation that you use (which, I'm guessing, is a standardized
annotation scheme of some sort).  I believe the database is downloadable and
searchable by name, so you could probably build a querying scheme using LWP
or HTTP::Request (if the web interface allows for this).  I know that Sean
Eddy's Rfam database (http://www.sanger.ac.uk/Software/Rfam/) also has
information on miRNA's, but it's somewhat limited. 
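
For the name-handling part of the question, a small parser might look like the sketch below. It assumes a miRBase-style name layout (optional species prefix, mir/miR, a number, an optional letter, an optional locus number, an optional trailing '*'). That layout and the helper names `parse_mirna_name` and `same_mirna` are assumptions made for illustration, not an existing Bioperl API.

```perl
use strict;
use warnings;

# Hypothetical helper: split a miRNA name such as "hsa-mir-102a-1*"
# into its parts. Returns a hash reference, or nothing if the name
# does not fit the assumed layout.
sub parse_mirna_name {
    my ($name) = @_;
    return unless defined $name;
    my ($species, $kind, $number, $letter, $locus, $star) =
        $name =~ /^(?:([a-z]{3,4})-)?(mir|miR)-(\d+)([a-z]?)(?:-(\d+))?(\*?)$/;
    return unless defined $kind;
    return {
        species => $species,            # undef when there is no prefix
        kind    => $kind,               # "mir" or "miR"
        number  => $number,
        letter  => $letter,             # '' when absent
        locus   => $locus,              # undef when absent
        star    => $star eq '*' ? 1 : 0,
    };
}

# Hypothetical helper: do two names share the same number and letter?
# The species is compared only when both names carry one.
sub same_mirna {
    my $x = parse_mirna_name($_[0]);
    my $y = parse_mirna_name($_[1]);
    return 0 unless $x && $y;
    return 0 if defined $x->{species} && defined $y->{species}
             && $x->{species} ne $y->{species};
    return ($x->{number} == $y->{number} && $x->{letter} eq $y->{letter}) ? 1 : 0;
}
```

With this, `same_mirna('mir-102a*', 'hsa-mir-102a-1*')` would treat the two names from the example in this thread as the same miRNA, which is the kind of lookup key a database query could then be built on.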


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> barry.m.dancis at gsk.com
> Sent: Wednesday, February 08, 2006 3:45 PM
> To: 'bioperl-l'; bioperl-l-bounces at lists.open-bio.org
> Cc: James.R.Brown at gsk.com
> Subject: Re: [Bioperl-l] Handling miRNA's
> 
> Hi Chris--
> 
>         The problem I am solving is given a mature miRna 
> name, how do I use it to search for its pre/pri miRna and 
> vice versa. For example, how to go from mir-102a* to 
> hsa-mir-102a-1*. Yes, I can write a parser for it, but I'm 
> hoping that someone else has already done it and has some 
> bells and whistles to go with it.  Below is a hierarchy chart 
> of a data structure to hold the naming information. The 
> parsing is not trivial and given data in that structure there 
> could be all kinds of neat functions that return various 
> aspects of the names.
> 
> Barry
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> "Chris Fields" 
> Sent by: bioperl-l-bounces at lists.open-bio.org
> 07-Feb-2006 17:40
>  
> To
> barry.m.dancis at gsk.com, "'bioperl-l'"  cc
> 
> Subject
> Re: [Bioperl-l] Handling miRNA's
> 
> 
> 
> 
> 
> 
> Are you talking about sequences or text output from a 
> specific program? If you are talking about sequences in a 
> particular format, then listen to Brian.  If you are talking 
> about output, then we need to know which program you're 
> using, as a parser may exist or could be built. 
> 
> There are a few modules in Bio::Tools that handle RNA (like 
> QRNA, tRNAscan-SE), so check those out first.  I'm currently 
> finishing up a Bio::Tools module for RNAMotif and have plans 
> for making an ERPIN parser.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> > barry.m.dancis at gsk.com
> > Sent: Tuesday, February 07, 2006 2:26 PM
> > To: bioperl-l; bioperl-l-bounces at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Handling miRNA's
> > 
> > It's the parser in particular that I need
> > 
> > 
> > 
> > 
> > "Brian Osborne"  Sent by: 
> > bioperl-l-bounces at lists.open-bio.org
> > 07-Feb-2006 12:05
> > 
> > To
> > barry.m.dancis at gsk.com, "bioperl-l" , 
> > bioperl-l-bounces at lists.open-bio.org
> > cc
> > 
> > Subject
> > Re: [Bioperl-l] Handling miRNA's
> > 
> > 
> > 
> > 
> > 
> > 
> > Barry,
> > 
> > If the sequence information is in one of the formats that Bioperl 
> > understands (Genbank, Swissprot flat, and so on) then the answer is 
> > yes. This assumes that the details on sequence that you mentioned are
> > found in some sequence feature section in the file. But it looks to me
> > like there's no specialized parser for miRNA sequence per se, I'll be 
> > corrected if I'm wrong.
> > 
> > Brian O.
> > 
> > 
> > On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" wrote:
> > 
> > > Hi --
> > > 
> > >         Are there any classes for manipulating miRNA's with functions
> > > such as parsing the name, storing and interlinking pri/pre/mat
> > > sequences, etc?
> > > 
> > > Thanks,
> > > 
> > > Barry
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 



From allenday at ucla.edu  Fri Feb 10 11:13:39 2006
From: allenday at ucla.edu (Allen Day)
Date: Fri, 10 Feb 2006 08:13:39 -0800 (PST)
Subject: [Bioperl-l] Medline to XML
In-Reply-To: <43ECA303.8090904@guest.arnes.si>
References: <43ECA303.8090904@guest.arnes.si>
Message-ID: 

why not just retrieve xml directly from the eutils service?
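
A minimal sketch of that suggestion, assuming the NCBI E-utilities efetch endpoint with its `db`, `id` and `retmode` parameters. The helper names `efetch_url` and `fetch_pubmed_xml` are made up for this example, the use of the core HTTP::Tiny module is a choice made here, and the current NCBI documentation should be checked for the exact URL and usage limits.

```perl
use strict;
use warnings;

# Assumed base URL of the efetch service; verify against NCBI's docs.
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Hypothetical helper: build an efetch URL for a list of PubMed IDs.
# Non-numeric IDs are silently dropped.
sub efetch_url {
    my (@pmids) = @_;
    my @clean = grep { defined && /^\d+$/ } @pmids;
    die "no numeric PubMed IDs given\n" unless @clean;
    return $EFETCH . '?db=pubmed&retmode=xml&id=' . join(',', @clean);
}

# Hypothetical helper: fetch the records as an XML string.
# HTTP::Tiny needs IO::Socket::SSL installed for https URLs.
sub fetch_pubmed_xml {
    my (@pmids) = @_;
    require HTTP::Tiny;
    my $res = HTTP::Tiny->new(timeout => 30)->get(efetch_url(@pmids));
    die "efetch failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}
```

Calling `fetch_pubmed_xml(@ids)` with a few PubMed IDs would return the records already in XML, so no Medline-to-XML conversion step is needed.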

-allen

On Fri, 10 Feb 2006, Andrej Kastrin wrote:

> Dear users,
> 
> my problem is not directly related to this list, but I hope you can help 
> me. Is there any tool to convert a standard Medline record to XML format? 
> I know there is a built-in function (med2xml) in Pubmed, but I'm looking 
> for some independent perl script.
> 
> Thanks in advance for any suggestions or pointers.
> 
> Cheers, Andrej
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From jason.stajich at duke.edu  Fri Feb 10 12:15:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 10 Feb 2006 12:15:17 -0500
Subject: [Bioperl-l] Remote BLAST support discussion
In-Reply-To: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>
References: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>
Message-ID: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu>

Paul -

The reason for suggesting a change has to do with the instability of  
the CGI interface/format of the returned data: the text format from the  
webserver is not stable and reportedly will cease to be reliably  
parseable.  Yes we can keep hacking the blast parser code to  
handle this, but the bioperl release cycle is certainly not tied to  
the NCBI blast release cycle so I find it unsatisfying to know that  
we are going to have broken code when they change the output formats  
(but not know when).

Mostly I think we need to try and support something that will  
"ALWAYS" work, so that individuals setting up webservices which rely  
on remote blast functionality can depend on it.  In theory,  
netblast/blastcl3 should always work since NCBI has to update the exe  
when they change their server setup.

In terms of the web-based queues - I think the best change we can  
make is have the XML be the preferred retrieval method.

I also see value in providing a wrapper for netblast since it should  
look an awful lot like running blast locally.

Ideally I'd like to see a more extensible system, something like (and  
please feel free to come up with better names for the modules!):

Bio::Tools::Run::Blast
  -->             StandAlone (support for both WU-BLAST and NCBI-BLAST
                  local binaries and MPI-BLAST too if simple)
  -->             RemoteNCBI (currently the RemoteBlast server)
  -->             RemoteEBISOAP (EBI has a nice SOAP interface that works
                  quite well, but may not provide all the same databases
                  as what people expect from NCBI)
  -->             RemoteNetBlast (blastcl3 or netblast local executable)
  (other things that people want)

[note: Whether or not these ideas are appealing, someone should archive  
the discussions and decisions on the wiki page so we can rely less  
on people searching the mailing archives for how a decision was  
made.  Perhaps Roger can do this sort of editing in addition to the  
planning for support of this module].
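
To make the proposed layout concrete, here is a rough sketch of one front-end class dispatching to interchangeable backends. Every package name below (`My::Blast`, `My::Blast::StandAlone`, and so on) and the `-backend` argument are placeholders invented for this example. Nothing here exists in Bioperl, and the real design could look quite different.

```perl
use strict;
use warnings;

# All package names below are placeholders for illustration only.
package My::Blast::Backend;
sub new {
    my ($class, %args) = @_;
    return bless {%args}, $class;
}
# Each concrete backend would override run() with its own submission logic.
sub run { my $self = shift; die ref($self) . " does not implement run()\n" }

package My::Blast::StandAlone;
use parent -norequire, 'My::Blast::Backend';

package My::Blast::RemoteNCBI;
use parent -norequire, 'My::Blast::Backend';

package My::Blast::RemoteNetBlast;
use parent -norequire, 'My::Blast::Backend';

package My::Blast;
my %BACKEND = (
    standalone => 'My::Blast::StandAlone',
    remotencbi => 'My::Blast::RemoteNCBI',
    netblast   => 'My::Blast::RemoteNetBlast',
);

# Factory: pick the implementation class from a -backend argument.
sub new {
    my ($class, %args) = @_;
    my $key  = lc(delete $args{-backend} || 'standalone');
    my $impl = $BACKEND{$key} or die "unknown BLAST backend '$key'\n";
    return $impl->new(%args);
}

package main;
```

The point of the pattern is that user scripts would only ever call the front end, so a netblast wrapper or an XML-based remote backend could be swapped in without changing them.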

-jason

On Feb 7, 2006, at 8:38 PM, Paul Boutros wrote:

> Hi Roger,
>
> I would definitely prefer a fully Perl-based implementation.  For
> starters, I have not been successful in compiling the Toolkit that
> contains netblast for some platforms (e.g. AIX 5.2 w/gcc 4.0).
>
> I haven't been following the discussion: is there some compelling
> reason to prefer a netblast-based system that's come up recently?  I'm
> guessing that adding a new non-perl dependency would only be done if
> there was considerable justification for this type of change, but I'm
> not clear from your message what that justification is.
>
> Paul
>
>
>
> ------------------------------
>
> Message: 12
> Date: Mon, 6 Feb 2006 20:46:44 -0600
> From: "Roger Hall" 
> Subject: [Bioperl-l] RemoteBlast users - potentially major changes -
>         please        reply
> To: 
> Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL>
> Content-Type: text/plain;        charset="us-ascii"
>
> To everyone who uses RemoteBlast.pm:
>
> Would anyone object to RemoteBlast being rewritten in a way that
> requires NCBI's blastcl3 executable?
>
> Binary downloads of blastcl3 (column "netblast") are available for
> numerous platforms at: http://ncbi.nih.gov/BLAST/download.shtml
>
> Does anyone require or desire a "pure perl" implementation? If so,
> please explain the advantage you see with such an implementation.
>
> Thanks!
>
>
> Roger Hall
>
> Technical Director
>
> MidSouth Bioinformatics Center
>
> University of Arkansas at Little Rock
>
> (501) 569-8074
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From hubert.prielinger at gmx.at  Fri Feb 10 11:26:47 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 10 Feb 2006 10:26:47 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
	parsing	blast	output
In-Reply-To: <43EC606A.20003@esat.kuleuven.be>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>	<43EBC03E.4040900@gmx.at>	
	<43EC5497.3050505@esat.kuleuven.be>
	<43EC606A.20003@esat.kuleuven.be>
Message-ID: <43ECBEC7.7040506@gmx.at>

Hi,
I'm sorry for disturbing once more. Yesterday the script was working, 
today it isn't working at all, but I didn't change anything, I get the 
following error message:

------------- EXCEPTION  -------------
MSG: Could not open comp80swiss2114.txt: No such file or directory
STACK Bio::Root::IO::_initialize_io 
/usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167
STACK toplevel ./Blast.pl:14

--------------------------------------

The file exists, and I fixed the bug yesterday.
Thanks for any help.

Hubert
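
A "No such file or directory" for a file that does exist often means the script is being run from a different working directory than before (the stack trace here shows ./Blast.pl rather than the Eclipse workspace path). That is only a guess about this case. The sketch below shows one way to check a report path before handing it to Bio::SearchIO; `check_report_path` is a hypothetical helper.

```perl
use strict;
use warnings;
use Cwd ();
use File::Spec;

# Hypothetical helper: resolve a report path against the current
# directory and say why it cannot be used. Returns (1, $absolute_path)
# on success or (0, $reason) on failure.
sub check_report_path {
    my ($path) = @_;
    my $abs = File::Spec->rel2abs($path);
    return (0, "not found: $abs (cwd is " . Cwd::getcwd() . ")") unless -e $abs;
    return (0, "not a plain file: $abs") unless -f $abs;
    return (0, "not readable: $abs") unless -r $abs;
    return (1, $abs);
}
```

Printing the reason on failure shows which directory the relative name was resolved against, which usually makes this kind of error obvious.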




Pieter Monsieurs wrote:

> Sorry for disturbing. It now works correctly with the bug fix of Chris. 
> Thanx,
> Pieter
>
> Pieter Monsieurs wrote:
>
>>Hi Chris,
>>
>>The parsing of the Blast output still doesn't work for me with the bug 
>>fix download of blast.pm.
>>The module keeps turning around in the while loop at line 487 looking 
>>for a database or query-size:
>>
>>while( defined ($_) ) {
>>	if( /^Database:/ ) {
>>		$self->_pushback($_);
>>		last;
>>	}
>>	chomp;               
>>	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
>>		$size = $1;
>>		$size =~ s/,//g;
>>		last;
>>	} else {
>>		$q .= " $_";
>>		$q =~ s/ +/ /g;
>>		$q =~ s/^ | $//g;
>>	}
>>	$_ = $self->_readline;
>>}
>>
>>
>>The code keeps looking for the database information, however - as you 
>>mentioned - this information is given before the query line in the new 
>>Blast output format.
>>This way, all hits and hsps are stored in the query_description 
>>($hit->query_description), no hits are found and query_length is 0.
>>Because you already adapted the module to retrieve database information 
>>at another position in the module, deleting the while loop and adding 
>>the following lines after $_ = $self->_readline (line 486), worked fine 
>>for me (using blastn and blastp):
>>
>>if (/Length=([\d,]+)/) {
>>	$size = $1;
>>	$size =~ s/,//g;
>>}
>>
>>
>>Regards,
>>Pieter
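
The two query-length styles discussed above ("(193 letters)" in the older header, "Length=193" in the newer one) can be handled with one small routine. This is a standalone sketch built from the patterns quoted in the message, with a hypothetical function name `query_length`. It is not the code in blast.pm.

```perl
use strict;
use warnings;

# Hypothetical helper: return the query length found in a list of
# header lines, accepting either "(N letters)" or "Length=N".
# Thousands separators (commas) are removed. Returns nothing if no
# length line is present.
sub query_length {
    my (@lines) = @_;
    for my $line (@lines) {
        if (   $line =~ /\((\-?[\d,]+)\s+letters.*\)/
            || $line =~ /^Length=(\-?[\d,]+)/ )
        {
            (my $size = $1) =~ s/,//g;
            return $size;
        }
    }
    return;
}
```

Because it stops at the first length line and never waits for a "Database:" line, it does not depend on whether the database block comes before or after the query.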
>>
>>
>>
>>Chris Fields wrote:
>>
>>  
>>
>>>From 'perldoc Bio::SearchIO::blast':
>>>
>>>DESCRIPTION
>>>       This object encapsulated the necessary methods for generating events
>>>       suitable for building Bio::Search objects from a BLAST report file.
>>>       Read the Bio::SearchIO for more information about how to use this.
>>>
>>>       This driver can parse:
>>>
>>>       o   NCBI produced plain text BLAST reports from blastall, this also
>>>           includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.  NCBI
>>>           XML BLAST output is parsed with the blastxml SearchIO driver
>>>
>>>       o   WU-BLAST all reports
>>>
>>>       o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
>>>
>>>       o   BLAST-like output from Paracel BTK output
>>>
>>>So, it should.  Let us know if it doesn't.
>>>
>>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>>>
>>> 
>>>
>>>    
>>>
>>>>Hi Chris,
>>>>I'm incredibly sorry for causing so much inconvenience. Yes, you are
>>>>right, I only had to change the blast.pm file; it is working very well
>>>>now, thank you very much. And you are right, you mentioned earlier
>>>>that I should change the file... ;)
>>>>
>>>>but I have another question: does it work with the WU-Blast output too?
>>>>regards
>>>>Hubert
>>>>
>>>>
>>>>Chris Fields wrote:
>>>>
>>>>   
>>>>
>>>>      
>>>>
>>>>>Ha!  I come back from meeting and there's a billion emails!  What have
>>>>>we started? ;p .  Sorry about this Jason; I know you're busy.
>>>>>
>>>>>Hubert, if you're out there, I sent you an email with an attachment.
>>>>>You said the output looks like what you were expecting.  So I think we
>>>>>have two problems:
>>>>>
>>>>>1)  I haven't delved into the file scanning, but the fact that it takes
>>>>>so long should tell you something's seriously wrong there.  Strip that
>>>>>part out and start with a simple script, say, like the one Jason or I
>>>>>sent you; the script I used to generate that output works fine (on two
>>>>>OS's, WinXP and Mac OS X).  Use it on one file at a time.  Do
>>>>>everything on command line (not through Eclipse).  IDE's can be
>>>>>notoriously flaky about running scripts, esp. when they run debugging.
>>>>>
>>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will
>>>>>still not work whenever the text blast output has the following
>>>>>header, which comes from the new web version of BLAST:
>>>>>
>>>>>-----------------------------------------------------
>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>>>
>>>>>RID: 1139501210-857-165793005128.BLASTQ1
>>>>>
>>>>>
>>>>>Database: All non-redundant GenBank CDS
>>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>>         3,292,813 sequences; 1,128,164,434 total letters
>>>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>>tuberculosis H37Rv].
>>>>>Length=193
>>>>>.......
>>>>>-----------------------------------------------------
>>>>>
>>>>>It will work if the text output has the following header (or is an
>>>>>older version of BLAST):
>>>>>
>>>>>-----------------------------------------------------
>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>
>>>>>
>>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>>protein database search
>>>>>programs",  Nucleic Acids Res. 25:3389-3402.
>>>>>
>>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>>tuberculosis H37Rv].
>>>>>       (193 letters)
>>>>>
>>>>>Database: All non-redundant GenBank CDS
>>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>>         2,895,325 sequences; 997,103,285 total letters
>>>>>-----------------------------------------------------
>>>>>You have the former (2.2.13) version.  I know b/c I have your BLAST
>>>>>files.  Therefore, even bioperl-1.5.1 will not work!
>>>>>
>>>>>If you want the really gory details on why this is a problem, look here:
>>>>>
>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>
>>>>>So, any text output with the 2.2.13-style header will not work; it will
>>>>>either hang or end abruptly (depending on OS, perl version, memory,
>>>>>patience).  If you look in the above, I have added a preliminary fix
>>>>>for this.  I'll reiterate for the billionth time, it hasn't been
>>>>>committed yet, so don't kill me if it blows your computer up ;>
>>>>>
>>>>>Here's the direct link:
>>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>>>>
>>>>>This is a modified version of Bio::SearchIO::blast.pm (it says it's
>>>>>version 1.90, but it's lying, I didn't change the version, only the
>>>>>regex; sorry Jason).  From what you've been posting it doesn't sound
>>>>>like you've tried this, and I believe I've suggested this fix before.
>>>>>
>>>>>Replace the one in your Bio/SearchIO directory (which looks like
>>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev.
>>>>>message) with this file.  Make sure the filename stays the same
>>>>>(blast.pm).
>>>>>
>>>>>Run everything again, one file at a time.  Make sure you use Jason's
>>>>>script as well as the one I sent you.  Do NOT rely on running through
>>>>>multiple files yet.  Fix one bug at a time.  And heed Joel's words
>>>>>about file checks.
>>>>>
>>>>>
>>>>>Here's a small chunk of output from one of your blast files using the
>>>>>modified script I sent you:
>>>>>
>>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>>>>>Query:   1  RWKWKRKK  8
>>>>>Seq:     542  RWAWRRKK  549
>>>>>
>>>>>Look familiar?
>>>>>
>>>>>Christopher Fields
>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>Dept. of Biochemistry
>>>>>University of Illinois Urbana-Champaign
>>>>>
>>>>>     
>>>>>
>>>>>        
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu]
>>>>>>Sent: Thursday, February 09, 2006 3:24 PM
>>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>>>>>>Blast output
>>>>>>
>>>>>>In other words, yes, I'm on the wrong trail. :}
>>>>>>
>>>>>>Sorry - I'll look at the output issue this evening (or realize  
>>>>>>that Chris already solved the issue).  ;}
>>>>>>
>>>>>>Thanks!
>>>>>>
>>>>>>Roger
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert  
>>>>>>Prielinger
>>>>>>Sent: Thursday, February 09, 2006 2:14 PM
>>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason  
>>>>>>Stajich
>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>parsing Blast output
>>>>>>
>>>>>>dear roger,
>>>>>>I got this error message when I tried to parse Blast output (version
>>>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have a
>>>>>>lot of Blast output files with version 2.2.13, and for those I don't
>>>>>>get any error message.....it just doesn't work
>>>>>>
>>>>>>Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>>Roger Hall wrote:
>>>>>>
>>>>>>>Guys - I'm looking at the error message:
>>>>>>>
>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>>This is my line of thought:
>>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm in
>>>>>>>   one location only at the point of a. reading three lines b. dropping
>>>>>>>   lines with spaces only c. identifying the Query, Midline, and Match
>>>>>>>   lines (0 <= $i < 3)
>>>>>>>2. There is a regexp match that fails in order to reach that error message
>>>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>>>>>4. It does anyway
>>>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>>>>>>   reports
>>>>>>>
>>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional
>>>>>>>file, assuming that I didn't have it.
>>>>>>>
>>>>>>>My next thought is to write a quick script to test perl behavior
>>>>>>>on "Fedora Core 4".
>>>>>>>
>>>>>>>Thoughts?
>>>>>>>
>>>>>>>Did I misread the issue entirely? :}
>>>>>>>
>>>>>>>Roger
>>>>>>>
>>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>>>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>>>>>Cc: bioperl-l at bioperl.org
>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>>>parsing Blast output
>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>>>>>To: Hubert Prielinger
>>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>>>parsing Blast output
>>>>>>>>
>>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>>>>
>>>>>>>>>hi chris,
>>>>>>>>>thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>>>>>>>do you have any other idea, the problem I have is that I have to parse
>>>>>>>>>a lot of textfiles....
>>>>>>>>>or shall I look for another option to parse those files...
>>>>>>>>>
>>>>>>>>>regards
>>>>>>>>>Hubert
>>>>>>>>>
>>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>>>>>2.2.13 reports but unless you post your blast report we can't really
>>>>>>>>determine the problem.
>>>>>>>>
>>>>>>>>If you are still getting the same error like this I am not convinced
>>>>>>>>you have upgraded to 1.5.1 which includes a fix for the fact that NCBI
>>>>>>>>changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>>>>>prefixes.  We fixed this as soon as it was apparent sometime in
>>>>>>>>September.
>>>>>>>>
>>>>>>>>
>>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>>STACK toplevel
>>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>
>>>>>>>>If you are just getting no results but also no warnings wrt parsing,
>>>>>>>>are you sure your logic is correct?
>>>>>>>>
>>>>>>>>If you remove your filters do you see all the HSPS?
>>>>>>>>
>>>>>>>>
>>>>>>>>while (my $result = $search->next_result) {
>>>>>>>>  print $result->query_name, "\n";
>>>>>>>>  # iterate over each hit on the query sequence
>>>>>>>>  while (my $hit = $result->next_hit) {
>>>>>>>>    print $hit->name, "\n";
>>>>>>>>    # iterate over each HSP in the hit
>>>>>>>>    while (my $hsp = $hit->next_hsp) {
>>>>>>>>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>>>>>            $hsp->hit_string, "\n";
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>>}
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a
>>>>>>>similar script to the above.  I removed the file parsing logic and it
>>>>>>>seemed to work just fine.  It may very well be a logic issue or that he
>>>>>>>hasn't installed the latest fix.
>>>>>>>
>>>>>>>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>>>>>>even though the returned output was from nr, the top of the  
>>>>>>>blast output showed that it was v2.2.12:
>>>>>>>
>>>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>>>
>>>>>>>I double-checked my local version and it's definitely v.2.2.13:
>>>>>>>-------------------------------------
>>>>>>>C:\Perl\Scripts>blastcl3 -
>>>>>>>
>>>>>>>blastcl3 2.2.13   arguments:...
>>>>>>>-------------------------------------
>>>>>>>
>>>>>>>If you use RemoteBlast using the same settings, the version in  
>>>>>>>the header looks like this:
>>>>>>>
>>>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>>>
>>>>>>>I'm wondering if all the blast executables (blast and netblast)  
>>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast outputs a
>>>>>>>new format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>            
>>>>>>>
>>>>>>>>To clarify some stuff -
>>>>>>>>Chris I don't necessarily think the XML is best way forward for BLAST
>>>>>>>>reports generated locally, it isn't as detailed as the Text format and
>>>>>>>>it is what most people expect to be able to scroll through and parse
>>>>>>>>-- it is also harder for the format to change dramatically if you have
>>>>>>>>a static binary on your machine =).  I think for remoteblast the XML
>>>>>>>>format should be the way forward but I expect Bioperl to
>>>>>>>>maintain support of any plain text BLAST report format that
>>>>>>>>people use on a regular basis.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>Does XML lack some specific info that text output has?  Didn't know that.
>>>>>>>I believe that XML should be default in RemoteBlast since it will
>>>>>>>not break, but I agree with you about text output.  I also agree
>>>>>>>that it will need somebody to maintain it constantly, much like
>>>>>>>RemoteBlast.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>            
>>>>>>>
>>>>>>>>-jason
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>>>Chris Fields wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             
>>>>>>>>>
>>>>>>>>>                
>>>>>>>>>
>>>>>>>>>>My guess is you're running into text parsing problems in  
>>>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>>>>>(1.5.1) or
>>>>>>>>>>bioperl-live (CVS), then see the bug below.
>>>>>>>>>>
>>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>>
>>>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>>>>>the last problem (more recent, not related to the first) has
>>>>>>>>>>been fixed but hasn't been committed to bioperl-live yet.
>>>>>>>>>>The fixed SearchIO::blast is available in the link above, but
>>>>>>>>>>realize it hasn't been committed yet and may change.
>>>>>>>>>>
>>>>>>>>>>Christopher Fields
>>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry  
>>>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>               
>>>>>>>>>>
>>>>>>>>>>                  
>>>>>>>>>>
>>>>>>>>>>>-----Original Message-----
>>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>>>>>Prielinger
>>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>>>>>output
>>>>>>>>>>>
>>>>>>>>>>>Hi,
>>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>>>>>
>>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>>STACK toplevel
>>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>>>>
>>>>>>>>>>>is that a bug......
>>>>>>>>>>>
>>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't
>>>>>>>>>>>get anything.....
>>>>>>>>>>>I'm using bioperl 1.4
>>>>>>>>>>>
>>>>>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>>>>>I had installed
>>>>>>>>>>>
>>>>>>>>>>>thanks in advance
>>>>>>>>>>>
>>>>>>>>>>>Hubert
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>_______________________________________________
>>>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>>
>>>>>>>>--
>>>>>>>>Jason Stajich
>>>>>>>>Duke University
>>>>>>>>http://www.duke.edu/~jes12
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>Christopher Fields
>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>Dept. of Biochemistry
>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>
>>>Christopher Fields
>>>Postdoctoral Researcher
>>>Lab of Dr. Robert Switzer
>>>Dept of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>
>>
>>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>>
>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm for more 
> information.
>



From cjfields at uiuc.edu  Fri Feb 10 12:45:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 10 Feb 2006 11:45:32 -0600
Subject: [Bioperl-l] Remote BLAST support discussion
In-Reply-To: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu>
Message-ID: <002201c62e69$ca8363d0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Jason Stajich
> Sent: Friday, February 10, 2006 11:15 AM
> To: Paul.Boutros at utoronto.ca
> Cc: BioPerl Mailing List
> Subject: [Bioperl-l] Remote BLAST support discussion
> 
> Paul -
> 
> The reason for suggesting a change has to do with the 
> instability of the CGI interface/format of the returned data, 
> the text format is not a stable format from the webserver 
> which reportedly will cease to be reliably parsed.  Yes we 
> can keep hacking the blast parser code to handle this, but 
> the bioperl release cycle is certainly not tied to the NCBI 
> blast release cycle so I find it unsatisfying to know that we 
> are going to have broken code when they change the output 
> formats (but not know when).
> 
> Mostly I think we need to try and support something that will 
> "ALWAYS" work so that individuals setting up webservices 
> can rely on remote blast functionality.  In theory,
> netblast/blastcl3 should always work since NCBI has to update 
> the exe when they change their server setup.
> 
> In terms of the web-based queues - I think the best change we 
> can make is have the XML be the preferred retrieval method.
> 
> I also see value in providing a wrapper for netblast since it 
> should look an awful lot like running blast locally.
> 
> Ideally I'd like to see a more extensible system, something 
> like (and please feel free to come up with better names for 
> the modules!):
> 
> Bio::Tools::Run::Blast
>   -->  StandAlone (support for both WU-BLAST and NCBI-BLAST
>        local binaries and MPI-BLAST too if simple)
>   -->  RemoteNCBI (currently the RemoteBlast server)
>   -->  RemoteEBISOAP (EBI has a nice SOAP interface that works
>        quite well, but may not provide all the same databases as
>        what people expect from NCBI)
>   -->  RemoteNetBlast (blastcl3 or netblast local executable)
>   (other things that people want)

Sounds good to me.  I think any wrapper for netblast could most easily be
based on StandAloneBlast; the parameters look pretty much identical, though
it'll probably need a little configuring as a quick text search through
StandAloneBlast didn't show any 'xml' tags.  Roger seemed to agree on this.
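
To make the "little configuring" concrete, here is a rough sketch of how a
wrapper might assemble a netblast command line that asks for XML.  It is only
an illustration: netblast_args() and the file names are made up, and it
assumes blastcl3 accepts the blastall-style switches -p, -d, -i, -o and -m,
with "-m 7" selecting XML output.  This is not code from StandAloneBlast.

```perl
use strict;
use warnings;

# Build an argument list for a netblast (blastcl3) run that writes XML.
# Assumption: blastcl3 takes the same -p/-d/-i/-o/-m switches as blastall,
# and "-m 7" selects XML output.
sub netblast_args {
    my (%opt) = @_;
    return (
        'blastcl3',
        '-p', $opt{program},
        '-d', $opt{database},
        '-i', $opt{infile},
        '-o', $opt{outfile},
        '-m', 7,
    );
}

my @cmd = netblast_args(
    program  => 'blastp',
    database => 'nr',
    infile   => 'query.fasta',    # hypothetical file names
    outfile  => 'query.xml',
);
print join(' ', @cmd), "\n";

# To actually run it, the list form of system() avoids shell quoting issues:
# system(@cmd) == 0 or die "blastcl3 failed: $?";
```

The resulting XML file would then presumably be read with the blastxml
SearchIO driver rather than the text parser.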
 
> [note: If these ideas are appealing or not, someone should 
> archive the discussions and discussions on the wiki page so 
> we can rely less on people searching the mailing archives for 
> how a decision was made.  Perhaps Roger can do this sort of 
> editing in addition to the planning for support of this module].
> 
> -jason
> 
> On Feb 7, 2006, at 8:38 PM, Paul Boutros wrote:
> 
> > Hi Roger,
> >
> > I would definitely prefer a fully Perl-based implementation.  For 
> > starters, I have not been successful in compiling the Toolkit that 
> > contains netblast for some platforms (e.g.
> > AIX 5.2 w/gcc 4.0).
> >
> > I haven't been following the discussion: is there some compelling 
> > reason to prefer a netblast-based system that's come up recently?  I'm
> > guessing that adding a new non-perl dependency would only be done if
> > there was considerable justification for this type of change, but I'm
> > not clear from your message what that justification is.
> >
> > Paul
> >
> >
> >
> > ------------------------------
> >
> > Message: 12
> > Date: Mon, 6 Feb 2006 20:46:44 -0600
> > From: "Roger Hall" 
> > Subject: [Bioperl-l] RemoteBlast users - potentially major changes -
> >         please        reply
> > To: 
> > Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL>
> > Content-Type: text/plain;        charset="us-ascii"
> >
> > To everyone who uses RemoteBlast.pm:
> >
> > Would anyone object to RemoteBlast being rewritten in a way that 
> > requires NCBI's blastcl3 executable?
> >
> > Binary downloads of blastcl3 (column "netblast") are available for 
> > numerous platforms at: http://ncbi.nih.gov/BLAST/download.shtml
> >
> > Does anyone require or desire a "pure perl" implementation? If so, 
> > please explain the advantage you see with such an implementation.
> >
> > Thanks!
> >
> >
> > Roger Hall
> >
> > Technical Director
> >
> > MidSouth Bioinformatics Center
> >
> > University of Arkansas at Little Rock
> >
> > (501) 569-8074
> >
> >
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  



From rahall2 at ualr.edu  Fri Feb 10 12:54:23 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Fri, 10 Feb 2006 11:54:23 -0600
Subject: [Bioperl-l] Remote BLAST support discussion
In-Reply-To: <002201c62e69$ca8363d0$15327e82@pyrimidine>
Message-ID: <002501c62e6b$0686be30$d416a790@LIBERAL>

It seems so obvious now. :}

The only issue I see is likely obvious to those of you who have maintained
this over the years - no backward compatibility, but I can live with that if
y'all can.

I will document this on the wiki as suggested and then build the RemoteNCBI
module described. After that is tested and committed, I will contact Torsten
to see if I can help with the rest.

Thanks!

Roger 

> 
> Bio::Tools::Run::Blast
>   -->  StandAlone (support for both WU-BLAST and NCBI-BLAST
>        local binaries and MPI-BLAST too if simple)
>   -->  RemoteNCBI (currently the RemoteBlast server)
>   -->  RemoteEBISOAP (EBI has a nice SOAP interface that works
>        quite well, but may not provide all the same databases as
>        what people expect from NCBI)
>   -->  RemoteNetBlast (blastcl3 or netblast local executable)
>   (other things that people want)

> Sounds good to me.  I think any wrapper for netblast could most easily be
> based on StandAloneBlast; the parameters look pretty much identical, though
> it'll probably need a little configuring as a quick text search through
> StandAloneBlast didn't show any 'xml' tags.  Roger seemed to agree on this.
 




From rahall2 at ualr.edu  Fri Feb 10 13:00:51 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Fri, 10 Feb 2006 12:00:51 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output
In-Reply-To: <43ECBEC7.7040506@gmx.at>
Message-ID: <002701c62e6b$edd845b0$d416a790@LIBERAL>

Hubert,

I got the same message when I first ran your script. The issue for me was
that "readdir(DIR)" doesn't return the full path, only the file name.

I edited your script to include:

	$file = $directory . '/' . $file;

just before the Bio::SearchIO call.
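
For anyone hitting the same thing, here is a minimal sketch of the idea.  It
is not Hubert's script: report_paths() is a made-up helper, and it uses
File::Spec->catfile instead of joining with '/' by hand.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full paths of the plain files in $directory.
# readdir() yields bare entry names, so each name is joined to $directory
# before it is tested or opened.
sub report_paths {
    my ($directory) = @_;
    opendir(my $dh, $directory) or die "Cannot open $directory: $!";
    my @paths;
    for my $name (sort readdir $dh) {
        my $path = File::Spec->catfile($directory, $name);
        next unless -f $path;    # skips '.', '..' and subdirectories
        push @paths, $path;
    }
    closedir $dh;
    return @paths;
}

# Each path could then be handed to
# Bio::SearchIO->new(-format => 'blast', -file => $path).
print "$_\n" for report_paths('.');
```

Without the join, the bare name is resolved against the current working
directory, so the script only works when it is started from inside the report
directory.  That would explain a script that ran one day and failed the next
with nothing changed.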

Roger


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
Sent: Friday, February 10, 2006 10:27 AM
To: Pieter Monsieurs; bioperl-l at bioperl.org; Chris Fields; rahall2 at ualr.edu
Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast
output

Hi,
I'm sorry for disturbing once more. Yesterday the script was working;
today it isn't working at all, although I didn't change anything. I get the
following error message:

------------- EXCEPTION  -------------
MSG: Could not open comp80swiss2114.txt: No such file or directory
STACK Bio::Root::IO::_initialize_io 
/usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167
STACK toplevel ./Blast.pl:14

--------------------------------------

The file exists, and I applied the bug fix yesterday.
Thanks for the help.

Hubert




Pieter Monsieurs wrote:

> Sorry for disturbing. It now works correctly with Chris's bug fix.
> Thanks,
> Pieter
>
> Pieter Monsieurs wrote:
>
>>Hi Chris,
>>
>>The parsing of the Blast output still doesn't work for me with the bug 
>>fix download of blast.pm.
>>The module keeps turning around in the while loop at line 487 looking 
>>for a database or query-size:
>>
>>while( defined ($_) ) {
>>	if( /^Database:/ ) {
>>		$self->_pushback($_);
>>		last;
>>	}
>>	chomp;               
>>	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
>>		$size = $1;
>>		$size =~ s/,//g;
>>		last;
>>	} else {
>>		$q .= " $_";
>>		$q =~ s/ +/ /g;
>>		$q =~ s/^ | $//g;
>>	}
>>	$_ = $self->_readline;
>>}
>>
>>
>>The code keeps looking for the database information, however - as you 
>>mentioned - this information is given before the query line in the new 
>>Blast output format.
>>This way, all hits and hsps are stored in the query_description 
>>($hit->query_description), no hits are found and query_length is 0.
>>Because you already adapted the module to retrieve database information 
>>at another position in the module, deleting the while loop and adding 
>>the following lines after $_ = $self->_readline (line 486), worked fine 
>>for me (using blastn and blastp):
>>
>>if (/Length=([\d,]+)/) {
>>	$size = $1;
>>	$size =~ s/,//g;
>>}
>>
>>
>>Regards,
>>Pieter
>>
>>
>>
>>Chris Fields wrote:
>>
>>  
>>
>>>From 'perldoc Bio::SearchIO::blast':
>>>
>>>DESCRIPTION
>>>       This object encapsulated the necessary methods for generating events
>>>       suitable for building Bio::Search objects from a BLAST report file.
>>>       Read the Bio::SearchIO for more information about how to use this.
>>>
>>>       This driver can parse:
>>>
>>>       o   NCBI produced plain text BLAST reports from blastall, this also
>>>           includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.  NCBI
>>>           XML BLAST output is parsed with the blastxml SearchIO driver
>>>
>>>       o   WU-BLAST all reports
>>>
>>>       o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
>>>
>>>       o   BLAST-like output from Paracel BTK output
>>>
>>>So, it should.  Let us know if it doesn't.
>>>
>>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>>>
>>> 
>>>
>>>    
>>>
>>>>Hi Chris,
>>>>I'm incredibly sorry for causing so much inconvenience, yes you are  
>>>>right, I had only to change the blast.pm file, it is working very  
>>>>fine, thank you very much, and you are right, you did mention
>>>>earlier that I should change the file... ;)
>>>>
>>>>but I have another question: does it work with the WU-Blast output  
>>>>too?
>>>>regards
>>>>Hubert
>>>>
>>>>
>>>>Chris Fields wrote:
>>>>
>>>>   
>>>>
>>>>      
>>>>
>>>>>Ha!  I come back from meeting and there's a billion emails!  What have we
>>>>>started? ;p .  Sorry about this Jason; I know you're busy.
>>>>>
>>>>>Hubert, if you're out there, I sent you an email with an attachment.  You
>>>>>said the output looks like what you were expecting.  So I think we have two
>>>>>problems:
>>>>>
>>>>>1)  I haven't delved into the file scanning, but the fact that it takes so
>>>>>long should tell you something's seriously wrong there.  Strip that part out
>>>>>and start with a simple script, say, like the one Jason or that I sent you;
>>>>>the script I used to generate that output works fine (on two OS's, WinXP and
>>>>>Mac OS X).  Use it on one file at a time.  Do everything on command line
>>>>>(not through Eclipse).  IDE's can be notoriously flaky about running
>>>>>scripts, esp. when they run debugging.
>>>>>
>>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still
>>>>>not work whenever the text blast output has the following header, which
>>>>>comes from the new web version of BLAST:
>>>>>
>>>>>-----------------------------------------------------
>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>>>
>>>>>RID: 1139501210-857-165793005128.BLASTQ1
>>>>>
>>>>>
>>>>>Database: All non-redundant GenBank CDS
>>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>>         3,292,813 sequences; 1,128,164,434 total letters
>>>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>>tuberculosis H37Rv].
>>>>>Length=193
>>>>>.......
>>>>>-----------------------------------------------------
>>>>>
>>>>>It will work if the text output has the following header (or is an older
>>>>>version of BLAST):
>>>>>
>>>>>-----------------------------------------------------
>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>
>>>>>
>>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>>protein database search
>>>>>programs",  Nucleic Acids Res. 25:3389-3402.
>>>>>
>>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>>tuberculosis H37Rv].
>>>>>       (193 letters)
>>>>>
>>>>>Database: All non-redundant GenBank CDS
>>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>>         2,895,325 sequences; 997,103,285 total letters
>>>>>-----------------------------------------------------
>>>>>You have the former (2.2.13) version.  I know b/c I have your BLAST files.
>>>>>Therefore, even bioperl-1.5.1 will not work!
>>>>>
>>>>>If you want the really gory details on why this is a problem, look here:
>>>>>
>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>
>>>>>So, any text output with the 2.2.13 header will not work; it will either
>>>>>hang or end abruptly (depending on OS, perl version, memory, patience).  If
>>>>>you look in the above, I have added a preliminary fix for this.  I'll
>>>>>reiterate for the billionth time, it hasn't been committed yet, so don't
>>>>>kill me if it blows your computer up ;>
>>>>>
>>>>>Here's the direct link:
>>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>>>>
>>>>>This is a modified version of Bio::SearchIO::blast.pm (it says it's version
>>>>>1.90, but it's lying, I didn't change the version, only the regex; sorry
>>>>>Jason).  From what you've been posting it doesn't sound like you've tried
>>>>>this, and I believe I've suggested this fix before.
>>>>>
>>>>>Replace the one in your Bio/SearchIO directory (which looks like
>>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev.
>>>>>message) with this file.  Make sure the filename stays the same (blast.pm).
>>>>>
>>>>>Run everything again, one file at a time.  Make sure you use Jason's script
>>>>>as well as the one I sent you.  Do NOT rely on running through multiple
>>>>>files yet.  Fix one bug at a time.  And heed Joel's words about file checks.
>>>>>
>>>>>
>>>>>Here's a small chunk of output from one of your blast files using the
>>>>>modified script I sent you:
>>>>>
>>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>>>>>Query:   1  RWKWKRKK  8
>>>>>Seq:     542  RWAWRRKK  549
>>>>>
>>>>>Look familiar?
>>>>>
>>>>>Christopher Fields
>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>Dept. of Biochemistry
>>>>>University of Illinois Urbana-Champaign
>>>>>
>>>>>     
>>>>>
>>>>>        
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday,  
>>>>>>February 09, 2006 3:24 PM
>>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>parsing Blast output
>>>>>>
>>>>>>In other words, yes, I'm on the wrong trail. :}
>>>>>>
>>>>>>Sorry - I'll look at the output issue this evening (or realize  
>>>>>>that Chris already solved the issue).  ;}
>>>>>>
>>>>>>Thanks!
>>>>>>
>>>>>>Roger
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert  
>>>>>>Prielinger
>>>>>>Sent: Thursday, February 09, 2006 2:14 PM
>>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason  
>>>>>>Stajich
>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>parsing Blast output
>>>>>>
>>>>>>dear roger,
>>>>>>this error message I got, when I tried to parse Blast output  
>>>>>>(version
>>>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have  
>>>>>>a lot of Blast output files with version 2.2.13 and for that I  
>>>>>>don't get any error message.....it just doesn't work
>>>>>>
>>>>>>Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>>Roger Hall wrote:
>>>>>>
>>>>>>
>>>>>>       
>>>>>>
>>>>>>          
>>>>>>
>>>>>>>Guys - I'm looking at the error message:
>>>>>>>
>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>>This is my line of thought:
>>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm in
>>>>>>>one location only at the point of a. reading three lines b. dropping lines
>>>>>>>with spaces only c. identifying the Query, Midline, and Match lines
>>>>>>>(0 <= $i < 3)
>>>>>>>2. There is a regexp match that fails in order to reach that error message
>>>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>>>>>4. It does anyway
>>>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>>>>>>reports
>>>>>>>
>>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional
>>>>>>>file, assuming that I didn't have it.
>>>>>>>
>>>>>>>My next thought is to write a quick script to test perl behavior  
>>>>>>>on "Fedora Core 9".
>>>>>>>
>>>>>>>Thoughts?
>>>>>>>
>>>>>>>Did I misread the issue entirely? :}
>>>>>>>
>>>>>>>Roger
>>>>>>>
>>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>>>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>>>>>Cc: bioperl-l at bioperl.org
>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>>parsing Blast output
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>            
>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>>>>>To: Hubert Prielinger
>>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>>>parsing Blast output
>>>>>>>>
>>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>>>hi chris,
>>>>>>>>>thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>>>>>>>do you have any other idea, the problem I have is that I have to parse
>>>>>>>>>a lot of textfiles....
>>>>>>>>>or shall I look for another option to parse those files...
>>>>>>>>>
>>>>>>>>>regards
>>>>>>>>>Hubert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             
>>>>>>>>>
>>>>>>>>>                
>>>>>>>>>
>>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>>>>>2.2.13 reports but unless you post your blast report we can't really
>>>>>>>>determine the problem.
>>>>>>>>
>>>>>>>>If you are still getting the same error like this I am not convinced
>>>>>>>>you have upgraded to 1.5.1 which includes a fix for the fact that NCBI
>>>>>>>>changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>>>>>prefixes.  We fixed this as soon as it was apparent sometime in
>>>>>>>>September.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>>STACK toplevel
>>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>
>>>>>>>>If you are just getting no results but also no warnings wrt parsing,
>>>>>>>>are you sure your logic is correct?
>>>>>>>>
>>>>>>>>If you remove your filters do you see all the HSPS?
>>>>>>>>
>>>>>>>>
>>>>>>>>while (my $result = $search->next_result) {
>>>>>>>>  print $result->query_name, "\n";
>>>>>>>>  # iterate over each hit on the query sequence
>>>>>>>>  while (my $hit = $result->next_hit) {
>>>>>>>>    print $hit->name, "\n";
>>>>>>>>    # iterate over each HSP in the hit
>>>>>>>>    while (my $hsp = $hit->next_hsp) {
>>>>>>>>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>>>>>            $hsp->hit_string, "\n";
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>>}
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a
>>>>>>>similar script to the above.  I removed the file parsing logic and it
>>>>>>>seemed to work just fine.  It may very well be a logic issue or that he
>>>>>>>hasn't installed the latest fix.
>>>>>>>
>>>>>>>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>>>>>>even though the returned output was from nr, the top of the
>>>>>>>blast output showed that it was v2.2.12:
>>>>>>>
>>>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>>>
>>>>>>>I double-checked my local version and it's definitely v.2.2.13:
>>>>>>>-------------------------------------
>>>>>>>C:\Perl\Scripts>blastcl3 -
>>>>>>>
>>>>>>>blastcl3 2.2.13   arguments:...
>>>>>>>-------------------------------------
>>>>>>>
>>>>>>>If you use RemoteBlast using the same settings, the version in  
>>>>>>>the header looks like this:
>>>>>>>
>>>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>>>
>>>>>>>I'm wondering if all the blast executables (blast and netblast)
>>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast outputs a new
>>>>>>>format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>            
>>>>>>>
>>>>>>>>To clarify some stuff -
>>>>>>>>Chris I don't necessarily think the XML is best way forward for BLAST
>>>>>>>>reports generated locally, it isn't as detailed as the Text format and
>>>>>>>>it is what most people expect to be able to scroll through and parse
>>>>>>>>-- it is also harder for the format to change dramatically if you have
>>>>>>>>a static binary on your machine =).  I think for remoteblast the XML
>>>>>>>>format should be the way forward but I expect Bioperl to
>>>>>>>>maintain support of any plain text BLAST report format that
>>>>>>>>people use on a regular basis.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>Does XML lack some specific info that text output has?  Didn't know that.
>>>>>>>I believe that XML should be default in RemoteBlast since it will
>>>>>>>not break, but I agree with you about text output.  I also agree
>>>>>>>that it will need somebody to maintain it constantly, much like
>>>>>>>RemoteBlast.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>            
>>>>>>>
>>>>>>>>-jason
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>>>Chris Fields wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             
>>>>>>>>>
>>>>>>>>>                
>>>>>>>>>
>>>>>>>>>>My guess is you're running into text parsing problems in  
>>>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>>>>>(1.5.1) or
>>>>>>>>>>bioperl-live (CVS), then see the bug below.
>>>>>>>>>>
>>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>>
>>>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>>>>>the last problem (more recent, not related to the first) has
>>>>>>>>>>been fixed but hasn't been committed to bioperl-live yet.
>>>>>>>>>>The fixed SearchIO::blast is available in the link above, but
>>>>>>>>>>realize it hasn't been committed yet and may change.
>>>>>>>>>>
>>>>>>>>>>Christopher Fields
>>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry  
>>>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>               
>>>>>>>>>>
>>>>>>>>>>                  
>>>>>>>>>>
>>>>>>>>>>>-----Original Message-----
>>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>>>>>Prielinger
>>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>>>>>output
>>>>>>>>>>>
>>>>>>>>>>>Hi,
>>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with  
>>>>>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>>>>>
>>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>>STACK toplevel
>>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>>>>
>>>>>>>>>>>is that a bug......
>>>>>>>>>>>
>>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't  
>>>>>>>>>>>get anything.....
>>>>>>>>>>>I'm using bioperl 1.4
>>>>>>>>>>>
>>>>>>>>>>>before I installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>>>>>I had installed
>>>>>>>>>>>
>>>>>>>>>>>thanks in advance
>>>>>>>>>>>
>>>>>>>>>>>Hubert
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>_______________________________________________
>>>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                 
>>>>>>>>>>>
>>>>>>>>>>>                    
>>>>>>>>>>>
>>>>>>>>>>               
>>>>>>>>>>
>>>>>>>>>>                  
>>>>>>>>>>
>>>>>>>>>_______________________________________________
>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             
>>>>>>>>>
>>>>>>>>>                
>>>>>>>>>
>>>>>>>>--
>>>>>>>>Jason Stajich
>>>>>>>>Duke University
>>>>>>>>http://www.duke.edu/~jes12
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>Christopher Fields
>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>Dept. of Biochemistry
>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>Bioperl-l mailing list
>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>            
>>>>>>>
>>>>>>_______________________________________________
>>>>>>Bioperl-l mailing list
>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>       
>>>>>>
>>>>>>          
>>>>>>
>>>>>     
>>>>>
>>>>>        
>>>>>
>>>Christopher Fields
>>>Postdoctoral Researcher
>>>Lab of Dr. Robert Switzer
>>>Dept of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l at lists.open-bio.org
>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> 
>>>
>>>    
>>>
>>
>>
>>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>  
>>
>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm for more 
> information.
>

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l




From cjfields at uiuc.edu  Fri Feb 10 13:08:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 10 Feb 2006 12:08:37 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't
	work	parsing	blast	output
In-Reply-To: <002701c62e6b$edd845b0$d416a790@LIBERAL>
Message-ID: <002501c62e6d$04158530$15327e82@pyrimidine>

Makes sense.  I didn't see this since I passed the files directly from
command-line.
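
For anyone hitting the same "Could not open" exception, here is a minimal
sketch of the fix Roger describes below, assuming the reports all sit in one
directory. The helper name report_paths is hypothetical and not part of
BioPerl; it only joins each name returned by readdir() to its directory and
keeps the plain files.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of BioPerl): list the plain files in
# $directory as full paths. readdir() returns bare file names only, so
# each name has to be joined to the directory before it can be opened.
sub report_paths {
    my ($directory) = @_;
    opendir(my $dh, $directory) or die "Cannot open $directory: $!";
    my @names = readdir($dh);
    closedir($dh);

    # Build full paths, then drop '.', '..' and any subdirectories.
    my @paths = map { File::Spec->catfile($directory, $_) } @names;
    my @files = grep { -f $_ } @paths;
    return sort @files;
}

# Print the paths for the directory given on the command line, if any.
if (@ARGV) {
    print "$_\n" for report_paths($ARGV[0]);
}
```

Each returned path could then be handed to
Bio::SearchIO->new(-file => $path, -format => 'blast'), one file at a time.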

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: Roger Hall [mailto:rahall2 at ualr.edu] 
> Sent: Friday, February 10, 2006 12:01 PM
> To: 'Hubert Prielinger'; 'Pieter Monsieurs'; 
> bioperl-l at bioperl.org; 'Chris Fields'
> Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing blast output
> 
> Hubert,
> 
> I got the same message when I first ran your script. The 
> issue for me was that "readdir(DIR)" doesn't return the full 
> path, only the file name.
> 
> I edited your script to include:
> 
> 	$file = $directory . '/' . $file;
> 
> just before the Bio::SearchIO call.
> 
> Roger
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Hubert Prielinger
> Sent: Friday, February 10, 2006 10:27 AM
> To: Pieter Monsieurs; bioperl-l at bioperl.org; Chris Fields; 
> rahall2 at ualr.edu
> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing blast output
> 
> Hi,
> I'm sorry for disturbing once more. Yesterday the script was 
> working, today it isn't working at all, but I didn't change 
> anything, I get the following error message:
> 
> ------------- EXCEPTION  -------------
> MSG: Could not open comp80swiss2114.txt: No such file or directory
> STACK Bio::Root::IO::_initialize_io /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273
> STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213
> STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135
> STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167
> STACK toplevel ./Blast.pl:14
> 
> --------------------------------------
> 
> The file exists, and I fixed the bug yesterday; thanks for the help.
> 
> Hubert
> 
> 
> 
> 
> Pieter Monsieurs wrote:
> 
> > Sorry for disturbing. It now works correctly with the bug fix of Chris.
> > Thanx,
> > Pieter
> >
> > Pieter Monsieurs wrote:
> >
> >>Hi Chris,
> >>
> >>The parsing of the Blast output still doesn't work for me with the bug
> >>fix download of blast.pm.
> >>The module keeps turning around in the while loop at line 487 looking
> >>for a database or query-size:
> >>
> >>while( defined ($_) ) {
> >>	if( /^Database:/ ) {
> >>		$self->_pushback($_);
> >>		last;
> >>	}
> >>	chomp;               
> >>	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
> >>		$size = $1;
> >>		$size =~ s/,//g;
> >>		last;
> >>	} else {
> >>		$q .= " $_";
> >>		$q =~ s/ +/ /g;
> >>		$q =~ s/^ | $//g;
> >>	}
> >>	$_ = $self->_readline;
> >>}
> >>
> >>
> >>The code keeps looking for the database information, however - as you
> >>mentioned - this information is given before the query line in the new
> >>Blast output format.
> >>This way, all hits and hsps are stored in the query_description
> >>($hit->query_description), no hits are found and query_length is 0.
> >>Because you already adapted the module to retrieve database
> >>information at another position in the module, deleting the while loop
> >>and adding the following lines after $_ = $self->_readline (line 486),
> >>worked fine for me (using blastn and blastp):
> >>
> >>if (/Length=([\d,]+)/) {
> >>	$size = $1;
> >>	$size =~ s/,//g;
> >>}
> >>
> >>
> >>Regards,
> >>Pieter
> >>
> >>
> >>
> >>Chris Fields wrote:
> >>
> >>  
> >>
> >>>From 'perldoc Bio::SearchIO::blast':
> >>>
> >>>DESCRIPTION
> >>>       This object encapsulated the necessary methods for generating events
> >>>       suitable for building Bio::Search objects from a BLAST report file.
> >>>       Read the Bio::SearchIO for more information about how to use this.
> >>>
> >>>       This driver can parse:
> >>>
> >>>       o   NCBI produced plain text BLAST reports from blastall, this also
> >>>           includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.  NCBI
> >>>           XML BLAST output is parsed with the blastxml SearchIO driver
> >>>
> >>>       o   WU-BLAST all reports
> >>>
> >>>       o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
> >>>
> >>>       o   BLAST-like output from Paracel BTK output
> >>>
> >>>So, it should.  Let us know if it doesn't.
> >>>
> >>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
> >>>
> >>> 
> >>>
> >>>    
> >>>
> >>>>Hi Chris,
> >>>>I'm incredibly sorry for causing so much inconvenience, yes you are
> >>>>right, I had only to change the blast.pm file, it is working very
> >>>>fine, thank you very much, and you are right, you have mentioned it
> >>>>earlier too, to change the file... ;)
> >>>>
> >>>>but I have another question: does it work with the WU-Blast output
> >>>>too?
> >>>>regards
> >>>>Hubert
> >>>>
> >>>>
> >>>>Chris Fields wrote:
> >>>>
> >>>>   
> >>>>
> >>>>      
> >>>>
> >>>>>Ha!  I come back from meeting and there's a billion emails!  What
> >>>>>have we started? ;p .  Sorry about this Jason; I know you're busy.
> >>>>>
> >>>>>Hubert, if you're out there, I sent you an email with an
> >>>>>attachment.  You said the output looks like what you were
> >>>>>expecting.  So I think we have two
> >>>>>problems:
> >>>>>
> >>>>>1)  I haven't delved into the file scanning, but the fact that it
> >>>>>takes so long should tell you something's seriously wrong there.
> >>>>>Strip that part out and start with a simple script, say, like the
> >>>>>one Jason or that I sent you; the script I used to generate that
> >>>>>output works fine (on two OS's, WinXP and Mac OS X).  Use it on one
> >>>>>file at a time.  Do everything on command line (not through
> >>>>>Eclipse).  IDE's can be notoriously flaky about running scripts,
> >>>>>esp. when they run debugging.
> >>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast
> >>>>>will still not work whenever the text blast output has the
> >>>>>following header, which comes from the new web version of BLAST:
> >>>>>
> >>>>>-----------------------------------------------------
> >>>>>BLASTP 2.2.13 [Nov-27-2005]
> >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
> >>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
> >>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
> >>>>>
> >>>>>RID: 1139501210-857-165793005128.BLASTQ1
> >>>>>
> >>>>>
> >>>>>Database: All non-redundant GenBank CDS
> >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> >>>>>         3,292,813 sequences; 1,128,164,434 total letters
> >>>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
> >>>>>tuberculosis H37Rv].
> >>>>>Length=193
> >>>>>.......
> >>>>>-----------------------------------------------------
> >>>>>
> >>>>>It will work if the text output has the following header (or is an
> >>>>>older version of BLAST):
> >>>>>
> >>>>>-----------------------------------------------------
> >>>>>BLASTP 2.2.12 [Aug-07-2005]
> >>>>>
> >>>>>
> >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
> >>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
> >>>>>protein database search programs",  Nucleic Acids Res.
> >>>>>25:3389-3402.
> >>>>>
> >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
> >>>>>tuberculosis H37Rv].
> >>>>>       (193 letters)
> >>>>>
> >>>>>Database: All non-redundant GenBank CDS
> >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> >>>>>         2,895,325 sequences; 997,103,285 total letters
> >>>>>-----------------------------------------------------
> >>>>>You have the former (2.2.13) version.  I know b/c I have your BLAST
> >>>>>files.
> >>>>>Therefore, even bioperl-1.5.1 will not work!
> >>>>>
> >>>>>If you want the really gory details on why this is a 
> problem, look
> >>>>>here:
> >>>>>
> >>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>>
> >>>>>So, any text output with the above header will not work; it will
> >>>>>either hang or end abruptly (depending on OS, perl version, memory,
> >>>>>patience).  If you look in the above, I have added a preliminary
> >>>>>fix for this.  I'll reiterate for the billionth time, it hasn't
> >>>>>been committed yet, so don't kill me if it blows your computer up ;>
> >>>>>Here's the direct link:
> >>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
> >>>>>This is a modified version of Bio::SearchIO::blast.pm (it says it's
> >>>>>version 1.90, but it's lying, I didn't change the version, only the
> >>>>>regex; sorry Jason).  From what you've been posting it doesn't
> >>>>>sound like you've tried this, and I believe I've suggested this fix
> >>>>>before.
> >>>>>
> >>>>>Replace the one in your Bio/SearchIO directory (which looks like
> >>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your
> >>>>>prev.
> >>>>>message) with this file.  Make sure the filename stays the same 
> >>>>>(blast.pm).
> >>>>>
> >>>>>Run everything again, one file at a time.  Make sure you use 
> >>>>>Jason's script as well as the one I sent you.  Do NOT rely on 
> >>>>>running through multiple files yet.  Fix one bug at a time.  And 
> >>>>>heed Joel's words about file checks.
> >>>>>
> >>>>>
> >>>>>Here's a small chunk of output from one of your blast files using
> >>>>>the modified script I sent you:
> >>>>>
> >>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
> >>>>>Query:   1  RWKWKRKK  8
> >>>>>Seq:     542  RWAWRRKK  549
> >>>>>
> >>>>>Look familiar?
> >>>>>
> >>>>>Christopher Fields
> >>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
> >>>>>University of Illinois Urbana-Champaign
> >>>>>
> >>>>>     
> >>>>>
> >>>>>        
> >>>>>
> >>>>>>-----Original Message-----
> >>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday, 
> >>>>>>February 09, 2006 3:24 PM
> >>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
> >>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't 
> work parsing 
> >>>>>>Blast output
> >>>>>>
> >>>>>>In other words, yes, I'm on the wrong trail. :}
> >>>>>>
> >>>>>>Sorry - I'll look at the output issue this evening (or realize 
> >>>>>>that Chris already solved the issue).  ;}
> >>>>>>
> >>>>>>Thanks!
> >>>>>>
> >>>>>>Roger
> >>>>>>
> >>>>>>-----Original Message-----
> >>>>>>From: bioperl-l-bounces at lists.open-bio.org
> >>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf 
> Of Hubert 
> >>>>>>Prielinger
> >>>>>>Sent: Thursday, February 09, 2006 2:14 PM
> >>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris 
> Fields; Jason 
> >>>>>>Stajich
> >>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't 
> work parsing 
> >>>>>>Blast output
> >>>>>>
> >>>>>>dear roger,
> >>>>>>this error message I got, when I tried to parse Blast output 
> >>>>>>(version
> >>>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, 
> because I have 
> >>>>>>a lot of Blast output files with version 2.2.13 and for that I 
> >>>>>>don't get any error message.....it just doesn't work
> >>>>>>
> >>>>>>Hubert
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>Roger Hall wrote:
> >>>>>>
> >>>>>>
> >>>>>>       
> >>>>>>
> >>>>>>          
> >>>>>>
> >>>>>>>Guys - I'm looking at the error message:
> >>>>>>>
> >>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>>>STACK toplevel
> >>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/
> >>>>>>>Blast.pl:21
> >>>>>>>
> >>>>>>>This is my line of thought:
> >>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm in
> >>>>>>>one location only at the point of a. reading three lines b. dropping lines
> >>>>>>>with spaces only c. identifying the Query, Midline, and Match lines
> >>>>>>>(0 <= $i < 3)
> >>>>>>>2. There is a regexp match that fails in order to reach that error message
> >>>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
> >>>>>>>4. It does anyway
> >>>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
> >>>>>>>reports
> >>>>>>>
> >>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
> >>>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional
> >>>>>>>file, assuming that I didn't have it.
> >>>>>>>
> >>>>>>>My next thought is to write a quick script to test 
> perl behavior 
> >>>>>>>on "Fedora Core 9".
> >>>>>>>
> >>>>>>>Thoughts?
> >>>>>>>
> >>>>>>>Did I misread the issue entirely? :}
> >>>>>>>
> >>>>>>>Roger
> >>>>>>>
> >>>>>>>
> >>>>>>>-----Original Message-----
> >>>>>>>From: bioperl-l-bounces at lists.open-bio.org
> >>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >>>>>>>Sent: Thursday, February 09, 2006 10:16 AM
> >>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
> >>>>>>>Cc: bioperl-l at bioperl.org
> >>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
> >>>>>>>
> >>>>>>>>-----Original Message-----
> >>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
> >>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
> >>>>>>>>To: Hubert Prielinger
> >>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
> >>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> >>>>>>>>parsing Blast output
> >>>>>>>>
> >>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> >>>>>>>>
> >>>>>>>>>hi chris,
> >>>>>>>>>thanks, I have upgraded to version 1.5.1 but it still isn't
> >>>>>>>>>working. do you have any other idea? the problem I have is
> >>>>>>>>>that I have to parse a lot of textfiles....
> >>>>>>>>>or shall I look for another option to parse those files...
> >>>>>>>>>
> >>>>>>>>>regards
> >>>>>>>>>Hubert
> >>>>>>>>>
> >>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
> >>>>>>>>2.2.13 reports but unless you post your blast report we
> >>>>>>>>can't really determine the problem.
> >>>>>>>>
> >>>>>>>>If you are still getting the same error like this I am not
> >>>>>>>>convinced you have upgraded to 1.5.1 which includes a fix in
> >>>>>>>>the fact that NCBI changed the HSP result format to remove the
> >>>>>>>>':' from the Query/Sbjct prefixes.  We fixed this as soon as it
> >>>>>>>>was apparent sometime in September.
> >>>>>>>>
> >>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>>>>>>>STACK toplevel
> >>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>>>>>>>
> >>>>>>>>If you are just getting no results but also no warnings wrt
> >>>>>>>>parsing, are you sure your logic is correct?
> >>>>>>>>
> >>>>>>>>If you remove your filters do you see all the HSPs?
> >>>>>>>>
> >>>>>>>>while (my $result = $search->next_result) {
> >>>>>>>>  print $result->query_name, "\n";
> >>>>>>>>  # iterate over each hit on the query sequence
> >>>>>>>>  while (my $hit = $result->next_hit) {
> >>>>>>>>    print $hit->name, "\n";
> >>>>>>>>    # iterate over each HSP in the hit
> >>>>>>>>    while (my $hsp = $hit->next_hsp) {
> >>>>>>>>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
> >>>>>>>>            $hsp->hit_string, "\n";
> >>>>>>>>    }
> >>>>>>>>  }
> >>>>>>>>}
> >>>>>>>>
> >>>>>>>I tested some of the BLAST results that Hubert sent Roger and me
> >>>>>>>with a similar script to the above.  I removed the file parsing
> >>>>>>>logic and it seemed to work just fine.  It may very well be a
> >>>>>>>logic issue or that he hasn't installed the latest fix.
> >>>>>>>
> >>>>>>>It's a funny thing, though.  When I tried using blastcl3 (v.
> >>>>>>>2.2.13), even though the returned output was from nr, the top of
> >>>>>>>the blast output showed that it was v2.2.12:
> >>>>>>>
> >>>>>>>BLASTP 2.2.12 [Aug-07-2005]
> >>>>>>>
> >>>>>>>I double-checked my local version and it's definitely v.2.2.13:
> >>>>>>>-------------------------------------
> >>>>>>>C:\Perl\Scripts>blastcl3 -
> >>>>>>>
> >>>>>>>blastcl3 2.2.13   arguments:...
> >>>>>>>-------------------------------------
> >>>>>>>
> >>>>>>>If you use RemoteBlast using the same settings, the version in 
> >>>>>>>the header looks like this:
> >>>>>>>
> >>>>>>>BLASTP 2.2.13 [Nov-27-2005]
> >>>>>>>
> >>>>>>>I'm wondering if all the blast executables (blast and netblast)
> >>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast
> >>>>>>>outputs a new format (2.2.13).  I'll ask blast-help at NCBI about
> >>>>>>>this.
> >>>>>>>
> >>>>>>>>To clarify some stuff -
> >>>>>>>>Chris I don't necessarily think the XML is best way forward for
> >>>>>>>>BLAST reports generated locally, it isn't as detailed as the Text
> >>>>>>>>format and it is what most people expect to be able to scroll
> >>>>>>>>through and parse -- it is also harder for the format to change
> >>>>>>>>dramatically if you have a static binary on your machine =).  I
> >>>>>>>>think for remoteblast the XML format should be the way forward
> >>>>>>>>but I expect Bioperl to maintain support of any plain text BLAST
> >>>>>>>>report format that people use on a regular basis.
> >>>>>>>>
> >>>>>>>Does XML lack some specific info that text output has?  Didn't
> >>>>>>>know that.  I believe that XML should be default in RemoteBlast
> >>>>>>>since it will not break, but I agree with you about text output.
> >>>>>>>I also agree that it will need somebody to maintain it
> >>>>>>>constantly, much like RemoteBlast.
> >>>>>>>
> >>>>>>>>-jason
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>           
> >>>>>>>>
> >>>>>>>>              
> >>>>>>>>
> >>>>>>>>>Chris Fields wrote:
> >>>>>>>>>
> >>>>>>>>>>My guess is you're running into text parsing problems in
> >>>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
> >>>>>>>>>>(1.5.1) or bioperl-live (CVS), then see the bug below.
> >>>>>>>>>>
> >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>>>>>>>
> >>>>>>>>>>I think the first problem you ran into is solved in bioperl
> >>>>>>>>>>1.5.1, the last problem (more recent, not related to the first)
> >>>>>>>>>>has been fixed but hasn't been committed to bioperl-live yet.
> >>>>>>>>>>The fixed SearchIO::blast is available in the link above, but
> >>>>>>>>>>realize it hasn't been committed yet and may change.
> >>>>>>>>>>
> >>>>>>>>>>Christopher Fields
> >>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
> >>>>>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>               
> >>>>>>>>>>
> >>>>>>>>>>                  
> >>>>>>>>>>
> >>>>>>>>>>>-----Original Message-----
> >>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
> >>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> >>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
> >>>>>>>>>>>To: bioperl-l at bioperl.org
> >>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
> >>>>>>>>>>>
> >>>>>>>>>>>Hi,
> >>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with 
> >>>>>>>>>>>Bio::SearchIO, I get the following error message:
> >>>>>>>>>>>
> >>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>>>>>>>STACK toplevel
> >>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>>>>>>>>>>
> >>>>>>>>>>>is that a bug......
> >>>>>>>>>>>
> >>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't get
> >>>>>>>>>>>anything.....
> >>>>>>>>>>>I'm using bioperl 1.4
> >>>>>>>>>>>
> >>>>>>>>>>>before I installed bioperl 1.4, it worked fine parsing
> >>>>>>>>>>>Blast Output (version 2.2.12), but I don't remember which
> >>>>>>>>>>>bioperl version I had installed
> >>>>>>>>>>>
> >>>>>>>>>>>thanks in advance
> >>>>>>>>>>>
> >>>>>>>>>>>Hubert
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>_______________________________________________
> >>>>>>>>>>>Bioperl-l mailing list
> >>>>>>>>>>>Bioperl-l at lists.open-bio.org
> >>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>                 
> >>>>>>>>>>>
> >>>>>>>>>>>                    
> >>>>>>>>>>>
> >>>>>>>>>>               
> >>>>>>>>>>
> >>>>>>>>>>                  
> >>>>>>>>>>
> >>>>>>>>>_______________________________________________
> >>>>>>>>>Bioperl-l mailing list
> >>>>>>>>>Bioperl-l at lists.open-bio.org
> >>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>             
> >>>>>>>>>
> >>>>>>>>>                
> >>>>>>>>>
> >>>>>>>>--
> >>>>>>>>Jason Stajich
> >>>>>>>>Duke University
> >>>>>>>>http://www.duke.edu/~jes12
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>           
> >>>>>>>>
> >>>>>>>>              
> >>>>>>>>
> >>>>>>>Christopher Fields
> >>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
> >>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>
> >>>>>>>_______________________________________________
> >>>>>>>Bioperl-l mailing list
> >>>>>>>Bioperl-l at lists.open-bio.org
> >>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>         
> >>>>>>>
> >>>>>>>            
> >>>>>>>
> >>>>>>_______________________________________________
> >>>>>>Bioperl-l mailing list
> >>>>>>Bioperl-l at lists.open-bio.org
> >>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>
> >>>>>>
> >>>>>>       
> >>>>>>
> >>>>>>          
> >>>>>>
> >>>>>     
> >>>>>
> >>>>>        
> >>>>>
> >>>Christopher Fields
> >>>Postdoctoral Researcher
> >>>Lab of Dr. Robert Switzer
> >>>Dept of Biochemistry
> >>>University of Illinois Urbana-Champaign
> >>>
> >>>
> >>>
> >>>
> >>>_______________________________________________
> >>>Bioperl-l mailing list
> >>>Bioperl-l at lists.open-bio.org
> >>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>> 
> >>>
> >>>    
> >>>
> >>
> >>
> >>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
> >>
> >>_______________________________________________
> >>Bioperl-l mailing list
> >>Bioperl-l at lists.open-bio.org
> >>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>  
> >>
> >
> >
> > Disclaimer: 
> http://www.kuleuven.be/cwis/email_disclaimer.htm for more 
> > information.
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From victor.ruotti at gmail.com  Fri Feb 10 15:09:16 2006
From: victor.ruotti at gmail.com (Victor)
Date: Fri, 10 Feb 2006 14:09:16 -0600
Subject: [Bioperl-l] Running BLAT with BioPerl
In-Reply-To: 
References: 
	
Message-ID: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com>

Hi Jason,
Well, in my env. BLATDIR was not set at all. When setting BLATDIR to
/usr/local/bin, I get the same problem. I think this might have to do with
the _run internal method/sub. If you look at that subroutine, you'll see
that it is using both $self->executable and $self->program_name. The test
passes fine, but we might need to write a better test for this particular
case.

Instead of saying:
     my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
I think the author meant to say:
     my $str= Bio::Root::IO->catfile($self->program_dir,$self->program_name);

I quickly used Data::Dumper on both executable and program_name and this is
what I get:
$VAR1 = 'blat';
$VAR1 = 'blat';

So the path is hardcoded to be /usr/local/bin/blat/blat when calling run
through the factory.
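
A minimal sketch of the effect, using the core File::Spec::Unix module
instead of Bio::Root::IO so that it runs without BioPerl. The variable
values are made up for illustration and are not read from Blat.pm; in
particular, treating the executable as a full path is an assumption:

```perl
use strict;
use warnings;
use File::Spec::Unix;

# Hypothetical values, chosen only to mirror the symptom in the error message.
my $program_dir  = '/usr/local/bin';
my $program_name = 'blat';
my $executable   = '/usr/local/bin/blat';   # assumed: full path to the binary

# Joining a full executable path with the program name repeats "blat".
my $doubled = File::Spec::Unix->catfile($executable, $program_name);

# Joining the directory with the program name gives the intended path.
my $intended = File::Spec::Unix->catfile($program_dir, $program_name);

print "$doubled\n";    # /usr/local/bin/blat/blat
print "$intended\n";   # /usr/local/bin/blat
```

A test for this case could compare the command string the module builds
against the second form.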

I'd like to change the constructor a bit to deal with the params a little
better and include a config file using
Config::General. Also, I noticed that there is another Blat.pm module, a
parser module. Should we integrate this parser with the blat run module?

Brian/Jason. Does that sound like a good idea?

Victor


On 2/10/06, Jason Stajich  wrote:
>
> brian -   just FYI -
>
> The AUTOLOAD stuff is present a great number of the run modules so  this
> is standard per se in that set.
>
> I think Victor's problem may have been the BLATDIR env variable pointing
> to /usr/local/bin/blat instead of /usr/local/bin - is that the case victor?
>
> tests passed for me before I did the 1.5.1 release for  this module so it
> basically works.   It definitely needs a carekeeper as lot of these run
> modules were built during the fugu group annotation project and never got
> audited/re-vised after that.
>
>
> -jason
> On Feb 10, 2006, at 11:34 AM, Brian Osborne wrote:
>
> Victor,
>
> Fantastic, this is certainly a module in need, in fact there was already a
> note on this in the Wiki, I'll update it:
>
> http://bioperl.open-bio.org/wiki/Orphan_modules
>
> So all I did was:
>
> >cd bioperl-run
> >perl -I. -w t/Blat.t
>
> This is the most recent bioperl-run, the live version, and all tests
> passed. I'd downloaded the most recent binaries and put them in my
> /usr/local/bin, already in my PATH. That's it.
>
> That's the saddest looking new() I've ever seen in Bioperl, a mixture of
> named and unnamed parameters like that, how bizarre. The "proper" way, of
> course, is to use _rearrange, and not use AUTOLOAD.
>
> Thanks again,
>
> Brian O.
>
>
> On 2/10/06 11:02 AM, "Victor"  wrote:
>
> Brian,
> I'd be happy to do that. Can you send me a quick snap on how you got it to
> work first. I'd like to see what is working first, before I start fixing
> things.
>
> And yes I'll take a look at the Blat.t to see more on it.
>
> Victor
>
>
> On 2/9/06, *Brian Osborne*  wrote:
>
> Victor,
>
> Yes, it may be that blat is not in your path, bioperl-run/t/Blat.t is
> working for me even though I haven't set BLATDIR. This is using the latest
> blat, v. 33.
>
> There is a problem here though, you can see it if you read Blat.t. The
> constructor does not look like your usual new():
>
> my $factory = Bio::Tools::Run::Alignment::Blat->new('quiet'  => 1,
>                             -verbose => $verbose,
>                             "DB"     => $db);
>
> Unfortunate - would you be willing to do more than add a useful SYNOPSIS
> and actually fix new()? There is a subtext here, we're trying to find
> people who would be willing to maintain useful modules like these, the
> ideal person in this case would be someone who'd regularly use the module.
>
> Brian O.
>
>
> On 2/9/06 6:22 PM, "Victor"  wrote:
>
> > Hi,
> > Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module is up to
> date
> > in the lastest bioperl release?
> >
> >
> >
> > use Bio::Tools::Run::Alignment::Blat;
> > my $factory = Bio::Tools::Run::Alignment::Blat->new();
> > my $seq =
> > "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA";
> >
> > my @feats = $factory->run( $seq);
> >
> > Here is what I get when tring to use it:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Blat call (/usr/local/bin/blat/blat -out=blast  TGAAATAAAACTCAGTA
> > /tmp/fB09bp5F76) crashed: -1
> >
> > Notice that it is using "blat" twice in the path. The way that I fixed
> > this is by going to the blat.pm module and changing the following lines:
> > #my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
> > my $str= Bio::Root::IO->catfile($self->program_name);
> >
> > Any ideas, maybe I'm missing the $ENV variable somewhere?
> > I'd like to avoid making this change. Also does anyone have a known
> synopsis
> > of this blat module (where to set the parameters, and whether it allows
> you
> > to have a config file).
> > I'll be happy to add a better synopsis to the module if needed.
> >
> > Thanks in advance,
> > Victor
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>
>
>
>
>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12 
>
>
>



From jason.stajich at duke.edu  Fri Feb 10 15:36:04 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 10 Feb 2006 15:36:04 -0500
Subject: [Bioperl-l] Running BLAT with BioPerl
In-Reply-To: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com>
References: 
	
	<36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com>
Message-ID: <7F520AFA-84C9-485B-A408-7A9DEFC1186E@duke.edu>


On Feb 10, 2006, at 3:09 PM, Victor wrote:

> Hi Jason,
> Well, in my env. BLATDIR was not setup at all. When setting BLATDIR  
> to /usr/local/bin, I get the same problem. I think this might have  
> to do with the _run internal method/sub. If you look at that  
> subroutine, you'll see that it is using both $self->executable and  
> $self->program_name. The test passes fine, but we might need to  
> write a better test for this particular case.
>
> Instead of saying:
>      my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
> I think the author meant to say:
>      my $str= Bio::Root::IO->catfile($self->program_dir,$self->program_name);
>
> I quickly used Data::Dumper on both executate and program_name and  
> this is what I get:
> $VAR1 = 'blat';
> $VAR1 = 'blat';
>
> So the path is hardcoded to be /usr/local/bin/blat/blat when  
> calling run though factory.
>
Hmm are you sure you are looking at the 1.5.1 code and/or what is in  
CVS?

> I'd like to change the constructor a bit to deal with the params a  
> little better and include a config file using
> Config::General. Also, I noticed that there is a another Blat.pm  
> module, a parser module. Should we integrate this parser with the  
> blat run module?
>
Well, maybe as another parser option - I believe I added/edited it to
use the PSL parser in Bio::SearchIO. Is that not what you see?

Ick, there are also some system commands in this module which need
to be removed and replaced with File::Copy, or we figure out how to
remove them altogether.
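
A rough sketch of that swap, assuming the system calls in question are
plain file copies (I have not checked which commands the module actually
runs). The file names are invented; File::Copy and File::Temp are core
modules:

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical file names; the real module's temp files will differ.
my $dir = tempdir(CLEANUP => 1);
my $src = File::Spec->catfile($dir, 'query.fa');
my $dst = File::Spec->catfile($dir, 'query_copy.fa');

# Write a small input file so the sketch is self-contained.
open my $out, '>', $src or die "Cannot write $src: $!";
print {$out} ">seq1\nTGAAATAAAACTCAGTA\n";
close $out or die "Cannot close $src: $!";

# Instead of something like: system("cp $src $dst");
copy($src, $dst) or die "Copy of $src to $dst failed: $!";

print "copied ", -s $dst, " bytes\n";
```

This avoids depending on a shell and on a cp binary, and the failure is
reported through $! rather than an exit status.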


> Brian/Jason. Does that sound like a good idea?

But yes, it needs some TLC.
I'm not sure I know enough about Config::General to say yes or no,
but all of the run modules need some help in standardization, so I
would propose trying to integrate some changes into the base class
(WrapperBase) that can be utilized by all the sub-classes -- if you
want to use this as a model for how to do it that would be great too.

thx,
-j
>
> Victor
>
>
> On 2/10/06, Jason Stajich  wrote:
> brian -
>   just FYI -
>
> The AUTOLOAD stuff is present a great number of the run modules so   
> this is standard per se in that set.
>
> I think Victor's problem may have been the BLATDIR env variable  
> pointing to /usr/local/bin/blat instead of /usr/local/bin - is that  
> the case victor?
>
> tests passed for me before I did the 1.5.1 release for  this module  
> so it basically works.   It definitely needs a carekeeper as lot of  
> these run modules were built during the fugu group annotation  
> project and never got audited/re-vised after that.
>
>
> -jason
>
> On Feb 10, 2006, at 11:34 AM, Brian Osborne wrote:
>
>> Victor,
>>
>> Fantastic, this is certainly a module in need, in fact there was  
>> already a note on this in the Wiki, I'll update it:
>>
>> http://bioperl.open-bio.org/wiki/Orphan_modules
>>
>> So all I did was:
>>
>> >cd bioperl-run
>> >perl -I. -w t/Blat.t
>>
>> This is the most recent bioperl-run, the live version, and all  
>> tests passed. I'd downloaded the most recent binaries and put them  
>> in my /usr/local/bin, already in my PATH. That's it.
>>
>> That's the saddest looking new() I've ever seen in Bioperl, a  
>> mixture of named and unnamed parameters like that, how bizarre.  
>> The "proper" way, of course, is to use _rearrange, and not use  
>> AUTOLOAD.
>>
>> Thanks again,
>>
>> Brian O.
>>
>>
>> On 2/10/06 11:02 AM, "Victor"  wrote:
>>
>>> Brian,
>>> I'd be happy to do that. Can you send me a quick snap on how you  
>>> got it to work first. I'd like to see what is working first,  
>>> before I start fixing things.
>>>
>>> And yes I'll take a look at the Blat.t to see more on it.
>>>
>>> Victor
>>>
>>>
>>> On 2/9/06, Brian Osborne < osborne1 at optonline.net> wrote:
>>>> Victor,
>>>>
>>>> Yes, it may be that blat is not in your path, bioperl-run/t/ 
>>>> Blat.t is
>>>> working for me even though I haven't set BLATDIR. This is using  
>>>> the latest
>>>> blat, v. 33.
>>>>
>>>> There is a problem here though, you can see it if you read  
>>>> Blat.t. The
>>>> constructor does not look like your usual new():
>>>>
>>>> my $factory = Bio::Tools::Run::Alignment::Blat->new('quiet'  => 1,
>>>>
>>>> -verbose => $verbose,
>>>>                             "DB"     => $db);
>>>>
>>>> Unfortunate - would you be willing to do more than add a useful  
>>>> SYNOPSIS and
>>>> actually fix new()? There is a subtext here, we're trying to  
>>>> find people who
>>>> would be willing to maintain useful modules like these, the  
>>>> ideal person in
>>>> this case would be someone who'd regularly use the module.
>>>>
>>>> Brian O.
>>>>
>>>>
>>>> On 2/9/06 6:22 PM, "Victor"  wrote:
>>>>
>>>> > Hi,
>>>> > Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module  
>>>> is up to date
>>>> > in the lastest bioperl release?
>>>> >
>>>> >
>>>> >
>>>> > use Bio::Tools::Run::Alignment::Blat;
>>>> > my $factory = Bio::Tools::Run::Alignment::Blat->new();
>>>> > my $seq =
>>>> > "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA";
>>>> >
>>>> > my @feats = $factory->run( $seq);
>>>> >
>>>> > Here is what I get when tring to use it:
>>>> >
>>>> > ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> > MSG: Blat call (/usr/local/bin/blat/blat -out=blast   
>>>> TGAAATAAAACTCAGTA
>>>> > /tmp/fB09bp5F76) crashed: -1
>>>> >
>>>> > Notice that it is using "blat' twice in the path. The way that  
>>>> I fixed this
>>>> > is by going to the blat.pm   module and  
>>>> changing the following lines:
>>>> > #my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
>>>> > my $str= Bio::Root::IO->catfile($self->program_name);
>>>> >
>>>> > Any ideas, maybe I'm missing the $ENV variable somewhere?
>>>> > I'd like to avoid making this change. Also does anyone have a  
>>>> known synopsis
>>>> > of this blat module (where to set the parameters, and whether  
>>>> it allows you
>>>> > to have a config file).
>>>> > I'll be happy to add a better synopsis to the module if needed.
>>>> >
>>>> > Thanks in advance,
>>>> > Victor
>>>> >
>>>> > _______________________________________________
>>>> > Bioperl-l mailing list
>>>> > Bioperl-l at lists.open-bio.org
>>>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>
>>>
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12





From hlapp at gmx.net  Fri Feb 10 16:39:39 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 10 Feb 2006 13:39:39 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <000001c62e60$9acecca0$c2987ca5@pc13>
References: <000001c62e60$9acecca0$c2987ca5@pc13>
Message-ID: 

Sohel,

please allow me to copy the list in my response. There are many good and 
insightful people on the list who may have something to add or 
different ideas.

I've come across that problem myself, for instance with InterPro. What 
I've done so far simply is to stick it unstructured into the definition 
slot, which is not helpful if your purpose goes further than just 
displaying it in an unstructured fashion.

I'm not sure you would want to create another class for this (like 
AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the 
implementation, probably not the interface) annotatable (i.e., 
implement Bio::AnnotatableI), which supposedly would be simple to do 
(AnnotationCollection is already implemented, you'd just return an 
instance of it).
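
To make the idea concrete, here is a small sketch in plain Perl. Both
packages are hypothetical stand-ins written for this mail so that it runs
without BioPerl; the method names are illustrative and are not the real
Bio::Ontology::Ontology or annotation collection API, and the header
values are made up:

```perl
use strict;
use warnings;

package Sketch::AnnotationCollection;   # stand-in for an annotation collection

sub new { return bless { by_key => {} }, shift }

# Store a value under a key; several values per key are allowed.
sub add_annotation {
    my ($self, $key, $value) = @_;
    push @{ $self->{by_key}{$key} }, $value;
    return $self;
}

# Return all values stored under a key (empty list if none).
sub get_annotations {
    my ($self, $key) = @_;
    return @{ $self->{by_key}{$key} || [] };
}

package Sketch::Ontology;               # stand-in for the ontology class

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{-name} }, $class;
}

sub name { $_[0]{name} }

# The "annotatable" part: hand back a collection, creating it on first use.
sub annotation {
    my $self = shift;
    $self->{annotation} ||= Sketch::AnnotationCollection->new;
    return $self->{annotation};
}

package main;

my $ont = Sketch::Ontology->new(-name => 'example ontology');
$ont->annotation->add_annotation(version => '1.2');         # made-up header values
$ont->annotation->add_annotation(date    => '2006-02-10');

my ($version) = $ont->annotation->get_annotations('version');
print $ont->name, " version $version\n";
```

The point is only that the ontology object owns one collection and the
file headers become keyed entries in it, so no separate header class is
needed.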

Even though tag/value pairs sound like a quick and easy way to go, I'm leaning 
against it; in essence we're moving away from that elsewhere 
(SeqFeatureI) and hence I don't think we should restart it here.

I'm not giving a definitive answer here, just my (initial) thoughts. 
Hope that helps nonetheless. Can you fancy yourself trying the 
Annotatable approach and let us know how it goes?

	-hilmar


On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:

> Hi Hilmar,
> How are you doing? I am Sohel Merchant, a programmer at dictyBase, 
> Northwestern University. I am working on a parser for an ontology 
> file. I really like the ontology object model which you have 
> contributed to Bioperl. I think its just Awesome!! One of things which 
> I thought would be great to capture is the ontology headers. Right now 
> one can specify only the name, authority information. I was wondering 
> if there is any way, I could also capture other ontology file headers 
> like version of the file, date when that ontology file was made. I was 
> thinking of making a header class or alternatively it could go as Hash 
> of values in the Bio::Ontology::Ontology class itself. I wanted to 
> know what your thoughts are on this.
>
> Thanks,
> Sohel Merchant
> dictyBase
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------





From osborne1 at optonline.net  Fri Feb 10 16:49:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 10 Feb 2006 16:49:18 -0500
Subject: [Bioperl-l] Running BLAT with BioPerl
In-Reply-To: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com>
Message-ID: 

Victor,

Just a note on "convention", excuse me if this is obvious. A few different
greps on the modules in bioperl-run show that executable() gets or sets the
full path to the program in question, program() or program_name() gets or
sets the name of the app (e.g. "blat"). program_dir() does what it sounds
like. So you're right, "($self->executable,$self->program_name)" doesn't
make sense.

I can't speak to Config::General but I'd say that my first concern would be
that things work in the normal way, either by naming parameters or by
passing an array of arguments, but not a mixture of both!
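
As an illustration of the named-parameter style, here is a sketch of a
constructor that accepts only -name => value pairs. The package name,
parameter list and file name are hypothetical, not the real Blat.pm; in
actual BioPerl code _rearrange would do this job, and it is left out here
only so the example runs without BioPerl:

```perl
use strict;
use warnings;

package Sketch::BlatFactory;   # hypothetical stand-in, not the real Blat.pm

# Accept only named parameters such as -db => ..., -quiet => 1.
sub new {
    my ($class, @args) = @_;
    die "Expected -name => value pairs\n" if @args % 2;

    my %known = map { $_ => 1 } qw(db quiet verbose);
    my %param;
    while (my ($key, $value) = splice(@args, 0, 2)) {
        (my $name = lc $key) =~ s/^-//;      # '-DB' and 'db' both become 'db'
        die "Unknown parameter '$key'\n" unless $known{$name};
        $param{$name} = $value;
    }
    return bless { %param }, $class;
}

sub db    { $_[0]{db} }
sub quiet { $_[0]{quiet} }

package main;

# 'genome.2bit' is a made-up database name for the example.
my $factory = Sketch::BlatFactory->new(-db => 'genome.2bit', -quiet => 1);
print "db=", $factory->db, " quiet=", $factory->quiet, "\n";
```

With this shape a stray unnamed argument or a misspelled key fails loudly
at construction time instead of being silently picked up by AUTOLOAD.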

Of course you're right in thinking that tying execution to parsing is a good
idea, and it looks like this is done already, just glancing at t/Blat.t.

Brian O.


On 2/10/06 3:09 PM, "Victor"  wrote:

> Hi Jason,
> Well, in my env. BLATDIR was not setup at all. When setting BLATDIR to
> /usr/local/bin, I get the same problem. I think this might have to do with
> the _run internal method/sub. If you look at that subroutine, you'll see
> that it is using both $self->executable and $self->program_name. The test
> passes fine, but we might need to write a better test for this particular
> case.
> 
> Instead of saying:
>      my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
> I think the author meant to say:
>      my $str=
> Bio::Root::IO->catfile($self->program_dir,$self->program_name);
> 
> I quickly used Data::Dumper on both executable and program_name and this is
> what I get:
> $VAR1 = 'blat';
> $VAR1 = 'blat';
> 
> So the path is hardcoded to be /usr/local/bin/blat/blat when calling run
> through the factory.
> 
> I'd like to change the constructor a bit to deal with the params a little
> better and include a config file using
> Config::General. Also, I noticed that there is another Blat.pm module, a
> parser module. Should we integrate this parser with the blat run module?
> 
> Brian/Jason. Does that sound like a good idea?
> 
> Victor
> 
> 
> On 2/10/06, Jason Stajich  wrote:
>> 
>> brian -   just FYI -
>> 
>> The AUTOLOAD stuff is present in a great number of the run modules, so this
>> is standard per se in that set.
>> 
>> I think Victor's problem may have been the BLATDIR env variable pointing
>> to /usr/local/bin/blat instead of /usr/local/bin - is that the case victor?
>> 
>> tests passed for me before I did the 1.5.1 release for this module so it
>> basically works.  It definitely needs a caretaker as a lot of these run
>> modules were built during the fugu group annotation project and never got
>> audited/revised after that.
>> 
>> 
>> -jason
>> On Feb 10, 2006, at 11:34 AM, Brian Osborne wrote:
>> 
>> Victor,
>> 
>> Fantastic, this is certainly a module in need, in fact there was already a
>> note on this in the Wiki, I'll update it:
>> 
>> http://bioperl.open-bio.org/wiki/Orphan_modules
>> 
>> So all I did was:
>> 
>>> cd bioperl-run
>>> perl -I. -w t/Blat.t
>> 
>> This is the most recent bioperl-run, the live version, and all tests
>> passed. I'd downloaded the most recent binaries and put them in my
>> /usr/local/bin, already in my PATH. That's it.
>> 
>> That's the saddest looking new() I've ever seen in Bioperl, a mixture of
>> named and unnamed parameters like that, how bizarre. The "proper" way, of
>> course, is to use _rearrange, and not use AUTOLOAD.
>> 
>> Thanks again,
>> 
>> Brian O.
>> 
>> 
>> On 2/10/06 11:02 AM, "Victor"  wrote:
>> 
>> Brian,
>> I'd be happy to do that. Can you send me a quick snap on how you got it to
>> work first. I'd like to see what is working first, before I start fixing
>> things.
>> 
>> And yes I'll take a look at the Blat.t to see more on it.
>> 
>> Victor
>> 
>> 
>> On 2/9/06, *Brian Osborne*  wrote:
>> 
>> Victor,
>> 
>> Yes, it may be that blat is not in your path, bioperl-run/t/Blat.t is
>> working for me even though I haven't set BLATDIR. This is using the latest
>> blat, v. 33.
>> 
>> There is a problem here though, you can see it if you read Blat.t. The
>> constructor does not look like your usual new():
>> 
>> my $factory = Bio::Tools::Run::Alignment::Blat->new('quiet'  => 1,
>>                                                     -verbose => $verbose,
>>                                                     "DB"     => $db);
>> 
>> Unfortunate - would you be willing to do more than add a useful SYNOPSIS
>> and actually fix new()? There is a subtext here, we're trying to find
>> people who would be willing to maintain useful modules like these, the
>> ideal person in this case would be someone who'd regularly use the module.
>> 
>> Brian O.
>> 
>> 
>> On 2/9/06 6:22 PM, "Victor"  wrote:
>> 
>>> Hi,
>>> Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module is up to
>>> date in the latest bioperl release?
>>> 
>>> 
>>> 
>>> use Bio::Tools::Run::Alignment::Blat;
>>> my $factory = Bio::Tools::Run::Alignment::Blat->new();
>>> my $seq =
>>> "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA";
>>> 
>>> my @feats = $factory->run( $seq);
>>> 
>>> Here is what I get when trying to use it:
>>> 
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: Blat call (/usr/local/bin/blat/blat -out=blast  TGAAATAAAACTCAGTA
>>> /tmp/fB09bp5F76) crashed: -1
>>> 
>>> Notice that it is using "blat" twice in the path. The way that I fixed
>>> this is by going to the Blat.pm module and changing the following lines:
>>> #my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
>>> my $str= Bio::Root::IO->catfile($self->program_name);
>>> 
>>> Any ideas, maybe I'm missing the $ENV variable somewhere?
>>> I'd like to avoid making this change. Also does anyone have a known
>>> synopsis of this blat module (where to set the parameters, and whether
>>> it allows you to have a config file)?
>>> I'll be happy to add a better synopsis to the module if needed.
>>> 
>>> Thanks in advance,
>>> Victor
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12 
>> 
>> 
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l





From heikki at sanbi.ac.za  Sat Feb 11 01:54:51 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Sat, 11 Feb 2006 08:54:51 +0200
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: 
References: <000001c62e60$9acecca0$c2987ca5@pc13>
	
Message-ID: <200602110854.52116.heikki@sanbi.ac.za>


I second Hilmar's suggestion to use Bio::Annotation::Collection for database 
(ontology database in this case) metadata. While you are at it, why not 
define, or use an existing (?), public ontology to do that? ;-)

	-Heikki

On Friday 10 February 2006 23:39, Hilmar Lapp wrote:
> Sohel,
>
> please allow me to copy the list in my response. There's many good and
> insightful people on the list who may have something to add or
> different ideas.
>
> I've come across that problem myself, for instance with InterPro. What
> I've done so far simply is to stick it unstructured into the definition
> slot, which is not helpful if your purpose goes further than just
> displaying it in an unstructured fashion.
>
> I'm not sure you would want to create another class for this (like
> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
> implementation, probably not the interface) annotatable (i.e.,
> implement Bio::AnnotatableI), which supposedly would be simple to do
> (Bio::Annotation::Collection is already implemented, you'd just return
> an instance of it).
>
> Even though tag/value pairs sound like quick&fast way to go I'm leaning
> against it; in essence we're moving away from that elsewhere
> (SeqFeatureI) and hence I don't think we should restart it here.
>
> I'm not giving a definitive answer here, just my (initial) thoughts.
> Hope that helps nonetheless. Can you fancy yourself trying the
> Annotatable approach and let us know how it goes?
>
> 	-hilmar
>
> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
> > Hi Hilmar,
> > How are you doing? I am Sohel Merchant, a programmer at dictyBase,
> > Northwestern University. I am working on a parser for an ontology
> > file. I really like the ontology object model which you have
> > contributed to Bioperl. I think it's just Awesome!! One of the things which
> > I thought would be great to capture is the ontology headers. Right now
> > one can specify only the name, authority information. I was wondering
> > if there is any way, I could also capture other ontology file headers
> > like version of the file, date when that ontology file was made. I was
> > thinking of making a header class or alternatively it could go as Hash
> > of values in the Bio::Ontology::Ontology class itself. I wanted to
> > know what your thoughts are on this.
> >
> > Thanks,
> > Sohel Merchant
> > dictyBase

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________



From hlapp at gmx.net  Sun Feb 12 00:10:35 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 11 Feb 2006 21:10:35 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <000001c62e9a$4f82eee0$c2987ca5@pc13>
References: <000001c62e9a$4f82eee0$c2987ca5@pc13>
Message-ID: <3666b00b7322d2bfe4d82129b047e5ce@gmx.net>

Sohel, please do keep the discussion on the list, in your own interest 
as there's a multitude of people who can respond to you.

SimpleValue would probably be what I'd use too. As Heikki hinted you 
might even create an ontology for annotating ontologies, which would 
allow you to use Annotation::OntologyTerm for annotation, but then 
there's no qualifier value ...
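
As a rough sketch of the first half of that, the header lines could be collected as ordered tag/value pairs in plain Perl before anything is attached to the ontology. The "tag: value" header layout, the sample lines and the sub name parse_header are all assumptions for illustration, not something the existing parsers are known to do; each pair could then be wrapped in a SimpleValue and added to the collection.

```perl
use strict;
use warnings;

# Hypothetical header layout: one "tag: value" pair per line, ending at
# the first blank line. Returns the pairs in file order as [tag, value].
sub parse_header {
    my @lines = @_;
    my @pairs;
    for my $line (@lines) {
        last if $line =~ /^\s*$/;    # a blank line ends the header
        next unless $line =~ /^([\w-]+):\s*(.*?)\s*$/;
        push @pairs, [ $1, $2 ];
    }
    return @pairs;
}

# Sample header lines, made up for illustration.
my @header = (
    "format-version: 1.0\n",
    "date: 10:02:2006 15:32\n",
    "\n",
    "[Term]\n",
);

printf "%s => %s\n", @$_ for parse_header(@header);
```

Keeping the pairs as a list rather than a hash preserves file order and allows a tag to occur more than once.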

Bioperl 1.5.1 was released last year, please check the website.

	-hilmar

On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote:

> Hi Hilmar,
>   I really like your suggestion of implementing the Bio::AnnotatableI
> interface in the Bio::Ontology::Ontology class. I am going to implement
> this and play around a little with it. I am planning to use
> Bio::Annotation::SimpleValue for annotating the header as it provides a
> good way of specifying the Tag/value pair. What are your thoughts on
> using this?
>
>   Also, I was wondering if you have any idea about the scheduled date
> for the Bioperl 1.51 release. I would like to contribute some stuff in
> the next release.
>
> Thanks,
> Sohel.
>
> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Friday, February 10, 2006 3:40 PM
> To: Sohel Merchant
> Cc: Bioperl
> Subject: Re: Bio::Ontology::Ontology
>
> Sohel,
>
> please allow me to copy the list in my response. There's many good and
> insightful people on the list who may have something to add or
> different ideas.
>
> I've come across that problem myself, for instance with InterPro. What
> I've done so far simply is to stick it unstructured into the definition
> slot, which is not helpful if your purpose goes further than just
> displaying it in an unstructured fashion.
>
> I'm not sure you would want to create another class for this (like
> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
> implementation, probably not the interface) annotatable (i.e.,
> implement Bio::AnnotatableI), which supposedly would be simple to do
> (Bio::Annotation::Collection is already implemented, you'd just return
> an instance of it).
>
> Even though tag/value pairs sound like quick&fast way to go I'm leaning
> against it; in essence we're moving away from that elsewhere
> (SeqFeatureI) and hence I don't think we should restart it here.
>
> I'm not giving a definitive answer here, just my (initial) thoughts.
> Hope that helps nonetheless. Can you fancy yourself trying the
> Annotatable approach and let us know how it goes?
>
> 	-hilmar
>
>
> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
>
>> Hi Hilmar,
>> How are you doing? I am Sohel Merchant, a programmer at dictyBase,
>> Northwestern University. I am working on a parser for an ontology
>> file. I really like the ontology object model which you have
>> contributed to Bioperl. I think it's just Awesome!! One of the things which
>> I thought would be great to capture is the ontology headers. Right now
>> one can specify only the name, authority information. I was wondering
>> if there is any way, I could also capture other ontology file headers
>> like version of the file, date when that ontology file was made. I was
>> thinking of making a header class or alternatively it could go as Hash
>> of values in the Bio::Ontology::Ontology class itself. I wanted to
>> know what your thoughts are on this.
>>
>> Thanks,
>> Sohel Merchant
>> dictyBase
>>
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------





From hjm at tacgi.com  Sun Feb 12 01:46:38 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Sat, 11 Feb 2006 22:46:38 -0800
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
Message-ID: <200602112246.38926.hjm@tacgi.com>

Hi All,

After perusing the tutorial and other docs for an evening, I still can't 
find the answer to this.  Forgive me if I've missed something obvious.

This should not be a novel request, but I've not found it answered.  If 
bioperl isn't the best way to do this, I'd be grateful for a pointer to a 
better way, especially if it includes an illuminating bit of code.

The problem is to retrieve genomic sequences plus & minus some offset from a 
locus determined by HUGO keyword or GeneID.  This would be a common followup 
chore for some extra analysis from a gene expression expt.  Or maybe this is 
in the DBFetch routines, but I've missed the sequence type to specify...?


TIA!

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 


From osborne1 at optonline.net  Sun Feb 12 11:37:39 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 12 Feb 2006 11:37:39 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602112246.38926.hjm@tacgi.com>
Message-ID: 

Harry,

Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
from its documentation:

  use Bio::DB::Fasta;

  # create database from directory of fasta files
  my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');

  # simple access (for those without Bioperl)
  my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
  my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
  my @ids     = $db->ids;
  my $length   = $db->length('CHROMOSOME_I');
  my $alphabet = $db->alphabet('CHROMOSOME_I');
  my $header   = $db->header('CHROMOSOME_I');

  # Bioperl-style access (reusing $db from above)
  my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
  my $fullseq = $obj->seq;
  my $subseq  = $obj->subseq(4_000_000 => 4_100_000);

Do you already have the offsets?
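
Once the locus coordinates are known, the padding itself is simple arithmetic. A rough sketch, where the sub name padded_range and all the numbers are made up, and coordinates are assumed to be 1-based and inclusive:

```perl
use strict;
use warnings;

# Hypothetical helper: widen a 1-based, inclusive [start, end] interval by
# $offset on both sides, clamped to the ends of the chromosome.
sub padded_range {
    my ( $start, $end, $offset, $chr_length ) = @_;
    ( $start, $end ) = ( $end, $start ) if $start > $end;    # normalise order
    my $from = $start - $offset;
    my $to   = $end + $offset;
    $from = 1           if $from < 1;
    $to   = $chr_length if $to > $chr_length;
    return ( $from, $to );
}

# Made-up locus at 4,000,500..4,002,000 on a 15 Mb chromosome, +/- 1 kb.
my ( $from, $to ) = padded_range( 4_000_500, 4_002_000, 1_000, 15_000_000 );
print "$from..$to\n";
```

The resulting pair can be handed to $db->seq('CHROMOSOME_I', $from => $to) as in the synopsis above. Getting from a HUGO name or GeneID to the coordinates is a separate lookup that this sketch does not cover.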

Brian O.


On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:

> Hi All,
> 
> After perusing the tutorial and other docs for an evening, I still can't
> find the answer to this.  Forgive me if I've missed something obvious.
> 
> This should not be a novel request, but I've not found it answered.  If
> bioperl isn't the best way to do this, I'd be grateful for a pointer to a
> better way, especially if it includes an illuminating bit of code.
> 
> The problem is to retrieve genomic sequences plus & minus some offset from a
> locus determined by HUGO keyword or GeneID.  This would be a common followup
> chore for some extra analysis from a gene expression expt.  Or maybe this is
> in the DBFetch routines, but I've missed the sequence type to specify...?
> 
> 
> TIA!




From pmiguel at purdue.edu  Sun Feb 12 15:05:47 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sun, 12 Feb 2006 15:05:47 -0500
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
	parsing	Blast	output
In-Reply-To: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
References: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
Message-ID: <43EF951B.4030601@purdue.edu>

Roger,
Just a data point, but in case you were not already aware of it, the 
characters W, K and R may be included in some DNA sequences. 'W' means 
'A' or 'T', [AT], 'K' means [TG] and 'R' means [AG] if I remember 
correctly. These are ambiguous bases, where a basecaller isn't sure, for 
example, whether a particular peak is an A or a T. Although I see these 
ambiguous bases less frequently these days, even common modern 
basecallers (such as Applied Biosystems basecallers) can generally be 
configured so they will generate them. Downstream applications may not 
like them, however.
    I may be just stating the obvious, or this might be irrelevant to 
the issue at hand. If so, my apologies.
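
For reference, a small Perl sketch of how those three codes expand. Only W, K and R plus the four plain bases are included here, and the sub name iupac_to_regex is made up for illustration:

```perl
use strict;
use warnings;

# Subset of the IUPAC nucleotide ambiguity codes: just the three
# mentioned above, plus the unambiguous bases.
my %IUPAC = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    W => 'AT',    # weak:   A or T
    K => 'GT',    # keto:   G or T
    R => 'AG',    # purine: A or G
);

# Hypothetical helper: turn a sequence into a regex source string in
# which every ambiguity code becomes a character class.
sub iupac_to_regex {
    my ($seq) = @_;
    my $pattern = '';
    for my $code ( split //, uc $seq ) {
        die "code '$code' not in this subset\n" unless exists $IUPAC{$code};
        my $bases = $IUPAC{$code};
        $pattern .= length($bases) > 1 ? "[$bases]" : $bases;
    }
    return $pattern;
}

print iupac_to_regex('ACWKR'), "\n";
```

A downstream tool that only accepts A, C, G and T would need either this kind of expansion or an outright rejection of the ambiguous positions.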

Phillip
Roger Hall wrote:
> Guys - I'm looking at the error message:
>
> MSG: no data for midline Query  1   WWWKWRW  7
> STACK Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> STACK toplevel
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>
> This is my line of thought:
> 1. "no data for midline $_" is a unique message generated by blast.pm in one
> location only at the point of a. reading three lines b. dropping lines with
> spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3)
> 2. There is a regexp match that fails in order to reach that error message
> 3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
> 4. It does anyway
> 5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
> reports
>
> I suspect a newline/chomp/metacharacter issue. Not finding the string
> anywhere has me thoroughly confused - I asked Hubert for the additional
> file, assuming that I didn't have it.
>
> My next thought is to write a quick script to test perl behavior on "Fedora
> Core 9".
>
> Thoughts?
>
> Did I misread the issue entirely? :}
>
> Roger
>
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Thursday, February 09, 2006 10:16 AM
> To: 'Jason Stajich'; 'Hubert Prielinger'
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
> output
>
>
>   
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>> Sent: Thursday, February 09, 2006 9:13 AM
>> To: Hubert Prielinger
>> Cc: Chris Fields; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>> parsing Blast output
>>
>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>     
>>> hi chris,
>>> thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>> do you have any other idea, the problem I have is that I have to parse
>>> a lot of textfiles....
>>> or shall I look for another option to parse those files...
>>>
>>> regards
>>> Hubert
>> The code from Bioperl 1.5.1 works fine for me for blast 
>> 2.2.13 reports but unless you post your blast report we can't 
>> really determine the problem.
>>
>> If you are still getting the same error like this I am not 
>> convinced you have upgraded to 1.5.1 which includes a fix in 
>> the fact that NCBI changed the HSP result format to remove 
>> the ':' from the Query/Sbjct prefixes.  We fixed this as soon 
>> as it was apparent sometime in September.
>>
>>     
>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>> STACK Bio::SearchIO::blast::next_result
>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>> STACK toplevel
>>>>>
>>>>>           
>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>
>> If you are just getting no results but also no warnings wrt 
>> parsing, are you sure your logic is correct?
>>
>> If you remove your filters do you see all the HSPS?
>>
>>
>> while (my $result = $search->next_result) {
>>     print $result->query_name, "\n";
>>     # iterate over each hit on the query sequence
>>     while (my $hit = $result->next_hit) {
>>         print $hit->name, "\n";
>>         # iterate over each HSP in the hit
>>         while (my $hsp = $hit->next_hsp) {
>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>                 $hsp->hit_string, "\n";
>>         }
>>     }
>> }
>>     
>
> I tested some of the BLAST results that Hubert sent Roger and me with a
> similar script to the above.  I removed the file parsing logic and it seemed
> to work just fine.  It may very well be a logic issue or that he hasn't
> installed the latest fix.
>     
> It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), even
> though the returned output was from nr, the top of the blast output showed
> that it was v2.2.12:  
>
> BLASTP 2.2.12 [Aug-07-2005]
>
> I double-checked my local version and it's definitely v.2.2.13:
> -------------------------------------
> C:\Perl\Scripts>blastcl3 -
>
> blastcl3 2.2.13   arguments:...
> -------------------------------------
>
> If you use RemoteBlast using the same settings, the version in the header
> looks like this:
>
> BLASTP 2.2.13 [Nov-27-2005]
>
> I'm wondering if all the blast executables (blast and netblast) from NCBI
> have text output like v.2.2.12, while the wwwblast outputs a new format
> (2.2.13).  I'll ask blast-help at NCBI about this.
>
>   
>> To clarify some stuff -
>> Chris I don't necessarily think the XML is best way forward 
>> for BLAST reports generated locally, it isn't as detailed as 
>> the Text format and it is what most people expect to be able 
>> to scroll through and parse -- it is also harder for the 
>> format to change dramatically if you have a static binary on 
>> your machine =).  I think for remoteblast the XML format 
>> should be the way forward but I expect Bioperl to maintain 
>> support of any plain text BLAST report format that people use 
>> on a regular basis.
>>
>>     
>
> Does XML lack some specific info that text output has?  Didn't know that.  I
> believe that XML should be default in RemoteBlast since it will not break,
> but I agree with you about text output.  I also agree that it will need
> somebody to maintain it constantly, much like RemoteBlast.
>
>   
>> -jason
>>     
>>> Chris Fields wrote:
>>>
>>>       
>>>> My guess is you're running into text parsing problems in 
>>>> Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>> (1.5.1) or
>>>> bioperl-live (CVS), then see the bug below.
>>>>
>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>
>>>> I think the first problem you ran into is solved in bioperl 1.5.1,
>>>> the last problem (more recent, not related to the first) has been
>>>> fixed but hasn't been committed to bioperl-live yet.  The fixed
>>>> SearchIO::blast is available in the link above, but realize it hasn't
>>>> been committed yet and may change.
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>         
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert 
>>>>> Prielinger
>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>> To: bioperl-l at bioperl.org
>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>> output
>>>>>
>>>>> Hi,
>>>>> If I want to parse a Blast Output (Version 2.2.12) with 
>>>>> Bio::SearchIO, I get the following error message:
>>>>>
>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>> STACK Bio::SearchIO::blast::next_result
>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>> STACK toplevel
>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>
>>>>> is that a bug......
>>>>>
>>>>> If I want to parse Blast Output (version 2.2.13), I don't get 
>>>>> anything.....
>>>>> I'm using bioperl 1.4
>>>>>
>>>>> Before I installed bioperl 1.4, it worked fine parsing Blast
>>>>> Output (version 2.2.12), but I don't remember which bioperl version
>>>>> I had installed
>>>>>
>>>>> thanks in advance
>>>>>
>>>>> Hubert
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>           
>>>>
>>>>
>>>>         
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>       
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>     
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign  
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   



From cjfields at uiuc.edu  Sun Feb 12 17:30:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 12 Feb 2006 16:30:07 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
	parsing	Blast	output
In-Reply-To: <43EF951B.4030601@purdue.edu>
References: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
	<43EF951B.4030601@purdue.edu>
Message-ID: <855DEC6F-8057-47BA-9D1D-9BDC16D1D83B@uiuc.edu>

Sequences are converted to FASTA format in RemoteBlast using
Bio::SeqIO, which I think includes IUPAC base and amino acid
ambiguities like you mention, so my guess is any errors (like odd
non-IUPAC letters in nucleotide or aa queries) are likely caught there.
As long as it passes Bio::SeqIO it shouldn't be a problem.  Haven't
tried this myself, though, so I can't say that with absolute certainty.
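
If one wanted to check a nucleotide query by hand before submitting it, something like the following would do. It is a plain-Perl sketch that is independent of Bio::SeqIO and makes no claim about how Bio::SeqIO validates anything; the sub name is made up:

```perl
use strict;
use warnings;

# Hypothetical pre-check: list each distinct character in a nucleotide
# query that is not an IUPAC nucleotide code.
# ACGTU are the plain bases; RYSWKMBDHVN are the ambiguity codes.
sub non_iupac_dna_chars {
    my ($seq) = @_;
    my %seen;
    return grep { !$seen{$_}++ } $seq =~ /([^ACGTURYSWKMBDHVN])/gi;
}

for my $query (qw(ACGTWWKR ACGTXZJ)) {
    my @bad = non_iupac_dna_chars($query);
    print "$query: ", ( @bad ? "unexpected @bad" : "ok" ), "\n";
}
```

Gap characters and whitespace are deliberately not allowed here, so aligned or wrapped input would need cleaning first.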

Chris



On Feb 12, 2006, at 2:05 PM, Phillip SanMiguel wrote:

> Roger,
> Just a data point, but in case you were not already aware of it, the
> characters W, K and R may be included in some DNA sequences. 'W' means
> 'A' or 'T', [AT], 'K' means [TG] and 'R' means [AG] if I remember
> correctly. These are ambiguous bases, where a basecaller isn't sure, for
> example, whether a particular peak is an A or a T. Although I see these
> ambiguous bases less frequently these days, even common modern
> basecallers (such as Applied Biosystems basecallers) can generally be
> configured so they will generate them. Downstream applications may not
> like them, however.
>     I may be just stating the obvious, or this might be irrelevant to
> the issue at hand. If so, my apologies.
>
> Phillip
> Roger Hall wrote:
>> Guys - I'm looking at the error message:
>>
>> MSG: no data for midline Query  1   WWWKWRW  7
>> STACK Bio::SearchIO::blast::next_result
>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>> STACK toplevel
>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>
>> This is my line of thought:
>> 1. "no data for midline $_" is a unique message generated by blast.pm in
>> one location only at the point of a. reading three lines b. dropping
>> lines with spaces only c. identifying the Query, Midline, and Match
>> lines (0 <= $i < 3)
>> 2. There is a regexp match that fails in order to reach that error message
>> 3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>> 4. It does anyway
>> 5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>> reports
>>
>> I suspect a newline/chomp/metacharacter issue. Not finding the string
>> anywhere has me thoroughly confused - I asked Hubert for the  
>> additional
>> file, assuming that I didn't have it.
>>
>> My next thought is to write a quick script to test perl behavior  
>> on "Fedora
>> Core 9".
>>
>> Thoughts?
>>
>> Did I misread the issue entirely? :}
>>
>> Roger
>>
>>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris  
>> Fields
>> Sent: Thursday, February 09, 2006 10:16 AM
>> To: 'Jason Stajich'; 'Hubert Prielinger'
>> Cc: bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing  
>> Blast
>> output
>>
>>
>>
>>> -----Original Message-----
>>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>> Sent: Thursday, February 09, 2006 9:13 AM
>>> To: Hubert Prielinger
>>> Cc: Chris Fields; bioperl-l at bioperl.org
>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>> parsing Blast output
>>>
>>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>
>>>> hi chris,
>>>> thanks, I have upgraded to version 1.5.1 but it isn't still
>>>>
>>> working,
>>>
>>>> do you have any ohter idea, the problem I have is that I
>>>>
>>> have to parse
>>>
>>>> a lot of textfiles....
>>>> or shall I look for another option to parse those files...
>>>>
>>>> regards
>>>> Hubert
>>>>
>>> The code from Bioperl 1.5.1 works fine for me for blast
>>> 2.2.13 reports but unless you post your blast report we can't
>>> really determine the problem.
>>>
>>> If you are still getting the same error like this I am not
>>> convinced you have upgraded to 1.5.1 which includes a fix in
>>> the fact that NCBI changed the HSP result format to remove
>>> the ':' from the Query/Sbjct prefixes.  We fixed this as soon
>>> as it was apparent sometime in September.
>>>
>>>
>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>> STACK toplevel
>>>>>>
>>>>>>
>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>> If you are just getting no results but also no warnings wrt
>>> parsing, are you sure your logic is correct?
>>>
>>> If you remove your filters do you see all the HSPS?
>>>
>>>
>>> while (my $result = $search->next_result) {
>>>     print $result->query_name, "\n";
>>>     # iterate over each hit on the query sequence
>>>     while (my $hit = $result->next_hit) {
>>>         print $hit->name, "\n";
>>>         # iterate over each HSP in the hit
>>>         while (my $hsp = $hit->next_hsp) {
>>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>                 $hsp->hit_string, "\n";
>>>         }
>>>     }
>>> }
>>>
>>
>> I tested some of the BLAST results that Hubert sent Roger and me  
>> with a
>> similar script to the above.  I removed the file parsing logic and  
>> it seemed
>> to work just fine.  It may very well be a logic issue or that he  
>> hasn't
>> installed the latest fix.
>>
>> It's a funny thing, though.  When I tried using blastcl3 (v.  
>> 2.2.13), even
>> though the returned output was from nr, the top of the blast  
>> output showed
>> that it was v2.2.12:
>>
>> BLASTP 2.2.12 [Aug-07-2005]
>>
>> I double-checked my local version and it's definitely v.2.2.13:
>> -------------------------------------
>> C:\Perl\Scripts>blastcl3 -
>>
>> blastcl3 2.2.13   arguments:...
>> -------------------------------------
>>
>> If you use RemoteBlast using the same settings, the version in the  
>> header
>> looks like this:
>>
>> BLASTP 2.2.13 [Nov-27-2005]
>>
>> I'm wondering if all the blast executables (blast and netblast)  
>> from NCBI
>> have text output like v.2.2.12, while the wwwblast outputs a new  
>> format
>> (2.2.13).  I'll ask blast-help at NCBI about this.
>>
>>
>>> To clarify some stuff -
>>> Chris I don't necessarily think the XML is best way forward
>>> for BLAST reports generated locally, it isn't as detailed as
>>> the Text format and it is what most people expect to be able
>>> to scroll through and parse -- it is also harder for the
>>> format to change dramatically if you have a static binary on
>>> your machine =).  I think for remoteblast the XML format
>>> should be the way forward but I expect Bioperl to maintain
>>> support of any plain text BLAST report format that people use
>>> on a regular basis.
>>>
>>>
>>
>> Does XML lack some specific info that text output has?  Didn't know that.
>> I believe that XML should be default in RemoteBlast since it will not
>> break, but I agree with you about text output.  I also agree that it will
>> need somebody to maintain it constantly, much like RemoteBlast.
>>
>>
>>> -jason
>>>
>>>> Chris Fields wrote:
>>>>
>>>>
>>>>> My guess is you're running into text parsing problems in
>>>>> Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>> (1.5.1) or
>>>>> bioperl-live (CVS), then see the bug below.
>>>>>
>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>
>>>>> I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>> the last problem (more recent, not related to the first) has been
>>>>> fixed but hasn't been committed to bioperl-live yet.  The fixed
>>>>> SearchIO::blast is available in the link above, but realize it hasn't
>>>>> been committed yet and may change.
>>>>>
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
>>>>> University of Illinois Urbana-Champaign
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>> Prielinger
>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>> To: bioperl-l at bioperl.org
>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>> output
>>>>>>
>>>>>> Hi,
>>>>>> If I want to parse a Blast Output (Version 2.2.12) with
>>>>>> Bio::SearchIO, I get the following error message:
>>>>>>
>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>> STACK toplevel
>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>
>>>>>> is that a bug......
>>>>>>
>>>>>> If I want to parse Blast Output (version 2.2.13), I don't get
>>>>>> anything.....
>>>>>> I'm using bioperl 1.4
>>>>>>
>>>>>> Before I installed bioperl 1.4, it worked fine parsing Blast
>>>>>> output (version 2.2.12), but I don't remember which bioperl version
>>>>>> I had installed.
>>>>>>
>>>>>> thanks in advance
>>>>>>
>>>>>> Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>> --
>>> Jason Stajich
>>> Duke University
>>> http://www.duke.edu/~jes12
>>>
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From torsten.seemann at infotech.monash.edu.au  Sun Feb 12 18:56:32 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 13 Feb 2006 10:56:32 +1100
Subject: [Bioperl-l] RemoteBlast
In-Reply-To: <004401c62c6e$da906a40$4301a8c0@LIBERAL>
References: <004401c62c6e$da906a40$4301a8c0@LIBERAL>
Message-ID: <1139788592.29375.13.camel@chauvel.csse.monash.edu.au>

Roger,

> I think that most core Bioperl folks have long since moved away from
> RemoteBlast and are using the functionality in StandAloneBlast to run their
> own local servers. 

Agreed. Even smaller centres like my workplace need the throughput that
a local PC, SMP system or Cluster can provide.

> wave of the future, but I think there is still some concern that not every
> flavor of BLAST produces XML yet. Even so, the XML parser is considered to
> be very strong, and only helps hasten the end of text-formatted support,
> since parsing text-formatted reports is the primary source of pain. 

If BioPerl switches primarily to XML parsing, the tool authors will soon
add support for XML (not very difficult really) due to BioPerl's
pervasiveness?

> I do, however, see the advantage in shifting to XML-formatted reporting and
> parsing *only* as soon as every BLAST flavor supports it, if not before.
> (Anyone - is this still an issue. Please educate me.)

The four BLAST flavours I utilise all support XML output: 
1) NCBI BLAST 2) WU-BLAST 3) MPI-BLAST 4) FSA-BLAST.
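
For anyone who wants to try the XML route, here is a minimal sketch of reading such a report with Bio::SearchIO and the 'blastxml' format. It assumes BioPerl and an XML parser module are installed, and that the report was written as XML (for NCBI blastall that is the "-m 7" output option). The sub name and the report file name are made up for illustration; the loop itself is the same next_result / next_hit / next_hsp pattern used for text reports.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print the query name, then one tab-separated line per HSP, from a BLAST
# XML report. Bio::SearchIO is loaded lazily inside the sub so that this
# file still compiles on a machine where BioPerl is not installed.
sub print_hsp_summary {
    my ($xml_file) = @_;
    require Bio::SearchIO;

    my $search = Bio::SearchIO->new(
        -format => 'blastxml',
        -file   => $xml_file,
    );

    while ( my $result = $search->next_result ) {
        print $result->query_name, "\n";
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                print join( "\t",
                    $hit->name, $hsp->evalue, $hsp->length('hit') ), "\n";
            }
        }
    }
    return;
}

# Only runs when a report path is given, e.g. (hypothetical file name):
#   perl parse_blastxml.pl report.xml
print_hsp_summary( $ARGV[0] ) if @ARGV;
```

Nothing in the loop depends on which BLAST flavour wrote the file, which is the main attraction of standardising on XML.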

> At the moment, I'm leaning towards adding an option to RemoteBlast. The
> default (no option) would use a "pure perl" implementation, and the
> enhancement (with explicit option) would merely wrap the NCBI executable.

If the API is done correctly both of these could co-exist with very
little redundant code. (I personally rarely use remote blast).

-- 
Torsten Seemann 
Victorian Bioinformatics Consortium



From torsten.seemann at infotech.monash.edu.au  Sun Feb 12 19:35:06 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 13 Feb 2006 11:35:06 +1100
Subject: [Bioperl-l] Remote BLAST support discussion
In-Reply-To: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu>
References: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>
	<1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu>
Message-ID: <1139790906.29375.27.camel@chauvel.csse.monash.edu.au>

> Mostly I think we need to try and support something that will  
> "ALWAYS" work so that individuals setting up webservices which rely  
> on remote blast functionality.  In theory, netblast/blastcl3 should  
> always work since NCBI has to update the exe when they change their  
> server setup.

What usually happens when an older 'blastcl3' binary is used on a newer
server setup? I guess it fails in a deterministic manner so the BioPerl
user can throw a useful exception.

> I also see value in providing a wrapper for netblast since it should  
> look an awful lot like running blast locally.

Agreed - they are virtually indistinguishable.

> Ideally I'd like to see a more extensible system, something like (and  
> please feel free to come up with better names for the modules!):

Do BioPerl coding standards require "::Blast" over "::BLAST" ?
(not important anyway)

> Bio::Tools::Run::Blast
>   -->             StandAlone (support for [..as many flavours as poss])
>   -->             RemoteNCBI (currently the RemoteBlast server)
>   -->             RemoteEBISOAP (EBI has a nice SOAP interface that  
>   -->             RemoteNetBlast (blastcl3 or netblast local executable)
>   (other things that people want)

Looks reasonable. I assume there's some interfaces in there like
Bio::Tools::Blast::BlastI etc.

Could probably call "RemoteNetBlast" just "RemoteNet" because it is
already in the Blast:: namespace. (not important though)
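
To make the layout above a little more concrete, here is a rough sketch of how one factory could hand back different backends behind a shared interface. Every package name in it (My::Blast and friends) is a hypothetical placeholder rather than an existing or proposed BioPerl module, and the run() methods only return a description of what a real backend would do.

```perl
use strict;
use warnings;

# All package names below are hypothetical placeholders, not BioPerl modules.

package My::Blast::BackendI;    # shared "interface" base class

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# Every backend must override run(); the base version only complains.
sub run {
    my ($self) = @_;
    die ref($self) . " does not implement run()\n";
}

package My::Blast::StandAlone;
our @ISA = ('My::Blast::BackendI');

sub run {
    my ( $self, $seq_id ) = @_;
    return "StandAlone: would run a local BLAST binary for $seq_id";
}

package My::Blast::RemoteNet;
our @ISA = ('My::Blast::BackendI');

sub run {
    my ( $self, $seq_id ) = @_;
    return "RemoteNet: would call a local network client for $seq_id";
}

package My::Blast;    # factory that hides the backend choice

my %BACKENDS = (
    standalone => 'My::Blast::StandAlone',
    remotenet  => 'My::Blast::RemoteNet',
);

sub new {
    my ( $class, %args ) = @_;
    my $type = lc( delete $args{-backend} || 'standalone' );
    my $impl = $BACKENDS{$type}
        or die "Unknown BLAST backend '$type'\n";
    return $impl->new(%args);
}

package main;

for my $backend (qw(standalone remotenet)) {
    my $factory = My::Blast->new( -backend => $backend );
    print $factory->run('ROA1_HUMAN'), "\n";
}
```

Calling code only ever sees new() and run(), so adding another backend would mean one new package and one new entry in the lookup table.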

My only suggestion for StandAlone (and RemoteNetBlast) is that they both
do a generic "run a local binary with env. vars and parameters and
capture the stdout, stderr and return code". This needs to be abstracted
away (or re-use existing code from bioperl-run?). Jason mentioned
Ensembl::Runnable as a source of code we could incorporate into Bioperl.
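
As a rough idea of what that shared helper could look like, here is a core-Perl sketch that runs a program with extra environment variables and hands back stdout, stderr and the exit code. The sub names are invented for this example, it is not taken from bioperl-run or Ensembl, and it assumes a platform where fork() behaves normally.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use POSIX ();

# Read a whole file into a string ('' for an empty file).
sub _slurp {
    my ($file) = @_;
    open my $fh, '<', $file or die "cannot read $file: $!\n";
    local $/;
    my $text = <$fh>;
    close $fh;
    return defined $text ? $text : '';
}

# Run $cmd with the arguments in @$args and the extra environment variables
# in %$env. Output goes to temporary files, so large stdout/stderr streams
# cannot block each other the way two pipes can.
sub run_local_binary {
    my ( $cmd, $args, $env ) = @_;
    $args ||= [];
    $env  ||= {};

    my ( $out_fh, $out_file ) = tempfile( UNLINK => 1 );
    my ( $err_fh, $err_file ) = tempfile( UNLINK => 1 );

    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;

    if ( $pid == 0 ) {
        # Child: adjust the environment, redirect output, then exec.
        @ENV{ keys %$env } = values %$env;
        open STDOUT, '>&', $out_fh or POSIX::_exit(126);
        open STDERR, '>&', $err_fh or POSIX::_exit(126);
        exec { $cmd } $cmd, @$args
            or syswrite( STDERR, "cannot exec $cmd: $!\n" );
        POSIX::_exit(127);
    }

    # Parent: wait for the child and collect what it wrote.
    waitpid( $pid, 0 );
    my $status = $?;
    return {
        stdout    => _slurp($out_file),
        stderr    => _slurp($err_file),
        exit_code => $status >> 8,
    };
}

# Demonstration using the running perl itself as the "local binary".
my $res = run_local_binary( $^X, [ '-e', 'print "hello\n"' ] );
print "exit=$res->{exit_code} stdout=$res->{stdout}";
```

A StandAlone or RemoteNet wrapper would then only need to build the argument list and interpret the three returned values.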

-- 
Torsten Seemann 
Victorian Bioinformatics Consortium



From cjfields at uiuc.edu  Mon Feb 13 11:45:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 13 Feb 2006 10:45:14 -0600
Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
In-Reply-To: <20060213152603.ed3f3118@dogwood.plantbio.uga.edu>
Message-ID: <001801c630bc$dd35bff0$15327e82@pyrimidine>

If you're using RemoteBlast 1.28, then you've likely updated from CVS which
isn't the latest fix.  

 

Make sure that you check the following: 

 

1) Always post to the mailing list:
http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .  

 

2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed
first.  Perform a clean installation; do not upgrade only
Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee
that mixing modules from old and new distributions (1.4 and 1.5.1, for
instance) will work.  A bioperl-1.5.1 or bioperl-live installation will
allow text output from BLAST v.2.2.12 to be saved and parsed; it will not
parse the newest BLAST text output from NCBI (v2.2.13) but it should still
save it. I believe that as long as next_result() isn't called, it will work.

 

3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output
are NOT in CVS; they haven't been cleared and checked in by Roger Hall
(who's now taking care of RemoteBlast) and the powers that be (Jason or
whomever is in charge of Bio::SearchIO).  They can be found in Bugzilla:

 

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

http://bugzilla.bioperl.org/show_bug.cgi?id=1935

 

The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of saving
XML output, so isn't necessary if you don't plan on using this option.  And,
remember, they haven't been committed to CVS yet, which means that the final
committed version may differ from these patches.
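
If it helps, a script can look at the first line of a saved text report before deciding whether to call next_result() on it. The sketch below is plain Perl with made-up sub names; the 2.2.13 cutoff is only what has been reported in this thread, not something documented by NCBI or BioPerl.

```perl
use strict;
use warnings;

# Parse a report header line such as "BLASTP 2.2.12 [Aug-07-2005]".
# Returns a hash ref, or nothing if the line does not look like a header.
sub parse_blast_header {
    my ($line) = @_;
    return
        unless defined $line
        && $line =~ /^\s*(T?BLAST[NPX])\s+(\d+(?:\.\d+)*)\s+\[([^\]]+)\]/;
    return { program => $1, version => $2, date => $3 };
}

# Compare two dotted version strings numerically, component by component,
# so that 2.2.9 sorts before 2.2.13.
sub version_cmp {
    my ( $x, $y ) = @_;
    my @x = split /\./, $x;
    my @y = split /\./, $y;
    while ( @x || @y ) {
        my $cmp = ( shift(@x) || 0 ) <=> ( shift(@y) || 0 );
        return $cmp if $cmp;
    }
    return 0;
}

# Assumption from this thread: text reports from 2.2.13 onwards need the
# patched Bio::SearchIO::blast before they can be parsed.
sub needs_patched_text_parser {
    my ($header) = @_;
    return version_cmp( $header->{version}, '2.2.13' ) >= 0 ? 1 : 0;
}

for my $line ( 'BLASTP 2.2.12 [Aug-07-2005]', 'BLASTP 2.2.13 [Nov-27-2005]' ) {
    my $header = parse_blast_header($line);
    printf "%s %s: %s\n", $header->{program}, $header->{version},
        needs_patched_text_parser($header)
        ? 'save only, or use the patched parser'
        : 'ok to parse';
}
```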

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

  _____  

From: Guojun Yang [mailto:gyang at plantbio.uga.edu] 
Sent: Monday, February 13, 2006 9:26 AM
To: Chris Fields
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

 

Hi, Chris

Thanks for your suggestion, however, it doesn't seem to work for my cgi even
after I replace both blast.pm and RemoteBlast.pm. I didn't even get any RID.
Is there any suggestion?

 

Guojun



Guojun Yang
Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun

  _____  

From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
Sent: Fri, 03 Feb 2006 16:07:29 -0500
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

I would say give the new code a try, but realize that it hasn't been checked
in (like I said below). I will try going over the modified
Bio::SearchIO::blast again this weekend to see if there is anything I might
have missed. The changed order in the header of BLAST text output has me a
bit worried that it might not catch everything, but it at least doesn't hang
in the while() loop I described in the bug report below (bug #1934) and
seems to process everything fine.

If you want more stability in the code, you might consider changing over to
XML output and parsing with Bio::SearchIO::blastxml. There are some changes
in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML
output, but I believe it parses everything regardless. If you look back the
last month or so there has been a bit of discussion here about it. Jason
describes a bit on how to set up RemoteBlast for XML:

http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Friday, February 03, 2006 1:45 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> 
> Hi, Everybody,
> I saw this post and am wondering if this is the reason for the
> malfunctioning of my webserver. We set up a webserver named MAK, for MITE
> sequence analysis. It was working very well until around November 2005,
> when it stopped returning any results (the site is up and seems to be
> doing something after submission). In the CGI script, I used RemoteBlast
> (that work was done in 2003) to do searches. I currently do not have access
> to the server because I moved. Several people have sent us emails about
> its malfunctioning. Is there any suggestion on fixing the problem? Should
> I simply ask that RemoteBlast.pm be replaced with the new version?
> Thanks a lot,
> Guojun
> 
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
> _____
> 
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-
> l at bioperl.org]
> Sent: Fri, 03 Feb 2006 10:45:23 -0500
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> 
> Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> will
> work for saving text output. However, it will not parse anything using
> next_result (it will likely hang) and will not save XML format. See these
> bugs:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> 
> for explanations and possible fixes (changes to RemoteBlast and
> Bio::SearchIO::blast). Note that these haven't been checked in yet so are
> still not included in bioperl-live; they may be further modified before
> committing to CVS. If you're not worried about XML, you could just try the
> first fix, which is a change to SearchIO::blast.
> 
> Nagesh, I remember you posting to the list a month ago using a script
> which
> had problems; the script you used saves the output but doesn't actually
> parse it (i.e. you don't use next_result() to go through the data). Is the
> version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
> parsing the output using "-readmethod => SearchIO" or "-readmethod =>
> blast"
> using your version of RemoteBlast and method next_result()? Like below
> (from
> perldoc):
> 
> while ( my @rids = $factory->each_rid ) {
>     foreach my $rid ( @rids ) {
>         my $rc = $factory->retrieve_blast($rid);
>         if( !ref($rc) ) {
>             if( $rc < 0 ) {
>                 $factory->remove_rid($rid);
>             }
>             print STDERR "." if ( $v > 0 );
>             sleep 5;
>         } else {
>             # parsing starts here
>             my $result = $rc->next_result();    # it should hang here
>             # save the output
>             my $filename = $result->query_name()."\.out";
>             $factory->save_output($filename);
>             $factory->remove_rid($rid);
>             print "\nQuery Name: ", $result->query_name(), "\n";
>             while ( my $hit = $result->next_hit ) {
>                 next unless ( $v > 0);
>                 print "\thit name is ", $hit->name, "\n";
>                 while( my $hsp = $hit->next_hsp ) {
>                     print "\t\tscore is ", $hsp->score, "\n";
>                 }
>             }
>         }
>     }
> }
> 
> 
> My script hung if I used next_result() in any way prior to the fixes. I
> want to see how many others are having the same issues with parsing using
> the CVS version of bioperl-live.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > Sent: Thursday, February 02, 2006 7:24 PM
> > To: Huang Jian; bioperl-l
> > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> >
> > Hi Huang,
> > Thanks for the message. The older version of RemoteBlast.pm works on the
> > logic of checking the temporary file size to determine whether the Blast
> > results are ready. This condition is no longer being satisfied, maybe due
> > to some changes brought about by NCBI. I had this problem recently and
> > figured out that the solution was to use the latest version which has
> > this problem fixed (does not use file size logic any more) which is not
> > yet included in the BioPerl package.
> > Cheers
> > Nagesh
> >
> > Huang Jian wrote:
> >
> > > Dear Nagesh,
> > >
> > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you sent
> > > me. Now it works perfectly!!!
> > >
> > > Thank you!!
> > >
> > > Huang
> > >
> > > ----- Original Message ----- From: "Nagesh Chakka"
> > > 
> > > To: "Huang Jian" ; "bioperl-l"
> > > 
> > > Sent: Friday, February 03, 2006 7:48 AM
> > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > > via email
> > >
> > >
> > >> Hi Huang,
> > >> I see that you are submitting a sequence for a remote blast search. Can
> > >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09)? If
> > >> not, I have attached it to this email; try replacing the old one,
> > >> which has a bug, with it.
> > >> Let me know if it works.
> > >> Nagesh
> > >
> > >
> > >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




 

 



From gyang at plantbio.uga.edu  Mon Feb 13 13:32:14 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 13 Feb 2006 13:32:14 -0500
Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
In-Reply-To: <001801c630bc$dd35bff0$15327e82@pyrimidine>
Message-ID: <20060213183214.342b90da@dogwood.plantbio.uga.edu>

Hi, Chris,  
I do have different versions of bioperl on my Linux machine (1.4 and 1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or do I need to uninstall and remove the previous versions first? I could not find any instructions on uninstalling bioperl on Linux. Could you please give me some suggestions?
Thanks,  
Guojun

Department of Plant Biology
University of Georgia
      _____  

  From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
Sent: Mon, 13 Feb 2006 11:45:14 -0500
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

  
  
If you're using RemoteBlast 1.28, then you've likely updated from CVS which isn't the latest fix.

Make sure that you check the following:

1) Always post to the mailing list: http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .

2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed first.  Perform a clean installation; do not upgrade only Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing modules from old and new distributions (1.4 and 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live installation will allow text output from BLAST v.2.2.12 to be saved and parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13) but it should still save it. I believe that as long as next_result() isn't called, it will work.

3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output are NOT in CVS; they haven't been cleared and checked in by Roger Hall (who's now taking care of RemoteBlast) and the powers that be (Jason or whomever is in charge of Bio::SearchIO).  They can be found in Bugzilla:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of saving XML output, so isn't necessary if you don't plan on using this option.  And, remember, they haven't been committed to CVS yet, which means that the final committed version may differ from these patches.
  

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign   
  
  
    _____  

    
From: Guojun Yang [mailto:gyang at plantbio.uga.edu] 
Sent: Monday, February 13, 2006 9:26 AM
To: Chris Fields
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28  
   
  
Hi, Chris  
  
Thanks for your suggestion, however, it doesn't seem to work for my cgi even after I replace both blast.pm and RemoteBlast.pm. I didn't even get any RID. Is there any suggestion?  
  
   
  
Guojun  


Guojun Yang
Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun  
    _____  

    
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
Sent: Fri, 03 Feb 2006 16:07:29 -0500
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

I would say give the new code a try, but realize that it hasn't been checked
in (like I said below). I will try going over the modified
Bio::SearchIO::blast again this weekend to see if there is anything I might
have missed. The changed order in the header of BLAST text output has me a
bit worried that it might not catch everything, but it at least doesn't hang
in the while() loop I described in the bug report below (bug #1934) and
seems to process everything fine.

If you want more stability in the code, you might consider changing over to
XML output and parsing with Bio::SearchIO::blastxml. There are some changes
in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML
output, but I believe it parses everything regardless. If you look back the
last month or so there has been a bit of discussion here about it. Jason
describes a bit on how to set up RemoteBlast for XML:

http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Friday, February 03, 2006 1:45 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> 
> Hi, Everybody,
> I saw this post and am wondering if this is the reason for the
> malfunctioning of my webserver. We set up a webserver named MAK, for MITE
> sequence analysis. It was working very well until around November 2005,
> when it stopped returning any results (the site is up and seems to be
> doing something after submission). In the CGI script, I used RemoteBlast
> (that work was done in 2003) to do searches. I currently do not have access
> to the server because I moved. Several people have sent us emails about
> its malfunctioning. Is there any suggestion on fixing the problem? Should
> I simply ask that RemoteBlast.pm be replaced with the new version?
> Thanks a lot,
> Guojun
> 
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
> _____
> 
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-
> l at bioperl.org]
> Sent: Fri, 03 Feb 2006 10:45:23 -0500
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> 
> Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> will
> work for saving text output. However, it will not parse anything using
> next_result (it will likely hang) and will not save XML format. See these
> bugs:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> 
> for explanations and possible fixes (changes to RemoteBlast and
> Bio::SearchIO::blast). Note that these haven't been checked in yet so are
> still not included in bioperl-live; they may be further modified before
> committing to CVS. If you're not worried about XML, you could just try the
> first fix, which is a change to SearchIO::blast.
> 
> Nagesh, I remember you posting to the list a month ago using a script
> which
> had problems; the script you used saves the output but doesn't actually
> parse it (i.e. you don't use next_result() to go through the data). Is the
> version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
> parsing the output using "-readmethod => SearchIO" or "-readmethod =>
> blast"
> using your version of RemoteBlast and method next_result()? Like below
> (from
> perldoc):
> 
> while ( my @rids = $factory->each_rid ) {
>     foreach my $rid ( @rids ) {
>         my $rc = $factory->retrieve_blast($rid);
>         if( !ref($rc) ) {
>             if( $rc < 0 ) {
>                 $factory->remove_rid($rid);
>             }
>             print STDERR "." if ( $v > 0 );
>             sleep 5;
>         } else {
>             # parsing starts here
>             my $result = $rc->next_result();    # it should hang here
>             # save the output
>             my $filename = $result->query_name()."\.out";
>             $factory->save_output($filename);
>             $factory->remove_rid($rid);
>             print "\nQuery Name: ", $result->query_name(), "\n";
>             while ( my $hit = $result->next_hit ) {
>                 next unless ( $v > 0);
>                 print "\thit name is ", $hit->name, "\n";
>                 while( my $hsp = $hit->next_hsp ) {
>                     print "\t\tscore is ", $hsp->score, "\n";
>                 }
>             }
>         }
>     }
> }
> 
> 
> My script hung if I used next_result() in any way prior to the fixes. I
> want to see how many others are having the same issues with parsing using
> the CVS version of bioperl-live.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > Sent: Thursday, February 02, 2006 7:24 PM
> > To: Huang Jian; bioperl-l
> > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> >
> > Hi Huang,
> > Thanks for the message. The older version of RemoteBlast.pm works on the
> > logic of checking the temporary file size to determine whether the Blast
> > results are ready. This condition is no longer being satisfied, maybe due
> > to some changes brought about by NCBI. I had this problem recently and
> > figured out that the solution was to use the latest version which has
> > this problem fixed (does not use file size logic any more) which is not
> > yet included in the BioPerl package.
> > Cheers
> > Nagesh
> >
> > Huang Jian wrote:
> >
> > > Dear Nagesh,
> > >
> > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you sent
> > > me. Now it works perfectly!!!
> > >
> > > Thank you!!
> > >
> > > Huang
> > >
> > > ----- Original Message ----- From: "Nagesh Chakka"
> > > 
> > > To: "Huang Jian" ; "bioperl-l"
> > > 
> > > Sent: Friday, February 03, 2006 7:48 AM
> > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > > via email
> > >
> > >
> > >> Hi Huang,
> > >> I see that you are submitting a sequence for a remote blast search. Can
> > >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09)? If
> > >> not, I have attached it to this email; try replacing the old one,
> > >> which has a bug, with it.
> > >> Let me know if it works.
> > >> Nagesh
> > >
> > >
> > >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


  
  
   
  
       
   
 


From cjfields at uiuc.edu  Mon Feb 13 15:39:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 13 Feb 2006 14:39:38 -0600
Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
In-Reply-To: <20060213183214.342b90da@dogwood.plantbio.uga.edu>
Message-ID: <000901c630dd$9be54f40$15327e82@pyrimidine>

How do you know two versions are installed (i.e. how are you checking the
version)?  Do you have two complete bioperl distributions (in two
separate directories) or are you looking in modules?  Here's the way to
check the version (from the FAQ):

perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'

If you have two full bioperl distributions on your computer, normally only
one will be in use unless you have explicitly set the environment variable
PERL5LIB.  The PERL5LIB  directories will be searched first before your
normal perl directory list (@INC) is searched.  You MAY get some mixing
then, but only if perl can't find a particular module in the path designated
in PERL5LIB; then it will progress through the directories listed in @INC.
This may happen if a module is unique to a particular release, but shouldn't
happen for the majority of modules, including RemoteBlast.  You can check
what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will differ
depending on your OS, perl build, etc.
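
If you want to see exactly which copy perl will pick up, something like the sketch below walks the search path in order and lists every copy of a module it finds. It is plain core Perl; the sub name is made up for this example, and it relies on the fact that PERL5LIB entries end up at the front of @INC.

```perl
use strict;
use warnings;
use File::Spec;

# Return the path of every copy of $module found in @dirs, in the order
# perl would search them. The first element is the copy that gets loaded.
sub find_module_copies {
    my ( $module, @dirs ) = @_;

    # Turn Bio::Tools::Run::RemoteBlast into Bio/Tools/Run/RemoteBlast.pm
    ( my $rel = $module ) =~ s{::}{/}g;
    $rel .= '.pm';

    my @found;
    for my $dir (@dirs) {
        next if ref $dir;    # @INC may also hold code refs or objects
        my $path = File::Spec->catfile( $dir, split m{/}, $rel );
        push @found, $path if -f $path;
    }
    return @found;
}

my @copies = find_module_copies( 'Bio::Tools::Run::RemoteBlast', @INC );
print "No copy found in \@INC\n" unless @copies;
print "Perl will load: $copies[0]\n" if @copies;
print "Shadowed copy:  $_\n" for @copies[ 1 .. $#copies ];
```

More than one line of output means two installations are visible to perl, and the "Shadowed copy" entries are the ones being ignored.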

Regardless, if you follow the directions for installing bioperl for your
system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you
explicitly change the installation directory when using 'perl Makefile.PL'),
then 'uninstalling' Bioperl shouldn't be a problem as it will install the
Bioperl distribution you downloaded over the old version in @INC.  See this
page:

http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL

for more details.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Monday, February 13, 2006 12:32 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> 
> Hi, Chris,
> I do have different versions of bioperl on my Linux machine (1.4 and
> 1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or do
> I need to uninstall and remove the previous versions first? I could not find
> any instructions on uninstalling bioperl on Linux. Could you please give me
> some suggestions?
> Thanks,
> Guojun
> 
> Department of Plant Biology
> University of Georgia
>       _____
> 
>   From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> Sent: Mon, 13 Feb 2006 11:45:14 -0500
> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> 1.28
> 
> 
> 
> If you're using RemoteBlast 1.28, then you've likely updated from CVS
> which isn't the latest fix.
> 
> Make sure that you check the following:
> 
> 1) Always post to the mailing list:
> http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> 
> 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> installed first.  Perform a clean installation; do not upgrade only
> Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> guarantee that mixing modules from old and new distributions (1.4 and
> 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> installation will allow text output from BLAST v.2.2.12 to be saved and
> parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13)
> but it should still save it. I believe as long as next_result() isn't
> called, it will work.
> 
> 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output
> are NOT in CVS; they haven't been cleared and checked in by Roger Hall
> (who's now taking care of RemoteBlast) and the powers that be (Jason or
> whomever is in charge of Bio::SearchIO).  They can be found in Bugzilla:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> 
> The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> saving XML output, so isn't necessary if you don't plan on using this
> option.  And, remember, they haven't been committed yet to CVS, which
> means that the final version will change to reflect the new version.
> 
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
>     _____
> 
> 
> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> Sent: Monday, February 13, 2006 9:26 AM
> To: Chris Fields
> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> 1.28
> 
> 
> Hi, Chris
> 
> Thanks for your suggestion, however, it doesn't seem to work for my cgi
> even after I replace both blast.pm and RemoteBlast.pm. I didn't even get
> any RID. Is there any suggestion?
> 
> 
> 
> Guojun
> 
> 
> Guojun Yang
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
>     _____
> 
> 
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> Sent: Fri, 03 Feb 2006 16:07:29 -0500
> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> 1.28
> 
> I would say give the new code a try, but realize that it hasn't been
> checked in (like I said below). I will try going over the modified
> Bio::SearchIO::blast again this weekend to see if there is anything I
> might have missed. The changed order in the header of BLAST text output
> has me a bit worried that it might not catch everything, but it at least
> doesn't hang in the while() loop I described in the bug report below
> (bug #1934) and seems to process everything fine.
> 
> If you want more stability in the code, you might consider changing over
> to XML output and parsing with Bio::SearchIO::blastxml. There are some
> changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
> saving XML output, but I believe it parses everything regardless. If you
> look back over the last month or so there has been a bit of discussion
> here about it. Jason describes a bit on how to set up RemoteBlast for XML:
> 
> http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > Sent: Friday, February 03, 2006 1:45 PM
> > To: bioperl-l at bioperl.org
> > Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> >
> > Hi, Everybody,
> > I see this post and am wondering if this is the reason for the
> > malfunctioning of my webserver. We set up a webserver named MAK, for
> > MITE sequence analysis. It was working very well until around November
> > 2005, when it stopped returning any result (the site is fine and seems
> > to be doing something after submission). In the CGI script, I used
> > remoteblast (that work was done in 2003) to do searches. I currently do
> > not have access to the server because I moved. Several people have sent
> > emails to us about its malfunctioning. Is there any suggestion on fixing
> > the problem? Should I simply ask that remoteblast.pm be replaced with
> > the new version?
> > Thanks a lot,
> > Guojun
> >
> > Department of Plant Biology
> > University of Georgia
> > Tel: 706-542-1857
> > Fax: 706-542-1805
> > http://www.arches.uga.edu/~guojun
> > _____
> >
> > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-
> > l at bioperl.org]
> > Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> >
> > Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> > will work for saving text output. However, it will not parse anything
> > using next_result (it will likely hang) and will not save XML format.
> > See these bugs:
> >
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> >
> > for explanations and possible fixes (changes to RemoteBlast and
> > Bio::SearchIO::blast). Note that these haven't been checked in yet so
> > are still not included in bioperl-live; they may be further modified
> > before committing to CVS. If you're not worried about XML, you could
> > just try the first fix, which is a change to SearchIO::blast.
> >
> > Nagesh, I remember you posting to the list a month ago using a script
> > which had problems; the script you used saves the output but doesn't
> > actually parse it (i.e. you don't use next_result() to go through the
> > data). Is the version of BLAST in your text output 2.2.12 or 2.2.13?
> > Have you tried parsing the output using "-readmethod => SearchIO" or
> > "-readmethod => blast" using your version of RemoteBlast and method
> > next_result()? Like below (from perldoc):
> >
> > while ( my @rids = $factory->each_rid ) {
> >     foreach my $rid ( @rids ) {
> >         my $rc = $factory->retrieve_blast($rid);
> >         if( !ref($rc) ) {
> >             if( $rc < 0 ) {
> >                 $factory->remove_rid($rid);
> >             }
> >             print STDERR "." if ( $v > 0 );
> >             sleep 5;
> >         } else {
> >             # parsing starts here
> >             my $result = $rc->next_result();    # it should hang here
> >             # save the output
> >             my $filename = $result->query_name()."\.out";
> >             $factory->save_output($filename);
> >             $factory->remove_rid($rid);
> >             print "\nQuery Name: ", $result->query_name(), "\n";
> >             while ( my $hit = $result->next_hit ) {
> >                 next unless ( $v > 0);
> >                 print "\thit name is ", $hit->name, "\n";
> >                 while( my $hsp = $hit->next_hsp ) {
> >                     print "\t\tscore is ", $hsp->score, "\n";
> >                 }
> >             }
> >         }
> >     }
> > }
> >
> >
> > My script hung if I used next_result() in any way prior to the fixes. I
> > want to see how many others are having the same issues with parsing
> > using the CVS version of bioperl-live.
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > > Sent: Thursday, February 02, 2006 7:24 PM
> > > To: Huang Jian; bioperl-l
> > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >
> > > Hi Huang,
> > > Thanks for the message. The older version of RemoteBlast.pm works on
> > > the logic of checking the temporary file size to determine whether the
> > > Blast results are ready. This condition is not getting satisfied,
> > > maybe due to some changes brought about by NCBI. I had this problem
> > > recently and figured out that the solution was to use the latest
> > > version, which has this problem fixed (it does not use the file size
> > > logic any more) but is not yet included in the BioPerl package.
> > > Cheers
> > > Nagesh
> > >
> > > Huang Jian wrote:
> > >
> > > > Dear Nagesh,
> > > >
> > > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send
> > > > me. Now it works perfectly!!!
> > > >
> > > > Thank you!!
> > > >
> > > > Huang
> > > >
> > > > ----- Original Message ----- From: "Nagesh Chakka"
> > > > 
> > > > To: "Huang Jian" ; "bioperl-l"
> > > > 
> > > > Sent: Friday, February 03, 2006 7:48 AM
> > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > > > via email
> > > >
> > > >
> > > >> Hi Huang,
> > > >> I see that you are submitting a sequence for a remote blast search.
> > > >> Can you check if the RemoteBlast.pm being used is v 1.28
> > > >> (2005/12/09)? If not, I have attached it to this email; try
> > > >> replacing the old one, which has a bug, with it.
> > > >> Let me know if it works.
> > > >> Nagesh
> > > >
> > > >
> > > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l



From gyang at plantbio.uga.edu  Mon Feb 13 16:00:11 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 13 Feb 2006 16:00:11 -0500
Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
Message-ID: <20060213160011.1e89108c@dogwood.plantbio.uga.edu>

Thanks, Chris,
I installed version 1.5.1 and replaced the blast.pm file with the one from your bug report. The version reported by the command you sent me is 1.5. But when I tried the script, not much changed. A portion of my remoteblast code is here:

sub search {
    local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
    local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
    local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
    local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}= 'no';
    local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
    my $query = Bio::Seq -> new ( -seq=>"$_[0]",
                                  -id=>"query",
                                  -desc=>"new seq");
    my $len=$query->length();
    @db=('nr','htgs','wgs');
    foreach my $db (@db) {
        my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
                                                        '-data' =>"$db",
                                                        '-expect'=>"$E_value");

        my $blast_report = $factory->submit_blast($query);

        my @rids = $factory->each_rid();
        foreach my $rid ( @rids ) {
            print STDERR "$rid\n";
        }
        # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
        print STDERR "waiting...";
        sleep 60;

        foreach my $rid ( @rids ) {
            my $rc = $factory->retrieve_blast($rid);
            while (!ref($rc) ) {
                if( $rc < 0 ) {
                    # retrieve_blast returns -1 on error
                    $factory->remove_rid($rid);
                    print "Error!\n";
                    send_error($email,$function,$seqname,$queryname[$ST]);
                    die "Can't retrieve $rid";
                }
                if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
                    sleep 60;
                    $rc = $factory->retrieve_blast($rid);
                }
            }
            if (ref($rc)) {
                print STDERR "Done.\n";
                while( my $result = $rc->next_result) {
                    while( my $hit = $result->next_hit()) {
                        $hit_name=$hit->name;
                        $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
                        $name=$1;
                        @left_plus_start=();
                        @left_plus_end=();
                        @left_minus_start=();
                        @left_minus_end=();
                        @right_plus_start=();
                        @right_plus_end=();
                        @right_minus_start=();
                        @right_minus_end=();

                        if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
                            while( my $hsp = $hit->next_hsp()) {
......

It was working quite well until around October last year, but it has not worked since then. When a submission is sent via the webpage, the CGI starts to work and uses ~20 MB of memory. Then it hangs; eventually the expected email is received, but without real results, although it does contain output from other parts of the script. Apparently the search sub did not return anything (I know something should be returned). Is it also possible that the format of the NCBI output for each result has changed?
Thank you,
Guojun


Department of Plant Biology
University of Georgia
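
The retrieval loop above keeps polling for as long as retrieve_blast() returns a non-reference, so a job that never finishes never exits. One way to bound the wait is sketched below; it relies only on the return convention stated in the code comments (negative on error, 0 while the job is unfinished, an object when done). The helper name wait_for_report and its interval and max_tries options are assumptions made for illustration, not part of RemoteBlast.

```perl
use strict;
use warnings;

# Poll $fetch until it returns a reference (the finished report).
# Convention taken from the thread: negative = error, 0 = job not finished.
# wait_for_report, interval and max_tries are hypothetical names.
sub wait_for_report {
    my ( $fetch, %opt ) = @_;
    my $interval  = defined $opt{interval}  ? $opt{interval}  : 60;    # seconds between polls
    my $max_tries = defined $opt{max_tries} ? $opt{max_tries} : 30;    # arbitrary default

    for my $try ( 1 .. $max_tries ) {
        my $rc = $fetch->();
        return $rc if ref $rc;                                         # results are ready
        die "retrieval failed\n" if !defined $rc || $rc < 0;
        sleep $interval if $interval > 0 && $try < $max_tries;
    }
    die "gave up after $max_tries attempts\n";
}

# Intended use with RemoteBlast (not executed here):
#   my $rc = wait_for_report( sub { $factory->retrieve_blast($rid) },
#                             interval => 60, max_tries => 30 );
#   while ( my $result = $rc->next_result ) { ... }
```

Passing the retrieval as a code reference keeps the helper independent of BioPerl, so the timeout logic can be exercised without contacting NCBI.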



----- Original Message -----
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28


> How do you know two versions are installed (i.e. how are you checking the
> version)?  Do you have two complete bioperl distributions (in two
> separate directories) or are you looking in modules?  Here's the way to
> check the version (from the FAQ):
>
> perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
>
> If you have two full bioperl distributions on your computer, normally only
> one will be in use unless you have explicitly set the environment variable
> PERL5LIB.  The PERL5LIB  directories will be searched first before your
> normal perl directory list (@INC) is searched.  You MAY get some mixing
> then, but only if perl can't find a particular module in the path designated
> in PERL5LIB; then it will progress through the directories listed in @INC.
> This may happen if a module is unique to a particular release, but shouldn't
> happen for the majority of modules, including RemoteBlast.  You can check
> what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will differ
> depending on your OS, perl build, etc.
>
> Regardless, if you follow the directions for installing bioperl for your
> system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you
> explicitly change the installation directory when using 'perl Makefile.PL'),
> then 'uninstalling' Bioperl shouldn't be a problem as it will install the
> Bioperl distribution you downloaded over the old version in @INC.  See this
> page:
>
> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
>
> for more details.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign



From akarger at CGR.Harvard.edu  Mon Feb 13 15:57:08 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 13 Feb 2006 15:57:08 -0500
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
Message-ID: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>

I'm trying to get the sequences of each exon in a gene. I have a genbank
file with mRNA and exon features (among others) that look like: 
     mRNA            join(complement(22257..22386),complement(22067..22186),
                     complement(16753..17101),complement(13840..13962),
                     complement(10649..10820),complement(502..3028))
                     /gene="ENSG00000005812"
                     /note="transcript_id=ENST00000355619"
     exon            complement(13840..13962)
                     /note="exon_id=ENSE00000802462"

I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
the mRNA above. I tried writing the below code, but it doesn't do what I
want. (You'll note that the code is stolen from the Bio::Seq and Feature
HOWTOs.)

my $inseq = Bio::SeqIO->new(-file   => "<$file", -format => $format );
while (my $seq = $inseq->next_seq) {
    my @features = $seq->get_SeqFeatures(); # just top level
    foreach my $feat ( @features ) {
        my $type = $feat->primary_tag;
        if ($type eq "mRNA") {
                print "Feature ",$feat->primary_tag,
                      " starts ",$feat->start," ends ", $feat->end,
                      " strand ",$feat->strand,"\n";
                my @feats = $feat->get_SeqFeatures();
                print "Found ", scalar @feats, " sub-features\n";
        } elsif ($type eq "exon") {
                print "Feature ",$feat->primary_tag,
                      " starts ",$feat->start," ends ", $feat->end,
                      " strand ",$feat->strand,"\n";
        }
     }
}

When I run the above, it says that the mRNA features have no sub-features.
So how do I pull out the 6 sequences?

Thanks,
- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626


From cjfields at uiuc.edu  Mon Feb 13 18:18:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 13 Feb 2006 17:18:24 -0600
Subject: [Bioperl-l] INSTALL.WIN in wiki
Message-ID: <000001c630f3$c9efa5f0$15327e82@pyrimidine>

I just added "Installing Bioperl on Windows" to the wiki.  It needs some
major updating and changes in formatting:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Jason has mentioned changing up some of the INSTALL docs for the wiki
(http://www.bioperl.org/wiki/Talk:Getting_BioPerl).  Any thoughts?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From osborne1 at optonline.net  Mon Feb 13 20:38:30 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 13 Feb 2006 20:38:30 -0500
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
In-Reply-To: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
Message-ID: 

Amir,

The idea is to look at the sub-locations in the SplitLocation object, this
is discussed in FAQ 5.2:

http://www.bioperl.org/wiki/FAQ#How_do_I_parse_the_CDS_join_or_complement_statements_in_GenBank_or_EMBL_files_to_get_the_sub-locations.3F

The sequence of the feature itself can be obtained by using the seq() method,
or spliced_seq() when the location is split (entire_seq() returns the whole
parent sequence):

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Getting_Sequences


Brian O.


On 2/13/06 3:57 PM, "Amir Karger"  wrote:

> I'm trying to get the sequences of each exon in a gene. I have a genbank
> file with mRNA and exon features (among others) that look like:
>      mRNA            join(complement(22257..22386),complement(22067..22186),
>                      complement(16753..17101),complement(13840..13962),
>                      complement(10649..10820),complement(502..3028))
>                      /gene="ENSG00000005812"
>                      /note="transcript_id=ENST00000355619"
>      exon            complement(13840..13962)
>                      /note="exon_id=ENSE00000802462"
> 
> I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
> the mRNA above. I tried writing the below code, but it doesn't do what I
> want. (You'll note that the code is stolen from the Bio::Seq and Feature
> HOWTOs.)
> 
> my $inseq = Bio::SeqIO->new(-file   => "<$file", -format => $format );
> while (my $seq = $inseq->next_seq) {
>     my @features = $seq->get_SeqFeatures(); # just top level
>     foreach my $feat ( @features ) {
>         my $type = $feat->primary_tag;
>         if ($type eq "mRNA") {
>                 print "Feature ",$feat->primary_tag,
>                       " starts ",$feat->start," ends ", $feat->end,
>                       " strand ",$feat->strand,"\n";
>                 my @feats = $feat->get_SeqFeatures();
>                 print "Found ", scalar @feats, " sub-features\n";
>         } elsif ($type eq "exon") {
>                 print "Feature ",$feat->primary_tag,
>                       " starts ",$feat->start," ends ", $feat->end,
>                       " strand ",$feat->strand,"\n";
>         }
>      }
> }
> 
> When I run the above, it says that the mRNA features have no sub-features.
> So how do I pull out the 6 sequences?
> 
> Thanks,
> - Amir Karger
> Computational Biology Group
> Bauer Center for Genomics Research
> Harvard University
> 617-496-0626




From hlapp at gmx.net  Mon Feb 13 18:58:46 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 13 Feb 2006 15:58:46 -0800
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
In-Reply-To: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
References: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
Message-ID: 

Why do you want subfeatures? This is GenBank format you're parsing,
right? Your mRNA features will have a split location. Loop over
$feat->location->each_Location() and get $seq->subseq() with the start
and end of each sublocation. If you don't know how to do this, check
out the implementation of $feature->spliced_seq().

This should be in the HOWTO. Is it not?

    -hilmar
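
The loop described above might look roughly like the sketch below. It uses each_Location(), start, end, strand and subseq() as named in the reply; the helper name exon_seqs, the FASTA id layout and the hand-rolled reverse complement (which leaves ambiguity codes untouched) are assumptions made for illustration. Bio::SeqIO is only loaded when a GenBank file is given on the command line.

```perl
use strict;
use warnings;

# Return one [id, sequence] pair per sub-location of $feat, in the order the
# sub-locations are reported. exon_seqs is a hypothetical helper name.
# $seq is the parent sequence object; $feat is a feature whose location may be split.
sub exon_seqs {
    my ( $seq, $feat ) = @_;
    my @exons;
    my $n = 0;
    for my $loc ( $feat->location->each_Location ) {
        my $dna = $seq->subseq( $loc->start, $loc->end );
        if ( defined $loc->strand && $loc->strand < 0 ) {    # complement(...)
            $dna = reverse $dna;
            $dna =~ tr/ACGTacgt/TGCAtgca/;
        }
        $n++;
        my $id = sprintf "%s_exon%d_%d-%d", $seq->display_id, $n, $loc->start, $loc->end;
        push @exons, [ $id, $dna ];
    }
    return @exons;
}

# Usage: perl exons.pl file.gb > exons.fa
if (@ARGV) {
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new( -file => $ARGV[0], -format => 'genbank' );
    while ( my $seq = $in->next_seq ) {
        for my $feat ( grep { $_->primary_tag eq 'mRNA' } $seq->get_SeqFeatures ) {
            print ">$_->[0]\n$_->[1]\n" for exon_seqs( $seq, $feat );
        }
    }
}
```

For the mRNA in the question this would give six records, one per complement(...) segment of the join.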


On 2/13/06, Amir Karger  wrote:
> I'm trying to get the sequences of each exon in a gene. I have a genbank
> file with mRNA and exon features (among others) that look like:
>      mRNA            join(complement(22257..22386),complement(22067..22186),
>                      complement(16753..17101),complement(13840..13962),
>                      complement(10649..10820),complement(502..3028))
>                      /gene="ENSG00000005812"
>                      /note="transcript_id=ENST00000355619"
>      exon            complement(13840..13962)
>                      /note="exon_id=ENSE00000802462"
>
> I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
> the mRNA above. I tried writing the below code, but it doesn't do what I
> want. (You'll note that the code is stolen from the Bio::Seq and Feature
> HOWTOs.)
>
> my $inseq = Bio::SeqIO->new(-file   => "<$file", -format => $format );
> while (my $seq = $inseq->next_seq) {
>     my @features = $seq->get_SeqFeatures(); # just top level
>     foreach my $feat ( @features ) {
>         my $type = $feat->primary_tag;
>         if ($type eq "mRNA") {
>                 print "Feature ",$feat->primary_tag,
>                       " starts ",$feat->start," ends ", $feat->end,
>                       " strand ",$feat->strand,"\n";
>                 my @feats = $feat->get_SeqFeatures();
>                 print "Found ", scalar @feats, " sub-features\n";
>         } elsif ($type eq "exon") {
>                 print "Feature ",$feat->primary_tag,
>                       " starts ",$feat->start," ends ", $feat->end,
>                       " strand ",$feat->strand,"\n";
>         }
>      }
> }
>
> When I run the above, it says that the mRNA features have no sub-features.
> So how do I pull out the 6 sequences?
>
> Thanks,
> - Amir Karger
> Computational Biology Group
> Bauer Center for Genomics Research
> Harvard University
> 617-496-0626


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From osborne1 at optonline.net  Mon Feb 13 21:11:33 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 13 Feb 2006 21:11:33 -0500
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
In-Reply-To: 
Message-ID: 

Hilmar,

It could be spelled out a bit more explicitly.

Brian O.


On 2/13/06 6:58 PM, "Hilmar Lapp"  wrote:

> This should be in the HOWTO. Is it not?




From rmb32 at cornell.edu  Mon Feb 13 17:12:10 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 13 Feb 2006 17:12:10 -0500
Subject: [Bioperl-l] game xml SeqIO
Message-ID: <43F1043A.2000205@cornell.edu>

Hi all,

Currently, the SeqIO for doing GAME XML does not seem to support writing 
(or reading?)  elements.  Am I correct?

If I am, are there any plans to add this functionality?  Can I help / do it?

If there are plans to add this, how would one distinguish SeqFeatures 
that should be rendered as  from SeqFeatures 
that should be rendered as ?  Would we do that with 
Bio::SeqFeature::Computation?  I assume that a given Seq can have 
SeqFeatures of different types associated with it (I don't know, I'm a 
bioperl newb).

Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 607-255-2360
rmb32 at cornell.edu
http://www.sgn.cornell.edu




From heikki at sanbi.ac.za  Tue Feb 14 01:59:29 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 14 Feb 2006 08:59:29 +0200
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602100906.11885.heikki@sanbi.ac.za>
References: <200602100906.11885.heikki@sanbi.ac.za>
Message-ID: <200602140859.30136.heikki@sanbi.ac.za>

I've committed an interim solution to the sequence evolution problem:

    $newseq = Bio::SeqUtils-> evolve
        ($seq, $similarity, $transition_transversion_rate);

I will go on to turn this code into a fully OO, extensible solution.
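
To make the idea concrete, here is a minimal sketch of what a call like
that could do. This is not the committed Bio::SeqUtils code. The sub name
evolve_sketch is hypothetical, and two assumptions are baked in:
$similarity is a percentage, and the third argument is a
transition:transversion ratio (2 meaning 2:1).

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my %TRANSITION   = (A => 'G', G => 'A', C => 'T', T => 'C');
my %TRANSVERSION = (A => [qw(C T)], G => [qw(C T)],
                    C => [qw(A G)], T => [qw(A G)]);

# Hypothetical stand-in for an evolve() call: substitute just enough
# distinct sites to bring identity down to about $similarity percent.
sub evolve_sketch {
    my ($dna, $similarity, $ts_tv) = @_;
    $ts_tv = 1 unless defined $ts_tv;       # assumed default of 1:1
    my @bases = split //, uc $dna;

    # Only unambiguous bases are candidates for substitution.
    my @sites = shuffle grep { exists $TRANSITION{ $bases[$_] } } 0 .. $#bases;

    # Number of sites to change, rounded to the nearest integer.
    my $n_mut = int(scalar(@bases) * (100 - $similarity) / 100 + 0.5);
    $n_mut = @sites if $n_mut > @sites;

    for my $i (@sites[0 .. $n_mut - 1]) {
        my $old = $bases[$i];
        if (rand() < $ts_tv / ($ts_tv + 1)) {
            $bases[$i] = $TRANSITION{$old};             # transition
        } else {
            my $choices = $TRANSVERSION{$old};          # transversion
            $bases[$i] = $choices->[ int(rand(@$choices)) ];
        }
    }
    return join '', @bases;
}

my $seq = 'ACGT' x 25;
my $new = evolve_sketch($seq, 80, 2);
print "$seq\n$new\n";
```

Because every chosen site is distinct and always changes to a different
base, the result differs from the input at exactly the requested number
of positions. A real model would allow multiple hits per site.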

   -Heikki


On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
> Ryan Golhar's mail got me thinking that we should have a simple framework
> for mutating sequences to a desired level. The model can then be extended
> to necessary complexity when needed by subclassing.
>
> To start with, I have been planning:
>
>
> Bio::SeqEvolution::EvolutionI - interface file
> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>         (defaults to Bio::PrimarySeq)
> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
> Bio::SeqEvolution::EvolutionI::each_seqs($count)
>        - returns an array of $count seqs
> Bio::SeqEvolution::EvolutionI::_generate_seq()
> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
>       converted to probabilities of change internally
>
>   various methods to define the extent of divergence:
>   only one to start with:
> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
>    (= 100% - identity)
>
> Bio::SeqEvolution::Factory - core class to call,
>          instantiates subclasses, Bio::SeqEvolution::DNASimple for
> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model,
>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
>
>
> Bio::SeqEvolution::DNASimple - default for nucleotides
> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>         e.g. 5 => 5:1, defaults to 1:1
>         simple alternative to a scoring matrix
>
>
> I am soliciting usual comments and suggestions about naming and minimal
> functionality.
>
>
>    -Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From gbazykin at Princeton.EDU  Tue Feb 14 09:34:54 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Tue, 14 Feb 2006 09:34:54 -0500
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602140859.30136.heikki@sanbi.ac.za>
References: <200602100906.11885.heikki@sanbi.ac.za>
	<200602140859.30136.heikki@sanbi.ac.za>
Message-ID: <214316262.20060214093454@princeton.edu>

Hi,

Just a thought: I really think that in perspective, it would be nice
to be able to evolve the sequence along a tree of given shape. I think
PAML's "evolver" has this functionality. I've already been doing this
in my scripts, but I am not sure how to couple the tree and the
sequence data properly.

Yegor (George) Bazykin


------------------------------
Tuesday, February 14, 2006, 1:59:29 AM, you wrote:

> I've committed an interim solution to the sequence evolution problem:

>     $newseq = Bio::SeqUtils-> evolve
>         ($seq, $similarity, $transition_transversion_rate);

> I will go on to transform this code to fully OO, extensible solution.

>    -Heikki


> On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
>> Ryan Golhar's mail got me thinking that we should have a simple framework
>> for mutating sequences to a desired level. The model can then be extended
>> to necessary complexity when needed by subclassing.
>>
>> To start with, I have been planning:
>>
>>
>> Bio::SeqEvolution::EvolutionI - interface file
>> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
>> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>>         (defaults to Bio::PrimarySeq)
>> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
>> Bio::SeqEvolution::EvolutionI::each_seqs($count)
>>        - returns an array of $count seqs
>> Bio::SeqEvolution::EvolutionI::_generate_seq()
>> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
>>       converted to probabilities of change internally
>>
>>   various methods to define the extent of divergence:
>>   only one to start with:
>> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
>>    (= 100% - identity)
>>
>> Bio::SeqEvolution::Factory - core class to call,
>>          instantiates subclasses, Bio::SeqEvolution::DNASimple for
>> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model,
>>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
>>
>>
>> Bio::SeqEvolution::DNASimple - default for nucleotides
>> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>>         e.g. 5 => 5:1, defaults to 1:1
>>         simple alternative to a scoring matrix
>>
>>
>> I am soliciting usual comments and suggestions about naming and minimal
>> functionality.
>>
>>
>>    -Heikki




From maximilianh at gmail.com  Tue Feb 14 05:11:42 2006
From: maximilianh at gmail.com (Maximilian Haeussler)
Date: Tue, 14 Feb 2006 11:11:42 +0100
Subject: [Bioperl-l] [BiO BB] Re:  Tool to mutate DNA sequence
In-Reply-To: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
Message-ID: <76f031ae0602140211n2a0bbf4fl@mail.gmail.com>

The tool ROSE also evolves sequences on a tree. There is a web
interface and downloadable source at
http://bibiserv.techfak.uni-bielefeld.de/rose/

Max

On 09/02/06, Jason Stajich  wrote:
> Depending on whether or not you want to use evolutionary realistic
> models...
> * evolver which comes with PAML lets you evolve sequences on a tree
> * SeqGen from Andrew Rambaut
>   http://evolve.zoo.ox.ac.uk/software.html?id=seqgen
>   also lets you do this
> I believe there are PISE interfaces to both of these at the pasteur
> bioweb site - http://bioweb.pasteur.fr/
>
> -jason
> On Feb 8, 2006, at 11:46 PM, Ryan Golhar wrote:
>
> > Does anyone know of a tool to mutate a DNA sequence by a specified
> > amount?
> > For instance, say I have a DNA sequence 1000 bases long, and I want to
> > simulate mutations to make it 75% (or 80%, etc) similar to the
> > original.
> >
> >
> > Ryan
> >
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioinformatics.Org general forum  -  BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>


--
Maximilian Haeussler,
CNRS Gif-sur-Yvette, Paris
tel: +33 6 12 82 76 16
icq: 3825815  -- msn: maximilian.haeussler at hpi.uni-potsdam.de
skype: maximilianhaeussler



From heikki at sanbi.ac.za  Tue Feb 14 11:09:27 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 14 Feb 2006 18:09:27 +0200
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <214316262.20060214093454@princeton.edu>
References: <200602100906.11885.heikki@sanbi.ac.za>
	<200602140859.30136.heikki@sanbi.ac.za>
	<214316262.20060214093454@princeton.edu>
Message-ID: <200602141809.28057.heikki@sanbi.ac.za>


Yegor,

Like you said, there are examples of how it is done. It should be possible to
evolve sequences based on a rooted tree. You just walk the tree and evolve
each sequence from its parent. If there is agreement on how the branch
lengths get translated to mutations, even that could be done. Do you have
any suggestions?
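
As a starting point for discussion, the walk could look something like
the sketch below. Everything here is an assumption for illustration: the
tree is a plain nested hash rather than a Bio::Tree object, the sub names
are made up, and a branch length is read as expected substitutions per
site, so the number of mutations is branch length times sequence length,
rounded.

```perl
use strict;
use warnings;

# Replace $n randomly chosen sites with a different base.
# Sites may be hit more than once, as in a simple substitution process.
sub mutate {
    my ($dna, $n) = @_;
    my @alphabet = qw(A C G T);
    for (1 .. $n) {
        my $i   = int(rand(length $dna));
        my $old = substr($dna, $i, 1);
        my @alt = grep { $_ ne $old } @alphabet;
        substr($dna, $i, 1) = $alt[ int(rand(@alt)) ];
    }
    return $dna;
}

# Depth-first walk: each node's sequence is evolved from its parent's.
sub evolve_tree {
    my ($node, $parent_seq, $result) = @_;
    my $n_mut = int($node->{length} * length($parent_seq) + 0.5);
    my $seq   = mutate($parent_seq, $n_mut);
    $result->{ $node->{name} } = $seq;
    evolve_tree($_, $seq, $result) for @{ $node->{children} || [] };
    return $result;
}

my $tree = {
    name => 'root', length => 0,
    children => [
        { name => 'A', length => 0.1 },
        { name => 'inner', length => 0.05, children => [
            { name => 'B', length => 0.0 },
            { name => 'C', length => 0.2 },
        ] },
    ],
};
my $seqs = evolve_tree($tree, 'ACGT' x 10, {});
print "$_\t$seqs->{$_}\n" for sort keys %$seqs;
```

With Bio::Tree nodes the same recursion would presumably use
each_Descendent() and branch_length() in place of the hash keys.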

	-Heikki



On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
> Hi,
>
> Just a thought: I really think that in perspective, it would be nice
> to be able to evolve the sequence along a tree of given shape. I think
> PAML's "evolver" has this functionality. I've already been doing this
> in my scripts, but I am not sure how to couple the tree and the
> sequence data properly.
>
> Yegor (George) Bazykin
>
>
> ------------------------------
>
> Tuesday, February 14, 2006, 1:59:29 AM, you wrote:
> > I've committed an interim solution to the sequence evolution problem:
> >
> >     $newseq = Bio::SeqUtils-> evolve
> >         ($seq, $similarity, $transition_transversion_rate);
> >
> > I will go on to transform this code to fully OO, extensible solution.
> >
> >    -Heikki
> >
> > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
> >> Ryan Golhar's mail got me thinking that we should have a simple
> >> framework for mutating sequences to a desired level. The model can then
> >> be extended to necessary complexity when needed by subclassing.
> >>
> >> To start with, I have been planning:
> >>
> >>
> >> Bio::SeqEvolution::EvolutionI - interface file
> >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
> >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
> >>         (defaults to Bio::PrimarySeq)
> >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
> >> Bio::SeqEvolution::EvolutionI::each_seqs($count)
> >>        - returns an array of $count seqs
> >> Bio::SeqEvolution::EvolutionI::_generate_seq()
> >> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
> >>       converted to probabilities of change internally
> >>
> >>   various methods to define the extent of divergence:
> >>   only one to start with:
> >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
> >>    (= 100% - identity)
> >>
> >> Bio::SeqEvolution::Factory - core class to call,
> >>          instantiates subclasses, Bio::SeqEvolution::DNASimple for
> >> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model,
> >>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
> >>
> >>
> >> Bio::SeqEvolution::DNASimple - default for nucleotides
> >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
> >>         e.g. 5 => 5:1, defaults to 1:1
> >>         simple alternative to a scoring matrix
> >>
> >>
> >> I am soliciting usual comments and suggestions about naming and minimal
> >> functionality.
> >>
> >>
> >>    -Heikki
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From golharam at umdnj.edu  Tue Feb 14 12:01:38 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 14 Feb 2006 12:01:38 -0500
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602141809.28057.heikki@sanbi.ac.za>
Message-ID: <016401c63188$52c9d4b0$2f01a8c0@GOLHARMOBILE1>

Here are my two cents....

1.  Allow sequences to be mutated by some percent amount.
2.  Use mutation patterns implied by PAM matrices or some known models
of mutation.
3.  Have the output show the original sequences and the mutated sequence
so you can easily identify what was mutated and what is conserved.
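
For point 3, something as simple as the following might do. It is only a
sketch with a made-up sub name (show_mutations), and it assumes the two
sequences have equal length, i.e. substitutions only, no indels.

```perl
use strict;
use warnings;

# Return a three-line report: original, a marker line with '|' under
# conserved positions and a blank under mutated ones, then the mutant.
sub show_mutations {
    my ($orig, $mut) = @_;
    die "sequences differ in length\n"
        unless length($orig) == length($mut);
    my $marks = join '', map {
        substr($orig, $_, 1) eq substr($mut, $_, 1) ? '|' : ' '
    } 0 .. length($orig) - 1;
    return "orig $orig\n     $marks\nmut  $mut\n";
}

print show_mutations('ACGTACGT', 'ACCTACGA');
```

The gaps in the middle line show at a glance which positions were
mutated.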

Ryan


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki
Lehvaslaiho
Sent: Tuesday, February 14, 2006 11:09 AM
To: bioperl-l at lists.open-bio.org; Georgii A Bazykin
Subject: Re: [Bioperl-l] planning sequence mutating modules



Yegor,

Like you said, there are examples of how it is done. It should be possible to
evolve sequences based on a rooted tree. You just walk the tree and evolve
each sequence from its parent. If there is agreement on how the branch
lengths get translated to mutations, even that could be done. Do you have
any suggestions?

	-Heikki



On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
> Hi,
>
> Just a thought: I really think that in perspective, it would be nice
> to be able to evolve the sequence along a tree of given shape. I think
> PAML's "evolver" has this functionality. I've already been doing this
> in my scripts, but I am not sure how to couple the tree and the
> sequence data properly.
>
> Yegor (George) Bazykin
>
>
> ------------------------------
>
> Tuesday, February 14, 2006, 1:59:29 AM, you wrote:
> > I've committed an interim solution to the sequence evolution 
> > problem:
> >
> >     $newseq = Bio::SeqUtils-> evolve
> >         ($seq, $similarity, $transition_transversion_rate);
> >
> > I will go on to transform this code to fully OO, extensible 
> > solution.
> >
> >    -Heikki
> >
> > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
> >> Ryan Golhar's mail got me thinking that we should have a simple 
> >> framework for mutating sequences to a desired level. The model can 
> >> then be extended to necessary complexity when needed by 
> >> subclassing.
> >>
> >> To start with, I have been planning:
> >>
> >>
> >> Bio::SeqEvolution::EvolutionI - interface file
> >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
> >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
> >>         (defaults to Bio::PrimarySeq)
> >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by 
> >> subclasses
> >> Bio::SeqEvolution::EvolutionI::each_seqs($count)
> >>        - returns an array of $count seqs
> >> Bio::SeqEvolution::EvolutionI::_generate_seq()
> >> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
> >>       converted to probabilities of change internally
> >>
> >>   various methods to define the extent of divergence:
> >>   only one to start with:
> >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
> >>    (= 100% - identity)
> >>
> >> Bio::SeqEvolution::Factory - core class to call,
> >>          instantiates subclasses, Bio::SeqEvolution::DNASimple for 
> >> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model,
> >>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
> >>
> >>
> >> Bio::SeqEvolution::DNASimple - default for nucleotides 
> >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
> >>         e.g. 5 => 5:1, defaults to 1:1
> >>         simple alternative to a scoring matrix
> >>
> >>
> >> I am soliciting usual comments and suggestions about naming and 
> >> minimal functionality.
> >>
> >>
> >>    -Heikki
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________



From hjm at tacgi.com  Tue Feb 14 12:15:11 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Tue, 14 Feb 2006 09:15:11 -0800
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: 
References: 
Message-ID: <200602140915.11604.hjm@tacgi.com>

Hi Brian,

Thanks very much for the pointers and the speed of your reply and apologies 
for the speed of mine.

This looks good, but what I was looking for was a bioP approach for hooking to 
an API at NCBI or EBI so I could get this info and seqs from them.  In this 
case, speed of retrieval is not critical and I'd rather not download the 
entirety of the sequences to a local disk to hack at them.

I've determined a screen-scraping approach to get them and could script that, 
but I thought that bioP had a method for using NCBI's external API's, tho it 
may be that my memory is faulty or the approach is no longer supported due to 
overload.  

Does NCBI make such APIs available anymore?  I searched a bit for docs on them 
but couldn't find anything (unless it's buried in the NCBI toolkit, which I 
haven't started to excavate).

Failing that, would SEALS provide such a service? Any PerlPinipeds listening?

Harry






On Sunday 12 February 2006 08:37, Brian Osborne wrote:
> Harry,
>
> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
> from its documentation:
>
>   use Bio::DB::Fasta;
>
>   # create database from directory of fasta files
>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>
>   # simple access (for those without Bioperl)
>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>   my @ids     = $db->ids;
>   my $length   = $db->length('CHROMOSOME_I');
>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>   my $header   = $db->header('CHROMOSOME_I');
>
>   # Bioperl-style access
>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>
>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>   my $seq     = $obj->seq;
>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>
> Do you already have the offsets?
>
> Brian O.
>
> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
> > Hi All,
> >
> > After perusing the tutorial and other docs for an evening, I still
> > can't find the answer to this.  Forgive me if I've missed something
> > obvious.
> >
> > This should not be a novel request, but I've not found it answered.  If
> > bioperl isn't the best way to do this, I'd be grateful to a pointer to a
> > better way, especially if it includes an illuminating bit of code.
> >
> > The problem is to retrieve genomic sequences plus & minus some offset
> > from a locus determined by HUGO keyword or GeneID.  This would be a
> > common followup chore for some extra analysis from a gene expression
> > expt.  Or maybe this is in the DBFetch routines, but I've missed the
> > sequence type to specify...?
> >
> >
> > TIA!

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 


From jason.stajich at duke.edu  Tue Feb 14 13:25:21 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 14 Feb 2006 13:25:21 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
References: 
	<200602140915.11604.hjm@tacgi.com>
Message-ID: <13B3724F-3716-4C4B-95A7-6849EF167A80@duke.edu>

Are you working with species that are in Ensembl?  Is what you need not  
provided by Ensembl/EnsMart? Seems like they are doing the best job  
integrating gene ids to a central place.

It is not exactly clear what API you are referring to - you can query  
Entrez via Bio::DB::Query::GenBank so if you can construct your query  
via the Entrez syntax you can access and retrieve it in bioperl.
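
Roughly like this. The helper names (entrez_gene_query, fetch_by_query)
are made up, and the [Gene Name] and [Organism] field tags are just one
way to phrase the query. The BioPerl modules are loaded only when the
fetch is actually run, since it needs BioPerl and network access. Note
that this returns whatever records match the query; it does not by itself
give you a genomic window around the locus.

```perl
use strict;
use warnings;

# Build an Entrez query string for one gene symbol in one organism.
sub entrez_gene_query {
    my ($symbol, $organism) = @_;
    return sprintf '%s[Gene Name] AND %s[Organism]', $symbol, $organism;
}

# Run the query through BioPerl and return the matching sequence objects.
sub fetch_by_query {
    my ($query_string) = @_;
    require Bio::DB::GenBank;
    require Bio::DB::Query::GenBank;
    my $query = Bio::DB::Query::GenBank->new(
        -db    => 'nucleotide',
        -query => $query_string,
    );
    my $stream = Bio::DB::GenBank->new->get_Stream_by_query($query);
    my @seqs;
    while (my $seq = $stream->next_seq) {
        push @seqs, $seq;
    }
    return @seqs;
}

my $q = entrez_gene_query('BRCA1', 'Homo sapiens');
print "Entrez query: $q\n";
# my @seqs = fetch_by_query($q);   # uncomment to actually query NCBI
```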

-jason
On Feb 14, 2006, at 12:15 PM, Harry Mangalam wrote:

> Hi Brian,
>
> Thanks very much for the pointers and the speed of your reply and  
> apologies
> for the speed of mine.
>
> This looks good, but what I was looking for was a bioP approach for  
> hooking to
> an API at NCBI or EBI so I could get this info and seqs from them.   
> In this
> case, speed of retrieval is not critical and I'd rather not  
> download the
> entirety of the sequences to a local disk to hack at them.
>
> I've determined a screen-scraping approach to get them and could  
> script that,
> but I thought that bioP had a method for using NCBI's external  
> API's, tho it
> may be that my memory is faulty or the approach is no longer  
> supported due to
> overload.
>
> Does NCBI make such APIs available anymore?  I searched a bit for  
> docs on them
> but couldn't find anything (unless it's buried in the NCBI toolkit,
> which I
> haven't started to excavate).
>
> Failing that, would SEALS provide such a service? Any PerlPinipeds  
> listening?
>
> Harry
>
>
>
>
>
>
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>>
>> Hope you're doing well. The approach could be based on  
>> Bio::DB::Fasta. So,
>> from its documentation:
>>
>>   use Bio::DB::Fasta;
>>
>>   # create database from directory of fasta files
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>
>>   # simple access (for those without Bioperl)
>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>   my @ids     = $db->ids;
>>   my $length   = $db->length('CHROMOSOME_I');
>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>   my $header   = $db->header('CHROMOSOME_I');
>>
>>   # Bioperl-style access
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>
>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>   my $seq     = $obj->seq;
>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>
>> Do you already have the offsets?
>>
>> Brian O.
>>
>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>> Hi All,
>>>
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this.  Forgive me if I've missed something
>>> obvious.
>>>
>>> This should not be a novel request, but I've not found it  
>>> answered.  If
>>> bioperl isn't the best way to do this, I'd be grateful to a  
>>> pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>>
>>> The problem is to retrieve genomic sequences plus & minus some  
>>> offset
>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>>
>>>
>>> TIA!
>
> -- 
> Cheers, Harry
> Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Tue Feb 14 13:40:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 14 Feb 2006 12:40:31 -0600
Subject: [Bioperl-l] FW:  more on RemoteBlast.pm version 1.2
Message-ID: <000e01c63196$225159d0$15327e82@pyrimidine>

Sorry, forgot to add that I didn't see the regex issue that you mentioned.
It could be a perl-related issue.  Try the fixes I mentioned and see what
happens.
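
If you want to try the program-name check from the forwarded message
below in isolation, a sketch like this should behave the same way. The
sub name valid_blast_prog is made up, and the ^...$ anchors are an
assumption; the error message only quotes the pattern t?blast[pnx].

```perl
use strict;
use warnings;

# Assumed form of the check: the quoted pattern t?blast[pnx], anchored
# here so that nothing but the five program names below gets through
# (strictly speaking 'tblastp' would pass too).
sub valid_blast_prog {
    my ($prog) = @_;
    return $prog =~ /^t?blast[pnx]$/ ? 1 : 0;
}

# Note the quotes: under 'use strict' a bareword like blastp is an error.
for my $prog ('blastp', 'blastn', 'blastx', 'tblastn', 'tblastx', 'megablast') {
    printf "%-10s %s\n", $prog, valid_blast_prog($prog) ? 'ok' : 'rejected';
}
```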

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Tuesday, February 14, 2006 12:36 PM
> To: 'gyang at plantbio.uga.edu'
> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> 
> It's a good habit to always add single quotes around words.  The perl
> interpreter may think a single bare word is a subroutine or perlfunc
> called with no args so will try to find a subroutine named blastp().  My
> debugger actually gives the error that the bare word blastp may conflict
> with a future reserved word.  Like you said, 'use strict' will point that
> out.
> 
> As for the regex, it should match all the blast programs at NCBI (blastp,
> blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
> else passes through.
> 
> So, if you are using the script below, there are several errors.  The bare
> words for $prog and $db need quotes, and the flags for your @params array
> don't have a dash before them.  I get this after adding quotes but before
> adding the dashes to @params:
> 
> C:\Perl\Scripts>test_blast.pl
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG:
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-
> live/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-
> live/Bio/Tools/Run/RemoteBlast.pm:256
> STACK: C:\Perl\Scripts\test_blast.pl:15
> -----------------------------------------------------------
> 
> The last line indicates a problem with this line:
> 
> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> 
> Changing the @params to this:
> 
> my @params=( -prog=>$prog,
> 	-data=>$db,
> 	-expect=>$e_val,
> 	-readmethod=>'SearchIO');
> 
> fixes it, and I get output as expected.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> > -----Original Message-----
> > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > Sent: Tuesday, February 14, 2006 11:48 AM
> > To: Chris Fields; bioperl-l at lists.open-bio.org
> > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >
> > Hi, Chris,
> > When I tried with the perldoc script, It did not work either. First it
> > says $prog can not be bare word if I "use strict". I added quotes on the
> > words, then it says the value for $prog does not match expression
> > t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The
> script
> > is shown below. Why is the expression "t?blast[pnx]"?
> >
> > #!/usr/bin/perl
> >
> > use Bio::SeqIO;
> > use Bio::Seq;
> > use Bio::Tools::Run::RemoteBlast;
> > use Bio::SearchIO;
> >
> >
> > my $prog=blastp;
> > my $db=swissprot;
> > my $e_val=1e-10;
> > my @params=( prog=>$prog,
> > 	data=>$db,
> > 	expect=>$e_val,
> > 	readmethod=>'SearchIO');
> > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >
> > my $v = 1;
> >
> > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> >
> > while (my $input = $str->next_seq()){
> >   #Blast a sequence against a database:
> >   #Alternatively, you could  pass in a file with many
> >   #sequences rather than loop through sequence one at a time
> >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> >   #and swap the two lines below for an example of that.
> >   my $r = $factory->submit_blast($input);
> >   #my $r = $factory->submit_blast('amino.fa');
> >   print STDERR "waiting..." if( $v > 0 );
> >   while ( my @rids = $factory->each_rid ) {
> >     foreach my $rid ( @rids ) {
> >       my $rc = $factory->retrieve_blast($rid);
> >       if( !ref($rc) ) {
> >         if( $rc < 0 ) {
> >           $factory->remove_rid($rid);
> >         }
> >         print STDERR "." if ( $v > 0 );
> >         sleep 5;
> >       } else {
> >         my $result = $rc->next_result();
> >         #save the output
> >         my $filename = $result->query_name()."\.out";
> >         $factory->save_output($filename);
> >         $factory->remove_rid($rid);
> >         print "\nQuery Name: ", $result->query_name(), "\n";
> >         while ( my $hit = $result->next_hit ) {
> >           next unless ( $v > 0);
> >           print "\thit name is ", $hit->name, "\n";
> >           while( my $hsp = $hit->next_hsp ) {
> >             print "\t\tscore is ", $hsp->score, "\n";
> >           }
> >         }
> >       }
> >     }
> >   }
> > }
> >
> > Thank you for your help!
> >
> >
> > Guojun
> > Department of Plant Biology
> > University of Georgia
> >
> > ----- Original Message -----
> > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > To: gyang at plantbio.uga.edu
> > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >
> >
> > > Try two things:
> > >
> > > 1)  Use a much simpler script, like the one in 'perldoc
> > > Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something wrong
> > > with the logic in your subroutine:
> > >
> > > my $v = 1;
> > >
> > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >
> > > while (my $input = $str->next_seq()){
> > >   #Blast a sequence against a database:
> > >   #Alternatively, you could  pass in a file with many
> > >   #sequences rather than loop through sequence one at a time
> > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >   #and swap the two lines below for an example of that.
> > >   my $r = $factory->submit_blast($input);
> > >   #my $r = $factory->submit_blast('amino.fa');
> > >   print STDERR "waiting..." if( $v > 0 );
> > >   while ( my @rids = $factory->each_rid ) {
> > >     foreach my $rid ( @rids ) {
> > >       my $rc = $factory->retrieve_blast($rid);
> > >       if( !ref($rc) ) {
> > >         if( $rc < 0 ) {
> > >           $factory->remove_rid($rid);
> > >         }
> > >         print STDERR "." if ( $v > 0 );
> > >         sleep 5;
> > >       } else {
> > >         my $result = $rc->next_result();
> > >         #save the output
> > >         my $filename = $result->query_name()."\.out";
> > >         $factory->save_output($filename);
> > >         $factory->remove_rid($rid);
> > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > >         while ( my $hit = $result->next_hit ) {
> > >           next unless ( $v > 0);
> > >           print "\thit name is ", $hit->name, "\n";
> > >           while( my $hsp = $hit->next_hsp ) {
> > >             print "\t\tscore is ", $hsp->score, "\n";
> > >           }
> > >         }
> > >       }
> > >     }
> > >   }
> > > }
> > > > 2) Try the RemoteBlast from Bugzilla and see if that works.  It
> really
> > > shouldn't make that much of a difference, but I noticed that the CVS
> > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> > > released; the Bugzilla version is based off CVS.
> > > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > > > > -----Original Message-----
> > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > Sent: Monday, February 13, 2006 3:00 PM
> > > > To: bioperl-l at lists.open-bio.org
> > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > Thanks, Chris,
> > > > I installed version 1.5.1 and replaced the blast.pm file with the
> one
> > from
> > > > your bug report. The running version is 1.5 when I use the command
> you
> > > > sent me. But when I tried the script, it doesn't change much. My
> > > > remoteblast code (portion) is here:
> > > > > > sub search {
> > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
> > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > > > 			      -id=>"query",
> > > > 			      -desc=>"new seq");
> > > > my $len=$query->length();
> > > > @db=('nr','htgs','wgs');
> > > > foreach my $db (@db) {
> > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> > > > 						'-data' =>"$db",
> > > >
'-expect'=>"$E_value");
> > > > > > > > my $blast_report = $factory->submit_blast($query);
> > > > > > my @rids = $factory->each_rid();
> > > > foreach my $rid ( @rids ) {
> > > >     print STDERR "$rid\n";
> > > > }
> > > > # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > > > print STDERR "waiting...";
> > > > sleep 60;
> > > > > > foreach my $rid ( @rids ) {
> > > >     my $rc = $factory->retrieve_blast($rid);
> > > >     while (!ref($rc) ) {
> > > > 	if( $rc < 0 ) {
> > > > # retrieve_blast returns -1 on error
> > > > 	    $factory->remove_rid($rid);
> > > > 	    print "Error!\n";
> > > > 	    send_error($email,$function,$seqname,$queryname[$ST]);
> > > > 	    die "Can't retrieve $rid";
> > > > 	} if ($rc==0) { # retrieve_blast returns 0 on 'job not
> finished'
> > > > 	    sleep 60;
> > > > 	    $rc = $factory->retrieve_blast($rid);
> > > > 	}
> > > >     }
> > > >     if (ref($rc)) {
> > > > 	print STDERR "Done.\n";
> > > > 	 while( my $result = $rc->next_result) {
> > > > 	    while( my $hit = $result->next_hit()) {
> > > > 	    	$hit_name=$hit->name;
> > > > 		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > > > 		$name=$1;
> > > > 		@left_plus_start=();
> > > > 		@left_plus_end=();
> > > > 		@left_minus_start=();
> > > > 		@left_minus_end=();
> > > > 		@right_plus_start=();
> > > > 		@right_plus_end=();
> > > > 		@right_minus_start=();
> > > > 		@right_minus_end=();
> > > > > > 		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > > > 		while( my $hsp = $hit->next_hsp()) {
> > > > ......
> > > > > > It was working quite well before around October last year, but
> > it has
> > > > stopped since then, When a submission is sent via a webpage, the cgi
> > > > starts to work and use a memory of ~20 Mb. Then it hangs there,
> > finally
> > > > the expected email is received but without real results although it
> > does
> > > > contain something from other parts of the script. Apparently the
> > search
> > > > sub did not return anything (I know there is something should be
> > > > returned.). Is it also possible the format of the NCBI output for
> each
> > > > result has changed?
> > > > Thank you,
> > > > Guojun
> > > > > > > > Department of Plant Biology
> > > > University of Georgia
> > > > > > > > > > ----- Original Message -----
> > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > > > > How do you know two versions are installed (i.e. how are
> you
> > checking
> > > > the
> > > > > version)?  Do you have two complete bioperl distributions (in
> > two
> > > > > separate directories) or are you looking in modules?  Here's the
> way
> > to
> > > > > check the version (from the FAQ):
> > > > > > perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> > > > > > If you have two full bioperl distributions on your computer,
> > normally
> > > > only
> > > > > one will be in use unless you have explicitly set the environment
> > > > variable
> > > > > PERL5LIB.  The PERL5LIB  directories will be searched first before
> > your
> > > > > normal perl directory list (@INC) is searched.  You MAY get some
> > mixing
> > > > > then, but only if perl can't find a particular module in the path
> > > > designated
> > > > > in PERL5LIB; then it will progress through the directories listed
> in
> > > > @INC.
> > > > > This may happen if a module is unique to a particular release, but
> > > > shouldn't
> > > > > happen for the majority of modules, including RemoteBlast.  You
> can
> > > > check
> > > > > what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will
> > differ
> > > > > depending on your OS, perl build, etc.
> > > > > > Regardless, if you follow the directions for installing bioperl
> > for
> > > > your
> > > > > system ('perl Makefile.PL', 'make', 'make test', 'make install',
> > unless
> > > > you
> > > > > explicitly change the installation directory when using 'perl
> > > > Makefile.PL'),
> > > > > then 'uninstalling' Bioperl shouldn't be a problem as it will
> > install
> > > > the
> > > > > Bioperl distribution you downloaded over the old version in @INC.
> > See
> > > > this
> > > > > page:
> > > > > > http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > > > > > for more details.
> > > > > > Christopher Fields
> > > > > Postdoctoral Researcher - Switzer Lab
> > > > > Dept. of Biochemistry
> > > > > University of Illinois Urbana-Champaign
> > > > > > > > -----Original Message-----
> > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > Sent: Monday, February 13, 2006 12:32 PM
> > > > > > To: bioperl-l at lists.open-bio.org
> > > > > > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > > > Hi, Chris,
> > > > > > I do have different versions of bioperl on my Linux machine
> (1.4.
> > and
> > > > > > 1.5.0), this may be the problem. Should I just install bioperl-
> > 1.5.1
> > > > or I
> > > > > > need to uninstall and remove the previous versions. I could not
> > find
> > > > any
> > > > > > hint on uninstalling bioperl on linux. Could you please give me
> > some
> > > > > > suggestion?
> > > > > > Thanks,
> > > > > > Guojun
> > > > > > > > Department of Plant Biology
> > > > > > University of Georgia
> > > > > >       _____
> > > > > > > >   From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > > Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > version
> > > > > > 1.28
> > > > > > > > > > > > If you're using RemoteBlast 1.28, then you've likely
> > > > updated from CVS
> > > > > > which isn't the latest fix.
> > > > > > > > Make sure that you check the following:
> > > > > > > > 1) Always post to the mailing list:
> > > > > > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > > > > > > > 2) You must have the complete bioperl-1.5.1 or bioperl-live
> > (CVS)
> > > > > > installed first.  Perform a clean installation; do not upgrade
> > only
> > > > > > Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we
> can't
> > > > > > guarantee that mixing modules from old and new distributions
> (1.4
> > and
> > > > > > 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > > > > > installation will allow text output from BLAST v.2.2.12 to be
> > saved
> > > > and
> > > > > > parsed; it will not parse the newest BLAST text output from NCBI
> > > > (v2.2.13)
> > > > > > but it should still save it. I believe as long as next_result()
> > isn't
> > > > > > called, it will work.
> > > > > > > > 3) The bug fixes for the above issue with parsing BLAST
> 2.2.13
> > > > text output
> > > > > > are NOT in CVS; they haven't been cleared and checked in by
> Roger
> > Hall
> > > > > > (who's now taking care of RemoteBlast) and the powers that be
> > (Jason
> > > > or
> > > > > > whomever is in charge of Bio::SearchIO).  They can be found in
> > > > Bugzilla:
> > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > > > The fix in RemoteBlast in Bugzilla (#1935) is to allow the
> > option
> > > > of
> > > > > > saving XML output, so isn't necessary if you don't plan on using
> > this
> > > > > > option.  And, remember, they haven't been committed yet to CVS,
> > which
> > > > > > means that the final version will change to reflect the new
> version.
> > > > > > > > > > Christopher Fields
> > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > Dept. of Biochemistry
> > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > >     _____
> > > > > > > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > > > > Sent: Monday, February 13, 2006 9:26 AM
> > > > > > To: Chris Fields
> > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > version
> > > > > > 1.28
> > > > > > > > > > Hi, Chris
> > > > > > > > Thanks for your suggestion, however, it doesn't seem to work
> > for
> > > > my cgi
> > > > > > even after I replace both blast.pm and RemoteBlast.pm. I didn't
> > even
> > > > get
> > > > > > any RID. Is there any suggestion?
> > > > > > > > > > > > Guojun
> > > > > > > > > > Guojun Yang
> > > > > > Department of Plant Biology
> > > > > > University of Georgia
> > > > > > Tel: 706-542-1857
> > > > > > Fax: 706-542-1805
> > > > > > http://www.arches.uga.edu/~guojun
> > > > > >     _____
> > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > > > > > Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > version
> > > > > > 1.28
> > > > > > > > I would say give the new code a try, but realize that it
> > hasn't
> > > > been
> > > > > > checked
> > > > > > in (like I said below). I will try going over the modified
> > > > > > Bio::SearchIO::blast again this weekend to see if there is
> > anything I
> > > > > > might
> > > > > > have missed. The changed order in the header of BLAST text
> output
> > has
> > > > me a
> > > > > > bit worried that it might not catch everything, but it at least
> > > > doesn't
> > > > > > hang
> > > > > > in the while() loop I described in the bug report below (bug
> > #1934)
> > > > and
> > > > > > seems to process everything fine.
> > > > > > > > If you want more stability in the code, you might consider
> > > > changing over
> > > > > > to
> > > > > > XML output and parsing with Bio::SearchIO::blastxml. There are
> > some
> > > > > > changes
> > > > > > in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
> > saving
> > > > XML
> > > > > > output, but I believe it parses everything regardless. If you
> look
> > > > back
> > > > > > the
> > > > > > last month or so there has been a bit of discussion here about
> it.
> > > > Jason
> > > > > > describes a bit on how to set up RemoteBlast for XML:
> > > > > > > > http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> > > > > > > > Christopher Fields
> > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > Dept. of Biochemistry
> > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > -----Original Message-----
> > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > Sent: Friday, February 03, 2006 1:45 PM
> > > > > > > To: bioperl-l at bioperl.org
> > > > > > > Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> > version
> > > > 1.28
> > > > > > >
> > > > > > > Hi, Everybody,
> > > > > > > I see this post and am wondering if this is the reason for the
> > > > > > > malfunctioning of my webserver. We set up a webserver named
> > MAK,
> > > > for
> > > > > > MITE
> > > > > > > sequence analysis. It was working very well until around
> > November
> > > > 2005,
> > > > > > > when it stopped returning any result (the site is fine and
> seems
> > to
> > > > be
> > > > > > > doing something after submission). In the CGI script, I used
> > remoteblast
> > > > (that
> > > > > > > work was done in 2003) to do searches. I currently do not have
> > > > access to
> > > > > > > the server because I moved. Quite several people sent emails
> to
> > us
> > > > about
> > > > > > > its malfunctioning. Is there any suggestion on fixing the
> > problem?
> > > > > > Should
> > > > > > > I simply ask that the remoteblast.pm be replaced with the new
> > version?
> > > > > > > Thanks a lot,
> > > > > > > Guojun
> > > > > > >
> > > > > > > Department of Plant Biology
> > > > > > > University of Georgia
> > > > > > > Tel: 706-542-1857
> > > > > > > Fax: 706-542-1805
> > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > > _____
> > > > > > >
> > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
> > Jian'
> > > > > > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> [mailto:bioperl-
> > > > > > > l at bioperl.org]
> > > > > > > Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > >
> > > > > > > Like Nagesh says, try the latest RemoteBlast from bioperl-live
> > CVS.
> > > > It
> > > > > > > will
> > > > > > > work for saving text output. However, it will not parse
> anything
> > > > using
> > > > > > > next_result (it will likely hang) and will not save XML
> format.
> > See
> > > > > > these
> > > > > > > bugs:
> > > > > > >
> > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > >
> > > > > > > for explanations and possible fixes (changes to RemoteBlast
> and
> > > > > > > Bio::SearchIO::blast). Note that these haven't been checked in
> > yet
> > > > so
> > > > > > are
> > > > > > > still not included in bioperl-live; they may be further
> modified
> > > > before
> > > > > > > committing to CVS. If you're not worried about XML, you could
> > just
> > > > try
> > > > > > the
> > > > > > > first fix, which is a change to SearchIO::blast.
> > > > > > >
> > > > > > > Nagesh, I remember you posting to the list a month ago using a
> > > > script
> > > > > > > which
> > > > > > > had problems; the script you used saves the output but doesn't
> > > > actually
> > > > > > > parse it (i.e. you don't use next_result() to go through the
> > data).
> > > > Is
> > > > > > the
> > > > > > > version of BLAST in your text output 2.2.12 or 2.2.13? Have
> you
> > > > tried
> > > > > > > parsing the output using "-readmethod => SearchIO" or "-
> > readmethod
> > > > =>
> > > > > > > blast"
> > > > > > > using your version of RemoteBlast and method next_result()?
> Like
> > > > below
> > > > > > > (from
> > > > > > > perldoc):
> > > > > > >
> > > > > > > while ( my @rids = $factory->each_rid ) {
> > > > > > > foreach my $rid ( @rids ) {
> > > > > > > my $rc = $factory->retrieve_blast($rid);
> > > > > > > if( !ref($rc) ) {
> > > > > > > if( $rc < 0 ) {
> > > > > > > $factory->remove_rid($rid);
> > > > > > > }
> > > > > > > print STDERR "." if ( $v > 0 );
> > > > > > > sleep 5;
> > > > > > > } else { # parsing starts here
> > > > > > > my $result = $rc->next_result(); # it should hang here
> > > > > > > #save the output
> > > > > > > my $filename = $result->query_name()."\.out";
> > > > > > > $factory->save_output($filename);
> > > > > > > $factory->remove_rid($rid);
> > > > > > > print "\nQuery Name: ", $result->query_name(), "\n";
> > > > > > > while ( my $hit = $result->next_hit ) {
> > > > > > > next unless ( $v > 0);
> > > > > > > print "\thit name is ", $hit->name, "\n";
> > > > > > > while( my $hsp = $hit->next_hsp ) {
> > > > > > > print "\t\tscore is ", $hsp->score, "\n";
> > > > > > > }
> > > > > > > }
> > > > > > > }
> > > > > > > }
> > > > > > > }
> > > > > > > }
> > > > > > >
> > > > > > >
> > > > > > > My script hung if I used next_result() in any way prior to
> the
> > > > fixes.
> > > > > > I
> > > > > > > want to see how many others are having the same issues with
> > parsing
> > > > > > using
> > > > > > > the CVS version of bioperl-live.
> > > > > > >
> > > > > > > Christopher Fields
> > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > Dept. of Biochemistry
> > > > > > > University of Illinois Urbana-Champaign
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> l-
> > > > > > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > > > > > > > Sent: Thursday, February 02, 2006 7:24 PM
> > > > > > > > To: Huang Jian; bioperl-l
> > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > >
> > > > > > > > Hi Huang,
> > > > > > > > Thanks for the message. The older version of RemoteBlast.pm
> > works
> > > > on
> > > > > > the
> > > > > > > > logic of checking the temporary file size to determine
> whether
> > the
> > > > > > Blast
> > > > > > > > results are ready. This condition is not getting satisfied
> may
> > be
> > > > due
> > > > > > to
> > > > > > > > some changes brought about by NCBI. I had this problem
> > recently
> > > > and
> > > > > > > > figured out that the solution was to use the latest version
> > which
> > > > has
> > > > > > > > this problem fixed (does not use file size logic any more)
> > which
> > > > is
> > > > > > not
> > > > > > > > yet included in the BioPerl package.
> > > > > > > > Cheers
> > > > > > > > Nagesh
> > > > > > > >
> > > > > > > > Huang Jian wrote:
> > > > > > > >
> > > > > > > > > Dear Nagesh,
> > > > > > > > >
> > > > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28
> > you
> > > > send
> > > > > > > > > me. Now it works perfectly!!!
> > > > > > > > >
> > > > > > > > > Thank you!!
> > > > > > > > >
> > > > > > > > > Huang
> > > > > > > > >
> > > > > > > > > ----- Original Message ----- From: "Nagesh Chakka"
> > > > > > > > > 
> > > > > > > > > To: "Huang Jian" ; "bioperl-l"
> > > > > > > > > 
> > > > > > > > > Sent: Friday, February 03, 2006 7:48 AM
> > > > > > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the
> net,
> > so
> > > > still
> > > > > > > > > via email
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >> Hi Huang,
> > > > > > > > >> I see that you are submitting a sequence for a remote
> blast
> > > > search.
> > > > > > > Can
> > > > > > > > >> you check if the RemoteBlast.pm being used is v 1.28
> > > > (2005/12/09).
> > > > > > If
> > > > > > > > >> not I have attached it with this email, try to replace it
> > with
> > > > the
> > > > > > > old
> > > > > > > > >> one which has a bug.
> > > > > > > > >> Let me know if it works.
> > > > > > > > >> Nagesh
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Bioperl-l mailing list
> > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Bioperl-l mailing list
> > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > Bioperl-l mailing list
> > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > > > > > > > > > > > > > > > > > > >
> > > > _______________________________________________
> > > > > > Bioperl-l mailing list
> > > > > > Bioperl-l at lists.open-bio.org
> > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > >
> > > > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >



From sdavis2 at mail.nih.gov  Tue Feb 14 15:02:59 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 14 Feb 2006 15:02:59 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
Message-ID: 

You can get the upstream regions for genes via the table browser at
UCSC.  If you want to do it yourself, just download their refGene table (as
a tab-delimited text file) that includes the HUGO gene name.  Then, use the
method given by Brian to look up the locations.  The genome just isn't THAT
big to download and to store locally.  Note that most of the big sites (like
NCBI, for example) impose restrictions on the number and timing of hits, so
utilizing them for high-throughput analysis (like for gene expression
studies) is not always feasible.  I have found that having the data locally
is almost always better.
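
A minimal sketch of the local approach, in case it helps. It only works
out the upstream window for each gene; it does not fetch any sequence.
The column order (symbol, chrom, strand, txStart, txEnd), the gene names
and the coordinates are all made up for illustration, and the real
refGene layout should be checked against the UCSC schema before use.
It also assumes UCSC-style 0-based, half-open coordinates.

```perl
use strict;
use warnings;

# Return (start, end) of the window $len bases upstream of a transcript.
# Assumes 0-based, half-open coordinates (start inclusive, end exclusive).
sub upstream_window {
    my ($strand, $tx_start, $tx_end, $len) = @_;
    if ($strand eq '+') {
        my $start = $tx_start - $len;
        $start = 0 if $start < 0;    # clamp at the chromosome start
        return ($start, $tx_start);
    }
    # Minus strand: upstream lies beyond txEnd on the forward axis.
    # Not clamped here, because the chromosome length is not known.
    return ($tx_end, $tx_end + $len);
}

# Parse one tab-delimited line.
# HYPOTHETICAL column order: symbol, chrom, strand, txStart, txEnd.
sub parse_gene_line {
    my ($line) = @_;
    chomp $line;
    my ($symbol, $chrom, $strand, $tx_start, $tx_end) = split /\t/, $line;
    return {
        symbol   => $symbol,
        chrom    => $chrom,
        strand   => $strand,
        tx_start => $tx_start,
        tx_end   => $tx_end,
    };
}

# Made-up records standing in for lines read from the downloaded table.
my @lines = (
    "GENE_A\tchr1\t+\t5000\t9000\n",
    "GENE_B\tchr2\t-\t20000\t26000\n",
);

for my $line (@lines) {
    my $g = parse_gene_line($line);
    my ($start, $end) =
        upstream_window(@{$g}{qw(strand tx_start tx_end)}, 1000);
    print join("\t", $g->{symbol}, $g->{chrom}, $start, $end, $g->{strand}),
        "\n";
}
```

The resulting windows could then be handed to a local sequence lookup
such as the Bio::DB::Fasta calls Brian quoted. As far as I know that
module expects 1-based inclusive positions, so the start would need +1,
and for minus-strand genes the two positions would be passed in reverse
order to get the reverse complement.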

Sean
 


On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:

> Hi Brian,
> 
> Thanks very much for the pointers and the speed of your reply and apologies
> for the speed of mine.
> 
> This looks good, but what I was looking for was a bioP approach for hooking to
> an API at NCBI or EBI so I could get this info and seqs from them.  In this
> case, speed of retrieval is not critical and I'd rather not download the
> entirety of the sequences to a local disk to hack at them.
> 
> I've determined a screen-scraping approach to get them and could script that,
> but I thought that bioP had a method for using NCBI's external APIs, though it
> may be that my memory is faulty or the approach is no longer supported due to
> overload.  
> 
> Does NCBI make such APIs available anymore?  I searched a bit for docs on them
> but couldn't find anything (unless it's buried in the NCBI toolkit, which I
> haven't started to excavate).
> 
> Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
> 
> Harry
> 
> 
> 
> 
> 
> 
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>> 
>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
>> from its documentation:
>> 
>>   use Bio::DB::Fasta;
>> 
>>   # create database from directory of fasta files
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   # simple access (for those without Bioperl)
>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>   my @ids     = $db->ids;
>>   my $length   = $db->length('CHROMOSOME_I');
>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>   my $header   = $db->header('CHROMOSOME_I');
>> 
>>   # Bioperl-style access
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>   my $seq     = $obj->seq;
>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>> 
>> Do you already have the offsets?
>> 
>> Brian O.
>> 
>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>> Hi All,
>>> 
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this.  Forgive me if I've missed something
>>> obvious.
>>> 
>>> This should not be a novel request, but I've not found it answered.  If
>>> bioperl isn't the best way to do this, I'd be grateful for a pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>> 
>>> The problem is to retrieve genomic sequences plus & minus some offset
>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>> 
>>> 
>>> TIA!




From cjfields at uiuc.edu  Tue Feb 14 15:32:42 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 14 Feb 2006 14:32:42 -0600
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki,
	problems with bioperl-db
Message-ID: <001201c631a5$ce7496f0$15327e82@pyrimidine>

Hilmar, 

Good News: I've added a section to the bioperl wiki on installing bioperl-db
in Windows:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db

Bad News:  There's a new problem now. I updated from CVS yesterday; I walked
through the steps and ran 'nmake test', with everything passing fine.
However, load_seqdatabase.pl is extremely slow; it's loading a sequence
every 5 minutes or so.  I noticed (when using '-debug') that it is hanging
up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a database,
load the biosql schema, and load sequences w/o loading taxonomy, the problem
goes away.

Here's the debugging output (I cut it off at the point it hangs up):
----------------------------------------------------------------------------
-------------------------
C:\Perl\src\bioperl\bioperl-db\scripts\biosql>load_seqdatabase.pl -driver
mysql -namespace test -dbname biosql -dbuser root -dbpass ********** -format
genbank  -debug NP_252217.gpt
Loading NP_252217.gpt ...
attempting to load adaptor class for Bio::Seq::RichSeq
        attempting to load module Bio::DB::BioSQL::RichSeqAdaptor
attempting to load adaptor class for Bio::Seq
        attempting to load module Bio::DB::BioSQL::SeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load adaptor class for Bio::Species
        attempting to load module Bio::DB::BioSQL::SpeciesAdaptor
instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor
attempting to load adaptor class for Bio::Annotation::Collection
        attempting to load module Bio::DB::BioSQL::CollectionAdaptor
attempting to load adaptor class for Bio::Root::Root
        attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Root::RootI
        attempting to load module Bio::DB::BioSQL::RootIAdaptor
        attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::AnnotationCollectionI
        attempting to load module
Bio::DB::BioSQL::AnnotationCollectionIAdaptor
        attempting to load module
Bio::DB::BioSQL::AnnotationCollectionAdaptor
instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor
attempting to load adaptor class for Bio::Annotation::TypeManager
        attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for Bio::Annotation::SimpleValue
        attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor
instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor
attempting to load adaptor class for Bio::Annotation::Reference
        attempting to load module Bio::DB::BioSQL::ReferenceAdaptor
instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor
attempting to load adaptor class for Bio::Annotation::Comment
        attempting to load module Bio::DB::BioSQL::CommentAdaptor
instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor
attempting to load adaptor class for Bio::Annotation::DBLink
        attempting to load module Bio::DB::BioSQL::DBLinkAdaptor
instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor
attempting to load adaptor class for Bio::PrimarySeq
        attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load adaptor class for Bio::SeqFeature::Generic
        attempting to load module Bio::DB::BioSQL::GenericAdaptor
attempting to load adaptor class for Bio::SeqFeatureI
        attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor
        attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor
attempting to load adaptor class for Bio::Location::Simple
        attempting to load module Bio::DB::BioSQL::SimpleAdaptor
attempting to load adaptor class for Bio::Location::Atomic
        attempting to load module Bio::DB::BioSQL::AtomicAdaptor
attempting to load adaptor class for Bio::LocationI
        attempting to load module Bio::DB::BioSQL::LocationIAdaptor
        attempting to load module Bio::DB::BioSQL::LocationAdaptor
instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for BioNamespace
        attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor
instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load driver for adaptor class
Bio::DB::BioSQL::BioNamespaceAdaptor
attempting to load driver for adaptor class
Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer
for Bio::DB::BioSQL::BioNamespaceAdaptor
preparing UK select statement: SELECT biodatabase.biodatabase_id,
biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ?
BioNamespaceAdaptor: binding UK column 1 to "test" (namespace)
preparing INSERT statement: INSERT INTO biodatabase (name, authority) VALUES
(?, ?)
BioNamespaceAdaptor::insert: binding column 1 to "test" (namespace)
BioNamespaceAdaptor::insert: binding column 2 to "" (authority)
attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor
Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for
Bio::DB::BioSQL::SpeciesAdaptor
preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id =
?
SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)  
----------------------------------------------------------------------------
-------------------------

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From osborne1 at optonline.net  Tue Feb 14 16:32:42 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 14 Feb 2006 16:32:42 -0500
Subject: [Bioperl-l] game xml SeqIO
In-Reply-To: <43F1043A.2000205@cornell.edu>
Message-ID: 

Robert,

It looks like you're right that this data isn't handled by SeqIO/game. If
you'd like to add this then feel free to do it, the modified files or
patches can be submitted to bugzilla.bioperl.org. If you take this on then
please add a test or 2 to t/game.t as well.

Yes, Bio::SeqFeature::Computation sounds right - does it match the data
you're trying to parse? SeqFeature::Generic is the most commonly used, and
it's flexible, but if another type of SeqFeature fits your data more
precisely then that's the one you should use.

Brian O.


On 2/13/06 5:12 PM, "Robert Buels"  wrote:

> Hi all,
> 
> Currently, the SeqIO for doing GAME XML does not seem to support writing
> (or reading?)  elements.  Am I correct?
> 
> If I am, are there any plans to add this functionality?  Can I help / do it?
> 
> If there are plans to add this, how would one distinguish SeqFeatures
> that should be rendered as  from SeqFeatures
> that should be rendered as ?  Would we do that with
> Bio::SeqFeature::Computation?  I assume that a given Seq can have
> SeqFeatures of different types associated with it (I don't know, I'm a
> bioperl newb).
> 
> Rob




From saldroubi at yahoo.com  Tue Feb 14 22:54:42 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Tue, 14 Feb 2006 19:54:42 -0800 (PST)
Subject: [Bioperl-l] Error using Bio::Matrix::GenericMatrix
Message-ID: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com>

All,
 
 I am trying to use Bio::Matrix::GenericMatrix module.  
 I simply put this line in my program:
     use Bio::Matrix::GenericMatrix;
 
 but I get the following error:
 
 Can't locate Bio/Matrix/GenericMatrix.pm in @INC (@INC contains: /usr/lib/perl5/5.8.6/i586-linux-thread-multi /usr/lib/perl5/5.8.6 /usr/lib/perl5/site_perl/5.8.6/i586-linux-thread-multi /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.6/i586-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl .) at sf.pl line 18.
 BEGIN failed--compilation aborted at sf.pl line 18.
 
 Using find, I located a module in this directory, but it is called Generic.pm:
     /usr/lib/perl5/site_perl/5.8.6/Bio/Matrix
 
 Could someone tell me why it is not working?  I have no trouble including these modules in my file:
     use Bio::SeqIO;
     use Bio::DB::GenBank;
 
 Thank you. 
 
   

Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From jason.stajich at duke.edu  Tue Feb 14 23:10:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 14 Feb 2006 23:10:56 -0500
Subject: [Bioperl-l] Error using Bio::Matrix::GenericMatrix
In-Reply-To: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com>
References: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com>
Message-ID: 

try:
use Bio::Matrix::Generic;

Apparently I screwed up the SYNOPSIS.  fixed that just now.
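
For anyone hitting a similar "Can't locate ... in @INC" error, here is a minimal sketch of how to check which file name a given package name maps to and whether it is installed. It uses only core Perl; find_module is a hypothetical helper name, not part of bioperl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Turn a package name such as Bio::Matrix::Generic into the relative
# path that "use" looks for (Bio/Matrix/Generic.pm) and return every
# @INC directory entry that actually contains that file.
# find_module is a hypothetical helper, not a bioperl function.
sub find_module {
    my ($module) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    my @found;
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, @parts);
        push @found, $path if -f $path;
    }
    return @found;
}

# Compare the name from the old SYNOPSIS with the real module name.
for my $module (qw(Bio::Matrix::GenericMatrix Bio::Matrix::Generic)) {
    my @hits = find_module($module);
    print "$module: ", (@hits ? join(', ', @hits) : 'not found in @INC'), "\n";
}
```

If the second name is found and the first is not, the "use" line simply names a file that does not exist.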

-jason
On Feb 14, 2006, at 10:54 PM, Sam Al-Droubi wrote:

> All,
>
>  I am trying to use Bio::Matrix::GenericMatrix module.
>  I simply put this line in my program:
>      use Bio::Matrix::GenericMatrix;
>
>  but I get the following error:
>
>  Can't locate Bio/Matrix/GenericMatrix.pm in @INC (@INC contains: / 
> usr/lib/perl5/5.8.6/i586-linux-thread-multi /usr/lib/perl5/5.8.6 / 
> usr/lib/perl5/site_perl/5.8.6/i586-linux-thread-multi /usr/lib/ 
> perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl /usr/lib/perl5/ 
> vendor_perl/5.8.6/i586-linux-thread-multi /usr/lib/perl5/ 
> vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl .) at sf.pl line 18.
>  BEGIN failed--compilation aborted at sf.pl line 18.
>
>  I found this module using find which is called Generic.pm in this  
> directory
>      /usr/lib/perl5/site_perl/5.8.6/Bio/Matrix
>
>  Could someone tell me why it is not working.  I have no trouble  
> including these modules in my file.
>      use Bio::SeqIO;
>      use Bio::DB::GenBank;
>
>  Thank you.
>
>
>
> Sincerely,
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From daniel.lang at biologie.uni-freiburg.de  Wed Feb 15 05:35:40 2006
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Wed, 15 Feb 2006 11:35:40 +0100
Subject: [Bioperl-l] distmat matrix
Message-ID: <43F303FC.9000806@biologie.uni-freiburg.de>

Hi,

I need to go through an uncorrected distmat matrix (EMBOSS, run locally)
to filter sequences from an MSA.
I had a look around and didn't find an obvious candidate. Before I start
writing something of my own:
Is there a bioperl parser for reading distmat matrices, or can I trick
the Bio::MapIO parsers for scoring or PHYLIP into doing so?
Of course, if anyone knows of a tool supported by bioperl that generates
an uncorrected distance matrix from protein MSAs, that would also be OK
for me :)

I have no experience with the Pise
(Bio::Tools::Run::PiseApplication::distmat) stuff, but as I understand
it it's only to execute the application on a remote web server? Or can I
solve my task with Pise?

Thanks in advance!

Daniel



From praveecbt at yahoo.co.in  Wed Feb 15 03:57:44 2006
From: praveecbt at yahoo.co.in (Praveen Raj)
Date: Wed, 15 Feb 2006 08:57:44 +0000 (GMT)
Subject: [Bioperl-l] Help
Message-ID: <20060215085744.14911.qmail@web8711.mail.in.yahoo.com>

Dear Peter Schattner Sir,

I have a problem with the profile_align() method of the Clustalw object.

My code looks like this:
   ......
  12 @seq_array=($seqobj1,$seqobj2,$seqobj3);
  13 $seq_array_ref=\@seq_array;
  14 $aln=$factory->align($seq_array_ref);
  15 print $out $aln;   # this works fine
  16 $sen = Bio::Seq->new(-display_id => '>gi|userdata|',
  17                      -seq => "MTKKPGGPGKNRA....",
  18                      -format => "fasta");
  19 $aln=$factory->profile_align($aln,$sen); #problem here
  20 print $out1 $aln;

I get the following error at line 19:

  ERROR: Could not open sequence file (-profile)
  No. of seqs. read = -1. No alignment!

How can I solve this problem?
I hope you can provide a solution.
   
                           Thanking you,
                                         Praveen Raj,
                                         Project Student,
                                         NIV, India.

				


From jason.stajich at duke.edu  Wed Feb 15 08:19:41 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 15 Feb 2006 08:19:41 -0500
Subject: [Bioperl-l] distmat matrix
In-Reply-To: <43F303FC.9000806@biologie.uni-freiburg.de>
References: <43F303FC.9000806@biologie.uni-freiburg.de>
Message-ID: <550C115C-1216-4285-8BE5-EC217C3F1BE9@duke.edu>

Bioperl can parse PHYLIP distance matrices, see Bio::Matrix::IO.  I
didn't write an EMBOSS distmat result parser, but that would be nice
to have (but check first that EMBOSS doesn't already allow output in
PHYLIP format).

There is a pure-perl distance matrix calculation for an MSA of DNA
sequences in
Bio::Align::DNAStatistics
and for protein in
Bio::Align::ProteinStatistics

There is some initial discussion here on the website, but it could
certainly use some more details.

http://bioperl.org/wiki/Phylogenetics
http://bioperl.org/wiki/HOWTO:Trees
http://bioperl.org/wiki/Module:Bio::Align::DNAStatistics
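
To make "uncorrected distance" concrete, here is a minimal pure-Perl sketch, not the bioperl API. It assumes the sequences are already aligned to equal length and that columns with a gap in either sequence are skipped; p_distance and distance_matrix are hypothetical names, and the three short sequences are made-up sample data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Uncorrected (p-)distance between two aligned sequences: the fraction
# of compared columns in which the residues differ. Columns with a gap
# ('-' or '.') in either sequence are skipped (an assumption).
sub p_distance {
    my ($seq_a, $seq_b) = @_;
    die "aligned sequences must have equal length\n"
        unless length($seq_a) == length($seq_b);
    my ($compared, $mismatches) = (0, 0);
    for my $i (0 .. length($seq_a) - 1) {
        my $x = uc(substr($seq_a, $i, 1));
        my $y = uc(substr($seq_b, $i, 1));
        next if $x =~ /[-.]/ || $y =~ /[-.]/;
        $compared++;
        $mismatches++ if $x ne $y;
    }
    return unless $compared;    # no comparable columns: undefined
    return $mismatches / $compared;
}

# Symmetric pairwise matrix for a hash of name => aligned sequence.
sub distance_matrix {
    my (%aln) = @_;
    my @names = sort keys %aln;
    my %dist;
    for my $i (0 .. $#names) {
        for my $j ($i .. $#names) {
            my $d = p_distance($aln{$names[$i]}, $aln{$names[$j]});
            $dist{$names[$i]}{$names[$j]} = $d;
            $dist{$names[$j]}{$names[$i]} = $d;
        }
    }
    return \%dist;
}

# Made-up sample alignment, for illustration only.
my %sample_aln = (
    seq1 => 'MTKK-PGGPG',
    seq2 => 'MTKKAPGGPG',
    seq3 => 'MSKK-PAGPG',
);
my $sample_dist = distance_matrix(%sample_aln);
for my $row (sort keys %$sample_dist) {
    printf "%s\t%s\n", $row,
        join("\t", map { sprintf '%.3f', $sample_dist->{$row}{$_} }
                   sort keys %$sample_dist);
}
```

Filtering the MSA is then a matter of dropping any sequence whose distance to a chosen reference exceeds some threshold.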


-jason
On Feb 15, 2006, at 5:35 AM, Daniel Lang wrote:

> Hi,
>
> I need to go through a uncorrected distmat matrix (EMBOSS, run  
> locally)
> to filter sequences from an MSA.
> I had a look around and didn't find an obvious candidate. Before I  
> start
> writing something my own...
> Is there a bioperl parser for reading distmat matrices or can I trick
> the Bio::MapIO parsers for scoring or PHYLIP in doing so?
> If anyone knows of course a tool to generate an uncorrected distance
> matrix of protein MSAs that is supported by bioperl, would be also OK
> for me:)
>
> I have no experience with the Pise
> (Bio::Tools::Run::PiseApplication::distmat) stuff, but as I understand
> it it's only to execute the application on a remote web server? Or  
> can I
> solve my task with Pise?
>
> Thanks in advance!
>
> Daniel
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From michael.watson at bbsrc.ac.uk  Wed Feb 15 10:06:29 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed, 15 Feb 2006 15:06:29 -0000
Subject: [Bioperl-l] Website issues
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

The links on the left of bioperl.org don't work in konqueror 3.1.1,
which is a real b*gger because that's the browser I use on Linux... :-S

Mick



From rmb32 at cornell.edu  Wed Feb 15 11:01:07 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Wed, 15 Feb 2006 11:01:07 -0500
Subject: [Bioperl-l] Bio::Tools::GFF parsing error
Message-ID: <43F35043.7070705@cornell.edu>

Hi all,

I'm parsing a GFF2 file with Bio::Tools::GFF (I would be using 
FeatureIO, except it purports not to support gff 2), and the file looks 
like:

##gff-version 2
##date 2006-02-13
##sequence-region C01HBa0088L02.seq 1 120525
C01HBa0088L02   RepeatMasker    similarity      3537    4267     3.3    -       .       Target "Motif:bac_end_repeat_family_345" 1 740
C01HBa0088L02   RepeatMasker    similarity      4172    4279     2.9    +       .       Target "Motif:HRSiTERT00100141" 1 104
C01HBa0088L02   RepeatMasker    similarity      4267    4323     0.0    -       .       Target "Motif:k_29" 150 206
C01HBa0088L02   RepeatMasker    similarity      4322    4492    26.6    +       .       Target "Motif:PRSiTERT00300001" 1960 2129
C01HBa0088L02   RepeatMasker    similarity      4557    5124    29.5    +       .       Target "Motif:PRSiTERT00300001" 2142 2711

Notice the score column is padded with spaces.

Bio::Tools::GFF does not like this, and says that ' 3.3' is not a valid 
score.  My question is, who is wrong here, my input file or 
Bio::Tools::GFF?  Should Bio::Tools::GFF be able to read this file?

Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 607-255-2360
rmb32 at cornell.edu
http://www.sgn.cornell.edu




From jason.stajich at duke.edu  Wed Feb 15 11:12:59 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 15 Feb 2006 11:12:59 -0500
Subject: [Bioperl-l] Website issues
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu>

Okay, I guess someone will have to look into that.  Can you normally
browse Wikipedia?  We're just using their software, so maybe it is a
JavaScript problem.

Please send a system bug request to our helpdesk:
support at open-bio.org

-jason
On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote:

> Hi
>
> The links on the left of bioperl.org don't work in konqueror 3.1.1,
> which is a real b*gger because that's the browser I use on  
> Linux... :-S
>
> Mick
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From Marc.Logghe at DEVGEN.com  Wed Feb 15 11:13:16 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 15 Feb 2006 17:13:16 +0100
Subject: [Bioperl-l] Bio::Tools::GFF parsing error
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B2E@ANTARESIA.be.devgen.com>

Hi Rob,
According to the GFF Specifications Document @
http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml :

All of the above described fields should be separated by TAB characters
('\t'). All values of the mandatory fields should not include whitespace
(i.e. the strings for ,  and  fields).

Reading that, I am afraid you have to pre-process your gff input file
...
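
A minimal sketch of such a pre-processing step, assuming the real file is TAB-separated and only the column values carry padding blanks. clean_gff_line is a hypothetical helper name, and the sample record is a shortened version of the one posted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Strip leading and trailing whitespace from the first eight
# TAB-separated GFF2 columns. The ninth column (attributes) is left
# untouched. Comment lines and lines without a TAB pass through as-is.
# clean_gff_line is a hypothetical helper, not part of Bio::Tools::GFF.
sub clean_gff_line {
    my ($line) = @_;
    chomp $line;
    return $line if $line =~ /^#/ || $line !~ /\t/;
    my @fields = split /\t/, $line, 9;
    my $last = $#fields < 7 ? $#fields : 7;
    for my $i (0 .. $last) {
        $fields[$i] =~ s/^\s+|\s+$//g;
    }
    return join "\t", @fields;
}

# Sample input: a header line and one record with a padded score.
my @sample = (
    "##gff-version 2\n",
    "C01HBa0088L02\tRepeatMasker\tsimilarity\t3537\t4267\t 3.3\t-\t.\t"
        . "Target \"Motif:k_29\" 1 740\n",
);
print clean_gff_line($_), "\n" for @sample;
```

The cleaned lines can be written to a temporary file and handed to Bio::Tools::GFF in place of the original.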
HTH,
Marc


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Robert Buels
> Sent: Wednesday, February 15, 2006 5:01 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Bio::Tools::GFF parsing error
> 
> Hi all,
> 
> I'm parsing a GFF2 file with Bio::Tools::GFF (I would be 
> using FeatureIO, except it purports not to support gff 2), 
> and the file looks
> like:
> 
> ##gff-version 2
> ##date 2006-02-13
> ##sequence-region C01HBa0088L02.seq 1 120525
> C01HBa0088L02   RepeatMasker    similarity      3537    4267  
>    3.3    
> -       .       Target "Motif:bac_end_repeat_family_345" 1 740
> C01HBa0088L02   RepeatMasker    similarity      4172    4279  
>    2.9    
> +       .       Target "Motif:HRSiTERT00100141" 1 104
> C01HBa0088L02   RepeatMasker    similarity      4267    4323  
>    0.0    
> -       .       Target "Motif:k_29" 150 206
> C01HBa0088L02   RepeatMasker    similarity      4322    4492  
>   26.6    
> +       .       Target "Motif:PRSiTERT00300001" 1960 2129
> C01HBa0088L02   RepeatMasker    similarity      4557    5124  
>   29.5    
> +       .       Target "Motif:PRSiTERT00300001" 2142 2711
> 
> Notice the score column is padded with spaces.
> 
> Bio::Tools::GFF does not like this, and says that ' 3.3' is 
> not a valid score.  My question is, who is wrong here, my 
> input file or Bio::Tools::GFF?  Should Bio::Tools::GFF be 
> able to read this file?
> 
> Rob
> 
> --
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 607-255-2360
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 



From jason.stajich at duke.edu  Wed Feb 15 11:29:14 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 15 Feb 2006 11:29:14 -0500
Subject: [Bioperl-l] Website issues
In-Reply-To: <93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu>
References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
	<93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu>
Message-ID: <82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu>

I can replicate it with konqueror 3.1.4-9.legacy RedHat (from KDE  
3.1.4-9)

But it works fine for me on 3.2.2-8.FC2 ....

So I'm going to go with this being a konqueror bug, sorry to say, but  
feel free to still report the bug to the helpdesk.
	
-jason
On Feb 15, 2006, at 11:12 AM, Jason Stajich wrote:

> Okay I guess someone will have to look into that.  Can you normally
> browse on wikipedia, we're just using their software, maybe it is a
> javascript problem?
>
> Please send a system bug request to our helpdesk:
> support at open-bio.org
>
> -jason
> On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote:
>
>> Hi
>>
>> The links on the left of bioperl.org don't work in konqueror 3.1.1,
>> which is a real b*gger because that's the browser I use on
>> Linux... :-S
>>
>> Mick
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Wed Feb 15 11:57:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Feb 2006 10:57:13 -0600
Subject: [Bioperl-l] Added 'Installing Bioperl for Unix' to wiki
Message-ID: <000301c63250$de506120$15327e82@pyrimidine>

I added an Installing Bioperl for Unix page, 

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix

which is a quick redo of the INSTALL text file in the bioperl distribution.
It's in workable shape but still needs links, revisions, etc.

Please leave any comments on the discussion pages here.  

http://www.bioperl.org/wiki/Talk:Getting_BioPerl
http://www.bioperl.org/wiki/Talk:Installing_Bioperl_for_Unix

Thanks to Brian for helping out with the Windows install doc!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From khoueiry at ibdm.univ-mrs.fr  Wed Feb 15 12:23:21 2006
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Wed, 15 Feb 2006 18:23:21 +0100
Subject: [Bioperl-l] Website issues
In-Reply-To: <82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu>
References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
	<93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu>
	<82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu>
Message-ID: <1140024202.2689.45.camel@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 

From heikki at sanbi.ac.za  Wed Feb 15 13:55:07 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 15 Feb 2006 20:55:07 +0200
Subject: [Bioperl-l] Website issues
In-Reply-To: <1140024202.2689.45.camel@localhost>
References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
	<82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu>
	<1140024202.2689.45.camel@localhost>
Message-ID: <200602152055.07667.heikki@sanbi.ac.za>

Konqueror 3.5.1 has no problems either. Clearly, older Konqueror versions had
a bug that has since been fixed.

Michael, time for you to upgrade.

	-Heikki

On Wednesday 15 February 2006 19:23, khoueiry wrote:
> I tested it on konqueror 3.4.2 and it works well!
>
> On Wed, 2006-02-15 at 11:29 -0500, Jason Stajich wrote:
> > I can replicate it with konqueror 3.1.4-9.legacy RedHat (from KDE
> > 3.1.4-9)
> >
> > But it works fine for me on 3.2.2-8.FC2 ....
> >
> > So I'm going to go with this being a konqueror bug, sorry to say, but
> > feel free to still report the bug to the helpdesk.
> >
> > -jason
> >
> > On Feb 15, 2006, at 11:12 AM, Jason Stajich wrote:
> > > Okay I guess someone will have to look into that.  Can you normally
> > > browse on wikipedia, we're just using their software, maybe it is a
> > > javascript problem?
> > >
> > > Please send a system bug request to our helpdesk:
> > > support at open-bio.org
> > >
> > > -jason
> > >
> > > On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote:
> > >> Hi
> > >>
> > >> The links on the left of bioperl.org don't work in konqueror 3.1.1,
> > >> which is a real b*gger because that's the browser I use on
> > >> Linux... :-S
> > >>
> > >> Mick
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > --
> > > Jason Stajich
> > > Duke University
> > > http://www.duke.edu/~jes12
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > Jason Stajich
> > Duke University
> > http://www.duke.edu/~jes12
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From gyang at plantbio.uga.edu  Wed Feb 15 14:39:41 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Wed, 15 Feb 2006 14:39:41 -0500
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
	version 1.28
Message-ID: <20060215143941.54e91487@dogwood.plantbio.uga.edu>

Hi, Chris,
The remoteblast test script finally works for the amino.fa query, but when I try a nucleic acid sequence (see below), an error occurs:
"
waiting........
------------- EXCEPTION  -------------
MSG: no data for midline  Features flanking this part of subject sequence:
STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
STACK toplevel remoteblast_test:40
"
The query sequence is:
CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG

The script (basically same as the remoteblast test, I only changed database to 'nr' and program to 'blastn' and filename to 'ost3'):
#!/usr/bin/perl

use Bio::SeqIO;
use Bio::Seq;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use strict;
my $prog='blastn';
my $db='nr';
my $e_val=1e-10;
my @params=( -prog=>$prog,
	-data=>$db,
	-expect=>$e_val,
	-readmethod=>'SearchIO');
my $factory=Bio::Tools::Run::RemoteBlast->new(@params);

my $v = 1;

my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );

while (my $input = $str->next_seq()){
  #Blast a sequence against a database:
  #Alternatively, you could  pass in a file with many
  #sequences rather than loop through sequence one at a time
  #Remove the loop starting 'while (my $input = $str->next_seq())'
  #and swap the two lines below for an example of that.
  my $r = $factory->submit_blast($input);
  #my $r = $factory->submit_blast('amino.fa');
  print STDERR "waiting..." if( $v > 0 );
  while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
      my $rc = $factory->retrieve_blast($rid);
      if( !ref($rc) ) {
        if( $rc < 0 ) {
          $factory->remove_rid($rid);
        }
        print STDERR "." if ( $v > 0 );
        sleep 5;
      } else {
        my $result = $rc->next_result();
        #save the output
        my $filename = $result->query_name()."\.out";
        $factory->save_output($filename);
        $factory->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), "\n";
        while ( my $hit = $result->next_hit ) {
          next unless ( $v > 0);
          print "\thit name is ", $hit->name, "\n";
          while( my $hsp = $hit->next_hsp ) {
            print "\t\tscore is ", $hsp->score, "\n";
          }
        }
      }
    }
  }
}


Do you think there might still be something in the NCBI output format?

Thank you,
Guojun




Guojun Yang
Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun



----- Original Message -----
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2


> Sorry, forgot to add that I didn't see the regex issue that you mentioned.
> It could be a perl-related issue.  Try the fixes I mentioned and see what
> happens.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > Sent: Tuesday, February 14, 2006 12:36 PM
> > To: 'gyang at plantbio.uga.edu'
> > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >
> > It's a good habit to always add single quotes around words.  The perl
> > interpreter may think a single bare word is a subroutine or perlfunc
> > called with no args so will try to find a subroutine named blastp().  My
> > debugger actually gives the error that the bare word blastp may conflict
> > with a future reserved word.  Like you said, 'use strict' will point that
> > out.
> >
> > As for the regex, it should match all the blast programs at NCBI (blastp,
> > blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
> > else passes through.
> >
> > So, if you are using the script below, there are several errors.  The bare
> > words for $prog and $db need quotes, and the flags for your @params array
> > don't have a dash before them.  I get this after adding quotes but before
> > adding the dashes to @params:
> >
> > C:\Perl\Scripts>test_blast.pl
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG:
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> > STACK: C:\Perl\Scripts\test_blast.pl:15
> > -----------------------------------------------------------
> >
> > The last line indicates a problem with this line:
> >
> > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >
> > Changing the @params to this:
> >
> > my @params=( -prog=>$prog,
> > 	-data=>$db,
> > 	-expect=>$e_val,
> > 	-readmethod=>'SearchIO');
> >
> > fixes it, and I get output as expected.
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > > -----Original Message-----
> > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > Sent: Tuesday, February 14, 2006 11:48 AM
> > > To: Chris Fields; bioperl-l at lists.open-bio.org
> > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >
> > > Hi, Chris,
> > > When I tried with the perldoc script, It did not work either. First it
> > > says $prog can not be bare word if I "use strict". I added quotes on the
> > > words, then it says the value for $prog does not match expression
> > > t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The script
> > > is shown below. Why is the expression "t?blast[pnx]"?
> > >
> > > #!/usr/bin/perl
> > >
> > > use Bio::SeqIO;
> > > use Bio::Seq;
> > > use Bio::Tools::Run::RemoteBlast;
> > > use Bio::SearchIO;
> > >
> > >
> > > my $prog=blastp;
> > > my $db=swissprot;
> > > my $e_val=1e-10;
> > > my @params=( prog=>$prog,
> > > 	data=>$db,
> > > 	expect=>$e_val,
> > > 	readmethod=>'SearchIO');
> > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >
> > > my $v = 1;
> > >
> > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >
> > > while (my $input = $str->next_seq()){
> > >   #Blast a sequence against a database:
> > >   #Alternatively, you could  pass in a file with many
> > >   #sequences rather than loop through sequence one at a time
> > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >   #and swap the two lines below for an example of that.
> > >   my $r = $factory->submit_blast($input);
> > >   #my $r = $factory->submit_blast('amino.fa');
> > >   print STDERR "waiting..." if( $v > 0 );
> > >   while ( my @rids = $factory->each_rid ) {
> > >     foreach my $rid ( @rids ) {
> > >       my $rc = $factory->retrieve_blast($rid);
> > >       if( !ref($rc) ) {
> > >         if( $rc < 0 ) {
> > >           $factory->remove_rid($rid);
> > >         }
> > >         print STDERR "." if ( $v > 0 );
> > >         sleep 5;
> > >       } else {
> > >         my $result = $rc->next_result();
> > >         #save the output
> > >         my $filename = $result->query_name()."\.out";
> > >         $factory->save_output($filename);
> > >         $factory->remove_rid($rid);
> > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > >         while ( my $hit = $result->next_hit ) {
> > >           next unless ( $v > 0);
> > >           print "\thit name is ", $hit->name, "\n";
> > >           while( my $hsp = $hit->next_hsp ) {
> > >             print "\t\tscore is ", $hsp->score, "\n";
> > >           }
> > >         }
> > >       }
> > >     }
> > >   }
> > > }
> > >
> > > Thank you for your help!
> > >
> > >
> > > Guojun
> > > Department of Plant Biology
> > > University of Georgia
> > >
> > > ----- Original Message -----
> > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > To: gyang at plantbio.uga.edu
> > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >
> > >
> > > > Try two things:
> > > >
> > > > 1)  Use a much simpler script, like the one in 'perldoc
> > > > Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something wrong
> > > > with the logic in your subroutine:
> > > >
> > > > my $v = 1;
> > > >
> > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > > >
> > > > while (my $input = $str->next_seq()){
> > > >   #Blast a sequence against a database:
> > > >   #Alternatively, you could  pass in a file with many
> > > >   #sequences rather than loop through sequence one at a time
> > > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >   #and swap the two lines below for an example of that.
> > > >   my $r = $factory->submit_blast($input);
> > > >   #my $r = $factory->submit_blast('amino.fa');
> > > >   print STDERR "waiting..." if( $v > 0 );
> > > >   while ( my @rids = $factory->each_rid ) {
> > > >     foreach my $rid ( @rids ) {
> > > >       my $rc = $factory->retrieve_blast($rid);
> > > >       if( !ref($rc) ) {
> > > >         if( $rc < 0 ) {
> > > >           $factory->remove_rid($rid);
> > > >         }
> > > >         print STDERR "." if ( $v > 0 );
> > > >         sleep 5;
> > > >       } else {
> > > >         my $result = $rc->next_result();
> > > >         #save the output
> > > >         my $filename = $result->query_name()."\.out";
> > > >         $factory->save_output($filename);
> > > >         $factory->remove_rid($rid);
> > > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > > >         while ( my $hit = $result->next_hit ) {
> > > >           next unless ( $v > 0);
> > > >           print "\thit name is ", $hit->name, "\n";
> > > >           while( my $hsp = $hit->next_hsp ) {
> > > >             print "\t\tscore is ", $hsp->score, "\n";
> > > >           }
> > > >         }
> > > >       }
> > > >     }
> > > >   }
> > > > }
> > > > >
> > > > > 2) Try the RemoteBlast from Bugzilla and see if that works.  It really
> > > > > shouldn't make that much of a difference, but I noticed that the CVS
> > > > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> > > > > released; the Bugzilla version is based off CVS.
> > > > >
> > > > > Christopher Fields
> > > > > Postdoctoral Researcher - Switzer Lab
> > > > > Dept. of Biochemistry
> > > > > University of Illinois Urbana-Champaign
> > > > >
> > > > > > -----Original Message-----
> > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > Sent: Monday, February 13, 2006 3:00 PM
> > > > > To: bioperl-l at lists.open-bio.org
> > > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > >
> > > > > > Thanks, Chris,
> > > > > > I installed version 1.5.1 and replaced the blast.pm file with the one
> > > > > > from your bug report. The running version is 1.5 when I use the
> > > > > > command you sent me. But when I tried the script, it doesn't change
> > > > > > much. My remoteblast code (portion) is here:
> > > > > >
> > > > > > sub search {
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > > > > 			      -id=>"query",
> > > > > 			      -desc=>"new seq");
> > > > > my $len=$query->length();
> > > > > @db=('nr','htgs','wgs');
> > > > > foreach my $db (@db) {
> > > > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> > > > > > 						'-data' =>"$db",
> > > > > > 						'-expect'=>"$E_value");
> > > > > >
> > > > > > my $blast_report = $factory->submit_blast($query);
> > > > > >
> > > > > > my @rids = $factory->each_rid();
> > > > > foreach my $rid ( @rids ) {
> > > > >     print STDERR "$rid\n";
> > > > > }
> > > > > # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > > > > print STDERR "waiting...";
> > > > > sleep 60;
> > > > > >
> > > > > > foreach my $rid ( @rids ) {
> > > > >     my $rc = $factory->retrieve_blast($rid);
> > > > >     while (!ref($rc) ) {
> > > > > 	if( $rc < 0 ) {
> > > > > # retrieve_blast returns -1 on error
> > > > > 	    $factory->remove_rid($rid);
> > > > > 	    print "Error!\n";
> > > > > 	    send_error($email,$function,$seqname,$queryname[$ST]);
> > > > > 	    die "Can't retrieve $rid";
> > > > > 	} if ($rc==0) { # retrieve_blast returns 0 on 'job not
> > finished'
> > > > > 	    sleep 60;
> > > > > 	    $rc = $factory->retrieve_blast($rid);
> > > > > 	}
> > > > >     }
> > > > >     if (ref($rc)) {
> > > > > 	print STDERR "Done.\n";
> > > > > 	 while( my $result = $rc->next_result) {
> > > > > 	    while( my $hit = $result->next_hit()) {
> > > > > 	    	$hit_name=$hit->name;
> > > > > 		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > > > > 		$name=$1;
> > > > > 		@left_plus_start=();
> > > > > 		@left_plus_end=();
> > > > > 		@left_minus_start=();
> > > > > 		@left_minus_end=();
> > > > > 		@right_plus_start=();
> > > > > 		@right_plus_end=();
> > > > > 		@right_minus_start=();
> > > > > 		@right_minus_end=();
> > > > > > > 		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > > > > 		while( my $hsp = $hit->next_hsp()) {
> > > > > ......
> > > > > >
> > > > > > It was working quite well until around October last year, but it has
> > > > > > stopped since then. When a submission is sent via a webpage, the cgi
> > > > > > starts to work and uses ~20 Mb of memory. Then it hangs there; finally
> > > > > > the expected email is received, but without real results, although it
> > > > > > does contain something from other parts of the script. Apparently the
> > > > > > search sub did not return anything (I know there is something that
> > > > > > should be returned). Is it also possible that the format of the NCBI
> > > > > > output for each result has changed?
> > > > > Thank you,
> > > > > Guojun
> > > > > > > > > Department of Plant Biology
> > > > > University of Georgia
> > > > > > > > > > > ----- Original Message -----
> > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > >
> > > > > > > How do you know two versions are installed (i.e. how are you
> > > > > > > checking the version)?  Do you have two complete bioperl
> > > > > > > distributions (in two separate directories) or are you looking in
> > > > > > > modules?  Here's the way to check the version (from the FAQ):
> > > > > > >
> > > > > > > perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> > > > > > >
> > > > > > > If you have two full bioperl distributions on your computer,
> > > > > > > normally only one will be in use unless you have explicitly set the
> > > > > > > environment variable PERL5LIB.  The PERL5LIB directories will be
> > > > > > > searched first, before your normal perl directory list (@INC) is
> > > > > > > searched.  You MAY get some mixing then, but only if perl can't find
> > > > > > > a particular module in the path designated in PERL5LIB; then it will
> > > > > > > progress through the directories listed in @INC.  This may happen if
> > > > > > > a module is unique to a particular release, but shouldn't happen for
> > > > > > > the majority of modules, including RemoteBlast.  You can check what
> > > > > > > @INC and PERL5LIB are set to by using 'perl -V'.  @INC will differ
> > > > > > > depending on your OS, perl build, etc.
> > > > > > >
> > > > > > > Regardless, if you follow the directions for installing bioperl for
> > > > > > > your system ('perl Makefile.PL', 'make', 'make test', 'make install',
> > > > > > > unless you explicitly change the installation directory when using
> > > > > > > 'perl Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a
> > > > > > > problem as it will install the Bioperl distribution you downloaded
> > > > > > > over the old version in @INC.  See this page:
> > > > > > >
> > > > > > > http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > > > > > >
> > > > > > > for more details.
> > > > > > >
> > > > > > > Christopher Fields
> > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > Dept. of Biochemistry
> > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > -----Original Message-----
> > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > Sent: Monday, February 13, 2006 12:32 PM
> > > > > > > To: bioperl-l at lists.open-bio.org
> > > > > > > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > > >
> > > > > > > > Hi, Chris,
> > > > > > > > I do have different versions of bioperl on my Linux machine (1.4
> > > > > > > > and 1.5.0); this may be the problem. Should I just install
> > > > > > > > bioperl-1.5.1, or do I need to uninstall and remove the previous
> > > > > > > > versions first? I could not find any hint on uninstalling bioperl
> > > > > > > > on Linux. Could you please give me some suggestions?
> > > > > > > > Thanks,
> > > > > > > > Guojun
> > > > > > > > > Department of Plant Biology
> > > > > > > University of Georgia
> > > > > > >       _____
> > > > > > > > >   From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > > > Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > > version
> > > > > > > 1.28
> > > > > > > >
> > > > > > > > If you're using RemoteBlast 1.28, then you've likely updated from
> > > > > > > > CVS, which isn't the latest fix.
> > > > > > > >
> > > > > > > > Make sure that you check the following:
> > > > > > > >
> > > > > > > > 1) Always post to the mailing list:
> > > > > > > > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > > > > > > >
> > > > > > > > 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> > > > > > > > installed first.  Perform a clean installation; do not upgrade only
> > > > > > > > Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> > > > > > > > guarantee that mixing modules from old and new distributions (1.4
> > > > > > > > and 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > > > > > > > installation will allow text output from BLAST v.2.2.12 to be saved
> > > > > > > > and parsed; it will not parse the newest BLAST text output from NCBI
> > > > > > > > (v2.2.13) but it should still save it. I believe as long as
> > > > > > > > next_result() isn't called, it will work.
> > > > > > > >
> > > > > > > > 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text
> > > > > > > > output are NOT in CVS; they haven't been cleared and checked in by
> > > > > > > > Roger Hall (who's now taking care of RemoteBlast) and the powers
> > > > > > > > that be (Jason or whoever is in charge of Bio::SearchIO).  They can
> > > > > > > > be found in Bugzilla:
> > > > > > > >
> > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > > >
> > > > > > > > The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> > > > > > > > saving XML output, so it isn't necessary if you don't plan on using
> > > > > > > > this option.  And, remember, they haven't been committed yet to CVS,
> > > > > > > > which means that the final version will change to reflect the new
> > > > > > > > version.
> > > > > > > > > > > Christopher Fields
> > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > Dept. of Biochemistry
> > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > > >     _____
> > > > > > > > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > > > > > Sent: Monday, February 13, 2006 9:26 AM
> > > > > > > To: Chris Fields
> > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > > version
> > > > > > > 1.28
> > > > > > > >
> > > > > > > > Hi, Chris
> > > > > > > >
> > > > > > > > Thanks for your suggestion; however, it doesn't seem to work for my
> > > > > > > > cgi even after I replace both blast.pm and RemoteBlast.pm. I didn't
> > > > > > > > even get any RID. Is there any suggestion?
> > > > > > > >
> > > > > > > > Guojun
> > > > > > > > > > > Guojun Yang
> > > > > > > Department of Plant Biology
> > > > > > > University of Georgia
> > > > > > > Tel: 706-542-1857
> > > > > > > Fax: 706-542-1805
> > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > >     _____
> > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > > > > > > Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > > version
> > > > > > > 1.28
> > > > > > > >
> > > > > > > > I would say give the new code a try, but realize that it hasn't been
> > > > > > > > checked in (like I said below). I will try going over the modified
> > > > > > > > Bio::SearchIO::blast again this weekend to see if there is anything
> > > > > > > > I might have missed. The changed order in the header of BLAST text
> > > > > > > > output has me a bit worried that it might not catch everything, but
> > > > > > > > it at least doesn't hang in the while() loop I described in the bug
> > > > > > > > report below (bug #1934) and seems to process everything fine.
> > > > > > > >
> > > > > > > > If you want more stability in the code, you might consider changing
> > > > > > > > over to XML output and parsing with Bio::SearchIO::blastxml. There
> > > > > > > > are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that
> > > > > > > > accommodate saving XML output, but I believe it parses everything
> > > > > > > > regardless. If you look back over the last month or so there has
> > > > > > > > been a bit of discussion here about it. Jason describes a bit on how
> > > > > > > > to set up RemoteBlast for XML:
> > > > > > > >
> > > > > > > > http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> > > > > > > >
> > > > > > > > Christopher Fields
> > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > Dept. of Biochemistry
> > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > > -----Original Message-----
> > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > > Sent: Friday, February 03, 2006 1:45 PM
> > > > > > > > To: bioperl-l at bioperl.org
> > > > > > > > Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> > > version
> > > > > 1.28
> > > > > > > >
> > > > > > > > > Hi, Everybody,
> > > > > > > > > I see this post and am wondering if this is the reason for the
> > > > > > > > > malfunctioning of my webserver. We set up a webserver named MAK,
> > > > > > > > > for MITE sequence analysis. It was working very well until around
> > > > > > > > > November 2005, when it stopped returning any result (the site is
> > > > > > > > > fine and seems to be doing something after submission). In the CGI
> > > > > > > > > script, I used remoteblast (that work was done in 2003) to do
> > > > > > > > > searches. I currently do not have access to the server because I
> > > > > > > > > moved. Quite a few people have sent emails to us about its
> > > > > > > > > malfunctioning. Is there any suggestion on fixing the problem?
> > > > > > > > > Should I simply ask that remoteblast.pm be replaced with the new
> > > > > > > > > version?
> > > > > > > > > Thanks a lot,
> > > > > > > > > Guojun
> > > > > > > >
> > > > > > > > Department of Plant Biology
> > > > > > > > University of Georgia
> > > > > > > > Tel: 706-542-1857
> > > > > > > > Fax: 706-542-1805
> > > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > > > _____
> > > > > > > >
> > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
> > > Jian'
> > > > > > > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > [mailto:bioperl-
> > > > > > > > l at bioperl.org]
> > > > > > > > Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > >
> > > > > > > > > Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS.
> > > > > > > > > It will work for saving text output. However, it will not parse
> > > > > > > > > anything using next_result (it will likely hang) and will not save
> > > > > > > > > XML format. See these bugs:
> > > > > > > > >
> > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > > > >
> > > > > > > > > for explanations and possible fixes (changes to RemoteBlast and
> > > > > > > > > Bio::SearchIO::blast). Note that these haven't been checked in yet
> > > > > > > > > so are still not included in bioperl-live; they may be further
> > > > > > > > > modified before committing to CVS. If you're not worried about XML,
> > > > > > > > > you could just try the first fix, which is a change to
> > > > > > > > > SearchIO::blast.
> > > > > > > > >
> > > > > > > > > Nagesh, I remember you posting to the list a month ago using a
> > > > > > > > > script which had problems; the script you used saves the output but
> > > > > > > > > doesn't actually parse it (i.e. you don't use next_result() to go
> > > > > > > > > through the data). Is the version of BLAST in your text output
> > > > > > > > > 2.2.12 or 2.2.13? Have you tried parsing the output using
> > > > > > > > > "-readmethod => SearchIO" or "-readmethod => blast" using your
> > > > > > > > > version of RemoteBlast and method next_result()? Like below (from
> > > > > > > > > perldoc):
> > > > > > > >
> > > > > > > > > while ( my @rids = $factory->each_rid ) {
> > > > > > > > >   foreach my $rid ( @rids ) {
> > > > > > > > >     my $rc = $factory->retrieve_blast($rid);
> > > > > > > > >     if( !ref($rc) ) {
> > > > > > > > >       if( $rc < 0 ) {
> > > > > > > > >         $factory->remove_rid($rid);
> > > > > > > > >       }
> > > > > > > > >       print STDERR "." if ( $v > 0 );
> > > > > > > > >       sleep 5;
> > > > > > > > >     } else { # parsing starts here
> > > > > > > > >       my $result = $rc->next_result(); # it should hang here
> > > > > > > > >       #save the output
> > > > > > > > >       my $filename = $result->query_name()."\.out";
> > > > > > > > >       $factory->save_output($filename);
> > > > > > > > >       $factory->remove_rid($rid);
> > > > > > > > >       print "\nQuery Name: ", $result->query_name(), "\n";
> > > > > > > > >       while ( my $hit = $result->next_hit ) {
> > > > > > > > >         next unless ( $v > 0);
> > > > > > > > >         print "\thit name is ", $hit->name, "\n";
> > > > > > > > >         while( my $hsp = $hit->next_hsp ) {
> > > > > > > > >           print "\t\tscore is ", $hsp->score, "\n";
> > > > > > > > >         }
> > > > > > > > >       }
> > > > > > > > >     }
> > > > > > > > >   }
> > > > > > > > > }
> > > > > > > >
> > > > > > > >
> > > > > > > > > My script hung if I used next_result() in any way prior to the
> > > > > > > > > fixes. I want to see how many others are having the same issues
> > > > > > > > > with parsing using the CVS version of bioperl-live.
> > > > > > > >
> > > > > > > > Christopher Fields
> > > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > > Dept. of Biochemistry
> > > > > > > > University of Illinois Urbana-Champaign
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> > l-
> > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > > > > > > > > Sent: Thursday, February 02, 2006 7:24 PM
> > > > > > > > > To: Huang Jian; bioperl-l
> > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > > >
> > > > > > > > > > Hi Huang,
> > > > > > > > > > Thanks for the message. The older version of RemoteBlast.pm
> > > > > > > > > > works on the logic of checking the temporary file size to
> > > > > > > > > > determine whether the Blast results are ready. This condition is
> > > > > > > > > > not getting satisfied, maybe due to some changes brought about by
> > > > > > > > > > NCBI. I had this problem recently and figured out that the
> > > > > > > > > > solution was to use the latest version, which has this problem
> > > > > > > > > > fixed (it does not use the file size logic any more) and which is
> > > > > > > > > > not yet included in the BioPerl package.
> > > > > > > > > > Cheers
> > > > > > > > > > Nagesh
> > > > > > > > >
> > > > > > > > > Huang Jian wrote:
> > > > > > > > >
> > > > > > > > > > Dear Nagesh,
> > > > > > > > > >
> > > > > > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you
> > > > > > > > > > > sent me. Now it works perfectly!!!
> > > > > > > > > >
> > > > > > > > > > Thank you!!
> > > > > > > > > >
> > > > > > > > > > Huang
> > > > > > > > > >
> > > > > > > > > > ----- Original Message ----- From: "Nagesh Chakka"
> > > > > > > > > > 
> > > > > > > > > > To: "Huang Jian" ; "bioperl-l"
> > > > > > > > > > 
> > > > > > > > > > Sent: Friday, February 03, 2006 7:48 AM
> > > > > > > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the
> > net,
> > > so
> > > > > still
> > > > > > > > > > via email
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > >> Hi Huang,
> > > > > > > > > > >> I see that you are submitting a sequence for a remote blast
> > > > > > > > > > >> search. Can you check if the RemoteBlast.pm being used is v 1.28
> > > > > > > > > > >> (2005/12/09)? If not, I have attached it to this email; try
> > > > > > > > > > >> replacing the old one, which has a bug, with it.
> > > > > > > > > > >> Let me know if it works.
> > > > > > > > > > >> Nagesh
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > _______________________________________________
> > > > > > > > > Bioperl-l mailing list
> > > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Bioperl-l mailing list
> > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Bioperl-l mailing list
> > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > _______________________________________________
> > > > > > > Bioperl-l mailing list
> > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > >
> > > > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > >
> > 



From cjfields at uiuc.edu  Wed Feb 15 15:17:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Feb 2006 14:17:27 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on
	RemoteBlast.pm version 1.28
In-Reply-To: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
Message-ID: <000001c6326c$d72dd640$15327e82@pyrimidine>

This looks like a genuine bug and may be something that changed in BLASTN
text output; I'm getting it here, too.  Running verbose shows that text
output is returned, so, from that and from the stack trace it looks like
another error in text parsing in Bio::SearchIO::blast.  Bio::SearchIO::blast
line 1172 throws a conditional exception.  
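
Until the parser is fixed, a script can at least survive the exception
instead of dying at next_result().  What follows is only a minimal sketch
of that idea, not tested against RemoteBlast itself: StubParser is a
hypothetical stand-in for the object returned by retrieve_blast(), and
safe_next_result() is a helper name made up for this example.  Only perl's
built-in eval/die are used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the parser object returned by retrieve_blast().
# Its next_result() dies when asked to, imitating the exception in the trace.
package StubParser;
sub new { my ($class, %args) = @_; return bless { %args }, $class; }
sub next_result {
    my $self = shift;
    die "MSG: no data for midline\n" if $self->{broken};
    return 'result-object';
}

package main;

# Call next_result() inside eval so a parser exception becomes a return
# value instead of ending the script.
# Returns (result, undef) on success or (undef, error message) on failure.
sub safe_next_result {
    my ($parser) = @_;
    my $result;
    my $ok = eval { $result = $parser->next_result(); 1 };
    unless ($ok) {
        my $err = defined $@ && length $@ ? $@ : 'unknown error';
        $err =~ s/\s+\z//;    # trim the trailing newline
        return (undef, $err);
    }
    return ($result, undef);
}

my ($good, $no_err) = safe_next_result(StubParser->new(broken => 0));
print "parsed: $good\n";

my ($none, $err) = safe_next_result(StubParser->new(broken => 1));
print "parse failed: $err\n";
```

In the test script quoted below, the same guard would go around
$rc->next_result().  Saving the raw report when parsing fails would then
need a filename that does not come from the parsed result; that part is an
assumption and is left out of the sketch.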

I'm adding this to bug 1934 in bugzilla (reference to your email and this
response) for now.  I'll try messing around with it when I can; I'm really
busy this week.  I'll also forward this to Roger Hall.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Wednesday, February 15, 2006 1:40 PM
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] OK for aa seq but not a na seq on
> RemoteBlast.pm version 1.28
> 
> Hi, Chris,
> Finally the remoteblast test script works for the amino.fa query, but when
> I try a nucleic acid sequence (see below), an error occurs:
> "
> waiting........
> ------------- EXCEPTION  -------------
> MSG: no data for midline  Features flanking this part of subject sequence:
> STACK Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> STACK toplevel remoteblast_test:40
> "
> The query sequence is:
> CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> 
> The script (basically the same as the remoteblast test; I only changed the
> database to 'nr', the program to 'blastn' and the filename to 'ost3'):
> #!/usr/bin/perl
> 
> use Bio::SeqIO;
> use Bio::Seq;
> use Bio::Tools::Run::RemoteBlast;
> use Bio::SearchIO;
> use strict;
> my $prog='blastn';
> my $db='nr';
> my $e_val=1e-10;
> my @params=( -prog=>$prog,
> 	-data=>$db,
> 	-expect=>$e_val,
> 	-readmethod=>'SearchIO');
> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> 
> my $v = 1;
> 
> my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> 
> while (my $input = $str->next_seq()){
>   #Blast a sequence against a database:
>   #Alternatively, you could  pass in a file with many
>   #sequences rather than loop through sequence one at a time
>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>   #and swap the two lines below for an example of that.
>   my $r = $factory->submit_blast($input);
>   #my $r = $factory->submit_blast('amino.fa');
>   print STDERR "waiting..." if( $v > 0 );
>   while ( my @rids = $factory->each_rid ) {
>     foreach my $rid ( @rids ) {
>       my $rc = $factory->retrieve_blast($rid);
>       if( !ref($rc) ) {
>         if( $rc < 0 ) {
>           $factory->remove_rid($rid);
>         }
>         print STDERR "." if ( $v > 0 );
>         sleep 5;
>       } else {
>         my $result = $rc->next_result();
>         #save the output
>         my $filename = $result->query_name()."\.out";
>         $factory->save_output($filename);
>         $factory->remove_rid($rid);
>         print "\nQuery Name: ", $result->query_name(), "\n";
>         while ( my $hit = $result->next_hit ) {
>           next unless ( $v > 0);
>           print "\thit name is ", $hit->name, "\n";
>           while( my $hsp = $hit->next_hsp ) {
>             print "\t\tscore is ", $hsp->score, "\n";
>           }
>         }
>       }
>     }
>   }
> }
> 
> 
> Do you think something in the NCBI output format might still be the cause?
> 
> Thank you,
> Guojun
> 
> 
> 
> 
> Guojun Yang
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
> 
> 
> 
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> 
> 
> > Sorry, forgot to add that I didn't see the regex issue that you mentioned.
> > It could be a perl-related issue.  Try the fixes I mentioned and see what
> > happens.
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > > -----Original Message-----
> > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > Sent: Tuesday, February 14, 2006 12:36 PM
> > > To: 'gyang at plantbio.uga.edu'
> > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >
> > > It's a good habit to always add single quotes around words.  The perl
> > > interpreter may think a single bare word is a subroutine or perlfunc
> > > called with no args so will try to find a subroutine named blastp().  My
> > > debugger actually gives the error that the bare word blastp may conflict
> > > with a future reserved word.  Like you said, 'use strict' will point that
> > > out.
> > >
> > > As for the regex, it should match all the blast programs at NCBI (blastp,
> > > blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
> > > else passes through.
> > >
> > > So, if you are using the script below, there are several errors.  The bare
> > > words for $prog and $db need quotes, and the flags for your @params array
> > > don't have a dash before them.  I get this after adding quotes but before
> > > adding the dashes to @params:
> > >
> > > C:\Perl\Scripts>test_blast.pl
> > >
> > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG:
> > > STACK: Error::throw
> > > STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> > > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > > STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> > > STACK: C:\Perl\Scripts\test_blast.pl:15
> > > -----------------------------------------------------------
> > >
> > > The last line indicates a problem with this line:
> > >
> > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >
> > > Changing the @params to this:
> > >
> > > my @params=( -prog=>$prog,
> > > 	-data=>$db,
> > > 	-expect=>$e_val,
> > > 	-readmethod=>'SearchIO');
> > >
> > > fixes it, and I get output as expected.
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > > > -----Original Message-----
> > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > > Sent: Tuesday, February 14, 2006 11:48 AM
> > > > To: Chris Fields; bioperl-l at lists.open-bio.org
> > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > >
> > > > Hi, Chris,
> > > > When I tried the perldoc script, it did not work either. First it
> > > > says $prog cannot be a bare word if I "use strict". I added quotes to
> > > > the words; then it says the value for $prog does not match expression
> > > > t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The
> > > > script is shown below. Why is the expression "t?blast[pnx]"?
> > > >
> > > > #!/usr/bin/perl
> > > >
> > > > use Bio::SeqIO;
> > > > use Bio::Seq;
> > > > use Bio::Tools::Run::RemoteBlast;
> > > > use Bio::SearchIO;
> > > >
> > > >
> > > > my $prog=blastp;
> > > > my $db=swissprot;
> > > > my $e_val=1e-10;
> > > > my @params=( prog=>$prog,
> > > > 	data=>$db,
> > > > 	expect=>$e_val,
> > > > 	readmethod=>'SearchIO');
> > > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > >
> > > > my $v = 1;
> > > >
> > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > > >
> > > > while (my $input = $str->next_seq()){
> > > >   #Blast a sequence against a database:
> > > >   #Alternatively, you could  pass in a file with many
> > > >   #sequences rather than loop through sequence one at a time
> > > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >   #and swap the two lines below for an example of that.
> > > >   my $r = $factory->submit_blast($input);
> > > >   #my $r = $factory->submit_blast('amino.fa');
> > > >   print STDERR "waiting..." if( $v > 0 );
> > > >   while ( my @rids = $factory->each_rid ) {
> > > >     foreach my $rid ( @rids ) {
> > > >       my $rc = $factory->retrieve_blast($rid);
> > > >       if( !ref($rc) ) {
> > > >         if( $rc < 0 ) {
> > > >           $factory->remove_rid($rid);
> > > >         }
> > > >         print STDERR "." if ( $v > 0 );
> > > >         sleep 5;
> > > >       } else {
> > > >         my $result = $rc->next_result();
> > > >         #save the output
> > > >         my $filename = $result->query_name()."\.out";
> > > >         $factory->save_output($filename);
> > > >         $factory->remove_rid($rid);
> > > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > > >         while ( my $hit = $result->next_hit ) {
> > > >           next unless ( $v > 0);
> > > >           print "\thit name is ", $hit->name, "\n";
> > > >           while( my $hsp = $hit->next_hsp ) {
> > > >             print "\t\tscore is ", $hsp->score, "\n";
> > > >           }
> > > >         }
> > > >       }
> > > >     }
> > > >   }
> > > > }
> > > >
> > > > Thank you for your help!
> > > >
> > > >
> > > > Guojun
> > > > Department of Plant Biology
> > > > University of Georgia
> > > >
> > > > ----- Original Message -----
> > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > To: gyang at plantbio.uga.edu
> > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >
> > > >
> > > > > > Try two things:
> > > > > >
> > > > > > 1)  Use a much simpler script, like the one in 'perldoc
> > > > > > Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something wrong
> > > > > > with the logic in your subroutine:
> > > > > >
> > > > > > my $v = 1;
> > > > > >
> > > > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > > > > >
> > > > > > while (my $input = $str->next_seq()){
> > > > >   #Blast a sequence against a database:
> > > > >   #Alternatively, you could  pass in a file with many
> > > > >   #sequences rather than loop through sequence one at a time
> > > > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > > >   #and swap the two lines below for an example of that.
> > > > >   my $r = $factory->submit_blast($input);
> > > > >   #my $r = $factory->submit_blast('amino.fa');
> > > > >   print STDERR "waiting..." if( $v > 0 );
> > > > >   while ( my @rids = $factory->each_rid ) {
> > > > >     foreach my $rid ( @rids ) {
> > > > >       my $rc = $factory->retrieve_blast($rid);
> > > > >       if( !ref($rc) ) {
> > > > >         if( $rc < 0 ) {
> > > > >           $factory->remove_rid($rid);
> > > > >         }
> > > > >         print STDERR "." if ( $v > 0 );
> > > > >         sleep 5;
> > > > >       } else {
> > > > >         my $result = $rc->next_result();
> > > > >         #save the output
> > > > >         my $filename = $result->query_name()."\.out";
> > > > >         $factory->save_output($filename);
> > > > >         $factory->remove_rid($rid);
> > > > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > > > >         while ( my $hit = $result->next_hit ) {
> > > > >           next unless ( $v > 0);
> > > > >           print "\thit name is ", $hit->name, "\n";
> > > > >           while( my $hsp = $hit->next_hsp ) {
> > > > >             print "\t\tscore is ", $hsp->score, "\n";
> > > > >           }
> > > > >         }
> > > > >       }
> > > > >     }
> > > > >   }
> > > > > }
> > > > > >
> > > > > > 2) Try the RemoteBlast from Bugzilla and see if that works.  It really
> > > > > > shouldn't make that much of a difference, but I noticed that the CVS
> > > > > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> > > > > > released; the Bugzilla version is based off CVS.
> > > > > >
> > > > > > Christopher Fields
> > > > > Postdoctoral Researcher - Switzer Lab
> > > > > Dept. of Biochemistry
> > > > > University of Illinois Urbana-Champaign
> > > > > > > -----Original Message-----
> > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > Sent: Monday, February 13, 2006 3:00 PM
> > > > > > To: bioperl-l at lists.open-bio.org
> > > > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > > > Thanks, Chris,
> > > > > > I installed version 1.5.1 and replaced the blast.pm file with
> the
> > > one
> > > > from
> > > > > > your bug report. The running version is 1.5 when I use the
> command
> > > you
> > > > > > sent me. But when I tried the script, it doesn't change much. My
> > > > > > remoteblast code (portion) is here:
> > > > > > >
> > > > > > > sub search {
> > > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
> > > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > > > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > > > > > 			      -id=>"query",
> > > > > > 			      -desc=>"new seq");
> > > > > > my $len=$query->length();
> > > > > > @db=('nr','htgs','wgs');
> > > > > > foreach my $db (@db) {
> > > > > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> > > > > > > 						'-data' =>"$db",
> > > > > > > 						'-expect'=>"$E_value");
> > > > > > >
> > > > > > > my $blast_report = $factory->submit_blast($query);
> > > > > > >
> > > > > > > my @rids = $factory->each_rid();
> > > > > > foreach my $rid ( @rids ) {
> > > > > >     print STDERR "$rid\n";
> > > > > > }
> > > > > > # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > > > > > print STDERR "waiting...";
> > > > > > sleep 60;
> > > > > > > > foreach my $rid ( @rids ) {
> > > > > >     my $rc = $factory->retrieve_blast($rid);
> > > > > >     while (!ref($rc) ) {
> > > > > > 	if( $rc < 0 ) {
> > > > > > # retrieve_blast returns -1 on error
> > > > > > 	    $factory->remove_rid($rid);
> > > > > > 	    print "Error!\n";
> > > > > > 	    send_error($email,$function,$seqname,$queryname[$ST]);
> > > > > > 	    die "Can't retrieve $rid";
> > > > > > > 	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
> > > > > > 	    sleep 60;
> > > > > > 	    $rc = $factory->retrieve_blast($rid);
> > > > > > 	}
> > > > > >     }
> > > > > >     if (ref($rc)) {
> > > > > > 	print STDERR "Done.\n";
> > > > > > 	 while( my $result = $rc->next_result) {
> > > > > > 	    while( my $hit = $result->next_hit()) {
> > > > > > 	    	$hit_name=$hit->name;
> > > > > > 		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > > > > > 		$name=$1;
> > > > > > 		@left_plus_start=();
> > > > > > 		@left_plus_end=();
> > > > > > 		@left_minus_start=();
> > > > > > 		@left_minus_end=();
> > > > > > 		@right_plus_start=();
> > > > > > 		@right_plus_end=();
> > > > > > 		@right_minus_start=();
> > > > > > 		@right_minus_end=();
> > > > > > >
> > > > > > > 		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > > > > > 		while( my $hsp = $hit->next_hsp()) {
> > > > > > ......
> > > > > > >
> > > > > > > It was working quite well until around October last year, but it has
> > > > > > > not worked since then. When a submission is sent via a webpage, the CGI
> > > > > > > starts to work and uses ~20 MB of memory. Then it hangs there; finally
> > > > > > > the expected email is received, but without real results, although it
> > > > > > > does contain something from other parts of the script. Apparently the
> > > > > > > search sub did not return anything (I know there is something that
> > > > > > > should be returned). Is it also possible that the format of the NCBI
> > > > > > > output for each result has changed?
> > > > > > Thank you,
> > > > > > Guojun
> > > > > > > > > > Department of Plant Biology
> > > > > > University of Georgia
> > > > > > > > > > > > ----- Original Message -----
> > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > >
> > > > > > > > How do you know two versions are installed (i.e. how are you checking
> > > > > > > > the version)?  Do you have two complete bioperl distributions (in two
> > > > > > > > separate directories) or are you looking in modules?  Here's the way to
> > > > > > > > check the version (from the FAQ):
> > > > > > > >
> > > > > > > > perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> > > > > > > >
> > > > > > > > If you have two full bioperl distributions on your computer, normally only
> > > > > > > > one will be in use unless you have explicitly set the environment variable
> > > > > > > > PERL5LIB.  The PERL5LIB directories will be searched first, before your
> > > > > > > > normal perl directory list (@INC) is searched.  You MAY get some mixing
> > > > > > > > then, but only if perl can't find a particular module in the path designated
> > > > > > > > in PERL5LIB; then it will progress through the directories listed in @INC.
> > > > > > > > This may happen if a module is unique to a particular release, but shouldn't
> > > > > > > > happen for the majority of modules, including RemoteBlast.  You can check
> > > > > > > > what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will differ
> > > > > > > > depending on your OS, perl build, etc.
> > > > > > > >
> > > > > > > > Regardless, if you follow the directions for installing bioperl for your
> > > > > > > > system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you
> > > > > > > > explicitly change the installation directory when using 'perl Makefile.PL'),
> > > > > > > > then 'uninstalling' Bioperl shouldn't be a problem as it will install the
> > > > > > > > Bioperl distribution you downloaded over the old version in @INC.  See this
> > > > > > > > page:
> > > > > > > >
> > > > > > > > http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > > > > > > >
> > > > > > > > for more details.
> > > > > > > >
> > > > > > > > Christopher Fields
> > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > Dept. of Biochemistry
> > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > > -----Original Message-----
> > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> l-
> > > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > > Sent: Monday, February 13, 2006 12:32 PM
> > > > > > > > To: bioperl-l at lists.open-bio.org
> > > > > > > > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > > > >
> > > > > > > > > Hi, Chris,
> > > > > > > > > I do have different versions of bioperl on my Linux machine (1.4 and
> > > > > > > > > 1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or
> > > > > > > > > do I need to uninstall and remove the previous versions first? I could not
> > > > > > > > > find any hint on uninstalling bioperl on Linux. Could you please give me
> > > > > > > > > some suggestions?
> > > > > > > > Thanks,
> > > > > > > > Guojun
> > > > > > > > > > Department of Plant Biology
> > > > > > > > University of Georgia
> > > > > > > >       _____
> > > > > > > > > >   From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > > > > Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > > > > > > > Subject: RE: [Bioperl-l] more question regarding
> RemoteBlast.pm
> > > > > > version
> > > > > > > > 1.28
> > > > > > > > >
> > > > > > > > > If you're using RemoteBlast 1.28, then you've likely updated from CVS,
> > > > > > > > > which isn't the latest fix.
> > > > > > > > >
> > > > > > > > > Make sure that you check the following:
> > > > > > > > >
> > > > > > > > > 1) Always post to the mailing list:
> > > > > > > > > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > > > > > > > >
> > > > > > > > > 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> > > > > > > > > installed first.  Perform a clean installation; do not upgrade only
> > > > > > > > > Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> > > > > > > > > guarantee that mixing modules from old and new distributions (1.4 and
> > > > > > > > > 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > > > > > > > > installation will allow text output from BLAST v2.2.12 to be saved and
> > > > > > > > > parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13)
> > > > > > > > > but it should still save it. I believe as long as next_result() isn't
> > > > > > > > > called, it will work.
> > > > > > > > >
> > > > > > > > > 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output
> > > > > > > > > are NOT in CVS; they haven't been cleared and checked in by Roger Hall
> > > > > > > > > (who's now taking care of RemoteBlast) and the powers that be (Jason or
> > > > > > > > > whoever is in charge of Bio::SearchIO).  They can be found in Bugzilla:
> > > > > > > > >
> > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > > > >
> > > > > > > > > The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> > > > > > > > > saving XML output, so isn't necessary if you don't plan on using this
> > > > > > > > > option.  And, remember, they haven't been committed yet to CVS, which
> > > > > > > > > means that the final version will change to reflect the new version.
> > > > > > > > > > > > Christopher Fields
> > > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > > Dept. of Biochemistry
> > > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > > > >     _____
> > > > > > > > > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > > > > > > Sent: Monday, February 13, 2006 9:26 AM
> > > > > > > > To: Chris Fields
> > > > > > > > Subject: RE: [Bioperl-l] more question regarding
> RemoteBlast.pm
> > > > > > version
> > > > > > > > 1.28
> > > > > > > > > > > > Hi, Chris
> > > > > > > > >
> > > > > > > > > Thanks for your suggestion; however, it doesn't seem to work for my CGI
> > > > > > > > > even after I replace both blast.pm and RemoteBlast.pm. I didn't even get
> > > > > > > > > any RID. Is there any suggestion?
> > > > > > > > >
> > > > > > > > > Guojun
> > > > > > > > > > > > Guojun Yang
> > > > > > > > Department of Plant Biology
> > > > > > > > University of Georgia
> > > > > > > > Tel: 706-542-1857
> > > > > > > > Fax: 706-542-1805
> > > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > > >     _____
> > > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > > > > > > > Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > > > > > > > Subject: RE: [Bioperl-l] more question regarding
> RemoteBlast.pm
> > > > > > version
> > > > > > > > 1.28
> > > > > > > > >
> > > > > > > > > I would say give the new code a try, but realize that it hasn't been
> > > > > > > > > checked in (like I said below). I will try going over the modified
> > > > > > > > > Bio::SearchIO::blast again this weekend to see if there is anything I
> > > > > > > > > might have missed. The changed order in the header of BLAST text output
> > > > > > > > > has me a bit worried that it might not catch everything, but it at least
> > > > > > > > > doesn't hang in the while() loop I described in the bug report below (bug
> > > > > > > > > #1934) and seems to process everything fine.
> > > > > > > > >
> > > > > > > > > If you want more stability in the code, you might consider changing over
> > > > > > > > > to XML output and parsing with Bio::SearchIO::blastxml. There are some
> > > > > > > > > changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
> > > > > > > > > saving XML output, but I believe it parses everything regardless. If you
> > > > > > > > > look back over the last month or so, there has been a bit of discussion
> > > > > > > > > here about it. Jason describes a bit on how to set up RemoteBlast for XML:
> > > > > > > > >
> > > > > > > > > http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> > > > > > > > >
> > > > > > > > > > Christopher Fields
> > > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > > Dept. of Biochemistry
> > > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-
> > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > > > Sent: Friday, February 03, 2006 1:45 PM
> > > > > > > > > To: bioperl-l at bioperl.org
> > > > > > > > > Subject: [Bioperl-l] more question regarding
> RemoteBlast.pm
> > > > version
> > > > > > 1.28
> > > > > > > > >
> > > > > > > > > Hi, Everybody,
> > > > > > > > > > I see this post and am wondering if this is the reason for the
> > > > > > > > > > malfunctioning of my webserver. We set up a webserver named MAK, for MITE
> > > > > > > > > > sequence analysis. It was working very well until around November 2005,
> > > > > > > > > > when it stopped returning any result (the site is fine and seems to be
> > > > > > > > > > doing something after submission). In the CGI script, I used RemoteBlast
> > > > > > > > > > (that work was done in 2003) to do searches. I currently do not have
> > > > > > > > > > access to the server because I moved. Several people have sent emails to
> > > > > > > > > > us about its malfunctioning. Is there any suggestion on fixing the
> > > > > > > > > > problem? Should I simply ask that RemoteBlast.pm be replaced with the new
> > > > > > > > > > version?
> > > > > > > > > Thanks a lot,
> > > > > > > > > Guojun
> > > > > > > > >
> > > > > > > > > Department of Plant Biology
> > > > > > > > > University of Georgia
> > > > > > > > > Tel: 706-542-1857
> > > > > > > > > Fax: 706-542-1805
> > > > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > > > > _____
> > > > > > > > >
> > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au],
> 'Huang
> > > > Jian'
> > > > > > > > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > > [mailto:bioperl-
> > > > > > > > > l at bioperl.org]
> > > > > > > > > Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > > >
> > > > > > > > > > Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> > > > > > > > > > will work for saving text output. However, it will not parse anything
> > > > > > > > > > using next_result (it will likely hang) and will not save XML format. See
> > > > > > > > > > these bugs:
> > > > > > > > > >
> > > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > > > > >
> > > > > > > > > > for explanations and possible fixes (changes to RemoteBlast and
> > > > > > > > > > Bio::SearchIO::blast). Note that these haven't been checked in yet so are
> > > > > > > > > > still not included in bioperl-live; they may be further modified before
> > > > > > > > > > committing to CVS. If you're not worried about XML, you could just try
> > > > > > > > > > the first fix, which is a change to SearchIO::blast.
> > > > > > > > > >
> > > > > > > > > > Nagesh, I remember you posting to the list a month ago using a script
> > > > > > > > > > which had problems; the script you used saves the output but doesn't
> > > > > > > > > > actually parse it (i.e. you don't use next_result() to go through the
> > > > > > > > > > data). Is the version of BLAST in your text output 2.2.12 or 2.2.13? Have
> > > > > > > > > > you tried parsing the output using "-readmethod => SearchIO" or
> > > > > > > > > > "-readmethod => blast" using your version of RemoteBlast and method
> > > > > > > > > > next_result()? Like below (from perldoc):
> > > > > > > > >
> > > > > > > > > > while ( my @rids = $factory->each_rid ) {
> > > > > > > > > >   foreach my $rid ( @rids ) {
> > > > > > > > > >     my $rc = $factory->retrieve_blast($rid);
> > > > > > > > > >     if( !ref($rc) ) {
> > > > > > > > > >       if( $rc < 0 ) {
> > > > > > > > > >         $factory->remove_rid($rid);
> > > > > > > > > >       }
> > > > > > > > > >       print STDERR "." if ( $v > 0 );
> > > > > > > > > >       sleep 5;
> > > > > > > > > >     } else { # parsing starts here
> > > > > > > > > >       my $result = $rc->next_result(); # it should hang here
> > > > > > > > > >       #save the output
> > > > > > > > > >       my $filename = $result->query_name()."\.out";
> > > > > > > > > >       $factory->save_output($filename);
> > > > > > > > > >       $factory->remove_rid($rid);
> > > > > > > > > >       print "\nQuery Name: ", $result->query_name(), "\n";
> > > > > > > > > >       while ( my $hit = $result->next_hit ) {
> > > > > > > > > >         next unless ( $v > 0);
> > > > > > > > > >         print "\thit name is ", $hit->name, "\n";
> > > > > > > > > >         while( my $hsp = $hit->next_hsp ) {
> > > > > > > > > >           print "\t\tscore is ", $hsp->score, "\n";
> > > > > > > > > >         }
> > > > > > > > > >       }
> > > > > > > > > >     }
> > > > > > > > > >   }
> > > > > > > > > > }
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > My script hung if I used next_result() in any way prior to the fixes. I
> > > > > > > > > > want to see how many others are having the same issues with parsing
> > > > > > > > > > using the CVS version of bioperl-live.
> > > > > > > > >
> > > > > > > > > Christopher Fields
> > > > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > > > Dept. of Biochemistry
> > > > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-
> > > l-
> > > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > > > > > > > > > Sent: Thursday, February 02, 2006 7:24 PM
> > > > > > > > > > To: Huang Jian; bioperl-l
> > > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > > > >
> > > > > > > > > > Hi Huang,
> > > > > > > > > > > Thanks for the message. The older version of RemoteBlast.pm works on
> > > > > > > > > > > the logic of checking the temporary file size to determine whether the
> > > > > > > > > > > Blast results are ready. This condition is not getting satisfied, maybe
> > > > > > > > > > > due to some changes brought about by NCBI. I had this problem recently
> > > > > > > > > > > and figured out that the solution was to use the latest version, which
> > > > > > > > > > > has this problem fixed (it does not use the file size logic any more)
> > > > > > > > > > > but is not yet included in the BioPerl package.
> > > > > > > > > > Cheers
> > > > > > > > > > Nagesh
> > > > > > > > > >
> > > > > > > > > > Huang Jian wrote:
> > > > > > > > > >
> > > > > > > > > > > Dear Nagesh,
> > > > > > > > > > >
> > > > > > > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent
> > > > > > > > > > > > me. Now it works perfectly!!!
> > > > > > > > > > >
> > > > > > > > > > > Thank you!!
> > > > > > > > > > >
> > > > > > > > > > > Huang
> > > > > > > > > > >
> > > > > > > > > > > ----- Original Message ----- From: "Nagesh Chakka"
> > > > > > > > > > > 
> > > > > > > > > > > To: "Huang Jian" ;
> "bioperl-l"
> > > > > > > > > > > 
> > > > > > > > > > > Sent: Friday, February 03, 2006 7:48 AM
> > > > > > > > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the
> > > net,
> > > > so
> > > > > > still
> > > > > > > > > > > via email
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >> Hi Huang,
> > > > > > > > > > > >> I see that you are submitting a sequence for a remote blast search.
> > > > > > > > > > > >> Can you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09)?
> > > > > > > > > > > >> If not, I have attached it with this email; try replacing the old one,
> > > > > > > > > > > >> which has a bug, with it.
> > > > > > > > > > >> Let me know if it works.
> > > > > > > > > > >> Nagesh
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >



From sdavis2 at mail.nih.gov  Wed Feb 15 19:39:33 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 16 Feb 2006 00:39:33 -0000
Subject: [Bioperl-l] error running load_seqdatabase.pl
References: 
Message-ID: <000c01c63291$5de08600$6601a8c0@WATSON>


----- Original Message ----- 
From: "Angshu Kar" 
To: "bioperl-l" 
Sent: Thursday, December 29, 2005 5:50 PM
Subject: [Bioperl-l] error running load_seqdatabase.pl


> Hi,
>
> I'm getting the following error while trying to run :
>
> ./load_seqdatabase.pl -host localhost -dbname USBA -dbuser postgres -format genbank NC_003076.gbk
>
> But I have a PostgreSQL db and not a MySQL one... could anyone please guide
> me in troubleshooting this?

Angshu,

I would probably start with:

perldoc load_seqdatabase.pl

I think that will likely give you your answer.  Again, it is best to exhaust 
the resources at hand and to let the list know that you have done so 
(like--"I read the perldoc and tried this....").

Sean




From cain at cshl.edu  Wed Feb 15 11:07:28 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 15 Feb 2006 11:07:28 -0500
Subject: [Bioperl-l] Bio::Tools::GFF parsing error
In-Reply-To: <43F35043.7070705@cornell.edu>
References: <43F35043.7070705@cornell.edu>
Message-ID: <1140019648.2849.58.camel@localhost.localdomain>

Hi Robert,

No column should ever be padded with spaces; GFF columns should always
be separated by a single tab.  Therefore, I don't think Bio::Tools::GFF
is at fault here.
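
If the file really is tab-delimited and only individual fields carry space
padding (an assumption; the listing quoted below does not show which
whitespace is which), the padding could be stripped before the file is
handed to the parser. A minimal sketch in plain Perl; normalize_gff_line is
a hypothetical helper name, not part of Bio::Tools::GFF:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Trim leading/trailing whitespace from every tab-separated column of one
# GFF2 feature line. Comment/pragma lines (starting with '#') and blank
# lines are returned unchanged. Spaces inside a column (e.g. in the group
# field) are left alone.
sub normalize_gff_line {
    my ($line) = @_;
    chomp $line;
    return $line if $line =~ /^#/ || $line !~ /\S/;
    my @cols = split /\t/, $line, -1;   # -1 keeps trailing empty columns
    s/^\s+|\s+$//g for @cols;
    return join "\t", @cols;
}

# Example record built with explicit tabs and a space-padded score column.
my $padded = join "\t", 'C01HBa0088L02', 'RepeatMasker', 'similarity',
    3537, 4267, ' 3.3', '-', '.', 'Target "Motif:k_29" 150 206';
print normalize_gff_line($padded), "\n";
```

Running each line of the input through such a filter and writing the result
to a new file would leave the original untouched while giving the parser
unpadded columns.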

Scott


On Wed, 2006-02-15 at 11:01 -0500, Robert Buels wrote:
> Hi all,
> 
> I'm parsing a GFF2 file with Bio::Tools::GFF (I would be using 
> FeatureIO, except it purports not to support gff 2), and the file looks 
> like:
> 
> ##gff-version 2
> ##date 2006-02-13
> ##sequence-region C01HBa0088L02.seq 1 120525
> C01HBa0088L02   RepeatMasker    similarity      3537    4267     3.3    -       .       Target "Motif:bac_end_repeat_family_345" 1 740
> C01HBa0088L02   RepeatMasker    similarity      4172    4279     2.9    +       .       Target "Motif:HRSiTERT00100141" 1 104
> C01HBa0088L02   RepeatMasker    similarity      4267    4323     0.0    -       .       Target "Motif:k_29" 150 206
> C01HBa0088L02   RepeatMasker    similarity      4322    4492    26.6    +       .       Target "Motif:PRSiTERT00300001" 1960 2129
> C01HBa0088L02   RepeatMasker    similarity      4557    5124    29.5    +       .       Target "Motif:PRSiTERT00300001" 2142 2711
> 
> Notice the score column is padded with spaces.
> 
> Bio::Tools::GFF does not like this, and says that ' 3.3' is not a valid 
> score.  My question is, who is wrong here, my input file or 
> Bio::Tools::GFF?  Should Bio::Tools::GFF be able to read this file?
> 
> Rob
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From hlapp at gmx.net  Wed Feb 15 20:54:01 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 15 Feb 2006 17:54:01 -0800
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki,
	problems with bioperl-db
In-Reply-To: <001201c631a5$ce7496f0$15327e82@pyrimidine>
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
Message-ID: 


On Feb 14, 2006, at 12:32 PM, Chris Fields wrote:

> Hilmar,
>
> Good News: I've added a section to the bioperl wiki on installing bioperl-db
> in Windows:
>
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db
>
> Bad News:  There's a new problem now. I updated from CVS yesterday; I walked
> through the steps and ran 'nmake test', with everything passing fine.
> However, load_seqdatabase.pl is extremely slow; it's loading a sequence
> every 5 minutes or so.  I noticed (when using '-debug') that it is hanging
> up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a database,
> load the biosql schema, and load sequences w/o loading taxonomy, the problem
> goes away.
>
> Here's the debugging output (I cut it off at the point it hangs up):
> [...]

> preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
> taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
> taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id = ?
> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)

I'm a bit surprised if this is the query where it hangs. Are the  
indexes all there? There should be a primary key index on  
taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on taxon_name  
over (taxon_id,name,name_class). Also, there should be separate indexes  
on taxon_name.taxon_id and taxon_name.name. Are they all there? If you  
reinstantiated the schema from the DDL then it seems unlikely that  
somehow the indexes have vanished except if you messed with the schema  
or the DDL.
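
One way to compare what the database actually has against that list is to
feed the rows of MySQL's SHOW INDEX output into a small check. A rough
sketch, assuming the index column lists described above; %expected and
missing_indexes are illustrative names, not part of bioperl-db, and the
check is strict (a composite index whose leading columns happen to match is
not counted):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Index column lists named in the paragraph above, keyed by table.
my %expected = (
    taxon      => [ ['taxon_id'], ['ncbi_taxon_id'] ],
    taxon_name => [ ['taxon_id', 'name', 'name_class'], ['taxon_id'], ['name'] ],
);

# $rows: array ref of hash refs shaped like the output of
# "SHOW INDEX FROM <table>" (Key_name, Seq_in_index, Column_name).
# Returns the expected column lists (comma-joined) that no index matches.
sub missing_indexes {
    my ($want, $rows) = @_;
    my %cols_by_key;
    for my $r (sort { $a->{Seq_in_index} <=> $b->{Seq_in_index} } @$rows) {
        push @{ $cols_by_key{ $r->{Key_name} } }, $r->{Column_name};
    }
    my %have = map { join(',', @$_) => 1 } values %cols_by_key;
    return grep { !$have{$_} } map { join(',', @$_) } @$want;
}

# Made-up sample rows for the taxon table: only the primary key exists.
my @sample = (
    { Key_name => 'PRIMARY', Seq_in_index => 1, Column_name => 'taxon_id' },
);
print "missing on taxon: $_\n" for missing_indexes($expected{taxon}, \@sample);
```

With DBI, rows in this shape can be fetched with
$dbh->selectall_arrayref('SHOW INDEX FROM taxon', { Slice => {} }).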

Putting an index on taxon_name.name_class really can't make sense, so  
let's assume it can't be that.

So really I suspect this has something to do with the state of the
database and the version of MySQL. In particular, from some 4.x version
of MySQL, under certain circumstances you have to analyze the statistics
of the tables in order to get the optimizer to pick up the indexes
properly. Are you on MySQL 4.x and if so, have you done that?

There's the ANALYZE TABLE command:
http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html

Note the comment: "This statement works with MyISAM, BDB, and (as of  
MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?

Also, you can check the execution plan for the query using EXPLAIN.
http://dev.mysql.com/doc/refman/4.1/en/explain.html

This should show you whether the index would be picked up for the query  
or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to  
the db using the mysql shell (mysql).
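
As a rough sketch of those checks (this is not something from the
bioperl-db distribution): the short Perl script below only prints SQL,
which can then be piped into the mysql shell. The table names, column
names, and the two bound values come from the debugging output quoted
above; the sub name diagnostic_sql and the script, user, and database
names in the usage line are placeholders of mine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the SQL for listing indexes, refreshing table statistics, and
# inspecting the execution plan of the slow species look-up.
# (diagnostic_sql is a hypothetical helper name, not a BioPerl function.)
sub diagnostic_sql {
    my ($name_class, $ncbi_taxon_id) = @_;
    die "ncbi_taxon_id must be an integer\n"
        unless $ncbi_taxon_id =~ /^\d+$/;
    (my $quoted_class = $name_class) =~ s/'/''/g;   # escape single quotes

    return join "\n",
        "SHOW INDEX FROM taxon;",
        "SHOW INDEX FROM taxon_name;",
        "ANALYZE TABLE taxon, taxon_name;",
        "EXPLAIN SELECT taxon_name.taxon_id, NULL, NULL,",
        "taxon.ncbi_taxon_id, taxon_name.name, NULL",
        "FROM taxon, taxon_name",
        "WHERE taxon.taxon_id = taxon_name.taxon_id",
        "AND name_class = '$quoted_class'",
        "AND ncbi_taxon_id = $ncbi_taxon_id;",
        "";
}

# Values taken from the debugging output in this thread.
print diagnostic_sql('scientific name', 208964);
```

Usage would be something like "perl taxon_check.pl | mysql -u USER -p
DBNAME". The EXPLAIN output should show whether the optimizer intends to
use an index for each of the two tables.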

I believe something similarly strange was encountered by someone using  
DB::GFF (or Chado) under MySQL, and if I recall correctly the solution  
was to optimize (analyze) the tables. Maybe someone who was in that  
thread reads this and can comment?

	-hilmar


>
> -----------------------------------------------------------------------------------------------------
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From cjfields at uiuc.edu  Wed Feb 15 22:56:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Feb 2006 21:56:14 -0600
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki,
	problems with bioperl-db
In-Reply-To: 
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
	
Message-ID: <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>



On Feb 15, 2006, at 7:54 PM, Hilmar Lapp wrote:

>
> On Feb 14, 2006, at 12:32 PM, Chris Fields wrote:
>
>> Hilmar,
>>
>> Good News: I've added a section to the bioperl wiki on installing
>> bioperl-db in Windows:
>>
>> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db
>>
>> Bad News:  There's a new problem now. I updated from CVS yesterday; I
>> walked through the steps and ran 'nmake test', with everything passing
>> fine. However, load_seqdatabase.pl is extremely slow; it's loading a
>> sequence every 5 minutes or so.  I noticed (when using '-debug') that it
>> is hanging up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create
>> a database, load the biosql schema, and load sequences w/o loading
>> taxonomy, the problem goes away.
>>
>> Here's the debugging output (I cut it off at the point it hangs up):
>> [...]
>
>> preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
>> taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
>> taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND
>> ncbi_taxon_id = ?
>> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
>> SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)
>
> I'm a bit surprised if this is the query where it hangs. Are the
> indexes all there? There should be a primary key index on
> taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on taxon_name
> over (taxon_id,name,name_class). Also, there should be separate indexes
> on taxon_name.taxon_id and taxon_name.name. Are they all there? If you
> reinstantiated the schema from the DDL then it seems unlikely that
> somehow the indexes have vanished except if you messed with the schema
> or the DDL.

I looked in the mailing list archives and Barry mentions something here:

http://bioperl.org/pipermail/bioperl-l/2005-January/018093.html

He rebuilt the database from scratch and got it working; no reason  
was given.  I wouldn't be surprised if it is something MySQL-related
that pops up.  The strange thing is that only a few months ago  
everything ran well with this version of MySQL (v.5); this was with  
the first test database I installed on it.  Another strange thing (I  
think I mentioned it) is that NOT loading the taxonomy with  
load_ncbi_taxonomy.pl worked (everything was entered).  I'll try  
rebuilding the database from scratch to see what happens.  I am  
running this on Windows, so this is new territory...

> Putting an index on taxon_name.name_class really can't make sense, so
> let's assume it can't be that.
>
> So really I suspect this has something to do with the state of the
> database and the version of MySQL. In particular, from some 4.x version
> of MySQL under certain circumstances you have to analyze the statistics
> of the tables in order to get the optimizer to pick up the indexes
> properly. Are you on MySQL 4.x and if so, have you done that?
>
> There's the ANALYZE TABLE command:
> http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html
>
> Note the comment: "This statement works with MyISAM, BDB, and (as of
> MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?
>
> Also, you can check the execution plan for the query using EXPLAIN.
> http://dev.mysql.com/doc/refman/4.1/en/explain.html
>
> This should show you whether the index would be picked up for the query
> or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to
> the db using the mysql shell (mysql).

I'll give these a shot and post what I find in the next few days.

> I believe something similarly strange was encountered by someone using
> DB::GFF (or Chado) under MySQL, and if I recall correctly the solution
> was to optimize (analyze) the tables. Maybe someone who was in that
> thread reads this and can comment?
>
> 	-hilmar

I wanted to also mention that we shouldn't check in the modifications
to Bio::Root::Root until I confirm something (I'm at home and
currently can't).  I tried running a script on an unrelated module
using the modified Bio::Root::Root (with the commas added after the
'throw $class' statements).  Everything worked for $self->throw(),
except the thrown message wasn't displayed.  I'll dig into it a bit
more to see what happens.

>
>
>>
>> ----------------------------------------------------------------------------------------------------
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> -- 
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From osborne1 at optonline.net  Thu Feb 16 00:16:04 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 00:16:04 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
Message-ID: 

Harry,

It's not clear to me that NCBI's eutils offers this capability directly. You
can probably download Entrez Gene entries and parse them for coordinates but
I know of no way to remotely retrieve genomic sequences like this from NCBI
(ENSEMBL API perhaps?). What I had in mind uses the local approach that some
of us favor and to prove to myself that this is simple to do I wrote a
script that I just added to examples/tools, it's called extract_genes.pl and
it's based on Bio::DB::Fasta. Download the sequence files for a given
species to some dir, download Entrez Gene's gene2accession file, and run. It
creates and stores a hash for lookups, it won't read gene2accession each
time it runs.
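
For anyone curious about the caching idea, here is a minimal sketch of
it (this is not the code in extract_genes.pl itself): parse the table
once, save the hash with the core Storable module, and reuse the saved
copy on later runs. The column positions, the hash layout, and the sub
name load_gene_index are my assumptions for illustration; check the
columns against the header line of your copy of gene2accession.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# ASSUMED 0-based column positions in the tab-delimited table;
# verify them against the file's own header line before relying on them.
my %COL = (gene_id => 1, genomic_acc => 7, start => 9, end => 10, strand => 11);

# Return a hash ref: GeneID => { acc, start, end, strand }.
# The table is parsed only when no cache file exists yet.
sub load_gene_index {
    my ($table_file, $cache_file) = @_;
    return retrieve($cache_file) if -e $cache_file;   # reuse earlier parse

    my %index;
    open my $fh, '<', $table_file or die "Cannot open $table_file: $!";
    while (my $line = <$fh>) {
        next if $line =~ /^#/;                 # skip the header line
        chomp $line;
        my @f = split /\t/, $line;
        next if @f <= $COL{strand};            # skip short rows
        # '-' marks a missing value in this sketch's input
        next if $f[ $COL{genomic_acc} ] eq '-' or $f[ $COL{start} ] eq '-';
        # keep the first genomic placement seen for each gene
        $index{ $f[ $COL{gene_id} ] } ||= {
            acc    => $f[ $COL{genomic_acc} ],
            start  => $f[ $COL{start} ],
            end    => $f[ $COL{end} ],
            strand => $f[ $COL{strand} ],
        };
    }
    close $fh;
    store \%index, $cache_file;                # save for the next run
    return \%index;
}
```

The accession and coordinates found this way could then be passed to
Bio::DB::Fasta's seq() method, as in the synopsis quoted further down.
Delete the cache file whenever you download a new gene2accession.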

Brian O.


On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:

> Hi Brian,
> 
> Thanks very much for the pointers and the speed of your reply and apologies
> for the speed of mine.
> 
> This looks good, but what I was looking for was a bioP approach for hooking to
> an API at NCBI or EBI so I could get this info and seqs from them.  In this
> case, speed of retrieval is not critical and I'd rather not download the
> entirety of the sequences to a local disk to hack at them.
> 
> I've determined a screen-scraping approach to get them and could script that,
> but I thought that bioP had a method for using NCBI's external API's, tho it
> may be that my memory is faulty or the approach is no longer supported due to
> overload.  
> 
> Does NCBI make such APIs available anymore?  I searched a bit for docs on them
> but couldn't find anything (unless it's buried in the NCBI toolkit, which I
> haven't started to excavate).
> 
> Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
> 
> Harry
> 
> 
> 
> 
> 
> 
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>> 
>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
>> from its documentation:
>> 
>>   use Bio::DB::Fasta;
>> 
>>   # create database from directory of fasta files
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   # simple access (for those without Bioperl)
>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>   my @ids     = $db->ids;
>>   my $length   = $db->length('CHROMOSOME_I');
>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>   my $header   = $db->header('CHROMOSOME_I');
>> 
>>   # Bioperl-style access
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>   my $seq     = $obj->seq;
>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>> 
>> Do you already have the offsets?
>> 
>> Brian O.
>> 
>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>> Hi All,
>>> 
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this.  Forgive me if I've missed something
>>> obvious.
>>> 
>>> This should not be a novel request, but I've not found it answered.  If
>>> bioperl isn't the best way to do this, I'd be grateful to a pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>> 
>>> The problem is to retrieve genomic sequences plus & minus some offset
>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>> 
>>> 
>>> TIA!




From hlapp at gmx.net  Thu Feb 16 01:31:54 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 15 Feb 2006 22:31:54 -0800
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki,
	problems with bioperl-db
In-Reply-To: <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
	
	<12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>
Message-ID: 


On Feb 15, 2006, at 7:56 PM, Chris Fields wrote:

> [...]
> I looked in the mailing list archives and Barry mentions something 
> here:
>
> http://bioperl.org/pipermail/bioperl-l/2005-January/018093.html
>
> He rebuilt the database from scratch and got it working; no reason
> was given.  I wouldn't be surprised if it is something Mysql-related
> that pops up.

Note though that he was using PostgreSQL. With Pg you definitely need 
to 'vacuum,' which is their name for analyzing/optimizing the table(s).

>   The strange thing is that only a few months ago
> everything ran well with this version of MySQL (v.5); this was with
> the first test database I installed on it.  Another strange thing (I
> think I mentioned it) is that NOT loading the taxonomy with
> load_ncbi_taxonomy.pl worked (everything was entered).

That's not really strange, it is in fact consistent with the query you 
report as taking a long time. If you don't pre-load the taxonomy then 
the taxon and taxon_name tables are empty or almost empty and look-ups 
and joins of empty tables are amazingly fast :-J

[...]
> I wanted to also mention that we shouldn't check in the modifications
> to Bio::Root:Root until I confirm something (I'm at home and
> currently can't).

OK we'll hold off.

	-hilmar
-- 
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From michael.watson at bbsrc.ac.uk  Thu Feb 16 05:31:54 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 16 Feb 2006 10:31:54 -0000
Subject: [Bioperl-l] CONTIG sequence files from the NCBI
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95030081AC@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

I have two questions really.  I fetched bacterial genome sequences from
the NCBI using Bio::DB::GenBank.

Some of these sequence entries are CONTIG sequences, ie they just point
to other sequences that need to be joined together to form the entire
genome.

Looking at my downloads, it looks as if bioperl has done all the
necessary joining for me - or maybe it was the NCBI that did the
joining?

OK, so firstly, did bioperl do the joining, and if so, are all the
co-ordinates of the features updated to reflect their new location on
the new, joined sequence?

And secondly, sequence versions... I'm thinking that possibly the
sequence version of the CONTIG may be 1 (as it hasn't changed) yet the
versions of the sequences it refers to might have changed, so when I ask
bioperl if these sequences have been updated, I will be told no because
the CONTIG sequence version is 1, but I should be told yes because the
underlying sequences have...?

Make sense?

Thanks
Mick



From cjfields at uiuc.edu  Thu Feb 16 07:51:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 06:51:50 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
	version 1.28
In-Reply-To: <43F449E1.80605@esat.kuleuven.be>
References: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
	<43F449E1.80605@esat.kuleuven.be>
Message-ID: <369C1D1F-DBCB-4161-A24A-7C3E579D337A@uiuc.edu>

Yeah, it looks like that change to NCBI's output broke parsing of
text-format nucleotide BLAST reports.  XML output parsing still works
though (as expected).  I'll give it a look.
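
In the meantime, a possible stopgap (a sketch only, not the eventual fix
to Bio::SearchIO::blast): strip the new "Features ..." blocks from a
saved text report before handing it to the parser. The sub name
strip_feature_blocks is made up for this example, and it assumes each
such block runs up to the next "Score =" line, as in the example output
quoted below.

```perl
use strict;
use warnings;

# Remove "Features flanking/in this part of subject sequence:" blocks
# from a text BLAST report. Everything from the header line up to (but
# not including) the next "Score =" line is dropped.
sub strip_feature_blocks {
    my ($report) = @_;
    my @kept;
    my $skipping = 0;
    for my $line (split /\n/, $report, -1) {
        if ($line =~ /^\s*Features (?:flanking|in) this part of subject sequence:/) {
            $skipping = 1;        # start of an annotation block
            next;
        }
        # the HSP statistics line ends the annotation block
        $skipping = 0 if $skipping && $line =~ /^\s*Score\s*=/;
        push @kept, $line unless $skipping;
    }
    return join "\n", @kept;
}
```

The cleaned text can be written to a new file and read with
Bio::SearchIO->new(-format => 'blast', -file => ...) as usual.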

Chris

On Feb 16, 2006, at 3:46 AM, Pieter Monsieurs wrote:

> Hi,
>
> I have the same problem with the blast.pm file.
> The people at NCBI added some extra information to the BLAST output
> (see e.g. "Features flanking this part..." or "Features in this
> part ..."); an example is included below.
> The blast.pm module starts looking for the HSP alignment information,
> but it dies when it hits this feature information.
>
> Pieter
>
>
> >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> chromosome 12, complete sequence
> Length=27492551
>
> Features flanking this part of subject sequence:
>   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
>   2655 bp at 3' side: hypothetical protein
>
> Score = 36.2 bits (18),  Expect = 0.22
> Identities = 18/18 (100%), Gaps = 0/18 (0%)
> Strand=Plus/Minus
>
> Query  4         GTACTACTCTACTCTACT  21
>                  ||||||||||||||||||
> Sbjct  19257436  GTACTACTCTACTCTACT  19257419
>
>
> Features flanking this part of subject sequence:
>   2991 bp at 5' side: hypothetical protein
>   1131 bp at 3' side: hypothetical protein
>
> Score = 36.2 bits (18),  Expect = 0.22
> Identities = 18/18 (100%), Gaps = 0/18 (0%)
> Strand=Plus/Minus
>
> Query  2         ATGTACTACTCTACTCTA  19
>                  ||||||||||||||||||
> Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
>
>
> Features in this part of subject sequence:
>   DHHC zinc finger domain, putative
>
> Score = 34.2 bits (17),  Expect = 0.87
> Identities = 17/17 (100%), Gaps = 0/17 (0%)
> Strand=Plus/Plus
>
> Query  5         TACTACTCTACTCTACT  21
>                  |||||||||||||||||
> Sbjct  17616437  TACTACTCTACTCTACT  17616453
>
>
> Features flanking this part of subject sequence:
>   102 bp at 5' side: bZIP transcription factor, putative
>   3740 bp at 3' side: yeast dcp1, putative
>
> Score = 32.2 bits (16),  Expect = 3.4
> Identities = 16/16 (100%), Gaps = 0/16 (0%)
> Strand=Plus/Plus
>
> Query  7        CTACTCTACTCTACTC  22
>                 ||||||||||||||||
> Sbjct  2775880  CTACTCTACTCTACTC  2775895
>
>
> Features flanking this part of subject sequence:
>   21 bp at 5' side: peptide transporter T17F3.11, putative
>   10230 bp at 3' side: transposon protein, putative, unclassified
>
> Score = 32.2 bits (16),  Expect = 3.4
> Identities = 16/16 (100%), Gaps = 0/16 (0%)
> Strand=Plus/Minus
>
> Query  7         CTACTCTACTCTACTC  22
>                  ||||||||||||||||
> Sbjct  27323153  CTACTCTACTCTACTC  27323138
>
>
>
>
> Guojun Yang wrote:
>
>> Hi, Chris,
>> Finally the remoteblast test script works for the amino.fa query,
>> but when I try a nucleic acid sequence (see below), an error occurs:
>> "
>> waiting........
>> ------------- EXCEPTION  -------------
>> MSG: no data for midline  Features flanking this part of subject sequence:
>> STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
>> STACK toplevel remoteblast_test:40
>> "
>> The query sequence is:
>> CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
>> GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
>> AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
>> AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
>>
>> The script (basically same as the remoteblast test, I only changed  
>> database to 'nr' and program to 'blastn' and filename to 'ost3'):
>> #!/usr/bin/perl
>>
>> use Bio::SeqIO;
>> use Bio::Seq;
>> use Bio::Tools::Run::RemoteBlast;
>> use Bio::SearchIO;
>> use strict;
>> my $prog='blastn';
>> my $db='nr';
>> my $e_val=1e-10;
>> my @params=( -prog=>$prog,
>> 	-data=>$db,
>> 	-expect=>$e_val,
>> 	-readmethod=>'SearchIO');
>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>
>> my $v = 1;
>>
>> my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
>>
>> while (my $input = $str->next_seq()){
>>  #Blast a sequence against a database:
>>  #Alternatively, you could  pass in a file with many
>>  #sequences rather than loop through sequence one at a time
>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>  #and swap the two lines below for an example of that.
>>  my $r = $factory->submit_blast($input);
>>  #my $r = $factory->submit_blast('amino.fa');
>>  print STDERR "waiting..." if( $v > 0 );
>>  while ( my @rids = $factory->each_rid ) {
>>    foreach my $rid ( @rids ) {
>>      my $rc = $factory->retrieve_blast($rid);
>>      if( !ref($rc) ) {
>>        if( $rc < 0 ) {
>>          $factory->remove_rid($rid);
>>        }
>>        print STDERR "." if ( $v > 0 );
>>        sleep 5;
>>      } else {
>>        my $result = $rc->next_result();
>>        #save the output
>>        my $filename = $result->query_name()."\.out";
>>        $factory->save_output($filename);
>>        $factory->remove_rid($rid);
>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>        while ( my $hit = $result->next_hit ) {
>>          next unless ( $v > 0);
>>          print "\thit name is ", $hit->name, "\n";
>>          while( my $hsp = $hit->next_hsp ) {
>>            print "\t\tscore is ", $hsp->score, "\n";
>>          }
>>        }
>>      }
>>    }
>>  }
>> }
>>
>>
>> Do you think there might still be something in the NCBI output  
>> format?
>>
>> Thank you,
>> Guojun
>>
>>
>>
>>
>> Guojun Yang
>> Department of Plant Biology
>> University of Georgia
>> Tel: 706-542-1857
>> Fax: 706-542-1805
>> http://www.arches.uga.edu/~guojun
>>
>>
>>
>> ----- Original Message -----
>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>> Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>
>>
>>
>>> Sorry, forgot to add that I didn't see the regex issue that you  
>>> mentioned.
>>> It could be a perl-related issue.  Try the fixes I mentioned and  
>>> see what
>>> happens.
>>>
>>>> Christopher Fields
>>>>
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept. of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>>>> -----Original Message-----
>>>>>>
>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>> Sent: Tuesday, February 14, 2006 12:36 PM
>>>> To: 'gyang at plantbio.uga.edu'
>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>>>
>>>> It's a good habit to always add single quotes around words.  The perl
>>>> interpreter may think a single bare word is a subroutine or perlfunc
>>>> called with no args so will try to find a subroutine named blastp().  My
>>>> debugger actually gives the error that the bare word blastp may conflict
>>>> with a future reserved word.  Like you said, 'use strict' will point that
>>>> out.
>>>>
>>>> As for the regex, it should match all the blast programs at NCBI (blastp,
>>>> blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
>>>> else passes through.
>>>>
>>>> So, if you are using the script below, there are several errors.  The bare
>>>> words for $prog and $db need quotes, and the flags for your @params array
>>>> don't have a dash before them.  I get this after adding quotes but before
>>>> adding the dashes to @params:
>>>>
>>>> C:\Perl\Scripts>test_blast.pl
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG:
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
>>>> STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
>>>> STACK: C:\Perl\Scripts\test_blast.pl:15
>>>> -----------------------------------------------------------
>>>>
>>>> The last line indicates a problem with this line:
>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>>> Changing the @params to this:
>>>> my @params=( -prog=>$prog,
>>>> 	-data=>$db,
>>>> 	-expect=>$e_val,
>>>> 	-readmethod=>'SearchIO');
>>>>
>>>> fixes it, and I get output as expected.
>>>>>> Christopher Fields
>>>>>>
>>>> Postdoctoral Researcher - Switzer Lab
>>>> Dept. of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>>
>>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
>>>>> Sent: Tuesday, February 14, 2006 11:48 AM
>>>>> To: Chris Fields; bioperl-l at lists.open-bio.org
>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>>>>
>>>>> Hi, Chris,
>>>>> When I tried with the perldoc script, it did not work either. First it
>>>>> says $prog can not be bare word if I "use strict". I added quotes on the
>>>>> words, then it says the value for $prog does not match expression
>>>>> t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The script
>>>>> is shown below. Why is the expression "t?blast[pnx]"?
>>>>>
>>>>> #!/usr/bin/perl
>>>>>
>>>>> use Bio::SeqIO;
>>>>> use Bio::Seq;
>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>> use Bio::SearchIO;
>>>>>
>>>>>
>>>>> my $prog=blastp;
>>>>> my $db=swissprot;
>>>>> my $e_val=1e-10;
>>>>> my @params=( prog=>$prog,
>>>>> 	data=>$db,
>>>>> 	expect=>$e_val,
>>>>> 	readmethod=>'SearchIO');
>>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>
>>>>> my $v = 1;
>>>>>
>>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format =>  
>>>>> 'fasta' );
>>>>>
>>>>> while (my $input = $str->next_seq()){
>>>>>  #Blast a sequence against a database:
>>>>>  #Alternatively, you could  pass in a file with many
>>>>>  #sequences rather than loop through sequence one at a time
>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>  #and swap the two lines below for an example of that.
>>>>>  my $r = $factory->submit_blast($input);
>>>>>  #my $r = $factory->submit_blast('amino.fa');
>>>>>  print STDERR "waiting..." if( $v > 0 );
>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>    foreach my $rid ( @rids ) {
>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>      if( !ref($rc) ) {
>>>>>        if( $rc < 0 ) {
>>>>>          $factory->remove_rid($rid);
>>>>>        }
>>>>>        print STDERR "." if ( $v > 0 );
>>>>>        sleep 5;
>>>>>      } else {
>>>>>        my $result = $rc->next_result();
>>>>>        #save the output
>>>>>        my $filename = $result->query_name()."\.out";
>>>>>        $factory->save_output($filename);
>>>>>        $factory->remove_rid($rid);
>>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>        while ( my $hit = $result->next_hit ) {
>>>>>          next unless ( $v > 0);
>>>>>          print "\thit name is ", $hit->name, "\n";
>>>>>          while( my $hsp = $hit->next_hsp ) {
>>>>>            print "\t\tscore is ", $hsp->score, "\n";
>>>>>          }
>>>>>        }
>>>>>      }
>>>>>    }
>>>>>  }
>>>>> }
>>>>>
>>>>> Thank you for your help!
>>>>>
>>>>>
>>>>> Guojun
>>>>> Department of Plant Biology
>>>>> University of Georgia
>>>>>
>>>>> ----- Original Message -----
>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>> To: gyang at plantbio.uga.edu
>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>
>>>>>
>>>>>
>>>>>> Try two things:
>>>>>>
>>>>>> 1)  Use a much simpler script, like the one in 'perldoc
>>>>>> Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something wrong
>>>>>> with the logic in your subroutine:
>>>>>>
>>>>>> my $v = 1;
>>>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
>>>>>> while (my $input = $str->next_seq()){
>>>>>>  #Blast a sequence against a database:
>>>>>>  #Alternatively, you could  pass in a file with many
>>>>>>  #sequences rather than loop through sequence one at a time
>>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>  #and swap the two lines below for an example of that.
>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>  #my $r = $factory->submit_blast('amino.fa');
>>>>>>  print STDERR "waiting..." if( $v > 0 );
>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>    foreach my $rid ( @rids ) {
>>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>>      if( !ref($rc) ) {
>>>>>>        if( $rc < 0 ) {
>>>>>>          $factory->remove_rid($rid);
>>>>>>        }
>>>>>>        print STDERR "." if ( $v > 0 );
>>>>>>        sleep 5;
>>>>>>      } else {
>>>>>>        my $result = $rc->next_result();
>>>>>>        #save the output
>>>>>>        my $filename = $result->query_name()."\.out";
>>>>>>        $factory->save_output($filename);
>>>>>>        $factory->remove_rid($rid);
>>>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>>        while ( my $hit = $result->next_hit ) {
>>>>>>          next unless ( $v > 0);
>>>>>>          print "\thit name is ", $hit->name, "\n";
>>>>>>          while( my $hsp = $hit->next_hsp ) {
>>>>>>            print "\t\tscore is ", $hsp->score, "\n";
>>>>>>          }
>>>>>>        }
>>>>>>      }
>>>>>>    }
>>>>>>  }
>>>>>> }
>>>>>>
>>>>>> 2) Try the RemoteBlast from Bugzilla and see if that works.  It really
>>>>>> shouldn't make that much of a difference, but I noticed that the CVS
>>>>>> RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
>>>>>> released; the Bugzilla version is based off CVS.
>>>>>>
>>>>>>> Christopher Fields
>>>>>>>
>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>> Dept. of Biochemistry
>>>>>> University of Illinois Urbana-Champaign
>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>>
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>> Sent: Monday, February 13, 2006 3:00 PM
>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>> Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>>
>>>>>>> Thanks, Chris,
>>>>>>>
>>>>>>> I installed version 1.5.1 and replaced the blast.pm file with the one from
>>>>>>> your bug report. The running version is 1.5 when I use the command you
>>>>>>> sent me. But when I tried the script, it doesn't change much. My
>>>>>>> remoteblast code (portion) is here:
>>>>>>>
>>>>>>> sub search {
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
>>>>>>> my $query = Bio::Seq -> new ( -seq=>"$_[0]",
>>>>>>> 			      -id=>"query",
>>>>>>> 			      -desc=>"new seq");
>>>>>>> my $len=$query->length();
>>>>>>> @db=('nr','htgs','wgs');
>>>>>>> foreach my $db (@db) {
>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
>>>>>>> 						'-data' =>"$db",
>>>>>>> 						'-expect'=>"$E_value");
>>>>>>> my $blast_report = $factory->submit_blast($query);
>>>>>>> my @rids = $factory->each_rid();
>>>>>>> foreach my $rid ( @rids ) {
>>>>>>>    print STDERR "$rid\n";
>>>>>>> }
>>>>>>> # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
>>>>>>> print STDERR "waiting...";
>>>>>>> sleep 60;
>>>>>>>
>>>>>>> foreach my $rid ( @rids ) {
>>>>>>>    my $rc = $factory->retrieve_blast($rid);
>>>>>>>    while (!ref($rc) ) {
>>>>>>> 	if( $rc < 0 ) {
>>>>>>> # retrieve_blast returns -1 on error
>>>>>>> 	    $factory->remove_rid($rid);
>>>>>>> 	    print "Error!\n";
>>>>>>> 	    send_error($email,$function,$seqname,$queryname[$ST]);
>>>>>>> 	    die "Can't retrieve $rid";
>>>>>>> 	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
>>>>>>> 	    sleep 60;
>>>>>>> 	    $rc = $factory->retrieve_blast($rid);
>>>>>>> 	}
>>>>>>>    }
>>>>>>>    if (ref($rc)) {
>>>>>>> 	print STDERR "Done.\n";
>>>>>>> 	 while( my $result = $rc->next_result) {
>>>>>>> 	    while( my $hit = $result->next_hit()) {
>>>>>>> 	    	$hit_name=$hit->name;
>>>>>>> 		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
>>>>>>> 		$name=$1;
>>>>>>> 		@left_plus_start=();
>>>>>>> 		@left_plus_end=();
>>>>>>> 		@left_minus_start=();
>>>>>>> 		@left_minus_end=();
>>>>>>> 		@right_plus_start=();
>>>>>>> 		@right_plus_end=();
>>>>>>> 		@right_minus_start=();
>>>>>>> 		@right_minus_end=();
>>>>>>>
>>>>>>> 		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
>>>>>>> 		while( my $hsp = $hit->next_hsp()) {
>>>>>>> ......
>>>>>>>
>>>>>>> It was working quite well until around October last year, but it has
>>>>>>> stopped since then. When a submission is sent via a webpage, the cgi
>>>>>>> starts to work and uses ~20 Mb of memory. Then it hangs there; finally
>>>>>>> the expected email is received but without real results, although it does
>>>>>>> contain something from other parts of the script. Apparently the search
>>>>>>> sub did not return anything (I know there is something that should be
>>>>>>> returned). Is it also possible that the format of the NCBI output for each
>>>>>>> result has changed?
>>>>>>> Thank you,
>>>>>>> Guojun
>>>>>>>
>>>>>>> Department of Plant Biology
>>>>>>> University of Georgia
>>>>>>>
>>>>>>> ----- Original Message -----
>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>>>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>>
>>>>>>>> How do you know two versions are installed (i.e. how are you checking the
>>>>>>>> version)?  Do you have two complete bioperl distributions (in two
>>>>>>>> separate directories) or are you looking in modules?  Here's the way to
>>>>>>>> check the version (from the FAQ):
>>>>>>>>
>>>>>>>> perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
>>>>>>>>
>>>>>>>> If you have two full bioperl distributions on your computer, normally only
>>>>>>>> one will be in use unless you have explicitly set the environment variable
>>>>>>>> PERL5LIB.  The PERL5LIB directories will be searched first before your
>>>>>>>> normal perl directory list (@INC) is searched.  You MAY get some mixing
>>>>>>>> then, but only if perl can't find a particular module in the path designated
>>>>>>>> in PERL5LIB; then it will progress through the directories listed in @INC.
>>>>>>>> This may happen if a module is unique to a particular release, but shouldn't
>>>>>>>> happen for the majority of modules, including RemoteBlast.  You can check
>>>>>>>> what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will differ
>>>>>>>> depending on your OS, perl build, etc.
>>>>>>>>
>>>>>>>> Regardless, if you follow the directions for installing bioperl for your
>>>>>>>> system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you
>>>>>>>> explicitly change the installation directory when using 'perl Makefile.PL'),
>>>>>>>> then 'uninstalling' Bioperl shouldn't be a problem as it will install the
>>>>>>>> Bioperl distribution you downloaded over the old version in @INC.  See this
>>>>>>>> page:
>>>>>>>>
>>>>>>>> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
>>>>>>>>
>>>>>>>> for more details.
>>>>>>>>
>>>>>>>> Christopher Fields
>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>> Dept. of Biochemistry
>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>>>> Sent: Monday, February 13, 2006 12:32 PM
>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>> Hi, Chris,
>>>>>>>>>
>>>>>>>>> I do have different versions of bioperl on my Linux machine (1.4 and
>>>>>>>>> 1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or do I
>>>>>>>>> need to uninstall and remove the previous versions? I could not find any
>>>>>>>>> hint on uninstalling bioperl on Linux. Could you please give me some
>>>>>>>>> suggestion?
>>>>>>>>> Thanks,
>>>>>>>>> Guojun
>>>>>>>>>
>>>>>>>>> Department of Plant Biology
>>>>>>>>> University of Georgia
>>>>>>>>>      _____
>>>>>>>>>
>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>>>>>>>>> Sent: Mon, 13 Feb 2006 11:45:14 -0500
>>>>>>>>> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>> If you're using RemoteBlast 1.28, then you've likely updated from CVS,
>>>>>>>>> which isn't the latest fix.
>>>>>>>>>
>>>>>>>>> Make sure that you check the following:
>>>>>>>>>
>>>>>>>>> 1) Always post to the mailing list:
>>>>>>>>> http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
>>>>>>>>>
>>>>>>>>> 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
>>>>>>>>> installed first.  Perform a clean installation; do not upgrade only
>>>>>>>>> Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
>>>>>>>>> guarantee that mixing modules from old and new distributions (1.4 and
>>>>>>>>> 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
>>>>>>>>> installation will allow text output from BLAST v.2.2.12 to be saved and
>>>>>>>>> parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13)
>>>>>>>>> but it should still save it. I believe as long as next_result() isn't
>>>>>>>>> called, it will work.
>>>>>>>>>
>>>>>>>>> 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output
>>>>>>>>> are NOT in CVS; they haven't been cleared and checked in by Roger Hall
>>>>>>>>> (who's now taking care of RemoteBlast) and the powers that be (Jason or
>>>>>>>>> whoever is in charge of Bio::SearchIO).  They can be found in Bugzilla:
>>>>>>>>>
>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>>>>>>>>>
>>>>>>>>> The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
>>>>>>>>> saving XML output, so isn't necessary if you don't plan on using this
>>>>>>>>> option.  And, remember, they haven't been committed yet to CVS, which
>>>>>>>>> means that the final version will change to reflect the new version.
>>>>>>>>>
>>>>>>>>> Christopher Fields
>>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>>> Dept. of Biochemistry
>>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>>    _____
>>>>>>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
>>>>>>>>> Sent: Monday, February 13, 2006 9:26 AM
>>>>>>>>> To: Chris Fields
>>>>>>>>> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>> Hi, Chris
>>>>>>>>>
>>>>>>>>> Thanks for your suggestion; however, it doesn't seem to work for my cgi
>>>>>>>>> even after I replace both blast.pm and RemoteBlast.pm. I didn't even get
>>>>>>>>> any RID. Is there any suggestion?
>>>>>>>>>
>>>>>>>>> Guojun
>>>>>>>>>
>>>>>>>>> Guojun Yang
>>>>>>>>> Department of Plant Biology
>>>>>>>>> University of Georgia
>>>>>>>>> Tel: 706-542-1857
>>>>>>>>> Fax: 706-542-1805
>>>>>>>>> http://www.arches.uga.edu/~guojun
>>>>>>>>>    _____
>>>>>>>>>
>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
>>>>>>>>> Sent: Fri, 03 Feb 2006 16:07:29 -0500
>>>>>>>>> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>> I would say give the new code a try, but realize that it hasn't been
>>>>>>>>> checked in (like I said below). I will try going over the modified
>>>>>>>>> Bio::SearchIO::blast again this weekend to see if there is anything I might
>>>>>>>>> have missed. The changed order in the header of BLAST text output has me a
>>>>>>>>> bit worried that it might not catch everything, but it at least doesn't hang
>>>>>>>>> in the while() loop I described in the bug report below (bug #1934) and
>>>>>>>>> seems to process everything fine.
>>>>>>>>>
>>>>>>>>> If you want more stability in the code, you might consider changing over to
>>>>>>>>> XML output and parsing with Bio::SearchIO::blastxml. There are some changes
>>>>>>>>> in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML
>>>>>>>>> output, but I believe it parses everything regardless. If you look back over
>>>>>>>>> the last month or so there has been a bit of discussion here about it. Jason
>>>>>>>>> describes a bit on how to set up RemoteBlast for XML:
>>>>>>>>>
>>>>>>>>> http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
>>>>>>>>>
>>>>>>>>> Christopher Fields
>>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>>> Dept. of Biochemistry
>>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>>>>> Sent: Friday, February 03, 2006 1:45 PM
>>>>>>>>>> To: bioperl-l at bioperl.org
>>>>>>>>>> Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>>>
>>>>>>>>>> Hi, Everybody,
>>>>>>>>>> I see this post and am wondering if this is the reason for the
>>>>>>>>>> malfunctioning of my webserver. We set up a webserver named MAK for MITE
>>>>>>>>>> sequence analysis. It was working very well until around November 2005,
>>>>>>>>>> when it stopped returning any result (the site is fine and seems to be
>>>>>>>>>> doing something after submission). In the CGI script, I used remoteblast
>>>>>>>>>> (that work was done in 2003) to do searches. I currently do not have
>>>>>>>>>> access to the server because I moved. Several people sent emails to us
>>>>>>>>>> about its malfunctioning. Is there any suggestion on fixing the problem?
>>>>>>>>>> Should I simply ask that remoteblast.pm be replaced with the new version?
>>>>>>>>>> Thanks a lot,
>>>>>>>>>> Guojun
>>>>>>>>>>
>>>>>>>>>> Department of Plant Biology
>>>>>>>>>> University of Georgia
>>>>>>>>>> Tel: 706-542-1857
>>>>>>>>>> Fax: 706-542-1805
>>>>>>>>>> http://www.arches.uga.edu/~guojun
>>>>>>>>>> _____
>>>>>>>>>>
>>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>>> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
>>>>>>>>>> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-l at bioperl.org]
>>>>>>>>>> Sent: Fri, 03 Feb 2006 10:45:23 -0500
>>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>>>>>>>>>>
>>>>>>>>>> Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
>>>>>>>>>> will work for saving text output. However, it will not parse anything using
>>>>>>>>>> next_result (it will likely hang) and will not save XML format. See these
>>>>>>>>>> bugs:
>>>>>>>>>>
>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>>>>>>>>>>
>>>>>>>>>> for explanations and possible fixes (changes to RemoteBlast and
>>>>>>>>>> Bio::SearchIO::blast). Note that these haven't been checked in yet so are
>>>>>>>>>> still not included in bioperl-live; they may be further modified before
>>>>>>>>>> committing to CVS. If you're not worried about XML, you could just try the
>>>>>>>>>> first fix, which is a change to SearchIO::blast.
>>>>>>>>>>
>>>>>>>>>> Nagesh, I remember you posting to the list a month ago using a script
>>>>>>>>>> which had problems; the script you used saves the output but doesn't
>>>>>>>>>> actually parse it (i.e. you don't use next_result() to go through the data).
>>>>>>>>>> Is the version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
>>>>>>>>>> parsing the output using "-readmethod => SearchIO" or "-readmethod => blast"
>>>>>>>>>> using your version of RemoteBlast and method next_result()? Like below
>>>>>>>>>> (from perldoc):
>>>>>>>>>>
>>>>>>>>>> while ( my @rids = $factory->each_rid ) {
>>>>>>>>>>   foreach my $rid ( @rids ) {
>>>>>>>>>>     my $rc = $factory->retrieve_blast($rid);
>>>>>>>>>>     if( !ref($rc) ) {
>>>>>>>>>>       if( $rc < 0 ) {
>>>>>>>>>>         $factory->remove_rid($rid);
>>>>>>>>>>       }
>>>>>>>>>>       print STDERR "." if ( $v > 0 );
>>>>>>>>>>       sleep 5;
>>>>>>>>>>     } else { # parsing starts here
>>>>>>>>>>       my $result = $rc->next_result(); # it should hang here
>>>>>>>>>>       #save the output
>>>>>>>>>>       my $filename = $result->query_name()."\.out";
>>>>>>>>>>       $factory->save_output($filename);
>>>>>>>>>>       $factory->remove_rid($rid);
>>>>>>>>>>       print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>>>>>>       while ( my $hit = $result->next_hit ) {
>>>>>>>>>>         next unless ( $v > 0);
>>>>>>>>>>         print "\thit name is ", $hit->name, "\n";
>>>>>>>>>>         while( my $hsp = $hit->next_hsp ) {
>>>>>>>>>>           print "\t\tscore is ", $hsp->score, "\n";
>>>>>>>>>>         }
>>>>>>>>>>       }
>>>>>>>>>>     }
>>>>>>>>>>   }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> My script hung if I used next_result() in any way prior to the fixes. I
>>>>>>>>>> want to see how many others are having the same issues with parsing using
>>>>>>>>>> the CVS version of bioperl-live.
>>>>>>>>>>
>>>>>>>>>> Christopher Fields
>>>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>>>> Dept. of Biochemistry
>>>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
>>>>>>>>>>> Sent: Thursday, February 02, 2006 7:24 PM
>>>>>>>>>>> To: Huang Jian; bioperl-l
>>>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>>>>>>>>>>>
>>>>>>>>>>> Hi Huang,
>>>>>>>>>>> Thanks for the message. The older version of RemoteBlast.pm works on the
>>>>>>>>>>> logic of checking the temporary file size to determine whether the Blast
>>>>>>>>>>> results are ready. This condition is no longer being satisfied, maybe due
>>>>>>>>>>> to some changes brought about by NCBI. I had this problem recently and
>>>>>>>>>>> figured out that the solution was to use the latest version, which has
>>>>>>>>>>> this problem fixed (it does not use the file size logic any more) but is
>>>>>>>>>>> not yet included in the BioPerl package.
>>>>>>>>>>> Cheers
>>>>>>>>>>> Nagesh
>>>>>>>>>>>
>>>>>>>>>>> Huang Jian wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Dear Nagesh,
>>>>>>>>>>>>
>>>>>>>>>>>> I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent
>>>>>>>>>>>> me. Now it works perfectly!!!
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you!!
>>>>>>>>>>>>
>>>>>>>>>>>> Huang
>>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message ----- From: "Nagesh Chakka"
>>>>>>>>>>>> 
>>>>>>>>>>>> To: "Huang Jian" ; "bioperl-l"
>>>>>>>>>>>> 
>>>>>>>>>>>> Sent: Friday, February 03, 2006 7:48 AM
>>>>>>>>>>>> Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
>>>>>>>>>>>> via email
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Huang,
>>>>>>>>>>>>> I see that you are submitting a sequence for a remote blast search. Can
>>>>>>>>>>>>> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09)? If
>>>>>>>>>>>>> not, I have attached it to this email; try replacing the old one, which
>>>>>>>>>>>>> has a bug, with it.
>>>>>>>>>>>>> Let me know if it works.
>>>>>>>>>>>>> Nagesh
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>
>>
>>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From cjfields at uiuc.edu  Thu Feb 16 07:52:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 06:52:31 -0600
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: 
References: 
Message-ID: 

I think a method was recently implemented in Bio::DB::GenBank to  
retrieve a segment of DNA given start and end coordinates in GenBank  
format; that should contain the features you need.  I requested it  
~Nov-Dec in the mailing list but didn't get a chance to test it.   
Would that help?
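
For the "locus plus or minus some offset" part of Harry's question, the coordinate arithmetic is the same whichever retrieval route ends up being used. A minimal sketch in plain Perl, assuming the gene's start and end on the chromosome are already known (for example from parsed Entrez Gene data); flanked_window is a hypothetical helper name, not a BioPerl function, and the coordinates in the usage lines are made up:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): widen a 1-based, inclusive
# gene interval by $flank bases on each side, clamped to the chromosome.
sub flanked_window {
    my ($start, $end, $flank, $chr_length) = @_;
    ($start, $end) = ($end, $start) if $start > $end;   # tolerate reversed input
    my $from = $start - $flank;
    my $to   = $end   + $flank;
    $from = 1           if $from < 1;             # do not run off the 5' end
    $to   = $chr_length if $to > $chr_length;     # do not run off the 3' end
    return ($from, $to);
}

# Made-up coordinates, purely for illustration.
my ($from, $to) = flanked_window(4_000_000, 4_100_000, 2_000, 15_000_000);
print "fetch $from..$to\n";
```

The resulting pair could then be handed to any range-based fetch, such as the $db->seq('CHROMOSOME_I', $from => $to) call from the Bio::DB::Fasta documentation quoted further down in this message.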

On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:

> Harry,
>
> It's not clear to me that NCBI's eutils offers this capability directly. You
> can probably download Entrez Gene entries and parse them for coordinates but
> I know of no way to remotely retrieve genomic sequences like this from NCBI
> (ENSEMBL API perhaps?). What I had in mind uses the local approach that some
> of us favor and to prove to myself that this is simple to do I wrote a
> script that I just added to examples/tools, it's called extract_genes.pl and
> it's based on Bio::DB::Fasta. Download the sequence files for a given
> species to some dir, download Entrez Gene's gene2accession file, and run. It
> creates and stores a hash for lookups, it won't read gene2accession each
> time it runs.
>
> Brian O.
>
>
> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
>
>> Hi Brian,
>>
>> Thanks very much for the pointers and the speed of your reply and apologies
>> for the speed of mine.
>>
>> This looks good, but what I was looking for was a bioP approach for hooking to
>> an API at NCBI or EBI so I could get this info and seqs from them.  In this
>> case, speed of retrieval is not critical and I'd rather not download the
>> entirety of the sequences to a local disk to hack at them.
>>
>> I've determined a screen-scraping approach to get them and could script that,
>> but I thought that bioP had a method for using NCBI's external APIs, tho it
>> may be that my memory is faulty or the approach is no longer supported due to
>> overload.
>>
>> Does NCBI make such APIs available anymore?  I searched a bit for docs on them
>> but couldn't find anything (unless it's buried in the NCBI toolkit, which I
>> haven't started to excavate).
>>
>> Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
>> Harry
>>
>>
>>
>>
>>
>>
>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>> Harry,
>>>
>>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
>>> from its documentation:
>>>
>>>   use Bio::DB::Fasta;
>>>
>>>   # create database from directory of fasta files
>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>
>>>   # simple access (for those without Bioperl)
>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>   my @ids     = $db->ids;
>>>   my $length   = $db->length('CHROMOSOME_I');
>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>   my $header   = $db->header('CHROMOSOME_I');
>>>
>>>   # Bioperl-style access
>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>
>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>>   my $seq     = $obj->seq;
>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>>
>>> Do you already have the offsets?
>>>
>>> Brian O.
>>>
>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>>> Hi All,
>>>>
>>>> After perusing the tutorial and other docs for an evening, I still
>>>> can't find the answer to this.  Forgive me if I've missed something
>>>> obvious.
>>>>
>>>> This should not be a novel request, but I've not found it answered.  If
>>>> bioperl isn't the best way to do this, I'd be grateful for a pointer to a
>>>> better way, especially if it includes an illuminating bit of code.
>>>>
>>>> The problem is to retrieve genomic sequences plus & minus some offset
>>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>>> common followup chore for some extra analysis from a gene expression
>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>>> sequence type to specify...?
>>>>
>>>>
>>>> TIA!
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From anst at kvl.dk  Thu Feb 16 04:24:51 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Thu, 16 Feb 2006 10:24:51 +0100
Subject: [Bioperl-l] searchIO bug?
Message-ID: <43F452F30200009B00000EC9@gwia.kvl.dk>

Hi! 
 
 
I am blasting a protein seq against an identical protein.
I am trying to parse the protein header by using the query_description
method in the SearchIO module.
After calling query_description I use split / / in order
to easily access the different header components.
Here I discover that the query_description method is somehow introducing
a space between comma number 5 and the following chromosome position
number in the exon chromosome position list!?
This truncates the list of exon chromosome positions from 7 to 4, later
yielding a wrong intron count.
 
Is this a bug? 
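
Whatever the cause turns out to be (one possibility, which is only a guess here, is that a long description line is wrapped in the report and rejoined with a space), the split can be made tolerant of a stray space inside the comma-separated list. A minimal sketch in plain Perl; header_fields is a hypothetical helper and the header string is made up, since the real Q0045 description is not shown in this message:

```perl
use strict;
use warnings;

# Hypothetical helper: split a description into whitespace-separated fields,
# first closing up any whitespace that follows a comma so that a
# comma-separated list stays together in one field.
sub header_fields {
    my ($description) = @_;
    (my $clean = $description) =~ s/,\s+/,/g;
    return split ' ', $clean;     # ' ' splits on runs of whitespace
}

# Made-up header with a stray space after the fifth comma.
my @fields = header_fields('Q0045 Chr Mito 10,20,30,40,50, 60,70');
print scalar(@fields), " fields, exon list: $fields[-1]\n";
```

This only hides the symptom; it says nothing about whether query_description itself is at fault.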
 
Attached is: 
 
testblast1.pl: the blastprogram to run. 
 
Q0045: the seq that is used as both query and database seq.
(Q0045 has to be formatted in order to be used as a database:
formatdb -i Q0045 -p T -o F)
 
 
Regards Anders. 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastp5.pl
Type: application/octet-stream
Size: 50384 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045
Type: application/octet-stream
Size: 873 bytes
Desc: not available
URL: 

From anst at kvl.dk  Thu Feb 16 05:20:06 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Thu, 16 Feb 2006 11:20:06 +0100
Subject: [Bioperl-l] another searchIO bug?
Message-ID: <43F45FE60200009B00000ED6@gwia.kvl.dk>

Hi! 
 
I am blasting a protein seq (query) against an identical seq with a
deletion of Aa nr 61 (subject). 
Then I print out the type of nomatch Aa and its position. 
The nomatch for the query seq is Aa G at position 61, which is correct. 
The nomatch for the subject seq is V at position 60, which is definitely
not correct!? 
 
Is this a bug? 
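
One thing worth keeping in mind with a deletion: in the gapped alignment the subject has no residue opposite the deleted one, so any position reported for the subject at that column has to be the coordinate of a neighbouring residue. If the counter simply has not advanced at the gap, that would be the residue before it, which would be consistent with the "V at 60" above, though that is only a guess at what is happening. The sketch below is plain Perl illustrating the coordinate bookkeeping, not a description of what Bio::SearchIO does internally; nonmatching_columns is a hypothetical helper and the sequences are toy data:

```perl
use strict;
use warnings;

# Hypothetical helper: walk two aligned strings and report, for every
# non-identical column, the residue and 1-based coordinate in each
# sequence. A gap ('-') has no coordinate of its own, so it gets undef.
sub nonmatching_columns {
    my ($q_aln, $s_aln, $q_start, $s_start) = @_;
    my ($q_pos, $s_pos) = ($q_start - 1, $s_start - 1);
    my @report;
    for my $i (0 .. length($q_aln) - 1) {
        my $q = substr $q_aln, $i, 1;
        my $s = substr $s_aln, $i, 1;
        $q_pos++ if $q ne '-';    # only real residues advance a counter
        $s_pos++ if $s ne '-';
        next if $q eq $s;
        push @report, {
            query       => $q,
            query_pos   => ($q eq '-' ? undef : $q_pos),
            subject     => $s,
            subject_pos => ($s eq '-' ? undef : $s_pos),
        };
    }
    return @report;
}

# Toy alignment: the subject lacks the G found at query position 4.
for my $col (nonmatching_columns('MKVGLA', 'MKV-LA', 1, 1)) {
    printf "query %s at %s, subject %s at %s\n",
        $col->{query},   defined $col->{query_pos}   ? $col->{query_pos}   : 'n/a',
        $col->{subject}, defined $col->{subject_pos} ? $col->{subject_pos} : 'n/a';
}
```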
 
testblast2.pl is the program to run 
 
Q0045 is the query seq. 
 
Q0045del61 is the subject seq (it has to be formatted:
formatdb -i Q0045del61 -p T -o F).
 
Regards Anders. 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045
Type: application/octet-stream
Size: 873 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testblast2.pl
Type: application/octet-stream
Size: 6109 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045del61
Type: application/octet-stream
Size: 872 bytes
Desc: not available
URL: 

From mcoyne at channing.harvard.edu  Wed Feb 15 16:20:17 2006
From: mcoyne at channing.harvard.edu (Michael Coyne)
Date: Wed, 15 Feb 2006 16:20:17 -0500
Subject: [Bioperl-l] Primer maps?
Message-ID: <6.2.0.14.0.20060215155422.01d44a98@localhost>

An HTML attachment was scrubbed...
URL: 

From Pieter.Monsieurs at esat.kuleuven.be  Thu Feb 16 04:46:09 2006
From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs)
Date: Thu, 16 Feb 2006 10:46:09 +0100
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
 version 1.28
In-Reply-To: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
References: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
Message-ID: <43F449E1.80605@esat.kuleuven.be>

Hi,

I have the same problem with the blast.pm file.
The people at NCBI added some extra info to the BLAST output
(see e.g. "Features flanking this part..." or "Features in this part
..."); an example is added below.
The blast.pm module starts looking for the HSP alignment information,
but it dies when it hits this feature information.

Pieter


>gi|77552765|gb|DP000011.1|  Oryza sativa (japonica cultivar-group) chromosome 12, complete
sequence
Length=27492551

 Features flanking this part of subject sequence:
   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
   2655 bp at 3' side: hypothetical protein

 Score = 36.2 bits (18),  Expect = 0.22
 Identities = 18/18 (100%), Gaps = 0/18 (0%)
 Strand=Plus/Minus

Query  4         GTACTACTCTACTCTACT  21
                 ||||||||||||||||||
Sbjct  19257436  GTACTACTCTACTCTACT  19257419


 Features flanking this part of subject sequence:
   2991 bp at 5' side: hypothetical protein
   1131 bp at 3' side: hypothetical protein

 Score = 36.2 bits (18),  Expect = 0.22
 Identities = 18/18 (100%), Gaps = 0/18 (0%)
 Strand=Plus/Minus

Query  2         ATGTACTACTCTACTCTA  19
                 ||||||||||||||||||
Sbjct  27006915  ATGTACTACTCTACTCTA  27006898


 Features in this part of subject sequence:
   DHHC zinc finger domain, putative

 Score = 34.2 bits (17),  Expect = 0.87
 Identities = 17/17 (100%), Gaps = 0/17 (0%)
 Strand=Plus/Plus

Query  5         TACTACTCTACTCTACT  21
                 |||||||||||||||||
Sbjct  17616437  TACTACTCTACTCTACT  17616453


 Features flanking this part of subject sequence:
   102 bp at 5' side: bZIP transcription factor, putative
   3740 bp at 3' side: yeast dcp1, putative

 Score = 32.2 bits (16),  Expect = 3.4
 Identities = 16/16 (100%), Gaps = 0/16 (0%)
 Strand=Plus/Plus

Query  7        CTACTCTACTCTACTC  22
                ||||||||||||||||
Sbjct  2775880  CTACTCTACTCTACTC  2775895


 Features flanking this part of subject sequence:
   21 bp at 5' side: peptide transporter T17F3.11, putative
   10230 bp at 3' side: transposon protein, putative, unclassified

 Score = 32.2 bits (16),  Expect = 3.4
 Identities = 16/16 (100%), Gaps = 0/16 (0%)
 Strand=Plus/Minus

Query  7         CTACTCTACTCTACTC  22
                 ||||||||||||||||
Sbjct  27323153  CTACTCTACTCTACTC  27323138
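
Until the parser itself copes with these lines, one stopgap would be to strip the annotation blocks from a saved text report before handing it to Bio::SearchIO. A minimal sketch in plain Perl, assuming every such block starts with a "Features flanking/in this part of subject sequence:" line and runs up to the next "Score =" line, as in the example above; strip_feature_blocks is a hypothetical helper, not part of BioPerl, and this is not the fix attached to the Bugzilla entries:

```perl
use strict;
use warnings;

# Hypothetical helper: drop the "Features ..." annotation blocks that
# precede each HSP in newer NCBI BLAST text output.
sub strip_feature_blocks {
    my ($report) = @_;
    my @kept;
    my $skipping = 0;
    for my $line (split /\n/, $report, -1) {
        if ($line =~ /^\s*Features (?:flanking|in) this part of subject sequence:/) {
            $skipping = 1;        # start of an annotation block
            next;
        }
        # The HSP header ("Score = ...") ends the annotation block.
        $skipping = 0 if $skipping && $line =~ /^\s*Score\s*=/;
        push @kept, $line unless $skipping;
    }
    return join "\n", @kept;
}

# Small made-up fragment in the same layout as the report above.
my $fragment = join "\n",
    ' Features in this part of subject sequence:',
    '   DHHC zinc finger domain, putative',
    '',
    ' Score = 34.2 bits (17),  Expect = 0.87',
    ' Identities = 17/17 (100%), Gaps = 0/17 (0%)',
    '';
print strip_feature_blocks($fragment);
```

The cleaned text could then be written to a file and parsed in the usual way; whether that is preferable to switching to XML output, as suggested elsewhere in this thread, is a separate question.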




Guojun Yang wrote:

>Hi, Chris,
>Finally the remoteblast test script works for the amino.fa query, but when I try a nucleic acid sequence (see below), an error occurs:
>"
>waiting........
>------------- EXCEPTION  -------------
>MSG: no data for midline  Features flanking this part of subject sequence:
>STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
>STACK toplevel remoteblast_test:40
>"
>The query sequence is:
>CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
>GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
>AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
>AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
>
>The script (basically same as the remoteblast test, I only changed database to 'nr' and program to 'blastn' and filename to 'ost3'):
>#!/usr/bin/perl
>
>use Bio::SeqIO;
>use Bio::Seq;
>use Bio::Tools::Run::RemoteBlast;
>use Bio::SearchIO;
>use strict;
>my $prog='blastn';
>my $db='nr';
>my $e_val=1e-10;
>my @params=( -prog=>$prog,
>	-data=>$db,
>	-expect=>$e_val,
>	-readmethod=>'SearchIO');
>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>
>my $v = 1;
>
>my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
>
>while (my $input = $str->next_seq()){
>  #Blast a sequence against a database:
>  #Alternatively, you could  pass in a file with many
>  #sequences rather than loop through sequence one at a time
>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>  #and swap the two lines below for an example of that.
>  my $r = $factory->submit_blast($input);
>  #my $r = $factory->submit_blast('amino.fa');
>  print STDERR "waiting..." if( $v > 0 );
>  while ( my @rids = $factory->each_rid ) {
>    foreach my $rid ( @rids ) {
>      my $rc = $factory->retrieve_blast($rid);
>      if( !ref($rc) ) {
>        if( $rc < 0 ) {
>          $factory->remove_rid($rid);
>        }
>        print STDERR "." if ( $v > 0 );
>        sleep 5;
>      } else {
>        my $result = $rc->next_result();
>        #save the output
>        my $filename = $result->query_name()."\.out";
>        $factory->save_output($filename);
>        $factory->remove_rid($rid);
>        print "\nQuery Name: ", $result->query_name(), "\n";
>        while ( my $hit = $result->next_hit ) {
>          next unless ( $v > 0);
>          print "\thit name is ", $hit->name, "\n";
>          while( my $hsp = $hit->next_hsp ) {
>            print "\t\tscore is ", $hsp->score, "\n";
>          }
>        }
>      }
>    }
>  }
>}
>
>
>Do you think there might still be something in the NCBI output format?
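
Since the exception above is thrown from next_result(), one way to keep a script alive while the parser is being fixed is to trap the error with eval and fall back to saving the raw report only. The following is a minimal sketch, not BioPerl code: FakeReport and try_next_result are hypothetical names, and FakeReport merely imitates a report object whose parser throws the message seen in the trace.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the object returned by retrieve_blast();
# it only imitates a parser that throws, as in the trace above.
{
    package FakeReport;
    sub new         { return bless {}, shift }
    sub next_result { die "MSG: no data for midline\n" }
}

# Attempt to parse one result. Returns the result (or undef) together
# with the error text, which is an empty string when nothing was thrown.
sub try_next_result {
    my ($report) = @_;
    my $result = eval { $report->next_result };
    my $err    = $@;
    return ( $result, $err );
}

my ( $result, $err ) = try_next_result( FakeReport->new );
if ($err) {
    # In a real script this is where the raw report could still be saved.
    print STDERR "parse failed, keeping raw report only: $err";
}
```

With a real report object the same wrapper would simply hand back the parsed result and an empty error string.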
>
>Thank you,
>Guojun
>
>
>
>
>Guojun Yang
>Department of Plant Biology
>University of Georgia
>Tel: 706-542-1857
>Fax: 706-542-1805
>http://www.arches.uga.edu/~guojun
>
>
>
>----- Original Message -----
>From: Chris Fields [mailto:cjfields at uiuc.edu]
>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
>
>
>  
>
>>Sorry, forgot to add that I didn't see the regex issue that you mentioned.
>>It could be a perl-related issue.  Try the fixes I mentioned and see what
>>happens.
>>
>>Christopher Fields
>>Postdoctoral Researcher - Switzer Lab
>>Dept. of Biochemistry
>>University of Illinois Urbana-Champaign
>>
>>>-----Original Message-----
>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>Sent: Tuesday, February 14, 2006 12:36 PM
>>>To: 'gyang at plantbio.uga.edu'
>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>>
>>>It's a good habit to always add single quotes around words.  The perl
>>>interpreter may think a single bare word is a subroutine or perlfunc
>>>called with no args so will try to find a subroutine named blastp().  My
>>>debugger actually gives the error that the bare word blastp may conflict
>>>with a future reserved word.  Like you said, 'use strict' will point that
>>>out.
>>>
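
The difference between a bare word and a quoted string can be checked without BioPerl at all. This is a small sketch using core Perl only; the variable names are made up for the demonstration, and string eval is used purely so that the compile-time failure can be caught and printed instead of stopping the script.

```perl
use strict;
use warnings;

# Under 'use strict', a bare word used as a value is a compile-time error.
# The string eval inherits strict from this file, so the error lands in $@.
my $strict_ok  = eval q{ my $prog = blastp; 1 };
my $strict_err = $@;

# With strict subs switched off, the same bare word quietly becomes a string.
my $loose = eval q{ no strict 'subs'; no warnings; my $prog = blastp; $prog };

# Quoting the word is unambiguous in both modes.
my $quoted = 'blastp';

print "strict mode: ", ( $strict_ok ? "compiled" : "rejected: $strict_err" );
print "no strict:   bare word became the string '$loose'\n";
print "quoted:      '$quoted'\n";
```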
>>>As for the regex, it should match all the blast programs at NCBI (blastp,
>>>blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
>>>else passes through.
>>>
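
To see which program names such a pattern lets through, it can be tried against a few candidates. This is only a sketch: the pattern text is the one quoted in the error message, but the ^ and $ anchors are an assumption made here, and RemoteBlast itself may apply the expression differently.

```perl
use strict;
use warnings;

# Pattern as quoted in the error message; the anchors are an assumption.
my $program_re = qr/^t?blast[pnx]$/;

# The five NCBI programs named above, plus two names that should fail.
my @candidates = qw(blastp blastn blastx tblastn tblastx megablast blast);

my %accepted = map { $_ => ( $_ =~ $program_re ? 1 : 0 ) } @candidates;

printf "%-10s %s\n", $_, ( $accepted{$_} ? 'accepted' : 'rejected' )
    for @candidates;
```

The optional leading "t" covers the translated searches, and the character class [pnx] selects the protein, nucleotide or translated-query variant.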
>>>So, if you are using the script below, there are several errors.  The bare
>>>words for $prog and $db need quotes, and the flags for your @params array
>>>don't have a dash before them.  I get this after adding quotes but before
>>>adding the dashes to @params:
>>>
>>>C:\Perl\Scripts>test_blast.pl
>>>------------- EXCEPTION: Bio::Root::Exception -------------
>>>MSG:
>>>STACK: Error::throw
>>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
>>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
>>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
>>>STACK: C:\Perl\Scripts\test_blast.pl:15
>>>-----------------------------------------------------------
>>>
>>>The last line indicates a problem with this line:
>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>>Changing the @params to this:
>>>my @params=( -prog=>$prog,
>>>	-data=>$db,
>>>	-expect=>$e_val,
>>>	-readmethod=>'SearchIO');
>>>fixes it, and I get output as expected.
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>>-----Original Message-----
>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
>>>>Sent: Tuesday, February 14, 2006 11:48 AM
>>>>To: Chris Fields; bioperl-l at lists.open-bio.org
>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>>>
>>>>Hi, Chris,
>>>>When I tried with the perldoc script, it did not work either. First it
>>>>says $prog cannot be a bare word if I "use strict". I added quotes on the
>>>>words, then it says the value for $prog does not match expression
>>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The script
>>>>is shown below. Why is the expression "t?blast[pnx]"?
>>>>
>>>>#!/usr/bin/perl
>>>>
>>>>use Bio::SeqIO;
>>>>use Bio::Seq;
>>>>use Bio::Tools::Run::RemoteBlast;
>>>>use Bio::SearchIO;
>>>>
>>>>
>>>>my $prog=blastp;
>>>>my $db=swissprot;
>>>>my $e_val=1e-10;
>>>>my @params=( prog=>$prog,
>>>>	data=>$db,
>>>>	expect=>$e_val,
>>>>	readmethod=>'SearchIO');
>>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>>>
>>>>my $v = 1;
>>>>
>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
>>>>
>>>>while (my $input = $str->next_seq()){
>>>>  #Blast a sequence against a database:
>>>>  #Alternatively, you could  pass in a file with many
>>>>  #sequences rather than loop through sequence one at a time
>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>  #and swap the two lines below for an example of that.
>>>>  my $r = $factory->submit_blast($input);
>>>>  #my $r = $factory->submit_blast('amino.fa');
>>>>  print STDERR "waiting..." if( $v > 0 );
>>>>  while ( my @rids = $factory->each_rid ) {
>>>>    foreach my $rid ( @rids ) {
>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>      if( !ref($rc) ) {
>>>>        if( $rc < 0 ) {
>>>>          $factory->remove_rid($rid);
>>>>        }
>>>>        print STDERR "." if ( $v > 0 );
>>>>        sleep 5;
>>>>      } else {
>>>>        my $result = $rc->next_result();
>>>>        #save the output
>>>>        my $filename = $result->query_name()."\.out";
>>>>        $factory->save_output($filename);
>>>>        $factory->remove_rid($rid);
>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>>>        while ( my $hit = $result->next_hit ) {
>>>>          next unless ( $v > 0);
>>>>          print "\thit name is ", $hit->name, "\n";
>>>>          while( my $hsp = $hit->next_hsp ) {
>>>>            print "\t\tscore is ", $hsp->score, "\n";
>>>>          }
>>>>        }
>>>>      }
>>>>    }
>>>>  }
>>>>}
>>>>
>>>>Thank you for your help!
>>>>
>>>>
>>>>Guojun
>>>>Department of Plant Biology
>>>>University of Georgia
>>>>
>>>>----- Original Message -----
>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>To: gyang at plantbio.uga.edu
>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>
>>>>
>>>>        
>>>>
>>>>>Try two things:
>>>>>
>>>>>1)  Use a much simpler script, like the one in 'perldoc
>>>>>Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something wrong
>>>>>with the logic in your subroutine:
>>>>>
>>>>>my $v = 1;
>>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
>>>>>while (my $input = $str->next_seq()){
>>>>>  #Blast a sequence against a database:
>>>>>  #Alternatively, you could  pass in a file with many
>>>>>  #sequences rather than loop through sequence one at a time
>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>  #and swap the two lines below for an example of that.
>>>>>  my $r = $factory->submit_blast($input);
>>>>>  #my $r = $factory->submit_blast('amino.fa');
>>>>>  print STDERR "waiting..." if( $v > 0 );
>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>    foreach my $rid ( @rids ) {
>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>      if( !ref($rc) ) {
>>>>>        if( $rc < 0 ) {
>>>>>          $factory->remove_rid($rid);
>>>>>        }
>>>>>        print STDERR "." if ( $v > 0 );
>>>>>        sleep 5;
>>>>>      } else {
>>>>>        my $result = $rc->next_result();
>>>>>        #save the output
>>>>>        my $filename = $result->query_name()."\.out";
>>>>>        $factory->save_output($filename);
>>>>>        $factory->remove_rid($rid);
>>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>        while ( my $hit = $result->next_hit ) {
>>>>>          next unless ( $v > 0);
>>>>>          print "\thit name is ", $hit->name, "\n";
>>>>>          while( my $hsp = $hit->next_hsp ) {
>>>>>            print "\t\tscore is ", $hsp->score, "\n";
>>>>>          }
>>>>>        }
>>>>>      }
>>>>>    }
>>>>>  }
>>>>>}
>>>>>          
>>>>>
>>>>>2) Try the RemoteBlast from Bugzilla and see if that works.  It really
>>>>>shouldn't make that much of a difference, but I noticed that the CVS
>>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
>>>>>released; the Bugzilla version is based off CVS.
>>>>>
>>>>>Christopher Fields
>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>Dept. of Biochemistry
>>>>>University of Illinois Urbana-Champaign
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>Sent: Monday, February 13, 2006 3:00 PM
>>>>>>To: bioperl-l at lists.open-bio.org
>>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>
>>>>>>Thanks, Chris,
>>>>>>I installed version 1.5.1 and replaced the blast.pm file with the one from
>>>>>>your bug report. The running version is 1.5 when I use the command you
>>>>>>sent me. But when I tried the script, it doesn't change much. My
>>>>>>remoteblast code (portion) is here:
>>>>>>
>>>>>>sub search {
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
>>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]",
>>>>>>			      -id=>"query",
>>>>>>			      -desc=>"new seq");
>>>>>>my $len=$query->length();
>>>>>>@db=('nr','htgs','wgs');
>>>>>>foreach my $db (@db) {
>>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
>>>>>>						'-data' =>"$db",
>>>>>>						'-expect'=>"$E_value");
>>>>>>my $blast_report = $factory->submit_blast($query);
>>>>>>my @rids = $factory->each_rid();
>>>>>>foreach my $rid ( @rids ) {
>>>>>>    print STDERR "$rid\n";
>>>>>>}
>>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638)
>>>>>>print STDERR "waiting...";
>>>>>>sleep 60;
>>>>>>foreach my $rid ( @rids ) {
>>>>>>    my $rc = $factory->retrieve_blast($rid);
>>>>>>    while (!ref($rc) ) {
>>>>>>	if( $rc < 0 ) {
>>>>>># retrieve_blast returns -1 on error
>>>>>>	    $factory->remove_rid($rid);
>>>>>>	    print "Error!\n";
>>>>>>	    send_error($email,$function,$seqname,$queryname[$ST]);
>>>>>>	    die "Can't retrieve $rid";
>>>>>>	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
>>>>>>	    sleep 60;
>>>>>>	    $rc = $factory->retrieve_blast($rid);
>>>>>>	}
>>>>>>    }
>>>>>>    if (ref($rc)) {
>>>>>>	print STDERR "Done.\n";
>>>>>>	 while( my $result = $rc->next_result) {
>>>>>>	    while( my $hit = $result->next_hit()) {
>>>>>>	    	$hit_name=$hit->name;
>>>>>>		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
>>>>>>		$name=$1;
>>>>>>		@left_plus_start=();
>>>>>>		@left_plus_end=();
>>>>>>		@left_minus_start=();
>>>>>>		@left_minus_end=();
>>>>>>		@right_plus_start=();
>>>>>>		@right_plus_end=();
>>>>>>		@right_minus_start=();
>>>>>>		@right_minus_end=();
>>>>>>		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
>>>>>>		while( my $hsp = $hit->next_hsp()) {
>>>>>>......
>>>>>>
>>>>>>It was working quite well until around October last year, but it has
>>>>>>stopped since then. When a submission is sent via a webpage, the cgi
>>>>>>starts to work and uses ~20 Mb of memory. Then it hangs there; finally
>>>>>>the expected email is received but without real results, although it does
>>>>>>contain something from other parts of the script. Apparently the search
>>>>>>sub did not return anything (I know there is something that should be
>>>>>>returned). Is it also possible the format of the NCBI output for each
>>>>>>result has changed?
>>>>>>Thank you,
>>>>>>Guojun
>>>>>>
>>>>>>Department of Plant Biology
>>>>>>University of Georgia
>>>>>>
>>>>>>----- Original Message -----
>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>
>>>>>>>How do you know two versions are installed (i.e. how are you checking
>>>>>>>the version)?  Do you have two complete bioperl distributions (in two
>>>>>>>separate directories) or are you looking in modules?  Here's the way to
>>>>>>>check the version (from the FAQ):
>>>>>>>
>>>>>>>perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
>>>>>>>
>>>>>>>If you have two full bioperl distributions on your computer, normally
>>>>>>>only one will be in use unless you have explicitly set the environment
>>>>>>>variable PERL5LIB.  The PERL5LIB directories will be searched first before
>>>>>>>your normal perl directory list (@INC) is searched.  You MAY get some
>>>>>>>mixing then, but only if perl can't find a particular module in the path
>>>>>>>designated in PERL5LIB; then it will progress through the directories
>>>>>>>listed in @INC.  This may happen if a module is unique to a particular
>>>>>>>release, but shouldn't happen for the majority of modules, including
>>>>>>>RemoteBlast.  You can check what @INC and PERL5LIB are set to by using
>>>>>>>'perl -V'.  @INC will differ depending on your OS, perl build, etc.
>>>>>>>
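
The search order described above can also be inspected from inside a script. This is a minimal sketch that assumes nothing about the BioPerl installation: the core module File::Spec is used only as a stand-in, and for BioPerl the %INC key would be 'Bio/Root/Version.pm' instead.

```perl
use strict;
use warnings;
use Config;        # supplies the platform's path separator
use File::Spec;    # core module, stand-in for Bio::Root::Version

# Directories listed in PERL5LIB end up ahead of the built-in @INC entries.
my $sep      = $Config{path_sep};
my @perl5lib = split /\Q$sep\E/, ( $ENV{PERL5LIB} || '' );
print "PERL5LIB: ", ( @perl5lib ? join( ', ', @perl5lib ) : '(not set)' ), "\n";

# @INC is searched in order; the first directory holding the module wins.
print "\@INC search order:\n";
print "  $_\n" for grep { !ref } @INC;

# %INC records the file each loaded module actually came from.
my $module_key  = 'File/Spec.pm';
my $loaded_from = $INC{$module_key};
print "$module_key was loaded from $loaded_from\n";
```

If two distributions are installed, the path printed on the last line shows which copy is actually in use.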
>>>>>>>Regardless, if you follow the directions for installing bioperl for your
>>>>>>>system ('perl Makefile.PL', 'make', 'make test', 'make install', unless
>>>>>>>you explicitly change the installation directory when using 'perl
>>>>>>>Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a problem as it
>>>>>>>will install the Bioperl distribution you downloaded over the old version
>>>>>>>in @INC.  See this page:
>>>>>>>
>>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
>>>>>>>
>>>>>>>for more details.
>>>>>>>
>>>>>>>Christopher Fields
>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>Dept. of Biochemistry
>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>>>Sent: Monday, February 13, 2006 12:32 PM
>>>>>>>>To: bioperl-l at lists.open-bio.org
>>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>>>
>>>>>>>>Hi, Chris,
>>>>>>>>I do have different versions of bioperl on my Linux machine (1.4 and
>>>>>>>>1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or
>>>>>>>>do I need to uninstall and remove the previous versions? I could not find
>>>>>>>>any hint on uninstalling bioperl on Linux. Could you please give me some
>>>>>>>>suggestions?
>>>>>>>>Thanks,
>>>>>>>>Guojun
>>>>>>>>
>>>>>>>>Department of Plant Biology
>>>>>>>>University of Georgia
>>>>>>>>      _____
>>>>>>>>
>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500
>>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>
>>>>>>>>If you're using RemoteBlast 1.28, then you've likely updated from CVS,
>>>>>>>>which isn't the latest fix.
>>>>>>>>
>>>>>>>>Make sure that you check the following:
>>>>>>>>
>>>>>>>>1) Always post to the mailing list:
>>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
>>>>>>>>
>>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
>>>>>>>>installed first.  Perform a clean installation; do not upgrade only
>>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
>>>>>>>>guarantee that mixing modules from old and new distributions (1.4 and
>>>>>>>>1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
>>>>>>>>installation will allow text output from BLAST v.2.2.12 to be saved and
>>>>>>>>parsed; it will not parse the newest BLAST text output from NCBI
>>>>>>>>(v2.2.13) but it should still save it. I believe as long as next_result()
>>>>>>>>isn't called, it will work.
>>>>>>>>
>>>>>>>>3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output
>>>>>>>>are NOT in CVS; they haven't been cleared and checked in by Roger Hall
>>>>>>>>(who's now taking care of RemoteBlast) and the powers that be (Jason or
>>>>>>>>whoever is in charge of Bio::SearchIO).  They can be found in Bugzilla:
>>>>>>>>
>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>>>>>>>>
>>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
>>>>>>>>saving XML output, so isn't necessary if you don't plan on using this
>>>>>>>>option.  And, remember, they haven't been committed yet to CVS, which
>>>>>>>>means that the final version will change to reflect the new version.
>>>>>>>>
>>>>>>>>Christopher Fields
>>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>>Dept. of Biochemistry
>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>
>>>>>>>>    _____
>>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
>>>>>>>>Sent: Monday, February 13, 2006 9:26 AM
>>>>>>>>To: Chris Fields
>>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>
>>>>>>>>Hi, Chris
>>>>>>>>Thanks for your suggestion, however, it doesn't seem to work for my cgi
>>>>>>>>even after I replace both blast.pm and RemoteBlast.pm. I didn't even get
>>>>>>>>any RID. Is there any suggestion?
>>>>>>>>
>>>>>>>>Guojun
>>>>>>>>
>>>>>>>>Guojun Yang
>>>>>>>>Department of Plant Biology
>>>>>>>>University of Georgia
>>>>>>>>Tel: 706-542-1857
>>>>>>>>Fax: 706-542-1805
>>>>>>>>http://www.arches.uga.edu/~guojun
>>>>>>>>    _____
>>>>>>>>
>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
>>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500
>>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>
>>>>>>>>I would say give the new code a try, but realize that it hasn't been
>>>>>>>>checked in (like I said below). I will try going over the modified
>>>>>>>>Bio::SearchIO::blast again this weekend to see if there is anything I
>>>>>>>>might have missed. The changed order in the header of BLAST text output
>>>>>>>>has me a bit worried that it might not catch everything, but it at least
>>>>>>>>doesn't hang in the while() loop I described in the bug report below (bug
>>>>>>>>#1934) and seems to process everything fine.
>>>>>>>>
>>>>>>>>If you want more stability in the code, you might consider changing over
>>>>>>>>to XML output and parsing with Bio::SearchIO::blastxml. There are some
>>>>>>>>changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
>>>>>>>>saving XML output, but I believe it parses everything regardless. If you
>>>>>>>>look back the last month or so there has been a bit of discussion here
>>>>>>>>about it. Jason describes a bit on how to set up RemoteBlast for XML:
>>>>>>>>
>>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
>>>>>>>>
>>>>>>>>Christopher Fields
>>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>>Dept. of Biochemistry
>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>
>>>>>>>>>-----Original Message-----
>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM
>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>>Hi, Everybody,
>>>>>>>>>I see this post and am wondering if this is the reason for the
>>>>>>>>>malfunctioning of my webserver. We set up a webserver named MAK, for MITE
>>>>>>>>>sequence analysis. It was working very well until around November 2005,
>>>>>>>>>when it stopped returning any result (the site is fine and seems to be
>>>>>>>>>doing something after submission). In the CGI script, I used remoteblast
>>>>>>>>>(that work was done in 2003) to do searches. I currently do not have
>>>>>>>>>access to the server because I moved. Quite a few people have sent emails
>>>>>>>>>to us about its malfunctioning. Is there any suggestion on fixing the
>>>>>>>>>problem? Should I simply ask that RemoteBlast.pm be replaced with the new
>>>>>>>>>version?
>>>>>>>>>Thanks a lot,
>>>>>>>>>Guojun
>>>>>>>>>
>>>>>>>>>Department of Plant Biology
>>>>>>>>>University of Georgia
>>>>>>>>>Tel: 706-542-1857
>>>>>>>>>Fax: 706-542-1805
>>>>>>>>>http://www.arches.uga.edu/~guojun
>>>>>>>>>_____
>>>>>>>>>
>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
>>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-l at bioperl.org]
>>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500
>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
>>>>>>>>>will work for saving text output. However, it will not parse anything
>>>>>>>>>using next_result (it will likely hang) and will not save XML format. See
>>>>>>>>>these bugs:
>>>>>>>>>
>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>>>>>>>>>
>>>>>>>>>for explanations and possible fixes (changes to RemoteBlast and
>>>>>>>>>Bio::SearchIO::blast). Note that these haven't been checked in yet so are
>>>>>>>>>still not included in bioperl-live; they may be further modified before
>>>>>>>>>committing to CVS. If you're not worried about XML, you could just try
>>>>>>>>>the first fix, which is a change to SearchIO::blast.
>>>>>>>>>
>>>>>>>>>Nagesh, I remember you posting to the list a month ago using a script
>>>>>>>>>which had problems; the script you used saves the output but doesn't
>>>>>>>>>actually parse it (i.e. you don't use next_result() to go through the
>>>>>>>>>data). Is the version of BLAST in your text output 2.2.12 or 2.2.13? Have
>>>>>>>>>you tried parsing the output using "-readmethod => SearchIO" or
>>>>>>>>>"-readmethod => blast" using your version of RemoteBlast and method
>>>>>>>>>next_result()? Like below (from perldoc):
>>>>>>>>>
>>>>>>>>>while ( my @rids = $factory->each_rid ) {
>>>>>>>>>  foreach my $rid ( @rids ) {
>>>>>>>>>    my $rc = $factory->retrieve_blast($rid);
>>>>>>>>>    if( !ref($rc) ) {
>>>>>>>>>      if( $rc < 0 ) {
>>>>>>>>>        $factory->remove_rid($rid);
>>>>>>>>>      }
>>>>>>>>>      print STDERR "." if ( $v > 0 );
>>>>>>>>>      sleep 5;
>>>>>>>>>    } else { # parsing starts here
>>>>>>>>>      my $result = $rc->next_result(); # it should hang here
>>>>>>>>>      #save the output
>>>>>>>>>      my $filename = $result->query_name()."\.out";
>>>>>>>>>      $factory->save_output($filename);
>>>>>>>>>      $factory->remove_rid($rid);
>>>>>>>>>      print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>>>>>      while ( my $hit = $result->next_hit ) {
>>>>>>>>>        next unless ( $v > 0);
>>>>>>>>>        print "\thit name is ", $hit->name, "\n";
>>>>>>>>>        while( my $hsp = $hit->next_hsp ) {
>>>>>>>>>          print "\t\tscore is ", $hsp->score, "\n";
>>>>>>>>>        }
>>>>>>>>>      }
>>>>>>>>>    }
>>>>>>>>>  }
>>>>>>>>>}
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>My script hung if I used next_result() in any way prior to the fixes.
>>>>>>>>>I want to see how many others are having the same issues with parsing
>>>>>>>>>using the CVS version of bioperl-live.
>>>>>>>>>
>>>>>>>>>Christopher Fields
>>>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>>>Dept. of Biochemistry
>>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>>                  
>>>>>>>>>
>>>>>>>>>>-----Original Message-----
>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
>>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM
>>>>>>>>>>To: Huang Jian; bioperl-l
>>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>>>>>>>>>>
>>>>>>>>>>Hi Huang,
>>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm works on
>>>>>>>>>>the logic of checking the temporary file size to determine whether
>>>>>>>>>>the Blast results are ready. This condition is not getting satisfied,
>>>>>>>>>>maybe due to some changes brought about by NCBI. I had this problem
>>>>>>>>>>recently and figured out that the solution was to use the latest
>>>>>>>>>>version, which has this problem fixed (it does not use the file size
>>>>>>>>>>logic any more) but is not yet included in the BioPerl package.
>>>>>>>>>>Cheers
>>>>>>>>>>Nagesh
>>>>>>>>>>
>>>>>>>>>>Huang Jian wrote:
>>>>>>>>>>
>>>>>>>>>>>Dear Nagesh,
>>>>>>>>>>>
>>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you
>>>>>>>>>>>sent me. Now it works perfectly!!!
>>>>>>>>>>>
>>>>>>>>>>>Thank you!!
>>>>>>>>>>>
>>>>>>>>>>>Huang
>>>>>>>>>>>
>>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka"
>>>>>>>>>>>To: "Huang Jian" ; "bioperl-l"
>>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM
>>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
>>>>>>>>>>>via email
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                      
>>>>>>>>>>>
>>>>>>>>>>>>Hi Huang,
>>>>>>>>>>>>I see that you are submitting a sequence for a remote blast search.
>>>>>>>>>>>>Can you check if the RemoteBlast.pm being used is v 1.28
>>>>>>>>>>>>(2005/12/09)? If not, I have attached it to this email; try using
>>>>>>>>>>>>it to replace the old one, which has a bug.
>>>>>>>>>>>>Let me know if it works.
>>>>>>>>>>>>Nagesh
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>>>>>>>>>>                      
>>>>>>>>>>>
>>>>>>>>>>_______________________________________________
>>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



From jason.stajich at duke.edu  Thu Feb 16 09:00:01 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 16 Feb 2006 09:00:01 -0500
Subject: [Bioperl-l] searchIO bug?
In-Reply-To: <43F452F30200009B00000EC9@gwia.kvl.dk>
References: <43F452F30200009B00000EC9@gwia.kvl.dk>
Message-ID: <11B49C84-9C04-4F43-9278-A3AA09C9B773@duke.edu>

I think it would be more helpful if you posted the actual report
rather than the protein, since this may depend on the version of
BLAST you are using.

If you used
split(/\s+/, $header)
it wouldn't matter how many spaces there are.
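
To make the difference concrete, here is a minimal sketch. The header string is made up for illustration (it is not taken from the Q0045 report); the double space before the third number stands in for the extra space described below.

```perl
use strict;
use warnings;

# Hypothetical description line; the two spaces before "300" mimic the
# stray space reported in the exon position list.
my $header = "Q0045 chr5 100,200,  300,400";

# split / / breaks on every single space, so two adjacent spaces
# leave an empty field in the middle of the list.
my @by_space = split / /, $header;

# split /\s+/ treats any run of whitespace as one separator.
my @by_run = split /\s+/, $header;

print scalar(@by_space), " fields with / /\n";
print scalar(@by_run),   " fields with /\\s+/\n";
```

Note that either form still splits the position list wherever whitespace appears inside it, so the report itself is needed to see where that whitespace comes from.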

On Feb 16, 2006, at 4:24 AM, Anders Stegmann wrote:

> Hi!
>
>
> I am blasting a protein seq against an identical protein.
> I am trying to parse the protein header by using the query_description
> method in the SearchIO module.
> After calling query_description I use split / / in order to easily
> access the different header components.
> Here I discover that the query_description method is somehow
> introducing a space between the fifth comma and the following
> chromosome position number in the exon chromosome position list!?
> This truncates the list of exon chromosome positions from 7 to 4,
> later yielding a wrong intron count.
> Is this a bug?
>
> Attached is:
>
> testblast1.pl: the blastprogram to run.
>
> Q0045: the seq that is used as both query and database seq.
> (Q0045 has to be formatted in order to be used as a database:
> formatdb -i Q0045 -p T -o F)
>
>
> Regards Anders.
>
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/




From cjfields at uiuc.edu  Thu Feb 16 10:50:04 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 09:50:04 -0600
Subject: [Bioperl-l] additional error message
In-Reply-To: <20060216100410.54a1a6d5@dogwood.plantbio.uga.edu>
Message-ID: <002901c63310$a7da1b20$15327e82@pyrimidine>

I don't think the apache error is related to the main issue here, but you
could always try upgrading LWP to see if that fixes it.  The second issue is
a text parsing problem in SearchIO specific to nucleotide BLAST reports,
which I'm looking into.

Jason has posted a bit on using XML.  Basically, do the following:

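use Bio::Tools::Run::RemoteBlast;

my $prog  = 'blastn';
my $db    = 'nr';
my $e_val = 1e-10;
my $v     = 1;
# -readmethod => 'xml' makes RemoteBlast hand the report to the XML parser
my @params = ( -prog       => $prog,
               -data       => $db,
               -expect     => $e_val,
               -readmethod => 'xml' );

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
# ask NCBI to return the report as XML rather than text
$factory->retrieve_parameter('FORMAT_TYPE', 'XML');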

You'll also need to modify the following line:

my $filename = $result->query_name()."\.out";

because the XML tag for this feature is actually part of the RID for some
reason, so you'll get a weird output file name.  This is a problem with
NCBI's XML output, not SearchIO::XML parsing.
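
One possible way around the odd file name is to build it yourself instead of trusting query_name() as-is. The sketch below is only an illustration: safe_report_name is a hypothetical helper, not part of BioPerl, and it assumes the problem is a query name that is empty or contains characters unsuitable for a file name. It falls back to the RID in that case.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: build an output file name from
# the query name if it is usable, otherwise from the request ID (RID).
sub safe_report_name {
    my ($query_name, $rid) = @_;
    my $base = defined $query_name ? $query_name : '';
    # Replace every run of characters other than word chars, dot or dash.
    $base =~ s/[^\w.-]+/_/g;
    # Trim underscores left at either end by the substitution.
    $base =~ s/^_+|_+$//g;
    # Fall back to the RID when nothing usable is left.
    $base = $rid if $base eq '';
    return "$base.out";
}

# Made-up query name; the RID is the example value quoted in this thread.
print safe_report_name('gi|12345|ref|NP_000001.1|', '1017772174-16400-6638'), "\n";
print safe_report_name('', '1017772174-16400-6638'), "\n";
```

In the retrieval loop this would replace the $filename line with something like safe_report_name($result->query_name(), $rid), since $rid is already in scope there.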

XML BLAST files can be really big (~5 MB and up depending on how much
information is returned), so it may take a little time to go through the
data.  Right now, it is the only consistently reliable way to parse the
output, as NCBI keeps changing the text output, sending us back
into "SearchIO::blast hell," as J.S. puts it.
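
For anyone who has to stay with text output in the meantime, a rough stopgap is to strip the new "Features ..." blocks from a saved report before parsing it. This is only a sketch and not the Bugzilla fix: strip_feature_blocks is a hypothetical helper, it assumes each block runs until the next "Score =" line (as in the report quoted below), and I have not checked that the cleaned text then parses with SearchIO::blast.

```perl
use strict;
use warnings;

# Sketch only: drop the "Features flanking/in this part of subject sequence:"
# blocks from a BLAST text report. Assumes each block ends at the next line
# that starts with optional whitespace followed by "Score =".
sub strip_feature_blocks {
    my ($report) = @_;
    my @kept;
    my $skipping = 0;
    for my $line (split /\n/, $report, -1) {
        $skipping = 1
            if $line =~ /^\s*Features (?:flanking|in) this part of subject sequence:/;
        $skipping = 0 if $line =~ /^\s*Score\s*=/;
        push @kept, $line unless $skipping;
    }
    return join "\n", @kept;
}

# Small excerpt modelled on the report quoted in this thread.
my $excerpt = join "\n",
    "Length=27492551",
    "",
    " Features flanking this part of subject sequence:",
    "   2655 bp at 3' side: hypothetical protein",
    "",
    " Score = 36.2 bits (18),  Expect = 0.22",
    "";

print strip_feature_blocks($excerpt);
```

The cleaned text could then be written to a file and read with Bio::SearchIO in the usual way.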

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> Sent: Thursday, February 16, 2006 9:04 AM
> To: Chris Fields; Pieter Monsieurs
> Cc: bioperl-l at lists.open-bio.org
> Subject: additional error message
> 
> when I check my apache error_log, there is one line saying:
> "waiting...Parsing of undecoded UTF-8 will give garbage when decoding
> entities at /usr/lib/perl5/site_perl/5.8.3/LWP/Protocol.pm line 137.,"
> I also see an error saying "MSG: no data for midline  Features flanking
> this part of subject sequence:, " that is mentioned by Pieter.
> Chris, may I have your suggestion on change it to XML parsing? I read
> Jason's comments/suggestions about it, but could not make it work.
> Thanks
> 
> Guojun
> Department of Plant Biology
> University of Georgia
> 
> 
> 
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: Pieter Monsieurs [mailto:Pieter.Monsieurs at esat.kuleuven.be]
> Cc: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
> 	version 1.28
> 
> 
> > > Yeah, looks like it broke text output nucleotide parsing with that.
> > > XML output parsing still works though (as expected).  I'll give it a
> > > look.
> > >
> > > Chris
> > >
> > > On Feb 16, 2006, at 3:46 AM, Pieter Monsieurs wrote:
> > >
> > > > Hi,
> > > >
> > > > I have the same problem with the blast.pm file.
> > > > The people at NCBI added some extra info to the BLAST
> > > > output (see e.g. "Features flanking this part..." or "Features in
> > > > this part ..."); example added.
> > > > The blast.pm module starts looking for the HSP alignment
> > > > information, but it dies when it hits this feature information.
> > > >
> > > > Pieter
> > >
> > > > >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> > > > chromosome 12, complete sequence
> > > > Length=27492551
> > > >
> > > > Features flanking this part of subject sequence:
> > > >   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
> > > >   2655 bp at 3' side: hypothetical protein
> > > >
> > > > Score = 36.2 bits (18),  Expect = 0.22
> > > > Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > > > Strand=Plus/Minus
> > > >
> > > > Query  4         GTACTACTCTACTCTACT  21
> > > >                  ||||||||||||||||||
> > > > Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> > > >
> > > >
> > > > Features flanking this part of subject sequence:
> > > >   2991 bp at 5' side: hypothetical protein
> > > >   1131 bp at 3' side: hypothetical protein
> > > >
> > > > Score = 36.2 bits (18),  Expect = 0.22
> > > > Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > > > Strand=Plus/Minus
> > > >
> > > > Query  2         ATGTACTACTCTACTCTA  19
> > > >                  ||||||||||||||||||
> > > > Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> > > >
> > > >
> > > > Features in this part of subject sequence:
> > > >   DHHC zinc finger domain, putative
> > > >
> > > > Score = 34.2 bits (17),  Expect = 0.87
> > > > Identities = 17/17 (100%), Gaps = 0/17 (0%)
> > > > Strand=Plus/Plus
> > > >
> > > > Query  5         TACTACTCTACTCTACT  21
> > > >                  |||||||||||||||||
> > > > Sbjct  17616437  TACTACTCTACTCTACT  17616453
> > > >
> > > >
> > > > Features flanking this part of subject sequence:
> > > >   102 bp at 5' side: bZIP transcription factor, putative
> > > >   3740 bp at 3' side: yeast dcp1, putative
> > > >
> > > > Score = 32.2 bits (16),  Expect = 3.4
> > > > Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > > > Strand=Plus/Plus
> > > >
> > > > Query  7        CTACTCTACTCTACTC  22
> > > >                 ||||||||||||||||
> > > > Sbjct  2775880  CTACTCTACTCTACTC  2775895
> > > >
> > > >
> > > > Features flanking this part of subject sequence:
> > > >   21 bp at 5' side: peptide transporter T17F3.11, putative
> > > >   10230 bp at 3' side: transposon protein, putative, unclassified
> > > >
> > > > Score = 32.2 bits (16),  Expect = 3.4
> > > > Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > > > Strand=Plus/Minus
> > > >
> > > > Query  7         CTACTCTACTCTACTC  22
> > > >                  ||||||||||||||||
> > > > Sbjct  27323153  CTACTCTACTCTACTC  27323138
> > >
> > >
> > >
> > >
> > > Guojun Yang wrote:
> > >
> > >> Hi, Chris,
> > >> Finally the remoteblast test script works for the amino.fa query.
> > > >> but when I try a nucleic acid sequence (see below), an error occurs:
> > > >> "waiting........
> > > >> ------------- EXCEPTION  -------------
> > > >> MSG: no data for midline  Features flanking this part of subject
> > > >> sequence:
> > > >> STACK Bio::SearchIO::blast::next_result
> > > >> /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> > > >> STACK toplevel remoteblast_test:40
> > > >> "
> > >> The query sequence is:
> > >> CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> > >> GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> > >> AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> > >> AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> > >>
> > >> The script (basically same as the remoteblast test, I only changed
> > >> database to 'nr' and program to 'blastn' and filename to 'ost3'):
> > >> #!/usr/bin/perl
> > >>
> > >> use Bio::SeqIO;
> > >> use Bio::Seq;
> > >> use Bio::Tools::Run::RemoteBlast;
> > >> use Bio::SearchIO;
> > >> use strict;
> > >> my $prog='blastn';
> > >> my $db='nr';
> > >> my $e_val=1e-10;
> > >> my @params=( -prog=>$prog,
> > >> 	-data=>$db,
> > >> 	-expect=>$e_val,
> > >> 	-readmethod=>'SearchIO');
> > >> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>
> > >> my $v = 1;
> > >>
> > >> my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> > >>
> > >> while (my $input = $str->next_seq()){
> > >>  #Blast a sequence against a database:
> > >>  #Alternatively, you could  pass in a file with many
> > >>  #sequences rather than loop through sequence one at a time
> > >>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>  #and swap the two lines below for an example of that.
> > >>  my $r = $factory->submit_blast($input);
> > >>  #my $r = $factory->submit_blast('amino.fa');
> > >>  print STDERR "waiting..." if( $v > 0 );
> > >>  while ( my @rids = $factory->each_rid ) {
> > >>    foreach my $rid ( @rids ) {
> > >>      my $rc = $factory->retrieve_blast($rid);
> > >>      if( !ref($rc) ) {
> > >>        if( $rc < 0 ) {
> > >>          $factory->remove_rid($rid);
> > >>        }
> > >>        print STDERR "." if ( $v > 0 );
> > >>        sleep 5;
> > >>      } else {
> > >>        my $result = $rc->next_result();
> > >>        #save the output
> > >>        my $filename = $result->query_name()."\.out";
> > >>        $factory->save_output($filename);
> > >>        $factory->remove_rid($rid);
> > >>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>        while ( my $hit = $result->next_hit ) {
> > >>          next unless ( $v > 0);
> > >>          print "\thit name is ", $hit->name, "\n";
> > >>          while( my $hsp = $hit->next_hsp ) {
> > >>            print "\t\tscore is ", $hsp->score, "\n";
> > >>          }
> > >>        }
> > >>      }
> > >>    }
> > >>  }
> > >> }
> > >>
> > >>
> > >> Do you think there might still be something in the NCBI output
> > >> format?
> > >>
> > >> Thank you,
> > >> Guojun
> > >>
> > >>
> > >>
> > >>
> > >> Guojun Yang
> > >> Department of Plant Biology
> > >> University of Georgia
> > >> Tel: 706-542-1857
> > >> Fax: 706-542-1805
> > >> http://www.arches.uga.edu/~guojun
> > >>
> > >>
> > >>
> > >> ----- Original Message -----
> > >> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >> Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>
> > >>
> > >>
> > > >>> Sorry, forgot to add that I didn't see the regex issue that you
> > > >>> mentioned.  It could be a perl-related issue.  Try the fixes I
> > > >>> mentioned and see what happens.
> > > >>>
> > > >>> Christopher Fields
> > > >>> Postdoctoral Researcher - Switzer Lab
> > > >>> Dept. of Biochemistry
> > > >>> University of Illinois Urbana-Champaign
> > > >>>
> > > >>>> -----Original Message-----
> > > >>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>> Sent: Tuesday, February 14, 2006 12:36 PM
> > > >>>> To: 'gyang at plantbio.uga.edu'
> > > >>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>>>
> > > >>>> It's a good habit to always put single quotes around string
> > > >>>> values.  The perl interpreter may think a single bare word is a
> > > >>>> subroutine or perlfunc called with no args, so it will try to find
> > > >>>> a subroutine named blastp().  My debugger actually gives the error
> > > >>>> that the bare word blastp may conflict with a future reserved
> > > >>>> word.  Like you said, 'use strict' will point that out.
> > > >>>>
> > > >>>> As for the regex, it should match all the blast programs at NCBI
> > > >>>> (blastp, blastn, blastx, tblastn, tblastx) and is built in to make
> > > >>>> sure nothing else passes through.
> > > >>>>
> > > >>>> So, if you are using the script below, there are several errors.
> > > >>>> The bare words for $prog and $db need quotes, and the flags for
> > > >>>> your @params array don't have a dash before them.  I get this after
> > > >>>> adding quotes but before adding the dashes to @params:
> > > >>>>
> > > >>>> C:\Perl\Scripts>test_blast.pl
> > > >>>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > > >>>> MSG:
> > > >>>> STACK: Error::throw
> > > >>>> STACK: Bio::Root::Root::throw
> > > >>>> C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> > > >>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> > > >>>> C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > > >>>> STACK: Bio::Tools::Run::RemoteBlast::new
> > > >>>> C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> > > >>>> STACK: C:\Perl\Scripts\test_blast.pl:15
> > > >>>> -----------------------------------------------------------
> > > >>>>
> > > >>>> The last line indicates a problem with this line:
> > > >>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > >>>> Changing the @params to this:
> > > >>>> my @params=( -prog=>$prog,
> > > >>>>              -data=>$db,
> > > >>>>              -expect=>$e_val,
> > > >>>>              -readmethod=>'SearchIO');
> > > >>>>
> > > >>>> fixes it, and I get output as expected.
> > > >>>>
> > > >>>> Christopher Fields
> > > >>>> Postdoctoral Researcher - Switzer Lab
> > > >>>> Dept. of Biochemistry
> > > >>>> University of Illinois Urbana-Champaign
> > > >>>>> -----Original Message-----
> > > >>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > >>>>> Sent: Tuesday, February 14, 2006 11:48 AM
> > > >>>>> To: Chris Fields; bioperl-l at lists.open-bio.org
> > > >>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > >>>>>
> > > >>>>> Hi, Chris,
> > > >>>>> When I tried the perldoc script, it did not work either.  First it
> > > >>>>> says $prog cannot be a bare word if I "use strict". I added quotes
> > > >>>>> around the words; then it says the value for $prog does not match
> > > >>>>> the expression t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325
> > > >>>>> and 256.  The script is shown below. Why is the expression
> > > >>>>> "t?blast[pnx]"?
> > >>>>>
> > >>>>> #!/usr/bin/perl
> > >>>>>
> > >>>>> use Bio::SeqIO;
> > >>>>> use Bio::Seq;
> > >>>>> use Bio::Tools::Run::RemoteBlast;
> > >>>>> use Bio::SearchIO;
> > >>>>>
> > >>>>>
> > >>>>> my $prog=blastp;
> > >>>>> my $db=swissprot;
> > >>>>> my $e_val=1e-10;
> > >>>>> my @params=( prog=>$prog,
> > >>>>> 	data=>$db,
> > >>>>> 	expect=>$e_val,
> > >>>>> 	readmethod=>'SearchIO');
> > >>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>>>>
> > >>>>> my $v = 1;
> > >>>>>
> > > >>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >>>>>
> > >>>>> while (my $input = $str->next_seq()){
> > >>>>>  #Blast a sequence against a database:
> > >>>>>  #Alternatively, you could  pass in a file with many
> > >>>>>  #sequences rather than loop through sequence one at a time
> > >>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>>>>  #and swap the two lines below for an example of that.
> > >>>>>  my $r = $factory->submit_blast($input);
> > >>>>>  #my $r = $factory->submit_blast('amino.fa');
> > >>>>>  print STDERR "waiting..." if( $v > 0 );
> > >>>>>  while ( my @rids = $factory->each_rid ) {
> > >>>>>    foreach my $rid ( @rids ) {
> > >>>>>      my $rc = $factory->retrieve_blast($rid);
> > >>>>>      if( !ref($rc) ) {
> > >>>>>        if( $rc < 0 ) {
> > >>>>>          $factory->remove_rid($rid);
> > >>>>>        }
> > >>>>>        print STDERR "." if ( $v > 0 );
> > >>>>>        sleep 5;
> > >>>>>      } else {
> > >>>>>        my $result = $rc->next_result();
> > >>>>>        #save the output
> > >>>>>        my $filename = $result->query_name()."\.out";
> > >>>>>        $factory->save_output($filename);
> > >>>>>        $factory->remove_rid($rid);
> > >>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>        while ( my $hit = $result->next_hit ) {
> > >>>>>          next unless ( $v > 0);
> > >>>>>          print "\thit name is ", $hit->name, "\n";
> > >>>>>          while( my $hsp = $hit->next_hsp ) {
> > >>>>>            print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>          }
> > >>>>>        }
> > >>>>>      }
> > >>>>>    }
> > >>>>>  }
> > >>>>> }
> > >>>>>
> > >>>>> Thank you for your help!
> > >>>>>
> > >>>>>
> > >>>>> Guojun
> > >>>>> Department of Plant Biology
> > >>>>> University of Georgia
> > >>>>>
> > >>>>> ----- Original Message -----
> > >>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>> To: gyang at plantbio.uga.edu
> > >>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>
> > >>>>>
> > >>>>>
> > > >>>>>> Try two things:
> > > >>>>>>
> > > >>>>>> 1)  Use a much simpler script, like the one in 'perldoc
> > > >>>>>> Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something
> > > >>>>>> wrong with the logic in your subroutine:
> > > >>>>>>
> > > >>>>>> my $v = 1;
> > > >>>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > > >>>>>> while (my $input = $str->next_seq()){
> > >>>>>>  #Blast a sequence against a database:
> > >>>>>>  #Alternatively, you could  pass in a file with many
> > >>>>>>  #sequences rather than loop through sequence one at a time
> > >>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>>>>>  #and swap the two lines below for an example of that.
> > >>>>>>  my $r = $factory->submit_blast($input);
> > >>>>>>  #my $r = $factory->submit_blast('amino.fa');
> > >>>>>>  print STDERR "waiting..." if( $v > 0 );
> > >>>>>>  while ( my @rids = $factory->each_rid ) {
> > >>>>>>    foreach my $rid ( @rids ) {
> > >>>>>>      my $rc = $factory->retrieve_blast($rid);
> > >>>>>>      if( !ref($rc) ) {
> > >>>>>>        if( $rc < 0 ) {
> > >>>>>>          $factory->remove_rid($rid);
> > >>>>>>        }
> > >>>>>>        print STDERR "." if ( $v > 0 );
> > >>>>>>        sleep 5;
> > >>>>>>      } else {
> > >>>>>>        my $result = $rc->next_result();
> > >>>>>>        #save the output
> > >>>>>>        my $filename = $result->query_name()."\.out";
> > >>>>>>        $factory->save_output($filename);
> > >>>>>>        $factory->remove_rid($rid);
> > >>>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>>        while ( my $hit = $result->next_hit ) {
> > >>>>>>          next unless ( $v > 0);
> > >>>>>>          print "\thit name is ", $hit->name, "\n";
> > >>>>>>          while( my $hsp = $hit->next_hsp ) {
> > >>>>>>            print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>>          }
> > >>>>>>        }
> > >>>>>>      }
> > >>>>>>    }
> > >>>>>>  }
> > >>>>>> }
> > >>>>>>
> > > >>>>>> 2) Try the RemoteBlast from Bugzilla and see if that works.  It
> > > >>>>>> really shouldn't make that much of a difference, but I noticed that
> > > >>>>>> the CVS RemoteBlast (1.28) was changed in Dec 2005, after
> > > >>>>>> bioperl-1.5.1 was released; the Bugzilla version is based off CVS.
> > > >>>>>>
> > > >>>>>> Christopher Fields
> > > >>>>>> Postdoctoral Researcher - Switzer Lab
> > > >>>>>> Dept. of Biochemistry
> > > >>>>>> University of Illinois Urbana-Champaign
> > > >>>>>>
> > > >>>>>>> -----Original Message-----
> > > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org
> > > >>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > >>>>>>> Sent: Monday, February 13, 2006 3:00 PM
> > > >>>>>>> To: bioperl-l at lists.open-bio.org
> > > >>>>>>> Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>>
> > > >>>>>>> Thanks, Chris,
> > > >>>>>>> I installed version 1.5.1 and replaced the blast.pm file with the
> > > >>>>>>> one from your bug report. The running version is 1.5 when I use the
> > > >>>>>>> command you sent me. But when I tried the script, it doesn't change
> > > >>>>>>> much. My remoteblast code (portion) is here:
> > > >>>>>>>
> > > >>>>>>> sub search {
> > > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
> > > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > > >>>>>>> my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > > >>>>>>>                               -id=>"query",
> > > >>>>>>>                               -desc=>"new seq");
> > > >>>>>>> my $len=$query->length();
> > > >>>>>>> @db=('nr','htgs','wgs');
> > > >>>>>>> foreach my $db (@db) {
> > > >>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> > > >>>>>>>                                                 '-data' =>"$db",
> > > >>>>>>>                                                 '-expect'=>"$E_value");
> > > >>>>>>> my $blast_report = $factory->submit_blast($query);
> > > >>>>>>> my @rids = $factory->each_rid();
> > > >>>>>>> foreach my $rid ( @rids ) {
> > > >>>>>>>    print STDERR "$rid\n";
> > > >>>>>>> }
> > > >>>>>>> # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > > >>>>>>> print STDERR "waiting...";
> > > >>>>>>> sleep 60;
> > > >>>>>>> foreach my $rid ( @rids ) {
> > > >>>>>>>    my $rc = $factory->retrieve_blast($rid);
> > > >>>>>>>    while (!ref($rc) ) {
> > > >>>>>>>        if( $rc < 0 ) {
> > > >>>>>>>            # retrieve_blast returns -1 on error
> > > >>>>>>>            $factory->remove_rid($rid);
> > > >>>>>>>            print "Error!\n";
> > > >>>>>>>            send_error($email,$function,$seqname,$queryname[$ST]);
> > > >>>>>>>            die "Can't retrieve $rid";
> > > >>>>>>>        } if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
> > > >>>>>>>            sleep 60;
> > > >>>>>>>            $rc = $factory->retrieve_blast($rid);
> > > >>>>>>>        }
> > > >>>>>>>    }
> > > >>>>>>>    if (ref($rc)) {
> > > >>>>>>>        print STDERR "Done.\n";
> > > >>>>>>>        while( my $result = $rc->next_result) {
> > > >>>>>>>            while( my $hit = $result->next_hit()) {
> > > >>>>>>>                $hit_name=$hit->name;
> > > >>>>>>>                $hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > > >>>>>>>                $name=$1;
> > > >>>>>>>                @left_plus_start=();
> > > >>>>>>>                @left_plus_end=();
> > > >>>>>>>                @left_minus_start=();
> > > >>>>>>>                @left_minus_end=();
> > > >>>>>>>                @right_plus_start=();
> > > >>>>>>>                @right_plus_end=();
> > > >>>>>>>                @right_minus_start=();
> > > >>>>>>>                @right_minus_end=();
> > > >>>>>>>                if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > > >>>>>>>                while( my $hsp = $hit->next_hsp()) {
> > > >>>>>>> ......
> > >>>>>>>
> > > >>>>>>> It was working quite well until around October last year, but it
> > > >>>>>>> has stopped since then. When a submission is sent via a webpage,
> > > >>>>>>> the cgi starts to work and uses ~20 Mb of memory. Then it hangs
> > > >>>>>>> there; finally the expected email is received but without real
> > > >>>>>>> results, although it does contain something from other parts of
> > > >>>>>>> the script. Apparently the search sub did not return anything (I
> > > >>>>>>> know there is something that should be returned). Is it also
> > > >>>>>>> possible that the format of the NCBI output for each result has
> > > >>>>>>> changed?
> > > >>>>>>> Thank you,
> > > >>>>>>> Guojun
> > > >>>>>>>
> > > >>>>>>> Department of Plant Biology
> > > >>>>>>> University of Georgia
> > >>>>>>>
> > >>>>>>>>>>>>> ----- Original Message -----
> > >>>>>>>>>>>>>
> > >>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >>>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>>
> > >>>>>>>>>>>> How do you know two versions are installed (i.e. how are
> > >>>>>>>>>>>>
> > >>>> you
> > >>>>
> > >>>>> checking
> > >>>>>
> > >>>>>>> the
> > >>>>>>>
> > >>>>>>>> version)?  Do you have two complete bioperl
> > >>>>>>>> distributions (in
> > >>>>>>>>
> > >>>>> two
> > >>>>>
> > >>>>>>>> separate directories) or are you looking in modules?  Here's
> > >>>>>>>> the
> > >>>>>>>>
> > >>>> way
> > >>>>
> > >>>>> to
> > >>>>>
> > >>>>>>>> check the version (from the FAQ):
> > >>>>>>>>
> > >>>>>>>>> perl -MBio::Root::Version -e 'print
> > >>>>>>>>>
> > >>>>> $Bio::Root::Version::VERSION,"\n"'
> > >>>>>
> > >>>>>>>>> If you have two full bioperl distributions on your computer,
> > >>>>>>>>>
> > >>>>> normally
> > >>>>>
> > >>>>>>> only
> > >>>>>>>
> > >>>>>>>> one will be in use unless you have explicitly set the
> > >>>>>>>> environment
> > >>>>>>>>
> > >>>>>>> variable
> > >>>>>>>
> > >>>>>>>> PERL5LIB.  The PERL5LIB  directories will be searched first
> > >>>>>>>> before
> > >>>>>>>>
> > >>>>> your
> > >>>>>
> > >>>>>>>> normal perl directory list (@INC) is searched.  You MAY get
> > >>>>>>>> some
> > >>>>>>>>
> > >>>>> mixing
> > >>>>>
> > >>>>>>>> then, but only if perl can't find a particular module in the
> > >>>>>>>> path
> > >>>>>>>>
> > >>>>>>> designated
> > >>>>>>>
> > >>>>>>>> in PERL5LIB; then it will progress through the directories
> > >>>>>>>> listed
> > >>>>>>>>
> > >>>> in
> > >>>>
> > >>>>>>> @INC.
> > >>>>>>>
> > >>>>>>>> This may happen if a module is unique to a particular
> > >>>>>>>> release, but
> > >>>>>>>>
> > >>>>>>> shouldn't
> > >>>>>>>
> > >>>>>>>> happen for the majority of modules, including RemoteBlast.  You
> > >>>>>>>>
> > >>>> can
> > >>>>
> > >>>>>>> check
> > >>>>>>>
> > >>>>>>>> what @INC and PERL5LIB are set to by using 'perl -V'.  @INC
> > >>>>>>>> will
> > >>>>>>>>
> > >>>>> differ
> > >>>>>
> > >>>>>>>> depending on your OS, perl build, etc.
> > >>>>>>>>
> > >>>>>>>>> Regardless, if you follow the directions for installing
> > >>>>>>>>> bioperl
> > >>>>>>>>>
> > >>>>> for
> > >>>>>
> > >>>>>>> your
> > >>>>>>>
> > >>>>>>>> system ('perl Makefile.PL', 'make', 'make test', 'make
> > >>>>>>>> install',
> > >>>>>>>>
> > >>>>> unless
> > >>>>>
> > >>>>>>> you
> > >>>>>>>
> > >>>>>>>> explicitly change the installation directory when using 'perl
> > >>>>>>>>
> > >>>>>>> Makefile.PL'),
> > >>>>>>>
> > >>>>>>>> then 'uninstalling' Bioperl shouldn't be a problem as it will
> > >>>>>>>>
> > >>>>> install
> > >>>>>
> > >>>>>>> the
> > >>>>>>>
> > >>>>>>>> Bioperl distribution you downloaded over the old version in
> > >>>>>>>> @INC.
> > >>>>>>>>
> > >>>>> See
> > >>>>>
> > >>>>>>> this
> > >>>>>>>
> > >>>>>>>> page:
> > >>>>>>>>
> > >>>>>>>>> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > >>>>>>>>> for more details.
> > >>>>>>>>> Christopher Fields
> > >>>>>>>>>
> > >>>>>>>> Postdoctoral Researcher - Switzer Lab
> > >>>>>>>> Dept. of Biochemistry
> > >>>>>>>> University of Illinois Urbana-Champaign
> > >>>>>>>>
> > >>>>>>>>>>> -----Original Message-----
> > >>>>>>>>>>>
> > >>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>>>> Sent: Monday, February 13, 2006 12:32 PM
> > >>>>>>>>> To: bioperl-l at lists.open-bio.org
> > >>>>>>>>> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>>>>
> > >>>>>>>>>>> Hi, Chris,
> > >>>>>>>>>>>
> > >>>>>>>>> I do have different versions of bioperl on my Linux machine
> > >>>>>>>>>
> > >>>> (1.4.
> > >>>>
> > >>>>> and
> > >>>>>
> > >>>>>>>>> 1.5.0), this may be the problem. Should I just install
> > >>>>>>>>> bioperl-
> > >>>>>>>>>
> > >>>>> 1.5.1
> > >>>>>
> > >>>>>>> or I
> > >>>>>>>
> > >>>>>>>>> need to uninstall and remove the previous versions. I could
> > >>>>>>>>> not
> > >>>>>>>>>
> > >>>>> find
> > >>>>>
> > >>>>>>> any
> > >>>>>>>
> > >>>>>>>>> hint on uninstalling bioperl on linux. Could you please
> > >>>>>>>>> give me
> > >>>>>>>>>
> > >>>>> some
> > >>>>>
> > >>>>>>>>> suggestion?
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Guojun
> > >>>>>>>>>
> > >>>>>>>>>>> Department of Plant Biology
> > >>>>>>>>>>>
> > >>>>>>>>> University of Georgia
> > >>>>>>>>>      _____
> > >>>>>>>>>
> > >>>>>>>>>>>  From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>>>
> > >>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >>>>>>>>> Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding
> > >>>>>>>>> RemoteBlast.pm
> > >>>>>>>>>
> > >>>>>>> version
> > >>>>>>>
> > >>>>>>>>> 1.28
> > >>>>>>>>>
> > >>>>>>>>>>>>>>> If you're using RemoteBlast 1.28, then you've likely
> > >>>>>>>>>>>>>>>
> > >>>>>>> updated from CVS
> > >>>>>>>
> > >>>>>>>>> which isn't the latest fix.
> > >>>>>>>>>
> > >>>>>>>>>>> Make sure that you check the following:
> > >>>>>>>>>>> 1) Always post to the mailing list:
> > >>>>>>>>>>>
> > >>>>>>>>> http://www.bioperl.org/wiki/
> > >>>>>>>>> HOWTO:Beginners#Getting_Assistance .
> > >>>>>>>>>
> > >>>>>>>>>>> 2) You must have the complete bioperl-1.5.1 or bioperl-live
> > >>>>>>>>>>>
> > >>>>> (CVS)
> > >>>>>
> > >>>>>>>>> installed first.  Perform a clean installation; do not upgrade
> > >>>>>>>>>
> > >>>>> only
> > >>>>>
> > >>>>>>>>> Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we
> > >>>>>>>>>
> > >>>> can't
> > >>>>
> > >>>>>>>>> guarantee that mixing modules from old and new distributions
> > >>>>>>>>>
> > >>>> (1.4
> > >>>>
> > >>>>> and
> > >>>>>
> > >>>>>>>>> 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-
> > >>>>>>>>> live
> > >>>>>>>>> installation will allow text output from BLAST v.2.2.12 to be
> > >>>>>>>>>
> > >>>>> saved
> > >>>>>
> > >>>>>>> and
> > >>>>>>>
> > >>>>>>>>> parsed; it will not parse the newest BLAST text output from
> > >>>>>>>>> NCBI
> > >>>>>>>>>
> > >>>>>>> (v2.2.13)
> > >>>>>>>
> > >>>>>>>>> but it should still save it. I believe as long as
> > >>>>>>>>> next_result()
> > >>>>>>>>>
> > >>>>> isn't
> > >>>>>
> > >>>>>>>>> called, it will work.
> > >>>>>>>>>
> > >>>>>>>>>>> 3) The bug fixes for the above issue with parsing BLAST
> > >>>>>>>>>>>
> > >>>> 2.2.13
> > >>>>
> > >>>>>>> text output
> > >>>>>>>
> > >>>>>>>>> are NOT in CVS; they haven't been cleared and checked in by
> > >>>>>>>>>
> > >>>> Roger
> > >>>>
> > >>>>> Hall
> > >>>>>
> > >>>>>>>>> (who's now taking care of RemoteBlast) and the powers that be
> > >>>>>>>>>
> > >>>>> (Jason
> > >>>>>
> > >>>>>>> or
> > >>>>>>>
> > >>>>>>>>> whomever is in charge of Bio::SearchIO).  They can be found in
> > >>>>>>>>>
> > >>>>>>> Bugzilla:
> > >>>>>>>
> > >>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > >>>>>>>>>>>
> > >>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > >>>>>>>>>
> > >>>>>>>>>>> The fix in RemoteBlast in Bugzilla (#1935) is to allow the
> > >>>>>>>>>>>
> > >>>>> option
> > >>>>>
> > >>>>>>> of
> > >>>>>>>
> > >>>>>>>>> saving XML output, so isn't necessary if you don't plan on
> > >>>>>>>>> using
> > >>>>>>>>>
> > >>>>> this
> > >>>>>
> > >>>>>>>>> option.  And, remember, they haven't been committed yet to
> > >>>>>>>>> CVS,
> > >>>>>>>>>
> > >>>>> which
> > >>>>>
> > >>>>>>>>> means that the final version will change to reflect the new
> > >>>>>>>>>
> > >>>> version.
> > >>>>
> > >>>>>>>>>>>>> Christopher Fields
> > >>>>>>>>>>>>>
> > >>>>>>>>> Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>> Dept. of Biochemistry
> > >>>>>>>>> University of Illinois Urbana-Champaign
> > >>>>>>>>>
> > >>>>>>>>>>>>>    _____
> > >>>>>>>>>>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > >>>>>>>>>>>>>
> > >>>>>>>>> Sent: Monday, February 13, 2006 9:26 AM
> > >>>>>>>>> To: Chris Fields
> > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding
> > >>>>>>>>> RemoteBlast.pm
> > >>>>>>>>>
> > >>>>>>> version
> > >>>>>>>
> > >>>>>>>>> 1.28
> > >>>>>>>>>
> > >>>>>>>>>>>>> Hi, Chris
> > >>>>>>>>>>>>>
> > >>>>>>>>>>> Thanks for your suggestion, however, it doesn't seem to work
> > >>>>>>>>>>>
> > >>>>> for
> > >>>>>
> > >>>>>>> my cgi
> > >>>>>>>
> > >>>>>>>>> even after I replace both blast.pm and RemoteBlast.pm. I
> > >>>>>>>>> didn't
> > >>>>>>>>>
> > >>>>> even
> > >>>>>
> > >>>>>>> get
> > >>>>>>>
> > >>>>>>>>> any RID. Is there any suggestion?
> > >>>>>>>>>
> > >>>>>>>>>>>>>>> Guojun
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>> Guojun Yang
> > >>>>>>>>>>>>>
> > >>>>>>>>> Department of Plant Biology
> > >>>>>>>>> University of Georgia
> > >>>>>>>>> Tel: 706-542-1857
> > >>>>>>>>> Fax: 706-542-1805
> > >>>>>>>>> http://www.arches.uga.edu/~guojun
> > >>>>>>>>>    _____
> > >>>>>>>>>
> > >>>>>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>>>>>
> > >>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > >>>>>>>>> Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding
> > >>>>>>>>> RemoteBlast.pm
> > >>>>>>>>>
> > >>>>>>> version
> > >>>>>>>
> > >>>>>>>>> 1.28
> > >>>>>>>>>
> > >>>>>>>>>>> I would say give the new code a try, but realize that it
> > >>>>>>>>>>>
> > >>>>> hasn't
> > >>>>>
> > >>>>>>> been
> > >>>>>>>
> > >>>>>>>>> checked
> > >>>>>>>>> in (like I said below). I will try going over the modified
> > >>>>>>>>> Bio::SearchIO::blast again this weekend to see if there is
> > >>>>>>>>>
> > >>>>> anything I
> > >>>>>
> > >>>>>>>>> might
> > >>>>>>>>> have missed. The changed order in the header of BLAST text
> > >>>>>>>>>
> > >>>> output
> > >>>>
> > >>>>> has
> > >>>>>
> > >>>>>>> me a
> > >>>>>>>
> > >>>>>>>>> bit worried that it might not catch everything, but it at
> > >>>>>>>>> least
> > >>>>>>>>>
> > >>>>>>> doesn't
> > >>>>>>>
> > >>>>>>>>> hang
> > >>>>>>>>> in the while() loop I described in the bug report below (bug
> > >>>>>>>>>
> > >>>>> #1934)
> > >>>>>
> > >>>>>>> and
> > >>>>>>>
> > >>>>>>>>> seems to process everything fine.
> > >>>>>>>>>
> > >>>>>>>>>>> If you want more stability in the code, you might consider
> > >>>>>>>>>>>
> > >>>>>>> changing over
> > >>>>>>>
> > >>>>>>>>> to
> > >>>>>>>>> XML output and parsing with Bio::SearchIO::blastxml. There are
> > >>>>>>>>>
> > >>>>> some
> > >>>>>
> > >>>>>>>>> changes
> > >>>>>>>>> in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
> > >>>>>>>>>
> > >>>>> saving
> > >>>>>
> > >>>>>>> XML
> > >>>>>>>
> > >>>>>>>>> output, but I believe it parses everything regardless. If you
> > >>>>>>>>>
> > >>>> look
> > >>>>
> > >>>>>>> back
> > >>>>>>>
> > >>>>>>>>> the
> > >>>>>>>>> last month or so there has been a bit of discussion here about
> > >>>>>>>>>
> > >>>> it.
> > >>>>
> > >>>>>>> Jason
> > >>>>>>>
> > >>>>>>>>> describes a bit on how to set up RemoteBlast for XML:
> > >>>>>>>>>
> > >>>>>>>>>>> http://bioperl.org/news/2005/11/06/getting-blastxml-using-
> > >>>>>>>>>>>
> > >>>>>>> remoteblast/
> > >>>>>>>
> > >>>>>>>>>>> Christopher Fields
> > >>>>>>>>>>>
> > >>>>>>>>> Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>> Dept. of Biochemistry
> > >>>>>>>>> University of Illinois Urbana-Champaign
> > >>>>>>>>>
> > >>>>>>>>>>>> -----Original Message-----
> > >>>>>>>>>>>>
> > >>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>>>>> Sent: Friday, February 03, 2006 1:45 PM
> > >>>>>>>>>> To: bioperl-l at bioperl.org
> > >>>>>>>>>> Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> > >>>>>>>>>>
> > >>>>> version
> > >>>>>
> > >>>>>>> 1.28
> > >>>>>>>
> > >>>>>>>>>> Hi, Everybody,
> > >>>>>>>>>> I see this post and am wondering if this is the reason for
> > >>>>>>>>>> the
> > >>>>>>>>>> malfunctioning of my webserver. We set up a webserver named
> > >>>>>>>>>>
> > >>>>> MAK,
> > >>>>>
> > >>>>>>> for
> > >>>>>>>
> > >>>>>>>>> MITE
> > >>>>>>>>>
> > >>>>>>>>>> sequence analysis. It was working very well until around
> > >>>>>>>>>>
> > >>>>> November
> > >>>>>
> > >>>>>>> 2005,
> > >>>>>>>
> > >>>>>>>>>> when it stopped returning any result (the site is fine and
> > >>>>>>>>>>
> > >>>> seems
> > >>>>
> > >>>>> to
> > >>>>>
> > >>>>>>> be
> > >>>>>>>
> > >>>>>>>>>> doing sth after submission). In the CGI script, I used
> > >>>>>>>>>>
> > >>>>> remoteblast
> > >>>>>
> > >>>>>>> (that
> > >>>>>>>
> > >>>>>>>>>> work was done in 2003) to do searches. I currently do not
> > >>>>>>>>>> have
> > >>>>>>>>>>
> > >>>>>>> access to
> > >>>>>>>
> > >>>>>>>>>> the server because I moved. Quite several people sent emails
> > >>>>>>>>>>
> > >>>> to
> > >>>>
> > >>>>> us
> > >>>>>
> > >>>>>>> about
> > >>>>>>>
> > >>>>>>>>>> its malfunctioning. Is there any suggestion on fixing the
> > >>>>>>>>>>
> > >>>>> problem?
> > >>>>>
> > >>>>>>>>> Should
> > >>>>>>>>>
> > >>>>>>>>>> I simply ask that the remoteblast.pm be replaced with the new
> > >>>>>>>>>>
> > >>>>> version?
> > >>>>>
> > >>>>>>>>>> Thanks a lot,
> > >>>>>>>>>> Guojun
> > >>>>>>>>>>
> > >>>>>>>>>> Department of Plant Biology
> > >>>>>>>>>> University of Georgia
> > >>>>>>>>>> Tel: 706-542-1857
> > >>>>>>>>>> Fax: 706-542-1805
> > >>>>>>>>>> http://www.arches.uga.edu/~guojun
> > >>>>>>>>>> _____
> > >>>>>>>>>>
> > >>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
> > >>>>>>>>>>
> > >>>>> Jian'
> > >>>>>
> > >>>>>>>>>> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > >>>>>>>>>>
> > >>>> [mailto:bioperl-
> > >>>>
> > >>>>>>>>>> l at bioperl.org]
> > >>>>>>>>>> Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > >>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >>>>>>>>>>
> > >>>>>>>>>> Like Nagesh says, try the latest RemoteBlast from bioperl-
> > >>>>>>>>>> live
> > >>>>>>>>>>
> > >>>>> CVS.
> > >>>>>
> > >>>>>>> It
> > >>>>>>>
> > >>>>>>>>>> will
> > >>>>>>>>>> work for saving text output. However, it will not parse
> > >>>>>>>>>>
> > >>>> anything
> > >>>>
> > >>>>>>> using
> > >>>>>>>
> > >>>>>>>>>> next_result (it will likely hang) and will not save XML
> > >>>>>>>>>>
> > >>>> format.
> > >>>>
> > >>>>> See
> > >>>>>
> > >>>>>>>>> these
> > >>>>>>>>>
> > >>>>>>>>>> bugs:
> > >>>>>>>>>>
> > >>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > >>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > >>>>>>>>>>
> > >>>>>>>>>> for explanations and possible fixes (changes to RemoteBlast
> > >>>>>>>>>>
> > >>>> and
> > >>>>
> > >>>>>>>>>> Bio::SearchIO::blast). Note that these haven't been
> > >>>>>>>>>> checked in
> > >>>>>>>>>>
> > >>>>> yet
> > >>>>>
> > >>>>>>> so
> > >>>>>>>
> > >>>>>>>>> are
> > >>>>>>>>>
> > >>>>>>>>>> still not included in bioperl-live; they may be further
> > >>>>>>>>>>
> > >>>> modified
> > >>>>
> > >>>>>>> before
> > >>>>>>>
> > >>>>>>>>>> committing to CVS. If you're not worried about XML, you could
> > >>>>>>>>>>
> > >>>>> just
> > >>>>>
> > >>>>>>> try
> > >>>>>>>
> > >>>>>>>>> the
> > >>>>>>>>>
> > >>>>>>>>>> first fix, which is a change to SearchIO::blast.
> > >>>>>>>>>>
> > >>>>>>>>>> Nagesh, I remember you posting to the list a month ago
> > >>>>>>>>>> using a
> > >>>>>>>>>>
> > >>>>>>> script
> > >>>>>>>
> > >>>>>>>>>> which
> > >>>>>>>>>> had problems; the script you used saves the output but
> > >>>>>>>>>> doesn't
> > >>>>>>>>>>
> > >>>>>>> actually
> > >>>>>>>
> > >>>>>>>>>> parse it (i.e. you don't use next_result() to go through the
> > >>>>>>>>>>
> > >>>>> data).
> > >>>>>
> > >>>>>>> Is
> > >>>>>>>
> > >>>>>>>>> the
> > >>>>>>>>>
> > >>>>>>>>>> version of BLAST in your text output 2.2.12 or 2.2.13? Have
> > >>>>>>>>>>
> > >>>> you
> > >>>>
> > >>>>>>> tried
> > >>>>>>>
> > >>>>>>>>>> parsing the output using "-readmethod => SearchIO" or "-
> > >>>>>>>>>>
> > >>>>> readmethod
> > >>>>>
> > >>>>>>> =>
> > >>>>>>>
> > >>>>>>>>>> blast"
> > >>>>>>>>>> using your version of RemoteBlast and method next_result()?
> > >>>>>>>>>>
> > >>>> Like
> > >>>>
> > >>>>>>> below
> > >>>>>>>
> > >>>>>>>>>> (from
> > >>>>>>>>>> perldoc):
> > >>>>>>>>>>
> > >>>>>>>>>> while ( my @rids = $factory->each_rid ) {
> > >>>>>>>>>>     foreach my $rid ( @rids ) {
> > >>>>>>>>>>         my $rc = $factory->retrieve_blast($rid);
> > >>>>>>>>>>         if( !ref($rc) ) {
> > >>>>>>>>>>             if( $rc < 0 ) {
> > >>>>>>>>>>                 $factory->remove_rid($rid);
> > >>>>>>>>>>             }
> > >>>>>>>>>>             print STDERR "." if ( $v > 0 );
> > >>>>>>>>>>             sleep 5;
> > >>>>>>>>>>         } else { # parsing starts here
> > >>>>>>>>>>             my $result = $rc->next_result(); # it should hang here
> > >>>>>>>>>>             #save the output
> > >>>>>>>>>>             my $filename = $result->query_name()."\.out";
> > >>>>>>>>>>             $factory->save_output($filename);
> > >>>>>>>>>>             $factory->remove_rid($rid);
> > >>>>>>>>>>             print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>>>>>>             while ( my $hit = $result->next_hit ) {
> > >>>>>>>>>>                 next unless ( $v > 0);
> > >>>>>>>>>>                 print "\thit name is ", $hit->name, "\n";
> > >>>>>>>>>>                 while( my $hsp = $hit->next_hsp ) {
> > >>>>>>>>>>                     print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>>>>>>                 }
> > >>>>>>>>>>             }
> > >>>>>>>>>>         }
> > >>>>>>>>>>     }
> > >>>>>>>>>> }
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> My script hung if I used next_result() in any way prior to
> > >>>>>>>>>>
> > >>>> the
> > >>>>
> > >>>>>>> fixes.
> > >>>>>>>
> > >>>>>>>>> I
> > >>>>>>>>>
> > >>>>>>>>>> want to see how many others are having the same issues with
> > >>>>>>>>>>
> > >>>>> parsing
> > >>>>>
> > >>>>>>>>> using
> > >>>>>>>>>
> > >>>>>>>>>> the CVS version of bioperl-live.
> > >>>>>>>>>>
> > >>>>>>>>>> Christopher Fields
> > >>>>>>>>>> Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>>> Dept. of Biochemistry
> > >>>>>>>>>> University of Illinois Urbana-Champaign
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> -----Original Message-----
> > >>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> > >>>>>>>>>>>
> > >>>> l-
> > >>>>
> > >>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > >>>>>>>>>>> Sent: Thursday, February 02, 2006 7:24 PM
> > >>>>>>>>>>> To: Huang Jian; bioperl-l
> > >>>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >>>>>>>>>>>
> > >>>>>>>>>>> Hi Huang,
> > >>>>>>>>>>> Thanks for the message. The older version of RemoteBlast.pm
> > >>>>>>>>>>>
> > >>>>> works
> > >>>>>
> > >>>>>>> on
> > >>>>>>>
> > >>>>>>>>> the
> > >>>>>>>>>
> > >>>>>>>>>>> logic of checking the temporary file size to determine
> > >>>>>>>>>>>
> > >>>> whether
> > >>>>
> > >>>>> the
> > >>>>>
> > >>>>>>>>> Blast
> > >>>>>>>>>
> > >>>>>>>>>>> results are ready. This condition is not getting satisfied
> > >>>>>>>>>>>
> > >>>> may
> > >>>>
> > >>>>> be
> > >>>>>
> > >>>>>>> due
> > >>>>>>>
> > >>>>>>>>> to
> > >>>>>>>>>
> > >>>>>>>>>>> some changes brought about by NCBI. I had this problem
> > >>>>>>>>>>>
> > >>>>> recently
> > >>>>>
> > >>>>>>> and
> > >>>>>>>
> > >>>>>>>>>>> figured out that the solution was to use the latest version
> > >>>>>>>>>>>
> > >>>>> which
> > >>>>>
> > >>>>>>> has
> > >>>>>>>
> > >>>>>>>>>>> this problem fixed (does not use file size logic any more)
> > >>>>>>>>>>>
> > >>>>> which
> > >>>>>
> > >>>>>>> is
> > >>>>>>>
> > >>>>>>>>> not
> > >>>>>>>>>
> > >>>>>>>>>>> yet included in the BioPerl package.
> > >>>>>>>>>>> Cheers
> > >>>>>>>>>>> Nagesh
> > >>>>>>>>>>>
> > >>>>>>>>>>> Huang Jian wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Dear Nagesh,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28
> > >>>>>>>>>>>>
> > >>>>> you
> > >>>>>
> > >>>>>>> send
> > >>>>>>>
> > >>>>>>>>>>>> me. Now it works perfectly!!!
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thank you!!
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Huang
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> ----- Original Message ----- From: "Nagesh Chakka"
> > >>>>>>>>>>>> 
> > >>>>>>>>>>>> To: "Huang Jian" ; "bioperl-l"
> > >>>>>>>>>>>> 
> > >>>>>>>>>>>> Sent: Friday, February 03, 2006 7:48 AM
> > >>>>>>>>>>>> Subject: Re: [Bioperl-l] Sorry, failure in post on the
> > >>>>>>>>>>>>
> > >>>> net,
> > >>>>
> > >>>>> so
> > >>>>>
> > >>>>>>> still
> > >>>>>>>
> > >>>>>>>>>>>> via email
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi Huang,
> > >>>>>>>>>>>>> I see that you are submitting a sequence for a remote
> > >>>>>>>>>>>>>
> > >>>> blast
> > >>>>
> > >>>>>>> search.
> > >>>>>>>
> > >>>>>>>>>> Can
> > >>>>>>>>>>
> > >>>>>>>>>>>>> you check if the RemoteBlast.pm being used is v 1.28
> > >>>>>>>>>>>>>
> > >>>>>>> (2005/12/09).
> > >>>>>>>
> > >>>>>>>>> If
> > >>>>>>>>>
> > >>>>>>>>>>>>> not I have attached it with this email, try to replace it
> > >>>>>>>>>>>>>
> > >>>>> with
> > >>>>>
> > >>>>>>> the
> > >>>>>>>
> > >>>>>>>>>> old
> > >>>>>>>>>>
> > >>>>>>>>>>>>> one which has a bug.
> > >>>>>>>>>>>>> Let me know if it works.
> > >>>>>>>>>>>>> Nagesh
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>> _______________________________________________
> > >>>>>>>>>>> Bioperl-l mailing list
> > >>>>>>>>>>> Bioperl-l at lists.open-bio.org
> > >>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>>>>>>>>>>
> > >
> > > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
> > >
> > > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> > > > > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >



From Marc.Logghe at DEVGEN.com  Thu Feb 16 10:47:13 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Thu, 16 Feb 2006 16:47:13 +0100
Subject: [Bioperl-l] Primer maps?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com>

Hi Mike,
Another route you might take is mapping your primers into
Bio::SeqFeature::Generic objects and adding them to the seq object. Then
you dump the object into a rich sequence format like genbank and pass
that to EMBOSS's showseq application.
Or you might do it completely with showseq. Here the only thing you need
is an annotation file containing the positions of the primers, followed
by any text (e.g. primer name).
Then you do:
showseq   -translate - -format 4
-annotation 
Have a look at http://emboss.sourceforge.net/apps/showseq.html for more
options
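
Either route needs the primer coordinates first. A minimal sketch of that
step in plain Perl (no BioPerl calls), assuming exact matches only: the
helper names revcomp and find_primer_sites and the two demo primers are
made up for illustration, and the template is the 60-base stretch from
Mike's example, numbered from 1 rather than 1921. Each hit could then be
turned into a Bio::SeqFeature::Generic object or written as a line of the
annotation file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Reverse-complement a DNA string (plain ACGT plus N; no IUPAC codes).
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTNacgtn/TGCANtgcan/;
    return $rc;
}

# Find every exact match of a primer on either strand of the template.
# Coordinates are 1-based, inclusive, and given on the forward strand.
sub find_primer_sites {
    my ( $template, $name, $primer ) = @_;
    my $target = uc $template;
    my %probe  = ( 1 => uc $primer, -1 => uc revcomp($primer) );
    my @sites;
    for my $strand ( 1, -1 ) {
        my $pos = -1;
        # Step one base at a time so overlapping matches are reported too.
        while ( ( $pos = index( $target, $probe{$strand}, $pos + 1 ) ) >= 0 ) {
            push @sites, {
                name   => $name,
                start  => $pos + 1,
                end    => $pos + length($primer),
                strand => $strand,
            };
        }
    }
    my @sorted = sort { $a->{start} <=> $b->{start} } @sites;
    return @sorted;
}

# Template from the example map; the primers below are hypothetical.
my $template =
  'CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT';
my %primers = (
    fwd_demo => 'ATGGGTGCTATG',    # matches the forward strand
    rev_demo => 'ATCTTGCAATTC',    # matches the reverse strand
);

for my $name ( sort keys %primers ) {
    for my $site ( find_primer_sites( $template, $name, $primers{$name} ) ) {
        printf "%d-%d %s (%s)\n", @{$site}{qw(start end name)},
          $site->{strand} > 0 ? 'forward' : 'reverse';
    }
}
```

With the demo primers this prints "13-24 fwd_demo (forward)" and
"46-57 rev_demo (reverse)". A palindromic primer is reported once per
strand, and mismatches or degenerate bases are not handled.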
 
HTH,
Marc
 

Marc Logghe, PhD
Expert Scientist Bioinformatics
deVGen NV
Technologiepark 30
B - 9052 Ghent-Zwijnaarde
Tel. +32 9 324 24 83
Fax. +32 9 324 24 25
Web: www.devgen.com

  

 


________________________________

	From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Michael Coyne
	Sent: Wednesday, February 15, 2006 10:20 PM
	To: bioperl-l at lists.open-bio.org
	Subject: [Bioperl-l] Primer maps?
	
	
	Hello all --
	
	I'm having a devil of a time figuring out how to make
restriction maps using BioPerl.  What I'm going for is output similar to
GCG's map program, but instead of using a set of defined restriction
enzymes, I'd like to use a set of primers, to create a primer map rather
than a restriction map.  I do not need a table of restriction enzymes
that cut or don't cut (or primers that match or don't match, in this
case), but an honest-to-goodness map, something like:
	
	                                       FKP-5->
	                                             |
	     CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT
	1921 ---------+---------+---------+---------+---------+---------+ 1980
	     GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA
	 
	a                         M  E  I  V  S  T  F  D  E  L  Q  D  Y   -
	
	I also need translations of orfs, but I can use GenBank files as
input to the program and thus the CDS translations are already there, so
I'm guessing that shouldn't be too hard....  How does one create such a
map using the BioPerl modules?
	
	There are intriguing indications out there that such a thing is
possible (e.g. the Bio::Map:: * and Bio::Restriction:: * modules), but I
can't find a single example of code that creates such a basic,
bread-and-butter thing as a restriction map with orf translations.  The
documentation to these modules is fairly useless to me, consisting
mostly of internal methods and function prototypes.  Perhaps my skills
as a Perl programmer are to blame, but a clear example of how a map like
this is constructed would be a big help.
	
	Right now, I'm generating primer maps with system calls to
EMBOSS's remap, pointing it at a file of primer sequences rather than a
file of restriction enzyme sequences, but the results are less than
desired.  I'm considering trying to adapt tacg 4.1.0 or sequence
extractor 1.1 web-based code to my needs, but this seems like a lot of
work for an operation I suspect is possible in BioPerl.
	
	Any help greatly appreciated...
	
	Mike
	

	
	---------------------------------------------------------------------
	 //=\   Michael J. Coyne                      phone: (617) 525-7820
	 \=//   Channing Laboratory                   FAX:   (617) 264-5193
	  //=\  EBRC, Room 617
	  \=//  221 Longwood Avenue       email:mcoyne at channing.harvard.edu
	   //=\ Boston, MA 02115                mjcoyne at comcast.net
	   \=// 
	---------------------------------------------------------------------
	




From sdavis2 at mail.nih.gov  Thu Feb 16 09:43:45 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 16 Feb 2006 09:43:45 -0500
Subject: [Bioperl-l] Primer maps?
In-Reply-To: <6.2.0.14.0.20060215155422.01d44a98@localhost>
Message-ID: 

Do you mean that you want to use Bio::Graphics to make a picture, or just
map your primers onto a sequence?

Sean



On 2/15/06 4:20 PM, "Michael Coyne"  wrote:

> Hello all --
> 
> I'm having a devil of a time figuring out how to make restriction maps using
> BioPerl.  What I'm going for is output similar to GCG's map program, but
> instead of using a set of defined restriction enzymes, I'd like to use a set
> of primers, to create a primer map rather than a restriction map.  I do not
> need a table of restriction enzymes that cut or don't cut (or primers that
> match or don't match, in this case), but an honest-to-goodness map, something
> like:
> 
>                                       FKP-5->
>                                             |
>     CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT
> 1921 ---------+---------+---------+---------+---------+---------+ 1980
>     GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA
>  
> a                        M  E  I  V  S  T  F  D  E  L  Q  D  Y   -
> 
> I also need translations of orfs, but I can use GenBank files as input to the
> program and thus the CDS translations are already there, so I'm guessing that
> shouldn't be too hard....  How does one create such a map using the BioPerl
> modules?
> 
> There are intriguing indications out there that such a thing is possible (e.g.
> the Bio::Map:: * and Bio::Restriction:: * modules), but I can't find a single
> example of code that creates such a basic, bread-and-butter thing as a
> restriction map with orf translations.  The documentation to these modules is
> fairly useless to me, consisting mostly of internal methods and function
> prototypes.  Perhaps my skills as a Perl programmer are to blame, but a clear
> example of how a map like this is constructed would be a big help.
> 
> Right now, I'm generating primer maps with system calls to EMBOSS's remap,
> pointing it at a file of primer sequences rather than a file of restriction
> enzyme sequences, but the results are less than desired.  I'm considering
> trying to adapt tacg 4.1.0 or sequence extractor 1.1 web-based code to my
> needs, but this seems like a lot of work for an operation I suspect is
> possible in BioPerl.
> 
> Any help greatly appreciated...
> 
> Mike
> 
> ---------------------------------------------------------------------
>  //=\   Michael J. Coyne                      phone: (617) 525-7820
>  \=//   Channing Laboratory                   FAX:   (617) 264-5193
>   //=\  EBRC, Room 617
>   \=//  221 Longwood Avenue       email:mcoyne at channing.harvard.edu
>    //=\ Boston, MA 02115                mjcoyne at comcast.net
>    \=// 
> ---------------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From osborne1 at optonline.net  Thu Feb 16 11:27:13 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 11:27:13 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
Message-ID: 

Harry,

I've long suspected, but never demonstrated, that the easiest way to do
something like this is through ENSEMBL, and Jason hinted at this as well. In
fact your question is something of a FAQ, and my previous responses always
included a plea to some anonymous ENSEMBL API expert, always unheeded. At
any rate, here is an example script I made:

#!/usr/bin/perl

use strict;
use warnings;
use lib "/Users/bosborne/ensembl/modules";
use DBI;
use Getopt::Long;
use Bio::EnsEMBL::DBSQL::DBAdaptor;

# Gene name or external ID, passed with -n
my $name;
GetOptions( "n=s" => \$name );
die "Usage: $0 -n <gene name or ID>\n" unless defined $name;

# Connect anonymously to the public ENSEMBL core database
my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
    -user   => "anonymous",
    -dbname => "homo_sapiens_core_37_35j",
    -host   => "ensembldb.ensembl.org",
    -pass   => "",
    -driver => 'mysql'
);

my $gene_adaptor  = $db->get_GeneAdaptor;
my $slice_adaptor = $db->get_SliceAdaptor;

# A single name can match more than one gene
my @genes = @{ $gene_adaptor->fetch_all_by_external_name($name) };

for my $gene (@genes) {
    for my $trans ( @{ $gene->get_all_Transcripts } ) {
        # Genomic region spanned by this transcript
        my $seq = $slice_adaptor->fetch_by_region( "chromosome",
            $trans->seq_region_name, $trans->start, $trans->end );
        print "\n", $seq->seq, "\n";
    }
}

There are some issues. The largest is that although this script prints out
big sequences, it's completely untested! Another is that it makes
assumptions about transcripts; you should verify for yourself that ENSEMBL's
definition of transcript fits yours. Finally, the
fetch_all_by_external_name() method does not seem to accept a second
argument, i.e. a namespace, which I found surprising. So if more than one
gene is retrieved using some name or id, you're in a quandary.

For more on this API see:

http://www.ensembl.org/info/software/core/core_tutorial.html

There are tons of modules and methods in this API, I've barely scratched the
surface here.
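
Since the original question was about sequence plus and minus some offset,
here is a minimal sketch of just the coordinate arithmetic, in plain Perl.
The flanked_range() name and the example numbers are made up for
illustration and are not part of any API; the region length is something
you would have to supply yourself.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the ENSEMBL API): widen a 1-based,
# inclusive range by $flank bases on each side, clamped to the bounds
# of the sequence region.
sub flanked_range {
    my ($start, $end, $flank, $region_length) = @_;
    die "start must not exceed end\n" if $start > $end;

    my $new_start = $start - $flank;
    $new_start = 1 if $new_start < 1;

    my $new_end = $end + $flank;
    $new_end = $region_length if $new_end > $region_length;

    return ($new_start, $new_end);
}

# Made-up coordinates: a transcript at 5000..7500 on a 1 Mb region,
# with 2 kb of flank on each side.
my ($s, $e) = flanked_range(5_000, 7_500, 2_000, 1_000_000);
print "fetch $s..$e\n";
```

The resulting pair could be passed to fetch_by_region() in place of
$trans->start and $trans->end.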


Brian O.




On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:

> Hi Brian,
> 
> Thanks very much for the pointers and the speed of your reply and apologies
> for the speed of mine.
> 
> This looks good, but what I was looking for was a bioP approach for hooking to
> an API at NCBI or EBI so I could get this info and seqs from them.  In this
> case, speed of retrieval is not critical and I'd rather not download the
> entirety of the sequences to a local disk to hack at them.
> 
> I've determined a screen-scraping approach to get them and could script that,
> but I thought that bioP had a method for using NCBI's external API's, tho it
> may be that my memory is faulty or the approach is no longer supported due to
> overload.  
> 
> Does NCBI make such APIs available anymore?  I searched a bit for docs on them
> but couldn't find anything (unless it's buried in the NCBI toolkit, which I
> haven't started to excavate).
> 
> Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
> 
> Harry
> 
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>> 
>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
>> from its documentation:
>> 
>>   use Bio::DB::Fasta;
>> 
>>   # create database from directory of fasta files
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   # simple access (for those without Bioperl)
>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>   my @ids     = $db->ids;
>>   my $length   = $db->length('CHROMOSOME_I');
>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>   my $header   = $db->header('CHROMOSOME_I');
>> 
>>   # Bioperl-style access
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>   my $seq     = $obj->seq;
>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>> 
>> Do you already have the offsets?
>> 
>> Brian O.
>> 
>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>> Hi All,
>>> 
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this.  Forgive me if I've missed something
>>> obvious.
>>> 
>>> This should not be a novel request, but I've not found it answered.  If
>>> bioperl isn't the best way to do this, I'd be grateful to a pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>> 
>>> The problem is to retrieve genomic sequences plus & minus some offset
>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>> 
>>> 
>>> TIA!




From heikki at sanbi.ac.za  Thu Feb 16 12:32:51 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 16 Feb 2006 19:32:51 +0200
Subject: [Bioperl-l] Primer maps?
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com>
Message-ID: <200602161932.51552.heikki@sanbi.ac.za>

Mike,

Marc's suggestion is the best I've heard.

We really do not have any kind of pretty-print functionality within BioPerl.
I guess there has not been a pressing need.  Bio::Graphics has filled the
need for sequence display.

I think Bio::Seq::PrettyPrint could be a great way to design pretty-printing
in a very modular way, so that it can print out anything mapped to a sequence
location. The EMBOSS showseq would be a great help there. A student
project?

Would anyone be interested? 
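
To make the idea a bit more concrete, here is a rough sketch in plain Perl
of the kind of output I mean. Bio::Seq::PrettyPrint does not exist; the
pretty_print() and ruler() functions below and their arguments are invented
for illustration only, and only the label and start of each feature are
used.

```perl
use strict;
use warnings;

# Hypothetical sketch: print a sequence in blocks, with feature labels
# above the top strand, a ruler, and the complementary strand below.
# $features is a list of [label, start] pairs, start being 1-based.
sub pretty_print {
    my ($seq, $features, $width) = @_;
    $width ||= 60;
    my $out = '';
    for (my $offset = 0; $offset < length($seq); $offset += $width) {
        my $chunk = substr($seq, $offset, $width);
        my $first = $offset + 1;
        my $last  = $offset + length($chunk);

        # One label line per feature that starts inside this block.
        for my $f (grep { $_->[1] >= $first && $_->[1] <= $last } @$features) {
            my ($label, $start) = @$f;
            $out .= ' ' x (8 + $start - $first) . $label . "\n";
        }

        # Complement only (not reversed), so it lines up under the top strand.
        (my $comp = $chunk) =~ tr/ACGTacgt/TGCAtgca/;

        $out .= sprintf("%-8s%s\n",    '',     $chunk);
        $out .= sprintf("%-8d%s %d\n", $first, ruler(length($chunk)), $last);
        $out .= sprintf("%-8s%s\n\n",  '',     $comp);
    }
    return $out;
}

# Ruler like ---------+---------+ with a '+' at every tenth position.
sub ruler {
    my ($len) = @_;
    return join '', map { $_ % 10 ? '-' : '+' } 1 .. $len;
}

print pretty_print(
    'CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT',
    [ [ 'FKP-5->', 40 ] ],
);
```

A real module would of course take feature objects rather than bare pairs,
and would need translations and reverse-strand features as well.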

   -Heikki




On Thursday 16 February 2006 17:47, Marc Logghe wrote:
> Hi Mike,
> Another route you might take is mapping your primers into
> Bio::SeqFeature::Generic objects and add them to the seq object. Then
> you dump the object into a rich sequence format like genbank and pass
> that to EMBOSS's showseq application
> Or you might do it completely with showseq. Here the only thing you need
> is an annotation file containing the positions of the primers, followed
> by any text (e.g. primer name).
> Then you do:
> showseq   -translate - -format 4
> -annotation 
> Have a look at http://emboss.sourceforge.net/apps/showseq.html for more
> options
>
> HTH,
> Marc
>
>
> Marc Logghe, PhD
> Expert Scientist Bioinformatics
> deVGen NV
> Technologiepark 30
> B - 9052 Ghent-Zwijnaarde
> Tel. +32 9 324 24 83
> Fax. +32 9 324 24 25
> Web: www.devgen.com
>
>
> ________________________________
>
> 	From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Michael Coyne
> 	Sent: Wednesday, February 15, 2006 10:20 PM
> 	To: bioperl-l at lists.open-bio.org
> 	Subject: [Bioperl-l] Primer maps?
>
>
> 	Hello all --
>
> 	I'm having a devil of a time figuring out how to make
> restriction maps using BioPerl.  What I'm going for is output similar to
> GCG's map program, but instead of using a set of defined restriction
> enzymes, I'd like to use a set of primers, to create a primer map rather
> than a restriction map.  I do not need a table of restriction enzymes
> that cut or don't cut (or primers that match or don't match, in this
> case), but an honest-to-goodness map, something like:
>
> 	                                        FKP-5->
> 	                                             |
> 	     CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT
> 	1921 ---------+---------+---------+---------+---------+---------+ 1980
> 	     GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA
>
> 	a                         M  E  I  V  S  T  F  D  E  L  Q  D  Y   -
>
> 	I also need translations of orfs, but I can use GenBank files as
> input to the program and thus the CDS translations are already there, so
> I'm guessing that shouldn't be too hard....  How does one create such a
> map using the BioPerl modules?
>
> 	There are intriguing indications out there that such a thing is
> possible (e.g. the Bio::Map:: * and Bio::Restriction:: * modules), but I
> can't find a single example of code that creates such a basic,
> bread-and-butter thing as a restriction map with orf translations.  The
> documentation to these modules is fairly useless to me, consisting
> mostly of internal methods and function prototypes.  Perhaps my skills
> as a Perl programmer are to blame, but a clear example of how a map like
> this is constructed would be a big help.
>
> 	Right now, I'm generating primer maps with system calls to
> EMBOSS's remap, pointing it at a file of primer sequences rather than a
> file of restriction enzyme sequences, but the results are less than
> desired.  I'm considering trying to adapt tacg 4.1.0 or sequence
> extractor 1.1 web-based code to my needs, but this seems like a lot of
> work for an operation I suspect is possible in BioPerl.
>
> 	Any help greatly appreciated...
>
> 	Mike
>
>
>
> 	---------------------------------------------------------------------
> 	 //=\   Michael J. Coyne                       phone: (617) 525-7820
> 	 \=//   Channing Laboratory                    FAX:   (617) 264-5193
> 	  //=\  EBRC, Room 617
> 	  \=//  221 Longwood Avenue        email:mcoyne at channing.harvard.edu
> 	   //=\ Boston, MA 02115                 mjcoyne at comcast.net
> 	   \=//
> 	---------------------------------------------------------------------
>
>
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From osborne1 at optonline.net  Thu Feb 16 12:59:37 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 12:59:37 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602160823.03534.hjm@tacgi.com>
Message-ID: 

Chris and Harry,

I'm writing a Wiki page on this, it's linked to the FAQ as Wiki is
complaining that the FAQ is getting too big. I'll fill in the ENSEMBL API
and Bio::DB::Fasta approaches, if you would comment on the BioPerl/eutils
approach at some point that would be superb:

http://bioperl.open-bio.org/wiki/Getting_Genomic_Sequences

Brian O.


On 2/16/06 11:23 AM, "Harry Mangalam"  wrote:

> Yes, I'm going to  try this 1st.  Also the pointer to the NCBI eutils page was
> helpful.  They describe the same thing and I think that API will give me what
> I need.  I'll post back to report.
> 
> Sorry for the delay in answering - this is a side project and as such is going
> slow.
> 
> Many thanks to you guys, especially Brian for the example code - much more
> than I had a right to expect.  Virtual Beers all round and real ones should
> we ever meet up.
> 
> Harry
> 
> 
> On Thursday 16 February 2006 04:52, Chris Fields wrote:
>> I think a method was recently implemented in Bio::DB::GenBank to
>> retrieve a segment of DNA given start and end coordinates in GenBank
>> format; that should contain the features you need.  I requested it
>> ~Nov-Dec in the mailing list but didn't get a chance to test it.
>> Would that help?
>> 
>> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
>>> Harry,
>>> 
>>> It's not clear to me that NCBI's eutils offers this capability
>>> directly. You
>>> can probably download Entrez Gene entries and parse them for
>>> coordinates but
>>> I know of no way to remotely retrieve genomic sequences like this
>>> from NCBI
>>> (ENSEMBL API perhaps?). What I had in mind uses the local approach
>>> that some
>>> of us favor and to prove to myself that this is simple to do I wrote a
>>> script that I just added to examples/tools, it's called
>>> extract_genes.pl and
>>> it's based on Bio::DB::Fasta. Download the sequence files for a given
>>> species to some dir, download Entrez Gene's gene2accession file,
>>> and run. It
>>> creates and stores a hash for lookups, it won't read gene2accession
>>> each
>>> time it runs.
>>> 
>>> Brian O.
>>> 
>>> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
>>>> Hi Brian,
>>>> 
>>>> Thanks very much for the pointers and the speed of your reply and
>>>> apologies
>>>> for the speed of mine.
>>>> 
>>>> This looks good, but what I was looking for was a bioP approach
>>>> for hooking to
>>>> an API at NCBI or EBI so I could get this info and seqs from
>>>> them.  In this
>>>> case, speed of retrieval is not critical and I'd rather not
>>>> download the
>>>> entirety of the sequences to a local disk to hack at them.
>>>> 
>>>> I've determined a screen-scraping approach to get them and could
>>>> script that,
>>>> but I thought that bioP had a method for using NCBI's external
>>>> API's, tho it
>>>> may be that my memory is faulty or the approach is no longer
>>>> supported due to
>>>> overload.
>>>> 
>>>> Does NCBI make such APIs available anymore?  I searched a bit for
>>>> docs on them
>>>> but couldn't find anything (unless it's buried in the NCBI toolkit,
>>>> which I
>>>> haven't started to excavate).
>>>> 
>>>> Failing that, would SEALS provide such a service? Any PerlPinipeds
>>>> listening?
>>>> 
>>>> Harry
>>>> 
>>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>>>> Harry,
>>>>> 
>>>>> Hope you're doing well. The approach could be based on
>>>>> Bio::DB::Fasta. So,
>>>>> from its documentation:
>>>>> 
>>>>>   use Bio::DB::Fasta;
>>>>> 
>>>>>   # create database from directory of fasta files
>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>> 
>>>>>   # simple access (for those without Bioperl)
>>>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>>>   my @ids     = $db->ids;
>>>>>   my $length   = $db->length('CHROMOSOME_I');
>>>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>>>   my $header   = $db->header('CHROMOSOME_I');
>>>>> 
>>>>>   # Bioperl-style access
>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>> 
>>>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>>>>   my $seq     = $obj->seq;
>>>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>>>> 
>>>>> Do you already have the offsets?
>>>>> 
>>>>> Brian O.
>>>>> 
>>>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>>>>> Hi All,
>>>>>> 
>>>>>> After perusing the tutorial and other docs for an evening, I
>>>>>> still
>>>>>> can't find the answer to this.  Forgive me if I've missed something
>>>>>> obvious.
>>>>>> 
>>>>>> This should not be a novel request, but I've not found it
>>>>>> answered.  If
>>>>>> bioperl isn't the best way to do this, I'd be grateful to a
>>>>>> pointer to a
>>>>>> better way, especially if it includes an illuminating bit of code.
>>>>>> 
>>>>>> The problem is to retrieve genomic sequences plus & minus some
>>>>>> offset
>>>>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>>>>> common followup chore for some extra analysis from a gene
>>>>>> expression
>>>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed
>>>>>> the
>>>>>> sequence type to specify...?
>>>>>> 
>>>>>> 
>>>>>> TIA!
>>> 
>> 
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign




From hjm at tacgi.com  Thu Feb 16 12:02:07 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Thu, 16 Feb 2006 09:02:07 -0800
Subject: [Bioperl-l] Primer maps?
In-Reply-To: <6.2.0.14.0.20060215155422.01d44a98@localhost>
References: <6.2.0.14.0.20060215155422.01d44a98@localhost>
Message-ID: <200602160902.07383.hjm@tacgi.com>

A bit off the bioperl topic (if you must have bioperl, ignore this or just
system()-wrap the command), but you can do exactly this mapping and in-line
translation with a program I wrote called tacg. You make a GCG-formatted file
of primers, i.e. for each pattern you need a line like:

   
;         Top                         Bottom
;Name    Offset Recognition Pattern   Offset    ! comments
primer1    0   tcgggywmkkgg               0    ! ...
primer2    0   gcttggctgaggag             0    !
 .
 .
 .
Obviously the offsets can be set to 0 for non-REs.
There's no limit to the number of primer patterns (though I think there's a
compiled-in limit of 30 chars per pattern, easily changed in the header) and
no limit to the amount of sequence searched. It handles degeneracies and
searches at ~4 Mbases/s on a 2G Opteron (120 patterns).

It also does searching with errors (slowly), regexes (at pcre speeds), and
matrices.  Other neat stuff, too.
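
If you'd rather stay in Perl for the matching step, the degeneracy handling
can be roughed out as below. This is only a sketch and not how tacg does it
internally: it expands IUPAC codes into a regex and reports forward-strand
hits only. The function names and the target sequence are made up; the
primer is the primer1 pattern from the example above.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes expanded to regex character classes.
my %IUPAC = (
    a => 'a',     c => 'c',     g => 'g',     t => 't',
    r => '[ag]',  y => '[ct]',  s => '[cg]',  w => '[at]',
    k => '[gt]',  m => '[ac]',
    b => '[cgt]', d => '[agt]', h => '[act]', v => '[acg]',
    n => '[acgt]',
);

# Turn a (possibly degenerate) primer into a case-insensitive regex.
sub primer_to_regex {
    my ($primer) = @_;
    my $re = '';
    for my $base (split //, lc $primer) {
        die "unknown IUPAC code '$base'\n" unless exists $IUPAC{$base};
        $re .= $IUPAC{$base};
    }
    return qr/$re/i;
}

# Return the 1-based start of every match, overlapping ones included.
# Forward strand only; the other strand would need the primer's
# reverse complement.
sub find_primer {
    my ($primer, $seq) = @_;
    my $re = primer_to_regex($primer);
    my @starts;
    while ($seq =~ /(?=$re)/g) {
        push @starts, pos($seq) + 1;
    }
    return @starts;
}

# Made-up target sequence containing one instance of primer1.
my @hits = find_primer('tcgggywmkkgg', 'aatcgggctaggggcc');
print "primer1 at: @hits\n";
```

The zero-width lookahead is what lets overlapping hits be reported; a plain
/$re/g match would skip past any hit that starts inside the previous one.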

The output is sort of as you describe - replace the RE names with your primer 
labels and you'll have it.

6-frame translation with 3-letter abbreviations:

                  BsrGI    BsrGI AflII                      DraI
                   \        \     \                          \
    121   gtgtgtatttgtacactttgtacacttaagacctacacatttcattgtgtttaaattatt    180
   3453   cacacataaacatgtgaaacatgtgaattctggatgtgtaaagtaacacaaatttaataa   3512
              ^    *    ^    *    ^    *    ^    *    ^    *    ^    *
1         ValCysIleCysThrLeuCysThrLeuLysThrTyrThrPheHisCysValTerIleIle
2          CysValPheValHisPheValHisLeuArgProThrHisPheIleValPheLysLeuLeu
3           ValTyrLeuTyrThrLeuTyrThrTerAspLeuHisIleSerLeuCysLeuAsnTyrTyr

4           HisIleGlnValSerGlnValSerLeuTerValValAsnTerGlnThrTerIleIleVal
5          ThrTyrLysTyrValLysTyrValTerArgSerCysMetGluAsnHisLysPheTerTer
6         HisThrAsnThrCysLysThrCysLysGlyLeuValCysLysMetThrAsnLeuAsnAsn

or 3 frames with 1-letter abbreviations:

                   BsrGI    BsrGI AflII                      DraI
                   \        \     \                          \
    121   gtgtgtatttgtacactttgtacacttaagacctacacatttcattgtgtttaaattatt    180
   3453   cacacataaacatgtgaaacatgtgaattctggatgtgtaaagtaacacaaatttaataa   3512
              ^    *    ^    *    ^    *    ^    *    ^    *    ^    *
1         V  C  I  C  T  L  C  T  L  K  T  Y  T  F  H  C  V  *  I  I
2          C  V  F  V  H  F  V  H  L  R  P  T  H  F  I  V  F  K  L  L
3           V  Y  L  Y  T  L  Y  T  *  D  L  H  I  S  L  C  L  N  Y  Y

Read more at tacg.sf.net, or reply to me for the latest docs and version; I
have to admit the sf site is a bit moldy.

hjm


On Wednesday 15 February 2006 13:20, Michael Coyne wrote:
>  Hello all --
>
>  I'm having a devil of a time figuring out how to make restriction maps
> using BioPerl.  What I'm going for is output similar to GCG's map program,
> but instead of using a set of defined restriction enzymes, I'd like to use
> a set of primers, to create a primer map rather than a restriction map.  I
> do not need a table of restriction enzymes that cut or don't cut (or
> primers that match or don't match, in this case), but an honest-to-goodness
> map, something like:
>
>                                          FKP-5->
>                                               |
>       CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT
>  1921 ---------+---------+---------+---------+---------+---------+ 1980
>       GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA
>
>  a                         M  E  I  V  S  T  F  D  E  L  Q  D  Y   -
>
>  I also need translations of orfs, but I can use GenBank files as input to
> the program and thus the CDS translations are already there, so I'm
> guessing that shouldn't be too hard....  How does one create such a map
> using the BioPerl modules?
>
>  There are intriguing indications out there that such a thing is possible
> (e.g. the Bio::Map:: * and Bio::Restriction:: * modules), but I can't find
> a single example of code that creates such a basic, bread-and-butter thing
> as a restriction map with orf translations.  The documentation to these
> modules is fairly useless to me, consisting mostly of internal methods and
> function prototypes.  Perhaps my skills as a Perl programmer are to blame,
> but a clear example of how a map like this is constructed would be a big
> help.
>
>  Right now, I'm generating primer maps with system calls to EMBOSS's remap,
> pointing it at a file of primer sequences rather than a file of restriction
> enzyme sequences, but the results are less than desired.  I'm considering
> trying to adapt tacg 4.1.0 or sequence extractor 1.1 web-based code to my
> needs, but this seems like a lot of work for an operation I suspect is
> possible in BioPerl.
>
>  Any help greatly appreciated...
>
>  Mike
>
>  ---------------------------------------------------------------------
>   //=\   Michael J. Coyne                       phone: (617) 525-7820
>   \=//   Channing Laboratory                    FAX:   (617) 264-5193
>    //=\  EBRC, Room 617
>    \=//  221 Longwood Avenue        email:mcoyne at channing.harvard.edu
>     //=\ Boston, MA 02115                 mjcoyne at comcast.net
>     \=//
>  ---------------------------------------------------------------------

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 



From hjm at tacgi.com  Thu Feb 16 11:23:02 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Thu, 16 Feb 2006 08:23:02 -0800
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: 
References: 
	
Message-ID: <200602160823.03534.hjm@tacgi.com>

Yes, I'm going to try this first.  Also the pointer to the NCBI eutils page was
helpful.  They describe the same thing and I think that API will give me what
I need.  I'll post back to report.

Sorry for the delay in answering - this is a side project and as such is going 
slow.

Many thanks to you guys, especially Brian for the example code - much more 
than I had a right to expect.  Virtual Beers all round and real ones should 
we ever meet up.

Harry


On Thursday 16 February 2006 04:52, Chris Fields wrote:
> I think a method was recently implemented in Bio::DB::GenBank to
> retrieve a segment of DNA given start and end coordinates in GenBank
> format; that should contain the features you need.  I requested it
> ~Nov-Dec in the mailing list but didn't get a chance to test it.
> Would that help?
>
> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
> > Harry,
> >
> > It's not clear to me that NCBI's eutils offers this capability
> > directly. You
> > can probably download Entrez Gene entries and parse them for
> > coordinates but
> > I know of no way to remotely retrieve genomic sequences like this
> > from NCBI
> > (ENSEMBL API perhaps?). What I had in mind uses the local approach
> > that some
> > of us favor and to prove to myself that this is simple to do I wrote a
> > script that I just added to examples/tools, it's called
> > extract_genes.pl and
> > it's based on Bio::DB::Fasta. Download the sequence files for a given
> > species to some dir, download Entrez Gene's gene2accession file,
> > and run. It
> > creates and stores a hash for lookups, it won't read gene2accession
> > each
> > time it runs.
> >
> > Brian O.
> >
> > On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
> >> Hi Brian,
> >>
> >> Thanks very much for the pointers and the speed of your reply and
> >> apologies
> >> for the speed of mine.
> >>
> >> This looks good, but what I was looking for was a bioP approach
> >> for hooking to
> >> an API at NCBI or EBI so I could get this info and seqs from
> >> them.  In this
> >> case, speed of retrieval is not critical and I'd rather not
> >> download the
> >> entirety of the sequences to a local disk to hack at them.
> >>
> >> I've determined a screen-scraping approach to get them and could
> >> script that,
> >> but I thought that bioP had a method for using NCBI's external
> >> API's, tho it
> >> may be that my memory is faulty or the approach is no longer
> >> supported due to
> >> overload.
> >>
> >> Does NCBI make such APIs available anymore?  I searched a bit for
> >> docs on them
> >> but couldn't find anything (unless it's buried in the NCBI toolkit,
> >> which I
> >> haven't started to excavate).
> >>
> >> Failing that, would SEALS provide such a service? Any PerlPinipeds
> >> listening?
> >>
> >> Harry
> >>
> >> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
> >>> Harry,
> >>>
> >>> Hope you're doing well. The approach could be based on
> >>> Bio::DB::Fasta. So,
> >>> from its documentation:
> >>>
> >>>   use Bio::DB::Fasta;
> >>>
> >>>   # create database from directory of fasta files
> >>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
> >>>
> >>>   # simple access (for those without Bioperl)
> >>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
> >>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
> >>>   my @ids     = $db->ids;
> >>>   my $length   = $db->length('CHROMOSOME_I');
> >>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
> >>>   my $header   = $db->header('CHROMOSOME_I');
> >>>
> >>>   # Bioperl-style access
> >>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
> >>>
> >>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
> >>>   my $seq     = $obj->seq;
> >>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
> >>>
> >>> Do you already have the offsets?
> >>>
> >>> Brian O.
> >>>
> >>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
> >>>> Hi All,
> >>>>
> >>>> After perusing the tutorial and other docs for an evening, I
> >>>> still
> >>>> can't find the answer to this.  Forgive me if I've missed something
> >>>> obvious.
> >>>>
> >>>> This should not be a novel request, but I've not found it
> >>>> answered.  If
> >>>> bioperl isn't the best way to do this, I'd be grateful to a
> >>>> pointer to a
> >>>> better way, especially if it includes an illuminating bit of code.
> >>>>
> >>>> The problem is to retrieve genomic sequences plus & minus some
> >>>> offset
> >>>> from a locus determined by HUGO keyword or GeneID.  This would be a
> >>>> common followup chore for some extra analysis from a gene
> >>>> expression
> >>>> expt.  Or maybe this is in the DBFetch routines, but I've missed
> >>>> the
> >>>> sequence type to specify...?
> >>>>
> >>>>
> >>>> TIA!
> >
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 


From cjfields at uiuc.edu  Thu Feb 16 16:37:25 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 15:37:25 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
	version 1.28
In-Reply-To: <43F449E1.80605@esat.kuleuven.be>
Message-ID: <000301c63341$2e015d50$15327e82@pyrimidine>

As an update for those interested, I checked on this today, feeding SearchIO
XML and text output from all of NCBI's BLAST flavors.  Basically, all XML
parses fine.  All text output except blastn and tblastx also parses fine.
Those two have the extra lines starting with 'Features in this part of subject
sequence:'.  I'll be looking into SearchIO::blast but don't know when I can
get around to posting a fix.
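
In the meantime, a possible stopgap for text reports is to strip those
blocks before parsing. The sketch below is plain Perl and rests on an
assumption I have not verified against every report: that each block starts
with a line beginning 'Features in this part' or 'Features flanking this
part' and runs up to the next blank line. The sub name and the sample lines
are made up.

```perl
use strict;
use warnings;

# Stopgap sketch: drop the NCBI "Features ..." blocks from the lines of a
# text BLAST report.
# ASSUMPTION: a block starts at a "Features in/flanking this part" line
# and ends at the next blank line.
sub strip_feature_blocks {
    my (@lines) = @_;
    my @kept;
    my $skipping = 0;
    for my $line (@lines) {
        if ($line =~ /^\s*Features (?:in|flanking) this part/) {
            $skipping = 1;
            next;
        }
        if ($skipping) {
            # A blank line closes the block; it is dropped too, leaving the
            # single blank line that preceded the block.
            $skipping = 0 if $line !~ /\S/;
            next;
        }
        push @kept, $line;
    }
    return @kept;
}

# Made-up fragment of a report, for illustration only.
my @report = (
    ">gi|000000| made-up subject\n",
    "          Length=1000\n",
    "\n",
    " Features in this part of subject sequence:\n",
    "   hypothetical protein\n",
    "\n",
    " Score = 100 bits (50),  Expect = 1e-20\n",
);
print strip_feature_blocks(@report);
```

The filtered text could then be saved and handed to Bio::SearchIO as usual,
though requesting XML output sidesteps the problem entirely.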

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> Sent: Thursday, February 16, 2006 3:46 AM
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at lists.open-bio.org; Chris Fields
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
> version 1.28
> 
> Hi,
> 
> I have the same problem with the blast.pm file.
> The people at NCBI added some extra info to the BLAST output
> (see e.g. "Features flanking this part..." or "Features in this part
> ..."), example added.
> The blast.pm module starts looking for the HSP alignment information,
> but it dies when it hits this feature information.
> 
> Pieter
> 
> 
......







From osborne1 at optonline.net  Thu Feb 16 17:19:16 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 17:19:16 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: 
Message-ID: 

Chris,

Yes. The question now is where to easily get the coordinates.

Brian O.


On 2/16/06 7:52 AM, "Chris Fields"  wrote:

> I think a method was recently implemented in Bio::DB::GenBank to
> retrieve a segment of DNA given start and end coordinates in GenBank
> format; that should contain the features you need.  I requested it
> ~Nov-Dec in the mailing list but didn't get a chance to test it.
> Would that help?
> 
> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
> 
>> Harry,
>> 
>> It's not clear to me that NCBI's eutils offers this capability
>> directly. You
>> can probably download Entrez Gene entries and parse them for
>> coordinates but
>> I know of no way to remotely retrieve genomic sequences like this
>> from NCBI
>> (ENSEMBL API perhaps?). What I had in mind uses the local approach
>> that some
>> of us favor and to prove to myself that this is simple to do I wrote a
>> script that I just added to examples/tools, it's called
>> extract_genes.pl and
>> it's based on Bio::DB::Fasta. Download the sequence files for a given
>> species to some dir, download Entrez Gene's gene2accession file,
>> and run. It
>> creates and stores a hash for lookups, it won't read gene2accession
>> each
>> time it runs.
>> 
>> Brian O.
>> 
>> 
>> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
>> 
>>> Hi Brian,
>>> 
>>> Thanks very much for the pointers and the speed of your reply and
>>> apologies
>>> for the speed of mine.
>>> 
>>> This looks good, but what I was looking for was a bioP approach
>>> for hooking to
>>> an API at NCBI or EBI so I could get this info and seqs from
>>> them.  In this
>>> case, speed of retrieval is not critical and I'd rather not
>>> download the
>>> entirety of the sequences to a local disk to hack at them.
>>> 
>>> I've determined a screen-scraping approach to get them and could
>>> script that,
>>> but I thought that bioP had a method for using NCBI's external
>>> API's, tho it
>>> may be that my memory is faulty or the approach is no longer
>>> supported due to
>>> overload.
>>> 
>>> Does NCBI make such APIs available anymore?  I searched a bit for
>>> docs on them
>>> but couldn't find anything (unless it's buried in the NCBI toolkit,
>>> which I
>>> haven't started to excavate).
>>> 
>>> Failing that, would SEALS provide such a service? Any PerlPinipeds
>>> listening?
>>> 
>>> Harry
>>> 
>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>>> Harry,
>>>> 
>>>> Hope you're doing well. The approach could be based on
>>>> Bio::DB::Fasta. So,
>>>> from its documentation:
>>>> 
>>>>   use Bio::DB::Fasta;
>>>> 
>>>>   # create database from directory of fasta files
>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>> 
>>>>   # simple access (for those without Bioperl)
>>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>>   my @ids     = $db->ids;
>>>>   my $length   = $db->length('CHROMOSOME_I');
>>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>>   my $header   = $db->header('CHROMOSOME_I');
>>>> 
>>>>   # Bioperl-style access
>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>> 
>>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>>>   my $seq     = $obj->seq;
>>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>>> 
>>>> Do you already have the offsets?
>>>> 
>>>> Brian O.
>>>> 
>>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>>>> Hi All,
>>>>> 
>>>>> After perusing the tutorial and other docs for a an evening, I
>>>>> still
>>>>> can't find the answer to this.  Forgive me if I've missed something
>>>>> obvious.
>>>>> 
>>>>> This should not be a novel request, but I've not found it
>>>>> answered.  If
>>>>> bioperl isn't the best way to do this, I'd be grateful to a
>>>>> pointer to a
>>>>> better way, especially if it includes an illuminating bit of code.
>>>>> 
>>>>> The problem is to retrieve genomic sequences plus & minus some
>>>>> offset
>>>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>>>> common followup chore for some extra analysis from a gene
>>>>> expression
>>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed
>>>>> the
>>>>> sequence type to specify...?
>>>>> 
>>>>> 
>>>>> TIA!
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From cjfields at uiuc.edu  Thu Feb 16 17:29:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 16:29:15 -0600
Subject: [Bioperl-l] Minimal versions requirements/warnings for SearchIO
	text parsing?
Message-ID: <000001c63348$6b8136d0$15327e82@pyrimidine>

I'm floating this to see what people think...

I'm beginning to wonder, especially when I'm wading through the
regex/parsing nightmare in SearchIO::blast, if we should require a
minimal BLAST version number for parsing to work in SearchIO::blast.  I
could add a '$self->throw("Requires BLAST v2.x.x")' or at least give a
warning if the blast version number is below a minimal version, so at least
people will know what the problem is (not us!).

The regexes are really piling up, and the latest changes in blastn and
tblastx will require adding a few more.  I also think that this would help
remind everybody running the latest Bioperl that there are also newer
versions of BLAST.  My current thought is to get it working for the latest
text output from NCBI, check it against the last version of BLAST (v.
2.2.12, which, luckily, blastcl3 generates), and not worry too much about
older ones.
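
To make the idea concrete, here is a rough standalone sketch of the kind of
check I have in mind.  It is not what SearchIO::blast does today; the sub
names are made up for illustration, and it assumes the report header starts
with a line like "BLASTN 2.2.12 [Aug-07-2005]".  Reports without a
three-part version (WU-BLAST, for instance) come back as "unknown".

```perl
use strict;
use warnings;

# Hypothetical helpers, not part of SearchIO::blast.

# Extract [major, minor, patch] from a report header line such as
# "BLASTN 2.2.12 [Aug-07-2005]"; returns undef if no version is found.
sub blast_version {
    my ($header) = @_;
    return unless defined $header;
    if ($header =~ /^\s*T?BLAST[NPX]\s+(\d+)\.(\d+)\.(\d+)/i) {
        return [$1, $2, $3];
    }
    return;
}

# True if version $have (array ref) is at least $min (array ref).
# Parts are compared numerically, so 2.2.6 sorts before 2.2.12.
sub version_at_least {
    my ($have, $min) = @_;
    for my $i (0 .. 2) {
        return 1 if $have->[$i] > $min->[$i];
        return 0 if $have->[$i] < $min->[$i];
    }
    return 1;    # identical versions
}

# Return a warning string for old or unrecognised reports, '' otherwise.
sub check_blast_version {
    my ($header, $min) = @_;
    my $have = blast_version($header);
    return "Could not determine BLAST version from '$header'" unless $have;
    return '' if version_at_least($have, $min);
    return sprintf 'BLAST v%s is older than the minimum supported v%s',
        join('.', @$have), join('.', @$min);
}

# Example: an old report triggers the warning.
my $msg = check_blast_version('BLASTN 2.2.6 [Apr-09-2003]', [2, 2, 12]);
warn "$msg\n" if $msg;
```

Inside the parser the returned string could be handed to $self->warn or
$self->throw, depending on how strict we decide to be.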

Any thoughts on this?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From cjfields at uiuc.edu  Thu Feb 16 17:45:52 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 16:45:52 -0600
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: 
Message-ID: <000101c6334a$bd80a900$15327e82@pyrimidine>

If I know the start, end, and strand info for a list of features (personal
preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
up), couldn't I try pulling out the surrounding region?  My thought is this,
though I haven't coded it yet:

1)  Draw up a list of SeqFeatures, with accession, start, stop coordinates
(array of hashes) based off what I get from RNAMotif objects.
2)  Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream
and downstream, one at a time, using get_Seq_by_id().  I could add a sleep
in there somewhere to not tick off the NCBI curators.
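
Roughly, the two steps above might look like the sketch below.  Treat it as
untested pseudocode made runnable: padded_window and fetch_region are names
I made up, and the fetch assumes a Bio::DB::GenBank that accepts -seq_start
and -seq_stop (I believe recent code does, but check your version).  Only
the coordinate arithmetic runs without a network connection.

```perl
use strict;
use warnings;

# Hypothetical helper: widen a feature's (start, end) by $pad bp on each
# side, clamped to position 1 and, when known, to the sequence length.
sub padded_window {
    my ($start, $end, $pad, $seq_length) = @_;
    ($start, $end) = ($end, $start) if $start > $end;
    my $from = $start - $pad;
    $from = 1 if $from < 1;
    my $to = $end + $pad;
    $to = $seq_length if defined $seq_length && $to > $seq_length;
    return ($from, $to);
}

# Hypothetical helper: fetch one padded region from GenBank.
# Assumes Bio::DB::GenBank accepts -seq_start / -seq_stop; the module is
# loaded only when this sub is actually called.
sub fetch_region {
    my ($accession, $start, $end, $pad) = @_;
    require Bio::DB::GenBank;
    my ($from, $to) = padded_window($start, $end, $pad);
    my $gb = Bio::DB::GenBank->new(
        -seq_start => $from,
        -seq_stop  => $to,
    );
    return $gb->get_Seq_by_acc($accession);
}

# Usage idea (accession left as a placeholder):
#   for my $m (@motifs) {    # each $m = { acc => '...', start => ..., end => ... }
#       my $seq = fetch_region($m->{acc}, $m->{start}, $m->{end}, 500);
#       sleep 3;             # be polite to NCBI between requests
#   }
```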

Reason I'm interested in this is b/c I want to know where the RNA motif is
in context to surrounding features. If it is very close to a coding region,
then the motif likely indicates translational regulation.  Further away may
indicate transcriptional termination or another mechanism.

The files returned should have the features included as long as they are in
the full length GenBank record.  I tried it out using the web form but not
through Bio::DB::GenBank yet.  If I can get it to work I'll add it to the
page.  

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Thursday, February 16, 2006 4:19 PM
> To: Chris Fields
> Cc: Harry Mangalam; bioperl-l
> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names or
> GeneIDs
> 
> Chris,
> 
> Yes. The question now is where to easily get the coordinates.
> 
> Brian O.
> 
> 
> On 2/16/06 7:52 AM, "Chris Fields"  wrote:
> 
> > I think a method was recently implemented in Bio::DB::GenBank to
> > retrieve a segment of DNA given start and end coordinates in GenBank
> > format; that should contain the features you need.  I requested it
> > ~Nov-Dec in the mailing list but didn't get a chance to test it.
> > Would that help?
> >
> > On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
> >
> >> Harry,
> >>
> >> It's not clear to me that NCBI's eutils offers this capability
> >> directly. You
> >> can probably download Entrez Gene entries and parse them for
> >> coordinates but
> >> I know of no way to remotely retrieve genomic sequences like this
> >> from NCBI
> >> (ENSEMBL API perhaps?). What I had in mind uses the local approach
> >> that some
> >> of us favor and to prove to myself that this is simple to do I wrote a
> >> script that I just added to examples/tools, it's called
> >> extract_genes.pl and
> >> it's based on Bio::DB::Fasta. Download the sequence files for a given
> >> species to some dir, download Entrez Gene's gene2accession file,
> >> and run. It
> >> creates and stores a hash for lookups, it won't read gene2accession
> >> each
> >> time it runs.
> >>
> >> Brian O.




From hjm at tacgi.com  Thu Feb 16 18:10:59 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Thu, 16 Feb 2006 15:10:59 -0800
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: <000101c6334a$bd80a900$15327e82@pyrimidine>
References: <000101c6334a$bd80a900$15327e82@pyrimidine>
Message-ID: <200602161510.59679.hjm@tacgi.com>

This is essentially what I want to do and my [only in pseudocode] approach is 
basically what you describe, except that currently I only have HUGO 
descriptors, not GenBank UIDs.  If you know of an index that lists both, that 
would be the entire shot.

I'm also interested in tracking transcriptional control elements and 
cross-correlating them, which is why I wrote the 'rules' chunk of the recently 
(self-promoted) tacg.

Best
Harry


On Thursday 16 February 2006 14:45, Chris Fields wrote:
> If I know the start, end, and strand info for a list of features (personal
> preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
> up), couldn't I try pulling out the surrounding region?  My thought is
> this, though I haven't coded it yet:
>
> 1)  Draw up a list of Seqfeatures, with accession, start, stop coordinates
> (array of hashes) based off what I get from RNAMotif objects.
> 2)  Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream
> and downstream, one at a time, using get_Seq_by_ID().  I could add a sleep
> in there somewhere to not tick off the NCBI curators.
>
> Reason I'm interested in this is b/c I want to know where the RNA motif is
> in context to surrounding features. If it is very close to a coding region,
> then the motif likely indicates translational regulation.  Further away may
> indicate transcriptional termination or another mechanism.
>
> The files returned should have the features included as long as they are in
> the full length GenBank record.  I tried it out using the web form but not
> through Bio::DB::GenBank yet.  If I can get it to work I'll add it to the
> page.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 


From anst at kvl.dk  Fri Feb 17 04:18:18 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Fri, 17 Feb 2006 10:18:18 +0100
Subject: [Bioperl-l] another searchIO bug? with blast report
In-Reply-To: <43F45FE60200009B00000ED6@gwia.kvl.dk>
References: <43F45FE60200009B00000ED6@gwia.kvl.dk>
Message-ID: <43F5A2EA0200009B00000F45@gwia.kvl.dk>



>>>Anders Stegmann  02/16/06 11:20 am >>>
Hi!

I am blasting a protein seq (query) against an identical seq with a
deletion of Aa nr 61 (subject).
Then I print out the type of nomatch Aa and its position.
The nomatch for the query seq is Aa G at position 61, which is correct.
The nomatch for the subject seq is V at position 60, which is definitely
not correct!?

Is this a bug?

testblast2.pl is the program to run

Q0045 is the query seq.

Q0045del61 is the subject seq (it has to be formatted: formatdb -i
Q0045del61 -p T -o F).

Regards Anders.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045
Type: application/octet-stream
Size: 873 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045del61
Type: application/octet-stream
Size: 872 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testblast2.pl
Type: application/octet-stream
Size: 6109 bytes
Desc: not available
URL: 

From saldroubi at yahoo.com  Fri Feb 17 12:49:40 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Fri, 17 Feb 2006 09:49:40 -0800 (PST)
Subject: [Bioperl-l] Count or weight matrix in bioperl?
In-Reply-To: <43EAAEEF.3000304@infotech.monash.edu.au>
Message-ID: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com>


Torsten and all,
 
 I don't think this will work for me, because it only generates statistics for a single sequence.  What I need is a count matrix for each position across a number of DNA sequences.  In other words, if I pass three sequences to this function, it returns the count of each nucleotide at each position.
 
 For example, if I pass an array of sequences, say ATC, CCC, TTT,
 then I should get back a matrix with the counts at positions 1, 2 and 3 for each of A, C, T and G, like this:
 
 
                 1    2    3
      A          1    0    0
      C          1    1    2
      T          1    2    1
      G          0    0    0
 
 Any idea if this is already built somewhere in bioperl?
 
 Thank you.
 
 
 Torsten Seemann  wrote:
> Say I have an array of nucleotide sequences of length N. I want to calculate the count matrix (weight matrix). That is, for each position 1..N, I want to know how many As, Cs, Ts and Gs there are. Is the code to do this already written in bioperl to build this matrix if I pass it those strings?
>   Please excuse my lack of knowledge as I am a newcomer to bioinformatics.

Use the Bio::Tools::SeqStats module. The PDoc documentation even has an 
example similar to what you want to do:

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html

--Torsten Seemann




Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From muratem at eng.uah.edu  Fri Feb 17 12:45:30 2006
From: muratem at eng.uah.edu (Mike Muratet)
Date: Fri, 17 Feb 2006 11:45:30 -0600 (CST)
Subject: [Bioperl-l] Minimal versions requirements/warnings for SearchIO
 text parsing?
In-Reply-To: <000001c63348$6b8136d0$15327e82@pyrimidine>
References: <000001c63348$6b8136d0$15327e82@pyrimidine>
Message-ID: 



On Thu, 16 Feb 2006, Chris Fields wrote:

> I'm floating this to see what people think...
>
> I'm beginning to wonder, especially when I'm wading through the
> regex/parsing nightmare in SearchIO::blast, if we should either require a
> minimal BLAST version number for parsing to work in SearchIO::blast.  I
> could add a '$self->throw("Requires BLAST v2.x.x")' or at least give a
> warning if the blast version number is below a minimal version, so at least
> people will know what the problem is (not us!).
>
> The regexes are really piling up, and the latest changes in blastn and
> tblastx will require adding a few more.  I also think that this would help
> remind everybody running the latest Bioperl that there are also newer
> versions of BLAST.  My current thought is to get it working for the latest
> text output from NCBI, check it against the last version of BLAST (v.
> 2.2.12, which, luckily, blastcl3 generates), and not worry too much about
> older ones.
>
> Any thoughts on this?
>

Chris

I could live with it. I think most of the world runs on NCBI or WUBLAST 
and it's easy to download/update either of those.

Thanks for the effort. I use SearchIO a lot.

Mike


> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From MEC at stowers-institute.org  Fri Feb 17 13:15:53 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 17 Feb 2006 12:15:53 -0600
Subject: [Bioperl-l] Count or weight matrix in bioperl?
Message-ID: 

http://forkhead.cgb.ki.se/TFBS/ provides the ability to generate a position
frequency matrix from a list of (presumably aligned) sequences as follows:

#!/usr/bin/env perl
use strict;
use warnings;
use TFBS::PatternGen::SimplePFM;

# read one sequence per line from STDIN or from files named on the command line
my @sequences = <>;
chomp @sequences;

# build the position frequency matrix and print its raw counts
print TFBS::PatternGen::SimplePFM->new(-seq_list => \@sequences)->pattern->rawprint;
exit 0;

The output when run on your example input shows that the order of the
nucleotides is not the one you expect (the rows are alphabetical: A, C, G, T):

1 0 0
1 1 2
0 0 0
1 2 1

Good luck,

TFBS installation requires significant dependencies, including bioperl
and PDL.

Malcolm Cook 

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org 
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sam 
>Al-Droubi
>Sent: Friday, February 17, 2006 11:50 AM
>To: Torsten Seemann
>Cc: BioPerl list
>Subject: Re: [Bioperl-l] Count or weight matrix in bioperl?
>
>
>Torsten and all,
> 
> I don't think this will work for me for it only generates 
>statistics for a single sequence.  What I need is a count 
>matrix for each position for a number of DNA sequences.  In 
>other words, if I pass there 3 sequences to this function then 
>it returns the count for each postion for each nucleotide.
> 
> For example if I pass an array of sequences say: ATC,CCC,TTT
> then I should get a matrix back that will have count for 
>postion 1,2,3 for each A,C,T, or G like this:
> 
> 
>                 1    2   3
>      A        1    0    0
>      C        1    1    2
>      T        1    2    1     
>      G        0    0    0
> 
> Any idea of this is already built somewhere in bioperl?
> 
> Thank you.
> 
> 
> Torsten Seemann  
>wrote:> Say I have an array of nucleotide sequences of of 
>length N. I want to calculate the count matrix (weight 
>matrix). That is for each position 1..N, I want to know how 
>many As, Cs ,Ts and Gs there are. Is the code to do this 
>already written in bioperl to build this matrix if I pass it 
>those strings?
>>   Please excuse my lack of knowledge as I am a new comer to 
>bioinformatics.
>
>Use the Bio::Tools::SeqStats module. The PDoc documentation 
>even has an 
>example similar to what you want to do:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/Seq
>Stats.html
>
>--Torsten Seemann
>
>
>
>
>Sincerely, 
>Sam Al-Droubi, M.S.
>saldroubi at yahoo.com
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



From jason.stajich at duke.edu  Fri Feb 17 14:01:45 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 17 Feb 2006 14:01:45 -0500
Subject: [Bioperl-l] another searchIO bug? with blast report
In-Reply-To: <43F5A2EA0200009B00000F45@gwia.kvl.dk>
References: <43F45FE60200009B00000ED6@gwia.kvl.dk>
	<43F5A2EA0200009B00000F45@gwia.kvl.dk>
Message-ID: <69783647-DD43-4A20-84FA-88E5F8A2C535@duke.edu>

In case people on the list think that my speaking up about a
question means they should ignore it...

Hopefully someone else can help debug this - I really don't have time  
I'm afraid.

-jason


On Feb 17, 2006, at 4:18 AM, Anders Stegmann wrote:

>
>
>>>> Anders Stegmann  02/16/06 11:20 am >>>
> Hi!
>
> I am blasting a protein seq (query) against an identical seq with a
> deletion of Aa nr 61 (subject).
> Then I print out the type of nomatch Aa and its position.
> The nomatch for the query seq is Aa G at position 61, which is  
> correct.
> The nomatch for the subject seq is V at position 60, which is  
> definitely
> not correct!?
>
> Is this a bug?
>
> testblast2.pl is the program to run
>
> Q0045 is the query seq.
>
> Q0045del61 is the subject seq (it has to be formated: formatdb -i
> Q0045del61 -p T -o F).
>
> Regards Anders.
>
>
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Fri Feb 17 14:17:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 17 Feb 2006 13:17:32 -0600
Subject: [Bioperl-l] another searchIO bug? with blast report
In-Reply-To: <69783647-DD43-4A20-84FA-88E5F8A2C535@duke.edu>
Message-ID: <000001c633f6$cd391740$15327e82@pyrimidine>

No, haven't ignored it.  Just been busy going through SearchIO::blast again
(I've perltidy'd it) since BLASTN and TBLASTX output (v2.2.13) don't work;
looks like all others should.  Trying to fix one problem at a time.  I'll
look at this next.  Don't worry about it.  ;>

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Friday, February 17, 2006 1:02 PM
> To: Anders Stegmann
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] another searchIO bug? with blast report
> 
> In case people on the list think that by my speaking up about
> question means they should ignore it...
> 
> Hopefully someone else can help debug this - I really don't have time
> I'm afraid.
> 
> -jason
> 
> 
> On Feb 17, 2006, at 4:18 AM, Anders Stegmann wrote:
> 
> >
> >
> >>>> Anders Stegmann  02/16/06 11:20 am >>>
> > Hi!
> >
> > I am blasting a protein seq (query) against an identical seq with a
> > deletion of Aa nr 61 (subject).
> > Then I print out the type of nomatch Aa and its position.
> > The nomatch for the query seq is Aa G at position 61, which is
> > correct.
> > The nomatch for the subject seq is V at position 60, which is
> > definitely
> > not correct!?
> >
> > Is this a bug?
> >
> > testblast2.pl is the program to run
> >
> > Q0045 is the query seq.
> >
> > Q0045del61 is the subject seq (it has to be formated: formatdb -i
> > Q0045del61 -p T -o F).
> >
> > Regards Anders.
> >
> >
> > 
> > 
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From skirov at utk.edu  Fri Feb 17 13:09:00 2006
From: skirov at utk.edu (Stefan Kirov)
Date: Fri, 17 Feb 2006 13:09:00 -0500
Subject: [Bioperl-l] Count or weight matrix in bioperl?
In-Reply-To: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com>
References: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com>
Message-ID: <43F6113C.6070501@utk.edu>

If you have bioperl-live, write a file:

 >seqgroup1
ATC
CCC
TTT

and then read it back:

use Bio::Matrix::PSM::IO;

my $mio = Bio::Matrix::PSM::IO->new(-format => 'masta', -file => $filename);
# next_matrix returns a Bio::Matrix::PSM::SiteMatrix object
while (my $matrix = $mio->next_matrix) {
    # do something with the matrix...
    print $matrix->consensus, "\n";
}

This is not going to give you the raw counts, but it will give you the 
frequency for each pos/letter. See the docs for Bio::Matrix::PSM::SiteMatrix.
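
If you do need the raw counts, they are also easy to get without any module.
A minimal plain-Perl sketch follows; count_matrix is a made-up name, not a
BioPerl function, and it assumes the sequences are already aligned and all
the same length.  Anything other than A, C, G or T (an N, say) is skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: count A, C, G and T at each position of a list of
# equal-length DNA strings. Returns a hash ref: base => [count per position].
sub count_matrix {
    my @seqs = map { uc $_ } @_;
    die "No sequences given\n" unless @seqs;
    my $len = length $seqs[0];
    my %counts = map { ( $_ => [ (0) x $len ] ) } qw(A C G T);
    for my $seq (@seqs) {
        die "Sequences must all be $len bp long\n" unless length($seq) == $len;
        for my $pos (0 .. $len - 1) {
            my $base = substr $seq, $pos, 1;
            $counts{$base}[$pos]++ if exists $counts{$base};    # skip N etc.
        }
    }
    return \%counts;
}

# Print one tab-separated row per base, in the A, C, T, G order of the
# table in the question.
my $m = count_matrix(qw(ATC CCC TTT));
for my $base (qw(A C T G)) {
    print join("\t", $base, @{ $m->{$base} }), "\n";
}
```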
Hope this helps
Stefan

Sam Al-Droubi wrote:

>Torsten and all,
> 
> I don't think this will work for me for it only generates statistics for a single sequence.  What I need is a count matrix for each position for a number of DNA sequences.  In other words, if I pass there 3 sequences to this function then it returns the count for each postion for each nucleotide.
> 
> For example if I pass an array of sequences say: ATC,CCC,TTT
> then I should get a matrix back that will have count for postion 1,2,3 for each A,C,T, or G like this:
> 
> 
>                 1    2   3
>      A        1    0    0
>      C        1    1    2
>      T        1    2    1     
>      G        0    0    0
> 
> Any idea of this is already built somewhere in bioperl?
> 
> Thank you.
> 
> 
> Torsten Seemann  wrote:> Say I have an array of nucleotide sequences of of length N. I want to calculate the count matrix (weight matrix). That is for each position 1..N, I want to know how many As, Cs ,Ts and Gs there are. Is the code to do this already written in bioperl to build this matrix if I pass it those strings?
>  
>
>>  Please excuse my lack of knowledge as I am a new comer to bioinformatics.
>>    
>>
>
>Use the Bio::Tools::SeqStats module. The PDoc documentation even has an 
>example similar to what you want to do:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html
>
>--Torsten Seemann
>
>
>
>
>Sincerely, 
>Sam Al-Droubi, M.S.
>saldroubi at yahoo.com
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>  
>



From cjfields at uiuc.edu  Fri Feb 17 18:02:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 17 Feb 2006 17:02:02 -0600
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names
	orGeneIDs
In-Reply-To: <000101c6334a$bd80a900$15327e82@pyrimidine>
Message-ID: <000601c63416$2a14aa00$15327e82@pyrimidine>

Brian,

I added some sample code to the page.  See what you think.
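
For anyone reading without the page to hand: the idea in the quoted message
below (pull a feature's region plus x bp upstream and downstream) could be
sketched roughly as follows. This is an illustration only, not the code on
the page. flanked_region is a made-up helper, the accession is a
placeholder, and the commented-out part assumes the -seq_start and -seq_stop
options of Bio::DB::GenBank discussed in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: widen a feature's coordinates by $flank bp on each
# side, never going below position 1. The upper end is not clamped because
# the record length is not known before fetching.
sub flanked_region {
    my ($start, $end, $flank) = @_;
    ($start, $end) = ($end, $start) if $start > $end;
    my $from = $start - $flank;
    $from = 1 if $from < 1;
    return ($from, $end + $flank);
}

my ($from, $to) = flanked_region(150, 400, 200);
print "fetch $from..$to\n";

# With BioPerl installed and network access, the window could then be
# requested along these lines (one accession at a time):
#
#   use Bio::DB::GenBank;
#   my $gb  = Bio::DB::GenBank->new(-format    => 'genbank',
#                                   -seq_start => $from,
#                                   -seq_stop  => $to);
#   my $seq = $gb->get_Seq_by_acc('ACCESSION');
#   sleep 3;    # pause between requests to go easy on NCBI
```

Keeping the coordinate arithmetic in its own function makes it easy to test
without touching the network.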

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Thursday, February 16, 2006 4:46 PM
> To: 'Brian Osborne'
> Cc: 'Harry Mangalam'; 'bioperl-l'
> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
> orGeneIDs
> 
> If I know the start, end, and strand info for a list of features (personal
> preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
> up), couldn't I try pulling out the surrounding region?  My thought is
> this, though I haven't coded it yet:
> 
> 1)  Draw up a list of Seqfeatures, with accession, start, stop coordinates
> (array of hashes) based off what I get from RNAMotif objects.
> 2)  Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream
> and downstream, one at a time, using get_Seq_by_id().  I could add a sleep
> in there somewhere to not tick off the NCBI curators.
> 
> Reason I'm interested in this is b/c I want to know where the RNA motif is
> in context to surrounding features. If it is very close to a coding
> region, then the motif likely indicates translational regulation.  Further
> away may indicate transcriptional termination or another mechanism.
> 
> The files returned should have the features included as long as they are
> in the full length GenBank record.  I tried it out using the web form but
> not through Bio::DB::GenBank yet.  If I can get it to work I'll add it to
> the page.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> > -----Original Message-----
> > From: Brian Osborne [mailto:osborne1 at optonline.net]
> > Sent: Thursday, February 16, 2006 4:19 PM
> > To: Chris Fields
> > Cc: Harry Mangalam; bioperl-l
> > Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names or GeneIDs
> >
> > Chris,
> >
> > Yes. The question now is where to easily get the coordinates.
> >
> > Brian O.
> >
> >
> > On 2/16/06 7:52 AM, "Chris Fields"  wrote:
> >
> > > I think a method was recently implemented in Bio::DB::GenBank to
> > > retrieve a segment of DNA given start and end coordinates in GenBank
> > > format; that should contain the features you need.  I requested it
> > > ~Nov-Dec in the mailing list but didn't get a chance to test it.
> > > Would that help?
> > >
> > > On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
> > >
> > >> Harry,
> > >>
> > >> It's not clear to me that NCBI's eutils offers this capability
> > >> directly. You can probably download Entrez Gene entries and parse
> > >> them for coordinates but I know of no way to remotely retrieve
> > >> genomic sequences like this from NCBI (ENSEMBL API perhaps?). What I
> > >> had in mind uses the local approach that some of us favor, and to
> > >> prove to myself that this is simple to do I wrote a script that I
> > >> just added to examples/tools. It's called extract_genes.pl and it's
> > >> based on Bio::DB::Fasta. Download the sequence files for a given
> > >> species to some dir, download Entrez Gene's gene2accession file, and
> > >> run. It creates and stores a hash for lookups, so it won't read
> > >> gene2accession each time it runs.
> > >>
> > >> Brian O.
> > >>
> > >>
> > >> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
> > >>
> > >>> Hi Brian,
> > >>>
> > >>> Thanks very much for the pointers and the speed of your reply, and
> > >>> apologies for the speed of mine.
> > >>>
> > >>> This looks good, but what I was looking for was a bioP approach for
> > >>> hooking to an API at NCBI or EBI so I could get this info and seqs
> > >>> from them.  In this case, speed of retrieval is not critical and I'd
> > >>> rather not download the entirety of the sequences to a local disk to
> > >>> hack at them.
> > >>>
> > >>> I've determined a screen-scraping approach to get them and could
> > >>> script that, but I thought that bioP had a method for using NCBI's
> > >>> external APIs, tho it may be that my memory is faulty or the approach
> > >>> is no longer supported due to overload.
> > >>>
> > >>> Does NCBI make such APIs available anymore?  I searched a bit for
> > >>> docs on them but couldn't find anything (unless it's buried in the
> > >>> NCBI toolkit, which I haven't started to excavate).
> > >>>
> > >>> Failing that, would SEALS provide such a service? Any PerlPinipeds
> > >>> listening?
> > >>>
> > >>> Harry
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
> > >>>> Harry,
> > >>>>
> > >>>> Hope you're doing well. The approach could be based on
> > >>>> Bio::DB::Fasta. So,
> > >>>> from its documentation:
> > >>>>
> > >>>>   use Bio::DB::Fasta;
> > >>>>
> > >>>>   # create database from directory of fasta files
> > >>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
> > >>>>
> > >>>>   # simple access (for those without Bioperl)
> > >>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
> > >>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
> > >>>>   my @ids     = $db->ids;
> > >>>>   my $length   = $db->length('CHROMOSOME_I');
> > >>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
> > >>>>   my $header   = $db->header('CHROMOSOME_I');
> > >>>>
> > >>>>   # Bioperl-style access
> > >>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
> > >>>>
> > >>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
> > >>>>   my $seq     = $obj->seq;
> > >>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
> > >>>>
> > >>>> Do you already have the offsets?
> > >>>>
> > >>>> Brian O.
> > >>>>
> > >>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
> > >>>>> Hi All,
> > >>>>>
> > >>>>> After perusing the tutorial and other docs for an evening, I still
> > >>>>> can't find the answer to this.  Forgive me if I've missed something
> > >>>>> obvious.
> > >>>>>
> > >>>>> This should not be a novel request, but I've not found it answered.
> > >>>>> If bioperl isn't the best way to do this, I'd be grateful for a
> > >>>>> pointer to a better way, especially if it includes an illuminating
> > >>>>> bit of code.
> > >>>>>
> > >>>>> The problem is to retrieve genomic sequences plus & minus some
> > >>>>> offset from a locus determined by HUGO keyword or GeneID.  This
> > >>>>> would be a common followup chore for some extra analysis from a gene
> > >>>>> expression expt.  Or maybe this is in the DBFetch routines, but I've
> > >>>>> missed the sequence type to specify...?
> > >>>>>
> > >>>>>
> > >>>>> TIA!
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher
> > > Lab of Dr. Robert Switzer
> > > Dept of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >



From osborne1 at optonline.net  Fri Feb 17 23:01:14 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 17 Feb 2006 23:01:14 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names
 orGeneIDs
In-Reply-To: <000601c63416$2a14aa00$15327e82@pyrimidine>
Message-ID: 

Chris,

That's nice. Now what I'm puzzling over is how to get the genomic
coordinates given an id, like a Gene id. The raw query is something like:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=2&rettype=xml

This is _something_ like the queries used within Bio::DB::Query::GenBank,
but not exactly. Now taking a look at how the text returned is transformed
into objects...
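
To make the pieces of that raw query explicit, here is a small sketch that
assembles the same URL from its parts. efetch_url is a made-up helper, not
anything in Bio::DB::Query::GenBank, and it does no escaping of values, which
these simple parameters do not need.

```perl
use strict;
use warnings;

# Assemble an efetch query URL from its parts. The parameter names are the
# ones in the URL quoted above; efetch_url itself is a hypothetical helper.
sub efetch_url {
    my (%param) = @_;
    my $base  = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';
    my @order = qw(db id rettype);
    my $query = join '&',
                map  { "$_=$param{$_}" }
                grep { defined $param{$_} } @order;
    return "$base?$query";
}

print efetch_url(db => 'gene', id => 2, rettype => 'xml'), "\n";
```

Changing the id argument gives the query for another Gene id.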

Brian O.


On 2/17/06 6:02 PM, "Chris Fields"  wrote:

> Brian,
> 
> I added some sample code to the page.  See what you think.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 




From osborne1 at optonline.net  Fri Feb 17 23:56:08 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 17 Feb 2006 23:56:08 -0500
Subject: [Bioperl-l] CONTIG sequence files from the NCBI
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95030081AC@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: 

Michael,

Yes, BioPerl has done this for you. Essentially, what it does is take all the
ids in the CONTIG section and query for each individually, then use the
sequences and the location data to create the single large sequence. This
sequence is appended to the annotation and feature section of the initial
GenBank entry. If you want to study this yourself, take a look at
Bio::DB::NCBIHelper::postprocess_data.
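
To illustrate the joining step only, here is a toy sketch. It is not what
postprocess_data actually contains: assemble_contig, the part format and the
sequence ids are all invented for the example, and the pieces are assumed to
be on the forward strand and already downloaded.

```perl
use strict;
use warnings;

# Hypothetical helper: stitch one sequence from CONTIG-style parts.
# $parts is a list of [id, start, end] (1-based, inclusive) or
# ['gap', length]; $fetched maps each id to its downloaded sequence string.
sub assemble_contig {
    my ($parts, $fetched) = @_;
    my $assembled = '';
    for my $part (@$parts) {
        my ($id, $start, $end) = @$part;
        if ($id eq 'gap') {
            $assembled .= 'N' x $start;    # second field is the gap length
            next;
        }
        die "no sequence fetched for $id\n" unless exists $fetched->{$id};
        $assembled .= substr $fetched->{$id}, $start - 1, $end - $start + 1;
    }
    return $assembled;
}

my %fetched = (SEQ_A => 'AAAACCCC', SEQ_B => 'GGGGTTTT');
my $genome  = assemble_contig(
    [ ['SEQ_A', 1, 4], ['gap', 2], ['SEQ_B', 5, 8] ],
    \%fetched,
);
print "$genome\n";
```

The features of the initial entry are left alone here, since their
coordinates already refer to the joined sequence.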

OK, to answer your first question, with an assumption on my part: what NCBI is
doing is simply providing a shorthand rather than an entire large sequence.
The feature coordinates are therefore the same whether the entry comes in
shorthand (CONTIG) or longhand (ORIGIN). Second, the sequences fetched are
the very latest versions of each sequence; that's how eutils works by
default. However, I don't think I've answered your question, because I'm not
sure I understand what you mean by "when I ask bioperl if these sequences
have been updated, I will be told no". All Bioperl does is read the file
provided by GenBank and use its stated version, nothing fancy.

Brian O.


On 2/16/06 5:31 AM, "michael watson (IAH-C)" 
wrote:

> Hi
> 
> I have two questions really.  I fetched bacterial genome sequences from
> the NCBI using Bio::DB::GenBank.
> 
> Some of these sequence entries are CONTIG sequences, ie they just point
> to other sequences that need to be joined together to form the entire
> genome.
> 
> Looking at my downloads, it looks as if bioperl has done all the
> necessary joining for me - or maybe it was the NCBI that did the
> joining?
> 
> OK, so firstly, did bioperl do the joining, and if so, are all the
> co-ordinates of the features updated to reflect their new location on
> the new, joined sequence?
> 
> And secondly, sequence versions... I'm thinking that possibly the
> sequence version of the CONTIG may be 1 (as it hasn't changed) yet the
> versions of the sequences it refers to might have changed, so when I ask
> bioperl if these sequences have been updated, I will be told no because
> the CONTIG sequence version is 1, but I should be told yes because the
> underlying sequences have...?
> 
> Make sense?
> 
> Thanks
> Mick
> 




From pedro.fabre at gmail.com  Fri Feb 17 13:36:37 2006
From: pedro.fabre at gmail.com (pedro fabre)
Date: Fri, 17 Feb 2006 18:36:37 +0000
Subject: [Bioperl-l] Count or weight matrix in bioperl?
Message-ID: 

>Torsten and all,
>
>  I don't think this will work for me, because it only generates 
>statistics for a single sequence.  What I need is a count matrix for 
>each position across a number of DNA sequences.  In other words, if I 
>pass three sequences to this function, it should return the count 
>of each nucleotide at each position.
>
>  For example, if I pass an array of sequences, say ATC, CCC, TTT,
>  then I should get back a matrix with the counts at positions 
>1, 2, 3 for each of A, C, T and G, like this:
>
>
>                  1    2   3
>       A        1    0    0
>       C        1    1    2
>       T        1    2    1    
>       G        0    0    0
>
>  Any idea if this is already built somewhere in bioperl?
>
>  Thank you.
>
>


Sam,

What about this?

I worked on something like that some time ago for SNP calculation, and it
looks to me like you are heading the same way.

If you have sequences like

   A       C       G       T       C       C       A       -       T
   C       G       G       T       A       G       T       G       C
   C       C       C       C       C       G       T       G       C
   C       G       C       T       C       G       T       G       C

Convert each column to numbers: 0 for the value in the first row, 1 for the
first different value (reading down the column), 2 for the second different
value, and so on.
Deletions can be treated as just another base if you like.

After that:


   0       0       0       0       0       0       0       0       0
   1       1       0       0       1       1       1       1       1
   1       0       1       1       0       1       1       1       1
   1       1       1       0       0       1       1       1       1

Once we have the haplotypes converted to numbers, we have to generate the
SNP type information for each column.


SNP code = SUM ( value * multiplicity ^ position )

     where:
       SUM is the sum over the values in the SNP's column
       value is the SNP number code (0 [generally for the major allele],
                                     1 [for the minor allele])
       multiplicity is the base of the code (2 in the example below)
       position is the position in the block (the row, counted from 0)

For this example the code is:

   0       0       0       0       0       0       0       0       0
   1       1       0       0       1       1       1       1       1
   1       0       1       1       0       1       1       1       1
   1       1       1       0       0       1       1       1       1
  ------------------------------------------------------------------
   14      10      12      4       2       14      14      14      14

   14 = 0*2^0 + 1*2^1 + 1*2^2 + 1*2^3
   10 = 0*2^0 + 1*2^1 + 0*2^2 + 1*2^3
   ....

Once we have the columns classified into families, we keep just the distinct SNP codes:

   14      10      12      4       2
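
The recoding and the sum can be sketched in a few lines of plain Perl. This
is a minimal illustration of the arithmetic described above, not the code in
HtSNP.pm: column_codes is a made-up name, the base defaults to 2 as in the
example, and gaps are treated as just another character.

```perl
use strict;
use warnings;

# For each column of equal-length aligned sequences, recode the characters
# as 0 (value in the first row), 1 (first different value met going down
# the column), 2, ... and sum value * base**row, with rows counted from 0.
sub column_codes {
    my ($seqs, $base) = @_;
    $base = 2 unless defined $base;
    my @codes;
    for my $col (0 .. length($seqs->[0]) - 1) {
        my %value_of;
        my $code = 0;
        for my $row (0 .. $#$seqs) {
            my $char = substr $seqs->[$row], $col, 1;
            unless (exists $value_of{$char}) {
                my $next = scalar keys %value_of;    # 0 for the first value seen
                $value_of{$char} = $next;
            }
            $code += $value_of{$char} * $base**$row;
        }
        push @codes, $code;
    }
    return @codes;
}

my @codes = column_codes([qw(ACGTCCA-T CGGTAGTGC CCCCCGTGC CGCTCGTGC)]);
my %seen;
my @distinct = grep { !$seen{$_}++ } @codes;    # keep first occurrences only
print "@codes\n@distinct\n";
```

With base 2, a column holding three or more different values can collide
with another pattern, so a larger base would be needed there.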

If you want to look into the code follow this link.


http://users.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/PopGen/HtSNP.pm?rev=1.4&content-type=text/vnd.viewcvs-markup

HTH
Pedro



>  Torsten Seemann  wrote:
>> Say I have an array of nucleotide sequences of length N. I want 
>> to calculate the count matrix (weight matrix). That is, for each 
>> position 1..N, I want to know how many As, Cs, Ts and Gs there are. 
>> Is the code to build this matrix already written in bioperl, if I 
>> pass it those strings?
>>
>>    Please excuse my lack of knowledge as I am a newcomer to bioinformatics.
>
>Use the Bio::Tools::SeqStats module. The PDoc documentation even has an
>example similar to what you want to do:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html
>
>--Torsten Seemann
>
>
>
>
>Sincerely,
>Sam Al-Droubi, M.S.
>saldroubi at yahoo.com



From cjfields at uiuc.edu  Sat Feb 18 18:35:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 18 Feb 2006 17:35:22 -0600
Subject: [Bioperl-l] Bio::SearchIO fix posted in Bugzilla
Message-ID: <97C946BE-8410-4B7F-9FA3-97A01641E20E@uiuc.edu>

Added a fix for the blastn and tblastx problems with Bio::SearchIO  
text parsing of BLAST 2.2.13 output:

http://bugzilla.open-bio.org/show_bug.cgi?id=1934

The extra lines "Features in this part of subject sequence" and the  
following descriptive lines are passed over using a loop.  See the  
bug report for specifics.

Cheers,

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From osborne1 at optonline.net  Sun Feb 19 00:47:44 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 19 Feb 2006 00:47:44 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names
 orGeneIDs
In-Reply-To: <000601c63416$2a14aa00$15327e82@pyrimidine>
Message-ID: 

Chris and Harry,

OK, I've put the missing link in place. This is Bio::DB::EntrezGene, so you
can get NCBI Genes as objects, perfectly analogous to Bio::DB::GenBank and
the related modules:

use Bio::DB::EntrezGene;

my $db  = Bio::DB::EntrezGene->new;
my $seq = $db->get_Seq_by_id(2);    # 2 is an Entrez Gene id

So starting with just a Gene id, then using Bio::DB::GenBank as Chris
showed, you can get the sequence. What's a little odd is how Entrez Gene
stores the positional information and the sequence identifier. You might have
thought that they'd create a special set of fields for this, but no, it's
only available as part of a URL, as far as I can tell:

Bio::Annotation::DBLink=HASH()
'_root_verbose' => 0
'database' => 'Evidence Viewer'
'primary_id' => 4693
'url' => 'http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606&contig=NT_079573.2&gene=NDP&lid=4693&from=6657835&to=6682559'
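
Until there are proper fields for this, the contig and coordinates can be
pulled out of that URL by hand. A minimal sketch in plain Perl follows;
url_params is a made-up helper, and it assumes the parameter names contig,
from and to seen in the URL above.

```perl
use strict;
use warnings;

# Pull the query parameters out of a URL into a hash. Hypothetical helper;
# it does no unescaping, which this URL does not need.
sub url_params {
    my ($url) = @_;
    my ($query) = $url =~ /\?(.*)\z/s or return;
    return map { split /=/, $_, 2 } grep { /=/ } split /&/, $query;
}

my $url = 'http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606'
        . '&contig=NT_079573.2&gene=NDP&lid=4693&from=6657835&to=6682559';
my %p = url_params($url);
print "$p{contig}: $p{from}..$p{to}\n";
```

In practice the URL string would come from the DBLink annotation's url
value rather than being typed in.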


Question: are NT_* sequences going to be a problem for Bio::DB::GenBank? I
see this in NCBIHelper:

# NT contigs can not be retrieved
$self->throw("NT_ contigs are whole chromosome files which are not part of regular".
             "database distributions. Go to ftp://ftp.ncbi.nih.gov/genomes/.")
    if $ids =~ /NT_/;


Perhaps we can modify this so there's no throw() when a seq_start and
seq_stop are specified.
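
Roughly what I have in mind, as a standalone sketch rather than a patch:
reject_nt_request is a made-up function standing in for the check, and how
seq_start and seq_stop would actually reach that point in NCBIHelper is an
open question.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the check in NCBIHelper: refuse NT_ contigs only
# when no sub-region was requested. Returns true if the request should throw.
sub reject_nt_request {
    my ($ids, $seq_start, $seq_stop) = @_;
    return 0 unless $ids =~ /NT_/;
    return (defined $seq_start && defined $seq_stop) ? 0 : 1;
}

for my $req ( ['NT_079573.2'], ['NT_079573.2', 6657835, 6682559] ) {
    my $verdict = reject_nt_request(@$req) ? 'throw' : 'fetch';
    print "@$req => $verdict\n";
}
```

A whole-contig request would still throw, and a request with both
coordinates would go through.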

Brian O.

On 2/17/06 6:02 PM, "Chris Fields"  wrote:

> Brian,
> 
> I added some sample code to the page.  See what you think.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Thursday, February 16, 2006 4:46 PM
>> To: 'Brian Osborne'
>> Cc: 'Harry Mangalam'; 'bioperl-l'
>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
>> orGeneIDs
>> 
>> If I know the start, end, and strand info for a list of features (personal
>> preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
>> up), couldn't I try pulling out the surrounding region?  My thought is
>> this,
>> though I haven't coded it yet:
>> 
>> 1)  Draw up a list of Seqfeatures, with accession, start, stop coordinates
>> (array of hashes) based off what I get from RNAMotif objects.
>> 2)  Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream
>> and downstream, one at a time, using get_Seq_by_ID().  I could add a sleep
>> in there somewhere to not tick off the NCBI curators.
>> 
>> Reason I'm interested in this is b/c I want to know where the RNA motif is
>> in context to surrounding features. If it is very close to a coding
>> region,
>> then the motif likely indicates translational regulation.  Further away
>> may
>> indicate transcriptional termination or another mechanism.
>> 
>> The files returned should have the features included as long as they are
>> in
>> the full length GenBank record.  I tried it out using the web form but not
>> through Bio::DB::GenBank yet.  If I can get it to work I'll add it to the
>> page.
>> 
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>> 
>> 
>>> -----Original Message-----
>>> From: Brian Osborne [mailto:osborne1 at optonline.net]
>>> Sent: Thursday, February 16, 2006 4:19 PM
>>> To: Chris Fields
>>> Cc: Harry Mangalam; bioperl-l
>>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
>> or
>>> GeneIDs
>>> 
>>> Chris,
>>> 
>>> Yes. The question now is where to easily get the coordinates.
>>> 
>>> Brian O.
>>> 
>>> 
>>> On 2/16/06 7:52 AM, "Chris Fields"  wrote:
>>> 
>>>> I think a method was recently implemented in Bio::DB::GenBank to
>>>> retrieve a segment of DNA given start and end coordinates in GenBank
>>>> format; that should contain the features you need.  I requested it
>>>> ~Nov-Dec in the mailing list but didn't get a chance to test it.
>>>> Would that help?
>>>> 
>>>> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
>>>> 
>>>>> Harry,
>>>>> 
>>>>> It's not clear to me that NCBI's eutils offers this capability
>>>>> directly. You
>>>>> can probably download Entrez Gene entries and parse them for
>>>>> coordinates but
>>>>> I know of no way to remotely retrieve genomic sequences like this
>>>>> from NCBI
>>>>> (ENSEMBL API perhaps?). What I had in mind uses the local approach
>>>>> that some
>>>>> of us favor and to prove to myself that this is simple to do I wrote
>> a
>>>>> script that I just added to examples/tools, it's called
>>>>> extract_genes.pl and
>>>>> it's based on Bio::DB::Fasta. Download the sequence files for a given
>>>>> species to some dir, download Entrez Gene's gene2accession file,
>>>>> and run. It
>>>>> creates and stores a hash for lookups, it won't read gene2accession
>>>>> each
>>>>> time it runs.
>>>>> 
>>>>> Brian O.
>>>>> 
>>>>> 
>>>>> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
>>>>> 
>>>>>> Hi Brian,
>>>>>> 
>>>>>> Thanks very much for the pointers and the speed of your reply and
>>>>>> apologies
>>>>>> for the speed of mine.
>>>>>> 
>>>>>> This looks good, but what I was looking for was a bioP approach
>>>>>> for hooking to
>>>>>> an API at NCBI or EBI so I could get this info and seqs from
>>>>>> them.  In this
>>>>>> case, speed of retrieval is not critical and I'd rather not
>>>>>> download the
>>>>>> entirety of the sequences to a local disk to hack at them.
>>>>>> 
>>>>>> I've determined a screen-scraping approach to get them and could
>>>>>> script that,
>>>>>> but I thought that bioP had a method for using NCBI's external
>>>>>> API's, tho it
>>>>>> may be that my memory is faulty or the approach is no longer
>>>>>> supported due to
>>>>>> overload.
>>>>>> 
>>>>>> Does NCBI make such APIs available anymore?  I searched a bit for
>>>>>> docs on them
>>>>>> but couldn't find anything (unless it's buried in the NCBI tookit,
>>>>>> which I
>>>>>> haven't started to excavate).
>>>>>> 
>>>>>> Failing that, would SEALS provide such a service? Any PerlPinipeds
>>>>>> listening?
>>>>>> 
>>>>>> Harry
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>>>>>> Harry,
>>>>>>> 
>>>>>>> Hope you're doing well. The approach could be based on
>>>>>>> Bio::DB::Fasta. So,
>>>>>>> from its documentation:
>>>>>>> 
>>>>>>>   use Bio::DB::Fasta;
>>>>>>> 
>>>>>>>   # create database from directory of fasta files
>>>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>>>> 
>>>>>>>   # simple access (for those without Bioperl)
>>>>>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>>>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>>>>>   my @ids     = $db->ids;
>>>>>>>   my $length   = $db->length('CHROMOSOME_I');
>>>>>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>>>>>   my $header   = $db->header('CHROMOSOME_I');
>>>>>>> 
>>>>>>>   # Bioperl-style access
>>>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>>>> 
>>>>>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>>>>>>   my $seq     = $obj->seq;
>>>>>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>>>>>> 
>>>>>>> Do you already have the offsets?
>>>>>>> 
>>>>>>> Brian O.
>>>>>>> 
>>>>>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>>>>>>> Hi All,
>>>>>>>> 
>>>>>>>> After perusing the tutorial and other docs for a an evening, I
>>>>>>>> still
>>>>>>>> can't find the answer to this.  Forgive me if I've missed
>> something
>>>>>>>> obvious.
>>>>>>>> 
>>>>>>>> This should not be a novel request, but I've not found it
>>>>>>>> answered.  If
>>>>>>>> bioperl isn't the best way to do this, I'd be grateful to a
>>>>>>>> pointer to a
>>>>>>>> better way, especially if it includes an illuminating bit of code.
>>>>>>>> 
>>>>>>>> The problem is to retrieve genomic sequences plus & minus some
>>>>>>>> offset
>>>>>>>> from a locus determined by HUGO keyword or GeneID.  This would be
>> a
>>>>>>>> common followup chore for some extra analysis from a gene
>>>>>>>> expression
>>>>>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed
>>>>>>>> the
>>>>>>>> sequence type to specify...?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> TIA!
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>> 
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>> 
>>>> 
>>>> 
>> 
>> 
> 




From maximilianh at gmail.com  Sun Feb 19 08:52:37 2006
From: maximilianh at gmail.com (Maximilian Haeussler)
Date: Sun, 19 Feb 2006 14:52:37 +0100
Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <76f031ae0602190552v5f2542dbv@mail.gmail.com>

Hi bio-mailinglists,

does anyone here know of a tool or a library to display two (or more)
sequences at the same time with coloured features? Possibly with lines,
connecting some features from one sequence to the other (synteny-plot) ?
Or to display two multiple alignments, one on top of the other, with
coloured features added?

It's not that it would be difficult to write, but programming visualisation
usually takes a lot of time.
Bio::Graphics seems mainly concerned with one main sequence and features on
it. Well, I could copy together two of these gif-images, but then there
would be no connecting lines. Same applies for the graphics in Biojava or
the gff2ps tool or all the multiple alignment viewers that I know (Bioedit,
ClustalX). There is something called Toucan in Java, which displays at least
several lines of gff-style-features, but no visible sequences and more
importantly, no connecting lines. A recent software, Djinn lite, is using a
similar kind of visualization to compare different spliced genes from
various species, but it's mainly aimed at splicing and written in Visual
Basic.
I guess a good compromise might be the 3D viewer Sockeye, but I haven't seen
any synteny lines in Sockeye yet.

I guess I must have missed something here. I cannot be the first one who
would like to compare, say, two gff files, or two multiple alignments?

Thanks a lot for any idea,
Max



From lutfullah at upesh.edu  Sun Feb 19 12:01:05 2006
From: lutfullah at upesh.edu (Dr. Lutfullah)
Date: Sun, 19 Feb 2006 22:01:05 +0500
Subject: [Bioperl-l] bioperl in jail
Message-ID: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>

Hello,

I am trying to create a situation where users can ssh login to a chrooted
jailed account with limited functionality.
I created the chroot jail on my Fedora Core 4 installation using a script
available at:
http://www.fuschlberger.net/programs/ssh-scp-chroot-jail/
The script has a line:
======================
APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
/bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
/usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
/usr/libexec/openssh/sftp-server"
=======================
to which I added everything I could get with /bin/perl to make it:

APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
/bin/vi /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
/usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
/usr/libexec/openssh/sftp-server /usr/bin/perl /usr/bin/perl5
/usr/bin/perl5.8.6 /usr/bin/perldoc /usr/bin/perlbug /usr/bin/perlivp
/usr/bin/perlcc /usr/bin/foomatic-perl-data /usr/bin/find2perl"

perl becomes available inside the jail but I cannot use the line "use
Bio::Perl" inside the jail.

The script produces an error on including /usr/lib or /usr/lib/perl5:

Copying necessary library-files to jail (may take some time)
cp: omitting directory `/usr/lib'
ldd: /usr/lib: No such file or directory
Copying files from /etc/pam.d/ to jail
Copying PAM-Modules to jail

In the jailed account the little test program:

use Bio::Perl;
print 2+4;

generated this error:

Can't locate Bio/Perl.pm in @INC (@INC contains:
/usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.4/i386-linux-thread
............................................

Any help would be much appreciated. Thanks in advance.

LK



From boris.steipe at utoronto.ca  Sun Feb 19 17:34:52 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Sun, 19 Feb 2006 17:34:52 -0500
Subject: [Bioperl-l] bioperl in jail
In-Reply-To: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>
References: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>
Message-ID: 

The path that perl uses internally to search its modules (@INC) is  
not the same thing as the path your shell uses. You have to modify  
@INC either within running scripts, or by setting the PERL5LIB  
environment variable upon login.

e.g. see http://modperlbook.org/html/ch03_09.html
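
A minimal sketch of the in-script option, assuming the Bioperl modules sit
under /usr/lib/perl5/site_perl/5.8.6 inside the jail (a hypothetical path;
use whatever directory actually holds Bio/Perl.pm). The helper find_module
is made up for this example. Note that changing @INC only helps if the
module files have actually been copied into the chroot:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed module directory inside the jail; adjust as needed.
# The login-time equivalent would be, in the shell profile:
#   export PERL5LIB=/usr/lib/perl5/site_perl/5.8.6
use lib '/usr/lib/perl5/site_perl/5.8.6';

# Return the first @INC directory that contains the given module file,
# or undef if no directory does.
sub find_module {
    my ($relpath) = @_;              # e.g. 'Bio/Perl.pm'
    for my $dir (@INC) {
        next if ref $dir;            # skip code-ref hooks in @INC
        return $dir if -e "$dir/$relpath";
    }
    return;
}

# Show the search path perl really uses, then look for Bio::Perl in it.
print "$_\n" for grep { !ref } @INC;
my $where = find_module('Bio/Perl.pm');
print defined $where ? "Bio/Perl.pm found in $where\n"
                     : "Bio/Perl.pm not found in \@INC\n";
```

If the last line reports "not found", the directory tree holding the
modules is probably missing from the jail, and no @INC setting will fix
that until the files are copied in.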

HTH,
B.



On 19 Feb 2006, at 12:01, Dr. Lutfullah wrote:

> Hello,
>
> I am trying to create a situation where users can ssh login to a  
> chrooted
> jailed account with limited functionality.
> I created the chroot jail on my Fedora Core 4 installation using a  
> script
> available at:
> http://www.fuschlberger.net/programs/ssh-scp-chroot-jail/
> The script has a line:
> ======================
> APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
> /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
> /usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
> /usr/libexec/openssh/sftp-server"
> =======================
> to which I added everything I could get with /bin/perl to make it:
>
> APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
> /bin/vi /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
> /usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
> /usr/libexec/openssh/sftp-server /usr/bin/perl /usr/bin/perl5
> /usr/bin/perl5.8.6 /usr/bin/perldoc /usr/bin/perlbug /usr/bin/perlivp
> /usr/bin/perlcc /usr/bin/foomatic-perl-data /usr/bin/find2perl"
>
> perl becomes available inside the jail but I cannot use the line "use
> Bio::Perl" inside the jail.
>
> The script produces an error on including /usr/lib or /usr/lib/perl5:
>
> Copying necessary library-files to jail (may take some time)
> cp: omitting directory `/usr/lib'
> ldd: /usr/lib: No such file or directory
> Copying files from /etc/pam.d/ to jail
> Copying PAM-Modules to jail
>
> In the jailed account the little test program:
>
> use Bio::Perl;
> print 2+4;
>
> generated this error:
>
> Can't locate Bio/Perl.pm in @INC (@INC contains:
> /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread
> ............................................
>
> Any help would be much appreciated. Thanks in advance.
>
> LK
>
>


From khoueiry at ibdm.univ-mrs.fr  Mon Feb 20 04:27:07 2006
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Mon, 20 Feb 2006 10:27:07 +0100
Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence
In-Reply-To: <76f031ae0602190552v5f2542dbv@mail.gmail.com>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<76f031ae0602190552v5f2542dbv@mail.gmail.com>
Message-ID: <1140427628.10569.10.camel@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 

From shameer at ncbs.res.in  Mon Feb 20 01:21:01 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon, 20 Feb 2006 11:51:01 +0530 (IST)
Subject: [Bioperl-l] Matrix Average Code / Module ?
In-Reply-To: <76f031ae0602190552v5f2542dbv@mail.gmail.com>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<76f031ae0602190552v5f2542dbv@mail.gmail.com>
Message-ID: <59825.192.168.1.176.1140416461.squirrel@192.168.1.176>

Hi all,
Is there any program/module to calculate the average of a BLOSUM/PAM (or any
other) matrix?

I have a matrix and need to compute its average, for example:

11 22 43 54 50
27 87 74 32 10
66 58 98 78 20
22 23 44 16 34

I have gone through Bio::Matrix::MatrixI and Bio::Matrix::GenericMatrix
and other Perl modules such as Math::Matrix
(http://search.cpan.org/~ulpfr/Math-Matrix-0.4/Matrix.pm)
and Math::Cephes::Matrix, but none of them has a provision for calculating
a matrix average.
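
For reference, the overall mean of all cells can be computed in plain Perl
without a matrix module. This is only a sketch, assuming the matrix has
already been parsed into an array of array references; the sub name
matrix_mean is made up here, and List::Util ships with perl:

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Mean of all cells in a matrix stored as an array of array references.
# Returns nothing (undef in scalar context) for an empty matrix.
sub matrix_mean {
    my ($m) = @_;
    my @cells = map { @$_ } @$m;     # flatten the rows into one list
    return unless @cells;
    return sum(@cells) / @cells;     # @cells in numeric context = cell count
}

# The example matrix from the message above.
my @matrix = (
    [11, 22, 43, 54, 50],
    [27, 87, 74, 32, 10],
    [66, 58, 98, 78, 20],
    [22, 23, 44, 16, 34],
);

printf "mean = %.2f\n", matrix_mean(\@matrix);   # prints: mean = 43.45
```

Row or column means follow the same pattern by summing one row, or one
index across all rows, instead of the flattened list.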

Any help ???
thanks in advance,
Happy biocomputing !!!


-- 
Shameer Khadar
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241
F - 91-080-23636662/23636675
W - http://www.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."
MM




From cjfields at uiuc.edu  Mon Feb 20 12:01:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 20 Feb 2006 11:01:26 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
	version 1.28
In-Reply-To: <43F449E1.80605@esat.kuleuven.be>
Message-ID: <000e01c6363f$494bc5e0$15327e82@pyrimidine>

I have added a preliminary bugfix for the problems seen with nucleotide
blast parsing for BLAST 2.2.13 reports.  I passed SearchIO::blast through
perltidy to space out the blocks (really for my own purposes; it's a pretty
complex module).  The fix bypasses the extra lines output for blastn and
tblastx and now seems to parse the text output for those reports correctly.
I tested it using all NCBI BLAST flavors for the last two versions of BLAST
(2.2.12 and 2.2.13); the fix was simple so shouldn't break any other BLAST
report parsing, such as WU-BLAST, RPS-BLAST, or Paracel.  It has only been
tested on MacOSX at the moment, so I need people out there to test it out on
anything they can to make sure it works before committing.  I'll be trying
it on Windows today.  Report back to me and I'll post anything on bugzilla.

Here it is:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
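
The idea of bypassing the extra lines can be illustrated with a small
stand-alone filter. This is not the patch attached to the bug report, only
a sketch of the technique, assuming each "Features ..." block runs from its
header line up to the next " Score = " line; the sub name
strip_feature_blocks and the sample lines are made up for this example:

```perl
use strict;
use warnings;

# Drop the "Features in/flanking this part of subject sequence:" blocks
# that BLAST 2.2.13 prints before an HSP. A block starts at its header
# line and ends just before the next " Score = " line.
sub strip_feature_blocks {
    my @lines = @_;
    my @kept;
    my $in_features = 0;
    for my $line (@lines) {
        if ($line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/) {
            $in_features = 1;        # start skipping
            next;
        }
        $in_features = 0 if $line =~ /^\s*Score\s*=/;   # HSP header: stop skipping
        push @kept, $line unless $in_features;
    }
    return @kept;
}

my @report = (
    " Features flanking this part of subject sequence:",
    "   3726 bp at 5' side: transposon protein, putative",
    "   2655 bp at 3' side: hypothetical protein",
    "",
    " Score = 36.2 bits (18),  Expect = 0.22",
    " Identities = 18/18 (100%), Gaps = 0/18 (0%)",
);
print "$_\n" for strip_feature_blocks(@report);
```

Run on the sample above, only the Score and Identities lines survive, which
is the form the HSP parser expected before the report format changed.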


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> Sent: Thursday, February 16, 2006 3:46 AM
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at lists.open-bio.org; Chris Fields
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
> version 1.28
> 
> Hi,
> 
> I have the same problem with the blast.pm file.
> The people at NCBI added some extra information to the BLAST output
> (see e.g. "Features flanking this part..." or "Features in this part
> ..."); an example is appended below.
> The blast.pm module starts looking for the HSP alignment information,
> but it dies when it hits this feature information.
> 
> Pieter
> 
> 
> >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> chromosome 12, complete sequence
> Length=27492551
> 
>  Features flanking this part of subject sequence:
>    3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
>    2655 bp at 3' side: hypothetical protein
> 
>  Score = 36.2 bits (18),  Expect = 0.22
>  Identities = 18/18 (100%), Gaps = 0/18 (0%)
>  Strand=Plus/Minus
> 
> Query  4         GTACTACTCTACTCTACT  21
>                  ||||||||||||||||||
> Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> 
> 
>  Features flanking this part of subject sequence:
>    2991 bp at 5' side: hypothetical protein
>    1131 bp at 3' side: hypothetical protein
> 
>  Score = 36.2 bits (18),  Expect = 0.22
>  Identities = 18/18 (100%), Gaps = 0/18 (0%)
>  Strand=Plus/Minus
> 
> Query  2         ATGTACTACTCTACTCTA  19
>                  ||||||||||||||||||
> Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> 
> 
>  Features in this part of subject sequence:
>    DHHC zinc finger domain, putative
> 
>  Score = 34.2 bits (17),  Expect = 0.87
>  Identities = 17/17 (100%), Gaps = 0/17 (0%)
>  Strand=Plus/Plus
> 
> Query  5         TACTACTCTACTCTACT  21
>                  |||||||||||||||||
> Sbjct  17616437  TACTACTCTACTCTACT  17616453
> 
> 
>  Features flanking this part of subject sequence:
>    102 bp at 5' side: bZIP transcription factor, putative
>    3740 bp at 3' side: yeast dcp1, putative
> 
>  Score = 32.2 bits (16),  Expect = 3.4
>  Identities = 16/16 (100%), Gaps = 0/16 (0%)
>  Strand=Plus/Plus
> 
> Query  7        CTACTCTACTCTACTC  22
>                 ||||||||||||||||
> Sbjct  2775880  CTACTCTACTCTACTC  2775895
> 
> 
>  Features flanking this part of subject sequence:
>    21 bp at 5' side: peptide transporter T17F3.11, putative
>    10230 bp at 3' side: transposon protein, putative, unclassified
> 
>  Score = 32.2 bits (16),  Expect = 3.4
>  Identities = 16/16 (100%), Gaps = 0/16 (0%)
>  Strand=Plus/Minus
> 
> Query  7         CTACTCTACTCTACTC  22
>                  ||||||||||||||||
> Sbjct  27323153  CTACTCTACTCTACTC  27323138
> 
> 
> 
> Guojun Yang wrote:
> 
> >Hi, Chris,
> >Finally the remoteblast test script works for the amino.fa query. but
> when I try a nucleic acid sequence (see below), Error occurs:
> >"
> >waiting........
> >------------- EXCEPTION  -------------
> >MSG: no data for midline  Features flanking this part of subject sequence:
> >STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> >STACK toplevel remoteblast_test:40
> >"
> >The query sequence is:
> >CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> >GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> >AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> >AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> >
> >The script (basically same as the remoteblast test, I only changed
> database to 'nr' and program to 'blastn' and filename to 'ost3'):
> >#!/usr/bin/perl
> >
> >use Bio::SeqIO;
> >use Bio::Seq;
> >use Bio::Tools::Run::RemoteBlast;
> >use Bio::SearchIO;
> >use strict;
> >my $prog='blastn';
> >my $db='nr';
> >my $e_val=1e-10;
> >my @params=( -prog=>$prog,
> >	-data=>$db,
> >	-expect=>$e_val,
> >	-readmethod=>'SearchIO');
> >my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >
> >my $v = 1;
> >
> >my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> >
> >while (my $input = $str->next_seq()){
> >  #Blast a sequence against a database:
> >  #Alternatively, you could  pass in a file with many
> >  #sequences rather than loop through sequence one at a time
> >  #Remove the loop starting 'while (my $input = $str->next_seq())'
> >  #and swap the two lines below for an example of that.
> >  my $r = $factory->submit_blast($input);
> >  #my $r = $factory->submit_blast('amino.fa');
> >  print STDERR "waiting..." if( $v > 0 );
> >  while ( my @rids = $factory->each_rid ) {
> >    foreach my $rid ( @rids ) {
> >      my $rc = $factory->retrieve_blast($rid);
> >      if( !ref($rc) ) {
> >        if( $rc < 0 ) {
> >          $factory->remove_rid($rid);
> >        }
> >        print STDERR "." if ( $v > 0 );
> >        sleep 5;
> >      } else {
> >        my $result = $rc->next_result();
> >        #save the output
> >        my $filename = $result->query_name()."\.out";
> >        $factory->save_output($filename);
> >        $factory->remove_rid($rid);
> >        print "\nQuery Name: ", $result->query_name(), "\n";
> >        while ( my $hit = $result->next_hit ) {
> >          next unless ( $v > 0);
> >          print "\thit name is ", $hit->name, "\n";
> >          while( my $hsp = $hit->next_hsp ) {
> >            print "\t\tscore is ", $hsp->score, "\n";
> >          }
> >        }
> >      }
> >    }
> >  }
> >}
> >
> >
> >Do you think there might still be something in the NCBI output format?
> >
> >Thank you,
> >Guojun
> >
> >
> >
> >
> >Guojun Yang
> >Department of Plant Biology
> >University of Georgia
> >Tel: 706-542-1857
> >Fax: 706-542-1805
> >http://www.arches.uga.edu/~guojun
> >
> >
> >
> >----- Original Message -----
> >From: Chris Fields [mailto:cjfields at uiuc.edu]
> >To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> >Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >
> >
> >
> >
> >>Sorry, forgot to add that I didn't see the regex issue that you
> mentioned.
> >>It could be a perl-related issue.  Try the fixes I mentioned and see
> what
> >>happens.
> >>
> >>
> >>>Christopher Fields
> >>>
> >>>
> >>Postdoctoral Researcher - Switzer Lab
> >>Dept. of Biochemistry
> >>University of Illinois Urbana-Champaign
> >>
> >>
> >>>>>-----Original Message-----
> >>>>>
> >>>>>
> >>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>Sent: Tuesday, February 14, 2006 12:36 PM
> >>>To: 'gyang at plantbio.uga.edu'
> >>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >>>
> >>>
> >>>It's a good habit to always add single quotes around string values.  The
> >>>perl interpreter may think a single bare word is a subroutine or perlfunc
> >>>called with no args, so it will try to find a subroutine named blastp().
> >>>My debugger actually gives the error that the bare word blastp may
> >>>conflict with a future reserved word.  Like you said, 'use strict' will
> >>>point that out.
> >>>
> >>>As for the regex, it should match all the blast programs at NCBI
> >>>(blastp, blastn, blastx, tblastn, tblastx) and is built in to make sure
> >>>nothing else passes through.
> >>>
> >>>So, if you are using the script below, there are several errors.  The
> >>>bare words for $prog and $db need quotes, and the flags for your @params
> >>>array don't have a dash before them.  I get this after adding quotes but
> >>>before adding the dashes to @params:
> >>>
> >>>
> >>>C:\Perl\Scripts>test_blast.pl
> >>>------------- EXCEPTION: Bio::Root::Exception -------------
> >>>MSG:
> >>>STACK: Error::throw
> >>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> >>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> >>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> >>>STACK: C:\Perl\Scripts\test_blast.pl:15
> >>>-----------------------------------------------------------
> >>>
> >>>
> >>>The last line indicates a problem with this line:
> >>>
> >>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >>>
> >>>Changing the @params to this:
> >>>
> >>>my @params=( -prog=>$prog,
> >>>	-data=>$db,
> >>>	-expect=>$e_val,
> >>>	-readmethod=>'SearchIO');
> >>>
> >>>fixes it, and I get output as expected.
> >>>
> >>>Christopher Fields
> >>>>>
> >>>>>
> >>>Postdoctoral Researcher - Switzer Lab
> >>>Dept. of Biochemistry
> >>>University of Illinois Urbana-Champaign
> >>>
> >>>
> >>>>>>>>-----Original Message-----
> >>>>>>>>
> >>>>>>>>
> >>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> >>>>Sent: Tuesday, February 14, 2006 11:48 AM
> >>>>To: Chris Fields; bioperl-l at lists.open-bio.org
> >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >>>>
> >>>>Hi, Chris,
> >>>>When I tried with the perldoc script, It did not work either. First it
> >>>>says $prog can not be bare word if I "use strict". I added quotes on
> the
> >>>>words, then it says the value for $prog does not match expression
> >>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The
> >>>>
> >>>>
> >>>script
> >>>
> >>>
> >>>>is shown below. Why is the expression "t?blast[pnx]"?
> >>>>
> >>>>#!/usr/bin/perl
> >>>>
> >>>>use Bio::SeqIO;
> >>>>use Bio::Seq;
> >>>>use Bio::Tools::Run::RemoteBlast;
> >>>>use Bio::SearchIO;
> >>>>
> >>>>
> >>>>my $prog=blastp;
> >>>>my $db=swissprot;
> >>>>my $e_val=1e-10;
> >>>>my @params=( prog=>$prog,
> >>>>	data=>$db,
> >>>>	expect=>$e_val,
> >>>>	readmethod=>'SearchIO');
> >>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >>>>
> >>>>my $v = 1;
> >>>>
> >>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> >>>>
> >>>>while (my $input = $str->next_seq()){
> >>>>  #Blast a sequence against a database:
> >>>>  #Alternatively, you could  pass in a file with many
> >>>>  #sequences rather than loop through sequence one at a time
> >>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> >>>>  #and swap the two lines below for an example of that.
> >>>>  my $r = $factory->submit_blast($input);
> >>>>  #my $r = $factory->submit_blast('amino.fa');
> >>>>  print STDERR "waiting..." if( $v > 0 );
> >>>>  while ( my @rids = $factory->each_rid ) {
> >>>>    foreach my $rid ( @rids ) {
> >>>>      my $rc = $factory->retrieve_blast($rid);
> >>>>      if( !ref($rc) ) {
> >>>>        if( $rc < 0 ) {
> >>>>          $factory->remove_rid($rid);
> >>>>        }
> >>>>        print STDERR "." if ( $v > 0 );
> >>>>        sleep 5;
> >>>>      } else {
> >>>>        my $result = $rc->next_result();
> >>>>        #save the output
> >>>>        my $filename = $result->query_name()."\.out";
> >>>>        $factory->save_output($filename);
> >>>>        $factory->remove_rid($rid);
> >>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>        while ( my $hit = $result->next_hit ) {
> >>>>          next unless ( $v > 0);
> >>>>          print "\thit name is ", $hit->name, "\n";
> >>>>          while( my $hsp = $hit->next_hsp ) {
> >>>>            print "\t\tscore is ", $hsp->score, "\n";
> >>>>          }
> >>>>        }
> >>>>      }
> >>>>    }
> >>>>  }
> >>>>}
> >>>>
> >>>>Thank you for your help!
> >>>>
> >>>>
> >>>>Guojun
> >>>>Department of Plant Biology
> >>>>University of Georgia
> >>>>
> >>>>----- Original Message -----
> >>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>To: gyang at plantbio.uga.edu
> >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>Try two things:
> >>>>>
> >>>>>
> >>>>>>1)  Use a much simpler script, like the one in 'perldoc
> >>>>>>
> >>>>>>
> >>>>>Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something
> >>>>>
> >>>>>
> >>>>wrong
> >>>>
> >>>>
> >>>>>with the logic in your subroutine:
> >>>>>
> >>>>>
> >>>>>>my $v = 1;
> >>>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> >>>>>>while (my $input = $str->next_seq()){
> >>>>>>
> >>>>>>
> >>>>>  #Blast a sequence against a database:
> >>>>>  #Alternatively, you could  pass in a file with many
> >>>>>  #sequences rather than loop through sequence one at a time
> >>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> >>>>>  #and swap the two lines below for an example of that.
> >>>>>  my $r = $factory->submit_blast($input);
> >>>>>  #my $r = $factory->submit_blast('amino.fa');
> >>>>>  print STDERR "waiting..." if( $v > 0 );
> >>>>>  while ( my @rids = $factory->each_rid ) {
> >>>>>    foreach my $rid ( @rids ) {
> >>>>>      my $rc = $factory->retrieve_blast($rid);
> >>>>>      if( !ref($rc) ) {
> >>>>>        if( $rc < 0 ) {
> >>>>>          $factory->remove_rid($rid);
> >>>>>        }
> >>>>>        print STDERR "." if ( $v > 0 );
> >>>>>        sleep 5;
> >>>>>      } else {
> >>>>>        my $result = $rc->next_result();
> >>>>>        #save the output
> >>>>>        my $filename = $result->query_name()."\.out";
> >>>>>        $factory->save_output($filename);
> >>>>>        $factory->remove_rid($rid);
> >>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>>        while ( my $hit = $result->next_hit ) {
> >>>>>          next unless ( $v > 0);
> >>>>>          print "\thit name is ", $hit->name, "\n";
> >>>>>          while( my $hsp = $hit->next_hsp ) {
> >>>>>            print "\t\tscore is ", $hsp->score, "\n";
> >>>>>          }
> >>>>>        }
> >>>>>      }
> >>>>>    }
> >>>>>  }
> >>>>>}
> >>>>>
> >>>>>
> >>>>>>2) Try the RemoteBlast from Bugzilla and see if that works.  It
> >>>>>>
> >>>>>>
> >>>really
> >>>
> >>>
> >>>>>shouldn't make that much of a difference, but I noticed that the CVS
> >>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> >>>>>released; the Bugzilla version is based off CVS.
> >>>>>
> >>>>>
> >>>>>>Christopher Fields
> >>>>>>
> >>>>>>
> >>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>Dept. of Biochemistry
> >>>>>University of Illinois Urbana-Champaign
> >>>>>
> >>>>>
> >>>>>>>-----Original Message-----
> >>>>>>>
> >>>>>>>
> >>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> >>>>>>Sent: Monday, February 13, 2006 3:00 PM
> >>>>>>To: bioperl-l at lists.open-bio.org
> >>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>>>
> >>>>>>
> >>>>>>>>Thanks, Chris,
> >>>>>>>>
> >>>>>>>>
> >>>>>>I installed version 1.5.1 and replaced the blast.pm file with the
> >>>>>>
> >>>>>>
> >>>one
> >>>
> >>>
> >>>>from
> >>>>
> >>>>
> >>>>>>your bug report. The running version is 1.5 when I use the command
> >>>>>>
> >>>>>>
> >>>you
> >>>
> >>>
> >>>>>>sent me. But when I tried the script, it doesn't change much. My
> >>>>>>remoteblast code (portion) is here:
> >>>>>>
> >>>>>>
> >>>>>>>>sub search {
> >>>>>>>>
> >>>>>>>>
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> >>>>>>local
> >>>>>>
> >>>>>>
> >>>>>>
> >>>$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}=
> >>>
> >>>
> >>>>>>'no';
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> >>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> >>>>>>			      -id=>"query",
> >>>>>>			      -desc=>"new seq");
> >>>>>>my $len=$query->length();
> >>>>>>@db=('nr','htgs','wgs');
> >>>>>>foreach my $db (@db) {
> >>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> >>>>>>						'-data' =>"$db",
> >>>>>>
> >>>>>>
> >>>>>>
> >>'-expect'=>"$E_value");
> >>
> >>
> >>>>>>>>>>my $blast_report = $factory->submit_blast($query);
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>my @rids = $factory->each_rid();
> >>>>>>>>
> >>>>>>>>
> >>>>>>foreach my $rid ( @rids ) {
> >>>>>>    print STDERR "$rid\n";
> >>>>>>}
> >>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> >>>>>>print STDERR "waiting...";
> >>>>>>sleep 60;
> >>>>>>
> >>>>>>
> >>>>>>>>foreach my $rid ( @rids ) {
> >>>>>>>>
> >>>>>>>>
> >>>>>>    my $rc = $factory->retrieve_blast($rid);
> >>>>>>    while (!ref($rc) ) {
> >>>>>>	if( $rc < 0 ) {
> >>>>>># retrieve_blast returns -1 on error
> >>>>>>	    $factory->remove_rid($rid);
> >>>>>>	    print "Error!\n";
> >>>>>>	    send_error($email,$function,$seqname,$queryname[$ST]);
> >>>>>>	    die "Can't retrieve $rid";
> >>>>>>	} if ($rc==0) { # retrieve_blast returns 0 on 'job not
> >>>>>>
> >>>>>>
> >>>finished'
> >>>
> >>>
> >>>>>>	    sleep 60;
> >>>>>>	    $rc = $factory->retrieve_blast($rid);
> >>>>>>	}
> >>>>>>    }
> >>>>>>    if (ref($rc)) {
> >>>>>>	print STDERR "Done.\n";
> >>>>>>	 while( my $result = $rc->next_result) {
> >>>>>>	    while( my $hit = $result->next_hit()) {
> >>>>>>	    	$hit_name=$hit->name;
> >>>>>>		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> >>>>>>		$name=$1;
> >>>>>>		@left_plus_start=();
> >>>>>>		@left_plus_end=();
> >>>>>>		@left_minus_start=();
> >>>>>>		@left_minus_end=();
> >>>>>>		@right_plus_start=();
> >>>>>>		@right_plus_end=();
> >>>>>>		@right_minus_start=();
> >>>>>>		@right_minus_end=();
> >>>>>>
> >>>>>>
> >>>>>>>>		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> >>>>>>>>
> >>>>>>>>
> >>>>>>		while( my $hsp = $hit->next_hsp()) {
> >>>>>>......
> >>>>>>
> >>>>>>
> >>>>>>It was working quite well until around October last year, but it has
> >>>>>>stopped since then. When a submission is sent via a webpage, the CGI
> >>>>>>starts to work and uses ~20 MB of memory. Then it hangs; finally the
> >>>>>>expected email is received, but without real results, although it does
> >>>>>>contain something from other parts of the script. Apparently the search
> >>>>>>sub did not return anything (I know something should be returned). Is
> >>>>>>it also possible that the format of the NCBI output for each result has
> >>>>>>changed?
> >>>>>>Thank you,
> >>>>>>Guojun
> >>>>>>
> >>>>>>
> >>>>>>>>>>Department of Plant Biology
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>University of Georgia
> >>>>>>
> >>>>>>
> >>>>>>>>>>>>----- Original Message -----
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> >>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>>>
> >>>>>>
> >>>>>>>>>>>How do you know two versions are installed (i.e. how are
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>you
> >>>
> >>>
> >>>>checking
> >>>>
> >>>>
> >>>>>>the
> >>>>>>
> >>>>>>
> >>>>>>>version)?  Do you see have two complete bioperl distributions (in
> >>>>>>>
> >>>>>>>
> >>>>two
> >>>>
> >>>>
> >>>>>>>separate directories) or are you looking in modules?  Here's the
> >>>>>>>
> >>>>>>>
> >>>way
> >>>
> >>>
> >>>>to
> >>>>
> >>>>
> >>>>>>>check the version (from the FAQ):
> >>>>>>>
> >>>>>>>
> >>>>>>>perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> >>>>
> >>>>
> >>>>>>>>If you have two full bioperl distributions on your computer,
> >>>>>>>>
> >>>>>>>>
> >>>>normally
> >>>>
> >>>>
> >>>>>>only
> >>>>>>
> >>>>>>
> >>>>>>>one will be in use unless you have explicitly set the environment
> >>>>>>>
> >>>>>>>
> >>>>>>variable
> >>>>>>
> >>>>>>
> >>>>>>>PERL5LIB.  The PERL5LIB  directories will be searched first before
> >>>>>>>
> >>>>>>>
> >>>>your
> >>>>
> >>>>
> >>>>>>>normal perl directory list (@INC) is searched.  You MAY get some
> >>>>>>>
> >>>>>>>
> >>>>mixing
> >>>>
> >>>>
> >>>>>>>then, but only if perl can't find a particular module in the path
> >>>>>>>
> >>>>>>>
> >>>>>>designated
> >>>>>>
> >>>>>>
> >>>>>>>in PERL5LIB; then it will progress through the directories listed
> >>>>>>>
> >>>>>>>
> >>>in
> >>>
> >>>
> >>>>>>@INC.
> >>>>>>
> >>>>>>
> >>>>>>>This may happen if a module is unique to a particular release, but
> >>>>>>>
> >>>>>>>
> >>>>>>shouldn't
> >>>>>>
> >>>>>>
> >>>>>>>happen for the majority of modules, including RemoteBlast.  You
> >>>>>>>
> >>>>>>>
> >>>can
> >>>
> >>>
> >>>>>>check
> >>>>>>
> >>>>>>
> >>>>>>>what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will
> >>>>>>>
> >>>>>>>
> >>>>differ
> >>>>
> >>>>
> >>>>>>>depending on your OS, perl build, etc.
> >>>>>>>
> >>>>>>>
> >>>>>>>>Regardless, if you follow the directions for installing bioperl
> >>>>>>>>
> >>>>>>>>
> >>>>for
> >>>>
> >>>>
> >>>>>>your
> >>>>>>
> >>>>>>
> >>>>>>>system ('perl Makefile.PL', 'make', 'make test', 'make install',
> >>>>>>>
> >>>>>>>
> >>>>unless
> >>>>
> >>>>
> >>>>>>you
> >>>>>>
> >>>>>>
> >>>>>>>explicitly change the installation directory when using 'perl
> >>>>>>>
> >>>>>>>
> >>>>>>Makefile.PL'),
> >>>>>>
> >>>>>>
> >>>>>>>then 'uninstalling' Bioperl shouldn't be a problem as it will
> >>>>>>>
> >>>>>>>
> >>>>install
> >>>>
> >>>>
> >>>>>>the
> >>>>>>
> >>>>>>
> >>>>>>>Bioperl distribution you downloaded over the old version in @INC.
> >>>>>>>
> >>>>>>>
> >>>>See
> >>>>
> >>>>
> >>>>>>this
> >>>>>>
> >>>>>>
> >>>>>>>page:
> >>>>>>>
> >>>>>>>
> >>>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> >>>>>>>>for more details.
> >>>>>>>>Christopher Fields
> >>>>>>>>
> >>>>>>>>
> >>>>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>>>Dept. of Biochemistry
> >>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>
> >>>>>>>
> >>>>>>>>>>-----Original Message-----
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> >>>>>>>>Sent: Monday, February 13, 2006 12:32 PM
> >>>>>>>>To: bioperl-l at lists.open-bio.org
> >>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>Hi, Chris,
> >>>>>>>>
> >>>>>>>>I do have different versions of bioperl on my Linux machine (1.4 and
> >>>>>>>>1.5.0); this may be the problem. Should I just install bioperl-1.5.1,
> >>>>>>>>or do I need to uninstall and remove the previous versions first? I
> >>>>>>>>could not find any hint on uninstalling bioperl on Linux. Could you
> >>>>>>>>please give me some suggestions?
> >>>>>>>>Thanks,
> >>>>>>>>Guojun
> >>>>>>>>
> >>>>>>>>Department of Plant Biology
> >>>>>>>>University of Georgia
> >>>>>>>>      _____
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>  From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> >>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500
> >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> >>>>>>>>
> >>>>>>>>
> >>>>>>version
> >>>>>>
> >>>>>>
> >>>>>>>>1.28
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>If you're using RemoteBlast 1.28, then you've likely updated from
> >>>>>>>>CVS, which isn't the latest fix.
> >>>>>>>>
> >>>>>>>>Make sure that you check the following:
> >>>>>>>>
> >>>>>>>>1) Always post to the mailing list:
> >>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> >>>>>>>>
> >>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> >>>>>>>>installed first.  Perform a clean installation; do not upgrade only
> >>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> >>>>>>>>guarantee that mixing modules from old and new distributions (1.4
> >>>>>>>>and 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> >>>>>>>>installation will allow text output from BLAST v.2.2.12 to be saved
> >>>>>>>>and parsed; it will not parse the newest BLAST text output from NCBI
> >>>>>>>>(v2.2.13) but it should still save it. I believe as long as
> >>>>>>>>next_result() isn't called, it will work.
> >>>>>>>>
> >>>>>>>>3) The bug fixes for the above issue with parsing BLAST 2.2.13 text
> >>>>>>>>output are NOT in CVS; they haven't been cleared and checked in by
> >>>>>>>>Roger Hall (who's now taking care of RemoteBlast) and the powers that
> >>>>>>>>be (Jason or whoever is in charge of Bio::SearchIO).  They can be
> >>>>>>>>found in Bugzilla:
> >>>>>>>>
> >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> >>>>>>>>
> >>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> >>>>>>>>saving XML output, so isn't necessary if you don't plan on using this
> >>>>>>>>option.  And, remember, they haven't been committed yet to CVS, which
> >>>>>>>>means that the final version will change to reflect the new version.
> >>>>>>>>
> >>>>>>>>Christopher Fields
> >>>>>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>>>>Dept. of Biochemistry
> >>>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>    _____
> >>>>>>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>Sent: Monday, February 13, 2006 9:26 AM
> >>>>>>>>To: Chris Fields
> >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> >>>>>>>>
> >>>>>>>>
> >>>>>>version
> >>>>>>
> >>>>>>
> >>>>>>>>1.28
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>Hi, Chris
> >>>>>>>>
> >>>>>>>>Thanks for your suggestion, however, it doesn't seem to work for my
> >>>>>>>>cgi even after I replace both blast.pm and RemoteBlast.pm. I didn't
> >>>>>>>>even get any RID. Is there any suggestion?
> >>>>>>>>
> >>>>>>>>Guojun
> >>>>>>>>
> >>>>>>>>Guojun Yang
> >>>>>>>>Department of Plant Biology
> >>>>>>>>University of Georgia
> >>>>>>>>Tel: 706-542-1857
> >>>>>>>>Fax: 706-542-1805
> >>>>>>>>http://www.arches.uga.edu/~guojun
> >>>>>>>>    _____
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> >>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500
> >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> >>>>>>>>
> >>>>>>>>
> >>>>>>version
> >>>>>>
> >>>>>>
> >>>>>>>>1.28
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>I would say give the new code a try, but realize that it hasn't been
> >>>>>>>>checked in (like I said below). I will try going over the modified
> >>>>>>>>Bio::SearchIO::blast again this weekend to see if there is anything I
> >>>>>>>>might have missed. The changed order in the header of BLAST text
> >>>>>>>>output has me a bit worried that it might not catch everything, but
> >>>>>>>>it at least doesn't hang in the while() loop I described in the bug
> >>>>>>>>report below (bug #1934) and seems to process everything fine.
> >>>>>>>>
> >>>>>>>>If you want more stability in the code, you might consider changing
> >>>>>>>>over to XML output and parsing with Bio::SearchIO::blastxml. There
> >>>>>>>>are some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that
> >>>>>>>>accommodate saving XML output, but I believe it parses everything
> >>>>>>>>regardless. If you look back over the last month or so there has been
> >>>>>>>>a bit of discussion here about it. Jason describes a bit on how to
> >>>>>>>>set up RemoteBlast for XML:
> >>>>>>>>
> >>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> >>>>>>>>
> >>>>>>>>Christopher Fields
> >>>>>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>>>>Dept. of Biochemistry
> >>>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>-----Original Message-----
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> >>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM
> >>>>>>>>>To: bioperl-l at bioperl.org
> >>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> >>>>>>>>>
> >>>>>>>>>
> >>>>version
> >>>>
> >>>>
> >>>>>>1.28
> >>>>>>
> >>>>>>
> >>>>>>>>>Hi, Everybody,
> >>>>>>>>>I see this post and am wondering if this is the reason for the
> >>>>>>>>>malfunctioning of my webserver. We set up a webserver named MAK, for
> >>>>>>>>>MITE sequence analysis. It was working very well until around
> >>>>>>>>>November 2005, when it stopped returning any result (the site is
> >>>>>>>>>fine and seems to be doing something after submission). In the CGI
> >>>>>>>>>script, I used remoteblast (that work was done in 2003) to do
> >>>>>>>>>searches. I currently do not have access to the server because I
> >>>>>>>>>moved. Quite a few people sent emails to us about its
> >>>>>>>>>malfunctioning. Is there any suggestion on fixing the problem?
> >>>>>>>>>Should I simply ask that the remoteblast.pm be replaced with the new
> >>>>>>>>>version?
> >>>>>>>>>Thanks a lot,
> >>>>>>>>>Guojun
> >>>>>>>>>
> >>>>>>>>>Department of Plant Biology
> >>>>>>>>>University of Georgia
> >>>>>>>>>Tel: 706-542-1857
> >>>>>>>>>Fax: 706-542-1805
> >>>>>>>>>http://www.arches.uga.edu/~guojun
> >>>>>>>>>_____
> >>>>>>>>>
> >>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
> >>>>>>>>>
> >>>>>>>>>
> >>>>Jian'
> >>>>
> >>>>
> >>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> >>>>>>>>>
> >>>>>>>>>
> >>>[mailto:bioperl-
> >>>
> >>>
> >>>>>>>>>l at bioperl.org]
> >>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500
> >>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> >>>>>>>>>
> >>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS.
> >>>>>>>>>It will work for saving text output. However, it will not parse
> >>>>>>>>>anything using next_result (it will likely hang) and will not save
> >>>>>>>>>XML format. See these bugs:
> >>>>>>>>>
> >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> >>>>>>>>>
> >>>>>>>>>for explanations and possible fixes (changes to RemoteBlast and
> >>>>>>>>>Bio::SearchIO::blast). Note that these haven't been checked in yet
> >>>>>>>>>so are still not included in bioperl-live; they may be further
> >>>>>>>>>modified before committing to CVS. If you're not worried about XML,
> >>>>>>>>>you could just try the first fix, which is a change to
> >>>>>>>>>SearchIO::blast.
> >>>>>>>>>
> >>>>>>>>>Nagesh, I remember you posting to the list a month ago using a
> >>>>>>>>>script which had problems; the script you used saves the output but
> >>>>>>>>>doesn't actually parse it (i.e. you don't use next_result() to go
> >>>>>>>>>through the data). Is the version of BLAST in your text output
> >>>>>>>>>2.2.12 or 2.2.13? Have you tried parsing the output using
> >>>>>>>>>"-readmethod => SearchIO" or "-readmethod => blast" using your
> >>>>>>>>>version of RemoteBlast and method next_result()? Like below (from
> >>>>>>>>>perldoc):
> >>>>>>>>>
> >>>>>>>>>while ( my @rids = $factory->each_rid ) {
> >>>>>>>>>    foreach my $rid ( @rids ) {
> >>>>>>>>>        my $rc = $factory->retrieve_blast($rid);
> >>>>>>>>>        if( !ref($rc) ) {
> >>>>>>>>>            if( $rc < 0 ) {
> >>>>>>>>>                $factory->remove_rid($rid);
> >>>>>>>>>            }
> >>>>>>>>>            print STDERR "." if ( $v > 0 );
> >>>>>>>>>            sleep 5;
> >>>>>>>>>        } else { # parsing starts here
> >>>>>>>>>            my $result = $rc->next_result(); # it should hang here
> >>>>>>>>>            #save the output
> >>>>>>>>>            my $filename = $result->query_name()."\.out";
> >>>>>>>>>            $factory->save_output($filename);
> >>>>>>>>>            $factory->remove_rid($rid);
> >>>>>>>>>            print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>>>>>>            while ( my $hit = $result->next_hit ) {
> >>>>>>>>>                next unless ( $v > 0);
> >>>>>>>>>                print "\thit name is ", $hit->name, "\n";
> >>>>>>>>>                while( my $hsp = $hit->next_hsp ) {
> >>>>>>>>>                    print "\t\tscore is ", $hsp->score, "\n";
> >>>>>>>>>                }
> >>>>>>>>>            }
> >>>>>>>>>        }
> >>>>>>>>>    }
> >>>>>>>>>}
> >>>>>>>>>
> >>>>>>>>>My script hung if I used next_result() in any way prior to the
> >>>>>>>>>fixes. I want to see how many others are having the same issues with
> >>>>>>>>>parsing using the CVS version of bioperl-live.
> >>>>>>>>>
> >>>>>>>>>Christopher Fields
> >>>>>>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>>>>>Dept. of Biochemistry
> >>>>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>-----Original Message-----
> >>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> >>>>>>>>>>
> >>>>>>>>>>
> >>>l-
> >>>
> >>>
> >>>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> >>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM
> >>>>>>>>>>To: Huang Jian; bioperl-l
> >>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> >>>>>>>>>>
> >>>>>>>>>>Hi Huang,
> >>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm works
> >>>>>>>>>>on the logic of checking the temporary file size to determine
> >>>>>>>>>>whether the Blast results are ready. This condition is no longer
> >>>>>>>>>>being satisfied, possibly due to some changes brought about by
> >>>>>>>>>>NCBI. I had this problem recently and figured out that the solution
> >>>>>>>>>>was to use the latest version, which has this problem fixed (it
> >>>>>>>>>>does not use the file size logic any more) but is not yet included
> >>>>>>>>>>in the BioPerl package.
> >>>>>>>>>>Cheers
> >>>>>>>>>>Nagesh
> >>>>>>>>>>
> >>>>>>>>>>Huang Jian wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>Dear Nagesh,
> >>>>>>>>>>>
> >>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you
> >>>>>>>>>>>sent me. Now it works perfectly!!!
> >>>>>>>>>>>
> >>>>>>>>>>>Thank you!!
> >>>>>>>>>>>
> >>>>>>>>>>>Huang
> >>>>>>>>>>>
> >>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka"
> >>>>>>>>>>>To: "Huang Jian" ; "bioperl-l"
> >>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM
> >>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so
> >>>>>>>>>>>still via email
> >>>>>>>>>>>
> >>>>>>>>>>>>Hi Huang,
> >>>>>>>>>>>>I see that you are submitting a sequence for a remote blast
> >>>>>>>>>>>>search. Can you check if the RemoteBlast.pm being used is v 1.28
> >>>>>>>>>>>>(2005/12/09)? If not, I have attached it to this email; try
> >>>>>>>>>>>>replacing the old one, which has a bug, with it.
> >>>>>>>>>>>>Let me know if it works.
> >>>>>>>>>>>>Nagesh
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>_______________________________________________
> >>>>>>>>>>Bioperl-l mailing list
> >>>>>>>>>>Bioperl-l at lists.open-bio.org
> >>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>>>>
> >>>>>>>>>>
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at lists.open-bio.org
> >http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> 
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From valiente at lsi.upc.edu  Mon Feb 20 13:51:35 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Mon, 20 Feb 2006 19:51:35 +0100
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <43FA0FB7.6060904@lsi.upc.edu>

The local flat file implementation of Bio::DB::Taxonomy seems to be fine:

use Bio::DB::Taxonomy;
my $nodesfile = "nodes.dmp";
my $namesfile = "names.dmp";
my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
                               -nodesfile => $nodesfile,
                               -namesfile => $namesfile);
my $taxonid = $db->get_taxonid('Homo sapiens');

Here, $taxonid is 9606. However,

my $species = $db->get_Taxonomy_Node(-taxonid => $taxonid);

raises:

-------------------- WARNING ---------------------
MSG: can't create a species object for Homo sapiens (human) because it isn't a species but is a '' instead
---------------------------------------------------
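
One way to narrow this down, offered only as a sketch: the warning reports an empty rank (''), so it may help to confirm what rank nodes.dmp itself records for taxon 9606. The helper below, rank_for_taxid, is a hypothetical name and not part of Bio::DB::Taxonomy. It assumes the standard NCBI taxdump layout, in which fields are separated by tab-pipe-tab, each line ends with tab-pipe, and the third field is the rank.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): return the rank that an
# NCBI-style nodes.dmp records for one taxon id, or undef if the id is absent.
# Assumed layout: tax_id <TAB>|<TAB> parent_id <TAB>|<TAB> rank <TAB>|<TAB> ...
sub rank_for_taxid {
    my ($nodesfile, $taxonid) = @_;
    open(my $fh, '<', $nodesfile) or die "Cannot open $nodesfile: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;                        # drop the line terminator
        my ($id, $parent, $rank) = split /\t\|\t/, $line;
        next unless defined $id && $id == $taxonid;
        close $fh;
        return $rank;
    }
    close $fh;
    return;    # taxon id not present in the file
}
```

Calling rank_for_taxid($nodesfile, $taxonid) with the variables above would show whether the flat file carries a rank for that node. If it does, the empty rank is more likely lost while the node object is built than missing from the dump.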

Thanks,

Gabriel



From boris.steipe at utoronto.ca  Mon Feb 20 13:40:19 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Mon, 20 Feb 2006 13:40:19 -0500
Subject: [Bioperl-l] Matrix Average Code / Module ?
In-Reply-To: <59825.192.168.1.176.1140416461.squirrel@192.168.1.176>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<76f031ae0602190552v5f2542dbv@mail.gmail.com>
	<59825.192.168.1.176.1140416461.squirrel@192.168.1.176>
Message-ID: <92CF0104-0524-4BA3-B039-3CEECF68E20B@utoronto.ca>

Assuming you mean the arithmetic average of all elements in a matrix,  
you could do the following (using your numbers):


#!/usr/bin/perl -w
use strict;

my @matrix;

push(@matrix, [(11,22,43,54,50)]); # [(...)]: a list passed as an anonymous array
push(@matrix, [(27,87,74,32,10)]);
push(@matrix, [(66,58,98,78,20)]);
push(@matrix, [(22,23,44,16,34)]);

my $sum = 0;
my $number = 0;

foreach my $row (@matrix) {
     foreach my $element (@{$row}){
         $sum += $element;
         $number++;
     }
}

print "Average of $number elements = ", $sum/$number,"\n";
exit;
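
The same calculation can also be wrapped in a small reusable function. This is just a sketch: matrix_mean is a made-up name, and List::Util (a core module) does the summing. It returns undef for an empty matrix rather than dividing by zero.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Arithmetic mean of all elements of a matrix given as a list of array refs.
# Returns undef when there are no elements at all.
sub matrix_mean {
    my @rows     = @_;
    my @elements = map { @{$_} } @rows;    # flatten every row into one list
    return unless @elements;
    return sum(@elements) / scalar @elements;
}

my @matrix = (
    [11, 22, 43, 54, 50],
    [27, 87, 74, 32, 10],
    [66, 58, 98, 78, 20],
    [22, 23, 44, 16, 34],
);

print "Average = ", matrix_mean(@matrix), "\n";
```

Rows of different lengths are handled too, since the function only counts the elements that are actually present.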


HTH,

B.




On 20 Feb 2006, at 01:21, Shameer Khadar wrote:

> Hi all,
> Is there any program/module to calculate the average of a BLOSUM/PAM (or
> any other) matrix?
>
> I have a matrix and I need to see the average
>
> for example
>
> 11 22 43 54 50
> 27 87 74 32 10
> 66 58 98 78 20
> 22 23 44 16 34
>
> I have gone through Bio::Matrix::MatrixI and Bio::Matrix::GenericMatrix
> and other perl modules like Math::Matrix
> http://search.cpan.org/~ulpfr/Math-Matrix-0.4/Matrix.pm
> and Math::Cephes::Matrix - but none of them have a provision to do matrix
> average calculation.
>
> Any help ???
> thanks in advance,
> Happy biocomputing !!!
>
>
> -- 
> Shameer Khadar
> National Centre for Biological Sciences (TIFR)
> UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
> T - 91-080-23636420-32 EXT 4241
> F - 91-080-23636662/23636675
> W - http://www.ncbs.res.in
> --------------------------------------------------
> "Refrain from illusions, insist on work and not words,
>  patiently seek divine and scientific truth."
> MM
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From cjfields at uiuc.edu  Mon Feb 20 17:01:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 20 Feb 2006 16:01:15 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on
	RemoteBlast.pmversion 1.28
In-Reply-To: <000e01c6363f$494bc5e0$15327e82@pyrimidine>
Message-ID: <000001c63669$2bf06a80$15327e82@pyrimidine>

Guojun Yang pointed out that his BLAST output was still not parsed
correctly, so I posted another change:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

The direct link for the module is:

http://bugzilla.bioperl.org/attachment.cgi?id=289&action=view

Note that all caveats (can't sue if computer blows up, this is a very
preliminary bugfix, etc.) apply.

Apparently, NCBI has changed blastn and tblastx output to show features in
the region for each HSP, starting with either one of the following lines:

 Features in this part of subject sequence:
 Features flanking this part of subject sequence:

If you're using a Bio::SearchIO::blast from before Dec. 2005 on BLAST 2.2.13
output, most blastn or tblastx report parsing seems to choke on these lines,
unless you are pretty lucky.  This extra little feature was introduced a
while back for large contigs and chromosomes (~BLAST 2.2.10) but was not set
by default and hadn't started affecting web output until this last fall.
The first fix I posted caught only the first version but not the second.
The fix included a loop with debugging output to bypass this for now.  If
you use SearchIO directly for parsing (not through RemoteBlast) you can see
the bypassed lines by setting the '-verbose' flag to 1.
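
To picture what bypassing these lines means, here is a rough standalone sketch. It is not the code attached to bug 1934; strip_feature_blocks is a hypothetical helper, and it assumes each feature block runs from its header line up to (but not including) the next " Score = " line, as in the reports quoted in this thread.

```perl
use strict;
use warnings;

# Hypothetical pre-filter: drop the NCBI feature annotation blocks from the
# lines of a BLAST text report. Assumption: a block starts at a
# "Features in/flanking this part of subject sequence:" header and ends
# just before the next line that begins with "Score =".
sub strip_feature_blocks {
    my @lines = @_;
    my @kept;
    my $in_features = 0;
    for my $line (@lines) {
        if ($line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/) {
            $in_features = 1;    # header found: start skipping
            next;
        }
        # The HSP score line marks the end of the feature block.
        $in_features = 0 if $in_features && $line =~ /^\s*Score\s*=/;
        push @kept, $line unless $in_features;
    }
    return @kept;
}
```

If a feature header were ever not followed by a score line, this sketch would drop the rest of the input, so it is only meant to illustrate the idea. To see the bypassed lines with the real parser, the verbose flag goes to the constructor, e.g. Bio::SearchIO->new(-format => 'blast', -file => 'report.out', -verbose => 1), where report.out is a placeholder file name.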

Thanks to Guojun Yang for pointing this out.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Monday, February 20, 2006 11:01 AM
> To: 'Pieter Monsieurs'; gyang at plantbio.uga.edu
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on
> RemoteBlast.pmversion 1.28
> 
> I have added a preliminary bugfix for the problems seen with nucleotide
> blast parsing for BLAST 2.2.13 reports.  I passed SearchIO::blast through
> perltidy to space out the blocks (really for my own purposes; it's a pretty
> complex module).  The fix bypasses the extra lines output for blastn and
> tblastx and now seems to parse the text output for those reports correctly.
> I tested it using all NCBI BLAST flavors for the last two versions of BLAST
> (2.2.12 and 2.2.13); the fix was simple so shouldn't break any other BLAST
> report parsing, such as WU-BLAST, RPS-BLAST, or Paracel.  It has only been
> tested on MacOSX at the moment, so I need people out there to test it out on
> anything they can to make sure it works before committing.  I'll be trying
> it on Windows today.  Report back to me and I'll post anything on bugzilla.
> 
> Here it is:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> 
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> > Sent: Thursday, February 16, 2006 3:46 AM
> > To: gyang at plantbio.uga.edu
> > Cc: bioperl-l at lists.open-bio.org; Chris Fields
> > Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on
> RemoteBlast.pm
> > version 1.28
> >
> > Hi,
> >
> > I have the same problem with the blast.pm file.
> > The people at NCBI added some extra information to the BLAST output
> > (see e.g. "Features flanking this part..." or "Features in this part
> > ..."); an example is added below.
> > The blast.pm module starts looking for the HSP alignment information,
> > but it dies when it hits this feature information.
> >
> > Pieter
> >
> >
> > >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> > chromosome 12, complete sequence
> > Length=27492551
> >
> >  Features flanking this part of subject sequence:
> >    3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
> >    2655 bp at 3' side: hypothetical protein
> >
> >  Score = 36.2 bits (18),  Expect = 0.22
> >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> >  Strand=Plus/Minus
> >
> > Query  4         GTACTACTCTACTCTACT  21
> >                  ||||||||||||||||||
> > Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> >
> >
> >  Features flanking this part of subject sequence:
> >    2991 bp at 5' side: hypothetical protein
> >    1131 bp at 3' side: hypothetical protein
> >
> >  Score = 36.2 bits (18),  Expect = 0.22
> >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> >  Strand=Plus/Minus
> >
> > Query  2         ATGTACTACTCTACTCTA  19
> >                  ||||||||||||||||||
> > Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> >
> >
> >  Features in this part of subject sequence:
> >    DHHC zinc finger domain, putative
> >
> >  Score = 34.2 bits (17),  Expect = 0.87
> >  Identities = 17/17 (100%), Gaps = 0/17 (0%)
> >  Strand=Plus/Plus
> >
> > Query  5         TACTACTCTACTCTACT  21
> >                  |||||||||||||||||
> > Sbjct  17616437  TACTACTCTACTCTACT  17616453
> >
> >
> >  Features flanking this part of subject sequence:
> >    102 bp at 5' side: bZIP transcription factor, putative
> >    3740 bp at 3' side: yeast dcp1, putative
> >
> >  Score = 32.2 bits (16),  Expect = 3.4
> >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> >  Strand=Plus/Plus
> >
> > Query  7        CTACTCTACTCTACTC  22
> >                 ||||||||||||||||
> > Sbjct  2775880  CTACTCTACTCTACTC  2775895
> >
> >
> >  Features flanking this part of subject sequence:
> >    21 bp at 5' side: peptide transporter T17F3.11, putative
> >    10230 bp at 3' side: transposon protein, putative, unclassified
> >
> >  Score = 32.2 bits (16),  Expect = 3.4
> >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> >  Strand=Plus/Minus
> >
> > Query  7         CTACTCTACTCTACTC  22
> >                  ||||||||||||||||
> > Sbjct  27323153  CTACTCTACTCTACTC  27323138
> >
> >
> >
> > Guojun Yang wrote:
> >
> > >Hi, Chris,
> > >Finally the remoteblast test script works for the amino.fa query, but
> > >when I try a nucleic acid sequence (see below), an error occurs:
> > >"
> > >waiting........
> > >------------- EXCEPTION  -------------
> > >MSG: no data for midline  Features flanking this part of subject sequence:
> > >STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> > >STACK toplevel remoteblast_test:40
> > >"
> > >The query sequence is:
> > >CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> > >GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> > >AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> > >AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> > >
> > >The script (basically same as the remoteblast test, I only changed
> > >database to 'nr' and program to 'blastn' and filename to 'ost3'):
> > >#!/usr/bin/perl
> > >
> > >use Bio::SeqIO;
> > >use Bio::Seq;
> > >use Bio::Tools::Run::RemoteBlast;
> > >use Bio::SearchIO;
> > >use strict;
> > >my $prog='blastn';
> > >my $db='nr';
> > >my $e_val=1e-10;
> > >my @params=( -prog=>$prog,
> > >	-data=>$db,
> > >	-expect=>$e_val,
> > >	-readmethod=>'SearchIO');
> > >my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >
> > >my $v = 1;
> > >
> > >my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> > >
> > >while (my $input = $str->next_seq()){
> > >  #Blast a sequence against a database:
> > >  #Alternatively, you could  pass in a file with many
> > >  #sequences rather than loop through sequence one at a time
> > >  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >  #and swap the two lines below for an example of that.
> > >  my $r = $factory->submit_blast($input);
> > >  #my $r = $factory->submit_blast('amino.fa');
> > >  print STDERR "waiting..." if( $v > 0 );
> > >  while ( my @rids = $factory->each_rid ) {
> > >    foreach my $rid ( @rids ) {
> > >      my $rc = $factory->retrieve_blast($rid);
> > >      if( !ref($rc) ) {
> > >        if( $rc < 0 ) {
> > >          $factory->remove_rid($rid);
> > >        }
> > >        print STDERR "." if ( $v > 0 );
> > >        sleep 5;
> > >      } else {
> > >        my $result = $rc->next_result();
> > >        #save the output
> > >        my $filename = $result->query_name()."\.out";
> > >        $factory->save_output($filename);
> > >        $factory->remove_rid($rid);
> > >        print "\nQuery Name: ", $result->query_name(), "\n";
> > >        while ( my $hit = $result->next_hit ) {
> > >          next unless ( $v > 0);
> > >          print "\thit name is ", $hit->name, "\n";
> > >          while( my $hsp = $hit->next_hsp ) {
> > >            print "\t\tscore is ", $hsp->score, "\n";
> > >          }
> > >        }
> > >      }
> > >    }
> > >  }
> > >}
> > >
> > >
> > >Do you think there might still be something in the NCBI output format?
> > >
> > >Thank you,
> > >Guojun
> > >
> > >
> > >
> > >
> > >Guojun Yang
> > >Department of Plant Biology
> > >University of Georgia
> > >Tel: 706-542-1857
> > >Fax: 706-542-1805
> > >http://www.arches.uga.edu/~guojun
> > >
> > >
> > >
> > >----- Original Message -----
> > >From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >
> > >
> > >
> > >
> > >>Sorry, forgot to add that I didn't see the regex issue that you
> > mentioned.
> > >>It could be a perl-related issue.  Try the fixes I mentioned and see
> > what
> > >>happens.
> > >>
> > >>
> > >>>Christopher Fields
> > >>>
> > >>>
> > >>Postdoctoral Researcher - Switzer Lab
> > >>Dept. of Biochemistry
> > >>University of Illinois Urbana-Champaign
> > >>
> > >>
> > >>>>>-----Original Message-----
> > >>>>>
> > >>>>>
> > >>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>Sent: Tuesday, February 14, 2006 12:36 PM
> > >>>To: 'gyang at plantbio.uga.edu'
> > >>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>>
> > >>>
> > >>>>>It's a good habit to always add single quotes around words.  The
> perl
> > >>>>>
> > >>>>>
> > >>>interpreter may think a single bare word is a subroutine or perlfunc
> > >>>called with no args so will try to find a subroutine named blastp().
> > My
> > >>>debugger actually gives the error that the bare word blastp may
> > conflict
> > >>>with a future reserved word.  Like you said, 'use strict' will point
> > that
> > >>>out.
> > >>>
> > >>>
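The point about barewords and dashed flags can be checked without BioPerl. The following is a minimal sketch in plain Perl; the helper names compiles_under_strict and flag_names are hypothetical and exist only for this illustration. It compiles a snippet in a string eval so the strict-mode error is caught rather than fatal, and it shows that a flag written as -prog => ... reaches the callee as the string "-prog".

```perl
use strict;
use warnings;

# Compile a snippet with a string eval so that a strict-mode compile error
# is caught instead of aborting the script. Returns 1 if the snippet
# compiled and ran, 0 otherwise. (Hypothetical helper for illustration.)
sub compiles_under_strict {
    my ($code) = @_;
    my $ok = eval "use strict; $code; 1;";
    return $ok ? 1 : 0;
}

# Return the sorted keys of a flag list such as (-prog => 'blastp').
# (Hypothetical helper for illustration.)
sub flag_names {
    my (%args) = @_;
    return sort keys %args;
}

print "bareword: ",
    ( compiles_under_strict('my $prog = blastp') ? "compiles" : "rejected" ), "\n";
print "quoted:   ",
    ( compiles_under_strict(q{my $prog = 'blastp'}) ? "compiles" : "rejected" ), "\n";
print "flags:    ",
    join( ' ', flag_names( -prog => 'blastp', -data => 'swissprot' ) ), "\n";
```

The fat comma quotes the identifier to its left, so the dashed flags need no extra quoting even under 'use strict'; only the values (blastp, swissprot) do.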
> > >>>>>As for the regex, it should match all the blast programs at NCBI
> > (blastp,
> > >>>>>
> > >>>>>
> > >>>blastn, blastx, tblastn, tblastx) and is built-in to make sure
> nothing
> > >>>else passes through.
> > >>>
> > >>>
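To see what the expression t?blast[pnx] from the error message accepts, here is a small sketch in plain Perl. It assumes the pattern is applied to the whole program name (the ^ and $ anchors are an assumption, not taken from RemoteBlast.pm), and is_valid_program is a hypothetical name used only here.

```perl
use strict;
use warnings;

# Pattern quoted in the thread. Anchoring it to the whole string is an
# assumption made for this sketch.
my $program_re = qr/^t?blast[pnx]$/;

# Return 1 if the name matches the pattern, 0 otherwise.
# (Hypothetical helper for illustration.)
sub is_valid_program {
    my ($name) = @_;
    return ( defined($name) && $name =~ $program_re ) ? 1 : 0;
}

for my $name (qw(blastp blastn blastx tblastn tblastx megablast)) {
    printf "%-10s %s\n", $name,
        is_valid_program($name) ? 'accepted' : 'rejected';
}
```

Note that the pattern is slightly broader than the five programs listed above: a name such as tblastp would also match it.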
> > >>>>>So, if you are using the script below, there are several errors.
> The
> > bare
> > >>>>>
> > >>>>>
> > >>>words for $prog and $db need quotes, and the flags for your @params
> > array
> > >>>don't have a dash before them.  I get this after adding quotes but
> > before
> > >>>adding the dashes to @params:
> > >>>
> > >>>
> > >>>>>C:\Perl\Scripts>test_blast.pl
> > >>>>>------------- EXCEPTION: Bio::Root::Exception -------------
> > >>>>>
> > >>>>>
> > >>>MSG:
> > >>>STACK: Error::throw
> > >>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> > >>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > >>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> > >>>STACK: C:\Perl\Scripts\test_blast.pl:15
> > >>>-----------------------------------------------------------
> > >>>
> > >>>
> > >>>>>The last line indicates a problem with this line:
> > >>>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>>>>Changing the @params to this:
> > >>>>>my @params=( -prog=>$prog,
> > >>>>>
> > >>>>>
> > >>>	-data=>$db,
> > >>>	-expect=>$e_val,
> > >>>	-readmethod=>'SearchIO');
> > >>>
> > >>>
> > >>>>>fixes it, and I get output as expected.
> > >>>>>Christopher Fields
> > >>>>>
> > >>>>>
> > >>>Postdoctoral Researcher - Switzer Lab
> > >>>Dept. of Biochemistry
> > >>>University of Illinois Urbana-Champaign
> > >>>
> > >>>
> > >>>>>>>>-----Original Message-----
> > >>>>>>>>
> > >>>>>>>>
> > >>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > >>>>Sent: Tuesday, February 14, 2006 11:48 AM
> > >>>>To: Chris Fields; bioperl-l at lists.open-bio.org
> > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>>>
> > >>>>Hi, Chris,
> > >>>>When I tried with the perldoc script, It did not work either. First
> it
> > >>>>says $prog can not be bare word if I "use strict". I added quotes on
> > the
> > >>>>words, then it says the value for $prog does not match expression
> > >>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The
> > >>>>
> > >>>>
> > >>>script
> > >>>
> > >>>
> > >>>>is shown below. Why is the expression "t?blast[pnx]"?
> > >>>>
> > >>>>#!/usr/bin/perl
> > >>>>
> > >>>>use Bio::SeqIO;
> > >>>>use Bio::Seq;
> > >>>>use Bio::Tools::Run::RemoteBlast;
> > >>>>use Bio::SearchIO;
> > >>>>
> > >>>>
> > >>>>my $prog=blastp;
> > >>>>my $db=swissprot;
> > >>>>my $e_val=1e-10;
> > >>>>my @params=( prog=>$prog,
> > >>>>	data=>$db,
> > >>>>	expect=>$e_val,
> > >>>>	readmethod=>'SearchIO');
> > >>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>>>
> > >>>>my $v = 1;
> > >>>>
> > >>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >>>>
> > >>>>while (my $input = $str->next_seq()){
> > >>>>  #Blast a sequence against a database:
> > >>>>  #Alternatively, you could  pass in a file with many
> > >>>>  #sequences rather than loop through sequence one at a time
> > >>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>>>  #and swap the two lines below for an example of that.
> > >>>>  my $r = $factory->submit_blast($input);
> > >>>>  #my $r = $factory->submit_blast('amino.fa');
> > >>>>  print STDERR "waiting..." if( $v > 0 );
> > >>>>  while ( my @rids = $factory->each_rid ) {
> > >>>>    foreach my $rid ( @rids ) {
> > >>>>      my $rc = $factory->retrieve_blast($rid);
> > >>>>      if( !ref($rc) ) {
> > >>>>        if( $rc < 0 ) {
> > >>>>          $factory->remove_rid($rid);
> > >>>>        }
> > >>>>        print STDERR "." if ( $v > 0 );
> > >>>>        sleep 5;
> > >>>>      } else {
> > >>>>        my $result = $rc->next_result();
> > >>>>        #save the output
> > >>>>        my $filename = $result->query_name()."\.out";
> > >>>>        $factory->save_output($filename);
> > >>>>        $factory->remove_rid($rid);
> > >>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>        while ( my $hit = $result->next_hit ) {
> > >>>>          next unless ( $v > 0);
> > >>>>          print "\thit name is ", $hit->name, "\n";
> > >>>>          while( my $hsp = $hit->next_hsp ) {
> > >>>>            print "\t\tscore is ", $hsp->score, "\n";
> > >>>>          }
> > >>>>        }
> > >>>>      }
> > >>>>    }
> > >>>>  }
> > >>>>}
> > >>>>
> > >>>>Thank you for your help!
> > >>>>
> > >>>>
> > >>>>Guojun
> > >>>>Department of Plant Biology
> > >>>>University of Georgia
> > >>>>
> > >>>>----- Original Message -----
> > >>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>To: gyang at plantbio.uga.edu
> > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>>Try two things:
> > >>>>>
> > >>>>>
> > >>>>>>1)  Use a much simpler script, like the one in 'perldoc
> > >>>>>>
> > >>>>>>
> > >>>>>Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something
> > >>>>>
> > >>>>>
> > >>>>wrong
> > >>>>
> > >>>>
> > >>>>>with the logic in your subroutine:
> > >>>>>
> > >>>>>
> > >>>>>>my $v = 1;
> > >>>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >>>>>>while (my $input = $str->next_seq()){
> > >>>>>>
> > >>>>>>
> > >>>>>  #Blast a sequence against a database:
> > >>>>>  #Alternatively, you could  pass in a file with many
> > >>>>>  #sequences rather than loop through sequence one at a time
> > >>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>>>>  #and swap the two lines below for an example of that.
> > >>>>>  my $r = $factory->submit_blast($input);
> > >>>>>  #my $r = $factory->submit_blast('amino.fa');
> > >>>>>  print STDERR "waiting..." if( $v > 0 );
> > >>>>>  while ( my @rids = $factory->each_rid ) {
> > >>>>>    foreach my $rid ( @rids ) {
> > >>>>>      my $rc = $factory->retrieve_blast($rid);
> > >>>>>      if( !ref($rc) ) {
> > >>>>>        if( $rc < 0 ) {
> > >>>>>          $factory->remove_rid($rid);
> > >>>>>        }
> > >>>>>        print STDERR "." if ( $v > 0 );
> > >>>>>        sleep 5;
> > >>>>>      } else {
> > >>>>>        my $result = $rc->next_result();
> > >>>>>        #save the output
> > >>>>>        my $filename = $result->query_name()."\.out";
> > >>>>>        $factory->save_output($filename);
> > >>>>>        $factory->remove_rid($rid);
> > >>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>        while ( my $hit = $result->next_hit ) {
> > >>>>>          next unless ( $v > 0);
> > >>>>>          print "\thit name is ", $hit->name, "\n";
> > >>>>>          while( my $hsp = $hit->next_hsp ) {
> > >>>>>            print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>          }
> > >>>>>        }
> > >>>>>      }
> > >>>>>    }
> > >>>>>  }
> > >>>>>}
> > >>>>>
> > >>>>>
> > >>>>>>2) Try the RemoteBlast from Bugzilla and see if that works.  It
> > >>>>>>
> > >>>>>>
> > >>>really
> > >>>
> > >>>
> > >>>>>shouldn't make that much of a difference, but I noticed that the
> CVS
> > >>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> > >>>>>released; the Bugzilla version is based off CVS.
> > >>>>>
> > >>>>>
> > >>>>>>Christopher Fields
> > >>>>>>
> > >>>>>>
> > >>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>Dept. of Biochemistry
> > >>>>>University of Illinois Urbana-Champaign
> > >>>>>
> > >>>>>
> > >>>>>>>-----Original Message-----
> > >>>>>>>
> > >>>>>>>
> > >>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>Sent: Monday, February 13, 2006 3:00 PM
> > >>>>>>To: bioperl-l at lists.open-bio.org
> > >>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>
> > >>>>>>
> > >>>>>>>>Thanks, Chris,
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>I installed version 1.5.1 and replaced the blast.pm file with the
> > >>>>>>
> > >>>>>>
> > >>>one
> > >>>
> > >>>
> > >>>>from
> > >>>>
> > >>>>
> > >>>>>>your bug report. The running version is 1.5 when I use the command
> > >>>>>>
> > >>>>>>
> > >>>you
> > >>>
> > >>>
> > >>>>>>sent me. But when I tried the script, it doesn't change much. My
> > >>>>>>remoteblast code (portion) is here:
> > >>>>>>
> > >>>>>>
> > >>>>>>>>sub search {
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
> > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > >>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > >>>>>>			      -id=>"query",
> > >>>>>>			      -desc=>"new seq");
> > >>>>>>my $len=$query->length();
> > >>>>>>@db=('nr','htgs','wgs');
> > >>>>>>foreach my $db (@db) {
> > >>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> > >>>>>>						'-data' =>"$db",
> > >>>>>>						'-expect'=>"$E_value");
> > >>>>>>>>>>my $blast_report = $factory->submit_blast($query);
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>my @rids = $factory->each_rid();
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>foreach my $rid ( @rids ) {
> > >>>>>>    print STDERR "$rid\n";
> > >>>>>>}
> > >>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > >>>>>>print STDERR "waiting...";
> > >>>>>>sleep 60;
> > >>>>>>
> > >>>>>>
> > >>>>>>>>foreach my $rid ( @rids ) {
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>    my $rc = $factory->retrieve_blast($rid);
> > >>>>>>    while (!ref($rc) ) {
> > >>>>>>	if( $rc < 0 ) {
> > >>>>>># retrieve_blast returns -1 on error
> > >>>>>>	    $factory->remove_rid($rid);
> > >>>>>>	    print "Error!\n";
> > >>>>>>	    send_error($email,$function,$seqname,$queryname[$ST]);
> > >>>>>>	    die "Can't retrieve $rid";
> > >>>>>>	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
> > >>>>>>	    sleep 60;
> > >>>>>>	    $rc = $factory->retrieve_blast($rid);
> > >>>>>>	}
> > >>>>>>    }
> > >>>>>>    if (ref($rc)) {
> > >>>>>>	print STDERR "Done.\n";
> > >>>>>>	 while( my $result = $rc->next_result) {
> > >>>>>>	    while( my $hit = $result->next_hit()) {
> > >>>>>>	    	$hit_name=$hit->name;
> > >>>>>>		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > >>>>>>		$name=$1;
> > >>>>>>		@left_plus_start=();
> > >>>>>>		@left_plus_end=();
> > >>>>>>		@left_minus_start=();
> > >>>>>>		@left_minus_end=();
> > >>>>>>		@right_plus_start=();
> > >>>>>>		@right_plus_end=();
> > >>>>>>		@right_minus_start=();
> > >>>>>>		@right_minus_end=();
> > >>>>>>
> > >>>>>>
> > >>>>>>>>		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>		while( my $hsp = $hit->next_hsp()) {
> > >>>>>>......
> > >>>>>>
> > >>>>>>
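The polling logic in the quoted sub (a reference means the report is ready, -1 means an error, 0 means the job is not finished) can be sketched with a retry limit so that a job that never completes cannot hang a CGI script forever. This is a sketch under stated assumptions, not the RemoteBlast API: poll_until_done is a hypothetical helper, and $fetch is a stand-in for a closure around $factory->retrieve_blast($rid). The example below drives it with canned return values instead of a network call.

```perl
use strict;
use warnings;

# Poll a job until it finishes or the retry budget is used up.
#   $fetch     - code ref returning a reference (done), -1 (error) or
#                0 (not finished yet); hypothetical stand-in for
#                sub { $factory->retrieve_blast($rid) }
#   $max_tries - maximum number of polls
#   $delay     - seconds to sleep between polls (0 disables sleeping)
# Returns the report reference, or undef if the budget ran out.
sub poll_until_done {
    my ( $fetch, $max_tries, $delay ) = @_;
    for my $try ( 1 .. $max_tries ) {
        my $rc = $fetch->();
        return $rc if ref $rc;                  # report object: finished
        die "retrieval failed\n" if $rc < 0;    # -1 signals an error
        sleep $delay if $delay;                 # 0: not finished, wait
    }
    return;                                     # gave up
}

# Simulated job: not ready twice, then a finished "report".
my @answers = ( 0, 0, { hits => 3 } );
my $report = poll_until_done( sub { shift @answers }, 5, 0 );
print defined $report ? "hits: $report->{hits}\n" : "timed out\n";
```

With a real factory the delay would be something like the 60 seconds used in the quoted code; the retry limit turns a silent hang into an explicit undef that the caller can report.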
> > >>>>>>>>It was working quite well before around October last year, but
> > >>>>>>>>
> > >>>>>>>>
> > >>>>it has
> > >>>>
> > >>>>
> > >>>>>>stopped since then, When a submission is sent via a webpage, the
> cgi
> > >>>>>>starts to work and use a memory of ~20 Mb. Then it hangs there,
> > >>>>>>
> > >>>>>>
> > >>>>finally
> > >>>>
> > >>>>
> > >>>>>>the expected email is received but without real results although
> it
> > >>>>>>
> > >>>>>>
> > >>>>does
> > >>>>
> > >>>>
> > >>>>>>contain something from other parts of the script. Apparently the
> > >>>>>>
> > >>>>>>
> > >>>>search
> > >>>>
> > >>>>
> > >>>>>>sub did not return anything (I know there is something that should be
> > >>>>>>returned.). Is it also possible the format of the NCBI output for
> > >>>>>>
> > >>>>>>
> > >>>each
> > >>>
> > >>>
> > >>>>>>result has changed?
> > >>>>>>Thank you,
> > >>>>>>Guojun
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>Department of Plant Biology
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>University of Georgia
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>>>----- Original Message -----
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>>How do you know two versions are installed (i.e. how are
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>you
> > >>>
> > >>>
> > >>>>checking
> > >>>>
> > >>>>
> > >>>>>>the
> > >>>>>>
> > >>>>>>
> > >>>>>>>version)?  Do you have two complete bioperl distributions (in
> > >>>>>>>
> > >>>>>>>
> > >>>>two
> > >>>>
> > >>>>
> > >>>>>>>separate directories) or are you looking in modules?  Here's the
> > >>>>>>>
> > >>>>>>>
> > >>>way
> > >>>
> > >>>
> > >>>>to
> > >>>>
> > >>>>
> > >>>>>>>check the version (from the FAQ):
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>>perl -MBio::Root::Version -e 'print
> > >>>>>>>>
> > >>>>>>>>
> > >>>>$Bio::Root::Version::VERSION,"\n"'
> > >>>>
> > >>>>
> > >>>>>>>>If you have two full bioperl distributions on your computer,
> > >>>>>>>>
> > >>>>>>>>
> > >>>>normally
> > >>>>
> > >>>>
> > >>>>>>only
> > >>>>>>
> > >>>>>>
> > >>>>>>>one will be in use unless you have explicitly set the environment
> > >>>>>>>
> > >>>>>>>
> > >>>>>>variable
> > >>>>>>
> > >>>>>>
> > >>>>>>>PERL5LIB.  The PERL5LIB  directories will be searched first
> before
> > >>>>>>>
> > >>>>>>>
> > >>>>your
> > >>>>
> > >>>>
> > >>>>>>>normal perl directory list (@INC) is searched.  You MAY get some
> > >>>>>>>
> > >>>>>>>
> > >>>>mixing
> > >>>>
> > >>>>
> > >>>>>>>then, but only if perl can't find a particular module in the path
> > >>>>>>>
> > >>>>>>>
> > >>>>>>designated
> > >>>>>>
> > >>>>>>
> > >>>>>>>in PERL5LIB; then it will progress through the directories listed
> > >>>>>>>
> > >>>>>>>
> > >>>in
> > >>>
> > >>>
> > >>>>>>@INC.
> > >>>>>>
> > >>>>>>
> > >>>>>>>This may happen if a module is unique to a particular release,
> but
> > >>>>>>>
> > >>>>>>>
> > >>>>>>shouldn't
> > >>>>>>
> > >>>>>>
> > >>>>>>>happen for the majority of modules, including RemoteBlast.  You
> > >>>>>>>
> > >>>>>>>
> > >>>can
> > >>>
> > >>>
> > >>>>>>check
> > >>>>>>
> > >>>>>>
> > >>>>>>>what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will
> > >>>>>>>
> > >>>>>>>
> > >>>>differ
> > >>>>
> > >>>>
> > >>>>>>>depending on your OS, perl build, etc.
> > >>>>>>>
> > >>>>>>>
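One way to check for mixed installations is to list every copy of a module that is reachable through @INC, in search order. The sketch below uses only core Perl; find_module_copies is a hypothetical helper name. Since perl searches @INC in order (with PERL5LIB directories placed first), the first path printed is the copy that 'use' would load.

```perl
use strict;
use warnings;
use File::Spec;

# Return every file in @INC that provides the given module, in search
# order. (Hypothetical helper for illustration.)
sub find_module_copies {
    my ($module) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    my @found;
    for my $dir ( grep { !ref $_ } @INC ) {    # skip code-ref hooks in @INC
        my $path = File::Spec->catfile( $dir, @parts );
        push @found, $path if -f $path;
    }
    return @found;
}

my @copies = find_module_copies('Bio::Tools::Run::RemoteBlast');
print @copies
    ? join( "\n", @copies ) . "\n"
    : "Bio::Tools::Run::RemoteBlast not found in \@INC\n";
```

If more than one path is printed, two distributions are visible at once, and any module missing from the first one will be picked up from the next.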
> > >>>>>>>>Regardless, if you follow the directions for installing bioperl
> > >>>>>>>>
> > >>>>>>>>
> > >>>>for
> > >>>>
> > >>>>
> > >>>>>>your
> > >>>>>>
> > >>>>>>
> > >>>>>>>system ('perl Makefile.PL', 'make', 'make test', 'make install',
> > >>>>>>>
> > >>>>>>>
> > >>>>unless
> > >>>>
> > >>>>
> > >>>>>>you
> > >>>>>>
> > >>>>>>
> > >>>>>>>explicitly change the installation directory when using 'perl
> > >>>>>>>
> > >>>>>>>
> > >>>>>>Makefile.PL'),
> > >>>>>>
> > >>>>>>
> > >>>>>>>then 'uninstalling' Bioperl shouldn't be a problem as it will
> > >>>>>>>
> > >>>>>>>
> > >>>>install
> > >>>>
> > >>>>
> > >>>>>>the
> > >>>>>>
> > >>>>>>
> > >>>>>>>Bioperl distribution you downloaded over the old version in @INC.
> > >>>>>>>
> > >>>>>>>
> > >>>>See
> > >>>>
> > >>>>
> > >>>>>>this
> > >>>>>>
> > >>>>>>
> > >>>>>>>page:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > >>>>>>>>for more details.
> > >>>>>>>>Christopher Fields
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>>>Dept. of Biochemistry
> > >>>>>>>University of Illinois Urbana-Champaign
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>>>>-----Original Message-----
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>>>Sent: Monday, February 13, 2006 12:32 PM
> > >>>>>>>>To: bioperl-l at lists.open-bio.org
> > >>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>Hi, Chris,
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>I do have different versions of bioperl on my Linux machine
> > >>>>>>>>
> > >>>>>>>>
> > >>>(1.4.
> > >>>
> > >>>
> > >>>>and
> > >>>>
> > >>>>
> > >>>>>>>>1.5.0), this may be the problem. Should I just install bioperl-
> > >>>>>>>>
> > >>>>>>>>
> > >>>>1.5.1
> > >>>>
> > >>>>
> > >>>>>>or I
> > >>>>>>
> > >>>>>>
> > >>>>>>>>need to uninstall and remove the previous versions. I could not
> > >>>>>>>>
> > >>>>>>>>
> > >>>>find
> > >>>>
> > >>>>
> > >>>>>>any
> > >>>>>>
> > >>>>>>
> > >>>>>>>>hint on uninstalling bioperl on linux. Could you please give me
> > >>>>>>>>
> > >>>>>>>>
> > >>>>some
> > >>>>
> > >>>>
> > >>>>>>>>suggestion?
> > >>>>>>>>Thanks,
> > >>>>>>>>Guojun
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>Department of Plant Biology
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>University of Georgia
> > >>>>>>>>      _____
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>  From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>version
> > >>>>>>
> > >>>>>>
> > >>>>>>>>1.28
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>>>>>If you're using RemoteBlast 1.28, then you've likely
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>updated from CVS
> > >>>>>>
> > >>>>>>
> > >>>>>>>>which isn't the latest fix.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>Make sure that you check the following:
> > >>>>>>>>>>1) Always post to the mailing list:
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>(CVS)
> > >>>>
> > >>>>
> > >>>>>>>>installed first.  Perform a clean installation; do not upgrade
> > >>>>>>>>
> > >>>>>>>>
> > >>>>only
> > >>>>
> > >>>>
> > >>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we
> > >>>>>>>>
> > >>>>>>>>
> > >>>can't
> > >>>
> > >>>
> > >>>>>>>>guarantee that mixing modules from old and new distributions
> > >>>>>>>>
> > >>>>>>>>
> > >>>(1.4
> > >>>
> > >>>
> > >>>>and
> > >>>>
> > >>>>
> > >>>>>>>>1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > >>>>>>>>installation will allow text output from BLAST v.2.2.12 to be
> > >>>>>>>>
> > >>>>>>>>
> > >>>>saved
> > >>>>
> > >>>>
> > >>>>>>and
> > >>>>>>
> > >>>>>>
> > >>>>>>>>parsed; it will not parse the newest BLAST text output from NCBI
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>(v2.2.13)
> > >>>>>>
> > >>>>>>
> > >>>>>>>>but it should still save it. I believe as long as next_result()
> > >>>>>>>>
> > >>>>>>>>
> > >>>>isn't
> > >>>>
> > >>>>
> > >>>>>>>>called, it will work.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>3) The bug fixes for the above issue with parsing BLAST
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>2.2.13
> > >>>
> > >>>
> > >>>>>>text output
> > >>>>>>
> > >>>>>>
> > >>>>>>>>are NOT in CVS; they haven't been cleared and checked in by
> > >>>>>>>>
> > >>>>>>>>
> > >>>Roger
> > >>>
> > >>>
> > >>>>Hall
> > >>>>
> > >>>>
> > >>>>>>>>(who's now taking care of RemoteBlast) and the powers that be
> > >>>>>>>>
> > >>>>>>>>
> > >>>>(Jason
> > >>>>
> > >>>>
> > >>>>>>or
> > >>>>>>
> > >>>>>>
> > >>>>>>>>whomever is in charge of Bio::SearchIO).  They can be found in
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>Bugzilla:
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>option
> > >>>>
> > >>>>
> > >>>>>>of
> > >>>>>>
> > >>>>>>
> > >>>>>>>>saving XML output, so isn't necessary if you don't plan on using
> > >>>>>>>>
> > >>>>>>>>
> > >>>>this
> > >>>>
> > >>>>
> > >>>>>>>>option.  And, remember, they haven't been committed yet to CVS,
> > >>>>>>>>
> > >>>>>>>>
> > >>>>which
> > >>>>
> > >>>>
> > >>>>>>>>means that the final version will change to reflect the new
> > >>>>>>>>
> > >>>>>>>>
> > >>>version.
> > >>>
> > >>>
> > >>>>>>>>>>>>Christopher Fields
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>Dept. of Biochemistry
> > >>>>>>>>University of Illinois Urbana-Champaign
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>>>    _____
> > >>>>>>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>Sent: Monday, February 13, 2006 9:26 AM
> > >>>>>>>>To: Chris Fields
> > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>version
> > >>>>>>
> > >>>>>>
> > >>>>>>>>1.28
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>>>Hi, Chris
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>Thanks for your suggestion, however, it doesn't seem to work
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>for
> > >>>>
> > >>>>
> > >>>>>>my cgi
> > >>>>>>
> > >>>>>>
> > >>>>>>>>even after I replace both blast.pm and RemoteBlast.pm. I didn't
> > >>>>>>>>
> > >>>>>>>>
> > >>>>even
> > >>>>
> > >>>>
> > >>>>>>get
> > >>>>>>
> > >>>>>>
> > >>>>>>>>any RID. Is there any suggestion?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>>>>>Guojun
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>Guojun Yang
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>Department of Plant Biology
> > >>>>>>>>University of Georgia
> > >>>>>>>>Tel: 706-542-1857
> > >>>>>>>>Fax: 706-542-1805
> > >>>>>>>>http://www.arches.uga.edu/~guojun
> > >>>>>>>>    _____
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > >>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>version
> > >>>>>>
> > >>>>>>
> > >>>>>>>>1.28
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>I would say give the new code a try, but realize that it
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>hasn't
> > >>>>
> > >>>>
> > >>>>>>been
> > >>>>>>
> > >>>>>>
> > >>>>>>>>checked
> > >>>>>>>>in (like I said below). I will try going over the modified
> > >>>>>>>>Bio::SearchIO::blast again this weekend to see if there is
> > >>>>>>>>
> > >>>>>>>>
> > >>>>anything I
> > >>>>
> > >>>>
> > >>>>>>>>might
> > >>>>>>>>have missed. The changed order in the header of BLAST text
> > >>>>>>>>
> > >>>>>>>>
> > >>>output
> > >>>
> > >>>
> > >>>>has
> > >>>>
> > >>>>
> > >>>>>>me a
> > >>>>>>
> > >>>>>>
> > >>>>>>>>bit worried that it might not catch everything, but it at least
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>doesn't
> > >>>>>>
> > >>>>>>
> > >>>>>>>>hang
> > >>>>>>>>in the while() loop I described in the bug report below (bug
> > >>>>>>>>
> > >>>>>>>>
> > >>>>#1934)
> > >>>>
> > >>>>
> > >>>>>>and
> > >>>>>>
> > >>>>>>
> > >>>>>>>>seems to process everything fine.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>If you want more stability in the code, you might consider
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>changing over
> > >>>>>>
> > >>>>>>
> > >>>>>>>>to
> > >>>>>>>>XML output and parsing with Bio::SearchIO::blastxml. There are
> > >>>>>>>>
> > >>>>>>>>
> > >>>>some
> > >>>>
> > >>>>
> > >>>>>>>>changes
> > >>>>>>>>in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
> > >>>>>>>>
> > >>>>>>>>
> > >>>>saving
> > >>>>
> > >>>>
> > >>>>>>XML
> > >>>>>>
> > >>>>>>
> > >>>>>>>>output, but I believe it parses everything regardless. If you
> > >>>>>>>>
> > >>>>>>>>
> > >>>look
> > >>>
> > >>>
> > >>>>>>back
> > >>>>>>
> > >>>>>>
> > >>>>>>>>the
> > >>>>>>>>last month or so there has been a bit of discussion here about
> > >>>>>>>>
> > >>>>>>>>
> > >>>it.
> > >>>
> > >>>
> > >>>>>>Jason
> > >>>>>>
> > >>>>>>
> > >>>>>>>>describes a bit on how to set up RemoteBlast for XML:
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using-
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>remoteblast/
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>Christopher Fields
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>Dept. of Biochemistry
> > >>>>>>>>University of Illinois Urbana-Champaign
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>>-----Original Message-----
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM
> > >>>>>>>>>To: bioperl-l at bioperl.org
> > >>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>version
> > >>>>
> > >>>>
> > >>>>>>1.28
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>Hi, Everybody,
> > >>>>>>>>>I see this post and am wondering if this is the reason for the
> > >>>>>>>>>malfunctioning of my webserver. We set up a webserver named
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>MAK,
> > >>>>
> > >>>>
> > >>>>>>for
> > >>>>>>
> > >>>>>>
> > >>>>>>>>MITE
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>sequence analysis. It was working very well until around
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>November
> > >>>>
> > >>>>
> > >>>>>>2005,
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>when it stopped returning any result (the site is fine and
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>seems
> > >>>
> > >>>
> > >>>>to
> > >>>>
> > >>>>
> > >>>>>>be
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>doing something after submission). In the CGI script, I used
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>remoteblast
> > >>>>
> > >>>>
> > >>>>>>(that
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>work was done in 2003) to do searches. I currently do not have
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>access to
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>the server because I moved. Quite several people sent emails
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>to
> > >>>
> > >>>
> > >>>>us
> > >>>>
> > >>>>
> > >>>>>>about
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>its malfunctioning. Is there any suggestion on fixing the
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>problem?
> > >>>>
> > >>>>
> > >>>>>>>>Should
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>I simply ask that the remoteblast.pm be replaced with the new
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>version?
> > >>>>
> > >>>>
> > >>>>>>>>>Thanks a lot,
> > >>>>>>>>>Guojun
> > >>>>>>>>>
> > >>>>>>>>>Department of Plant Biology
> > >>>>>>>>>University of Georgia
> > >>>>>>>>>Tel: 706-542-1857
> > >>>>>>>>>Fax: 706-542-1805
> > >>>>>>>>>http://www.arches.uga.edu/~guojun
> > >>>>>>>>>_____
> > >>>>>>>>>
> > >>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>Jian'
> > >>>>
> > >>>>
> > >>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > >>>>>>>>>>[mailto:bioperl-l at bioperl.org]
> > >>>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > >>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >>>>>>>>>>
> > >>>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS.
> > >>>>>>>>>>It will work for saving text output. However, it will not parse
> > >>>>>>>>>>anything using next_result (it will likely hang) and will not save
> > >>>>>>>>>>XML format. See these bugs:
> > >>>>>>>>>>
> > >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > >>>>>>>>>>
> > >>>>>>>>>>for explanations and possible fixes (changes to RemoteBlast and
> > >>>>>>>>>>Bio::SearchIO::blast). Note that these haven't been checked in yet,
> > >>>>>>>>>>so are still not included in bioperl-live; they may be further
> > >>>>>>>>>>modified before committing to CVS. If you're not worried about XML,
> > >>>>>>>>>>you could just try the first fix, which is a change to
> > >>>>>>>>>>SearchIO::blast.
> > >>>>>>>>>>
> > >>>>>>>>>>Nagesh, I remember you posting to the list a month ago using a
> > >>>>>>>>>>script which had problems; the script you used saves the output but
> > >>>>>>>>>>doesn't actually parse it (i.e. you don't use next_result() to go
> > >>>>>>>>>>through the data). Is the version of BLAST in your text output
> > >>>>>>>>>>2.2.12 or 2.2.13? Have you tried parsing the output using
> > >>>>>>>>>>"-readmethod => SearchIO" or "-readmethod => blast" using your
> > >>>>>>>>>>version of RemoteBlast and method next_result()? Like below (from
> > >>>>>>>>>>perldoc):
> > >>>>>>>>>>
> > >>>>>>>>>>while ( my @rids = $factory->each_rid ) {
> > >>>>>>>>>>  foreach my $rid ( @rids ) {
> > >>>>>>>>>>    my $rc = $factory->retrieve_blast($rid);
> > >>>>>>>>>>    if( !ref($rc) ) {
> > >>>>>>>>>>      if( $rc < 0 ) {
> > >>>>>>>>>>        $factory->remove_rid($rid);
> > >>>>>>>>>>      }
> > >>>>>>>>>>      print STDERR "." if ( $v > 0 );
> > >>>>>>>>>>      sleep 5;
> > >>>>>>>>>>    } else { # parsing starts here
> > >>>>>>>>>>      my $result = $rc->next_result(); # it should hang here
> > >>>>>>>>>>      #save the output
> > >>>>>>>>>>      my $filename = $result->query_name()."\.out";
> > >>>>>>>>>>      $factory->save_output($filename);
> > >>>>>>>>>>      $factory->remove_rid($rid);
> > >>>>>>>>>>      print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>>>>>>      while ( my $hit = $result->next_hit ) {
> > >>>>>>>>>>        next unless ( $v > 0);
> > >>>>>>>>>>        print "\thit name is ", $hit->name, "\n";
> > >>>>>>>>>>        while( my $hsp = $hit->next_hsp ) {
> > >>>>>>>>>>          print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>>>>>>        }
> > >>>>>>>>>>      }
> > >>>>>>>>>>    }
> > >>>>>>>>>>  }
> > >>>>>>>>>>}
> > >>>>>>>>>>
> > >>>>>>>>>>My script hung if I used next_result() in any way prior to the
> > >>>>>>>>>>fixes. I want to see how many others are having the same issues
> > >>>>>>>>>>with parsing using the CVS version of bioperl-live.
> > >>>>>>>>>>
> > >>>>>>>>>>Christopher Fields
> > >>>>>>>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>>>Dept. of Biochemistry
> > >>>>>>>>>>University of Illinois Urbana-Champaign
> > >>>>>>>>>>
> > >>>>>>>>>>>-----Original Message-----
> > >>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
> > >>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> > >>>>>>>>>>>Nagesh Chakka
> > >>>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM
> > >>>>>>>>>>>To: Huang Jian; bioperl-l
> > >>>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >>>>>>>>>>>
> > >>>>>>>>>>>Hi Huang,
> > >>>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm works
> > >>>>>>>>>>>on the logic of checking the temporary file size to determine
> > >>>>>>>>>>>whether the Blast results are ready. This condition is not getting
> > >>>>>>>>>>>satisfied, maybe due to some changes brought about by NCBI. I had
> > >>>>>>>>>>>this problem recently and figured out that the solution was to use
> > >>>>>>>>>>>the latest version, which has this problem fixed (it does not use
> > >>>>>>>>>>>the file size logic any more) but is not yet included in the
> > >>>>>>>>>>>BioPerl package.
> > >>>>>>>>>>>Cheers
> > >>>>>>>>>>>Nagesh
> > >>>>>>>>>>>
> > >>>>>>>>>>>Huang Jian wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>>Dear Nagesh,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28
> > >>>>>>>>>>>>you sent me. Now it works perfectly!!!
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>Thank you!!
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>Huang
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka"
> > >>>>>>>>>>>>To: "Huang Jian" ; "bioperl-l"
> > >>>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM
> > >>>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so
> > >>>>>>>>>>>>still via email
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>>Hi Huang,
> > >>>>>>>>>>>>>I see that you are submitting a sequence for a remote blast
> > >>>>>>>>>>>>>search. Can you check whether the RemoteBlast.pm being used is
> > >>>>>>>>>>>>>v 1.28 (2005/12/09)? If not, I have attached it to this email;
> > >>>>>>>>>>>>>try replacing the old one, which has a bug, with it.
> > >>>>>>>>>>>>>Let me know if it works.
> > >>>>>>>>>>>>>Nagesh
> > >>>>>>>>>>>>>



From gyang at plantbio.uga.edu  Mon Feb 20 17:22:28 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 20 Feb 2006 17:22:28 -0500
Subject: [Bioperl-l] Tested-OK
Message-ID: <20060220172228.f7d22947@dogwood.plantbio.uga.edu>

Chris, I tested the latest fix for blast.pm on my Linux machine with blastn. It worked very well. My CGI script is still not returning what I need, but I don't think that is related to the parsing of BLAST results. Thanks for your great efforts.

Guojun 

----- Original Message -----
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: 'Chris Fields' [mailto:cjfields at uiuc.edu], 'Pieter Monsieurs' [mailto:Pieter.Monsieurs at esat.kuleuven.be], gyang at plantbio.uga.edu
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm version 1.28


> Guojun Yang pointed out that his BLAST output was still not parsed
> correctly, so I posted another change:
>
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>
> The direct link for the module is:
>
> http://bugzilla.bioperl.org/attachment.cgi?id=289&action=view
>
> Note that all caveats (can't sue if computer blows up, this is a very
> preliminary bugfix, etc.) apply.
>
> Apparently, NCBI has changed blastn and tblastx output to show features in
> the region for each HSP, starting with either one of the following
> lines:
>
>  Features in this part of subject sequence:
>  Features flanking this part of subject sequence:
>
> If you're using a Bio::SearchIO::blast from before Dec. 2005 with BLAST
> 2.2.13 output, most blastn or tblastx report parsing seems to choke on
> these lines, unless you are pretty lucky.  This extra little feature was
> introduced a while back for large contigs and chromosomes (~BLAST 2.2.10)
> but was not set by default and hadn't started affecting web output until
> this last fall.  The first fix I posted caught only the first version but
> not the second.
>
> The fix included a loop with debugging output to bypass this for now.  If
> you use SearchIO directly for parsing (not through RemoteBlast) you can see
> the bypassed lines by setting the '-verbose' flag to 1.
>
> Thanks to Guojun Yang for pointing this out.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
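
The bypass described above can be sketched in plain Perl, outside of BioPerl. This is only an illustration of the idea, not the code attached to the bug report: the helper name strip_feature_blocks is hypothetical, and it assumes (as in the sample output quoted further down in this thread) that each feature block runs up to the " Score = ..." line of its HSP.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): drop the per-HSP
# "Features ..." blocks from blastn/tblastx text output before parsing.
sub strip_feature_blocks {
    my ($text) = @_;
    my @kept;
    my $in_features = 0;
    for my $line ( split /\n/, $text ) {
        # Either of the two header lines starts a block to skip.
        if ( $line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/ ) {
            $in_features = 1;
            next;
        }
        if ($in_features) {
            # Assumption: the block ends where the HSP header begins.
            if ( $line =~ /^\s*Score\s*=/ ) {
                $in_features = 0;
            }
            else {
                next;    # still inside the feature block
            }
        }
        push @kept, $line;
    }
    return join "\n", @kept;
}

my $report = <<'END';
 Features flanking this part of subject sequence:
   2991 bp at 5' side: hypothetical protein
   1131 bp at 3' side: hypothetical protein

 Score = 36.2 bits (18),  Expect = 0.22
 Identities = 18/18 (100%), Gaps = 0/18 (0%)
END

print strip_feature_blocks($report), "\n";
```

Filtering the text first keeps the parser untouched; the posted fix instead skips the lines inside Bio::SearchIO::blast itself.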
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> > Sent: Monday, February 20, 2006 11:01 AM
> > To: 'Pieter Monsieurs'; gyang at plantbio.uga.edu
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on
> > RemoteBlast.pm version 1.28
> >
> > I have added a preliminary bugfix for the problems seen with nucleotide
> > blast parsing for BLAST 2.2.13 reports.  I passed SearchIO::blast through
> > perltidy to space out the blocks (really for my own purposes; it's a
> > pretty complex module).  The fix bypasses the extra lines output for
> > blastn and tblastx and now seems to parse the text output for those
> > reports correctly.
> > I tested it using all NCBI BLAST flavors for the last two versions of
> > BLAST (2.2.12 and 2.2.13); the fix was simple so shouldn't break any other
> > BLAST report parsing, such as WU-BLAST, RPS-BLAST, or Paracel.  It has
> > only been tested on MacOSX at the moment, so I need people out there to
> > test it out on anything they can to make sure it works before committing.
> > I'll be trying it on Windows today.  Report back to me and I'll post
> > anything on bugzilla.
> >
> > Here it is:
> >
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org
> > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> > > Sent: Thursday, February 16, 2006 3:46 AM
> > > To: gyang at plantbio.uga.edu
> > > Cc: bioperl-l at lists.open-bio.org; Chris Fields
> > > Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on
> > > RemoteBlast.pm version 1.28
> > >
> > > Hi,
> > >
> > > I have the same problem with the blast.pm file.
> > > The people at NCBI added some extra info to the BLAST output (see e.g.
> > > "Features flanking this part..." or "Features in this part ..."); an
> > > example is included below.
> > > The blast.pm module starts looking for the HSP alignment information,
> > > but it dies when it hits this feature information.
> > >
> > > Pieter
> > >
> > >
> > > >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> > > chromosome 12, complete sequence
> > > Length=27492551
> > >
> > >  Features flanking this part of subject sequence:
> > >    3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
> > >    2655 bp at 3' side: hypothetical protein
> > >
> > >  Score = 36.2 bits (18),  Expect = 0.22
> > >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  4         GTACTACTCTACTCTACT  21
> > >                  ||||||||||||||||||
> > > Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> > >
> > >
> > >  Features flanking this part of subject sequence:
> > >    2991 bp at 5' side: hypothetical protein
> > >    1131 bp at 3' side: hypothetical protein
> > >
> > >  Score = 36.2 bits (18),  Expect = 0.22
> > >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  2         ATGTACTACTCTACTCTA  19
> > >                  ||||||||||||||||||
> > > Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> > >
> > >
> > >  Features in this part of subject sequence:
> > >    DHHC zinc finger domain, putative
> > >
> > >  Score = 34.2 bits (17),  Expect = 0.87
> > >  Identities = 17/17 (100%), Gaps = 0/17 (0%)
> > >  Strand=Plus/Plus
> > >
> > > Query  5         TACTACTCTACTCTACT  21
> > >                  |||||||||||||||||
> > > Sbjct  17616437  TACTACTCTACTCTACT  17616453
> > >
> > >
> > >  Features flanking this part of subject sequence:
> > >    102 bp at 5' side: bZIP transcription factor, putative
> > >    3740 bp at 3' side: yeast dcp1, putative
> > >
> > >  Score = 32.2 bits (16),  Expect = 3.4
> > >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > >  Strand=Plus/Plus
> > >
> > > Query  7        CTACTCTACTCTACTC  22
> > >                 ||||||||||||||||
> > > Sbjct  2775880  CTACTCTACTCTACTC  2775895
> > >
> > >
> > >  Features flanking this part of subject sequence:
> > >    21 bp at 5' side: peptide transporter T17F3.11, putative
> > >    10230 bp at 3' side: transposon protein, putative, unclassified
> > >
> > >  Score = 32.2 bits (16),  Expect = 3.4
> > >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  7         CTACTCTACTCTACTC  22
> > >                  ||||||||||||||||
> > > Sbjct  27323153  CTACTCTACTCTACTC  27323138
> > >
> > >
> > >
> > > Guojun Yang wrote:
> > >
> > > >Hi, Chris,
> > > >Finally the remoteblast test script works for the amino.fa query, but
> > > >when I try a nucleic acid sequence (see below), an error occurs:
> > > >"
> > > >waiting........
> > > >------------- EXCEPTION  -------------
> > > >MSG: no data for midline  Features flanking this part of subject sequence:
> > > >STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> > > >STACK toplevel remoteblast_test:40
> > > >"
> > > >The query sequence is:
> > > >CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> > > >GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> > > >AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> > > >AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> > > >
> > > >The script (basically the same as the remoteblast test; I only changed
> > > >the database to 'nr', the program to 'blastn' and the filename to 'ost3'):
> > > >#!/usr/bin/perl
> > > >
> > > >use Bio::SeqIO;
> > > >use Bio::Seq;
> > > >use Bio::Tools::Run::RemoteBlast;
> > > >use Bio::SearchIO;
> > > >use strict;
> > > >my $prog='blastn';
> > > >my $db='nr';
> > > >my $e_val=1e-10;
> > > >my @params=( -prog=>$prog,
> > > >	-data=>$db,
> > > >	-expect=>$e_val,
> > > >	-readmethod=>'SearchIO');
> > > >my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > >
> > > >my $v = 1;
> > > >
> > > >my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> > > >
> > > >while (my $input = $str->next_seq()){
> > > >  #Blast a sequence against a database:
> > > >  #Alternatively, you could  pass in a file with many
> > > >  #sequences rather than loop through sequence one at a time
> > > >  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >  #and swap the two lines below for an example of that.
> > > >  my $r = $factory->submit_blast($input);
> > > >  #my $r = $factory->submit_blast('amino.fa');
> > > >  print STDERR "waiting..." if( $v > 0 );
> > > >  while ( my @rids = $factory->each_rid ) {
> > > >    foreach my $rid ( @rids ) {
> > > >      my $rc = $factory->retrieve_blast($rid);
> > > >      if( !ref($rc) ) {
> > > >        if( $rc < 0 ) {
> > > >          $factory->remove_rid($rid);
> > > >        }
> > > >        print STDERR "." if ( $v > 0 );
> > > >        sleep 5;
> > > >      } else {
> > > >        my $result = $rc->next_result();
> > > >        #save the output
> > > >        my $filename = $result->query_name()."\.out";
> > > >        $factory->save_output($filename);
> > > >        $factory->remove_rid($rid);
> > > >        print "\nQuery Name: ", $result->query_name(), "\n";
> > > >        while ( my $hit = $result->next_hit ) {
> > > >          next unless ( $v > 0);
> > > >          print "\thit name is ", $hit->name, "\n";
> > > >          while( my $hsp = $hit->next_hsp ) {
> > > >            print "\t\tscore is ", $hsp->score, "\n";
> > > >          }
> > > >        }
> > > >      }
> > > >    }
> > > >  }
> > > >}
> > > >
> > > >
> > > >Do you think there might still be something in the NCBI output format?
> > > >
> > > >Thank you,
> > > >Guojun
> > > >
> > > >
> > > >
> > > >
> > > >Guojun Yang
> > > >Department of Plant Biology
> > > >University of Georgia
> > > >Tel: 706-542-1857
> > > >Fax: 706-542-1805
> > > >http://www.arches.uga.edu/~guojun
> > > >
> > > >
> > > >
> > > >----- Original Message -----
> > > >From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > >Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > >
> > > >
> > > >
> > > >
> > > >>Sorry, forgot to add that I didn't see the regex issue that you
> > > >>mentioned. It could be a perl-related issue.  Try the fixes I
> > > >>mentioned and see what happens.
> > > >>
> > > >>Christopher Fields
> > > >>Postdoctoral Researcher - Switzer Lab
> > > >>Dept. of Biochemistry
> > > >>University of Illinois Urbana-Champaign
> > > >>
> > > >>>-----Original Message-----
> > > >>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>Sent: Tuesday, February 14, 2006 12:36 PM
> > > >>>To: 'gyang at plantbio.uga.edu'
> > > >>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > >>>
> > > >>>It's a good habit to always add single quotes around words.  The perl
> > > >>>interpreter may think a single bare word is a subroutine or perlfunc
> > > >>>called with no args so will try to find a subroutine named blastp().
> > > >>>My debugger actually gives the error that the bare word blastp may
> > > >>>conflict with a future reserved word.  Like you said, 'use strict'
> > > >>>will point that out.
> > > >>>
> > > >>>As for the regex, it should match all the blast programs at NCBI
> > > >>>(blastp, blastn, blastx, tblastn, tblastx) and is built-in to make
> > > >>>sure nothing else passes through.
> > > >>>
> > > >>>So, if you are using the script below, there are several errors.  The
> > > >>>bare words for $prog and $db need quotes, and the flags for your
> > > >>>@params array don't have a dash before them.  I get this after adding
> > > >>>quotes but before adding the dashes to @params:
> > > >>>
> > > >>>C:\Perl\Scripts>test_blast.pl
> > > >>>------------- EXCEPTION: Bio::Root::Exception -------------
> > > >>>MSG:
> > > >>>STACK: Error::throw
> > > >>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> > > >>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > > >>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> > > >>>STACK: C:\Perl\Scripts\test_blast.pl:15
> > > >>>-----------------------------------------------------------
> > > >>>
> > > >>>The last line indicates a problem with this line:
> > > >>>
> > > >>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > >>>
> > > >>>Changing the @params to this:
> > > >>>
> > > >>>my @params=( -prog=>$prog,
> > > >>>	-data=>$db,
> > > >>>	-expect=>$e_val,
> > > >>>	-readmethod=>'SearchIO');
> > > >>>
> > > >>>fixes it, and I get output as expected.
> > > >>>
> > > >>>Christopher Fields
> > > >>>Postdoctoral Researcher - Switzer Lab
> > > >>>Dept. of Biochemistry
> > > >>>University of Illinois Urbana-Champaign
> > > >>>
> > > >>>
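
The two points made above (the program-name pattern and the dashed, quoted parameters) can be tried without BioPerl installed. The sketch below is illustrative only: is_valid_program is a hypothetical stand-in for the check done inside RemoteBlast.pm, and the pattern is the one quoted in the error message, anchored here on the assumption that the whole name must match.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the check inside RemoteBlast.pm; the pattern
# comes from the error message quoted in this thread.
sub is_valid_program {
    my ($prog) = @_;
    return defined $prog && $prog =~ /^t?blast[pnx]$/ ? 1 : 0;
}

for my $prog (qw(blastp blastn blastx tblastn tblastx megablast)) {
    printf "%-10s %s\n", $prog,
        is_valid_program($prog) ? 'accepted' : 'rejected';
}

# Named arguments as the corrected script passes them: string values are
# quoted and every key carries a leading dash. The fat comma quotes the
# key, so this compiles under 'use strict'.
my @params = (
    -prog       => 'blastp',
    -data       => 'swissprot',
    -expect     => 1e-10,
    -readmethod => 'SearchIO',
);
my %args = @params;
print join( ', ', sort keys %args ), "\n";
```

With the dashes missing, the keys would be 'prog', 'data' and so on, which is a different set of strings from the ones the constructor looks for.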
> > > >>>
> > > >>>>>>>>-----Original Message-----
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > >>>>Sent: Tuesday, February 14, 2006 11:48 AM
> > > >>>>To: Chris Fields; bioperl-l at lists.open-bio.org
> > > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > >>>>
> > > >>>>Hi, Chris,
> > > >>>>When I tried with the perldoc script, it did not work either. First it
> > > >>>>says $prog cannot be a bare word if I "use strict". I added quotes
> > > >>>>around the words, then it says the value for $prog does not match
> > > >>>>expression t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.
> > > >>>>The script is shown below. Why is the expression "t?blast[pnx]"?
> > > >>>>
> > > >>>>#!/usr/bin/perl
> > > >>>>
> > > >>>>use Bio::SeqIO;
> > > >>>>use Bio::Seq;
> > > >>>>use Bio::Tools::Run::RemoteBlast;
> > > >>>>use Bio::SearchIO;
> > > >>>>
> > > >>>>
> > > >>>>my $prog=blastp;
> > > >>>>my $db=swissprot;
> > > >>>>my $e_val=1e-10;
> > > >>>>my @params=( prog=>$prog,
> > > >>>>	data=>$db,
> > > >>>>	expect=>$e_val,
> > > >>>>	readmethod=>'SearchIO');
> > > >>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > >>>>
> > > >>>>my $v = 1;
> > > >>>>
> > > >>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > > >>>>
> > > >>>>while (my $input = $str->next_seq()){
> > > >>>>  #Blast a sequence against a database:
> > > >>>>  #Alternatively, you could  pass in a file with many
> > > >>>>  #sequences rather than loop through sequence one at a time
> > > >>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >>>>  #and swap the two lines below for an example of that.
> > > >>>>  my $r = $factory->submit_blast($input);
> > > >>>>  #my $r = $factory->submit_blast('amino.fa');
> > > >>>>  print STDERR "waiting..." if( $v > 0 );
> > > >>>>  while ( my @rids = $factory->each_rid ) {
> > > >>>>    foreach my $rid ( @rids ) {
> > > >>>>      my $rc = $factory->retrieve_blast($rid);
> > > >>>>      if( !ref($rc) ) {
> > > >>>>        if( $rc < 0 ) {
> > > >>>>          $factory->remove_rid($rid);
> > > >>>>        }
> > > >>>>        print STDERR "." if ( $v > 0 );
> > > >>>>        sleep 5;
> > > >>>>      } else {
> > > >>>>        my $result = $rc->next_result();
> > > >>>>        #save the output
> > > >>>>        my $filename = $result->query_name()."\.out";
> > > >>>>        $factory->save_output($filename);
> > > >>>>        $factory->remove_rid($rid);
> > > >>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > > >>>>        while ( my $hit = $result->next_hit ) {
> > > >>>>          next unless ( $v > 0);
> > > >>>>          print "\thit name is ", $hit->name, "\n";
> > > >>>>          while( my $hsp = $hit->next_hsp ) {
> > > >>>>            print "\t\tscore is ", $hsp->score, "\n";
> > > >>>>          }
> > > >>>>        }
> > > >>>>      }
> > > >>>>    }
> > > >>>>  }
> > > >>>>}
> > > >>>>
> > > >>>>Thank you for your help!
> > > >>>>
> > > >>>>
> > > >>>>Guojun
> > > >>>>Department of Plant Biology
> > > >>>>University of Georgia
> > > >>>>
> > > >>>>----- Original Message -----
> > > >>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>To: gyang at plantbio.uga.edu
> > > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>>Try two things:
> > > >>>>>
> > > >>>>>1)  Use a much simpler script, like the one in 'perldoc
> > > >>>>>Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something
> > > >>>>>wrong with the logic in your subroutine:
> > > >>>>>
> > > >>>>>my $v = 1;
> > > >>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > > >>>>>while (my $input = $str->next_seq()){
> > > >>>>>  #Blast a sequence against a database:
> > > >>>>>  #Alternatively, you could  pass in a file with many
> > > >>>>>  #sequences rather than loop through sequence one at a time
> > > >>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >>>>>  #and swap the two lines below for an example of that.
> > > >>>>>  my $r = $factory->submit_blast($input);
> > > >>>>>  #my $r = $factory->submit_blast('amino.fa');
> > > >>>>>  print STDERR "waiting..." if( $v > 0 );
> > > >>>>>  while ( my @rids = $factory->each_rid ) {
> > > >>>>>    foreach my $rid ( @rids ) {
> > > >>>>>      my $rc = $factory->retrieve_blast($rid);
> > > >>>>>      if( !ref($rc) ) {
> > > >>>>>        if( $rc < 0 ) {
> > > >>>>>          $factory->remove_rid($rid);
> > > >>>>>        }
> > > >>>>>        print STDERR "." if ( $v > 0 );
> > > >>>>>        sleep 5;
> > > >>>>>      } else {
> > > >>>>>        my $result = $rc->next_result();
> > > >>>>>        #save the output
> > > >>>>>        my $filename = $result->query_name()."\.out";
> > > >>>>>        $factory->save_output($filename);
> > > >>>>>        $factory->remove_rid($rid);
> > > >>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > > >>>>>        while ( my $hit = $result->next_hit ) {
> > > >>>>>          next unless ( $v > 0);
> > > >>>>>          print "\thit name is ", $hit->name, "\n";
> > > >>>>>          while( my $hsp = $hit->next_hsp ) {
> > > >>>>>            print "\t\tscore is ", $hsp->score, "\n";
> > > >>>>>          }
> > > >>>>>        }
> > > >>>>>      }
> > > >>>>>    }
> > > >>>>>  }
> > > >>>>>}
> > > >>>>>
> > > >>>>>2) Try the RemoteBlast from Bugzilla and see if that works.  It really
> > > >>>>>shouldn't make that much of a difference, but I noticed that the CVS
> > > >>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> > > >>>>>released; the Bugzilla version is based off CVS.
> > > >>>>>
> > > >>>>>Christopher Fields
> > > >>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>Dept. of Biochemistry
> > > >>>>>University of Illinois Urbana-Champaign
> > > >>>>>
> > > >>>>>
> > > >>>>>>>-----Original Message-----
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > >>>>>>Sent: Monday, February 13, 2006 3:00 PM
> > > >>>>>>To: bioperl-l at lists.open-bio.org
> > > >>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>Thanks, Chris,
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>I installed version 1.5.1 and replaced the blast.pm file with the
> > > >>>>>>
> > > >>>>>>
> > > >>>one
> > > >>>
> > > >>>
> > > >>>>from
> > > >>>>
> > > >>>>
> > > >>>>>>your bug report. The running version is 1.5 when I use the command
> > > >>>>>>
> > > >>>>>>
> > > >>>you
> > > >>>
> > > >>>
> > > >>>>>>sent me. But when I tried the script, it doesn't change much. My
> > > >>>>>>remoteblast code (portion) is here:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>sub search {
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>local
> > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > > >>>>>>local
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > >
> > >>>$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}=
> > > >>>
> > > >>>
> > > >>>>>>'no';
> > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > > >>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > > >>>>>>			      -id=>"query",
> > > >>>>>>			      -desc=>"new seq");
> > > >>>>>>my $len=$query->length();
> > > >>>>>>@db=('nr','htgs','wgs');
> > > >>>>>>foreach my $db (@db) {
> > > >>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog'
> > =>'blastn',
> > > >>>>>>						'-data' =>"$db",
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>'-expect'=>"$E_value");
> > > >>
> > > >>
> > > >>>>>>>>>>my $blast_report = $factory->submit_blast($query);
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>my @rids = $factory->each_rid();
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>foreach my $rid ( @rids ) {
> > > >>>>>>    print STDERR "$rid\n";
> > > >>>>>>}
> > > >>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > > >>>>>>print STDERR "waiting...";
> > > >>>>>>sleep 60;
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>foreach my $rid ( @rids ) {
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>    my $rc = $factory->retrieve_blast($rid);
> > > >>>>>>    while (!ref($rc) ) {
> > > >>>>>>	if( $rc < 0 ) {
> > > >>>>>># retrieve_blast returns -1 on error
> > > >>>>>>	    $factory->remove_rid($rid);
> > > >>>>>>	    print "Error!\n";
> > > >>>>>>	    send_error($email,$function,$seqname,$queryname[$ST]);
> > > >>>>>>	    die "Can't retrieve $rid";
> > > >>>>>>	} if ($rc==0) { # retrieve_blast returns 0 on 'job not
> > > >>>>>>
> > > >>>>>>
> > > >>>finished'
> > > >>>
> > > >>>
> > > >>>>>>	    sleep 60;
> > > >>>>>>	    $rc = $factory->retrieve_blast($rid);
> > > >>>>>>	}
> > > >>>>>>    }
> > > >>>>>>    if (ref($rc)) {
> > > >>>>>>	print STDERR "Done.\n";
> > > >>>>>>	 while( my $result = $rc->next_result) {
> > > >>>>>>	    while( my $hit = $result->next_hit()) {
> > > >>>>>>	    	$hit_name=$hit->name;
> > > >>>>>>		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > > >>>>>>		$name=$1;
> > > >>>>>>		@left_plus_start=();
> > > >>>>>>		@left_plus_end=();
> > > >>>>>>		@left_minus_start=();
> > > >>>>>>		@left_minus_end=();
> > > >>>>>>		@right_plus_start=();
> > > >>>>>>		@right_plus_end=();
> > > >>>>>>		@right_minus_start=();
> > > >>>>>>		@right_minus_end=();
> > > >>>>>>		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > > >>>>>>		while( my $hsp = $hit->next_hsp()) {
> > > >>>>>>......
> > > >>>>>>
> > > >>>>>>It was working quite well until around October last year, but it has
> > > >>>>>>stopped since then. When a submission is sent via a webpage, the cgi
> > > >>>>>>starts to work and uses ~20 Mb of memory. Then it hangs there; finally
> > > >>>>>>the expected email is received, but without real results, although it
> > > >>>>>>does contain something from other parts of the script. Apparently the
> > > >>>>>>search sub did not return anything (I know something should be
> > > >>>>>>returned). Is it also possible that the format of the NCBI output for
> > > >>>>>>each result has changed?
> > > >>>>>>Thank you,
> > > >>>>>>Guojun
> > > >>>>>>
> > > >>>>>>Department of Plant Biology
> > > >>>>>>University of Georgia
> > > >>>>>>
> > > >>>>>>----- Original Message -----
> > > >>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > >>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >>>>>>
> > > >>>>>>>How do you know two versions are installed (i.e. how are you checking
> > > >>>>>>>the version)?  Do you have two complete bioperl distributions (in two
> > > >>>>>>>separate directories) or are you looking in modules?  Here's the way
> > > >>>>>>>to check the version (from the FAQ):
> > > >>>>>>>
> > > >>>>>>>perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> > > >>>>>>>
> > > >>>>>>>If you have two full bioperl distributions on your computer, normally
> > > >>>>>>>only one will be in use unless you have explicitly set the environment
> > > >>>>>>>variable PERL5LIB.  The PERL5LIB directories will be searched first,
> > > >>>>>>>before your normal perl directory list (@INC) is searched.  You MAY get
> > > >>>>>>>some mixing then, but only if perl can't find a particular module in
> > > >>>>>>>the path designated in PERL5LIB; then it will progress through the
> > > >>>>>>>directories listed in @INC.  This may happen if a module is unique to a
> > > >>>>>>>particular release, but shouldn't happen for the majority of modules,
> > > >>>>>>>including RemoteBlast.  You can check what @INC and PERL5LIB are set to
> > > >>>>>>>by using 'perl -V'.  @INC will differ depending on your OS, perl build,
> > > >>>>>>>etc.
> > > >>>>>>>
> > > >>>>>>>Regardless, if you follow the directions for installing bioperl for
> > > >>>>>>>your system ('perl Makefile.PL', 'make', 'make test', 'make install',
> > > >>>>>>>unless you explicitly change the installation directory when using
> > > >>>>>>>'perl Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a problem
> > > >>>>>>>as it will install the Bioperl distribution you downloaded over the old
> > > >>>>>>>version in @INC.  See this page:
> > > >>>>>>>
> > > >>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > > >>>>>>>
> > > >>>>>>>for more details.
> > > >>>>>>>
> > > >>>>>>>Christopher Fields
> > > >>>>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>>>Dept. of Biochemistry
> > > >>>>>>>University of Illinois Urbana-Champaign
> > > >>>>>>>
> > > >>>>>>>>-----Original Message-----
> > > >>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
> > > >>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > >>>>>>>>Sent: Monday, February 13, 2006 12:32 PM
> > > >>>>>>>>To: bioperl-l at lists.open-bio.org
> > > >>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >>>>>>>>
> > > >>>>>>>>Hi, Chris,
> > > >>>>>>>>
> > > >>>>>>>>I do have different versions of bioperl on my Linux machine (1.4 and
> > > >>>>>>>>1.5.0); this may be the problem. Should I just install bioperl-1.5.1,
> > > >>>>>>>>or do I need to uninstall and remove the previous versions? I could not
> > > >>>>>>>>find any hint on uninstalling bioperl on linux. Could you please give
> > > >>>>>>>>me some suggestions?
> > > >>>>>>>>Thanks,
> > > >>>>>>>>Guojun
> > > >>>>>>>>
> > > >>>>>>>>Department of Plant Biology
> > > >>>>>>>>University of Georgia
> > > >>>>>>>>      _____
> > > >>>>>>>>
> > > >>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > >>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> > > >>>>>>>>
> > > >>>>>>>>If you're using RemoteBlast 1.28, then you've likely updated from CVS,
> > > >>>>>>>>which isn't the latest fix.
> > > >>>>>>>>
> > > >>>>>>>>Make sure that you check the following:
> > > >>>>>>>>
> > > >>>>>>>>1) Always post to the mailing list:
> > > >>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > > >>>>>>>>
> > > >>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> > > >>>>>>>>installed first.  Perform a clean installation; do not upgrade only
> > > >>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> > > >>>>>>>>guarantee that mixing modules from old and new distributions (1.4 and
> > > >>>>>>>>1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > > >>>>>>>>installation will allow text output from BLAST v.2.2.12 to be saved
> > > >>>>>>>>and parsed; it will not parse the newest BLAST text output from NCBI
> > > >>>>>>>>(v2.2.13) but it should still save it. I believe as long as
> > > >>>>>>>>next_result() isn't called, it will work.
> > > >>>>>>>>
> > > >>>>>>>>3) The bug fixes for the above issue with parsing BLAST 2.2.13 text
> > > >>>>>>>>output are NOT in CVS; they haven't been cleared and checked in by
> > > >>>>>>>>Roger Hall (who's now taking care of RemoteBlast) and the powers that
> > > >>>>>>>>be (Jason or whoever is in charge of Bio::SearchIO).  They can be found
> > > >>>>>>>>in Bugzilla:
> > > >>>>>>>>
> > > >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > >>>>>>>>
> > > >>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> > > >>>>>>>>saving XML output, so it isn't necessary if you don't plan on using
> > > >>>>>>>>this option.  And, remember, they haven't been committed yet to CVS,
> > > >>>>>>>>which means that the final version will change to reflect the new
> > > >>>>>>>>version.
> > > >>>>>>>>
> > > >>>>>>>>Christopher Fields
> > > >>>>>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>>>>Dept. of Biochemistry
> > > >>>>>>>>University of Illinois Urbana-Champaign
> > > >>>>>>>>
> > > >>>>>>>>    _____
> > > >>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > >>>>>>>>Sent: Monday, February 13, 2006 9:26 AM
> > > >>>>>>>>To: Chris Fields
> > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> > > >>>>>>>>
> > > >>>>>>>>Hi, Chris
> > > >>>>>>>>
> > > >>>>>>>>Thanks for your suggestion; however, it doesn't seem to work for my cgi
> > > >>>>>>>>even after I replace both blast.pm and RemoteBlast.pm. I didn't even
> > > >>>>>>>>get any RID. Is there any suggestion?
> > > >>>>>>>>
> > > >>>>>>>>Guojun
> > > >>>>>>>>
> > > >>>>>>>>Guojun Yang
> > > >>>>>>>>Department of Plant Biology
> > > >>>>>>>>University of Georgia
> > > >>>>>>>>Tel: 706-542-1857
> > > >>>>>>>>Fax: 706-542-1805
> > > >>>>>>>>http://www.arches.uga.edu/~guojun
> > > >>>>>>>>    _____
> > > >>>>>>>>
> > > >>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > > >>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> > > >>>>>>>>
> > > >>>>>>>>I would say give the new code a try, but realize that it hasn't been
> > > >>>>>>>>checked in (like I said below). I will try going over the modified
> > > >>>>>>>>Bio::SearchIO::blast again this weekend to see if there is anything I
> > > >>>>>>>>might have missed. The changed order in the header of BLAST text output
> > > >>>>>>>>has me a bit worried that it might not catch everything, but it at
> > > >>>>>>>>least doesn't hang in the while() loop I described in the bug report
> > > >>>>>>>>below (bug #1934) and seems to process everything fine.
> > > >>>>>>>>
> > > >>>>>>>>If you want more stability in the code, you might consider changing
> > > >>>>>>>>over to XML output and parsing with Bio::SearchIO::blastxml. There are
> > > >>>>>>>>some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that
> > > >>>>>>>>accommodate saving XML output, but I believe it parses everything
> > > >>>>>>>>regardless. If you look back over the last month or so, there has been
> > > >>>>>>>>a bit of discussion here about it. Jason describes a bit on how to set
> > > >>>>>>>>up RemoteBlast for XML:
> > > >>>>>>>>
> > > >>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> > > >>>>>>>>
> > > >>>>>>>>Christopher Fields
> > > >>>>>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>>>>Dept. of Biochemistry
> > > >>>>>>>>University of Illinois Urbana-Champaign
> > > >>>>>>>>
> > > >>>>>>>>>-----Original Message-----
> > > >>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
> > > >>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > >>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM
> > > >>>>>>>>>To: bioperl-l at bioperl.org
> > > >>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> > > >>>>>>>>>
> > > >>>>>>>>>Hi, Everybody,
> > > >>>>>>>>>I see this post and am wondering if this is the reason for the
> > > >>>>>>>>>malfunctioning of my webserver. We set up a webserver named MAK, for
> > > >>>>>>>>>MITE sequence analysis. It was working very well until around November
> > > >>>>>>>>>2005, when it stopped returning any result (the site is fine and seems
> > > >>>>>>>>>to be doing something after submission). In the CGI script, I used
> > > >>>>>>>>>RemoteBlast (that work was done in 2003) to do searches. I currently do
> > > >>>>>>>>>not have access to the server because I moved. Several people have sent
> > > >>>>>>>>>emails to us about its malfunctioning. Is there any suggestion on
> > > >>>>>>>>>fixing the problem? Should I simply ask that RemoteBlast.pm be replaced
> > > >>>>>>>>>with the new version?
> > > >>>>>>>>>Thanks a lot,
> > > >>>>>>>>>Guojun
> > > >>>>>>>>>
> > > >>>>>>>>>Department of Plant Biology
> > > >>>>>>>>>University of Georgia
> > > >>>>>>>>>Tel: 706-542-1857
> > > >>>>>>>>>Fax: 706-542-1805
> > > >>>>>>>>>http://www.arches.uga.edu/~guojun
> > > >>>>>>>>>_____
> > > >>>>>>>>>
> > > >>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> > > >>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > > >>>>>>>>>[mailto:bioperl-l at bioperl.org]
> > > >>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > > >>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > >>>>>>>>>
> > > >>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> > > >>>>>>>>>will work for saving text output. However, it will not parse anything
> > > >>>>>>>>>using next_result (it will likely hang) and will not save XML format.
> > > >>>>>>>>>See these bugs:
> > > >>>>>>>>>
> > > >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > >>>>>>>>>
> > > >>>>>>>>>for explanations and possible fixes (changes to RemoteBlast and
> > > >>>>>>>>>Bio::SearchIO::blast). Note that these haven't been checked in yet, so
> > > >>>>>>>>>are still not included in bioperl-live; they may be further modified
> > > >>>>>>>>>before committing to CVS. If you're not worried about XML, you could
> > > >>>>>>>>>just try the first fix, which is a change to SearchIO::blast.
> > > >>>>>>>>>
> > > >>>>>>>>>Nagesh, I remember you posting to the list a month ago using a script
> > > >>>>>>>>>which had problems; the script you used saves the output but doesn't
> > > >>>>>>>>>actually parse it (i.e. you don't use next_result() to go through the
> > > >>>>>>>>>data). Is the version of BLAST in your text output 2.2.12 or 2.2.13?
> > > >>>>>>>>>Have you tried parsing the output using "-readmethod => SearchIO" or
> > > >>>>>>>>>"-readmethod => blast" using your version of RemoteBlast and method
> > > >>>>>>>>>next_result()? Like below (from perldoc):
> > > >>>>>>>>>
> > > >>>>>>>>>while ( my @rids = $factory->each_rid ) {
> > > >>>>>>>>>    foreach my $rid ( @rids ) {
> > > >>>>>>>>>        my $rc = $factory->retrieve_blast($rid);
> > > >>>>>>>>>        if( !ref($rc) ) {
> > > >>>>>>>>>            if( $rc < 0 ) {
> > > >>>>>>>>>                $factory->remove_rid($rid);
> > > >>>>>>>>>            }
> > > >>>>>>>>>            print STDERR "." if ( $v > 0 );
> > > >>>>>>>>>            sleep 5;
> > > >>>>>>>>>        } else { # parsing starts here
> > > >>>>>>>>>            my $result = $rc->next_result(); # it should hang here
> > > >>>>>>>>>            #save the output
> > > >>>>>>>>>            my $filename = $result->query_name()."\.out";
> > > >>>>>>>>>            $factory->save_output($filename);
> > > >>>>>>>>>            $factory->remove_rid($rid);
> > > >>>>>>>>>            print "\nQuery Name: ", $result->query_name(), "\n";
> > > >>>>>>>>>            while ( my $hit = $result->next_hit ) {
> > > >>>>>>>>>                next unless ( $v > 0);
> > > >>>>>>>>>                print "\thit name is ", $hit->name, "\n";
> > > >>>>>>>>>                while( my $hsp = $hit->next_hsp ) {
> > > >>>>>>>>>                    print "\t\tscore is ", $hsp->score, "\n";
> > > >>>>>>>>>                }
> > > >>>>>>>>>            }
> > > >>>>>>>>>        }
> > > >>>>>>>>>    }
> > > >>>>>>>>>}
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>My script hung if I used next_result() in any way prior to the fixes. I
> > > >>>>>>>>>want to see how many others are having the same issues with parsing
> > > >>>>>>>>>using the CVS version of bioperl-live.
> > > >>>>>>>>>
> > > >>>>>>>>>Christopher Fields
> > > >>>>>>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>>>>>Dept. of Biochemistry
> > > >>>>>>>>>University of Illinois Urbana-Champaign
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>>-----Original Message-----
> > > >>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
> > > >>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > > >>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM
> > > >>>>>>>>>>To: Huang Jian; bioperl-l
> > > >>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > >>>>>>>>>>
> > > >>>>>>>>>>Hi Huang,
> > > >>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm works on the
> > > >>>>>>>>>>logic of checking the temporary file size to determine whether the Blast
> > > >>>>>>>>>>results are ready. This condition is not getting satisfied, maybe due to
> > > >>>>>>>>>>some changes brought about by NCBI. I had this problem recently and
> > > >>>>>>>>>>figured out that the solution was to use the latest version, which has
> > > >>>>>>>>>>this problem fixed (it does not use the file size logic any more) and
> > > >>>>>>>>>>which is not yet included in the BioPerl package.
> > > >>>>>>>>>>Cheers
> > > >>>>>>>>>>Nagesh
> > > >>>>>>>>>>
> > > >>>>>>>>>>Huang Jian wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>>Dear Nagesh,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent
> > > >>>>>>>>>>>me. Now it works perfectly!!!
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>Thank you!!
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>Huang
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka"
> > > >>>>>>>>>>>To: "Huang Jian" ; "bioperl-l"
> > > >>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM
> > > >>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > > >>>>>>>>>>>via email
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>>Hi Huang,
> > > >>>>>>>>>>>>I see that you are submitting a sequence for a remote blast search. Can
> > > >>>>>>>>>>>>you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09)? If
> > > >>>>>>>>>>>>not, I have attached it with this email; try replacing the old one,
> > > >>>>>>>>>>>>which has a bug, with it.
> > > >>>>>>>>>>>>Let me know if it works.
> > > >>>>>>>>>>>>Nagesh
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>_______________________________________________
> > > >>>>>>>>>>Bioperl-l mailing list
> > > >>>>>>>>>>Bioperl-l at lists.open-bio.org
> > > >>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l



From cjm at fruitfly.org  Mon Feb 20 20:48:57 2006
From: cjm at fruitfly.org (chris mungall)
Date: Mon, 20 Feb 2006 17:48:57 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <3666b00b7322d2bfe4d82129b047e5ce@gmx.net>
References: <000001c62e9a$4f82eee0$c2987ca5@pc13>
	<3666b00b7322d2bfe4d82129b047e5ce@gmx.net>
Message-ID: <930b0083193357df7d43cc7a3111c938@fruitfly.org>


I like the idea of using an ontology to describe the ontology.

Note that the proposed structure:
OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI

will lead to cycles in the object graph when the metadata ontology 
describes itself.

Actually, I think the ontology module already has object reference
cycles: TermI->OntologyI->TermI

When I brought this up originally, people didn't seem to care much. So
long as you're only parsing GO it's not a big issue: people have
enough memory that they won't notice a big chunk of it that refuses to
be garbage collected long after it's used. Of course, if you want to use
bioperl to cycle through all of OBO + SnoMed + UMLS then it's a
different story.
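
To make the leak concrete, here is a minimal sketch in plain Perl. The
Toy::Ontology and Toy::Term packages and the $Toy::destroyed counter are
hypothetical stand-ins with the same reference shape (ontology holds
terms, each term points back at its ontology); they are not the real
Bio::Ontology classes, so treat this only as an illustration of the
reference-counting behaviour.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an ontology: it holds strong refs to its terms.
package Toy::Ontology;
sub new {
    my ($class, $name) = @_;
    return bless { name => $name, terms => [] }, $class;
}
sub add_term {
    my ($self, $term) = @_;
    push @{ $self->{terms} }, $term;   # ontology -> term
    $term->{ontology} = $self;         # term -> ontology (closes the cycle)
    return $term;
}
sub DESTROY { $Toy::destroyed++ }      # count ontologies actually freed

# Hypothetical stand-in for a term.
package Toy::Term;
sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}

package main;
$Toy::destroyed = 0;
{
    my $ont = Toy::Ontology->new('GO');
    $ont->add_term( Toy::Term->new("term $_") ) for 1 .. 3;
}   # $ont goes out of scope here, but its terms still reference it

# Perl uses reference counting, so the cycle keeps the ontology alive.
print "ontologies destroyed: $Toy::destroyed\n";   # prints 0
```

The objects are only reclaimed at interpreter shutdown, which is why a
long-running process that loads many ontologies would keep growing.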

I think it's best if Sohel concentrates on getting obo.pm working; then
we can start thinking as a group about the best way to capture ontology
metadata. This includes metadata on the whole ontology, and metadata on
the terms (eg synonyms).

To what extent are the current modules already in use? I think the
object cycle is a serious flaw. Will it be possible to fix this without
a major overhaul?
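
One possible low-impact fix, sketched under assumptions: keep the
ontology -> term references strong and make the term -> ontology
back-reference weak with Scalar::Util::weaken. The Toy::Ontology and
Toy::Term packages and the $Toy::destroyed counter below are
hypothetical, not the actual Bio::Ontology implementation, and I have not
checked how much of the real code relies on terms outliving their
ontology.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Hypothetical ontology class: owns its terms via strong references.
package Toy::Ontology;
sub new {
    my ($class, $name) = @_;
    return bless { name => $name, terms => [] }, $class;
}
sub add_term {
    my ($self, $term) = @_;
    push @{ $self->{terms} }, $term;           # strong: ontology owns term
    $term->{ontology} = $self;
    Scalar::Util::weaken( $term->{ontology} ); # weak back-reference
    return $term;
}
sub DESTROY { $Toy::destroyed++ }              # count ontologies freed

# Hypothetical term class; ontology() returns undef once the owner is gone.
package Toy::Term;
sub new {
    my ($class, $name) = @_;
    return bless { name => $name }, $class;
}
sub ontology { return $_[0]{ontology} }

package main;
$Toy::destroyed = 0;
{
    my $ont = Toy::Ontology->new('GO');
    $ont->add_term( Toy::Term->new("term $_") ) for 1 .. 3;
}   # last strong reference to the ontology disappears here

print "ontologies destroyed: $Toy::destroyed\n";   # prints 1
```

The trade-off is that a term held on its own no longer keeps its ontology
alive, so callers would need to hold on to the ontology object for as
long as they use its terms.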


On Feb 11, 2006, at 9:10 PM, Hilmar Lapp wrote:

> Sohel, please do keep the discussion on the list, in your own interest
> as there's a multitude of people who can respond to you.
>
> SimpleValue would probably be what I'd use too. As Heikki hinted you
> might even create an ontology for annotating ontologies, which would
> allow you to use Annotation::OntologyTerm for annotation, but then
> there's no qualifier value ...
>
> Bioperl 1.5.1 was released last year, please check the website.
>
> 	-hilmar
>
> On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote:
>
>> Hi Hilmar,
>>   I really like your suggestion of implementing the Bio::AnnotatableI
>> interface in the Bio::Ontology::Ontology class. I am going to implement
>> this and play around a little with it. I am planning to use
>> Bio::Annotation::SimpleValue for annotating the header as it provides a
>> good way of specifying the Tag/value pair. What are your thoughts on
>> using this?
>>
>>   Also, I was wondering if you have any idea about the scheduled date
>> for the Bioperl 1.51 release. I would like to contribute some stuff in
>> the next release.
>>
>> Thanks,
>> Sohel.
>>
>> -----Original Message-----
>> From: Hilmar Lapp [mailto:hlapp at gmx.net]
>> Sent: Friday, February 10, 2006 3:40 PM
>> To: Sohel Merchant
>> Cc: Bioperl
>> Subject: Re: Bio::Ontology::Ontology
>>
>> Sohel,
>>
>> please allow me to copy the list in my response. There's many good and
>> insightful people on the list who may have something to add or
>> different ideas.
>>
>> I've come across that problem myself, for instance with InterPro. What
>> I've done so far simply is to stick it unstructured into the definition
>> slot, which is not helpful if your purpose goes further than just
>> displaying it in an unstructured fashion.
>>
>> I'm not sure you would want to create another class for this (like
>> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
>> implementation, probably not the interface) annotatable (i.e.,
>> implement Bio::Annotatable), which supposedly would be simple to do
>> (AnnotationCollection is already implemented, you'd just return an
>> instance of it).
>>
>> Even though tag/value pairs sound like quick&fast way to go I'm 
>> leaning
>> against it; in essence we're moving away from that elsewhere
>> (SeqFeatureI) and hence I don't think we should restart it here.
>>
>> I'm not giving a definitive answer here, just my (initial) thoughts.
>> Hope that helps nonetheless. Can you fancy yourself trying the
>> Annotatable approach and let us know how it goes?
>>
>> 	-hilmar
>>
>>
>> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
>>
>>> Hi Hilmar,
>>>   How are you doing? I am Sohel Merchant, a programmer at dictyBase,
>>> Northwestern University. I am working on a parser for an ontology
>>> file. I really like the ontology object model which you have
>>> contributed to Bioperl. I think its just Awesome!! One of things 
>>> which
>>
>>> I thought would be great to capture is the ontology headers. Right 
>>> now
>>
>>> one can specify only the name, authority information. I was wondering
>>> if there is any way, I could also capture other ontology file headers
>>> like version of the file, date when that ontology file was made. I 
>>> was
>>
>>> thinking of making a header class or alternatively it could go as 
>>> Hash
>>
>>> of values in the Bio::Ontology::Ontology class itself. I wanted to
>>> know what your thoughts are on this.
>>>
>>> Thanks,
>>> Sohel Merchant
>>> dictyBase
>>>
>> -- 
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>>
>>
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From osborne1 at optonline.net  Mon Feb 20 23:42:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 20 Feb 2006 23:42:18 -0500
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
In-Reply-To: <43FA0FB7.6060904@lsi.upc.edu>
Message-ID: 

Gabriel,

You had a couple of little errors in your script but once fixed it worked
fine:

#!/usr/bin/perl -w

use strict;
use lib "/Users/bosborne/bioperl-live";
use Bio::DB::Taxonomy;

my $nodesfile = "nodes.dmp";
my $namefile  = "names.dmp";
my $db = new Bio::DB::Taxonomy(-source    => 'flatfile',
                               -nodesfile => $nodesfile,
                               -namesfile => $namefile);

my $taxonid = $db->get_taxonid('Homo sapiens');

# Here, $taxonid is 9606. However,

my $species = $db->get_Taxonomy_Node(-taxonid => $taxonid);

print $species->common_name;


This is using bioperl-live on Mac OSX, Perl 5.8. Are you on Windows? If so
then try adding -directory => 'C:/temp' and see what happens.

Brian O.

On 2/20/06 1:51 PM, "Gabriel Valiente"  wrote:

> use Bio::DB::Taxonomy;
> my $nodesfile = "nodes.dmp";
> my $namesfile = "names.dmp";
> my $db = new Bio::DB::Taxonomy(-source => 'flatfile'
>                                -nodesfile => $nodesfile,
>                                -namesfile => $namefile);
> my $taxonid = $db->get_taxonid('Homo sapiens');
> 
> Here, $taxonid is 9606. However,
> 
> my $species = $db->get_Taxonomy_Node(-taxonid => $taxonid);




From valiente at lsi.upc.edu  Tue Feb 21 07:19:04 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Tue, 21 Feb 2006 13:19:04 +0100 (MET)
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <1125313334valiente@lsi.upc.es>

Thanks. There's still a problem with Bio::DB::Taxonomy:

use strict;
use Bio::DB::Taxonomy;

my $nodesfile = "nodes.dmp";
my $namesfile = "names.dmp";
my $db = new Bio::DB::Taxonomy(-source => 'flatfile'
                              -nodesfile => $nodesfile,
                              -namesfile => $namesfile);

my $taxonid = $db->get_taxonid('Homo sapiens');
my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid); 

So far so good. Now, access to the parent node via

my $parent = $node->get_Parent_Node;

is alright, but access to the children nodes via

my @childrenids = $db->get_Children_Taxids($taxonid);

raises:

------------- EXCEPTION  -------------
MSG: Abstract method "Bio::DB::Taxonomy::get_Children_Taxids" is not
implemented by package Bio::DB::Taxonomy::entrez.
This is not your fault - author of Bio::DB::Taxonomy::entrez should be
blamed!

STACK Bio::Root::RootI::throw_not_implemented
/home/valiente/bioperl-live/Bio/Root/RootI.pm:523
STACK Bio::DB::Taxonomy::get_Children_Taxids
/home/valiente/bioperl-live/Bio/DB/Taxonomy.pm:162
STACK toplevel fetch.pl:17

Perhaps there could be a $node->get_Children_Nodes() method in
Bio::DB::Taxonomy, instead of relying on Bio::DB::Taxonomy::entrez.
You know, efficient access to the children of a node is a quite
important method for almost any interesting use of the NCBI Taxonomy.

Gabriel




From dhoworth at mrc-lmb.cam.ac.uk  Tue Feb 21 05:47:41 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Tue, 21 Feb 2006 10:47:41 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
Message-ID: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>

I'm drawing a simple graphic and seeing something I didn't expect. I'm 
not sure whether I've misunderstood the docs or found a bug. If I run a 
program containing:

     my $name   = 'O68601';
     my $length = 44;
     my $panel  = Bio::Graphics::Panel->new(
                 -length    => $length,
                 -width     => 800,
                 -pad_left  => 10,
                 -pad_right => 10,
                 -key_style => 'between',
                 );

     my $feature = new Bio::SeqFeature::Generic(
                 -start  => 1,
                 -end    => $length,
                 -display_name => $name . " ($length)",
                 );

     $panel->add_track($feature,
                 -glyph   => 'arrow',
                 -tick    =>  1,
                 -fgcolor => 'black',
                 -double  => 1,
                 -label   => 1,
                 );

Then I see a tick strip labelled at its left end with '1' and at its 
right end with '45'. I expected to see '44'. Should I be looking for a 
bug in Bio::Graphics or fixing my program?

Thanks, Dave


From gbazykin at Princeton.EDU  Tue Feb 21 09:37:32 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Tue, 21 Feb 2006 09:37:32 -0500
Subject: [Bioperl-l] planning sequence mutating modules
Message-ID: <922343764.20060221093732@princeton.edu>

Heikki:

Let me explain what I need more clearly, and perhaps you guys can tell
me how this can be done best in Bioperl.

I'd like to marry the trees and the sequences, so that I could get a
sequence corresponding to each of the nodes (including internal nodes)
on the tree. The sequences of the nodes can be either generated by
some evolution process, or loaded; PAUP, for example, can reconstruct
the sequences of the internal nodes. I am dealing with coding
sequence, and for my purposes, I need to look at individual codons
rather than nucleotides. Then I answer questions such as this:

- for this codon (position), when (before which nodes of the tree) did
all (synonymous or non-synonymous) mutations occur?

- for this node and for this codon, when (before which node) did the
preceding (synonymous or non-synonymous) mutation occur? Preceding
means that it occurred in the line of direct ancestors, i.e. between
some two sequences on the path from this node to the root.

- infer position-specific "substitution matrix" from the tree, i.e. in
this position, what fraction of nucleotides "A" that were present at the
beginning of each branch turned into nucleotide "C" by the end of the
branch, possibly weighting with branch lengths.

Further, I need to simulate sequence evolution along the tree,
e.g., like this:

- mutate specified codon along the tree, perhaps with given
substitution matrix (and, possibly, with given
non-synonymous/synonymous substitutions rate). In the process, the
codons for all nodes will be generated.
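
As a rough illustration of the simulation step described above, here is a minimal sketch in plain Perl. Everything in it is an assumption made for the example: the toy tree in %children, the helper names mutate_codon and evolve, and the uniform per-nucleotide substitution rate, which stands in for a real substitution matrix and ignores branch lengths and the synonymous/non-synonymous distinction.

```perl
use strict;
use warnings;

# Hypothetical toy tree: node id => list of child ids.
my %children = (root => ['n1', 'C'], n1 => ['A', 'B']);
my @bases    = qw(A C G T);

# Mutate each nucleotide of a codon with probability $rate,
# choosing the replacement uniformly among the other three bases.
sub mutate_codon {
    my ($codon, $rate) = @_;
    my @nt = split //, $codon;
    for my $nt (@nt) {
        next unless rand() < $rate;
        my @others = grep { $_ ne $nt } @bases;
        $nt = $others[ int rand @others ];
    }
    return join '', @nt;
}

# Walk the tree from $node downwards, storing the codon of every node
# in the hash referenced by $result.
sub evolve {
    my ($node, $codon, $rate, $result) = @_;
    $result->{$node} = $codon;
    evolve($_, mutate_codon($codon, $rate), $rate, $result)
        for @{ $children{$node} || [] };
    return $result;
}

my $codons = evolve('root', 'AAA', 0.1, {});
print "$_ => $codons->{$_}\n" for sort keys %$codons;
```

Each child is derived from its parent's codon, so the codons of all nodes, internal ones included, are generated in a single pass.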

I need to do all this for large trees (with hundreds of leaves) and
long sequences. So far, I have been using a huge hash to store all my
sequences for each of the nodes:

my $node = (some tree::node object)
my $posit = 0; 
$codons{$posit}->{$node} =  'AAA';

etc. But there should be a better way to do it? How can I integrate
all this into Bioperl? (I am new to object-oriented programming).
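
To make the bookkeeping concrete, here is a small sketch of the "when did the preceding mutation occur" question in plain Perl. It is not BioPerl code: the %parent map, the node ids, and the function name last_change are hypothetical, and nodes are keyed by id rather than by object reference (a reference used as a hash key is stringified, so the object cannot be recovered from the key).

```perl
use strict;
use warnings;

# Hypothetical toy tree: child id => parent id (the root has no entry).
my %parent = (A => 'n1', B => 'n1', n1 => 'root', C => 'root');

# Codon at each position for each node, keyed by node id.
my %codon = (
    0 => { root => 'AAA', n1 => 'AAG', A => 'AAG', B => 'AGG', C => 'AAA' },
);

# Return the node before which the most recent change at position $pos
# happened on the path from $node to the root, or nothing if the codon
# never changed along that path.
sub last_change {
    my ($pos, $node) = @_;
    while (defined(my $up = $parent{$node})) {
        return $node if $codon{$pos}{$node} ne $codon{$pos}{$up};
        $node = $up;
    }
    return;
}

print "A: ", last_change(0, 'A'), "\n";    # change happened before n1
print "B: ", last_change(0, 'B'), "\n";    # change happened before B itself
```

Deciding whether such a change is synonymous would additionally need a codon table, which is left out here.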

I'll be thankful for any feedback.

Yegor



------------------------------
Tuesday, February 14, 2006, 11:09:27 AM, you wrote:

> Yegor,

> Like you said, there are examples how it is done.. It should be possible to
> evolve sequences based on a rooted tree. You just walk the tree and evolve
> each sequence from its parent.  If there is  an agreement how the branch
> lengths get translated to  mutations, even that could be done. Do you have
> any suggestions?

>         -Heikki



> On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
>> Hi,
>>
>> Just a thought: I really think that in perspective, it would be nice
>> to be able to evolve the sequence along a tree of given shape. I think
>> PAML's "evolver" has this functionality. I've already been doing this
>> in my scripts, but I am not sure how to couple the tree and the
>> sequence data properly.
>>
>> Yegor (George) Bazykin
>>
>>
>> ------------------------------
>>
>> Tuesday, February 14, 2006, 1:59:29 AM, you wrote:
>> > I've committed an interim solution to the sequence evolution problem:
>> >
>> >     $newseq = Bio::SeqUtils-> evolve
>> >         ($seq, $similarity, $transition_transversion_rate);
>> >
>> > I will go on to transform this code to fully OO, extensible solution.
>> >
>> >    -Heikki
>> >
>> > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
>> >> Ryan Golhar's mail got me thinking that we should have a simple
>> >> framework for mutating sequences to a desired level. The model can then
>> >> be extended to necessary complexity when needed by subclassing.
>> >>
>> >> To start with, I have been planning:
>> >>
>> >>
>> >> Bio::SeqEvolution::EvolutionI - interface file
>> >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
>> >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>> >>         (defaults to Bio::PrimarySeq)
>> >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
>> >> Bio::SeqEvolution::EvolutionI::each_seqs($count)
>> >>        - returns an array of $count seqs
>> >> Bio::SeqEvolution::EvolutionI::_generate_seq()
>> >> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
>> >>       converted to probabilities of change internally
>> >>
>> >>   various methods to define the extent of divergence:
>> >>   only one to start with:
>> >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
>> >>    (= 100% - identity)
>> >>
>> >> Bio::SeqEvolution::Factory - core class to call,
>> >>          instantiates subclasses, Bio::SeqEvolution::DNASimple for
>> >> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model,
>> >>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
>> >>
>> >>
>> >> Bio::SeqEvolution::DNASimple - default for nucleotides
>> >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>> >>         e.g. 5 => 5:1, defaults to 1:1
>> >>         simple alternative to a scoring matrix
>> >>
>> >>
>> >> I am soliciting usual comments and suggestions about naming and minimal
>> >> functionality.
>> >>
>> >>
>> >>    -Heikki
>>




From osborne1 at optonline.net  Tue Feb 21 09:46:56 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 21 Feb 2006 09:46:56 -0500
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
In-Reply-To: <1125313334valiente@lsi.upc.es>
Message-ID: 

Gabriel,

I don't think so, this works:

#!/usr/bin/perl -w

use strict;
use lib "/Users/bosborne/bioperl-live";
use Bio::DB::Taxonomy;

my $nodesfile = "nodes.dmp";
my $namefile  = "names.dmp";
my $db = new Bio::DB::Taxonomy(-source    => 'flatfile',
                               -nodesfile => $nodesfile,
                               -namesfile => $namefile);

my $taxonid = $db->get_taxonid('Homo sapiens');
my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid);

# Here, $taxonid is 9606. However,

my $parent = $node->get_Parent_Node;

# is alright, but access to the children nodes via

my @childrenids = $db->get_Children_Taxids($taxonid);

print "@childrenids";


What Bioperl version are you using?

Brian O.


On 2/21/06 7:19 AM, "Gabriel Valiente"  wrote:

> my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid); 




From gbazykin at Princeton.EDU  Mon Feb 20 18:21:03 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Mon, 20 Feb 2006 18:21:03 -0500
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602141809.28057.heikki@sanbi.ac.za>
References: <200602100906.11885.heikki@sanbi.ac.za>
	<200602140859.30136.heikki@sanbi.ac.za>
	<214316262.20060214093454@princeton.edu>
	<200602141809.28057.heikki@sanbi.ac.za>
Message-ID: <158747055.20060220182103@princeton.edu>

Heikki:

Let me explain what I need more clearly, and perhaps you guys can tell
me how this can be done best in Bioperl.

I'd like to marry the trees and the sequences, so that I could get a
sequence corresponding to each of the nodes (including internal nodes)
on the tree. The sequences of the nodes can be either generated by
some evolution process, or loaded; PAUP, for example, can reconstruct
the sequences of the internal nodes. I am dealing with coding
sequence, and for my purposes, I need to look at individual codons
rather than nucleotides. Then I answer questions such as this:

- for this codon (position), when (before which nodes of the tree) did
all (synonymous or non-synonymous) mutations occur?

- for this node and for this codon, when (before which node) did the
preceding (synonymous or non-synonymous) mutation occur? Preceding
means that it occurred in the line of direct ancestors, i.e. between
some two sequences on the path from this node to the root.

- infer position-specific "substitution matrix" from the tree, i.e. in
this position, what fraction of nucleotides "A" that were present at the
beginning of each branch turned into nucleotide "C" by the end of the
branch, possibly weighting with branch lengths.

Further, I need to simulate sequence evolution along the tree,
e.g., like this:

- mutate specified codon along the tree, perhaps with given
substitution matrix (and, possibly, with given
non-synonymous/synonymous substitutions rate). In the process, the
codons for all nodes will be generated.

I need to do all this for large trees (with hundreds of leaves) and
long sequences. So far, I have been using a huge hash to store all my
sequences for each of the nodes:

my $node = (some tree::node object)
my $posit = 0; 
$codons{$posit}->{$node} =  'AAA';

etc. But there should be a better way to do it? How can I integrate
all this into Bioperl? (I am new to object-oriented programming).

I'll be thankful for any feedback.

Yegor



------------------------------
Tuesday, February 14, 2006, 11:09:27 AM, you wrote:

> Yegor,

> Like you said, there are examples how it is done.. It should be possible to
> evolve sequences based on a rooted tree. You just walk the tree and evolve
> each sequence from its parent.  If there is  an agreement how the branch
> lengths get translated to  mutations, even that could be done. Do you have
> any suggestions?

>         -Heikki



> On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
>> Hi,
>>
>> Just a thought: I really think that in perspective, it would be nice
>> to be able to evolve the sequence along a tree of given shape. I think
>> PAML's "evolver" has this functionality. I've already been doing this
>> in my scripts, but I am not sure how to couple the tree and the
>> sequence data properly.
>>
>> Yegor (George) Bazykin
>>
>>
>> ------------------------------
>>
>> Tuesday, February 14, 2006, 1:59:29 AM, you wrote:
>> > I've committed an interim solution to the sequence evolution problem:
>> >
>> >     $newseq = Bio::SeqUtils-> evolve
>> >         ($seq, $similarity, $transition_transversion_rate);
>> >
>> > I will go on to transform this code to fully OO, extensible solution.
>> >
>> >    -Heikki
>> >
>> > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
>> >> Ryan Golhar's mail got me thinking that we should have a simple
>> >> framework for mutating sequences to a desired level. The model can then
>> >> be extended to necessary complexity when needed by subclassing.
>> >>
>> >> To start with, I have been planning:
>> >>
>> >>
>> >> Bio::SeqEvolution::EvolutionI - interface file
>> >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
>> >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>> >>         (defaults to Bio::PrimarySeq)
>> >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
>> >> Bio::SeqEvolution::EvolutionI::each_seqs($count)
>> >>        - returns an array of $count seqs
>> >> Bio::SeqEvolution::EvolutionI::_generate_seq()
>> >> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
>> >>       converted to probabilities of change internally
>> >>
>> >>   various methods to define the extent of divergence:
>> >>   only one to start with:
>> >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
>> >>    (= 100% - identity)
>> >>
>> >> Bio::SeqEvolution::Factory - core class to call,
>> >>          instantiates subclasses, Bio::SeqEvolution::DNASimple for
>> >> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model,
>> >>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
>> >>
>> >>
>> >> Bio::SeqEvolution::DNASimple - default for nucleotides
>> >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>> >>         e.g. 5 => 5:1, defaults to 1:1
>> >>         simple alternative to a scoring matrix
>> >>
>> >>
>> >> I am soliciting usual comments and suggestions about naming and minimal
>> >> functionality.
>> >>
>> >>
>> >>    -Heikki
>>





From jason.stajich at duke.edu  Tue Feb 21 09:51:39 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 21 Feb 2006 09:51:39 -0500
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
In-Reply-To: <1125313334valiente@lsi.upc.es>
References: <1125313334valiente@lsi.upc.es>
Message-ID: <16B69355-A7EC-4FA6-B0F3-A473C705B921@duke.edu>

Of course it should, and it does support this. The children query
definitely exists for the flatfile implementation. I don't understand
why you are getting entrez errors when you are requesting the flatfile
handle.
I can't investigate, but it definitely worked for me to get children
nodes. Did you actually try running the script that should already
work - scripts/taxa/local_taxonomydb_query ?

You definitely can't request children nodes via the entrez
implementation, because NCBI doesn't (or didn't when this was written;
I don't know about now) provide children id access, so it is pretty
useless for that - although the eutils support may have expanded, I'm
not sure. If someone has the itch, please scratch it and work on this.

I think you need to pass in $parent instead of $taxonid to
get_Children_Taxids -- although I guess I wrote the method to accept
either.

-jason

On Feb 21, 2006, at 7:19 AM, Gabriel Valiente wrote:

> Thanks. There's still a problem with Bio::DB::Taxonomy:
>
> use strict;
> use Bio::DB::Taxonomy;
>
> my $nodesfile = "nodes.dmp";
> my $namesfile = "names.dmp";
> my $db = new Bio::DB::Taxonomy(-source => 'flatfile'
>                               -nodesfile => $nodesfile,
>                               -namesfile => $namesfile);
>
> my $taxonid = $db->get_taxonid('Homo sapiens');
> my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid);
>
> So far so good. Now, access to the parent node via
>
> my $parent = $node->get_Parent_Node;
>
> is alright, but access to the children nodes via
>
> my @childrenids = $db->get_Children_Taxids($taxonid);
>
> raises:
>
> ------------- EXCEPTION  -------------
> MSG: Abstract method "Bio::DB::Taxonomy::get_Children_Taxids" is not
> implemented by package Bio::DB::Taxonomy::entrez.
> This is not your fault - author of Bio::DB::Taxonomy::entrez should be
> blamed!
>
> STACK Bio::Root::RootI::throw_not_implemented
> /home/valiente/bioperl-live/Bio/Root/RootI.pm:523
> STACK Bio::DB::Taxonomy::get_Children_Taxids
> /home/valiente/bioperl-live/Bio/DB/Taxonomy.pm:162
> STACK toplevel fetch.pl:17
>
> Perhaps there could be a $node->get_Children_Nodes() method in
> Bio::DB::Taxonomy, instead of relying on Bio::DB::Taxonomy::entrez.
> You know, efficient access to the children of a node is a quite
> important method for almost any interesting use of the NCBI Taxonomy.
>
> Gabriel
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From hlapp at gmx.net  Mon Feb 20 21:52:34 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 20 Feb 2006 18:52:34 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <930b0083193357df7d43cc7a3111c938@fruitfly.org>
References: <000001c62e9a$4f82eee0$c2987ca5@pc13>
	<3666b00b7322d2bfe4d82129b047e5ce@gmx.net>
	<930b0083193357df7d43cc7a3111c938@fruitfly.org>
Message-ID: 

On 2/20/06, chris mungall  wrote:
>
> I like the idea of using an ontology to describe the ontology.
>
> Note that the proposed structure:
> OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI
>
> will lead to cycles in the object graph when the metadata ontology
> describes itself.

Yes I know, that's why I didn't want to be too vocal about it ...

>
> actually, I think the ontology module already has object reference
> cycles. TermI->OntologyI->TermI
>
> When I brought this up originally people didn't seem to care much - so
> long as you're only parsing GO then it's not a big issue, people have
> enough memory they won't notice a big chunk of memory that refuses to
> be garbage collected way after it's used.

There is a method that destroys the cycle: $ontology->close()
(this is also an interface method)

Essentially, the cycle is not in OntologyI itself but in OntologyI
HAS-A OntologyEngineI; i.e., the latter holds (may hold) references to
terms which (may) hold a reference to an OntologyI which holds a
reference to the OntologyEngineI.

I say 'may' in parentheses because an implementation may use tricks
like late instantiation, stringified references (handles), and weak
references. It's possible to avoid the cycle altogether using such
tricks but it remains questionable how much this then affects
performance, and how ugly and incomprehensible the code would become.
Since there is the close() method I haven't bothered yet trying a
fully de-cycled implementation.
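
To illustrate the weak-reference trick mentioned above, here is a minimal sketch using Scalar::Util::weaken from core Perl. Demo::Ontology and its add_term method are hypothetical stand-ins written for this example, not the actual Bio::Ontology classes, and the terms are plain hashes.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;    # counts ontology objects that were actually freed

# Demo::Ontology is a hypothetical stand-in, not a BioPerl class.
package Demo::Ontology;

sub new {
    my ($class, $name) = @_;
    return bless { name => $name, terms => [] }, $class;
}

sub add_term {
    my ($self, $term) = @_;
    push @{ $self->{terms} }, $term;            # strong: ontology owns its terms
    $term->{ontology} = $self;                  # back-reference closes the cycle ...
    Scalar::Util::weaken($term->{ontology});    # ... unless it is weakened
    return $term;
}

sub DESTROY { $destroyed++ }

package main;

{
    my $ont = Demo::Ontology->new('demo');
    $ont->add_term({ name => 'term1' });
}    # $ont goes out of scope here

print "ontologies freed: $destroyed\n";
```

Without the weaken() call the ontology and its term would keep each other alive after the block ends. The price is that a term held on its own can find its ontology reference undefined once the ontology is gone, which is the kind of trade-off that makes an explicit close() attractive.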

> Of course, if you want to use
> bioperl to cycle though all of OBO + SnoMed + UMLS then it's a
> different story.

Well if you want to keep all three in memory for some kind of
cross-reasoning then yes you are in trouble. But if you do one
ontology after another, you'd just have to make sure to call close() on
an ontology once you're done with it.

>
> I think it's best if Sohel concentrates on getting obo.pm working, then
> we can start thinking as a group about the best way to capture ontology
> metadata. This includes metadata on the whole ontology, and metadata on
> the terms (eg synonyms).
>
> To what extent are the current modules already in use?

I don't know about others but I use them often.

> I think the object cycle is a serious flaw, will it be possible to fix this without
> a major overhaul?

If I recall correctly the way go-perl circumvents this is by having
the ontology of a term as a flat attribute. This also means that when
having a term alone, you cannot ask for its connected terms. It's been
a while, so Chris, set me straight where this is not true.

It should be possible to come up with an implementation of OntologyI
that for all intents and purposes behaves like a flat scalar giving
the name until you call one of its graph traversal methods. At that
point it would instantiate the engine from persistent storage (file,
or a database connection), or retrieve one from a 'store'. The latter
is I believe what Allen started with the OntologyStore, but again I
would need to check the details.

    -hilmar

>
>
> On Feb 11, 2006, at 9:10 PM, Hilmar Lapp wrote:
>
> > Sohel, please do keep the discussion on the list, in your own interest
> > as there's a multitude of people who can respond to you.
> >
> > SimpleValue would probably be what I'd use too. As Heikki hinted you
> > might even create an ontology for annotating ontologies, which would
> > allow you to use Annotation::OntologyTerm for annotation, but then
> > there's no qualifier value ...
> >
> > Bioperl 1.5.1 has been released last year, please check the website.
> >
> >       -hilmar
> >
> > On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote:
> >
> >> Hi Hilmar,
> >>   I really like your suggestion of implementing the Bio::AnnotatableI
> >> interface in the Bio::Ontology::Ontology class. I am going to
> >> implement
> >> this and play around a little with it. I am planning to use
> >> Bio::Annotation::SimpleValue for annotating the header as it provides
> >> a
> >> good way of specifying the Tag/value pair. What are your thoughts on
> >> using this?
> >>
> >>   Also, I was wondering if you have any idea about the scheduled date
> >> for the Bioperl 1.51 release. I would like to contribute some stuff in
> >> the next release.
> >>
> >> Thanks,
> >> Sohel.
> >>
> >> -----Original Message-----
> >> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >> Sent: Friday, February 10, 2006 3:40 PM
> >> To: Sohel Merchant
> >> Cc: Bioperl
> >> Subject: Re: Bio::Ontology::Ontology
> >>
> >> Sohel,
> >>
> >> please allow me to copy the list in my response. There's many good and
> >> insightful people on the list who may have something to add or
> >> different ideas.
> >>
> >> I've come across that problem myself, for instance with InterPro. What
> >> I've done so far simply is to stick it unstructured into the
> >> definition
> >> slot, which is not helpful if your purpose goes further than just
> >> displaying it in an unstructured fashion.
> >>
> >> I'm not sure you would want to create another class for this (like
> >> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
> >> implementation, probably not the interface) annotatable (i.e.,
> >> implement Bio::Annotatable), which supposedly would be simple to do
> >> (AnnotationCollection is already implemented, you'd just return an
> >> instance of it).
> >>
> >> Even though tag/value pairs sound like quick&fast way to go I'm
> >> leaning
> >> against it; in essence we're moving away from that elsewhere
> >> (SeqFeatureI) and hence I don't think we should restart it here.
> >>
> >> I'm not giving a definitive answer here, just my (initial) thoughts.
> >> Hope that helps nonetheless. Can you fancy yourself trying the
> >> Annotatable approach and let us know how it goes?
> >>
> >>      -hilmar
> >>
> >>
> >> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
> >>
> >>> Hi Hilmar,
> >>> How are you doing? I am Sohel Merchant, a programmer at dictyBase,
> >>> Northwestern University. I am working on a parser for an ontology
> >>> file. I really like the ontology object model which you have
> >>> contributed to Bioperl. I think its just Awesome!! One of things
> >>> which
> >>
> >>> I thought would be great to capture is the ontology headers. Right
> >>> now
> >>
> >>> one can specify only the name, authority information. I was wondering
> >>> if there is any way, I could also capture other ontology file headers
> >>> like version of the file, date when that ontology file was made. I
> >>> was
> >>
> >>> thinking of making a header class or alternatively it could go as
> >>> Hash
> >>
> >>> of values in the Bio::Ontology::Ontology class itself. I wanted to
> > >>> know what your thoughts are on this.
> >>>
> >>> Thanks,
> >>> Sohel Merchant
> >>> dictyBase
> >>>
> >> --
> >> -------------------------------------------------------------
> >> Hilmar Lapp                            email: lapp at gnf.org
> >> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> >> -------------------------------------------------------------
> >>
> >>
> >>
> >>
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp                            email: lapp at gnf.org
> > GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > -------------------------------------------------------------
> >
> >
> >
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From valiente at lsi.upc.edu  Tue Feb 21 11:10:05 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Tue, 21 Feb 2006 17:10:05 +0100 (MET)
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <1783551242valiente@lsi.upc.es>

It works now, with the #!/usr/bin/perl -w switch. Sorry about that.
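
One possible explanation, which the thread itself does not confirm, is the missing comma after 'flatfile' in the constructor call: Perl then parses 'flatfile' -nodesfile as a subtraction, and -w reports the non-numeric operands. The sketch below only builds the argument list, so it runs without Bioperl; that Bio::DB::Taxonomy falls back to the entrez implementation when -source is false is an assumption suggested by the error message.

```perl
use strict;
use warnings;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };    # collect warnings instead of printing them

my $nodesfile = "nodes.dmp";
my $namesfile = "names.dmp";

# Note the missing comma after 'flatfile': the next token is parsed as
# binary minus, i.e. 'flatfile' - 'nodesfile', which evaluates to 0.
my @args = (-source => 'flatfile'
            -nodesfile => $nodesfile,
            -namesfile => $namesfile);

print "number of arguments: ", scalar(@args), "\n";
print "value passed for -source: $args[1]\n";
print "warnings seen: ", scalar(@warnings), "\n";
```

With the comma in place the list has six elements and -source is 'flatfile'.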

I'd like to contribute a couple of additional methods to
Bio::DB::Taxonomy. The first one returns a reference to an array with
the full lineage of a given node.

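sub lineage {
  my $node = shift;
  my @PATH;
  # collect the ancestors of $node, root first ($node itself is not included)
  while ($node->node_name ne "root") {
    $node = $node->get_Parent_Node or last;  # stop if there is no parent
    unshift @PATH, $node;
  }
  return \@PATH;
}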

The second one uses the lineage method to return the most recent common
ancestor of two given nodes.

sub LCA {
  my $node1 = shift;
  my $node2 = shift;
  my @PATH1 = @{lineage($node1)};
  my @PATH2 = @{lineage($node2)};
  my $lca;
  # walk both lineages down from the root for as long as they agree,
  # remembering the last node they have in common
  while (@PATH1 && @PATH2
         && $PATH1[0]->node_name eq $PATH2[0]->node_name) {
    $lca = shift @PATH1;
    shift @PATH2;
  }
  return $lca;
}

Jason, shall I include them myself in Bio::DB::Taxonomy or can you take
care of this? I think, the right place for these methods might be
Bio::Taxonomy or Bio::Taxonomy::Node rather than Bio::DB::Taxonomy.

Thanks,

Gabriel




From lstein at cshl.edu  Tue Feb 21 10:55:30 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 21 Feb 2006 10:55:30 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
Message-ID: <200602211055.31221.lstein@cshl.edu>

Hi,

When you are looking at the resolution of individual bases, a base pair at 
position one occupies the half-open interval from 1->2, meaning that it comes 
up to, but doesn't quite touch, the 2. For the purposes of display, 
Bio::Graphics draws the end of the half-open interval.
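
The arithmetic behind this can be sketched in a few lines of Perl. This is only an illustration of the half-open convention described above, not the code Bio::Graphics uses internally, and drawn_interval is a hypothetical helper name.

```perl
use strict;
use warnings;

# Half-open interval occupied by a feature given in 1-based base coordinates.
# Base p covers [p, p+1), so a feature start..end covers [start, end+1).
sub drawn_interval {
    my ($start, $end) = @_;
    return ($start, $end + 1);
}

my ($left, $right) = drawn_interval(1, 44);
print "left edge $left, right edge $right, bases covered ", $right - $left, "\n";
```

For a feature from 1 to 44 the right edge falls at 45 while the number of bases covered is still 44, which matches the tick labels Dave reported.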

Lincoln

On Tuesday 21 February 2006 05:47, Dave Howorth wrote:
> I'm drawing a simple graphic and seeing something I didn't expect. I'm
> not sure whether I've misunderstood the docs or found a bug. If I run a
> program containing:
>
>      my $name   = 'O68601';
>      my $length = 44;
>      my $panel  = Bio::Graphics::Panel->new(
>                  -length    => $length,
>                  -width     => 800,
>                  -pad_left  => 10,
>                  -pad_right => 10,
>                  -key_style => 'between',
>                  );
>
>      my $feature = new Bio::SeqFeature::Generic(
>                  -start  => 1,
>                  -end    => $length,
>                  -display_name => $name . " ($length)",
>                  );
>
>      $panel->add_track($feature,
>                  -glyph   => 'arrow',
>                  -tick    =>  1,
>                  -fgcolor => 'black',
>                  -double  => 1,
>                  -label   => 1,
>                  );
>
> Then I see a tick strip labelled at its left end with '1' and at its
> right end with '45'. I expected to see '44'. Should I be looking for a
> bug in Bio::Graphics or fixing my program?
>
> Thanks, Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From jason.stajich at duke.edu  Tue Feb 21 11:28:22 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 21 Feb 2006 11:28:22 -0500
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
In-Reply-To: <1783551242valiente@lsi.upc.es>
References: <1783551242valiente@lsi.upc.es>
Message-ID: <1C38DDCF-9312-42D3-923F-C0DD4CE7E9AA@duke.edu>

you'll have to do it - I don't have time, I thought there was  
something like this already, but I guess not, so please put it in.  I  
must do this when we initialize the classification array when  
building a node,


On Feb 21, 2006, at 11:10 AM, Gabriel Valiente wrote:

> It works now, with the #!/usr/bin/perl -w switch. Sorry about that.
>
> I'd like to contribute a couple of additional methods to
> Bio::DB::Taxonomy. The first one returns a reference to an array with
> the full lineage of a given node.
>
> sub lineage {
>   my $node = shift;
>   my @PATH;
>   while ($node->node_name ne "root") {
>     $node = $node->get_Parent_Node;
>     unshift @PATH, $node;
>   }
>   return \@PATH;
> }
>
> The second one uses the lineage method to return the most recent  
> common
> ancestor of two given nodes.
>
> sub LCA {
>   my $node1 = shift;
>   my $node2 = shift;
>   my @PATH1 = @{lineage($node1)};
>   my @PATH2 = @{lineage($node2)};
>   my $root1 = shift @PATH1;
>   my $root2 = shift @PATH2;
>   while ($root1->node_name eq $root2->node_name) {
>     $root1 = shift @PATH1;
>     $root2 = shift @PATH2;
>   }
>   return $root1;
> }
>
> Jason, shall I include them myself in Bio::DB::Taxonomy or can you  
> take
> care of this? I think, the right place for these methods might be
> Bio::Taxonomy or Bio::Taxonomy::Node rather than Bio::DB::Taxonomy.
>
> Thanks,
>
> Gabriel
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From dhoworth at mrc-lmb.cam.ac.uk  Tue Feb 21 11:50:37 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Tue, 21 Feb 2006 16:50:37 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <200602211055.31221.lstein@cshl.edu>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211055.31221.lstein@cshl.edu>
Message-ID: <43FB44DD.4090504@mrc-lmb.cam.ac.uk>

Lincoln Stein wrote:
> When you are looking at the resolution of individual bases, a base pair at 
> position one occupies the half-open interval from 1->2, meaning that it comes 
> up to, but doesn't quite touch, the 2. For the purposes of display, 
> Bio::Graphics draws the end of the half-open interval.

I think I understand the description of what it's doing but I don't 
understand why. What is the purpose of labelling the [44,45) interval 
45, when that interval is representing the 44th discrete mer?

I'm working with proteins and domains, so I'm always at the level of 
individual residues and people frequently care about the exact residue 
boundaries, especially when the regions are short. So I need to make 
pictures that match the data.

The displayed track seems more consistent with an interpretation that 
the residues are represented by the discrete integer points along the 
line but I don't know if I'm buying myself trouble later if I try to 
adopt that interpretation.

Alternatively, is there some way to get a track with 44 intervals, 
labelled 1 to 44?

Or will I need to patch my copy of bioperl to achieve that?

Thanks, Dave


From cjfields at uiuc.edu  Tue Feb 21 12:30:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 11:30:58 -0600
Subject: [Bioperl-l] another searchIO bug? with blast report
In-Reply-To: <43F5A2EA0200009B00000F45@gwia.kvl.dk>
Message-ID: <000301c6370c$93b07c70$15327e82@pyrimidine>

Anders,

I think you should look through the mail list archives for an answer,
specifically:

http://portal.open-bio.org/pipermail/bioperl-l/2004-November/017285.html

Look up the other methods in Bio::Search::HSP::BlastHSP as well. They may be
more helpful.  I can't help but think there is something wrong with the
logic in your subroutines since they don't call other methods built in to
HSP objects.  It may be an off-by-one error.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Anders Stegmann
> Sent: Friday, February 17, 2006 3:18 AM
> To: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] another searchIO bug? with blast report
> 
> 
> 
> >>>Anders Stegmann  02/16/06 11:20 am >>>
> Hi!
> 
> I am blasting a protein seq (query) against an identical seq with a
> deletion of Aa nr 61 (subject).
> Then I print out the type of nomatch Aa and its position.
> The nomatch for the query seq is Aa G at position 61, which is correct.
> The nomatch for the subject seq is V at position 60, which is definitely
> not correct!?
> 
> Is this a bug?
> 
> testblast2.pl is the program to run
> 
> Q0045 is the query seq.
> 
> Q0045del61 is the subject seq (it has to be formated: formatdb -i
> Q0045del61 -p T -o F).
> 
> Regards Anders.
> 




From staffa at niehs.nih.gov  Tue Feb 21 12:24:39 2006
From: staffa at niehs.nih.gov (staffa)
Date: Tue, 21 Feb 2006 12:24:39 -0500
Subject: [Bioperl-l] Pattern Density
Message-ID: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>

Good Friends,
I have an important client who wants a histogram display of the density 
of "ccgg" along any chromosome of the mouse genome in 1000 bp windows.

I'm thinking that maybe there is a bio-perl module that could help with 
this.
That'd probably beat having to write something from scratch.
Any help that you give would be greatly appreciated.
I am more concerned about the reading and analysis of the sequence than 
actual plotting of the histogram, but anything you can offer will be 
appreciated.

Thank you.

Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina

From lstein at cshl.edu  Tue Feb 21 13:25:59 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 21 Feb 2006 13:25:59 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FB44DD.4090504@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211055.31221.lstein@cshl.edu>
	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>
Message-ID: <200602211326.00021.lstein@cshl.edu>

Hi Dave,

Well, when you are using 1-based coordinates, a line that contains 44 
intervals will have 45 ticks. If you move to 0-based coordinates, then the 
first tick will be labeled 0 and the last tick will be labeled 44. An 
alternative is to make each base dimensionless, but that becomes a problem 
when dealing with single base features, such as SNPs. These issues are why I 
have long advocated for interbase coordinates in which you number the 
positions between bases rather than the bases themselves.
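
A small sketch of how interbase coordinates relate to the usual 1-based inclusive ones, in plain Perl. The helper names to_interbase and from_interbase are hypothetical and exist only for this illustration; they are not BioPerl methods.

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.
# 1-based inclusive (start, end) -> interbase (left, right)
sub to_interbase   { my ($s, $e) = @_; return ($s - 1, $e); }
# interbase (left, right) -> 1-based inclusive (start, end)
sub from_interbase { my ($l, $r) = @_; return ($l + 1, $r); }

my ($l, $r) = to_interbase(1, 44);    # a whole 44-residue feature
print "interbase $l..$r, length ", $r - $l, "\n";

($l, $r) = to_interbase(5, 5);        # a single-base feature such as a SNP
print "interbase $l..$r, length ", $r - $l, "\n";
```

In this scheme the length is simply right minus left, and a single-base feature still has length 1 rather than collapsing to a point.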

Draw me the picture of what you expect to see. I think of it this way:

  1 2 3 4 5 6
   A>G>C>T>A>

Lincoln

On Tuesday 21 February 2006 11:50, Dave Howorth wrote:
> Lincoln Stein wrote:
> > When you are looking at the resolution of individual bases, a base pair
> > at position one occupies the half-open interval from 1->2, meaning that
> > it comes up to, but doesn't quite touch, the 2. For the purposes of
> > display, Bio::Graphics draws the end of the half-open interval.
>
> I think I understand the description of what it's doing but I don't
> understand why. What is the purpose of labelling the [44,45) interval
> 45, when that interval is representing the 44th discrete mer?
>
> I'm working with proteins and domains, so I'm always at the level of
> individual residues and people frequently care about the exact residue
> boundaries, especially when the regions are short. So I need to make
> pictures that match the data.
>
> The displayed track seems more consistent with an interpretation that
> the residues are represented by the discrete integer points along the
> line but I don't know if I'm buying myself trouble later if I try to
> adopt that interpretation.
>
> Alternatively, is there some way to get a track with 44 intervals,
> labelled 1 to 44?
>
> Or will I need to patch my copy of bioperl to achieve that?
>
> Thanks, Dave

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From osborne1 at optonline.net  Tue Feb 21 13:25:35 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 21 Feb 2006 13:25:35 -0500
Subject: [Bioperl-l] Pattern Density
In-Reply-To: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
Message-ID: 

Nick,

Right, BioPerl really can't help you with the histogram itself but there are
probably multiple solutions to the problem of iterating over the sequence.
Here's one idea, untested, it assumes your sequence is in fasta format:

use strict;
use warnings;
use Bio::DB::Fasta;
use Bio::Seq;
use Bio::Tools::SeqWords;

my $db  = Bio::DB::Fasta->new('/path/to/fasta/files');
my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
my $start = 1;            # Bio::DB::Fasta coordinates are 1-based
my $windowsize = 1000;
my $str = 'ccgg';
my $len = $obj->length;
my $step = 250;           # windows overlap; set to $windowsize for none

while (1) {
    my $end = $start + $windowsize - 1;   # inclusive end, exactly 1000 bp
    last if ( $end > $len );
    my $subseq = $obj->subseq($start,$end);
    my $count = get_count($str,$subseq);
    print "$start\t$end\t$count\n";
    $start += $step;
}

sub get_count {
    my ($str,$subseq) = @_;
    # Upper-case both sides so the hash lookup does not depend on case
    my $seqobj = Bio::Seq->new(-seq => uc $subseq);
    my $seq_word = Bio::Tools::SeqWords->new(-seq => $seqobj);
    my $ref = $seq_word->count_overlap_words(length($str));
    return $ref->{uc $str} || 0;
}

Note this skips the very last window, debugging needed.
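
For comparison, here is a minimal sketch of the same windowing idea in plain Perl, with no BioPerl calls, assuming the chromosome has already been read into an ordinary string. The function names count_pattern and window_counts and the toy sequence are invented for the example; unlike the loop above, the final window is shortened rather than skipped.

```perl
use strict;
use warnings;

# Count (possibly overlapping) occurrences of $pattern in $seq,
# ignoring case.
sub count_pattern {
    my ($seq, $pattern) = @_;
    my ($s, $p) = (lc $seq, lc $pattern);
    return 0 unless length $p;
    my ($count, $pos) = (0, 0);
    while (($pos = index($s, $p, $pos)) >= 0) {
        $count++;
        $pos++;    # advance by one so overlapping matches are found
    }
    return $count;
}

# Return [start, end, count] for each window; coordinates are 1-based
# and inclusive. The last window is truncated at the sequence end.
sub window_counts {
    my ($seq, $pattern, $window, $step) = @_;
    my $len = length $seq;
    my @rows;
    for (my $start = 1; $start <= $len; $start += $step) {
        my $end = $start + $window - 1;
        $end = $len if $end > $len;
        my $sub = substr($seq, $start - 1, $end - $start + 1);
        push @rows, [$start, $end, count_pattern($sub, $pattern)];
        last if $end == $len;
    }
    return \@rows;
}

# Toy data: 16 bases, windows of 8 moved in steps of 4.
my $rows = window_counts('ccggttccggaaccgg', 'ccgg', 8, 4);
print join("\t", @$_), "\n" for @$rows;
```

With the step equal to the window size, a match that straddles a window boundary is counted in neither window, so some overlap between windows may be preferable for a density plot.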

Brian O.


On 2/21/06 12:24 PM, "staffa"  wrote:

> I am more concerned about the reading and analysis of the sequence than actual
> plotting of the histogram, but anything you can offer will be appreciated.





From gyang at plantbio.uga.edu  Tue Feb 21 13:45:50 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Tue, 21 Feb 2006 13:45:50 -0500
Subject: [Bioperl-l] full chromosome accesscion number mess
In-Reply-To: <000001c63669$2bf06a80$15327e82@pyrimidine>
Message-ID: <20060221184550.6557851b@dogwood.plantbio.uga.edu>

Hi, everybody,  
In the process of repairing my CGI script after the NCBI BLAST output format change, I noticed that the accession numbers for the rice pseudochromosomes are very confusing and cause trouble for sequence retrieval. My script uses RemoteBlast to search for similar sequences, and then retrieves each hit sequence with a bit of flanking region from GenBank. The rice pseudochromosomes have accession numbers similar to those of the individual clones, like AP00XXX. I do not want the sequence retrieval to involve these accessions because it takes forever. Can anybody give some suggestions on how to deal with it?  
Thanks,  
 

Guojun Yang
Department of Plant Biology
University of Georgia


From valiente at lsi.upc.edu  Tue Feb 21 13:46:10 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Tue, 21 Feb 2006 19:46:10 +0100 (MET)
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <3193394449valiente@lsi.upc.es>

> you'll have to do it - I don't have time, I thought there was  
> something like this already, but I guess not, so please put it in.

Done. I've added methods get_Lineage_Nodes and get_LCA_Node to
Bio::Taxonomy::Node.

> Uhm, does that return the LCA or one of the first divergent ancestors?
> And what does it do if lineage($node1) is the same as lineage($node2)?

Thanks, I've already taken this into account.
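
The two concerns quoted above can be sketched on plain arrays of node names. This is only an illustration under the assumption that each lineage is ordered from the root downwards; it is not the code committed as get_LCA_Node, and the helper name lca_of_lineages is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: each lineage is an array reference of node names
# ordered from the root downwards. Returns the deepest name shared by
# both lineages, or undef if they share nothing. Real code would more
# likely compare taxon ids, since names need not be unique.
sub lca_of_lineages {
    my ($path1, $path2) = @_;
    my $lca;
    my $n = @$path1 < @$path2 ? @$path1 : @$path2;
    for my $i (0 .. $n - 1) {
        last if $path1->[$i] ne $path2->[$i];
        $lca = $path1->[$i];    # remember the last node still in common
    }
    return $lca;
}

my @human = qw(root Eukaryota Metazoa Chordata);
my @yeast = qw(root Eukaryota Fungi);
print lca_of_lineages(\@human, \@yeast), "\n";
```

Keeping the last common node, rather than returning the first node that differs, gives the ancestor itself, and bounding the loop by the shorter lineage avoids running off the end when the two lineages are identical.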

Cheers

Gabriel




From s-merchant at northwestern.edu  Tue Feb 21 13:47:54 2006
From: s-merchant at northwestern.edu (Sohel Merchant)
Date: Tue, 21 Feb 2006 12:47:54 -0600
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: 
Message-ID: <000001c63717$5314ded0$c2987ca5@pc13>

Hi Hilmar and Chris,
  I have played around a bit using Bio::Annotation::Collection to
capture the headers of an ontology file. It behaves pretty well and
avoids the cycle issue which might arise by using an ontology to describe
the ontology. I have an initial version of a working parser for obo flat
file format. 

Chris, I was able to model any kind of relationship by using some of the
functionality in the Bio::Ontology::SimpleGoEngine, which I had
initially overlooked. 

I would like to commit this code to the Bioperl CVS, but I don't have
write access to it I believe. Can I send the stuff to either of you
guys?

Hilmar, I would like your feedback on the code base and would be happy
to make any changes required before we commit it to the CVS.

Thanks,
Sohel Merchant.
dictyBase

-----Original Message-----
From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar
Lapp
Sent: Monday, February 20, 2006 8:53 PM
To: chris mungall
Cc: Bioperl; Sohel Merchant
Subject: Re: [Bioperl-l] Bio::Ontology::Ontology

On 2/20/06, chris mungall  wrote:
>
> I like the idea of using an ontology to describe the ontology.
>
> Note that the proposed structure:
> OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI
>
> will lead to cycles in the object graph when the metadata ontology
> describes itself.

Yes I know, that's why I didn't want to be too vocal about it ...

>
> actually, I think the ontology module already has object reference
> cycles. TermI->OntologyI->TermI
>
> When I brought this up originally people didn't seem to care much - so
> long as you're only parsing GO then it's not a big issue, people have
> enough memory they won't notice a big chunk of memory that refuses to
> be garbage collected way after it's used.

There is a method that destroys the cycle: $ontology->close()
(this is also an interface method)

Essentially, the cycle is not in OntologyI itself but in OntologyI
HAS-A OntologyEngineI; i.e., the latter holds (may hold) references to
terms which (may) hold a reference to an OntologyI which holds a
reference to the OntologyEngineI.

I say 'may' in parentheses because an implementation may use tricks
like late instantiation, stringified references (handles), and weak
references. It's possible to avoid the cycle altogether using such
tricks but it remains questionable how much this then affects
performance, and how ugly and incomprehensible the code would become.
Since there is the close() method I haven't bothered yet trying a
fully de-cycled implementation.
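
As a rough sketch of the weak-reference trick mentioned above, using only core Perl (Scalar::Util). The Demo::Ontology and Demo::Term classes are invented for this example and are not the Bio::Ontology implementation.

```perl
use strict;
use warnings;
use Scalar::Util ();

my $destroyed = 0;    # counts how many demo ontologies were freed

package Demo::Ontology;
sub new     { my ($class, $name) = @_; return bless { name => $name, terms => [] }, $class; }
sub add     { my ($self, $term) = @_; push @{ $self->{terms} }, $term; return $term; }
sub DESTROY { $destroyed++ }

package Demo::Term;
sub new {
    my ($class, $name, $ontology) = @_;
    my $self = bless { name => $name, ontology => $ontology }, $class;
    # Weaken the back-reference so the term does not keep its ontology alive.
    Scalar::Util::weaken($self->{ontology});
    return $self;
}

package main;

{
    my $ont = Demo::Ontology->new('demo');
    $ont->add(Demo::Term->new('term1', $ont));
}   # $ont goes out of scope here

print "ontologies destroyed: $destroyed\n";
```

Without the weaken call the ontology and its term would keep each other alive after the block ends, which is the situation an explicit close() is meant to resolve.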

> Of course, if you want to use
> bioperl to cycle through all of OBO + SnoMed + UMLS then it's a
> different story.

Well if you want to keep all three in memory for some kind of
cross-reasoning then yes you are in trouble. But if you do one
ontology after another, you'd just have to make sure to call close() on
an ontology once you're done with it.

>
> I think it's best if Sohel concentrates on getting obo.pm working, then
> we can start thinking as a group about the best way to capture ontology
> metadata. This includes metadata on the whole ontology, and metadata on
> the terms (eg synonyms).
>
> To what extent are the current modules already in use?

I don't know about others but I use them often.

> I think the object cycle is a serious flaw, will it be possible to fix
> this without a major overhaul?

If I recall correctly the way go-perl circumvents this is by having
the ontology of a term as a flat attribute. This also means that when
having a term alone, you cannot ask for its connected terms. It's been
a while, so Chris set me straight where this is not true.

It should be possible to come up with an implementation of OntologyI
that for all intents and purposes behaves like a flat scalar giving
the name until you call one of its graph traversal methods. At that
point it would instantiate the engine from persistent storage (file,
or a database connection), or retrieve one from a 'store'. The latter
is I believe what Allen started with the OntologyStore, but again I
would need to check the details.

    -hilmar

>
>
> On Feb 11, 2006, at 9:10 PM, Hilmar Lapp wrote:
>
> > Sohel, please do keep the discussion on the list, in your own interest
> > as there's a multitude of people who can respond to you.
> >
> > SimpleValue would probably be what I'd use too. As Heikki hinted you
> > might even create an ontology for annotating ontologies, which would
> > allow you to use Annotation::OntologyTerm for annotation, but then
> > there's no qualifier value ...
> >
> > Bioperl 1.5.1 has been released last year, please check the website.
> >
> >       -hilmar
> >
> > On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote:
> >
> >> Hi Hilmar,
> >>   I really like your suggestion of implementing the Bio::AnnotatableI
> >> interface in the Bio::Ontology::Ontology class. I am going to implement
> >> this and play around a little with it. I am planning to use
> >> Bio::Annotation::SimpleValue for annotating the header as it provides a
> >> good way of specifying the Tag/value pair. What are your thoughts on
> >> using this?
> >>
> >>   Also, I was wondering if you have any idea about the scheduled date
> >> for the Bioperl 1.51 release. I would like to contribute some stuff in
> >> the next release.
> >>
> >> Thanks,
> >> Sohel.
> >>
> >> -----Original Message-----
> >> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >> Sent: Friday, February 10, 2006 3:40 PM
> >> To: Sohel Merchant
> >> Cc: Bioperl
> >> Subject: Re: Bio::Ontology::Ontology
> >>
> >> Sohel,
> >>
> >> please allow me to copy the list in my response. There's many good and
> >> insightful people on the list who may have something to add or
> >> different ideas.
> >>
> >> I've come across that problem myself, for instance with InterPro. What
> >> I've done so far simply is to stick it unstructured into the definition
> >> slot, which is not helpful if your purpose goes further than just
> >> displaying it in an unstructured fashion.
> >>
> >> I'm not sure you would want to create another class for this (like
> >> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
> >> implementation, probably not the interface) annotatable (i.e.,
> >> implement Bio::Annotatable), which supposedly would be simple to do
> >> (AnnotationCollection is already implemented, you'd just return an
> >> instance of it).
> >>
> >> Even though tag/value pairs sound like quick&fast way to go I'm leaning
> >> against it; in essence we're moving away from that elsewhere
> >> (SeqFeatureI) and hence I don't think we should restart it here.
> >>
> >> I'm not giving a definitive answer here, just my (initial) thoughts.
> >> Hope that helps nonetheless. Can you fancy yourself trying the
> >> Annotatable approach and let us know how it goes?
> >>      -hilmar
> >>
> >>
> >> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
> >>
> >>> Hi Hilmar,
> >>> How are you doing? I am Sohel Merchant, a programmer at dictyBase,
> >>> Northwestern University. I am working on a parser for an ontology
> >>> file. I really like the ontology object model which you have
> >>> contributed to Bioperl. I think it's just awesome! One of the things
> >>> which I thought would be great to capture is the ontology headers.
> >>> Right now one can specify only the name and authority information. I
> >>> was wondering if there is any way I could also capture other ontology
> >>> file headers, like the version of the file and the date when that
> >>> ontology file was made. I was thinking of making a header class, or
> >>> alternatively it could go as a hash of values in the
> >>> Bio::Ontology::Ontology class itself. I wanted to know what your
> >>> thoughts are on this.
> >>>
> >>> Thanks,
> >>> Sohel Merchant
> >>> dictyBase
> >>>
> >> --
> >> -------------------------------------------------------------
> >> Hilmar Lapp                            email: lapp at gnf.org
> >> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> >> -------------------------------------------------------------
> >>
> >>
> >>
> >>
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp                            email: lapp at gnf.org
> > GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > -------------------------------------------------------------
> >
> >
> >
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------




From cjfields at uiuc.edu  Tue Feb 21 14:25:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 13:25:02 -0600
Subject: [Bioperl-l] full chromosome accesscion number mess
In-Reply-To: <20060221184550.6557851b@dogwood.plantbio.uga.edu>
Message-ID: <000001c6371c$83bf92a0$15327e82@pyrimidine>

What is the accession you're having problems with?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Tuesday, February 21, 2006 12:46 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] full chromosome accesscion number mess
> 
> Hi, everybody,
> In the process of repairing my CGI script after the NCBI BLAST output
> format change, I noticed that the accession numbers for the rice
> pseudochromosomes are very confusing and cause trouble for sequence
> retrieval. My script uses RemoteBlast to search for similar sequences,
> and then retrieves each hit sequence with a bit of flanking region from
> GenBank. The rice pseudochromosomes have accession numbers similar to
> those of the individual clones, like AP00XXX. I do not want the sequence
> retrieval to involve these accessions because it takes forever. Can
> anybody give some suggestions on how to deal with it?
> Thanks,
> 
> 
> Guojun Yang
> Department of Plant Biology
> University of Georgia



From hlapp at gmx.net  Tue Feb 21 14:31:31 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 21 Feb 2006 11:31:31 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <000001c63717$5314ded0$c2987ca5@pc13>
References: 
	<000001c63717$5314ded0$c2987ca5@pc13>
Message-ID: 

Send it to me. I'll review and check it in if appropriate. You should
also write a test (and include it in what you send to me; see t/*.t
for examples for how to write a test). (and obviously the test should
succeed)

Chris, I suppose this is the time to object - I would conceptually
like the ontology-based annotation too but now we are up against a
(hopefully) working implementation which can only be beaten by another
working implementation, and frankly I don't have time to attempt one
now.

   -hilmar

On 2/21/06, Sohel Merchant  wrote:
> Hi Hilmar and Chris,
>   I have played around a bit using Bio::Annotation::Collection to
> capture the headers of an ontology file. It behaves pretty well and
> avoids the cycle issue which might arise by using an ontology to describe
> the ontology. I have an initial version of a working parser for obo flat
> file format.
>
> Chris, I was able to model any kind of relationship by using some of the
> functionality in the Bio::Ontology::SimpleGoEngine which, I had
> initially overlooked.
>
> I would like to commit this code to the Bioperl CVS, but I don't have
> write access to it I believe. Can I send the stuff to either of you
> guys?
>
> Hilmar, I would like your feedback on the code base and would be happy
> to make any changes required before we commit it to the CVS.
>
> Thanks,
> Sohel Merchant.
> dictyBase
>
> -----Original Message-----
> From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar
> Lapp
> Sent: Monday, February 20, 2006 8:53 PM
> To: chris mungall
> Cc: Bioperl; Sohel Merchant
> Subject: Re: [Bioperl-l] Bio::Ontology::Ontology
>
> On 2/20/06, chris mungall  wrote:
> >
> > I like the idea of using an ontology to describe the ontology.
> >
> > Note that the proposed structure:
> > OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI
> >
> > will lead to cycles in the object graph when the metadata ontology
> > describes itself.
>
> Yes I know, that's why I didn't want to be too vocal about it ...
>
> >
> > actually, I think the ontology module already has object reference
> > cycles. TermI->OntologyI->TermI
> >
> > When I brought this up originally people didn't seem to care much - so
> > long as you're only parsing GO then it's not a big issue, people have
> > enough memory they won't notice a big chunk of memory that refuses to
> > be garbage collected way after it's used.
>
> There is a method that destroys the cycle: $ontology->close()
> (this is also an interface method)
>
> Essentially, the cycle is not in OntologyI itself but in OntologyI
> HAS-A OntologyEngineI; i.e., the latter holds (may hold) references to
> terms which (may) hold a reference to an OntologyI which holds a
> reference to the OntologyEngineI.
>
> I say 'may' in parentheses because an implementation may use tricks
> like late instantiation, stringified references (handles), and weak
> references. It's possible to avoid the cycle altogether using such
> tricks but it remains questionable how much this then affects
> performance, and how ugly and incomprehensible the code would become.
> Since there is the close() method I haven't bothered yet trying a
> fully de-cycled implementation.
>
> > Of course, if you want to use
> > bioperl to cycle though all of OBO + SnoMed + UMLS then it's a
> > different story.
>
> Well if you want to keep all three in memory for some kind of
> cross-reasoning then yes you are in trouble. But if you do one
> ontology after another, you'd just have to make sure to call close() on
> an ontology once you're done with it.
>
> >
> > I think it's best of Sohel concentrates on getting obo.pm working,
> then
> > we can start thinking as a group about the best way to capture
> ontology
> > metadata. This includes metadata on the whole ontology, and metadata
> on
> > the terms (eg synonyms).
> >
> > To what extent are the current modules already in use?
>
> I don't know about others but I use them often.
>
> > I think the object cycle is a serious flaw, will it be possible to fix
> this without
> > a major overhaul?
>
> If I recall correctly the way go-perl circumvents this is by having
> the ontology of a term as a flat attribute. This also means that when
> having a term alone, you cannot ask for its connected terms. It's been
> a while, so Chris set me straight where this is not true.
>
> It should be possible to come up with an implementation of OntologyI
> that for all intents and purposes behaves like a flat scalar giving
> the name until you call one of its graph traversal methods. At that
> point it would instantiate the engine from persistent storage (file,
> or a database connection), or retrieve one from a 'store'. The latter
> is I believe what Allen started with the OntologyStore, but again I
> would need to check the details.
>
>     -hilmar
>
> >
> >
> > On Feb 11, 2006, at 9:10 PM, Hilmar Lapp wrote:
> >
> > > Sohel, please do keep the discussion on the list, in your own
> interest
> > > as there's a multitude of people who can respond to you.
> > >
> > > SimpleValue would probably be what I'd use too. As Heikki hinted you
> > > might even create an ontology for annotating ontologies, which would
> > > allow you to use Annotation::OntologyTerm for annotation, but then
> > > there's no qualifier value ...
> > >
> > > Bioperl 1.5.1 has been released last year, please check the website.
> > >
> > >       -hilmar
> > >
> > > On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote:
> > >
> > >> Hi Hilmar,
> > >>   I really like your suggestion of implementing the
> Bio::AnnotatableI
> > >> interface in the Bio::Ontology::Ontology class. I am going to
> > >> implement
> > >> this and play around a little with it. I am planning to use
> > >> Bio::Annotation::SimpleValue for annotating the header as it
> provides
> > >> a
> > >> good way of specifying the Tag/value pair. What are your thoughts
> on
> > >> using this?
> > >>
> > >>   Also, I was wondering if you have any idea about the scheduled
> date
> > >> for the Bioperl 1.51 release. I would like to contribute some stuff
> in
> > >> the next release.
> > >>
> > >> Thanks,
> > >> Sohel.
> > >>
> > >> -----Original Message-----
> > >> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> > >> Sent: Friday, February 10, 2006 3:40 PM
> > >> To: Sohel Merchant
> > >> Cc: Bioperl
> > >> Subject: Re: Bio::Ontology::Ontology
> > >>
> > >> Sohel,
> > >>
> > >> please allow me to copy the list in my response. There are many
> > >> good and insightful people on the list who may have something to
> > >> add or different ideas.
> > >>
> > >> I've come across that problem myself, for instance with InterPro.
> > >> What I've done so far is simply to stick it unstructured into the
> > >> definition slot, which is not helpful if your purpose goes further
> > >> than just displaying it in an unstructured fashion.
> > >>
> > >> I'm not sure you would want to create another class for this (like
> > >> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e.,
> > >> the implementation, probably not the interface) annotatable (i.e.,
> > >> implement Bio::AnnotatableI), which supposedly would be simple to
> > >> do (AnnotationCollection is already implemented, you'd just return
> > >> an instance of it).
> > >>
> > >> Even though tag/value pairs sound like a quick and fast way to go,
> > >> I'm leaning against it; in essence we're moving away from that
> > >> elsewhere (SeqFeatureI) and hence I don't think we should restart
> > >> it here.
> > >>
> > >> I'm not giving a definitive answer here, just my (initial)
> > >> thoughts. Hope that helps nonetheless. Can you fancy yourself
> > >> trying the Annotatable approach and let us know how it goes?
> > >>
> > >>      -hilmar
> > >>
> > >>
> > >> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
> > >>
> > >>> Hi Hilmar,
> > >>> How are you doing? I am Sohel Merchant, a programmer at dictyBase,
> > >>> Northwestern University. I am working on a parser for an ontology
> > >>> file. I really like the ontology object model which you have
> > >>> contributed to Bioperl. I think it's just awesome! One of the
> > >>> things which I thought would be great to capture is the ontology
> > >>> headers. Right now one can specify only the name and authority
> > >>> information. I was wondering if there is any way I could also
> > >>> capture other ontology file headers, like the version of the file
> > >>> or the date when that ontology file was made. I was thinking of
> > >>> making a header class, or alternatively it could go as a hash of
> > >>> values in the Bio::Ontology::Ontology class itself. I wanted to
> > >>> know what your thoughts are on this.
> > >>>
> > >>> Thanks,
> > >>> Sohel Merchant
> > >>> dictyBase
> > >>>
> > >> --
> > >> -------------------------------------------------------------
> > >> Hilmar Lapp                            email: lapp at gnf.org
> > >> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > >> -------------------------------------------------------------
> > >>
> > >>
> > >>
> > >>
> > > --
> > > -------------------------------------------------------------
> > > Hilmar Lapp                            email: lapp at gnf.org
> > > GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > > -------------------------------------------------------------
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
>
>
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From MEC at stowers-institute.org  Tue Feb 21 15:38:55 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 21 Feb 2006 14:38:55 -0600
Subject: [Bioperl-l] Pattern Density
Message-ID: 

 
You might consider displaying ccgg content as a track in the mouse genome
browser at
http://gbrowse.informatics.jax.org/cgi-bin/gbrowse/mouse_build_34

For example, the following track causes it to display 3 proportionally
sized red boxes in the first 3K of mouse Chr1:

[MotifContent]
glyph = xyplot
graph_type = boxes
fgcolor = black
bgcolor = red
height=100
min_score=0
max_score=100
label=1
key="Motif Content"

reference=Chr1
MotifContent CCGG   1..1000    score=20
MotifContent CCGG   1001..2000    score=50
MotifContent CCGG   2001..3000    score=30


There are many ways of computing the score.  I myself would begin with:

#!/usr/bin/env perl
use strict;
use warnings;

use Bio::SeqIO;            # for reading the sequence to scan
use TFBS::Word::Consensus; # for the pattern matching; cf. http://forkhead.cgb.ki.se/TFBS/
use PDL::Basic;            # if you have it installed, for the histogram binning statistics
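
As a rough illustration of the scoring step only: the sketch below is plain
Python rather than Perl, and it does not use Bio::SeqIO, TFBS or PDL. The
function names motif_window_counts and feature_lines are made up for this
example, and using the raw count per window as the score is an assumption
(one could equally scale it to the 0..100 range used in the track above). It
counts motif occurrences per fixed-size window and prints them in the
annotation format shown above:

```python
def motif_window_counts(seq, motif="CCGG", window=1000):
    """Count motif occurrences per window (case-insensitive).

    A match is assigned to the window that contains its first base,
    so a match straddling a window boundary is counted once.
    """
    seq = seq.upper()
    motif = motif.upper()
    n_windows = (len(seq) + window - 1) // window  # ceiling division
    counts = [0] * n_windows
    pos = seq.find(motif)
    while pos != -1:
        counts[pos // window] += 1
        pos = seq.find(motif, pos + 1)  # step by 1 so overlapping hits are seen
    return counts


def feature_lines(counts, seq_len, motif="CCGG", window=1000,
                  track="MotifContent"):
    """Format per-window counts as 1-based 'start..end score=N' lines."""
    lines = []
    for i, count in enumerate(counts):
        start = i * window + 1
        end = min((i + 1) * window, seq_len)  # last window may be short
        lines.append(f"{track} {motif}   {start}..{end}    score={count}")
    return lines


if __name__ == "__main__":
    # Toy sequence standing in for a chromosome read from a FASTA file.
    toy = "ccgg" * 2 + "a" * 992 + "CCGG" + "a" * 10
    print("reference=Chr1")
    for line in feature_lines(motif_window_counts(toy), len(toy)):
        print(line)
```

For a whole mouse chromosome the sequence would of course be read from a
file rather than built in memory as a toy string; the windowing logic stays
the same.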

 
 



________________________________

	From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of staffa
	Sent: Tuesday, February 21, 2006 11:25 AM
	To: bioperl-l at lists.open-bio.org
	Subject: [Bioperl-l] Pattern Density
	
	
	Good Friends,
	I have an important client who wants a histogram display of the
	density of "ccgg" along any chromosome of the mouse genome in 1000 bp
	windows.

	I'm thinking that maybe there is a BioPerl module that could help
	with this.
	That'd probably beat having to write something from scratch.
	Any help that you give would be greatly appreciated.
	I am more concerned about the reading and analysis of the sequence
	than the actual plotting of the histogram, but anything you can offer
	will be appreciated.

	Thank you. 

	Nick Staffa 
	Telephone: 919-316-4569 (NIEHS: 6-4569) 
	Scientific Computing Support Group 
	NIEHS Information Technology Support Services Contract 
	(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) 
	National Institute of Environmental Health Sciences 
	National Institutes of Health 
	Research Triangle Park, North Carolina 




From cjfields at uiuc.edu  Tue Feb 21 16:15:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 15:15:18 -0600
Subject: [Bioperl-l] bioperl maillist searches not updated
Message-ID: <000801c6372b$eae00870$15327e82@pyrimidine>

Seems that using Google to search through the mailing list will only get
mail up to the beginning of August 2005.  I went back to look up Hilmar's
email on bioperl-db recently and can't find it.  So I tried anything in
2006:

http://www.google.com/search?hl=en&lr=&safe=off&as_qdr=all&q=site%3Abioperl.org+inurl%3Apipermail+inurl%3Abioperl-l+2006&btnG=Search

And got nothin'!

The Open-Bio form has some mail from 2006, but only up to 1-24-2006.
Luckily, the mailing list archives seem to be fine:



Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From osborne1 at optonline.net  Tue Feb 21 16:13:44 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 21 Feb 2006 16:13:44 -0500
Subject: [Bioperl-l] Pattern Density
In-Reply-To: 
Message-ID: 

Nick,

I was mistaken previously when I hinted that you couldn't create histograms
using Bioperl:

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Graphics/Glyph/xyplot.html

This could do exactly what you want.

Brian O.


On 2/21/06 3:38 PM, "Cook, Malcolm"  wrote:

>  
> You might consider displaying ccgg content as a track in mouse genome
> browser at
> http://gbrowse.informatics.jax.org/cgi-bin/gbrowse/mouse_build_34
>  
> For example, the following track causes it to display 3 proportionally
> sized red boxes in the first 3K of mouse Chr1
> 
> [MotifContent]
> glyph = xyplot
> graph_type = boxes
> fgcolor = black
> bgcolor = red
> height=100
> min_score=0
> max_score=100
> label=1
> key="Motif Content"
> 
> reference=Chr1
> MotifContent CCGG   1..1000    score=20
> MotifContent CCGG   1001..2000    score=50
> MotifContent CCGG   2001..3000    score=30
> 
> 
> There are many ways for computing the score.  I myself would begin with:
> 
> #!/usr/bin/env perl
> use strict;
> use warnings;
>
> use Bio::SeqIO;            # for reading the sequence to scan
> use TFBS::Word::Consensus; # for the pattern matching; cf. http://forkhead.cgb.ki.se/TFBS/
> use PDL::Basic;            # if you have it installed, for the histogram binning statistics
>
> 
>  
>  
> 
> 
> 
> ________________________________
> 
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of staffa
> Sent: Tuesday, February 21, 2006 11:25 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Pattern Density
> 
> 
> Good Friends, 
> I have an important client who wants a histogram display of the
> density of "ccgg" along any chromosome of the mouse genome in 1000 bp
> windows. 
> 
> I'm thinking that maybe there is a bio-perl module that could
> help with this. 
> That'd probably beat having to write something from scratch.
> Any help that you give would be greatly appreciated.
> I am more concerned about the reading and analysis of the
> sequence than actual plotting of the histogram, but anything you can
> offer will be appreciated.
> 
> Thank you. 
> 
> Nick Staffa 
> Telephone: 919-316-4569 (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina
> 
> 




From cjfields at uiuc.edu  Tue Feb 21 16:58:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 15:58:07 -0600
Subject: [Bioperl-l] bioperl-db issues
Message-ID: <000d01c63731$e61be1f0$15327e82@pyrimidine>

Sorry about the huge delay in this response, got caught up with other
things.

> > Bad News:  There's a new problem now. I updated from CVS yesterday; I
> > walked through the steps and ran 'nmake test', with everything passing
> > fine. However, load_seqdatabase.pl is extremely slow; it's loading a
> > sequence every 5 minutes or so.  I noticed (when using '-debug') that it
> > is hanging up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create
> > a database, load the biosql schema, and load sequences w/o loading
> > taxonomy, the problem goes away.
> >
> > Here's the debugging output (I cut it off at the point it hangs up):
> > [...]
> 
> > preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
> > taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
> > taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND
> > ncbi_taxon_id = ?
> > SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> > SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)
> 
> I'm a bit surprised if this is the query where it hangs. Are the
> indexes all there? There should be a primary key index on
> taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on taxon_name
> over (taxon_id,name,name_class). Also, there should be separate indexes
> on taxon_name.taxon_id and taxon_name.name. Are they all there? If you
> reinstantiated the schema from the DDL then it seems unlikely that
> somehow the indexes have vanished except if you messed with the schema
> or the DDL.

So far everything looks like you mentioned (see below for the ANALYZE
stuff).  The only thing that I wasn't sure about was that taxon_name indexes
were all primary keys.  That's really it.

> Putting an index on taxon_name.name_class really can't make sense, so
> let's assume it can't be that.
> 
> So really I suspect this has something to do with the state of the
> database and the version of MySQL. In particular, from some 4.x version
> of MySQL under certain circumstances you have to analyze the statistics
> of the tables in order to get the optimizer pick up the indexes
> properly. Are you on MySQL 4.x and if so, have you done that?
> 
> There's the ANALYZE TABLE command:
> http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html
> 
> Note the comment: "This statement works with MyISAM, BDB, and (as of
> MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?
>
> Also, you can check the execution plan for the query using EXPLAIN.
> http://dev.mysql.com/doc/refman/4.1/en/explain.html
> 
> This should show you whether the index would be picked up for the query
> or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to
> the db using the mysql shell (mysql).
> 
> I believe something similarly strange was encountered by someone using
> DB::GFF (or Chado) under MySQL, and if I recall correctly the solution
> was to optimize (analyze) the tables. Maybe someone who was in that
> thread reads this and can comment?

I find it odd that it worked well back in December and doesn't work now.  I
updated bioperl and bioperl-db from CVS since then, so have there been any
changes that may have caused this?  I noticed a few changes here and there.

Here's what I have tried thus far:

1) I reinstalled MySQL.  I thought it might be that I had my database on a
partitioned drive, so I reinstalled on the main drive.

2) I rebuilt the database from scratch, loading taxonomy fresh, loaded the
schema, and got the same error when loading (hanging on SpeciesAdaptor).
Tried ANALYZE:
------------------------------------
mysql> ANALYZE TABLE taxon;
+----------------+---------+----------+----------+
| Table          | Op      | Msg_type | Msg_text |
+----------------+---------+----------+----------+
| bioseqdb.taxon | analyze | status   | OK       |
+----------------+---------+----------+----------+
1 row in set (0.42 sec)

mysql> ANALYZE TABLE taxon_name;
+---------------------+---------+----------+----------+
| Table               | Op      | Msg_type | Msg_text |
+---------------------+---------+----------+----------+
| bioseqdb.taxon_name | analyze | status   | OK       |
+---------------------+---------+----------+----------+
1 row in set (0.36 sec)

mysql>
------------------------------------
so that's fine.  

3) Using EXPLAIN table:
------------------------------------
mysql> EXPLAIN taxon;
+-------------------+---------------------+------+-----+---------+----------------+
| Field             | Type                | Null | Key | Default | Extra          |
+-------------------+---------------------+------+-----+---------+----------------+
| taxon_id          | int(10) unsigned    | NO   | PRI | NULL    | auto_increment |
| ncbi_taxon_id     | int(10)             | YES  | UNI | NULL    |                |
| parent_taxon_id   | int(10) unsigned    | YES  | MUL | NULL    |                |
| node_rank         | varchar(32)         | YES  |     | NULL    |                |
| genetic_code      | tinyint(3) unsigned | YES  |     | NULL    |                |
| mito_genetic_code | tinyint(3) unsigned | YES  |     | NULL    |                |
| left_value        | int(10) unsigned    | YES  | UNI | NULL    |                |
| right_value       | int(10) unsigned    | YES  | UNI | NULL    |                |
+-------------------+---------------------+------+-----+---------+----------------+
8 rows in set (0.02 sec)

mysql> EXPLAIN taxon_name;
+------------+------------------+------+-----+---------+-------+
| Field      | Type             | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| taxon_id   | int(10) unsigned | NO   | PRI |         |       |
| name       | varchar(255)     | NO   | PRI |         |       |
| name_class | varchar(32)      | NO   | PRI |         |       |
+------------+------------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

------------------------------------
Does taxon_name need three primary keys?

4) So I tried reloading the sequences:
------------------------------------
C:\Perl\src\bioperl\bioperl-db\scripts\biosql>load_seqdatabase.pl -format
genbank -dbname bioseqdb -dbuser root -dbpass ********** -testonly -safe
-debug NP_249092.gpt

And got this:

Loading NP_249092.gpt ...
attempting to load adaptor class for Bio::Seq::RichSeq
        attempting to load module Bio::DB::BioSQL::RichSeqAdaptor
......
SimpleValueAdaptor::add_assoc: binding column 4 to "1" (rank)
SimpleValueAdaptor::add_assoc: binding column 1 to "21" (FK to
Bio::SeqFeature::Generic)
SimpleValueAdaptor::add_assoc: binding column 2 to "34" (FK to
Bio::Annotation::SimpleValue)
SimpleValueAdaptor::add_assoc: binding column 3 to "11" (value)
SimpleValueAdaptor::add_assoc: binding column 4 to "1" (rank)
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
BioNamespaceAdaptor: binding UK column 1 to "bioperl" (namespace)
SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
SpeciesAdaptor: binding UK column 2 to "208963" (ncbi_taxid)
------------------------------------
Which is where it hangs, as before, usually about 2 minutes for each
sequence.  It seems there's a timeout happening in there somewhere...  It
definitely has something to do with the lookup, but like I said it did run
much faster last Nov-Dec.

So I'm a bit lost now.  Any ideas?  

I may try re-optimizing tables to see if it helps any.

I'm also really thinking of giving postgresql a shot but I have used mysql
for a while now; I'd like to stay with it if I can.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From cjfields at uiuc.edu  Tue Feb 21 23:09:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 22:09:18 -0600
Subject: [Bioperl-l] bioperl-db issues
In-Reply-To: 
Message-ID: <000001c63765$c0472370$15327e82@pyrimidine>

I got it worked out.  The Windows installer had picked out lower memory
settings (key buffer 10M, for instance) when I reinstalled, which
drastically slowed everything down.  I reset the settings for a server
environment and it's fine now.  Well, as fine as it will likely get since
I'm running this on a 1.8 GHz P4 with 756 MB RAM, so I'm not expecting it to
actually fly.  It's loading at about two sequences/second.  I'll have to see
if I get a speed improvement when optimizing tables.  I'll add this to the
wiki for installing bioperl-db under Windows.  

Are there optimal settings for using bioperl-db, such as key buffer and sort
buffer size, buffer pool size, etc?  Or do you think I'm likely to run into
a processor speed limit?  Just trying to get a fix on how much memory I
could push towards getting a smaller sequence database loaded, nothing like
swissprot.  I saw something in the mail list about setting
max_allowed_packet and a few other settings but that was about four years
ago.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar
> Lapp
> Sent: Tuesday, February 21, 2006 6:44 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: bioperl-db issues
> 
> On 2/21/06, Chris Fields  wrote:
> > [...]
> > I find it odd that it worked well back in December and doesn't work
> > now.  I updated bioperl and bioperl-db from CVS since then, so have
> > there been any changes that may have caused this?  I noticed a few
> > changes here and there.
> 
> The changes were fixes to retrieve the rank on persistent annotation
> objects (it was only stored before, but never retrieved). Neither the
> SpeciesAdaptor nor any of the taxonomy queries was affected by this.
> 
> >
> > Here's what I have tried thus far:
> >
> > 1) I reinstalled MySQL.  I thought it might be that I had my database
> > on a partitioned drive, so I reinstalled on the main drive.
> >
> > 2) I rebuilt the database from scratch, loading taxonomy fresh, loaded
> > the schema, and got the same error when loading (hanging on
> > SpeciesAdaptor). Tried ANALYZE:
> > ------------------------------------
> > mysql> ANALYZE TABLE taxon;
> > +----------------+---------+----------+----------+
> > | Table          | Op      | Msg_type | Msg_text |
> > +----------------+---------+----------+----------+
> > | bioseqdb.taxon | analyze | status   | OK       |
> > +----------------+---------+----------+----------+
> > 1 row in set (0.42 sec)
> >
> > mysql> ANALYZE TABLE taxon_name;
> > +---------------------+---------+----------+----------+
> > | Table               | Op      | Msg_type | Msg_text |
> > +---------------------+---------+----------+----------+
> > | bioseqdb.taxon_name | analyze | status   | OK       |
> > +---------------------+---------+----------+----------+
> > 1 row in set (0.36 sec)
> 
> I'm not sure but you may have to analyze all tables.
> 
> >
> > mysql>
> > ------------------------------------
> > so that's fine.
> >
> > 3) Using EXPLAIN table:
> > ------------------------------------
> > mysql> EXPLAIN taxon;
> 
> Note that you wouldn't use EXPLAIN on a table but on a query instead.
> I.e., copy&paste the offending query into the mysql editor, prefix it
> with EXPLAIN and then see what the results are. It should show whether
> the indexes are being used properly.
> 
> Most likely it doesn't use one of the indexes that it should be using
> but does a full table scan instead. The explain plan should pinpoint
> that.
> 
> BTW you can also use this to reconfirm the command line observation
> about the query being slow - it should 'hang' in the mysql shell as
> well. If it doesn't then there is something else going on. (if the
> placeholders pose a problem replace them with the actual values as
> given in the log)
> 
> > [..]
> > SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> > SpeciesAdaptor: binding UK column 2 to "208963" (ncbi_taxid)
> > ------------------------------------
> > Which is where it hangs, as before, usually about 2 minutes for each
> > sequence.
> 
> Do you also see a SELECT CLASSIFICATION query succeeding the one above
> (e.g., if you wait)? I'm asking because I'm surprised that that isn't
> the one you're seeing as taking too long, because it has been reported
> earlier to cause such problems with mysql. Alex Zelensky posted what
> he found worked as a fix.
> 
>   -hilmar
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------



From hlapp at gmx.net  Tue Feb 21 19:43:42 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 21 Feb 2006 16:43:42 -0800
Subject: [Bioperl-l] bioperl-db issues
In-Reply-To: <000d01c63731$e61be1f0$15327e82@pyrimidine>
References: <000d01c63731$e61be1f0$15327e82@pyrimidine>
Message-ID: 

On 2/21/06, Chris Fields  wrote:
> [...]
> I find it odd that it worked well back in December and doesn't work now.  I
> updated bioperl and bioperl-db from CVS since then, so have there been any
> changes that may have caused this?  I noticed a few changes here and there.

The changes were fixes to retrieve the rank on persistent annotation
objects (it was only stored before, but never retrieved). Neither the
SpeciesAdaptor nor any of the taxonomy queries was affected by this.

>
> Here's what I have tried thus far:
>
> 1) I reinstalled MySQL.  I thought it might be that I had my database on a
> partitioned drive, so I reinstalled on the main drive.
>
> 2) I rebuilt the database from scratch, loading taxonomy fresh, loaded the
> schema, and got the same error when loading (hanging on SpeciesAdaptor).
> Tried ANALYZE:
> ------------------------------------
> mysql> ANALYZE TABLE taxon;
> +----------------+---------+----------+----------+
> | Table          | Op      | Msg_type | Msg_text |
> +----------------+---------+----------+----------+
> | bioseqdb.taxon | analyze | status   | OK       |
> +----------------+---------+----------+----------+
> 1 row in set (0.42 sec)
>
> mysql> ANALYZE TABLE taxon_name;
> +---------------------+---------+----------+----------+
> | Table               | Op      | Msg_type | Msg_text |
> +---------------------+---------+----------+----------+
> | bioseqdb.taxon_name | analyze | status   | OK       |
> +---------------------+---------+----------+----------+
> 1 row in set (0.36 sec)

I'm not sure but you may have to analyze all tables.

>
> mysql>
> ------------------------------------
> so that's fine.
>
> 3) Using EXPLAIN table:
> ------------------------------------
> mysql> EXPLAIN taxon;

Note that you wouldn't use EXPLAIN on a table but on a query instead.
I.e., copy&paste the offending query into the mysql editor, prefix it
with EXPLAIN and then see what the results are. It should show whether
the indexes are being used properly.

Most likely it doesn't use one of the indexes that it should be using
but does a full table scan instead. The explain plan should pinpoint
that.

BTW you can also use this to reconfirm the command line observation
about the query being slow - it should 'hang' in the mysql shell as
well. If it doesn't then there is something else going on. (if the
placeholders pose a problem replace them with the actual values as
given in the log)
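
To illustrate what "prefix the query with EXPLAIN" looks like in practice,
here is a minimal sketch. It deliberately does not talk to MySQL: it uses
Python's built-in sqlite3 module, whose equivalent is EXPLAIN QUERY PLAN,
and the two tables are simplified stand-ins for the BioSQL taxon tables
(an assumption, not the real DDL). The placeholders are filled with the
values from the debug log. Against MySQL itself one would paste the query
into the mysql shell with EXPLAIN in front, and the output format differs.

```python
import sqlite3

# Simplified stand-ins for the BioSQL taxon tables (assumption: the real
# MySQL schema has more columns and constraints than shown here).
SCHEMA = """
CREATE TABLE taxon (
    taxon_id      INTEGER PRIMARY KEY,
    ncbi_taxon_id INTEGER UNIQUE
);
CREATE TABLE taxon_name (
    taxon_id   INTEGER NOT NULL,
    name       TEXT NOT NULL,
    name_class TEXT NOT NULL,
    UNIQUE (taxon_id, name, name_class)
);
"""

# The lookup from the debug log, reduced to the columns the toy schema has.
QUERY = """
SELECT taxon_name.taxon_id, taxon.ncbi_taxon_id, taxon_name.name
FROM taxon, taxon_name
WHERE taxon.taxon_id = taxon_name.taxon_id
  AND name_class = ? AND ncbi_taxon_id = ?
"""


def explain(conn, sql, params):
    """Return the planner's description of each step for the given query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the detail text


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
plan = explain(conn, QUERY, ("scientific name", 208963))
for step in plan:
    print(step)
conn.close()
```

A step that mentions an index means the lookup is indexed; a step that only
says the table is scanned is the full-table-scan case described above.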

> [..]
> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> SpeciesAdaptor: binding UK column 2 to "208963" (ncbi_taxid)
> ------------------------------------
> Which is where it hangs, as before, usually about 2 minutes for each
> sequence.

Do you also see a SELECT CLASSIFICATION query following the one above
(e.g., if you wait)? I'm asking because I'm surprised that that isn't
the one you're seeing as taking too long, because it has been reported
earlier to cause such problems with mysql. Alex Zelensky posted what
he found worked as a fix.

  -hilmar
--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From cjfields at uiuc.edu  Wed Feb 22 00:13:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 23:13:18 -0600
Subject: [Bioperl-l] removing sequences from a database?
Message-ID: <000001c6376e$b113c170$15327e82@pyrimidine>

I think this has been posed once but I couldn't find a straight answer on
the mailing list; is there a way to remove sequences in a BioSQL database
using bioperl-db?  This is the last I heard about it:

http://portal.open-bio.org/pipermail/bioperl-l/2001-November/006570.html

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From hlapp at gmx.net  Wed Feb 22 00:20:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 21 Feb 2006 21:20:05 -0800
Subject: [Bioperl-l] removing sequences from a database?
In-Reply-To: <000001c6376e$b113c170$15327e82@pyrimidine>
References: <000001c6376e$b113c170$15327e82@pyrimidine>
Message-ID: 

This is a pretty old posting :-) Sure you can remove sequences. In
fact you can remove any persistent object by calling $pobj->remove().
I.e., for a persistent sequence (which is what you get from the
adaptors): $pseq->remove()

Do not forget to call commit() on the persistence adaptor or the
persistent object itself or otherwise the operation is rolled back
when you disconnect.

BTW there are examples for objects other than the sequence object
itself (say you want to remove only the features) in the
scripts/biosql directory; some of the --mergeobjs closure examples do
this.

    -hilmar

On 2/21/06, Chris Fields  wrote:
> I think this has been posed once but I couldn't find a straight answer on
> the mailing list; is there a way to remove sequences in a BioSQL database
> using bioperl-db?  This is the last I heard about it:
>
> http://portal.open-bio.org/pipermail/bioperl-l/2001-November/006570.html
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From dhoworth at mrc-lmb.cam.ac.uk  Wed Feb 22 05:20:10 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 22 Feb 2006 10:20:10 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <200602211326.00021.lstein@cshl.edu>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
Message-ID: <43FC3ADA.4090203@mrc-lmb.cam.ac.uk>

Lincoln Stein wrote:
> Hi Dave,
> 
> Well, when you are using 1-based coordinates, a line that contains 44
> intervals will have 45 ticks. If you move to 0-based coordinates, then the 
> first tick will be labeled 0 and the last tick will be labeled 44. An 
> alternative is to make each base dimensionless, but that becomes a problem 
> when dealing with single base features, such as SNPs.
>
> These issues are why I have long advocated for interbase coordinates
> in which you number the positions between bases rather than the bases
> themselves.

I see your point but I need to work with the coordinates that the users 
expect and are familiar with. (Things get much worse with PDB residue 
numbering :)

> Draw me the picture of what you expect to see. I think of it this way:
> 
> 	1    2  3  4   5   6
>          A>G>C>T>A>

I guess something went wrong with your ASCII art :(

OK, consider a 44-residue entry from SwissProt (P12239):

   TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR

The first T is numbered 1 and the last R is numbered 44.

So I expect to see a line with 44 positions indicated somehow (whether 
these are half-open intervals or points on the line), with the number 1 
at the left end and the number 44 at the right end.

An important point is that if I then place other tracks below this one 
that start at say 27 and go to say 42, representing VPTIFFLGAIAAMQFI, 
they should align properly (according to whatever convention is used to 
represent a residue).
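
To make the two numbering conventions concrete, here is a minimal sketch in
Python. The helper names are made up for illustration and this is not how
Bio::Graphics works internally. It converts user-facing 1-based inclusive
positions into interbase (0-based, half-open) positions and checks the
example above:

```python
SEQ = "TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR"  # P12239, 44 residues


def to_interbase(start, end):
    """Convert a 1-based inclusive range to interbase coordinates.

    Interbase positions number the gaps between residues: gap 0 lies
    before the first residue and gap len(seq) after the last one.
    """
    return start - 1, end


def subseq(seq, start, end):
    """Residues start..end, numbered the way users expect (1-based)."""
    lo, hi = to_interbase(start, end)
    return seq[lo:hi]


def boundary_count(seq):
    """Number of tick marks needed if ticks mark residue boundaries."""
    return len(seq) + 1


print(len(SEQ), boundary_count(SEQ))
print(subseq(SEQ, 27, 42))
print(to_interbase(27, 42))
```

Labels can stay 1-based for the user while the drawing code works on the
boundaries; a track drawn from the converted pair then lines up with the
residues it covers.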

For a short sequence like this it would be possible to use letters to 
represent the residue but I'd like to use the same convention for longer 
sequences as well and have everything be consistent.

I'm hoping Bio::Graphics will make this easy.

Thanks, Dave


From khoueiry at ibdm.univ-mrs.fr  Wed Feb 22 04:12:20 2006
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Wed, 22 Feb 2006 10:12:20 +0100
Subject: [Bioperl-l] [Fwd: Re:  Pattern Density]
Message-ID: <1140599541.19981.26.camel@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 
-------------- next part --------------
An embedded message was scrubbed...
From: khoueiry 
Subject: Re: [Bioperl-l] Pattern Density
Date: Tue, 21 Feb 2006 19:47:54 +0100
Size: 3812
URL: 

From dhoworth at mrc-lmb.cam.ac.uk  Wed Feb 22 10:13:10 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 22 Feb 2006 15:13:10 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <1140619014.3142.81.camel@localhost.localdomain>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>	
	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>	
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
	<1140619014.3142.81.camel@localhost.localdomain>
Message-ID: <43FC7F86.6060901@mrc-lmb.cam.ac.uk>

Scott Cain wrote:
> I don't know if this helps at all, but you could think of that 45th tick
> mark as the termination, since the space between the 44th and the 45th
> tick mark corresponds to your 44th residue.

Yes, that's the way I do think of it and that's the way I expect 
everybody else to think of it.

But the numbers need to match the residues in any case, i.e. the numbers
need to match the spaces, not the tick marks, if the spaces match the
residues.

> I suppose it is a matter of correctly training your users :-)

The important thing is to have a consistent model, then it's easy to 
explain to users.

Cheers, Dave


From lstein at cshl.edu  Wed Feb 22 11:22:02 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 22 Feb 2006 11:22:02 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
Message-ID: <200602221122.02707.lstein@cshl.edu>

The base starts at the tickmark and extends to (but doesn't touch) the next 
one. If you are down at the resolution at which you see residue letters, then 
lines drawn underneath the letters will line up like this:

 1  2  3  4  5  6  7  8  9 10    ticks
 T  S  N  T  P  N  Q  E  P       residues
    =========   ===========      domains

Right?

Lincoln

On Wednesday 22 February 2006 05:20, Dave Howorth wrote:
> Lincoln Stein wrote:
> > Hi Dave,
> >
> > Well, when you are using 1-based coordinates, a line that contains 44
> > intervals will have 45 ticks. If you move to 0-based coordinates, then
> > the first tick will be labeled 0 and the last tick will be labeled 44. An
> > alternative is to make each base dimensionless, but that becomes a
> > problem when dealing with single base features, such as SNPs.
> >
> > These issues are why I have long advocated for interbase coordinates
> > in which you number the positions between bases rather than the bases
> > themselves.
>
> I see your point but I need to work with the coordinates that the users
> expect and are familiar with. (Things get much worse with PDB residue
> numbering :)
>
> > Draw me the picture of what you expect to see. I think of it this way:
> >
> > 	1    2  3  4   5   6
> >          A>G>C>T>A>
>
> I guess something went wrong with your ASCII art :(
>
> OK, consider a 44-residue entry from SwissProt (P12239):
>
>    TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR
>
> The first T is numbered 1 and the last R is numbered 44.
>
> So I expect to see a line with 44 positions indicated somehow (whether
> these are half-open intervals or points on the line), with the number 1
> at the left end and the number 44 at the right end.
>
> An important point is that if I then place other tracks below this one
> that start at say 27 and go to say 42, representing VPTIFFLGAIAAMQFI,
> they should align properly (according to whatever convention is used to
> represent a residue).
>
> For a short sequence like this it would be possible to use letters to
> represent the residue but I'd like to use the same convention for longer
> sequences as well and have everything be consistent.
>
> I'm hoping Bio::Graphics will make this easy.
>
> Thanks, Dave

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From dhoworth at mrc-lmb.cam.ac.uk  Wed Feb 22 11:34:08 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 22 Feb 2006 16:34:08 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <200602221122.02707.lstein@cshl.edu>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
	<200602221122.02707.lstein@cshl.edu>
Message-ID: <43FC9280.1020008@mrc-lmb.cam.ac.uk>

Lincoln Stein wrote:
> The base starts at the tickmark and extends to (but doesn't touch) the next 
> one. If you are down at the resolution at which you see residue letters, then 
> lines drawn underneath the letters will line up like this:
> 
>  1  2  3  4  5  6  7  8  9 10    ticks
>  T  S  N  T  P  N  Q  E  P       residues
>     =========   ===========      domains
> 
> Right?

Yes. What's your point?

Dave


From cain at cshl.edu  Wed Feb 22 11:29:21 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 22 Feb 2006 11:29:21 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC7F86.6060901@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
	<1140619014.3142.81.camel@localhost.localdomain>
	<43FC7F86.6060901@mrc-lmb.cam.ac.uk>
Message-ID: <1140625762.3142.107.camel@localhost.localdomain>

Hi Dave,

I took the example code you posted a few days ago and added a few
motifs, one that goes from 1 to 10, and one that goes from 35 to 44 (the
last residue), which results in the attached graphic.

As Lincoln pointed out, the features are drawn from the beginning (1 and
35), and through the last residue (up to but not touching 11 and 45).
So the space between 35 and 36 corresponds to residue 35.  That's the
way it works.
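
The convention described above can be written down as a small sketch. It is
plain Perl rather than Bio::Graphics code, and the helper names tick_span
and tick_count are made up for illustration. It only shows why a feature
ending at residue 44 runs up to a tick labelled 45.

```perl
use strict;
use warnings;

# Map a 1-based, inclusive residue range onto the ticks that bound it.
# Residue N is drawn from tick N up to (but not touching) tick N+1, so a
# feature covering start..end spans ticks start..end+1.
sub tick_span {
    my ($start, $end) = @_;
    die "start must not exceed end\n" if $start > $end;
    return ($start, $end + 1);
}

# Number of ticks needed to bound a sequence of the given length.
sub tick_count {
    my ($length) = @_;
    return $length + 1;
}

for my $feature ([1, 10], [35, 44]) {
    my ($left, $right) = tick_span(@$feature);
    printf "residues %d-%d: drawn from tick %d up to tick %d\n",
        @$feature, $left, $right;
}
printf "a 44-residue sequence needs %d ticks\n", tick_count(44);
```

The two example ranges are the motifs mentioned in this message.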

Scott


On Wed, 2006-02-22 at 15:13 +0000, Dave Howorth wrote:
> Scott Cain wrote:
> > I don't know if this helps at all, but you could think of that 45 tick
> > mark as the termination, since the space between the 44th and the 45th
> > tick mark corresponds to your 44th residue.
> 
> Yes, that's the way I do think of it and that's the way I expect 
> everybody else to think of it.
> 
> But the numbers need to match the residues in any case. ie. the numbers 
> need to match the spaces not the tick marks, if the spaces match the 
> residues.
> 
> > I suppose it is a matter of correctly training your users :-)
> 
> The important thing is to have a consistent model, then it's easy to 
> explain to users.
> 
> Cheers, Dave
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: motifs.png
Type: image/png
Size: 1879 bytes
Desc: not available
URL: 

From dhoworth at mrc-lmb.cam.ac.uk  Wed Feb 22 11:45:00 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 22 Feb 2006 16:45:00 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <1140625762.3142.107.camel@localhost.localdomain>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>	
	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>	
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>	
	<1140619014.3142.81.camel@localhost.localdomain>	
	<43FC7F86.6060901@mrc-lmb.cam.ac.uk>
	<1140625762.3142.107.camel@localhost.localdomain>
Message-ID: <43FC950C.7080007@mrc-lmb.cam.ac.uk>

Scott Cain wrote:
> Hi Dave,
> 
> I took the example code you posted a few days ago and added a few
> motifs, one that goes from 1 to 10, and one that goes from 35 to 44 (the
> last residue), which results in the attached graphic.

Yes, that's the same sort of graphic I'm getting.

> As Lincoln pointed out, the features are drawn from the beginning (1 and
> 35), and through the last residue (up to but not touching 11 and 45).
> So the space between 35 and 36 corresponds to residue 35.

But there is no residue 45!  So there should be no number 45 anywhere on 
the picture.

I think the problem is that the tick strip is displaying numbers for the 
ticks instead of the intervals. The intervals are what correspond to 
users' models of physical reality and my graphics need to match that.

 > That's the way it works.

I guess I'll have to experiment and patch until it does what I want 
then, if nobody knows how to do it.

Cheers, Dave


From iamvela at yahoo.com  Wed Feb 22 12:21:59 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 09:21:59 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
Message-ID: <20060222172159.73370.qmail@web34404.mail.mud.yahoo.com>

Hi All:

I am new to the Perl/BioPerl world.

I am debugging a program that used to work fine
before. 
Blast works fine and returns results, but I am unable
to get any hits from the results.

Here is the relevant code:

$blastObj = new Bio::SearchIO (-file=>$resultsFile,
-format=>'blast');
  while (my $result = $blastObj->next_result()) {
     while (my $bioPerlHit = $result->next_hit()) {
         .......


The first while condition returns true, but the second
while condition returns false. So it looks like there is
a result, but the parser is unable to identify the hits in
it. I printed the $result (pasted below).
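
A quick way to check whether the report file itself contains hits,
independently of any parser, is to count the alignment header lines. The
sketch below is plain Perl with a made-up helper name (count_hit_headers)
and a tiny invented sample report. It assumes that, in the saved report,
lines starting with '>' occur only as alignment headers. If the count is
non-zero while next_hit() returns nothing, that would suggest the parser
does not recognise this report layout rather than the search being empty.

```perl
use strict;
use warnings;

# Count the alignment headers (lines starting with '>') in a plain-text
# BLAST report read from an open filehandle.
sub count_hit_headers {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count++ if $line =~ /^>/;
    }
    return $count;
}

# Demonstration on a small invented report held in memory; for a real
# run, open the saved results file instead.
my $sample = <<'END';
BLASTP 2.2.13 [Nov-27-2005]
ALIGNMENTS
>gb|AAX36107.1| mitogen-activated protein kinase 1
Length=361
>pdb|1WZY|A Chain A, Crystal Structure Of Human Erk2
Length=368
END

open my $fh, '<', \$sample or die "cannot open in-memory report: $!";
printf "alignment headers found: %d\n", count_hit_headers($fh);
close $fh;
```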

Any ideas/comments to resolve this? Thanks in advance.

I am using Perl 5.8.7, BioPerl 1.2.3, Apache 1.3.34 on
Windows XP platform. 

As I said above, this application was running fine
on a different Windows machine with a similar
environment, so it looks like some change in the
products/versions is causing the problem.

thanks again,
Raghu




Blast result (I can send the complete result if you need
it):

BLASTP 2.2.13 [Nov-27-2005]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
(1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.

RID: 1140573059-19990-140117828872.BLASTQ1


Database: All non-redundant GenBank CDS
translations+PDB+SwissProt+PIR+PRF excluding environmental samples
           3,297,000 sequences; 1,129,354,045 total letters
Query=
Length=360


                                                      
            Score     E
Sequences producing significant alignments:           
            (Bits)  Value

ref|XP_534770.2|  PREDICTED: similar to
Mitogen-activated prot...   739    0.0   
gb|AAX36107.1|  mitogen-activated protein kinase 1
[synthetic con   739    0.0   
pdb|1WZY|A  Chain A, Crystal Structure Of Human Erk2
Complexed...   739    0.0   
pdb|1TVO|A  Chain A, The Structure Of Erk2 In Complex
With A S...   739    0.0   
ref|NP_786987.1|  mitogen-activated protein kinase 1
[Bos taur...   739    0.0   
emb|CAA77752.1|  41kD protein kinase [Homo sapiens]
>prf||1813...   738    0.0   
gb|AAQ02541.1|  mitogen-activated protein kinase 1
[synthetic con   736    0.0   
gb|AAH99905.1|  Mitogen-activated protein kinase 1
[Homo sapiens]   735    0.0   
emb|CAI29602.1|  hypothetical protein [Pongo pygmaeus]
             734    0.0   
gb|AAH58258.1|  Mitogen activated protein kinase 1
[Mus muscul...   731    0.0   
pdb|4ERK|   The Complex Structure Of The Map Kinase
Erk2OLOMOU...   731    0.0   
pdb|1GOL|   Coordinates Of Rat Map Kinase Erk2 With An
Arginin...   730    0.0   
ref|XP_860750.1|  PREDICTED: similar to
Mitogen-activated prot...   729    0.0   
gb|AAK56503.1|  extracellular signal-regulated kinase
2 [Gallu...   726    0.0   
ref|XP_860716.1|  PREDICTED: similar to
Mitogen-activated prot...   726    0.0   
pdb|2ERK|   Phosphorylated Map Kinase Erk2            
             726    0.0   
pdb|1PME|   Structure Of Penta Mutant Human Erk2 Map
Kinase Co...   725    0.0   
ref|XP_860682.1|  PREDICTED: similar to
Mitogen-activated prot...   720    0.0   
ref|XP_860651.1|  PREDICTED: similar to
Mitogen-activated prot...   720    0.0   
emb|CAA77753.1|  40kDa protein kinase [Homo sapiens]
>prf||181...   717    0.0   
ref|NP_001017127.1|  mitogen-activated protein kinase
1 [Xenopus    715    0.0   
dbj|BAE28679.1|  unnamed protein product [Mus
musculus]             713    0.0   
emb|CAA42482.1|  MAP kinase [Xenopus laevis]
>gb|AAH60748.1| M...   711    0.0   
sp|P26696|MK01_XENLA  Mitogen-activated protein kinase
1 (Myel...   711    0.0   
gb|AAH76730.1|  Xp42 protein [Xenopus laevis]         
             706    0.0   
gb|AAH65868.1|  Mitogen-activated protein kinase 1
[Danio rerio]    696    0.0   
dbj|BAD23843.1|  extracellular signal regulated
protein kinase...   694    0.0   
ref|NP_878308.2|  mitogen-activated protein kinase 1
[Danio re...   694    0.0   
emb|CAG07778.1|  unnamed protein product [Tetraodon
nigroviridis]   692    0.0   
dbj|BAB11813.1|  ERK2 [Danio rerio]                   
             689    0.0   
gb|AAY57805.1|  extracellular signal-regulated kinase
2 [Danio re   687    0.0   
gb|AAH45505.1|  Mitogen-activated protein kinase 3
[Danio reri...   654    0.0   
dbj|BAB11812.1|  ERK1 [Danio rerio]                   
             654    0.0   
ref|XP_609884.2|  PREDICTED: similar to mitogen
activated prot...   653    0.0   
dbj|BAD23842.1|  extracellular signal regulated
protein kinase...   650    0.0   
gb|AAH29712.1|  Mitogen activated protein kinase 3
[Mus muscul...   644    0.0   
ref|XP_885698.1|  PREDICTED: similar to mitogen
activated prot...   644    0.0   
gb|AAA20009.1|  microtubule-associated protein-2
kinase             643    0.0   
emb|CAA46318.1|  MAP kinase [Rattus norvegicus]
>ref|NP_059043...   641    0.0   
gb|AAH13992.1|  Mitogen-activated protein kinase 3
[Homo sapie...   641    0.0   
gb|AAQ02422.1|  mitogen-activated protein kinase 3
[synthetic ...   641    0.0   
gb|AAA41123.1|  extracellular signal-regulated kinase
1             640    0.0   
ref|XP_854045.1|  PREDICTED: similar to mitogen
activated prot...   640    0.0   
gb|AAA63486.1|  extracellular-signal-regulated kinase
1 [Rattus n   640    0.0   
emb|CAG02655.1|  unnamed protein product [Tetraodon
nigroviridis]   640    0.0   
emb|CAA42744.1|  protein serine/threonine kinase [Homo
sapiens...   639    0.0   
gb|AAA36142.1|  kinase 1                              
             639    0.0   
emb|CAA77754.1|  44kDa protein kinase [Homo sapiens]
>prf||181...   639    0.0   
ref|XP_885840.1|  PREDICTED: similar to mitogen
activated prot...   632    5e-180
ref|XP_885818.1|  PREDICTED: similar to mitogen
activated prot...   630    3e-179
ref|XP_860621.1|  PREDICTED: similar to
Mitogen-activated prot...   627    2e-178
gb|AAF71666.1|  extracellular signal-regulated kinase
1b [Rattus    627    2e-178
ref|XP_393029.1|  PREDICTED: similar to MAP kinase
[Apis mellifer   621    1e-176
gb|AAA83210.1|  MAP kinase                            
             619    4e-176
dbj|BAE46741.1|  Extracellular regulated MAP kinase
[Bombyx mori]   618    1e-175
gb|AAH13754.1|  Mapk3 protein [Mus musculus]          
             612    9e-174
dbj|BAE06412.1|  mitogen-activated protein kinase
[Ciona intestin   607    2e-172
dbj|BAE33167.1|  unnamed protein product [Mus
musculus]             600    3e-170
gb|AAN46679.1|  MAP kinase [Strongylocentrotus
purpuratus] >re...   598    1e-169
dbj|BAC02940.1|  mitogen-activated protein kinase
[Halocynthia ro   592    6e-168
gb|AAL48618.1|  RE08694p [Drosophila melanogaster]
>gb|EAA4631...   590    2e-167
emb|CAD97888.1|  hypothetical protein [Homo sapiens]  
             589    5e-167
emb|CAD60453.1|  extracellular signal-regulated
protein kinase...   589    5e-167
emb|CAD56894.1|  mitogen-activated protein kinase 1
[Meloidogyne    589    6e-167
ref|XP_536917.2|  PREDICTED: similar to mitogen
activated prot...   588    1e-166
gb|AAN40736.1|  mitogen-activated protein kinase
[Paralichthys ol   586    4e-166
emb|CAE73725.1|  Hypothetical protein CBG21247
[Caenorhabditis br   583    3e-165
emb|CAA87057.1|  Hypothetical protein F43C1.2a
[Caenorhabditis...   581    2e-164
gb|AAA18956.1|  Sur-1 MAP kinase                      
             581    2e-164
emb|CAB60996.1|  Hypothetical protein F43C1.2b
[Caenorhabditis...   581    2e-164
gb|AAK52329.1|  extracellular signal-related kinase 1b
[Homo sapi   580    4e-164
ref|XP_885794.1|  PREDICTED: similar to mitogen
activated prot...   553    4e-156
ref|XP_868146.1|  PREDICTED: similar to mitogen
activated prot...   548    2e-154
gb|AAK52330.1|  extracellular signal-related kinase 1c
[Homo sapi   546    4e-154
dbj|BAA22620.1|  ERK2 [Mus musculus]                  
             544    2e-153
ref|XP_510921.1|  PREDICTED: mitogen-activated protein
kinase 3 [   529    8e-149
gb|AAT02418.1|  MAP kinase [Schistosoma japonicum]    
             496    7e-139
emb|CAJ44437.1|  MAP kinase [Echinococcus
multilocularis]           491    1e-137
ref|XP_885774.1|  PREDICTED: similar to mitogen
activated prot...   444    3e-123
gb|EAA14714.3|  ENSANGP00000016639 [Anopheles gambiae
str. PES...   431    2e-119
gb|AAZ38881.1|  extracellular regulated kinase
[Littorina littore   431    2e-119
emb|CAD60723.1|  unnamed protein product [Podospora
anserina]       411    2e-113
gb|AAK25816.1|  MAP kinase [Neurospora crassa]
>ref|XP_959713....   411    2e-113
gb|EAL89122.1|  MAP kinase (FUS3/KSS1), putative
[Aspergillus ...   409    1e-112
gb|EAA74589.1|  hypothetical protein FG06385.1
[Gibberella zea...   409    1e-112
ref|XP_504312.1|  hypothetical protein [Yarrowia
lipolytica] >...   408    2e-112
gb|AAG01162.1|  mitogen-activated protein kinase
[Fusarium oxy...   408    2e-112
gb|AAS20192.1|  AMK1 [Alternaria brassicicola]
>gb|AAK52840.1|...   408    2e-112
dbj|BAE57584.1|  unnamed protein product [Aspergillus
oryzae]       408    2e-112
dbj|BAD42855.1|  mitogen-activated protein kinase
[Bipolaris oryz   407    3e-112
gb|AAD50496.1|  mitogen activated protein kinase
[Colletotrichum    407    3e-112
gb|AAF05913.1|  mitogen-activated protein kinase
[Cochliobolus he   407    3e-112
gb|AAM89501.1|  mitogen-activated protein kinase
[Leptosphaeria m   407    3e-112
dbj|BAB21569.1|  mitogen-activated protein kinase
[Glomerella cin   407    3e-112
gb|AAB72017.1|  mitogen-activated protein kinase
[Nectria haem...   407    3e-112
emb|CAC36428.1|  mitogen activated protein kinase
[Gibberella fuj   406    6e-112
ref|XP_364720.1|  hypothetical protein MG09565.4
[Magnaporthe gri   406    6e-112
gb|AAG23132.1|  MAP kinase [Botryotinia fuckeliana]   
             406    6e-112
gb|AAO63561.1|  mitogen activated protein kinase
[Verticillium fu   406    8e-112
dbj|BAE53432.1|  MAP kinase Pmk1 [Hypocrea lixii]     
             405    1e-111

ALIGNMENTS
>ref|XP_534770.2| PREDICTED: similar to Mitogen-activated protein kinase 1 (Extracellular
signal-regulated kinase 2) (ERK-2) (Mitogen-activated
protein kinase 2) (MAP kinase 2) (MAPK 2) (p42-MAPK) (ERT1)
isoform 1 [Canis familiaris]
 ref|NP_620407.1| mitogen-activated protein kinase 1 [Homo sapiens]
 ref|NP_002736.3| mitogen-activated protein kinase 1 [Homo sapiens]
 gb|AAH17832.1| Mitogen-activated protein kinase 1 [Homo sapiens]
 sp|P28482|MK01_HUMAN Mitogen-activated protein kinase 1 (Extracellular signal-regulated
kinase 2) (ERK-2) (Mitogen-activated protein kinase 2)
(MAP kinase 2) (MAPK 2) (p42-MAPK) (ERT1)
 gb|AAA58459.1| protein kinase 2
Length=360

 Score =  739 bits (1909),  Expect = 0.0
 Identities = 360/360 (100%), Positives = 360/360 (100%), Gaps = 0/360 (0%)

Query  1    MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  60
            MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
Sbjct  1    MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  60

Query  61   HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH  120
            HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
Sbjct  61   HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH  120

Query  121  LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH  180
            LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
Sbjct  121  LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH  180

Query  181  TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI  240
            TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
Sbjct  181  TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI  240

Query  241  LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK  300
            LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
Sbjct  241  LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK  300

Query  301  RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS  360
            RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
Sbjct  301  RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS  360


>gb|AAX36107.1| mitogen-activated protein kinase 1 [synthetic construct]
Length=361

 Score =  739 bits (1909),  Expect = 0.0
 Identities = 360/360 (100%), Positives = 360/360 (100%), Gaps = 0/360 (0%)

Query  1    MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  60
            MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
Sbjct  1    MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  60

Query  61   HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH  120
            HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
Sbjct  61   HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH  120

Query  121  LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH  180
            LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
Sbjct  121  LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH  180

Query  181  TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI  240
            TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
Sbjct  181  TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI  240

Query  241  LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK  300
            LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
Sbjct  241  LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK  300

Query  301  RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS  360
            RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
Sbjct  301  RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS  360


>pdb|1WZY|A Chain A, Crystal Structure Of Human Erk2 Complexed With A Pyrazolopyridazine
Derivative
Length=368

 Score =  739 bits (1909),  Expect = 0.0
 Identities = 360/360 (100%), Positives = 360/360 (100%), Gaps = 0/360 (0%)

Query  1    MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  60
            MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
Sbjct  9    MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  68

Query  61   HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH  120







From lstein at cshl.edu  Wed Feb 22 13:23:09 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 22 Feb 2006 13:23:09 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC950C.7080007@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<1140625762.3142.107.camel@localhost.localdomain>
	<43FC950C.7080007@mrc-lmb.cam.ac.uk>
Message-ID: <200602221323.09872.lstein@cshl.edu>

Hi Dave,

If you want to adjust the way that the arrow.pm module draws the ticks, please 
make it a user-configurable option with the default being the current method. 
It should be easy enough to do this -- you just offset the position of the 
labels by 0.5 interval and inhibit drawing of the last one.
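
The adjustment described above can be sketched outside the glyph code. This
is plain Perl, not the arrow.pm implementation; the function name
ruler_labels and its boolean argument are assumptions made for
illustration, and positions are in sequence units rather than pixels.

```perl
use strict;
use warnings;

# Return [label, position] pairs for a ruler covering residues 1..$length.
# When $label_intervals is false, label N sits on tick N and the final
# tick ($length + 1) is labelled too. When it is true, label N is shifted
# half an interval to the right of tick N and the last tick gets no label.
sub ruler_labels {
    my ($length, $label_intervals) = @_;
    my $last   = $label_intervals ? $length : $length + 1;
    my $offset = $label_intervals ? 0.5     : 0;
    return map { [ $_, $_ + $offset ] } 1 .. $last;
}

my @on_ticks     = ruler_labels(44, 0);
my @on_intervals = ruler_labels(44, 1);
printf "tick labels run %d..%d\n", $on_ticks[0][0], $on_ticks[-1][0];
printf "interval labels run %d..%d, the first centred at %.1f\n",
    $on_intervals[0][0], $on_intervals[-1][0], $on_intervals[0][1];
```

With interval labelling a 44-residue sequence carries labels 1 to 44 only,
which is the behaviour Dave asked for.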

Lincoln

On Wednesday 22 February 2006 11:45, Dave Howorth wrote:
> Scott Cain wrote:
> > Hi Dave,
> >
> > I took the example code you posted a few days ago and added a few
> > motifs, one that goes from 1 to 10, and one that goes from 35 to 44 (the
> > last residue), which results in the attached graphic.
>
> Yes, that's the same sort of graphic I'm getting.
>
> > As Lincoln pointed out, the features are drawn from the beginning (1 and
> > 35), and through the last residue (up to but not touching 11 and 45).
> > So the space between 35 and 36 corresponds to residue 35.
>
> But there is no residue 45!  So there should be no number 45 anywhere on
> the picture.
>
> I think the problem is that the tick strip is displaying numbers for the
> ticks instead of the intervals. The intervals are what corresponds to
> users' models of physical reality and my graphics need to match that.
>
>  > That's the way it works.
>
> I guess I'll have to experiment and patch until it does what I want
> then, if nobody knows how to do it.
>
> Cheers, Dave



From lstein at cshl.edu  Wed Feb 22 13:40:27 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 22 Feb 2006 13:40:27 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC950C.7080007@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<1140625762.3142.107.camel@localhost.localdomain>
	<43FC950C.7080007@mrc-lmb.cam.ac.uk>
Message-ID: <200602221340.28573.lstein@cshl.edu>

I have just committed a version of the arrow.pm glyph that has a 
-label_intervals flag.

Lincoln

On Wednesday 22 February 2006 11:45, Dave Howorth wrote:
> Scott Cain wrote:
> > Hi Dave,
> >
> > I took the example code you posted a few days ago and added a few
> > motifs, one that goes from 1 to 10, and one that goes from 35 to 44 (the
> > last residue), which results in the attached graphic.
>
> Yes, that's the same sort of graphic I'm getting.
>
> > As Lincoln pointed out, the features are drawn from the beginning (1 and
> > 35), and through the last residue (up to but not touching 11 and 45).
> > So the space between 35 and 36 corresponds to residue 35.
>
> But there is no residue 45!  So there should be no number 45 anywhere on
> the picture.
>
> I think the problem is that the tick strip is displaying numbers for the
> ticks instead of the intervals. The intervals are what corresponds to
> users' models of physical reality and my graphics need to match that.
>
>  > That's the way it works.
>
> I guess I'll have to experiment and patch until it does what I want
> then, if nobody knows how to do it.
>
> Cheers, Dave



From cjfields at uiuc.edu  Wed Feb 22 14:45:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 22 Feb 2006 13:45:54 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060222172159.73370.qmail@web34404.mail.mud.yahoo.com>
Message-ID: <000c01c637e8$980c6f90$15327e82@pyrimidine>

Upgrade bioperl from CVS using nmake. 

Installation instructions for using nmake:

http://www.bioperl.org/wiki/INSTALL.WIN#Beyond_the_Core

You can download a tarball using anonymous CVS (link at bottom):

http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/

or use CVS directly:

http://www.bioperl.org/wiki/Using_CVS

Then make sure to grab the latest SearchIO::blast bugfix, which is not in CVS
yet:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

Replace the blast.pm in \site\lib\Bio\SearchIO in your Perl directory.
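
If it is unclear which copy of blast.pm Perl actually loads, a small
script can list the candidates. This is a sketch in plain Perl; the helper
name find_module_files is made up for illustration, and it only inspects
the directories on @INC.

```perl
use strict;
use warnings;
use File::Spec;

# Return every copy of a module found on @INC, in search order. The first
# entry is the one a plain "use" or "require" would load.
sub find_module_files {
    my ($module) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    return grep { -f $_ }
           map  { File::Spec->catfile($_, @parts) }
           grep { !ref $_ } @INC;    # skip code-reference hooks in @INC
}

my @copies = find_module_files('Bio::SearchIO::blast');
print @copies
    ? join("\n", @copies) . "\n"
    : "Bio::SearchIO::blast not found on \@INC\n";
```

If more than one path is printed, the first one listed is the file to
replace.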

Does that fix it?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Raghunath Verabelli
> Sent: Wednesday, February 22, 2006 11:22 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Blast returns result, but does not return hits
> 
> Hi All:
> 
> I am new to Perl/BioPerl world.
> 
> I am debugging a program that used to work fine
> before.
> Blast works fine and returns results, but I am unable
> to get any hits from the results.
> 
> Here is the relevant code:
> 
> $blastObj = new Bio::SearchIO (-file=>$resultsFile,
> -format=>'blast');
>   while (my $result = $blastObj->next_result()) {
>      while (my $bioPerlHit = $result->next_hit()) {
>          .......
> 
> 
> The first while condition returns true, but the second
> while condition returns false. So looks like there is
> some result, but it is unable to identify the hits in
> the result. I printed the $result (pasted below).
> 
> Any ideas/comments to resolve this? Thanks in advance.
> 
> I am using Perl 5.8.7, BioPerl 1.2.3, Apache 1.3.34 on
> Windows XP platform.
> 
> Like I said before, this application was running fine
> on a different windows machine with similar
> environment,so looks like there is some change in the
> products/versions that is causing the problem.
> 
> thanks again,
> Raghu
> 
> 
> 
> 
> Blast result (i can send complete result if you need
> it):
> 
> 

> BLASTP 2.2.13 [Nov-27-2005]
> Reference: Altschul, Stephen F., Thomas L. Madden,
> Alejandro A. Schäffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> Lipman
> (1997), "Gapped BLAST and PSI-BLAST: a new generation
> of
> protein database search programs", Nucleic Acids Res.
> 25:3389-3402.
> 
> RID: 1140573059-19990-140117828872.BLASTQ1
> 
> 
> Database: All non-redundant GenBank CDS
> translations+PDB+SwissProt+PIR+PRF excluding
> environmental samples
>            3,297,000 sequences; 1,129,354,045 total
> letters
> Query=
> Length=360
> 
> 
> 
>             Score     E
> Sequences producing significant alignments:
>             (Bits)  Value
> 
> ref|XP_534770.2|  PREDICTED: similar to
> Mitogen-activated prot...   739    0.0
> gb|AAX36107.1|  mitogen-activated protein kinase 1
> [synthetic con   739    0.0
> pdb|1WZY|A  Chain A, Crystal Structure Of Human Erk2
> Complexed...   739    0.0
> pdb|1TVO|A  Chain A, The Structure Of Erk2 In Complex
> With A S...   739    0.0
> ref|NP_786987.1|  mitogen-activated protein kinase 1
> [Bos taur...   739    0.0
> emb|CAA77752.1|  41kD protein kinase [Homo sapiens]
> >prf||1813...   738    0.0
> gb|AAQ02541.1|  mitogen-activated protein kinase 1
> [synthetic con   736    0.0
> gb|AAH99905.1|  Mitogen-activated protein kinase 1
> [Homo sapiens]   735    0.0
> emb|CAI29602.1|  hypothetical protein [Pongo pygmaeus]
>              734    0.0
> gb|AAH58258.1|  Mitogen activated protein kinase 1
> [Mus muscul...   731    0.0
> pdb|4ERK|   The Complex Structure Of The Map Kinase
> Erk2OLOMOU...   731    0.0
> pdb|1GOL|   Coordinates Of Rat Map Kinase Erk2 With An
> Arginin...   730    0.0
> ref|XP_860750.1|  PREDICTED: similar to
> Mitogen-activated prot...   729    0.0
> gb|AAK56503.1|  extracellular signal-regulated kinase
> 2 [Gallu...   726    0.0
> ref|XP_860716.1|  PREDICTED: similar to
> Mitogen-activated prot...   726    0.0
> pdb|2ERK|   Phosphorylated Map Kinase Erk2
>              726    0.0
> pdb|1PME|   Structure Of Penta Mutant Human Erk2 Map
> Kinase Co...   725    0.0
> ref|XP_860682.1|  PREDICTED: similar to
> Mitogen-activated prot...   720    0.0
> ref|XP_860651.1|  PREDICTED: similar to
> Mitogen-activated prot...   720    0.0
> emb|CAA77753.1|  40kDa protein kinase [Homo sapiens]
> >prf||181...   717    0.0
> ref|NP_001017127.1|  mitogen-activated protein kinase
> 1 [Xenopus    715    0.0
> dbj|BAE28679.1|  unnamed protein product [Mus
> musculus]             713    0.0
> emb|CAA42482.1|  MAP kinase [Xenopus laevis]
> >gb|AAH60748.1| M...   711    0.0
> sp|P26696|MK01_XENLA  Mitogen-activated protein kinase
> 1 (Myel...   711    0.0
> gb|AAH76730.1|  Xp42 protein [Xenopus laevis]
>              706    0.0
> gb|AAH65868.1|  Mitogen-activated protein kinase 1
> [Danio rerio]    696    0.0
> dbj|BAD23843.1|  extracellular signal regulated
> protein kinase...   694    0.0
> ref|NP_878308.2|  mitogen-activated protein kinase 1
> [Danio re...   694    0.0
> emb|CAG07778.1|  unnamed protein product [Tetraodon
> nigroviridis]   692    0.0
> dbj|BAB11813.1|  ERK2 [Danio rerio]
>              689    0.0
> gb|AAY57805.1|  extracellular signal-regulated kinase
> 2 [Danio re   687    0.0
> gb|AAH45505.1|  Mitogen-activated protein kinase 3
> [Danio reri...   654    0.0
> dbj|BAB11812.1|  ERK1 [Danio rerio]
>              654    0.0
> ref|XP_609884.2|  PREDICTED: similar to mitogen
> activated prot...   653    0.0
> dbj|BAD23842.1|  extracellular signal regulated
> protein kinase...   650    0.0
> gb|AAH29712.1|  Mitogen activated protein kinase 3
> [Mus muscul...   644    0.0
> ref|XP_885698.1|  PREDICTED: similar to mitogen
> activated prot...   644    0.0
> gb|AAA20009.1|  microtubule-associated protein-2
> kinase             643    0.0
> emb|CAA46318.1|  MAP kinase [Rattus norvegicus]
> >ref|NP_059043...   641    0.0
> gb|AAH13992.1|  Mitogen-activated protein kinase 3
> [Homo sapie...   641    0.0
> gb|AAQ02422.1|  mitogen-activated protein kinase 3
> [synthetic ...   641    0.0
> gb|AAA41123.1|  extracellular signal-regulated kinase
> 1             640    0.0
> ref|XP_854045.1|  PREDICTED: similar to mitogen
> activated prot...   640    0.0
> gb|AAA63486.1|  extracellular-signal-regulated kinase
> 1 [Rattus n   640    0.0
> emb|CAG02655.1|  unnamed protein product [Tetraodon
> nigroviridis]   640    0.0
> emb|CAA42744.1|  protein serine/threonine kinase [Homo
> sapiens...   639    0.0
> gb|AAA36142.1|  kinase 1
>              639    0.0
> emb|CAA77754.1|  44kDa protein kinase [Homo sapiens]
> >prf||181...   639    0.0
> ref|XP_885840.1|  PREDICTED: similar to mitogen
> activated prot...   632    5e-180
> ref|XP_885818.1|  PREDICTED: similar to mitogen
> activated prot...   630    3e-179
> ref|XP_860621.1|  PREDICTED: similar to
> Mitogen-activated prot...   627    2e-178
> gb|AAF71666.1|  extracellular signal-regulated kinase
> 1b [Rattus    627    2e-178
> ref|XP_393029.1|  PREDICTED: similar to MAP kinase
> [Apis mellifer   621    1e-176
> gb|AAA83210.1|  MAP kinase
>              619    4e-176
> dbj|BAE46741.1|  Extracellular regulated MAP kinase
> [Bombyx mori]   618    1e-175
> gb|AAH13754.1|  Mapk3 protein [Mus musculus]
>              612    9e-174
> dbj|BAE06412.1|  mitogen-activated protein kinase
> [Ciona intestin   607    2e-172
> dbj|BAE33167.1|  unnamed protein product [Mus
> musculus]             600    3e-170
> gb|AAN46679.1|  MAP kinase [Strongylocentrotus
> purpuratus] >re...   598    1e-169
> dbj|BAC02940.1|  mitogen-activated protein kinase
> [Halocynthia ro   592    6e-168
> gb|AAL48618.1|  RE08694p [Drosophila melanogaster]
> >gb|EAA4631...   590    2e-167
> emb|CAD97888.1|  hypothetical protein [Homo sapiens]
>              589    5e-167
> emb|CAD60453.1|  extracellular signal-regulated
> protein kinase...   589    5e-167
> emb|CAD56894.1|  mitogen-activated protein kinase 1
> [Meloidogyne    589    6e-167
> ref|XP_536917.2|  PREDICTED: similar to mitogen
> activated prot...   588    1e-166
> gb|AAN40736.1|  mitogen-activated protein kinase
> [Paralichthys ol   586    4e-166
> emb|CAE73725.1|  Hypothetical protein CBG21247
> [Caenorhabditis br   583    3e-165
> emb|CAA87057.1|  Hypothetical protein F43C1.2a
> [Caenorhabditis...   581    2e-164
> gb|AAA18956.1|  Sur-1 MAP kinase
>              581    2e-164
> emb|CAB60996.1|  Hypothetical protein F43C1.2b
> [Caenorhabditis...   581    2e-164
> gb|AAK52329.1|  extracellular signal-related kinase 1b
> [Homo sapi   580    4e-164
> ref|XP_885794.1|  PREDICTED: similar to mitogen
> activated prot...   553    4e-156
> ref|XP_868146.1|  PREDICTED: similar to mitogen
> activated prot...   548    2e-154
> gb|AAK52330.1|  extracellular signal-related kinase 1c
> [Homo sapi   546    4e-154
> dbj|BAA22620.1|  ERK2 [Mus musculus]
>              544    2e-153
> ref|XP_510921.1|  PREDICTED: mitogen-activated protein
> kinase 3 [   529    8e-149
> gb|AAT02418.1|  MAP kinase [Schistosoma japonicum]
>              496    7e-139
> emb|CAJ44437.1|  MAP kinase [Echinococcus
> multilocularis]           491    1e-137
> ref|XP_885774.1|  PREDICTED: similar to mitogen
> activated prot...   444    3e-123
> gb|EAA14714.3|  ENSANGP00000016639 [Anopheles gambiae
> str. PES...   431    2e-119
> gb|AAZ38881.1|  extracellular regulated kinase
> [Littorina littore   431    2e-119
> emb|CAD60723.1|  unnamed protein product [Podospora
> anserina]       411    2e-113
> gb|AAK25816.1|  MAP kinase [Neurospora crassa]
> >ref|XP_959713....   411    2e-113
> gb|EAL89122.1|  MAP kinase (FUS3/KSS1), putative
> [Aspergillus ...   409    1e-112
> gb|EAA74589.1|  hypothetical protein FG06385.1
> [Gibberella zea...   409    1e-112
> ref|XP_504312.1|  hypothetical protein [Yarrowia
> lipolytica] >...   408    2e-112
> gb|AAG01162.1|  mitogen-activated protein kinase
> [Fusarium oxy...   408    2e-112
> gb|AAS20192.1|  AMK1 [Alternaria brassicicola]
> >gb|AAK52840.1|...   408    2e-112
> dbj|BAE57584.1|  unnamed protein product [Aspergillus
> oryzae]       408    2e-112
> dbj|BAD42855.1|  mitogen-activated protein kinase
> [Bipolaris oryz   407    3e-112
> gb|AAD50496.1|  mitogen activated protein kinase
> [Colletotrichum    407    3e-112
> gb|AAF05913.1|  mitogen-activated protein kinase
> [Cochliobolus he   407    3e-112
> gb|AAM89501.1|  mitogen-activated protein kinase
> [Leptosphaeria m   407    3e-112
> dbj|BAB21569.1|  mitogen-activated protein kinase
> [Glomerella cin   407    3e-112
> gb|AAB72017.1|  mitogen-activated protein kinase
> [Nectria haem...   407    3e-112
> emb|CAC36428.1|  mitogen activated protein kinase
> [Gibberella fuj   406    6e-112
> ref|XP_364720.1|  hypothetical protein MG09565.4
> [Magnaporthe gri   406    6e-112
> gb|AAG23132.1|  MAP kinase [Botryotinia fuckeliana]
>              406    6e-112
> gb|AAO63561.1|  mitogen activated protein kinase
> [Verticillium fu   406    8e-112
> dbj|BAE53432.1|  MAP kinase Pmk1 [Hypocrea lixii]
>              405    1e-111
> 
> ALIGNMENTS
> >ref|XP_534770.2| PREDICTED: similar to
> Mitogen-activated protein kinase 1 (Extracellular
> signal-regulated kinase 2) (ERK-2) (Mitogen-activated
> protein kinase 2) (MAP kinase 2) (MAPK 2) (p42-MAPK)
> (ERT1)
> isoform 1 [Canis familiaris]
>  ref|NP_620407.1| mitogen-activated protein kinase 1
> [Homo sapiens]
>  ref|NP_002736.3| mitogen-activated protein kinase 1
> [Homo sapiens]
>  gb|AAH17832.1| Mitogen-activated protein kinase 1
> [Homo sapiens]
>  sp|P28482|MK01_HUMAN Mitogen-activated protein kinase
> 1 (Extracellular signal-regulated
> kinase 2) (ERK-2) (Mitogen-activated protein kinase 2)
> 
> (MAP kinase 2) (MAPK 2) (p42-MAPK) (ERT1)
>  gb|AAA58459.1| protein kinase 2
> Length=360
> 
>  Score =  739 bits (1909),  Expect = 0.0
>  Identities = 360/360 (100%), Positives = 360/360
> (100%), Gaps = 0/360 (0%)
> 
> Query  1
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
>  60
> 
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
> Sbjct  1
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
>  60
> 
> Query  61
> HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
>  120
> 
> HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
> Sbjct  61
> HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
>  120
> 
> Query  121
> LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
>  180
> 
> LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
> Sbjct  121
> LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
>  180
> 
> Query  181
> TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
>  240
> 
> TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
> Sbjct  181
> TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
>  240
> 
> Query  241
> LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
>  300
> 
> LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
> Sbjct  241
> LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
>  300
> 
> Query  301
> RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
>  360
> 
> RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
> Sbjct  301
> RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
>  360
> 
> 
> >gb|AAX36107.1| mitogen-activated protein kinase 1
> [synthetic construct]
> Length=361
> 
>  Score =  739 bits (1909),  Expect = 0.0
>  Identities = 360/360 (100%), Positives = 360/360
> (100%), Gaps = 0/360 (0%)
> 
> Query  1
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
>  60
> 
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
> Sbjct  1
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
>  60
> 
> Query  61
> HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
>  120
> 
> HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
> Sbjct  61
> HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
>  120
> 
> Query  121
> LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
>  180
> 
> LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
> Sbjct  121
> LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
>  180
> 
> Query  181
> TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
>  240
> 
> TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
> Sbjct  181
> TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
>  240
> 
> Query  241
> LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
>  300
> 
> LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
> Sbjct  241
> LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
>  300
> 
> Query  301
> RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
>  360
> 
> RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
> Sbjct  301
> RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
>  360
> 
> 
> >pdb|1WZY|A Chain A, Crystal Structure Of Human Erk2
> Complexed With A Pyrazolopyridazine
> Derivative
> Length=368
> 
>  Score =  739 bits (1909),  Expect = 0.0
>  Identities = 360/360 (100%), Positives = 360/360
> (100%), Gaps = 0/360 (0%)
> 
> Query  1
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
>  60
> 
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
> Sbjct  9
> MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
>  68
> 
> Query  61
> HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
>  120
> 
> 
> 
> 
> 




From iamvela at yahoo.com  Wed Feb 22 16:06:54 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 13:06:54 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000c01c637e8$980c6f90$15327e82@pyrimidine>
Message-ID: <20060222210654.53827.qmail@web34404.mail.mud.yahoo.com>

Thanks Chris. I am getting the errors listed below with
nmake.

As suggested, I downloaded the nmake utility from
Microsoft website and the bioperl-live tarball.

After untarring, I replaced the blast.pm file (under
bioperl-live\Bio\SearchIO) with the blast.pm (86 KB
size) attached to bug report 1934.

I then did the following to install packages using
nmake:

1) perl Makefile.pl was successful without any errors.


2) 'c:\nmake' results in the following errors:

        pl2bat.bat blib\script\bp_unflatten_seq.pl
        C:\mod_perl\Perl\bin\perl.exe
-MExtUtils::Command -e cp ./scripts_temp/b
p_taxid4species.pl blib\script\bp_taxid4species.pl
        pl2bat.bat blib\script\bp_taxid4species.pl
        C:\mod_perl\Perl\bin\perl.exe
-MExtUtils::Command -e cp ./scripts_temp/b
p_seqret.pl blib\script\bp_seqret.pl
        pl2bat.bat blib\script\bp_seqret.pl
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
bioscripts.pod
Can't open bioscripts.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
biodatabases.pod
Can't open biodatabases.pod: No such file or
directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
biodesign.pod
Can't open biodesign.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
bioperl.pod
Can't open bioperl.pod: No such file or directory.


3) 'c:\nmake test' fails with the following errors:

NMAKE : fatal error U1095: expanded command line
'C:\mod_perl\Perl\bin\perl.exe
"-MExtUtils::Command::MM" "-e" "test_harness(0,
'blib\lib', 'blib\arch')" t\AACh
ange.t t\AAReverseMutate.t t\abi.t t\ace.t t\AlignIO.t
t\AlignStats.t t\AlignUti
l.t t\alignUtilities.t t\Allele.t t\Alphabet.t
t\Annotation.t t\AnnotationAdapto
r.t t\asciitree.t t\Assembly.t t\Biblio.t
t\Biblio_biofetch.t t\Biblio_eutils.t
t\BiblioReferences.t t\BioDBGFF.t t\BioFetch_DB.t
t\BioGraphics.t t\BlastIndex.t
 t\BPbl2seq.t t\BPlite.t t\BPpsilite.t t\bsml_sax.t
t\Chain.t t\chaosxml.t t\cig
arstring.t t\ClusterIO.t t\Coalescent.t t\CodonTable.t
t\Compatible.t t\consed.t
 t\CoordinateGraph.t t\CoordinateMapper.t
t\Correlate.t t\ctf.t t\CytoMap.t t\DB
.t t\DBCUTG.t t\DBFasta.t t\DNAMutation.t t\Domcut.t
t\ECnumber.t t\ELM.t t\embl
.t t\EMBL_DB.t t\EMBOSS_Tools.t t\EncodedSeq.t
t\entrezgene.t t\ePCR.t t\ESEfind
er.t t\est2genome.t t\Exception.t t\Exonerate.t
t\exp.t t\fasta.t t\FeatureIO.t
t\flat.t t\FootPrinter.t t\game.t t\GbrowseGFF.t
t\gcg.t t\GDB.t t\Gel.t t\genba
nk.t t\GeneCoordinateMapper.t t\Geneid.t t\Genewise.t
t\Genomewise.t t\Genpred.t
 t\GFF.t t\GOR4.t t\GOterm.t t\GraphAdaptor.t
t\GuessSeqFormat.t t\hmmer.t t\HNN
.t t\HtSNP.t t\Index.t t\InstanceSite.t t\interpro.t
t\InterProParser.t t\IUPAC.
t t\kegg.t t\largefasta.t t\LargeLocatableSeq.t
t\largepseq.t t\LinkageMap.t t\L
iveSeq.t t\LocatableSeq.t t\Location.t
t\LocationFactory.t t\LocusLink.t t\lucy.
t t\Map.t t\MapIO.t t\masta.t t\Matrix.t t\Measure.t
t\MeSH.t t\metafasta.t t\Me
taSeq.t t\MicrosatelliteMarker.t t\MiniMIMentry.t
t\MitoProt.t t\Molphy.t t\Mult
iFile.t t\multiple_fasta.t t\Mutation.t t\Mutator.t
t\NetPhos.t t\Node.t t\OddCo
des.t t\OMIMentry.t t\OMIMentryAllelicVariant.t
t\OMIMparser.t t\Ontology.t t\On
tologyEngine.t t\OntologyStore.t t\PAML.t t\Perl.t
t\phd.t t\Phenotype.t t\Phyli
pDist.t t\PhysicalMap.t t\pICalculator.t t\Pictogram.t
t\pir.t t\pln.t t\PopGen.
t t\PopGenSims.t t\primaryqual.t t\PrimarySeq.t
t\primedseq.t t\Primer.t t\prime
r3.t t\Promoterwise.t t\ProtDist.t t\protgraph.t
t\ProtMatrix.t t\ProtPsm.t t\Ps
eudowise.t t\psm.t t\QRNA.t t\qual.t
t\RandDistFunctions.t t\RandomTreeFactory.t
 t\Range.t t\RangeI.t t\raw.t t\RefSeq.t t\Registry.t
t\Relationship.t t\Relatio
nshipType.t t\RemoteBlast.t t\RepeatMasker.t
t\RestrictionAnalysis.t t\Restricti
onEnzyme.t t\RestrictionIO.t t\RNAChange.t
t\Root-Utilities.t t\RootI.t t\RootIO
.t t\RootStorable.t t\Scansite.t t\scf.t
t\SearchDist.t t\SearchIO.t t\Seq.t t\s
eq_quality.t t\SeqAnalysisParser.t t\SeqBuilder.t
t\SeqDiff.t t\SeqFeatCollectio
n.t t\SeqFeature.t t\seqfeaturePrimer.t
t\SeqHound_DB.t t\SeqIO.t t\SeqPattern.t
 t\seqread_fail.t t\SeqStats.t t\SequenceFamily.t
t\sequencetrace.t t\SeqUtils.t
 t\SeqVersion.t t\seqwithquality.t t\SeqWords.t
t\Sigcleave.t t\Sim4.t t\Similar
ityPair.t t\SimpleAlign.t t\simpleGOparser.t
t\singlet.t t\sirna.t t\SiteMatrix.
t t\SNP.t t\Sopma.t t\Species.t t\Spidey.t
t\splicedseq.t t\StandAloneBlast.t t\
StructIO.t t\Structure.t t\swiss.t t\Symbol.t t\tab.t
t\TagHaplotype.t t\Taxonom
y.t t\TaxonTree.t t\Tempfile.t t\Term.t t\tigrxml.t
t\tinyseq.t t\Tools.t t\Tree
.t t\TreeBuild.t t\TreeIO.t t\trim.t t\tRNAscanSE.t
t\tutorial.t t\UCSCParsers.t
 t\Unflattener.t t\Unflattener2.t t\UniGene.t
t\Variation_IO.t t\WABA.t t\XEMBL_
DB.t t\ztr.t' too long
Stop.

C:\bioperl-live\bioperl-live>



4) 'c:\nmake install' results in the following errors:

        C:\mod_perl\Perl\bin\perl.exe
-MExtUtils::Command -e cp ./scripts_temp/b
p_taxid4species.pl blib\script\bp_taxid4species.pl
        pl2bat.bat blib\script\bp_taxid4species.pl
        C:\mod_perl\Perl\bin\perl.exe
-MExtUtils::Command -e cp ./scripts_temp/b
p_seqret.pl blib\script\bp_seqret.pl
        pl2bat.bat blib\script\bp_seqret.pl
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
bioscripts.pod
Can't open bioscripts.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
biodatabases.pod
Can't open biodatabases.pod: No such file or
directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
biodesign.pod
Can't open biodesign.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
bioperl.pod
Can't open bioperl.pod: No such file or directory.
Appending installation info to
C:\mod_perl\Perl\lib/perllocal.pod
NMAKE : fatal error U1095: expanded command line '@
C:\mod_perl\Perl\bin\perl.ex
e "-MExtUtils::Command::MM" -e perllocal_install 
"Module" "Bio"  "installed int
o" "C:\mod_perl\Perl\site\lib"  LINKTYPE "dynamic" 
VERSION "1.5"  EXE_FILES "./
scripts_temp/bp_biblio.pl
./scripts_temp/bp_genbank2gff.pl ./scripts_temp/bp_bul
k_load_gff.pl ./scripts_temp/bp_fast_load_gff.pl
./scripts_temp/bp_genbank2gff3.
pl ./scripts_temp/bp_generate_histogram.pl
./scripts_temp/bp_load_gff.pl ./scrip
ts_temp/bp_meta_gff.pl
./scripts_temp/bp_process_gadfly.pl
./scripts_temp/bp_pro
cess_sgd.pl ./scripts_temp/bp_process_wormbase.pl
./scripts_temp/bp_embl2picture
.pl ./scripts_temp/bp_glyphs1-demo.pl
./scripts_temp/bp_glyphs2-demo.pl ./script
s_temp/bp_biofetch_genbank_proxy.pl
./scripts_temp/bp_bioflat_index.pl ./scripts
_temp/bp_biogetseq.pl ./scripts_temp/bp_flanks.pl
./scripts_temp/bp_contig_draw.
pl ./scripts_temp/bp_feature_draw.pl
./scripts_temp/bp_frend.pl ./scripts_temp/b
p_search_overview.pl ./scripts_temp/bp_fetch.pl
./scripts_temp/bp_index.pl ./scr
ipts_temp/bp_seqret.pl
./scripts_temp/bp_composite_LD.pl
./scripts_temp/bp_heter
ogeneity_test.pl ./scripts_temp/bp_fastam9_to_table.pl
./scripts_temp/bp_filter_
search.pl ./scripts_temp/bp_hmmer_to_table.pl
./scripts_temp/bp_search2table.pl
./scripts_temp/bp_extract_feature_seq.pl
./scripts_temp/bp_make_mrna_protein.pl
./scripts_temp/bp_seqconvert.pl
./scripts_temp/bp_split_seq.pl ./scripts_temp/bp
_translate_seq.pl ./scripts_temp/bp_unflatten_seq.pl
./scripts_temp/bp_aacomp.pl
 ./scripts_temp/bp_chaos_plot.pl
./scripts_temp/bp_gccalc.pl ./scripts_temp/bp_o
ligo_count.pl
./scripts_temp/bp_classify_hits_kingdom.pl
./scripts_temp/bp_local
_taxonomydb_query.pl
./scripts_temp/bp_query_entrez_taxa.pl
./scripts_temp/bp_ta
xid4species.pl ./scripts_temp/bp_blast2tree.pl
./scripts_temp/bp_nexus2nh.pl ./s
cripts_temp/bp_tree2pag.pl
./scripts_temp/bp_mrtrans.pl ./scripts_temp/bp_nrdb.p
l ./scripts_temp/bp_sreformat.pl
./scripts_temp/bp_dbsplit.pl ./scripts_temp/bp_
mask_by_search.pl ./scripts_temp/bp_mutate.pl
./scripts_temp/bp_pairwise_kaks.pl
 ./scripts_temp/bp_remote_blast.pl
./scripts_temp/bp_search2alnblocks.pl ./scrip
ts_temp/bp_search2BSML.pl
./scripts_temp/bp_search2gff.pl ./scripts_temp/bp_sear
ch2tribe.pl ./scripts_temp/bp_seq_length.pl"  >>
C:\mod_perl\Perl\lib\perllocal.
pod' too long
Stop.

C:\bioperl-live\bioperl-live>

--- Chris Fields  wrote:

> Upgrade bioperl from CVS using nmake. 
> 
> Installation instructions for using nmake:
> 
> http://www.bioperl.org/wiki/INSTALL.WIN#Beyond_the_Core
> 
> You can download a tarball using anonymous CVS (link
> at bottom):
> 
> http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/
> 
> or use CVS directly:
> 
> http://www.bioperl.org/wiki/Using_CVS
> 
> Then make sure to grab the latest SearchIO::blast
> bugfix, which is not in CVS yet:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> 
> Replace the blast.pm in \site\lib\Bio\SearchIO in
> your Perl directory.
> 
> Does that fix it?
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Raghunath
> Verabelli
> > Sent: Wednesday, February 22, 2006 11:22 AM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Blast returns result, but
> does not return hits
> > 
> > Hi All:
> > 
> > I am new to Perl/BioPerl world.
> > 
> > I am debugging a program that used to work fine
> > before.
> > Blast works fine and returns results, but I am
> > unable to get any hits from the results.
> > 
> > Here is the relevant code:
> > 
> > $blastObj = new Bio::SearchIO(-file=>$resultsFile,
> >                               -format=>'blast');
> >   while (my $result = $blastObj->next_result()) {
> >      while (my $bioPerlHit = $result->next_hit()) {
> >          .......
> > 
> > 
> > The first while condition returns true, but the
> second
> > while condition returns false. So looks like there
> is
> > some result, but it is unable to identify the hits
> in
> > the result. I printed the $result (pasted below).
> > 
> > Any ideas/comments to resolve this? Thanks in
> advance.
> > 
> > I am using Perl 5.8.7, BioPerl 1.2.3, Apache
> 1.3.34 on
> > Windows XP platform.
> > 
> > Like I said before, this application was running
> fine
> > on a different windows machine with similar
> > environment,so looks like there is some change in
> the
> > products/versions that is causing the problem.
> > 
> > thanks again,
> > Raghu
> > 
> > 
> > 
> > 
> > Blast result (i can send complete result if you
> need
> > it):
> > 
> > 

> > BLASTP 2.2.13 [Nov-27-2005]
> > Reference: Altschul, Stephen F., Thomas L. Madden,
> > Alejandro A. Schäffer,
> > Jinghui Zhang, Zheng Zhang, Webb Miller, and David
> J.
> > Lipman
> > (1997), "Gapped BLAST and PSI-BLAST: a new
> generation
> > of
> > protein database search programs", Nucleic Acids
> Res.
> > 25:3389-3402.
> > 
> > RID: 1140573059-19990-140117828872.BLASTQ1
> > 
> > 
> > Database: All non-redundant GenBank CDS
> > translations+PDB+SwissProt+PIR+PRF excluding
> > environmental samples
> >            3,297,000 sequences; 1,129,354,045
> total
> > letters
> > Query=
> > Length=360
> > 
> > 
> > 
> >             Score     E
> > Sequences producing significant alignments:
> >             (Bits)  Value
> > 
> > ref|XP_534770.2|  PREDICTED: similar to
> > Mitogen-activated prot...   739    0.0
> > gb|AAX36107.1|  mitogen-activated protein kinase 1
> > [synthetic con   739    0.0
> > pdb|1WZY|A  Chain A, Crystal Structure Of Human
> Erk2
> > Complexed...   739    0.0
> > pdb|1TVO|A  Chain A, The Structure Of Erk2 In
> Complex
> > With A S...   739    0.0
> > ref|NP_786987.1|  mitogen-activated protein kinase
> 1
> > [Bos taur...   739    0.0
> > emb|CAA77752.1|  41kD protein kinase [Homo
> sapiens]
> > >prf||1813...   738    0.0
> > gb|AAQ02541.1|  mitogen-activated protein kinase 1
> > [synthetic con   736    0.0
> > gb|AAH99905.1|  Mitogen-activated protein kinase 1
> > [Homo sapiens]   735    0.0
> > emb|CAI29602.1|  hypothetical protein [Pongo
> pygmaeus]
> >              734    0.0
> > gb|AAH58258.1|  Mitogen activated protein kinase 1
> > [Mus muscul...   731    0.0
> > pdb|4ERK|   The Complex Structure Of The Map
> Kinase
> > Erk2OLOMOU...   731    0.0
> > pdb|1GOL|   Coordinates Of Rat Map Kinase Erk2
> With An
> > Arginin...   730    0.0
> > ref|XP_860750.1|  PREDICTED: similar to
> > Mitogen-activated prot...   729    0.0
> > gb|AAK56503.1|  extracellular signal-regulated
> kinase
> > 2 [Gallu...   726    0.0
> > ref|XP_860716.1|  PREDICTED: similar to
> > Mitogen-activated prot...   726    0.0
> > pdb|2ERK|   Phosphorylated Map Kinase Erk2
> >              726    0.0
> > pdb|1PME|   Structure Of Penta Mutant Human Erk2
> Map
> > Kinase Co...   725    0.0
> > ref|XP_860682.1|  PREDICTED: similar to
> > Mitogen-activated prot...   720    0.0
> > ref|XP_860651.1|  PREDICTED: similar to
> > Mitogen-activated prot...   720    0.0
> > emb|CAA77753.1|  40kDa protein kinase [Homo
> sapiens]
> > >prf||181...   717    0.0
> > ref|NP_001017127.1|  mitogen-activated protein
> kinase
> > 1 [Xenopus    715    0.0
> > dbj|BAE28679.1|  unnamed protein product [Mus
> > musculus]             713    0.0
> > emb|CAA42482.1|  MAP kinase [Xenopus laevis]
> > >gb|AAH60748.1| M...   711    0.0
> > sp|P26696|MK01_XENLA  Mitogen-activated protein
> kinase
> > 1 (Myel...   711    0.0
> > gb|AAH76730.1|  Xp42 protein [Xenopus laevis]
> >              706    0.0
> > gb|AAH65868.1|  Mitogen-activated protein kinase 1
> > [Danio rerio]    696    0.0
> > dbj|BAD23843.1|  extracellular signal regulated
> > protein kinase...   694    0.0
> > ref|NP_878308.2|  mitogen-activated protein kinase
> 1
> > [Danio re...   694    0.0
> > emb|CAG07778.1|  unnamed protein product
> [Tetraodon
> 
=== message truncated ===




From cjfields at uiuc.edu  Wed Feb 22 16:55:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 22 Feb 2006 15:55:34 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060222210654.53827.qmail@web34404.mail.mud.yahoo.com>
Message-ID: <001701c637fa$b5110120$15327e82@pyrimidine>

You know, I assumed you were using ActivePerl b/c of the older version of
Bioperl (and since it's the most commonly used Perl for Windows build).  My
goof.  It looks like you're using Apache/mod_perl/perl, right?  The only
Perl/Apache/mod_perl combos for Windows I know of are listed here:

http://perl.apache.org/docs/2.0/os/win32/install.html

The only Perl for Windows we have actively supported is ActivePerl AFAIK,
but maybe we can walk through this.  Anything learned here can be added to
the installation instructions in case this comes up again.

To start, what mod_perl/Perl version are you using, and from what
distributor (IndigoStar, Apache, etc)?  Each distribution should have some
documentation for installing CPAN modules or prebuilt/pretested packages,
like ActiveState's PPM or IndigoStar's GPM.  I think Apache's Perl build is
from ActiveState's source code so should come with PPM.

Next: you obviously have installed Bioperl before (v1.2.3); did you use
'make' or 'nmake', or was it from a repository (like IndigoPerl's GPM)?
AFAIK, you would install it like you would any other perl module; there
should be no problem with 'make/nmake', though 'make/nmake test' will not
pass completely (it should pass most tests, though, otherwise something is
seriously wrong).
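
If 'nmake test' stops with U1095 because the expanded command line is too
long, one possible workaround is to drive the tests from Perl in smaller
batches instead of through nmake.  This is only a sketch under assumptions:
it expects to be run from the top of the bioperl-live directory after
'nmake' has populated blib, the batch size of 20 is arbitrary, and batches()
is a hypothetical helper, not something Bioperl provides.

```perl
use strict;
use warnings;

# Split a list into batches of at most $size items (hypothetical helper).
sub batches {
    my ($size, @items) = @_;
    die "batch size must be at least 1\n" if $size < 1;
    my @out;
    push @out, [ splice @items, 0, $size ] while @items;
    return @out;
}

# Collect the test scripts; an explicit sort block avoids Perl's
# "sort SUBNAME LIST" parsing ambiguity.
my @tests = sort { $a cmp $b } glob('t/*.t');

for my $batch (batches(20, @tests)) {
    # $^X is the perl running this script; the -I flags mirror the ones
    # nmake passes.  Each child process only sees one short batch of files.
    my $status = system($^X, '-Iblib/lib', '-Iblib/arch',
                        '-MTest::Harness', '-e', 'runtests(@ARGV)', @$batch);
    warn "batch starting at $batch->[0] reported failures\n" if $status != 0;
}
```

Test::Harness ships with Perl, so this should need nothing beyond the build
itself.  Tests that need network access may still fail.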

The other option, though not as nice, is setting the PERL5LIB variable to
include the bioperl-live directory; it works for me while I'm developing.  I
don't know how this may affect other mod_perl-related functions, though.
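
To check which copy of a module Perl would actually pick up after changing
PERL5LIB, something like the following could help.  It is a minimal sketch
using only core modules; locate_module() is a hypothetical helper name, and
Bio::SearchIO::blast is used as the example only because that is the file
patched for bug 1934.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first file on the given search path that would satisfy
# "use $module", or undef if none is found (hypothetical helper).
sub locate_module {
    my ($module, @search_path) = @_;
    (my $rel = $module) =~ s{::}{/}g;
    $rel .= '.pm';
    for my $dir (@search_path) {
        next if ref $dir;    # skip @INC hooks (code refs)
        my $candidate = File::Spec->catfile($dir, split m{/}, $rel);
        return $candidate if -f $candidate;
    }
    return;
}

# Directories listed in PERL5LIB are searched before the built-in ones,
# so a bioperl-live checkout listed there should win over an older install.
my $found = locate_module('Bio::SearchIO::blast', @INC);
print defined $found
    ? "Bio::SearchIO::blast resolves to $found\n"
    : "Bio::SearchIO::blast not found on \@INC\n";
```

If the path printed is still the old site\lib copy, the new directory is
probably not reaching Perl's search path in that environment.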

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: Raghunath Verabelli [mailto:iamvela at yahoo.com]
> Sent: Wednesday, February 22, 2006 3:07 PM
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> 
> Thanks Chris. I am getting below mentioned errors with
> nmake.
> 
> As suggested, I downloaded the nmake utility from
> Microsoft website and the bioperl-live tarball.
> 
> After untaring, I replaced the blast.pm file (under
> bioperl-live\Bio\SearchIO) with the blast.pm (86 KB
> size) attached to the bug report 1934.
> 
> I then did the following to install packages using
> nmake:
> 
> 1) perl Makefile.pl was successful without any errors.
> 
> 
> 2) 'c:\nmake' results in following errors
> 
>         pl2bat.bat blib\script\bp_unflatten_seq.pl
>         C:\mod_perl\Perl\bin\perl.exe
> -MExtUtils::Command -e cp ./scripts_temp/b
> p_taxid4species.pl blib\script\bp_taxid4species.pl
>         pl2bat.bat blib\script\bp_taxid4species.pl
>         C:\mod_perl\Perl\bin\perl.exe
> -MExtUtils::Command -e cp ./scripts_temp/b
> p_seqret.pl blib\script\bp_seqret.pl
>         pl2bat.bat blib\script\bp_seqret.pl
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> bioscripts.pod
> Can't open bioscripts.pod: No such file or directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> biodatabases.pod
> Can't open biodatabases.pod: No such file or
> directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> biodesign.pod
> Can't open biodesign.pod: No such file or directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> bioperl.pod
> Can't open bioperl.pod: No such file or directory.
> 
> 
> 3) 'c:\nmake test' fails with following errors:
> 
> NMAKE : fatal error U1095: expanded command line
> 'C:\mod_perl\Perl\bin\perl.exe
> "-MExtUtils::Command::MM" "-e" "test_harness(0,
> 'blib\lib', 'blib\arch')" t\AACh
> ange.t t\AAReverseMutate.t t\abi.t t\ace.t t\AlignIO.t
> t\AlignStats.t t\AlignUti
> l.t t\alignUtilities.t t\Allele.t t\Alphabet.t
> t\Annotation.t t\AnnotationAdapto
> r.t t\asciitree.t t\Assembly.t t\Biblio.t
> t\Biblio_biofetch.t t\Biblio_eutils.t
> t\BiblioReferences.t t\BioDBGFF.t t\BioFetch_DB.t
> t\BioGraphics.t t\BlastIndex.t
>  t\BPbl2seq.t t\BPlite.t t\BPpsilite.t t\bsml_sax.t
> t\Chain.t t\chaosxml.t t\cig
> arstring.t t\ClusterIO.t t\Coalescent.t t\CodonTable.t
> t\Compatible.t t\consed.t
>  t\CoordinateGraph.t t\CoordinateMapper.t
> t\Correlate.t t\ctf.t t\CytoMap.t t\DB
> .t t\DBCUTG.t t\DBFasta.t t\DNAMutation.t t\Domcut.t
> t\ECnumber.t t\ELM.t t\embl
> .t t\EMBL_DB.t t\EMBOSS_Tools.t t\EncodedSeq.t
> t\entrezgene.t t\ePCR.t t\ESEfind
> er.t t\est2genome.t t\Exception.t t\Exonerate.t
> t\exp.t t\fasta.t t\FeatureIO.t
> t\flat.t t\FootPrinter.t t\game.t t\GbrowseGFF.t
> t\gcg.t t\GDB.t t\Gel.t t\genba
> nk.t t\GeneCoordinateMapper.t t\Geneid.t t\Genewise.t
> t\Genomewise.t t\Genpred.t
>  t\GFF.t t\GOR4.t t\GOterm.t t\GraphAdaptor.t
> t\GuessSeqFormat.t t\hmmer.t t\HNN
> .t t\HtSNP.t t\Index.t t\InstanceSite.t t\interpro.t
> t\InterProParser.t t\IUPAC.
> t t\kegg.t t\largefasta.t t\LargeLocatableSeq.t
> t\largepseq.t t\LinkageMap.t t\L
> iveSeq.t t\LocatableSeq.t t\Location.t
> t\LocationFactory.t t\LocusLink.t t\lucy.
> t t\Map.t t\MapIO.t t\masta.t t\Matrix.t t\Measure.t
> t\MeSH.t t\metafasta.t t\Me
> taSeq.t t\MicrosatelliteMarker.t t\MiniMIMentry.t
> t\MitoProt.t t\Molphy.t t\Mult
> iFile.t t\multiple_fasta.t t\Mutation.t t\Mutator.t
> t\NetPhos.t t\Node.t t\OddCo
> des.t t\OMIMentry.t t\OMIMentryAllelicVariant.t
> t\OMIMparser.t t\Ontology.t t\On
> tologyEngine.t t\OntologyStore.t t\PAML.t t\Perl.t
> t\phd.t t\Phenotype.t t\Phyli
> pDist.t t\PhysicalMap.t t\pICalculator.t t\Pictogram.t
> t\pir.t t\pln.t t\PopGen.
> t t\PopGenSims.t t\primaryqual.t t\PrimarySeq.t
> t\primedseq.t t\Primer.t t\prime
> r3.t t\Promoterwise.t t\ProtDist.t t\protgraph.t
> t\ProtMatrix.t t\ProtPsm.t t\Ps
> eudowise.t t\psm.t t\QRNA.t t\qual.t
> t\RandDistFunctions.t t\RandomTreeFactory.t
>  t\Range.t t\RangeI.t t\raw.t t\RefSeq.t t\Registry.t
> t\Relationship.t t\Relatio
> nshipType.t t\RemoteBlast.t t\RepeatMasker.t
> t\RestrictionAnalysis.t t\Restricti
> onEnzyme.t t\RestrictionIO.t t\RNAChange.t
> t\Root-Utilities.t t\RootI.t t\RootIO
> .t t\RootStorable.t t\Scansite.t t\scf.t
> t\SearchDist.t t\SearchIO.t t\Seq.t t\s
> eq_quality.t t\SeqAnalysisParser.t t\SeqBuilder.t
> t\SeqDiff.t t\SeqFeatCollectio
> n.t t\SeqFeature.t t\seqfeaturePrimer.t
> t\SeqHound_DB.t t\SeqIO.t t\SeqPattern.t
>  t\seqread_fail.t t\SeqStats.t t\SequenceFamily.t
> t\sequencetrace.t t\SeqUtils.t
>  t\SeqVersion.t t\seqwithquality.t t\SeqWords.t
> t\Sigcleave.t t\Sim4.t t\Similar
> ityPair.t t\SimpleAlign.t t\simpleGOparser.t
> t\singlet.t t\sirna.t t\SiteMatrix.
> t t\SNP.t t\Sopma.t t\Species.t t\Spidey.t
> t\splicedseq.t t\StandAloneBlast.t t\
> StructIO.t t\Structure.t t\swiss.t t\Symbol.t t\tab.t
> t\TagHaplotype.t t\Taxonom
> y.t t\TaxonTree.t t\Tempfile.t t\Term.t t\tigrxml.t
> t\tinyseq.t t\Tools.t t\Tree
> .t t\TreeBuild.t t\TreeIO.t t\trim.t t\tRNAscanSE.t
> t\tutorial.t t\UCSCParsers.t
>  t\Unflattener.t t\Unflattener2.t t\UniGene.t
> t\Variation_IO.t t\WABA.t t\XEMBL_
> DB.t t\ztr.t' too long
> Stop.
> 
> C:\bioperl-live\bioperl-live>
> 
> 
> 
> 4) 'c:\nmake install' results in following errors:
> 
>         C:\mod_perl\Perl\bin\perl.exe
> -MExtUtils::Command -e cp ./scripts_temp/b
> p_taxid4species.pl blib\script\bp_taxid4species.pl
>         pl2bat.bat blib\script\bp_taxid4species.pl
>         C:\mod_perl\Perl\bin\perl.exe
> -MExtUtils::Command -e cp ./scripts_temp/b
> p_seqret.pl blib\script\bp_seqret.pl
>         pl2bat.bat blib\script\bp_seqret.pl
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> bioscripts.pod
> Can't open bioscripts.pod: No such file or directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> biodatabases.pod
> Can't open biodatabases.pod: No such file or
> directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> biodesign.pod
> Can't open biodesign.pod: No such file or directory.
>         C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
> "-Iblib\lib" doc/makedoc.PL
> bioperl.pod
> Can't open bioperl.pod: No such file or directory.
> Appending installation info to
> C:\mod_perl\Perl\lib/perllocal.pod
> NMAKE : fatal error U1095: expanded command line '@
> C:\mod_perl\Perl\bin\perl.ex
> e "-MExtUtils::Command::MM" -e perllocal_install
> "Module" "Bio"  "installed int
> o" "C:\mod_perl\Perl\site\lib"  LINKTYPE "dynamic"
> VERSION "1.5"  EXE_FILES "./
> scripts_temp/bp_biblio.pl
> ./scripts_temp/bp_genbank2gff.pl ./scripts_temp/bp_bul
> k_load_gff.pl ./scripts_temp/bp_fast_load_gff.pl
> ./scripts_temp/bp_genbank2gff3.
> pl ./scripts_temp/bp_generate_histogram.pl
> ./scripts_temp/bp_load_gff.pl ./scrip
> ts_temp/bp_meta_gff.pl
> ./scripts_temp/bp_process_gadfly.pl
> ./scripts_temp/bp_pro
> cess_sgd.pl ./scripts_temp/bp_process_wormbase.pl
> ./scripts_temp/bp_embl2picture
> .pl ./scripts_temp/bp_glyphs1-demo.pl
> ./scripts_temp/bp_glyphs2-demo.pl ./script
> s_temp/bp_biofetch_genbank_proxy.pl
> ./scripts_temp/bp_bioflat_index.pl ./scripts
> _temp/bp_biogetseq.pl ./scripts_temp/bp_flanks.pl
> ./scripts_temp/bp_contig_draw.
> pl ./scripts_temp/bp_feature_draw.pl
> ./scripts_temp/bp_frend.pl ./scripts_temp/b
> p_search_overview.pl ./scripts_temp/bp_fetch.pl
> ./scripts_temp/bp_index.pl ./scr
> ipts_temp/bp_seqret.pl
> ./scripts_temp/bp_composite_LD.pl
> ./scripts_temp/bp_heter
> ogeneity_test.pl ./scripts_temp/bp_fastam9_to_table.pl
> ./scripts_temp/bp_filter_
> search.pl ./scripts_temp/bp_hmmer_to_table.pl
> ./scripts_temp/bp_search2table.pl
> ./scripts_temp/bp_extract_feature_seq.pl
> ./scripts_temp/bp_make_mrna_protein.pl
> ./scripts_temp/bp_seqconvert.pl
> ./scripts_temp/bp_split_seq.pl ./scripts_temp/bp
> _translate_seq.pl ./scripts_temp/bp_unflatten_seq.pl
> ./scripts_temp/bp_aacomp.pl
>  ./scripts_temp/bp_chaos_plot.pl
> ./scripts_temp/bp_gccalc.pl ./scripts_temp/bp_o
> ligo_count.pl
> ./scripts_temp/bp_classify_hits_kingdom.pl
> ./scripts_temp/bp_local
> _taxonomydb_query.pl
> ./scripts_temp/bp_query_entrez_taxa.pl
> ./scripts_temp/bp_ta
> xid4species.pl ./scripts_temp/bp_blast2tree.pl
> ./scripts_temp/bp_nexus2nh.pl ./s
> cripts_temp/bp_tree2pag.pl
> ./scripts_temp/bp_mrtrans.pl ./scripts_temp/bp_nrdb.p
> l ./scripts_temp/bp_sreformat.pl
> ./scripts_temp/bp_dbsplit.pl ./scripts_temp/bp_
> mask_by_search.pl ./scripts_temp/bp_mutate.pl
> ./scripts_temp/bp_pairwise_kaks.pl
>  ./scripts_temp/bp_remote_blast.pl
> ./scripts_temp/bp_search2alnblocks.pl ./scrip
> ts_temp/bp_search2BSML.pl
> ./scripts_temp/bp_search2gff.pl ./scripts_temp/bp_sear
> ch2tribe.pl ./scripts_temp/bp_seq_length.pl"  >>
> C:\mod_perl\Perl\lib\perllocal.
> pod' too long
> Stop.
> 
> C:\bioperl-live\bioperl-live>
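
Both U1095 errors in the logs above come from nmake refusing a command line that became too long once every file name was expanded onto it. One way around the 'nmake test' case, as a minimal sketch only, is to run the test scripts one at a time with the same perl binary. The blib\lib and blib\arch directories are the ones shown in the log; the helper name run_tests is invented for illustration and is not part of Bioperl or ExtUtils::MakeMaker.

```perl
#!/usr/bin/perl
# Sketch: run each t/*.t script in its own perl process so that no single
# command line has to carry the whole list of test files.
use strict;
use warnings;

# Run every script in @tests and return the ones that exited non-zero.
sub run_tests {
    my @tests = @_;
    my @failed;
    for my $test (@tests) {
        # $^X is the perl binary that is running this script.
        my $status = system($^X, '-Iblib/lib', '-Iblib/arch', $test);
        push @failed, $test if $status != 0;
    }
    return @failed;
}

unless (caller) {    # only when executed directly as a script
    my @tests  = sort glob 't/*.t';
    my @failed = run_tests(@tests);
    printf "%d of %d test scripts exited with an error\n",
        scalar @failed, scalar @tests;
    print "  $_\n" for @failed;
}
```

Run it from the top of the unpacked distribution after 'nmake' has populated blib. It only looks at exit status, so a script that prints "not ok" but still exits with 0 has to be spotted by reading the output.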
> 
> --- Chris Fields  wrote:
> 
> > Upgrade bioperl from CVS using nmake.
> >
> > Installation instructions for using nmake:
> >
> >
> http://www.bioperl.org/wiki/INSTALL.WIN#Beyond_the_Core
> >
> > You can download a tarball using anonymous CVS (link
> > at bottom):
> >
> >
> http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/
> >
> > or use CVS directly:
> >
> > http://www.bioperl.org/wiki/Using_CVS
> >
> > Then make sure to grab the last SearchIO::last
> > bugfix, which is not in CVS
> > yet:
> >
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >
> > Replace the blast.pm in \site\lib\Bio\SearchIO in
> > your Perl directory.
> >
> > Does that fix it?
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Raghunath
> > Verabelli
> > > Sent: Wednesday, February 22, 2006 11:22 AM
> > > To: bioperl-l at lists.open-bio.org
> > > Subject: [Bioperl-l] Blast returns result, but
> > does not return hits
> > >
> > > Hi All:
> > >
> > > I am new to Perl/BioPerl world.
> > >
> > > I am debugging a program that used to work fine
> > > before.
> > > Blast works fine and returns results, but I am
> > unable
> > > to get any hits from the results.
> > >
> > > Here is the relevant code:
> > >
> > > $blastObj = new Bio::SearchIO
> > (-file=>$resultsFile,
> > > -format=>'blast');
> > >   while (my $result = $blastObj->next_result()) {
> > >      while (my $bioPerlHit = $result->next_hit())
> > {
> > >          .......
> > >
> > >
> > > The first while condition returns true, but the
> > second
> > > while condition returns false. So looks like there
> > is
> > > some result, but it is unable to identify the hits
> > in
> > > the result. I printed the $result (pasted below).
> > >
> > > Any ideas/comments to resolve this? Thanks in
> > advance.
> > >
> > > I am using Perl 5.8.7, BioPerl 1.2.3, Apache
> > 1.3.34 on
> > > Windows XP platform.
> > >
> > > Like I said before, this application was running
> > fine
> > > on a different windows machine with similar
> > > environment,so looks like there is some change in
> > the
> > > products/versions that is causing the problem.
> > >
> > > thanks again,
> > > Raghu
> > >
> > >
> > >
> > >
> > > Blast result (i can send complete result if you
> > need
> > > it):
> > >
> > > 

> > > BLASTP 2.2.13 [Nov-27-2005]
> > > Reference: Altschul, Stephen F., Thomas L. Madden,
> > > Alejandro A. Schäffer,
> > > Jinghui Zhang, Zheng Zhang, Webb Miller, and David
> > J.
> > > Lipman
> > > (1997), "Gapped BLAST and PSI-BLAST: a new
> > generation
> > > of
> > > protein database search programs", Nucleic Acids
> > Res.
> > > 25:3389-3402.
> > >
> > > RID: 1140573059-19990-140117828872.BLASTQ1
> > >
> > >
> > > Database: All non-redundant GenBank CDS
> > > translations+PDB+SwissProt+PIR+PRF excluding
> > > environmental samples
> > >            3,297,000 sequences; 1,129,354,045
> > total
> > > letters
> > > Query=
> > > Length=360
> > >
> > >
> > >
> > >             Score     E
> > > Sequences producing significant alignments:
> > >             (Bits)  Value
> > >
> > > ref|XP_534770.2|  PREDICTED: similar to
> > > Mitogen-activated prot...   739    0.0
> > > gb|AAX36107.1|  mitogen-activated protein kinase 1
> > > [synthetic con   739    0.0
> > > pdb|1WZY|A  Chain A, Crystal Structure Of Human
> > Erk2
> > > Complexed...   739    0.0
> > > pdb|1TVO|A  Chain A, The Structure Of Erk2 In
> > Complex
> > > With A S...   739    0.0
> > > ref|NP_786987.1|  mitogen-activated protein kinase
> > 1
> > > [Bos taur...   739    0.0
> > > emb|CAA77752.1|  41kD protein kinase [Homo
> > sapiens]
> > > >prf||1813...   738    0.0
> > > gb|AAQ02541.1|  mitogen-activated protein kinase 1
> > > [synthetic con   736    0.0
> > > gb|AAH99905.1|  Mitogen-activated protein kinase 1
> > > [Homo sapiens]   735    0.0
> > > emb|CAI29602.1|  hypothetical protein [Pongo
> > pygmaeus]
> > >              734    0.0
> > > gb|AAH58258.1|  Mitogen activated protein kinase 1
> > > [Mus muscul...   731    0.0
> > > pdb|4ERK|   The Complex Structure Of The Map
> > Kinase
> > > Erk2OLOMOU...   731    0.0
> > > pdb|1GOL|   Coordinates Of Rat Map Kinase Erk2
> > With An
> > > Arginin...   730    0.0
> > > ref|XP_860750.1|  PREDICTED: similar to
> > > Mitogen-activated prot...   729    0.0
> > > gb|AAK56503.1|  extracellular signal-regulated
> > kinase
> > > 2 [Gallu...   726    0.0
> > > ref|XP_860716.1|  PREDICTED: similar to
> > > Mitogen-activated prot...   726    0.0
> > > pdb|2ERK|   Phosphorylated Map Kinase Erk2
> > >              726    0.0
> > > pdb|1PME|   Structure Of Penta Mutant Human Erk2
> > Map
> > > Kinase Co...   725    0.0
> > > ref|XP_860682.1|  PREDICTED: similar to
> > > Mitogen-activated prot...   720    0.0
> > > ref|XP_860651.1|  PREDICTED: similar to
> > > Mitogen-activated prot...   720    0.0
> > > emb|CAA77753.1|  40kDa protein kinase [Homo
> > sapiens]
> > > >prf||181...   717    0.0
> > > ref|NP_001017127.1|  mitogen-activated protein
> > kinase
> > > 1 [Xenopus    715    0.0
> > > dbj|BAE28679.1|  unnamed protein product [Mus
> > > musculus]             713    0.0
> > > emb|CAA42482.1|  MAP kinase [Xenopus laevis]
> > > >gb|AAH60748.1| M...   711    0.0
> > > sp|P26696|MK01_XENLA  Mitogen-activated protein
> > kinase
> > > 1 (Myel...   711    0.0
> > > gb|AAH76730.1|  Xp42 protein [Xenopus laevis]
> > >              706    0.0
> > > gb|AAH65868.1|  Mitogen-activated protein kinase 1
> > > [Danio rerio]    696    0.0
> > > dbj|BAD23843.1|  extracellular signal regulated
> > > protein kinase...   694    0.0
> > > ref|NP_878308.2|  mitogen-activated protein kinase
> > 1
> > > [Danio re...   694    0.0
> > > emb|CAG07778.1|  unnamed protein product
> > [Tetraodon
> >
> === message truncated ===
> 
> 




From iamvela at yahoo.com  Wed Feb 22 17:32:08 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 14:32:08 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <001701c637fa$b5110120$15327e82@pyrimidine>
Message-ID: <20060222223208.45715.qmail@web34409.mail.mud.yahoo.com>

Chris,

Please see my response below.

--- Chris Fields  wrote:

> You know, I assumed you were using ActivePerl b/c of
> the older version of
> Bioperl (and since it's the most commonly used Perl
> for Windows build).  My
> goof.  It looks like you're using
> Apache/mod_perl/perl, right?  The only
> Perl/Apache/mod_perl combos for Windows I know of
> are listed here:


I am using ActivePerl 5.8.7 downloaded from
activeperl.com. I just happened to install it under
c:\mod_perl\Perl directory (application has hardcoded
dependencies for this directory). I am not using
apache/mod_perl/perl.

Please see below version string returned by perl
executable.

 
C:\bioperl-live\bioperl-live>perl -version

This is perl, v5.8.7 built for
MSWin32-x86-multi-thread
(with 14 registered patches, see perl -V for more
detail)

Copyright 1987-2005, Larry Wall

Binary build 815 [211909] provided by ActiveState
http://www.ActiveState.com
ActiveState is a division of Sophos.
Built Nov  2 2005 08:44:52


> 
>
http://perl.apache.org/docs/2.0/os/win32/install.html
> 
> The only Perl for Windows we have actively supported
> is ActivePerl AFAIK,
> but maybe we can walk through this.  Anything
> learned here can be added to
> the installation instructions in case this comes up
> again.
> 
> To start, what mod_perl/Perl version are you using,
> and from what
> distributor (IndigoStar, Apache, etc)?  Each
> distribution should have some
> documentation for installing CPAN modules or
> prebuilt/pretested packages,
> like ActiveState's PPM or IndigoStar's GPM.  I think
> Apache's Perl build is
> from ActiveState's source code so should come with
> PPM.
> 



I used 'ppm' to install packages (DBI, Oracle-DBD,
bioperl etc) before, so this is the first time I tried
to install it using 'nmake' utility.

After downloading the latest bioperl tar ball and
replacing the blast.pm file, can I just do ppm install
bioperl instead of doing nmake?


> Next: you obviously have installed Bioperl before
> (v1.2.3); did you use
> 'make' or 'nmake', or was it from a repository (like
> IndigoPerl's GPM)?
> AFAIK, you would install it like you would any other
> perl module; there
> should be no problem with 'make/nmake', though
> 'make/nmake test' will not
> pass completely (it should pass most tests, though,
> otherwise something is
> seriously wrong).
> 
> The other option, though not as nice, is setting the
> PERL5LIB variable to
> include the bioperl-live directory; it works for me
> while I'm developing. 

I tried setting PERL5LIB, but it did not make any
difference. I am still getting the same errors.


I wanted to do a clean install, so I tried 'nmake clean',
but it looks like there is no 'rm' utility installed on
my machine.

thanks for all your help,
Raghu

> I
> don't know how this may affect other
> mod_perl-related functions, though.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> 
> > -----Original Message-----
> > From: Raghunath Verabelli
> [mailto:iamvela at yahoo.com]
> > Sent: Wednesday, February 22, 2006 3:07 PM
> > To: Chris Fields; bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Blast returns result, but
> does not return hits
> > 
> > Thanks Chris. I am getting below mentioned errors
> with
> > nmake.
> > 
> > As suggested, I downloaded the nmake utility from
> > Microsoft website and the bioperl-live tarball.
> > 
> > After untaring, I replaced the blast.pm file
> (under
> > bioperl-live\Bio\SearchIO) with the blast.pm (86
> KB
> > size) attached to the bug report 1934.
> > 
> > I then did the following to install packages using
> > nmake:
> > 
> > 1) perl Makefile.pl was successful without any
> errors.
> > 
> > 
> > 2) 'c:\nmake' results in following errors
> > 
> >         pl2bat.bat blib\script\bp_unflatten_seq.pl
> >         C:\mod_perl\Perl\bin\perl.exe
> > -MExtUtils::Command -e cp ./scripts_temp/b
> > p_taxid4species.pl blib\script\bp_taxid4species.pl
> >         pl2bat.bat blib\script\bp_taxid4species.pl
> >         C:\mod_perl\Perl\bin\perl.exe
> > -MExtUtils::Command -e cp ./scripts_temp/b
> > p_seqret.pl blib\script\bp_seqret.pl
> >         pl2bat.bat blib\script\bp_seqret.pl
> >         C:\mod_perl\Perl\bin\perl.exe
> "-Iblib\arch"
> > "-Iblib\lib" doc/makedoc.PL
> > bioscripts.pod
> > Can't open bioscripts.pod: No such file or
> directory.
> >         C:\mod_perl\Perl\bin\perl.exe
> "-Iblib\arch"
> > "-Iblib\lib" doc/makedoc.PL
> > biodatabases.pod
> > Can't open biodatabases.pod: No such file or
> > directory.
> >         C:\mod_perl\Perl\bin\perl.exe
> "-Iblib\arch"
> > "-Iblib\lib" doc/makedoc.PL
> > biodesign.pod
> > Can't open biodesign.pod: No such file or
> directory.
> >         C:\mod_perl\Perl\bin\perl.exe
> "-Iblib\arch"
> > "-Iblib\lib" doc/makedoc.PL
> > bioperl.pod
> > Can't open bioperl.pod: No such file or directory.
> > 
> > 
> > 3) 'c:\nmake test' fails with following errors:
> > 
> > NMAKE : fatal error U1095: expanded command line
> > 'C:\mod_perl\Perl\bin\perl.exe
> > "-MExtUtils::Command::MM" "-e" "test_harness(0,
> > 'blib\lib', 'blib\arch')" t\AACh
> > ange.t t\AAReverseMutate.t t\abi.t t\ace.t
> t\AlignIO.t
> > t\AlignStats.t t\AlignUti
> > l.t t\alignUtilities.t t\Allele.t t\Alphabet.t
> > t\Annotation.t t\AnnotationAdapto
> > r.t t\asciitree.t t\Assembly.t t\Biblio.t
> > t\Biblio_biofetch.t t\Biblio_eutils.t
> > t\BiblioReferences.t t\BioDBGFF.t t\BioFetch_DB.t
> > t\BioGraphics.t t\BlastIndex.t
> >  t\BPbl2seq.t t\BPlite.t t\BPpsilite.t
> t\bsml_sax.t
> > t\Chain.t t\chaosxml.t t\cig
> > arstring.t t\ClusterIO.t t\Coalescent.t
> t\CodonTable.t
> > t\Compatible.t t\consed.t
> >  t\CoordinateGraph.t t\CoordinateMapper.t
> > t\Correlate.t t\ctf.t t\CytoMap.t t\DB
> > .t t\DBCUTG.t t\DBFasta.t t\DNAMutation.t
> t\Domcut.t
> > t\ECnumber.t t\ELM.t t\embl
> > .t t\EMBL_DB.t t\EMBOSS_Tools.t t\EncodedSeq.t
> > t\entrezgene.t t\ePCR.t t\ESEfind
> > er.t t\est2genome.t t\Exception.t t\Exonerate.t
> > t\exp.t t\fasta.t t\FeatureIO.t
> > t\flat.t t\FootPrinter.t t\game.t t\GbrowseGFF.t
> > t\gcg.t t\GDB.t t\Gel.t t\genba
> > nk.t t\GeneCoordinateMapper.t t\Geneid.t
> t\Genewise.t
> > t\Genomewise.t t\Genpred.t
> >  t\GFF.t t\GOR4.t t\GOterm.t t\GraphAdaptor.t
> > t\GuessSeqFormat.t t\hmmer.t t\HNN
> > .t t\HtSNP.t t\Index.t t\InstanceSite.t
> t\interpro.t
> > t\InterProParser.t t\IUPAC.
> > t t\kegg.t t\largefasta.t t\LargeLocatableSeq.t
> > t\largepseq.t t\LinkageMap.t t\L
> > iveSeq.t t\LocatableSeq.t t\Location.t
> > t\LocationFactory.t t\LocusLink.t t\lucy.
> > t t\Map.t t\MapIO.t t\masta.t t\Matrix.t
> t\Measure.t
> > t\MeSH.t t\metafasta.t t\Me
> > taSeq.t t\MicrosatelliteMarker.t t\MiniMIMentry.t
> > t\MitoProt.t t\Molphy.t t\Mult
> > iFile.t t\multiple_fasta.t t\Mutation.t
> t\Mutator.t
> > t\NetPhos.t t\Node.t t\OddCo
> > des.t t\OMIMentry.t t\OMIMentryAllelicVariant.t
> > t\OMIMparser.t t\Ontology.t t\On
> > tologyEngine.t t\OntologyStore.t t\PAML.t t\Perl.t
> > t\phd.t t\Phenotype.t t\Phyli
> > pDist.t t\PhysicalMap.t t\pICalculator.t
> t\Pictogram.t
> > t\pir.t t\pln.t t\PopGen.
> > t t\PopGenSims.t t\primaryqual.t t\PrimarySeq.t
> > t\primedseq.t t\Primer.t t\prime
> > r3.t t\Promoterwise.t t\ProtDist.t t\protgraph.t
> > t\ProtMatrix.t t\ProtPsm.t t\Ps
> > eudowise.t t\psm.t t\QRNA.t t\qual.t
> > t\RandDistFunctions.t t\RandomTreeFactory.t
> >  t\Range.t t\RangeI.t t\raw.t t\RefSeq.t
> t\Registry.t
> > t\Relationship.t t\Relatio
> > nshipType.t t\RemoteBlast.t t\RepeatMasker.t
> > t\RestrictionAnalysis.t t\Restricti
> > onEnzyme.t t\RestrictionIO.t t\RNAChange.t
> > t\Root-Utilities.t t\RootI.t t\RootIO
> > .t t\RootStorable.t t\Scansite.t t\scf.t
> > t\SearchDist.t t\SearchIO.t t\Seq.t t\s
> > eq_quality.t t\SeqAnalysisParser.t t\SeqBuilder.t
> > t\SeqDiff.t t\SeqFeatCollectio
> > n.t t\SeqFeature.t t\seqfeaturePrimer.t
> > t\SeqHound_DB.t t\SeqIO.t t\SeqPattern.t
> >  t\seqread_fail.t t\SeqStats.t t\SequenceFamily.t
> > t\sequencetrace.t t\SeqUtils.t
> >  t\SeqVersion.t t\seqwithquality.t t\SeqWords.t
> > t\Sigcleave.t t\Sim4.t t\Similar
> > ityPair.t t\SimpleAlign.t t\simpleGOparser.t
> 
=== message truncated ===




From cjfields at uiuc.edu  Wed Feb 22 19:02:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 22 Feb 2006 18:02:38 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060222223208.45715.qmail@web34409.mail.mud.yahoo.com>
Message-ID: <002101c6380c$75910880$15327e82@pyrimidine>

> 
> I am using ActivePerl 5.8.7 downloaded from
> activeperl.com. I just happened to install it under
> c:\mod_perl\Perl directory (application has hardcoded
> dependencies for this directory). I am not using
> apache/mod_perl/perl.
> 
> Please see below version string returned by perl
> executable.
> 
> 
> C:\bioperl-live\bioperl-live>perl -version
> 
> This is perl, v5.8.7 built for
> MSWin32-x86-multi-thread
> (with 14 registered patches, see perl -V for more
> detail)
> 
> Copyright 1987-2005, Larry Wall
> 
> Binary build 815 [211909] provided by ActiveState
> http://www.ActiveState.com
> ActiveState is a division of Sophos.
> Built Nov  2 2005 08:44:52
 
What do you see when you type 'perl -V' (make sure it is a capital 'V', not
lower case)?

> http://perl.apache.org/docs/2.0/os/win32/install.html
> >
> > The only Perl for Windows we have actively supported
> > is ActivePerl AFAIK,
> > but maybe we can walk through this.  Anything
> > learned here can be added to
> > the installation instructions in case this comes up
> > again.
> >
> I used 'ppm' to install packages (DBI, Oracle-DBD,
> bioperl etc) before, so this is the first time I tried
> to install it using 'nmake' utility.
>
> After downloading the latest bioperl tar ball and
> replacing the blast.pm file, can I just do ppm install
> bioperl instead of doing nmake?

Okay, so I know you're using PPM now.  No, you can't do that.  I'm adding a
section to this page:

http://bioperl.open-bio.org/wiki/Making_a_BioPerl_release

about building your own PPM; it will explain everything.  It isn't up yet
but should be up tonight or tomorrow.  BTW, you'll still need a working
nmake for this.  Again, make sure nmake is in your PATH environment
variable, or at least in the directory where you plan to run 'nmake' and
'nmake install'.  Although nmake is buggy I haven't had a problem with it yet.
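
A quick way to check the PATH advice above is to look for nmake in each PATH directory from Perl itself. This is only a sketch using the core File::Spec module; the helper name find_on_path is made up for this example, and the fallback extension list is an assumption for machines where PATHEXT is not set.

```perl
use strict;
use warnings;
use File::Spec;

# Return every file called $name (plus, on Windows, $name with each
# PATHEXT extension such as .exe or .bat) found in the given directories.
sub find_on_path {
    my ($name, @dirs) = @_;
    my @exts = ('');
    push @exts, split /;/, ($ENV{PATHEXT} || '.exe;.bat;.cmd')
        if $^O eq 'MSWin32';
    my @found;
    for my $dir (@dirs) {
        for my $ext (@exts) {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            push @found, $candidate if -f $candidate;
        }
    }
    return @found;
}

# File::Spec->path returns the directories listed in the PATH variable.
my @hits = find_on_path('nmake', File::Spec->path);
print @hits ? "nmake found: $hits[0]\n" : "nmake is not on PATH\n";
```

If nothing is found, either add the directory that holds nmake.exe to PATH or copy it next to Makefile.PL, as suggested above.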
 
> > Next: you obviously have installed Bioperl before
> > (v1.2.3); did you use
> > 'make' or 'nmake', or was it from a repository (like
> > IndigoPerl's GPM)?
> > AFAIK, you would install it like you would any other
> > perl module; there
> > should be no problem with 'make/nmake', though
> > 'make/nmake test' will not
> > pass completely (it should pass most tests, though,
> > otherwise something is
> > seriously wrong).
> >
> > The other option, though not as nice, is setting the
> > PERL5LIB variable to
> > include the bioperl-live directory; it works for me
> > while I'm developing.
> 
> I tried setting PERL5LIB, but it did not make any
> difference. I am still getting the same errors.
 
Do you mean the errors from nmake or errors from your scripts?  If PERL5LIB
is set properly, Perl searches those directories for modules before the
rest of @INC (i.e. you will not need to build and install them using
nmake).
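
To see which copy of a module Perl actually picks up, a short script along these lines can help. It is a sketch only: module_location is a helper invented for this example, File::Spec is used as a stand-in that every Perl has, and Bio::SearchIO::blast is listed because it is the module patched earlier in this thread (it reports "not found" where Bioperl is not visible in @INC).

```perl
use strict;
use warnings;

# Return the file a module is loaded from, or undef if it cannot be loaded.
sub module_location {
    my ($module) = @_;
    # Bio::SearchIO::blast -> Bio/SearchIO/blast.pm
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# PERL5LIB entries appear first, followed by the built-in library directories.
print "Module search order:\n";
print "  $_\n" for @INC;

for my $module ('File::Spec', 'Bio::SearchIO::blast') {
    my $where = module_location($module);
    printf "%-22s %s\n", $module,
        defined $where ? $where : '(not found in @INC)';
}
```

If the path printed for Bio::SearchIO::blast is under site\lib rather than the bioperl-live directory, the PERL5LIB setting is not pointing at the directory that contains the Bio folder.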

I don't recommend this as a habit: unpacking the entire Bioperl distribution
into a folder and pointing PERL5LIB at it is not the best practice, but some
are forced to do it this way, so it's there if you need it.  A direct
installation is recommended if possible.

The PERL5LIB I use below only contains modules I'm working on or
modifications of current modules (like SearchIO::blast, RemoteBlast, etc).
Bioperl from CVS is installed via PPM (custom-built PPM, BTW, using the
instructions I mentioned).  

The following shows what my PERL5LIB is set to.  Note that 'perl -V' also
tells you what @INC is set to:

C:\Perl\src\bioperl\bioperl-live>perl -V
Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=MSWin32, osvers=5.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef useithreads=define 



  Compiled at Nov  2 2005 08:44:52
  %ENV:
    PERL5LIB="C:/Perl/src/bioperl/bioperl-live;
C:/Perl/src/bioperl/bioperl-db"
  @INC:
    C:/Perl/src/bioperl/bioperl-live
     C:/Perl/src/bioperl/bioperl-db
    C:/Perl/lib
    C:/Perl/site/lib
    .



From iamvela at yahoo.com  Wed Feb 22 21:25:02 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 18:25:02 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <002101c6380c$75910880$15327e82@pyrimidine>
Message-ID: <20060223022502.31234.qmail@web34406.mail.mud.yahoo.com>


Thanks very much, Chris, for your time.
Please see below the output that you requested. The only
difference I saw between your output and mine is the @INC
value: I have only the 2 directories under c:\mod_perl\perl,
where I installed ActivePerl, and I see two additional
directories in your @INC path.

>  
> When you type 'perl -V' what do you see (make sure
> it is a capital 'V', not
> lower case).

C:\Documents and Settings\Administrator>perl  -V
Summary of my perl5 (revision 5 version 8 subversion
7) configuration:
  Platform:
    osname=MSWin32, osvers=5.0,
archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef
useithreads=define usemultiplicity=de
fine
    useperlio=define d_sfio=undef uselargefiles=define
usesocks=undef
    use64bitint=undef use64bitall=undef
uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -Gf -W3 -MD -Zi
-DNDEBUG -O1 -DWIN32 -D_CONSOLE -
DNO_STRICT -DHAVE_DES_FCRYPT -DNO_HASH_SEED
-DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_
CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO
-DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='12.00.8804', gccversion='',
gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8,
byteorder=1234
    d_longlong=undef, longlongsize=8,
d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double',
nvsize=8, Off_t='__int64', lseeksi
ze=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug
-opt:ref,icf  -libpath:"C:
\mod_perl\Perl\lib\CORE"  -machine:x86'
    libpth=\lib
    libs=  oldnames.lib kernel32.lib user32.lib
gdi32.lib winspool.lib  comdlg32
.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib 
netapi32.lib uuid.lib ws2_
32.lib mpr.lib winmm.lib  version.lib odbc32.lib
odbccp32.lib msvcrt.lib
    perllibs=  oldnames.lib kernel32.lib user32.lib
gdi32.lib winspool.lib  comd
lg32.lib advapi32.lib shell32.lib ole32.lib
oleaut32.lib  netapi32.lib uuid.lib
ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib
odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes,
libperl=perl58.lib
    gnulibc_version='undef'
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef,
ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo
-nodefaultlib -debug -opt:ref,icf  -
libpath:"C:\mod_perl\Perl\lib\CORE"  -machine:x86'


Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY USE_ITHREADS
USE_LARGE_FILES
                        USE_SITECUSTOMIZE
PERL_IMPLICIT_CONTEXT
                        PERL_IMPLICIT_SYS
  Locally applied patches:
        ActivePerl Build 815 [211909]
        Iin_load_module moved for compatibility with
build 806
        PerlEx support in CGI::Carp
        Less verbose ExtUtils::Install and Pod::Find
        instmodsh upgraded from
ExtUtils-MakeMaker-6.25
        Patch for CAN-2005-0448 from Debian with
modifications
        Upgrade to Time-HiRes-1.76
        25774 Keys of %INC always use forward slashes
        25747 Accidental interpolation of $@ in
Pod::Html
        25362 File::Path::mkpath resets errno
        25181 Incorrect (X)HTML generated by Pod::Html
        24999 Avoid redefinition warning for MinGW
        24699 ICMP_UNREACHABLE handling in Net::Ping
        21540 Fix backward-compatibility issues in
if.pm
  Built under MSWin32
  Compiled at Nov  2 2005 08:44:52
  %ENV:
    PERL5LIB="c:\bioperl-live"
  @INC:
    c:\bioperl-live
    C:/mod_perl/Perl/lib
    C:/mod_perl/Perl/site/lib
    .





From michael.watson at bbsrc.ac.uk  Thu Feb 23 05:17:39 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 23 Feb 2006 10:17:39 -0000
Subject: [Bioperl-l] CONTIG sequence files from the NCBI
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503008306@iahce2ksrv1.iah.bbsrc.ac.uk>

What I mean is, you have accession1, which is a contig file referring to
n other sequence files.  Accession1 has a version number.  Is that
version number increased when one of the sequences that constitute it is
updated? 

-----Original Message-----
From: Brian Osborne [mailto:osborne1 at optonline.net] 
Sent: 18 February 2006 04:56
To: michael watson (IAH-C); bioperl-l
Subject: Re: [Bioperl-l] CONTIG sequence files from the NCBI

Michael,

Yes, BioPerl has done this for you. Essentially what it does is take all
the ids in the CONTIG section and query for each individually, then use
the sequences and the location data to create the single large sequence.
This sequence is appended to the annotation and feature section of the
initial Genbank entry. If you want to study this yourself take a look at
Bio::DB::NCBIHelper::postprocess_data.
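
The joining step described above can be illustrated with a toy example. This is a sketch of the idea only, not the code in Bio::DB::NCBIHelper: the accessions and sequences are invented, the %fetched hash stands in for the individual queries, the join() string is a simplified form of a CONTIG line, and filling gaps with N is an assumption.

```perl
use strict;
use warnings;

# Stand-in for the per-accession queries: accession.version => sequence.
# These accessions and sequences are hypothetical.
my %fetched = (
    'AB000001.1' => 'AAAACCCC',
    'AB000002.1' => 'GGGGTTTT',
);

# Reverse-complement a DNA string.
sub revcom {
    my ($seq) = @_;
    $seq = reverse $seq;
    $seq =~ tr/ACGTacgt/TGCAtgca/;
    return $seq;
}

# Build one sequence from a CONTIG-style join(...) string, using the
# location data (start..end, optional complement) of each piece.
sub assemble_contig {
    my ($contig, $seqs) = @_;
    my ($inner) = $contig =~ /^join\((.*)\)$/
        or die "not a join(): $contig";
    my $assembled = '';
    for my $part (split /,/, $inner) {
        if ($part =~ /^gap\((\d+)\)$/) {
            $assembled .= 'N' x $1;    # gap of known length, filled with N
        }
        elsif ($part =~ /^(complement\()?([\w.]+):(\d+)\.\.(\d+)\)?$/) {
            my ($comp, $id, $start, $end) = ($1, $2, $3, $4);
            my $seq = $seqs->{$id};
            defined $seq or die "no sequence fetched for $id";
            my $piece = substr $seq, $start - 1, $end - $start + 1;
            $assembled .= $comp ? revcom($piece) : $piece;
        }
        else {
            die "cannot parse '$part'";
        }
    }
    return $assembled;
}

print assemble_contig(
    'join(AB000001.1:1..8,gap(3),complement(AB000002.1:5..8))',
    \%fetched
), "\n";
# prints AAAACCCCNNNAAAA
```

The features of the original entry keep their coordinates because they were already expressed relative to the full-length sequence that the CONTIG line describes.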

OK, to answer your first question with my assumption: what NCBI is doing
is simply providing a shorthand rather than an entire large sequence,
therefore no feature coordinates change, whether it's shorthand, CONTIG,
or longhand, ORIGIN. Second, my explanation tells you that all the
sequences are the very latest versions of each sequence, that's how
eutils works by default.
However, I don't think I've answered your question because I'm not sure
I understand what you mean by "when I ask bioperl if these sequences
have been updated, I will be told no". All Bioperl does is read the file
provided by GenBank and use its stated version, nothing fancy.

Brian O.


On 2/16/06 5:31 AM, "michael watson (IAH-C)" wrote:

> Hi
> 
> I have two questions really.  I fetched bacterial genome sequences 
> from the NCBI using Bio::DB::GenBank.
> 
> Some of these sequence entries are CONTIG sequences, ie they just 
> point to other sequences that need to be joined together to form the 
> entire genome.
> 
> Looking at my downloads, it looks as if bioperl has done all the 
> necessary joining for me - or maybe it was the NCBI that did the 
> joining?
> 
> OK, so firstly, did bioperl do the joining, and if so, are all the 
> co-ordinates of the features updated to reflect their new location on 
> the new, joined sequence?
> 
> And secondly, sequence versions... I'm thinking that possibly the 
> sequence version of the CONTIG may be 1 (as it hasn't changed) yet the
> versions of the sequences it refers to might have changed, so when I
> ask bioperl if these sequences have been updated, I will be told no 
> because the CONTIG sequence version is 1, but I should be told yes 
> because the underlying sequences have...?
> 
> Make sense?
> 
> Thanks
> Mick
> 





From neetisomaiya at gmail.com  Thu Feb 23 05:26:23 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 23 Feb 2006 15:56:23 +0530
Subject: [Bioperl-l] urgent help required - syntax for using paramaters
	different from default in standalone blast
In-Reply-To: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>
References: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>
Message-ID: <764978cf0602230226vb907821x5407599bf9accf44@mail.gmail.com>

Hi,

I am running standalone blast and I wanna use a particular e value, gap open
and extension cost and matrix. Is the following the correct syntax for the
same :

    my $Seq_in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
    my $query = $Seq_in->next_seq();
    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        'program'   => 'blastn',
        'database'  => 'human.rna.fna',
        _READMETHOD => "Blast"
    );
    $factory->e(0.0001);
    $factory->G(-11);
    $factory->E(-1);
    $factory->M('BLOSUM80');

    my $blast_report = $factory->blastall($query);
    my $result = $blast_report->next_result;

--
-Neeti
Even my blood says, B positive



From neetisomaiya at gmail.com  Thu Feb 23 05:45:19 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 23 Feb 2006 16:15:19 +0530
Subject: [Bioperl-l] using parameters other than default in standalone blast
Message-ID: <764978cf0602230245m45747fexbb42074a98515177@mail.gmail.com>

Hi,

I am running standalone blast and I want to use a particular E-value, gap open
and extension costs, and matrix. Is the following the correct syntax for
this?

    my $Seq_in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
    my $query  = $Seq_in->next_seq();
    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        'program'   => 'blastn',
        'database'  => 'human.rna.fna',
        _READMETHOD => "Blast",
    );
    $factory->e(0.0001);
    $factory->G(-11);
    $factory->E(-1);
    $factory->M('BLOSUM80');

    my $blast_report = $factory->blastall($query);
    my $result       = $blast_report->next_result;

--
-Neeti
Even my blood says, B positive



From neetisomaiya at gmail.com  Thu Feb 23 05:14:46 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 23 Feb 2006 15:44:46 +0530
Subject: [Bioperl-l] urgent help required - syntax for using paramaters
	different from default in standalone blast
Message-ID: <764978cf0602230214r4b2a5efcl69ac207789379416@mail.gmail.com>

Hi,

I am running standalone blast and I want to use a particular E-value, gap open
and extension costs, and matrix. Is the following the correct syntax for
this?

    my $Seq_in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
    my $query  = $Seq_in->next_seq();
    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        'program'   => 'blastn',
        'database'  => 'human.rna.fna',
        _READMETHOD => "Blast",
    );
    $factory->e(0.0001);
    $factory->G(-11);
    $factory->E(-1);
    $factory->M('BLOSUM80');

    my $blast_report = $factory->blastall($query);
    my $result       = $blast_report->next_result;

--
-Neeti
Even my blood says, B positive




From neetisomaiya at gmail.com  Thu Feb 23 05:13:10 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 23 Feb 2006 15:43:10 +0530
Subject: [Bioperl-l] urgent help required - syntax for using paramaters
	different from default in standalone blast
Message-ID: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>

Hi,

I am running standalone blast and I want to use a particular E-value, gap open
and extension costs, and matrix. Is the following the correct syntax for
this?

    my $Seq_in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
    my $query  = $Seq_in->next_seq();
    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        'program'   => 'blastn',
        'database'  => 'human.rna.fna',
        _READMETHOD => "Blast",
    );
    $factory->e(0.0001);
    $factory->G(-11);
    $factory->E(-1);
    $factory->M('BLOSUM80');

    my $blast_report = $factory->blastall($query);
    my $result       = $blast_report->next_result;

--
-Neeti
Even my blood says, B positive




From cjfields at uiuc.edu  Thu Feb 23 09:39:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 08:39:40 -0600
Subject: [Bioperl-l] urgent help required - syntax for using
	paramatersdifferent from default in standalone blast
In-Reply-To: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>
Message-ID: <000301c63886$fa95eb20$15327e82@pyrimidine>

Have you tried this to see if it works?  The blast report itself should tell
you if everything is set correctly.  Use 'perldoc
Bio::Tools::Run::StandAloneBlast', which explains everything.  I don't
know if the example script works but the test script StandAloneBlast.t (in
/t) should; that will give you plenty of examples for setting parameters.
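
For illustration, here is a minimal sketch of setting parameters through the
factory accessors.  The helper name set_blast_params and the values are made
up for the example; they are not from the module docs or the test script.  As
far as I know blastall takes gap costs as positive integers, and -M names a
protein matrix, so it is left out of this blastn example.  Treat all of that
as assumptions to check against your installed blastall.

```perl
use strict;
use warnings;

# Hypothetical helper: call the accessor named after each blastall flag,
# e.g. $factory->e(0.0001) for the expectation value cutoff.
sub set_blast_params {
    my ($factory, %params) = @_;
    for my $flag (sort keys %params) {
        $factory->$flag($params{$flag});
    }
    return $factory;
}

# Illustrative values only; check what your blastall version accepts.
my %params = (
    e => 1e-4,    # expectation value cutoff
    G => 5,       # gap open cost
    E => 2,       # gap extension cost
);

# Not called automatically: it needs BioPerl, blastall and a formatted
# database to be installed.
sub run_blast {
    my ($fasta_file, $database) = @_;
    require Bio::SeqIO;
    require Bio::Tools::Run::StandAloneBlast;

    my $query = Bio::SeqIO->new(-file => $fasta_file, -format => 'fasta')->next_seq;
    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        program  => 'blastn',
        database => $database,
    );
    set_blast_params($factory, %params);
    return $factory->blastall($query);    # report object to iterate over
}
```

Calling run_blast($file, 'human.rna.fna') and then reading the parameter
summary of the resulting report is one way to confirm the values were
actually passed through.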

And please, don't spam the bioperl-l list with repeated emails (four at last
count over 2 1/2 hours).
 
Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of neeti somaiya
> Sent: Thursday, February 23, 2006 4:13 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] urgent help required - syntax for using
> paramatersdifferent from default in standalone blast
> 
> Hi,
> 
> I am running standalone blast and I wanna use a particular e value, gap
> open
> and extension cost and matrix. Is the following the correct syntax for the
> same :
> 
>                                 my $Seq_in = Bio::SeqIO->new (-file =>
> $file, -format => 'fasta');
>                                 my $query = $Seq_in->next_seq();
>                                 my $factory =
> Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastn',
>                                                  'database' => '
> human.rna.fna',
>                                                  _READMETHOD => "Blast"
>                                                  );
>                                 $factory->e(0.0001);
>                                 $factory->G(-11);
>                                 $factory->E(-1);
>                                 $factory->M('BLOSUM80');
> 
>                                 my $blast_report =
> $factory->blastall($query);
>                                 my $result = $blast_report->next_result;
> 
> --
> -Neeti
> Even my blood says, B positive
> 



From cjfields at uiuc.edu  Thu Feb 23 10:23:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 09:23:53 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060223022502.31234.qmail@web34406.mail.mud.yahoo.com>
Message-ID: <000a01c6388d$281ed010$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Raghunath Verabelli
> Sent: Wednesday, February 22, 2006 8:25 PM
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> 
> 
> Thanks very much Chris for your time.
> Please see below output that you requested (the only
> difference i saw between your output and mine is @INC
> value. I have only 2 directories c:\mod_perl\perl
> where i installed activeperl. I see two additional
> directories in your @INC path).
> 
> >
> > When you type 'perl -V' what do you see (make sure
> > it is a capital 'V', not
> > lower case).
> 
> C:\Documents and Settings\Administrator>perl  -V
> Summary of my perl5 (revision 5 version 8 subversion
> 7) configuration:
>   Platform:
>     osname=MSWin32, osvers=5.0,
> archname=MSWin32-x86-multi-thread

[....]

> if.pm
>   Built under MSWin32
>   Compiled at Nov  2 2005 08:44:52
>   %ENV:
>     PERL5LIB="c:\bioperl-live"
>   @INC:
>     c:\bioperl-live
>     C:/mod_perl/Perl/lib
>     C:/mod_perl/Perl/site/lib
>     .

Personally I wouldn't place the bioperl-live folder in the root
directory; this shouldn't make a difference, but you can try moving it to
the perl directory in a separate folder to see if that helps.  Can't see why
it would make a difference, but it is Windows... Main reason I'm switching
over to Mac OS X!

Make sure that the Bio directory is in the bioperl-live directory,
regardless (i.e. if PERL5LIB is set to
C:\mod_perl\Perl\bioperl\bioperl-live, then there should be a directory like
C:\mod_perl\Perl\bioperl\bioperl-live\Bio).  Otherwise it won't work.

What do you get with this?

perl -MBio::Root::Version -e "print $Bio::Root::Version::VERSION"

If everything is working (PERL5LIB, etc) then it should be 1.5 for CVS
bioperl; otherwise it will either find the old version (1.2.3) or fail
completely.
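
To see which copy of a module Perl would actually pick up (the old install
versus the CVS checkout), one option is to walk @INC by hand.  This is only a
sketch; find_module_path is a made-up helper name, not part of BioPerl.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first file under @INC that would satisfy "use $module",
# or undef if no directory in @INC contains it.
sub find_module_path {
    my ($module) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks in @INC
        my $candidate = File::Spec->catfile($dir, @parts);
        return $candidate if -f $candidate;
    }
    return;
}

# Report where each module resolves, in @INC order.
for my $module (qw(Bio::Root::Version Bio::SearchIO)) {
    my $path = find_module_path($module);
    printf "%-20s %s\n", $module, defined $path ? $path : 'not found in @INC';
}
```

If the path printed for Bio::Root::Version is not under the directory named
in PERL5LIB, an older install earlier in @INC is shadowing the checkout.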

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From iamvela at yahoo.com  Thu Feb 23 11:23:56 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Thu, 23 Feb 2006 08:23:56 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000a01c6388d$281ed010$15327e82@pyrimidine>
Message-ID: <20060223162356.97964.qmail@web34405.mail.mud.yahoo.com>

Thanks Chris for all your help.

The patch for blast.pm worked. I was able to parse the
hits from the raw file. I uninstalled previous
versions of bioperl using ppm and then I installed
bioperl 1.4.x using nmake, and applied your fix. I am
getting hits the way I wanted.

However, I noticed that the p-value for each hit
doesn't seem to be parsed
correctly. It sets it to 0 for all hits. Not sure if
this is a known issue. Any suggestions/comments,
please let me know.

Thanks,
Raghu

--- Chris Fields  wrote:

> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Raghunath
> Verabelli
> > Sent: Wednesday, February 22, 2006 8:25 PM
> > To: Chris Fields; bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Blast returns result, but
> does not return hits
> > 
> > 
> > Thanks very much Chris for your time.
> > Please see below output that you requested (the
> only
> > difference i saw between your output and mine is
> @INC
> > value. I have only 2 directories c:\mod_perl\perl
> > where i installed activeperl. I see two additional
> > directories in your @INC path).
> > 
> > >
> > > When you type 'perl -V' what do you see (make
> sure
> > > it is a capital 'V', not
> > > lower case).
> > 
> > C:\Documents and Settings\Administrator>perl  -V
> > Summary of my perl5 (revision 5 version 8
> subversion
> > 7) configuration:
> >   Platform:
> >     osname=MSWin32, osvers=5.0,
> > archname=MSWin32-x86-multi-thread
> 
> [....]
> 
> > if.pm
> >   Built under MSWin32
> >   Compiled at Nov  2 2005 08:44:52
> >   %ENV:
> >     PERL5LIB="c:\bioperl-live"
> >   @INC:
> >     c:\bioperl-live
> >     C:/mod_perl/Perl/lib
> >     C:/mod_perl/Perl/site/lib
> >     .
> 
> Personally I wouldn't place the the bioperl-live
> folder in the root
> directory; this shouldn't make a difference, but you
> can try moving it to
> the perl directory in a separate folder to see if
> that helps.  Can't see why
> it would make a difference, but it is Windows...
> Main reason I'll switching
> over to Mac OS X!
> 
> Make sure that the Bio directory is in the
> bioperl-live directory,
> regardless (i.e. if PERL5LIB is set to
> C:\mod_perl\Perl\bioperl\bioperl-live, then there
> should be a directory like
> C:\Perl\bioperl\bioperl-live\Bio).  Otherwise it
> won't work.
> 
> What do you get with this?
> 
> perl -MBio::Root::Version -e "print
> $Bio::Root::Version::VERSION"
> 
> If everything is working (PERL5LIB, etc) then it
> should be 1.5 for CVS
> bioperl; otherwise it will either find the old
> version (1.2.3) or fail
> completely.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> 
> 




From cjfields at uiuc.edu  Thu Feb 23 12:41:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 11:41:07 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060223162356.97964.qmail@web34405.mail.mud.yahoo.com>
Message-ID: <000301c638a0$53eb9a30$15327e82@pyrimidine>

Yes that's a potential issue.  I'll try to replicate that here; please send
a code example so I can see how you're calling for the p-value.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Raghunath Verabelli
> Sent: Thursday, February 23, 2006 10:24 AM
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> 
> Thanks Chris for all your help.
> 
> The patch for blast.pm worked. I was able to parse the
> hits from the raw file. I uninstalled previous
> versions of bioperl using ppm and then I installed
> bioperl 1.4.x using nmake, and applied your fix. I am
> getting hits the way I wanted.
> 
> However, I noticed that the p-value for each hit
> doesn't seem to be parsed
> correctly. It sets it to 0 for all hits. Not sure if
> this is a known issue. Any suggestions/comments,
> please let me know.
> 
> Thanks,
> Raghu
> 



From cjfields at uiuc.edu  Thu Feb 23 13:06:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 12:06:37 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000301c638a0$53eb9a30$15327e82@pyrimidine>
Message-ID: <000401c638a3$e37fb520$15327e82@pyrimidine>

Hold up a second.  Do you mean e-value, or p-value?  A run-of-the-mill NCBI
blast report these days gives e-values (expectation value), NOT p-values.  I
think they changed over to using only e-values with BLAST v2.  Make sure you
didn't mix these up; look at the text output to make sure that P values are
present.  That would explain why you're getting 0, since they don't exist.

Quoting the BLAST tutorial:

The BLAST programs report E-value rather than P-values because it is easier
to understand the difference between, for example, E-value of 5 and 10 than
P-values of 0.993 and 0.99995. However, when E < 0.01, P-values and E-value
are nearly identical.
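
The numbers in that quote follow from the usual relation between the two
statistics, assuming the count of chance hits scoring at least as well is
Poisson-distributed with mean E (the standard BLAST model, stated here as an
assumption rather than something established in this thread):

```latex
P = 1 - e^{-E}
\qquad
E = 5 \;\Rightarrow\; P = 1 - e^{-5} \approx 0.993,
\qquad
E = 10 \;\Rightarrow\; P = 1 - e^{-10} \approx 0.99995

% Small-E behaviour: the two values converge.
1 - e^{-E} \approx E - \tfrac{E^{2}}{2}
\qquad
E = 0.01 \;\Rightarrow\; P \approx 0.00995
```

So at E = 0.01 the two differ by only about 0.5%, which is why they are
described as nearly identical below that point.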

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Thursday, February 23, 2006 11:41 AM
> To: 'Raghunath Verabelli'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> 
> Yes that's a potential issue.  I'll try to replicate that here; please
> send
> a code example so I can see how you're calling for the p-value.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 



From jason.stajich at duke.edu  Thu Feb 23 13:29:57 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 23 Feb 2006 13:29:57 -0500
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000401c638a3$e37fb520$15327e82@pyrimidine>
References: <000401c638a3$e37fb520$15327e82@pyrimidine>
Message-ID: <698C1C9B-BB22-44C5-A277-4CE9930C1BCD@duke.edu>

P-values do show up in WU-BLAST reports, which is why we have a p-value
function.


On Feb 23, 2006, at 1:06 PM, Chris Fields wrote:

> Hold up a second.  Do you mean e-value, or p-value?  A run-of-the-mill NCBI
> blast report these days gives e-values (expectation value), NOT p-values.  I
> think they changed over to using only e-values with BLAST v2.  Make sure you
> didn't mix these up; look at the text output to make sure that P values are
> present.  That would explain why you're getting 0, since they don't exist.
>
> From the BLAST tutorial:
>
> The BLAST programs report E-value rather than P-values because it is easier
> to understand the difference between, for example, E-value of 5 and 10 than
> P-values of 0.993 and 0.99995. However, when E < 0.01, P-values and E-value
> are nearly identical.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Thu Feb 23 13:34:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 12:34:19 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <698C1C9B-BB22-44C5-A277-4CE9930C1BCD@duke.edu>
Message-ID: <000501c638a7$c2802630$15327e82@pyrimidine>

I think Raghu's running NCBI BLAST, though.  Am I right? 

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu]
> Sent: Thursday, February 23, 2006 12:30 PM
> To: Chris Fields
> Cc: 'Raghunath Verabelli'; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> 
> p-values do show up in WU-BLAST reports so that is why we have a p-
> value function.
> 
> 
> On Feb 23, 2006, at 1:06 PM, Chris Fields wrote:
> 
> > Hold up a second.  Do you mean e-value, or p-value?  A run-of-the-
> > mill NCBI
> > blast report these days gives e-values (expectation value), NOT p-
> > values.  I
> > think they changed over to using only e-values with BLAST v2.  Make
> > sure you
> > didn't mix these up; look out the text output to make sure that P
> > values are
> > present.  That would explain why you're getting 0, since they don't
> > exist.
> >
> >> From the BLAST tutorial:
> >
> > The BLAST programs report E-value rather than P-values because it
> > is easier
> > to understand the difference between, for example, E-value of 5 and
> > 10 than
> > P-values of 0.993 and 0.99995. However, when E < 0.01, P-values and
> > E-value
> > are nearly identical.
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >> Sent: Thursday, February 23, 2006 11:41 AM
> >> To: 'Raghunath Verabelli'; bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] Blast returns result, but does not return
> >> hits
> >>
> >> Yes that's a potential issue.  I'll try to replicate that here;
> >> please
> >> send
> >> a code example so I can see how you're calling for the p-value.
> >>
> >> Christopher Fields
> >> Postdoctoral Researcher - Switzer Lab
> >> Dept. of Biochemistry
> >> University of Illinois Urbana-Champaign
> >>
> >>> -----Original Message-----
> >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>> bounces at lists.open-bio.org] On Behalf Of Raghunath Verabelli
> >>> Sent: Thursday, February 23, 2006 10:24 AM
> >>> To: Chris Fields; bioperl-l at lists.open-bio.org
> >>> Subject: Re: [Bioperl-l] Blast returns result, but does not
> >>> return hits
> >>>
> >>> Thanks Chris for all your help.
> >>>
> >>> The patch for blast.pm worked. I was able to parse the
> >>> hits from the raw file. I uninstalled previous
> >>> versions of bioperl using ppm and then I installed
> >>> bioperl 1.4.x using nmake, and applied your fix. I am
> >>> getting hits the way I wanted.
> >>>
> >>> However, I noticed that the p-value for each hit
> >>> doesn't seem to be parsed
> >>> correctly. It sets it to 0 for all hits. Not sure if
> >>> this is a known issue. Any suggestions/comments,
> >>> please let me know.
> >>>
> >>> Thanks,
> >>> Raghu
> >>>
> >>> --- Chris Fields  wrote:
> >>>
> >>>>> -----Original Message-----
> >>>>> From: bioperl-l-bounces at lists.open-bio.org
> >>>> [mailto:bioperl-l-
> >>>>> bounces at lists.open-bio.org] On Behalf Of Raghunath
> >>>> Verabelli
> >>>>> Sent: Wednesday, February 22, 2006 8:25 PM
> >>>>> To: Chris Fields; bioperl-l at lists.open-bio.org
> >>>>> Subject: Re: [Bioperl-l] Blast returns result, but
> >>>> does not return hits
> >>>>>
> >>>>>
> >>>>> Thanks very much Chris for your time.
> >>>>> Please see below output that you requested (the
> >>>> only
> >>>>> difference i saw between your output and mine is
> >>>> @INC
> >>>>> value. I have only 2 directories c:\mod_perl\perl
> >>>>> where i installed activeperl. I see two additional
> >>>>> directories in your @INC path).
> >>>>>
> >>>>>>
> >>>>>> When you type 'perl -V' what do you see (make
> >>>> sure
> >>>>>> it is a capital 'V', not
> >>>>>> lower case).
> >>>>>
> >>>>> C:\Documents and Settings\Administrator>perl  -V
> >>>>> Summary of my perl5 (revision 5 version 8
> >>>> subversion
> >>>>> 7) configuration:
> >>>>>   Platform:
> >>>>>     osname=MSWin32, osvers=5.0,
> >>>>> archname=MSWin32-x86-multi-thread
> >>>>
> >>>> [....]
> >>>>
> >>>>> if.pm
> >>>>>   Built under MSWin32
> >>>>>   Compiled at Nov  2 2005 08:44:52
> >>>>>   %ENV:
> >>>>>     PERL5LIB="c:\bioperl-live"
> >>>>>   @INC:
> >>>>>     c:\bioperl-live
> >>>>>     C:/mod_perl/Perl/lib
> >>>>>     C:/mod_perl/Perl/site/lib
> >>>>>     .
> >>>>
> >>>> Personally I wouldn't place the the bioperl-live
> >>>> folder in the root
> >>>> directory; this shouldn't make a difference, but you
> >>>> can try moving it to
> >>>> the perl directory in a separate folder to see if
> >>>> that helps.  Can't see why
> >>>> it would make a difference, but it is Windows...
> >>>> Main reason I'll switching
> >>>> over to Mac OS X!
> >>>>
> >>>> Make sure that the Bio directory is in the
> >>>> bioperl-live directory,
> >>>> regardless (i.e. if PERL5LIB is set to
> >>>> C:\mod_perl\Perl\bioperl\bioperl-live, then there
> >>>> should be a directory like
> >>>> C:\Perl\bioperl\bioperl-live\Bio).  Otherwise it
> >>>> won't work.
> >>>>
> >>>> What do you get with this?
> >>>>
> >>>> perl -MBio::Root::Version -e "print
> >>>> $Bio::Root::Version::VERSION"
> >>>>
> >>>> If everything is working (PERL5LIB, etc) then it
> >>>> should be 1.5 for CVS
> >>>> bioperl; otherwise it will either find the old
> >>>> version (1.2.3) or fail
> >>>> completely.
> >>>>
> >>>> Christopher Fields
> >>>> Postdoctoral Researcher - Switzer Lab
> >>>> Dept. of Biochemistry
> >>>> University of Illinois Urbana-Champaign
> >>>>
> >>>>
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12




From iamvela at yahoo.com  Thu Feb 23 14:33:50 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Thu, 23 Feb 2006 11:33:50 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000501c638a7$c2802630$15327e82@pyrimidine>
Message-ID: <20060223193350.1917.qmail@web34410.mail.mud.yahoo.com>

Chris, you are right. I am using NCBI BLAST.

Here is my http query:

my $urltext =
"http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=$seq&DATABASE=nr&PROGRAM=blastp";

This is my code for populating p-value:

my $pValue = $bioPerlHit->significance;


I looked at the text output, could not find any p
value column, the only 'value' column in the output is
'E value'. I will try that.

Thanks,
Raghu
 
--- Chris Fields  wrote:

> I think Raghu's running NCBI BLAST, though.  Am I
> right? 
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> 


From cjfields at uiuc.edu  Thu Feb 23 16:11:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 15:11:38 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060223193350.1917.qmail@web34410.mail.mud.yahoo.com>
Message-ID: <000301c638bd$bc9eb590$15327e82@pyrimidine>

I think you want $hit->expect (for hits) or $hsp->evalue (for HSPs).
$hit->significance (for NCBI blast) gives the values from the descriptions
(the score and expect) for each hit.
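
For reference, the E-value/P-value relation behind the BLAST tutorial numbers can be sketched in a few lines of plain Perl. This is only an illustration and assumes the usual relation P = 1 - exp(-E); the helper name p_from_e is made up here and is not a BioPerl method.

```perl
use strict;
use warnings;

# Convert a BLAST E-value into the corresponding P-value,
# assuming the usual relation P = 1 - exp(-E).
# p_from_e is a hypothetical helper, not part of BioPerl.
sub p_from_e {
    my ($e) = @_;
    return 1 - exp(-$e);
}

# Large E-values all map to P close to 1; small E-values give P ~ E.
for my $e (10, 5, 0.01, 0.001) {
    printf "E = %-6g  P = %.6g\n", $e, p_from_e($e);
}
```

This also shows why E and P are nearly interchangeable once E drops below about 0.01.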

If you want to see what methods are available for any given object (in this
case Bio::Search::Hit::BlastHit or Bio::Search::HSP::BlastHSP), use the
script below from the bioperl FAQ (use PPM to install Class::Inspector
first) and pass the object's module name on the command line.  Be careful, as
many of these are get/sets (so don't pass any args).
----------------------------------
#!perl
use Class::Inspector;
$class = shift || die "Usage: methods perl_class_name\n";
eval "require $class";
print join ("\n", sort @{Class::Inspector->methods($class,'full','public')}),
"\n";
----------------------------------
You should get something like this:

C:\Perl\Scripts>methods.pl Bio::Search::Hit::BlastHit
Bio::Root::Root::DESTROY
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented
Bio::Search::Hit::BlastHit::expect
Bio::Search::Hit::BlastHit::found_again
Bio::Search::Hit::BlastHit::iteration
Bio::Search::Hit::BlastHit::new
Bio::Search::Hit::GenericHit::accession
Bio::Search::Hit::GenericHit::add_hsp
Bio::Search::Hit::GenericHit::algorithm
Bio::Search::Hit::GenericHit::ambiguous_aln
Bio::Search::Hit::GenericHit::bits
Bio::Search::Hit::GenericHit::description
Bio::Search::Hit::GenericHit::each_accession_number
Bio::Search::Hit::GenericHit::end
Bio::Search::Hit::GenericHit::frac_aligned_hit
Bio::Search::Hit::GenericHit::frac_aligned_query
Bio::Search::Hit::GenericHit::frac_conserved
Bio::Search::Hit::GenericHit::frac_identical
Bio::Search::Hit::GenericHit::frame
Bio::Search::Hit::GenericHit::gaps
Bio::Search::Hit::GenericHit::hsp
Bio::Search::Hit::GenericHit::hsps
Bio::Search::Hit::GenericHit::length
Bio::Search::Hit::GenericHit::length_aln
Bio::Search::Hit::GenericHit::locus
Bio::Search::Hit::GenericHit::logical_length
Bio::Search::Hit::GenericHit::matches
Bio::Search::Hit::GenericHit::n
Bio::Search::Hit::GenericHit::name
Bio::Search::Hit::GenericHit::next_hsp
Bio::Search::Hit::GenericHit::num_hsps
Bio::Search::Hit::GenericHit::num_unaligned_hit
Bio::Search::Hit::GenericHit::num_unaligned_query
Bio::Search::Hit::GenericHit::num_unaligned_sbjct
Bio::Search::Hit::GenericHit::overlap
Bio::Search::Hit::GenericHit::p
Bio::Search::Hit::GenericHit::query_length
Bio::Search::Hit::GenericHit::range
Bio::Search::Hit::GenericHit::rank
Bio::Search::Hit::GenericHit::raw_score
Bio::Search::Hit::GenericHit::rewind
Bio::Search::Hit::GenericHit::score
Bio::Search::Hit::GenericHit::seq_inds
Bio::Search::Hit::GenericHit::significance
Bio::Search::Hit::GenericHit::start
Bio::Search::Hit::GenericHit::strand
Bio::Search::Hit::GenericHit::tiled_hsps
Bio::Search::Hit::HitI::hit_description
Bio::Search::Hit::HitI::hit_length

Nice, huh?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From cain at cshl.edu  Wed Feb 22 09:36:54 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 22 Feb 2006 09:36:54 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
Message-ID: <1140619014.3142.81.camel@localhost.localdomain>

Hi Dave,

I don't know if this helps at all, but you could think of that 45th tick
mark as the termination, since the space between the 44th and the 45th
tick mark corresponds to your 44th residue.  I suppose it is a matter of
correctly training your users :-)
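
The counting can be sketched in plain Perl using the P12239 sequence from Dave's mail. This is only an illustration of 1-based versus interbase coordinates; to_interbase and subseq_1based are made-up helper names, not Bio::Graphics calls.

```perl
use strict;
use warnings;

# The 44-residue SwissProt entry P12239 discussed in this thread.
my $seq = 'TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR';

# Convert a 1-based inclusive range to interbase (0-based, half-open) form.
# Hypothetical helper for illustration only.
sub to_interbase {
    my ($start, $end) = @_;
    return ($start - 1, $end);
}

# Extract residues given a 1-based inclusive range.
sub subseq_1based {
    my ($s, $start, $end) = @_;
    my ($left, $right) = to_interbase($start, $end);
    return substr($s, $left, $right - $left);
}

my $n_residues = length $seq;        # 44 intervals ...
my $n_ticks    = $n_residues + 1;    # ... are bounded by 45 tick marks

print "residues: $n_residues, ticks: $n_ticks\n";
print "27..42 = ", subseq_1based($seq, 27, 42), "\n";
```

In interbase terms the 27..42 track runs from boundary 26 to boundary 42, so its length is simply end minus start.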

Scott


On Wed, 2006-02-22 at 10:20 +0000, Dave Howorth wrote:
> Lincoln Stein wrote:
> > Hi Dave,
> > 
> > Well, when you are using 1-based coordinates, an line that contains 44 
> > intervals will have 45 ticks. If you move to 0-based coordinates, then the 
> > first tick will be labeled 0 and the last tick will be labeled 44. An 
> > alternative is to make each base dimensionless, but that becomes a problem 
> > when dealing with single base features, such as SNPs.
>  >
> > These issues are why I have long advocated for interbase coordinates
> > in which you number the positions between bases rather than the bases
> > themselves.
> 
> I see your point but I need to work with the coordinates that the users 
> expect and are familiar with. (Things get much worse with PDB residue 
> numbering :)
> 
> > Draw me the picture of what you expect to see. I think of it this way:
> > 
> > 	1    2  3  4   5   6
> >          A>G>C>T>A>
> 
> I guess something went wrong with your ASCII art :(
> 
> OK, consider a 44-residue entry from SwissProt (P12239):
> 
>    TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR
> 
> The first T is numbered 1 and the last R is numbered 44.
> 
> So I expect to see a line with 44 positions indicated somehow (whether 
> these are half-open intervals or points on the line), with the number 1 
> at the left end and the number 44 at the right end.
> 
> An important point is that if I then place other tracks below this one 
> that start at say 27 and go to say 42, representing VPTIFFLGAIAAMQFI, 
> they should align properly (according to whatever convention is used to 
> represent a residue).
> 
> For a short sequence like this it would be possible to use letters to 
> represent the residue but I'd like to use the same convention for longer 
> sequences as well and have everything be consistent.
> 
> I'm hoping Bio:Graphics will make this easy.
> 
> Thanks, Dave
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From hlapp at gnf.org  Thu Feb 23 21:10:13 2006
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu, 23 Feb 2006 18:10:13 -0800
Subject: [Bioperl-l] [BioSQL-l] Load seqfeature from biosql database
	with perl
In-Reply-To: <1140744561.2888.19.camel@alien>
Message-ID: 

Yes, kudos to you for figuring this out yourself, and you actually figured
out the more difficult way. I apologize for my delay in responding, I was
tied up this morning and last night.

You got the first key step right, namely obtaining the right persistence
adaptor. This step determines which object you get back.

Your query will work, and in fact will be equally fast as the simple
solution (which is simple only because it is simpler to code, not because
the internally executed query is simpler). The simple solution is that every
Bio::DB::PersistenceAdaptorI implementing object (i.e., any object you get
back from $db->get_object_adaptor(..)) has a method
$adp->find_by_primary_key(). So, using that method:

    $feature = $adaptor->find_by_primary_key($seqfeature_id);

You can also control the type of object to be created (so long as it is a
Bio::SeqFeatureI) by passing in an object factory in addition.

BTW, as an aside, the finder method will also consult the object cache first
if the cache is enabled. That doesn't matter for seq features, though: because
of the potentially large number of objects, the cache is not enabled by
default for this adaptor.

    -hilmar  

On 2/23/06 5:29 PM, "Michael Cipriano"  wrote:

> Ah, I think I figured it out.
> 
> my $seqfeature_id = '401138';
> my $adaptor = $seqdb_obj->get_object_adaptor("Bio::SeqFeatureI");
> 
> my $query = Bio::DB::Query::BioQuery->new(
>         -datacollections => ["Bio::SeqFeatureI t1"],
>         -where           => ["t1.Bio::SeqFeatureI = ?"]);
> 
> my $qres = $adaptor->find_by_query($query,
>         -name   => 'FIND FEATURE BY SEQ',
>         -values => [$seqfeature_id]);
> 
> while(my $loc = $qres->next_object())
> {
>         my $obj = $loc;
> 
>         print $obj->primary_key() . "\n";
>         print 'location:' . $obj->location->to_FTstring() . "\n";
>         $obj->add_tag_value("test", "moretest");
>         foreach my $tag ($obj->get_all_tags())
>         {
>                 print " Values for tag $tag: ";
>                 print join(' ',$obj->get_tag_values($tag));
>                 print "\n";
>         }
>         print "------------------\n";
> 
> }
> 
> 
> 
> This seems to work
> On Wed, 2006-02-22 at 16:58 -0800, Michael Cipriano wrote:
>> Hello BioSQLers,
>> 
>> I have a simple question (I hope), Can I easily load a seqfeature from a
>> biosql database into a perl Bio::SeqFeatureI object?  I have the
>> database value for the  seqfeature.seqfeature_id and would like to load
>> it using this alone.
>> 
>> I do not want to have to load the whole bioentry object then search for
>> the feature, I just want the feature object since the bioentry is a
>> whole genome and loading that will take more time then necessary.
>> 
>> I have searched the documentation and have even tried looking through
>> the code for the modules, but could not find an easy fast method.
>> 
>> Please reply directly to me as well as the list as I am not a list
>> member.
>> 
>> Thanks for your help,
>> 
>> 
>> Michael Cipriano
>> 
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
> 

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------




From praveecbt at yahoo.co.in  Fri Feb 24 00:57:22 2006
From: praveecbt at yahoo.co.in (Praveen Raj)
Date: Fri, 24 Feb 2006 05:57:22 +0000 (GMT)
Subject: [Bioperl-l] Problem in BioPerl. Help!
Message-ID: <20060224055722.13351.qmail@web8701.mail.in.yahoo.com>

Dear sir,
   
           I have a problem using the BioPerl module 'Clustalw.pm'.
Clustalw creates a SimpleAlign object as output, doesn't it?
  I can successfully convert the object into 'clustal' and 'phylip' format using a
  file handle.
Sir, I want to produce Newick format (for a phylogenetic tree) from the object itself.
I know that standalone ClustalW creates a Newick file (.dnd extension) as output along with
the .aln file.
When I created 'clustal' format output and printed it to a web page, it looked like this:
   
  CLUSTAL W (1.83) Multiple Sequence Alignments
Sequence format is Pearson
Sequence 1: >gi|dengue2|           13 aa
Sequence 2: >gi|yellowfever|       13 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score:  15
Guide tree        file created:   [\tXGgJDIuZZ\jmIerlkHz7.dnd]
Start of Multiple Alignment
There are 1 groups
...............
   
  I don't know where the .dnd file (which is in Newick format) is created.
It's not in the current directory.
Is there any way to specify the path for the .dnd file?
  I have gone through all the documentation provided with BioPerl and ClustalW.
  
How can I create 'newick' output (a .dnd file) from a SimpleAlign object created by Clustalw.pm?
   
  It would be a great help if you could provide a solution.
I can't move forward without one.
  So, please reply...
   
                                    Thanking you,
                                                   Praveen Raj(student).
                                                   National Institute of Virology,   
                                                   Pune. India

				


From roy at colibase.bham.ac.uk  Fri Feb 24 10:51:46 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Fri, 24 Feb 2006 15:51:46 +0000
Subject: [Bioperl-l] Problem in BioPerl. Help!
In-Reply-To: <20060224055722.13351.qmail@web8701.mail.in.yahoo.com>
References: <20060224055722.13351.qmail@web8701.mail.in.yahoo.com>
Message-ID: <43FF2B92.9090801@colibase.bham.ac.uk>

Praveen Raj wrote:
> Sir, I want to make a newick format( for phylogenetic tree ) from the
> object itself. But I know that Standalone Clustalw creates a newick
> file(.dnd extension) as an output along with the .aln file.

Be careful with this. The .dnd files produced by ClustalW contain a 
Newick format guide tree, which is built from pairwise-aligned sequences 
to guide the multiple alignment process. This should not be confused with 
a phylogenetic analysis, and the .dnd file is usually best ignored.

ClustalW can be used to produce a true phylogenetic tree from the 
alignment using the Neighbor-joining method (see the menus and 
documentation for details). This method produces files with a .ph or 
.phb extension (.phb if the tree is bootstrapped). I'm not sure if this 
process can be done using BioPerl, but it is possible to do using 
ClustalW's command line flags, so if you need to automate the process 
you could use Perl's system command. If you want to use BioPerl you can 
use the Phylip program neighbor to generate your tree directly from a 
SimpleAlign object, using the module 
Bio::Tools::Run::Phylo::Phylip::Neighbor.
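
A minimal sketch of the system-command route follows. It assumes the ClustalW binary is called 'clustalw' and is on your PATH, that the alignment has been saved as 'alignment.aln', and that your ClustalW version accepts the -infile, -tree and -bootstrap flags (check `clustalw -help`). The helper name clustalw_tree_command is made up for illustration.

```perl
use strict;
use warnings;

# Build the argument list for a ClustalW neighbor-joining run.
# $bootstrap is optional; when given, a bootstrapped tree (.phb) is
# requested instead of a plain NJ tree (.ph).
# Hypothetical helper; flag names are assumed from the ClustalW help text.
sub clustalw_tree_command {
    my ($aln_file, $bootstrap) = @_;
    my @cmd = ('clustalw', "-infile=$aln_file");
    push @cmd, defined $bootstrap ? "-bootstrap=$bootstrap" : '-tree';
    return @cmd;
}

my @cmd = clustalw_tree_command('alignment.aln');
print "Would run: @cmd\n";

# Uncomment to actually run ClustalW (requires the binary on PATH):
# system(@cmd) == 0 or die "clustalw failed: $?";
```

Passing the command to system as a list avoids shell quoting problems with file names.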

Cheers.
Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk




From perlmails at gmail.com  Sun Feb 26 06:51:37 2006
From: perlmails at gmail.com (perlmails at gmail.com)
Date: Sun, 26 Feb 2006 17:21:37 +0530
Subject: [Bioperl-l] extract ncDNA
Message-ID: <235f7dbe0602260351j7d195d73r2a8801b29e105098@mail.gmail.com>

Dear Bioperl group,

I have been working on extracting non-coding DNA (ncDNA) sequences
from an organism.

I tried extracting the intergenic sequences from the sense-strand
after filtering the features (CDS, gene, mRNA, tRNA, rRNA etc) from
the EMBL feature table entries using the Bioperl and the additional
script (mentioned below).

Now I realise that there is a problem extracting the ncDNA sequences
from the negative strand. Any ideas?

To extract the ncDNAs from negative-strand, I thought of converting
the negative-strand co-ordinates to sense-strand co-ordinates and
adding these to the sense-strand cords. Then filter all the features
(select the ncDNAs after discarding the features from EMBL FT) to get
all the ncDNAs.

Is there anything in the BioPerl toolkit that I am missing?

##<<>
use strict;

my $EMBL_cord_file = "Organism.feature.cords";  # feature co-ordinates: start \t end
my $RAW_file = "Organism.raw";
my $ncDNA_file = "Organism.ncDNA";

open(EMBLCORD, $EMBL_cord_file) or die "Cannot open EMBL_cord_file";
open(RAW, $RAW_file) or die "Cannot open RAW_file";
open(OUT, ">$ncDNA_file") or die;

my @dna=<RAW>;
my $dna = join('', @dna);

while($dna){
	$dna=~s/\s//g;
	while(<EMBLCORD>){
		my @cords = split /\t/;
		my	$start = $cords[0];
		my	$end = $cords[1];
		my $replaceString = "\n>$start..$end";
		substr($dna, $start-1, $end-$start+1, $replaceString);
}
	print OUT $dna,"\n";
	exit;
}
##<<>

Another thing is, since I am reading the whole file in a scalar the
script does not complete the extraction of all ncDNAs from the
sense-strand. Obviously, the features are parsed first before the
flattening of the 266,000 nt sequence into a single string.

Any help would be appreciated.

-PO



From cjfields at uiuc.edu  Sun Feb 26 09:12:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 26 Feb 2006 08:12:57 -0600
Subject: [Bioperl-l] extract ncDNA
In-Reply-To: <235f7dbe0602260351j7d195d73r2a8801b29e105098@mail.gmail.com>
References: <235f7dbe0602260351j7d195d73r2a8801b29e105098@mail.gmail.com>
Message-ID: 

You're not using bioperl.  See:

http://www.bioperl.org/wiki/HOWTO:Beginners

then go to:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
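
For the interval arithmetic itself, here is a minimal plain-Perl sketch
(not BioPerl API; intergenic_regions and revcomp are hypothetical helper
names). It assumes, as in EMBL feature tables, that features on the
complement strand are still given in forward-strand coordinates, so
features from both strands can be pooled and the gaps between them are
the non-coding regions. It also avoids editing the sequence in place:
replacing a span with a string of a different length shifts every later
coordinate.

```perl
use strict;
use warnings;

# Return the gaps between features as [start, end] pairs
# (1-based, inclusive). Overlapping features are merged implicitly.
sub intergenic_regions {
    my ($seq_length, @features) = @_;
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @features;
    my @gaps;
    my $covered_to = 0;    # last base covered by any feature so far
    for my $feature (@sorted) {
        my ($start, $end) = @$feature;
        push @gaps, [ $covered_to + 1, $start - 1 ] if $start > $covered_to + 1;
        $covered_to = $end if $end > $covered_to;
    }
    push @gaps, [ $covered_to + 1, $seq_length ] if $covered_to < $seq_length;
    return @gaps;
}

# Reverse-complement, for reporting a gap as read from the minus strand.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Toy data: 18 nt sequence, three features (two of them overlapping).
my $dna      = 'AAACCCGGGTTTAAACCC';
my @features = ([4, 6], [5, 9], [13, 15]);

for my $gap (intergenic_regions(length $dna, @features)) {
    my ($start, $end) = @$gap;
    print ">$start..$end\n", substr($dna, $start - 1, $end - $start + 1), "\n";
}
```

On the toy data this reports the gaps 1..3, 10..12 and 16..18. With
BioPerl the feature coordinates would come from the feature objects
rather than from a separate coordinates file.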

Chris


On Feb 26, 2006, at 5:51 AM, perlmails at gmail.com wrote:

> Dear Bioperl group,
>
> I have been working on extracting non-coding DNA (ncDNA) sequences
> from an organism.
>
> I tried extracting the intergenic sequences from the sense-strand
> after filtering the features (CDS, gene, mRNA, tRNA, rRNA etc) from
> the EMBL feature table entries using the Bioperl and the additional
> script (mentioned below).
>
> Now, I realised that there is a problem to extract the ncDNA sequences
> from the negative-strand, Any ideas?
>
> To extract the ncDNAs from negative-strand, I thought of converting
> the negative-strand co-ordinates to sense-strand co-ordinates and
> adding these to the sense-strand cords. Then filter all the features
> (select the ncDNAs after discarding the features from EMBL FT) to get
> all the ncDNAs.
>
> Is there anything I am missing for using from the bioperl kit?
>
> ##<<>
> use strict;
>
> my $EMBL_cord_file = "Organism.feature.cords";  # feature co-ordinates: start \t end
> my $RAW_file = "Organism.raw";
> my $ncDNA_file = "Organism.ncDNA";
>
> open(EMBLCORD, $EMBL_cord_file) or die "Cannot open EMBL_cord_file";
> open(RAW, $RAW_file) or die "Cannot open RAW_file";
> open(OUT, ">$ncDNA_file") or die;
>
> my @dna=<RAW>;
> my $dna = join('', @dna);
>
> while($dna){
> 	$dna=~s/\s//g;
> 	while(<EMBLCORD>){
> 		my @cords = split /\t/;
> 		my	$start = $cords[0];
> 		my	$end = $cords[1];
> 		my $replaceString = "\n>$start..$end";
> 		substr($dna, $start-1, $end-$start+1, $replaceString);
> }
> 	print OUT $dna,"\n";
> 	exit;
> }
> ##<<>
>
> Another thing is, since I am reading the whole file in a scalar the
> script does not complete the extraction of all ncDNAs from the
> sense-strand. Obviously, the features are parsed first before the
> flattening of the 266,000 nt sequence into a single string.
>
> Any help would be appreciated.
>
> -PO
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From saldroubi at yahoo.com  Sun Feb 26 15:15:14 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Sun, 26 Feb 2006 12:15:14 -0800 (PST)
Subject: [Bioperl-l] Is it worth it?
Message-ID: <20060226201514.14549.qmail@web34312.mail.mud.yahoo.com>

Hello everyone,
   
  Please forgive me for posting my questions on this list since they are not directly related to bioperl but since most of you are doing bioinformatics, I thought I could ask for some advice.  Also, please point me to other lists or websites if more appropriate.

  Basically I am wondering if it is worth it getting a Master or PhD degree in bioinformatics with funding?  I already have an MS degree in Software Engineering and I've taken a few bioinformatics courses and I like the field.  Additionally, I am almost 40 years old and have a stable job.  If I am to get PhD in 3 to 4 years, what job opportunities will be out there for me?  And is it better to work in academia or the private sector?  What is the average salary like?

  Thank you very much and please respond to me directly instead of the list since my questions are off topic.
   
   


Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From joel at macresearcher.com  Sun Feb 26 22:12:12 2006
From: joel at macresearcher.com (Joel Dudley)
Date: Sun, 26 Feb 2006 20:12:12 -0700
Subject: [Bioperl-l] Is it worth it?
In-Reply-To: <20060226201514.14549.qmail@web34312.mail.mud.yahoo.com>
References: <20060226201514.14549.qmail@web34312.mail.mud.yahoo.com>
Message-ID: 

It seems to me that your mind is already made up. By asking such a  
question I think it's safe to say a PhD program in Bioinformatics  
would not be your cup of tea. This is not to be negative. If you like  
bioinformatics, do bioinformatics. Join an open-source project, or  
start one of your own. If you live in a town with a University, find  
a lab that needs bioinformatics work and volunteer your time. If you  
really have a passion for bioinformatics, just do bioinformatics and  
your path will become clear, opportunities will arise, your salary  
will be what you need. Just my two shekels of course.

- Joel

On Feb 26, 2006, at 1:15 PM, Sam Al-Droubi wrote:

> Hello everyone,
>
>   Please forgive me for posting my questions on this list since
> they are not directly related to bioperl but since most of you are
> doing bioinformatics, I thought I could ask for some advice.  Also,
> please point me to other lists or websites if more appropriate.
>
>   Basically I am wondering if it is worth it getting a Master or
> PhD degree in bioinformatics with funding?  I already have an MS
> degree in Software Engineering and I've taken a few bioinformatics
> courses and I like the field.  Additionally, I am almost 40 years
> old and have a stable job.  If I am to get PhD in 3 to 4 years,
> what job opportunities will be out there for me?  And is it better
> to work in academia or the private sector?  What is the average
> salary like?
>
>   Thank you very much and please respond to me directly instead of
> the list since my questions are off topic.
>
>
>
>
> Sincerely,
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com



From sdavis2 at mail.nih.gov  Mon Feb 27 06:39:27 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 27 Feb 2006 06:39:27 -0500
Subject: [Bioperl-l] Is it worth it?
In-Reply-To: 
Message-ID: 




On 2/26/06 10:12 PM, "Joel Dudley"  wrote:

> It seems to me that your mind is already made up. By asking such a
> question I think it's safe to say a PhD program in Bioinformatics
> would not be your cup of tea. This is not to be negative. If you like
> bioinformatics, do bioinformatics. Join an open-source project, or
> start one of your own. If you live in a town with a University, find
> a lab that needs bioinformatics work and volunteer your time. If you
> really have a passion for bioinformatics, just do bioinformatics and
> your path will become clear, opportunities will arise, your salary
> will be what you need. Just my two shekels of course.

I would second this sentiment.  Most of the folks that I know that are doing
bioinformatics are doing it WITHOUT a degree in it.  The trick is to have
both computational skills AND domain-specific knowledge.  Just find a
project that will require you to gain some domain-specific knowledge (which
can actually happen pretty quickly) and go for it.  As Joel said, there are
dozens of open source projects that would love a helping hand.  If you need
more face-time, do as Joel suggests and work with a local university (or
even high school) to design some web-based tools or something like that to
do things that would be either educational or novel.

Sean




From dhoworth at mrc-lmb.cam.ac.uk  Mon Feb 27 05:40:19 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Mon, 27 Feb 2006 10:40:19 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <200602221340.28573.lstein@cshl.edu>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>	<1140625762.3142.107.camel@localhost.localdomain>	<43FC950C.7080007@mrc-lmb.cam.ac.uk>
	<200602221340.28573.lstein@cshl.edu>
Message-ID: <4402D713.2050007@mrc-lmb.cam.ac.uk>

Lincoln Stein wrote:
> I have just committed a version of the arrow.pm glyph that has a 
> -label_intervals flag.

Thanks Lincoln,

I've edited your new version so it displays the tick labels pretty much 
as I need. My changes were to the first and last label and to move the 
position of the others a little. I hope that it behaves exactly like 
your version unless label_intervals is set. I've attached my edited version.

There's still an oddity with the number of minor ticks at the start and 
end of the line (I've seen 7, 8, and 9 minor intervals at the start of 
the line as well as 10) but I'll probably ignore that for now.

Thanks, Dave
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arrow.pm
Type: application/x-perl
Size: 16357 bytes
Desc: not available
URL: 

From boris.steipe at utoronto.ca  Mon Feb 27 10:42:54 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Mon, 27 Feb 2006 10:42:54 -0500
Subject: [Bioperl-l] Is it worth it?
In-Reply-To: 
References: 
Message-ID: <56C842D6-18AD-40B0-AE9A-47A29AE83F1D@utoronto.ca>

I'd put a slightly different emphasis on this: obviously most of  
those in the field can't have a degree in bioinformatics because such  
degree programs haven't been around for all that long. One shouldn't  
conclude that graduate programs are therefore somehow less relevant.  
To successfully apply for a paid job, you need credentials for your  
ability to be productive.

Credentials can come from open source projects IF you can document  
the scope and quality of your contributions.

Credentials can come from a graduate degree IF your thesis appears  
relevant, original and well executed.

Credentials can come from peer-reviewed publications.

Credentials can come from personal references of collaborators.



Regards,
B.

On 27 Feb 2006, at 06:39, Sean Davis wrote:

>
>
>
> On 2/26/06 10:12 PM, "Joel Dudley"  wrote:
>
>> It seems to me that your mind is already made up. By asking such a
>> question I think it's safe to say a PhD program in Bioinformatics
>> would not be your cup of tea. This is not to be negative. If you like
>> bioinformatics, do bioinformatics. Join an open-source project, or
>> start one of your own. If you live in a town with a University, find
>> a lab that needs bioinformatics work and volunteer your time. If you
>> really have a passion for bioinformatics, just do bioinformatics and
>> your path will become clear, opportunities will arise, your salary
>> will be what you need. Just my two shekels of course.
>
> I would second this sentiment.  Most of the folks that I know that  
> are doing
> bioinformatics are doing it WITHOUT a degree in it.  The trick is  
> to have
> both computational skills AND domain-specific knowledge.  Just find a
> project that will require you to gain some domain-specific  
> knowledge (which
> can actually happen pretty quickly) and go for it.  As Joel said,  
> there are
> dozens of open source projects that would love a helping hand.  If  
> you need
> more face-time, do as Joel suggests and work with a local  
> university (or
> even high school) to design some web-based tools or something like  
> that to
> do things that would be either educational or novel.
>
> Sean
>
>



From slenk at emich.edu  Mon Feb 27 16:07:38 2006
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Mon, 27 Feb 2006 16:07:38 -0500
Subject: [Bioperl-l] Is it worth it?
Message-ID: <556d070556f727.556f727556d070@emich.edu>

Gee golly ollie, this is good advice. I face the same issues, but am much older (53). I am taking a Sloan MS in 
Bioinformatics while working full time at the car parts company. I bring what I have newly learned at school to 
work (Perl especially, in which I build and share tools even as far away as exotic India (smile)). I take what I have 
from work (discipline, experience, work ethic) and apply it to open source and shared school projects. The 
world has given me a lot; I enjoy giving back. Why not take an MS in Biology/Bioinformatics at your pace and 
see where it leads. I have no idea if I will EVER have a JOB in Bioinformatics, so I just live it day by day. Plug 
follows - see MCPrimers at CPAN for PCR primer design for molecular cloning with site-directed mutagenesis. I 
did this as an outgrowth of a Rectech class I took. 



----- Original Message -----
From: Sean Davis 
Date: Monday, February 27, 2006 6:39 am
Subject: Re: [Bioperl-l] Is it worth it?

> 
> 
> 
> On 2/26/06 10:12 PM, "Joel Dudley"  wrote:
> 
> > It seems to me that your mind is already made up. By asking such a
> > question I think it's safe to say a PhD program in Bioinformatics
> > would not be your cup of tea. This is not to be negative. If you like
> > bioinformatics, do bioinformatics. Join an open-source project, or
> > start one of your own. If you live in a town with a University, find
> > a lab that needs bioinformatics work and volunteer your time. If you
> > really have a passion for bioinformatics, just do bioinformatics and
> > your path will become clear, opportunities will arise, your salary
> > will be what you need. Just my two shekels of course.
> 
> I would second this sentiment.  Most of the folks that I know that 
> are doing
> bioinformatics are doing it WITHOUT a degree in it.  The trick is 
> to have
> both computational skills AND domain-specific knowledge.  Just find a
> project that will require you to gain some domain-specific 
> knowledge (which
> can actually happen pretty quickly) and go for it.  As Joel said, 
> there are
> dozens of open source projects that would love a helping hand.  If 
> you need
> more face-time, do as Joel suggests and work with a local 
> university (or
> even high school) to design some web-based tools or something like 
> that to
> do things that would be either educational or novel.
> 
> Sean
> 
> 


From joel at macresearcher.com  Mon Feb 27 20:56:13 2006
From: joel at macresearcher.com (Joel Dudley)
Date: Mon, 27 Feb 2006 18:56:13 -0700
Subject: [Bioperl-l] BioPerlers Represent!
Message-ID: 

Hey list,
	The contest to fill the script repository at MacResearch.org is  
ending very soon. Thus far we've only received a paltry three  
submissions with Perl scripts. The contest's take-home prize is a black  
iPod nano (2GB) so if you've got anything lying around that you'd  
like to share I'd suggest zipping it up and adding it to the script  
repository. Full contest details can be viewed here:

http://www.macresearch.org/ipod_contest

Now before you get ready to smack me with your anti-spam cudgel, or shake  
your fist in my general direction, please note that MacResearch.org  
is completely non-profit, existing only to aid and foster community  
for scientists using OS X. I gain nothing personally by attracting  
BioPerl scripts to the repository but I'd love to see Perl well  
represented. Thanks for understanding.

- Joel


From jforment at ibmcp.upv.es  Tue Feb 28 07:17:59 2006
From: jforment at ibmcp.upv.es (Javier Forment)
Date: Tue, 28 Feb 2006 13:17:59 +0100
Subject: [Bioperl-l] Are frac_identical and frac_conserved methods for hit
 or for hsp objects?
Message-ID: <44043F77.1010901@ibmcp.upv.es>

Hi bioperlers... I have some questions when parsing BLAST results.

As far as I know, bioperl documentation for Bio::SearchIO states that 
frac_identical and frac_conserved are methods for hsp objects (e.g., 
$hsp->frac_identical). I have found that it is also possible to use 
these methods for hit objects (e.g., $hit->frac_identical), since it 
does not give an error, but in this case they don't work properly (I 
think that they work fine with blastn, but not with blastx). So my 
questions are:

1.- is it right to use $hit->frac_identical and $hit->frac_conserved?
2.- if so, how do they get the frac_identical for a hit when it has more 
than one HSP (maybe getting the average value for all the hsps)?
3.- if so, why do they sometimes not work properly, for example, with blastx?
4.- if not, is there any method to get the fraction of identical or 
conserved residues for a hit, other than averaging the corresponding 
values for all the hsps of this hit?

Thanks a lot in advance,

Javier.

-- 
Javier Forment Millet
Unidad de Bioinformatica del Laboratorio de Genomica
Instituto de Biologia Molecular y Celular de Plantas
Universidad Politecnica de Valencia
Avenida de los Naranjos, s/n
46022 Valencia (Spain)
Tlf.(1): +34-963877885
Tlf.(2): 685142553
FAX: +34-963877859
e-mail: jforment at ibmcp.upv.es


From jason.stajich at duke.edu  Tue Feb 28 08:31:00 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 28 Feb 2006 08:31:00 -0500
Subject: [Bioperl-l] Are frac_identical and frac_conserved methods for
	hit or for hsp objects?
In-Reply-To: <44043F77.1010901@ibmcp.upv.es>
References: <44043F77.1010901@ibmcp.upv.es>
Message-ID: 

Personally, I only use these values from HSPs - the Hit methods  
require HSPs to be tiled to summarize the bases and I'm not convinced  
the method works for all situations.

If you want it summarized to a single value for a query/hit pair, I  
would use FASTA or, if you must use BLAST, use WU-BLAST, get the  
links path out, and summarize over the set of HSPs in that path.
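
If a rough per-hit number is all that is needed, one option is to pool
the counts over the HSPs instead of averaging the per-HSP fractions.
This is only a sketch under assumptions: pooled_frac_identical is a
made-up helper, each HSP is passed as [identical positions, alignment
length], and overlapping HSPs are counted twice because no tiling is
attempted. With BioPerl the two numbers could presumably be taken from
$hsp->num_identical and $hsp->length('total').

```perl
use strict;
use warnings;

# Pool identities over several HSPs, weighting each HSP by its length.
# Each HSP is given as [identical_positions, alignment_length].
# Caveat: overlapping HSPs are counted twice; no tiling is attempted.
sub pooled_frac_identical {
    my @hsps = @_;
    my ($identical, $length) = (0, 0);
    for my $hsp (@hsps) {
        $identical += $hsp->[0];
        $length    += $hsp->[1];
    }
    return $length ? $identical / $length : 0;
}

# Two HSPs: 90/100 and 30/50 identical positions -> 120/150.
printf "%.3f\n", pooled_frac_identical([90, 100], [30, 50]);   # prints 0.800
```

A plain average of the two fractions (0.9 and 0.6) would give 0.75,
which over-weights the short HSP.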

-jason
On Feb 28, 2006, at 7:17 AM, Javier Forment wrote:

> Hi bioperlers... I have some questions when parsing BLAST results.
>
> As far as I know, bioperl documentation for Bio::SearchIO states that
> frac_identical and frac_conserved are methods for hsp objects (e.g.,
> $hsp->frac_identical). I have found that it is also possible to use
> these methods for hit objects (e.g., $hit->frac_identical), since it
> does not give an error, but in this case they don't work properly (I
> think that they work fine with blastn, but not with blastx). So my
> questions are:
>
> 1.- is it right to use $hit->frac_identical and $hit->frac_conserved?
> 2.- if so, how they get the frac_identical for a hit when it has more
> than one HSP (maybe getting the average value for all the hsps)?
> 3.- if so, why they don't work fine sometimes, for example, with  
> blastx?
> 4.- if not, is there any method to get the fraction of identical or
> conserved residues for a hit, other than averaging the corresponding
> values for all the hsps of this hit?
>
> Thanks a lot in advance,
>
> Javier.
>
> -- 
> Javier Forment Millet
> Unidad de Bioinformatica del Laboratorio de Genomica
> Instituto de Biologia Molecular y Celular de Plantas
> Universidad Politecnica de Valencia
> Avenida de los Naranjos, s/n
> 46022 Valencia (Spain)
> Tlf.(1): +34-963877885
> Tlf.(2): 685142553
> FAX: +34-963877859
> e-mail: jforment at ibmcp.upv.es

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From julioallen at hotmail.com  Tue Feb 28 08:22:14 2006
From: julioallen at hotmail.com (James Allen)
Date: Tue, 28 Feb 2006 13:22:14 +0000
Subject: [Bioperl-l] GFF feature start and end points on reverse strand
Message-ID: 

Hello,
I'm retrieving data using the 'features' method of Bio::DB::GFF, and when 
the feature is on the reverse strand (ie = -1) the start and end points are 
flipped, so that 'feature->end' is the smaller number (ie what I consider 
the start point) and 'feature->start' is the larger number.
Is there any way to prevent this behaviour, so that the start value of my 
feature is the same as the start value in my database, regardless of the 
strand?

Thanks,
Julio




From ewijaya at singnet.com.sg  Tue Feb 28 05:01:23 2006
From: ewijaya at singnet.com.sg (Edward WIJAYA)
Date: Tue, 28 Feb 2006 18:01:23 +0800
Subject: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence file (Fasta)
	into Array
Message-ID: 

Hi,

Does Bio::SeqIO have a method specifically designed for
reading all the sequences from a FASTA file into an array?

What I have currently is this subroutine, it seems to me
__very inefficient__. I was wondering
is there a better way to achieve it.


sub get_sequence_from_fasta {
     my $file = shift;
     my @seqs= ();

     open INFILE, "<$file" or die "$0:  Can't open file $file: $!";
     my $in = Bio::SeqIO->new(-format => 'fasta',
                              -noclose => 1 ,
                              -fh => \*INFILE);

     while ( my $seq = $in->next_seq() ) {
        push @seqs, $seq->seq();
     }
     return @seqs;
}


BTW, I also have tried to do this. I thought
this might be a better way to do the above job.
but it doesn't work.

sub get_sequence_from_fasta_that_doesnot_work {
     my $file = shift;
      open my $fh, "<$file" or die "$0:  Can't open file $file: $!";
     my $in = Bio::SeqIO->newFh( -format => 'fasta', -fh => $fh );
     return <$in>;
}

Hope to hear from you again.

--
Regards,
Edward WIJAYA
SINGAPORE


From lstein at cshl.edu  Tue Feb 28 10:08:27 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 28 Feb 2006 10:08:27 -0500
Subject: [Bioperl-l] GFF feature start and end points on reverse strand
In-Reply-To: 
References: 
Message-ID: <200602281008.28373.lstein@cshl.edu>

Call the absolute(1) method, which turns off relative addressing.

Lincoln

On Tuesday 28 February 2006 08:22, James Allen wrote:
> Hello,
> I'm retrieving data using the 'features' method of Bio::DB::GFF, and when
> the feature is on the reverse strand (ie = -1) the start and end points are
> flipped, so that 'feature->end' is the smaller number (ie what I consider
> the start point) and 'feature->start' is the larger number.
> Is there anyway to prevent this behaviour, so that the start value of my
> feature is the same as the start value in my database, regardless of the
> strand?
>
> Thanks,
> Julio
>
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From jason.stajich at duke.edu  Tue Feb 28 12:36:34 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 28 Feb 2006 12:36:34 -0500
Subject: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence file
	(Fasta) into Array
In-Reply-To: 
References: 
Message-ID: <1207BA95-F0E8-4049-8AAE-4B84964D147B@duke.edu>


On Feb 28, 2006, at 5:01 AM, Edward WIJAYA wrote:

> Hi,
>
> Does Bio::SeqIO has a method  specially designed for
> reading all the sequences from a fasta file into array.
>
no but feel free to contribute one.
> What I have currently is this subroutine, it seems to me
> __very inefficient__. I was wondering
> is there a better way to achieve it.
>
Do you have a reason to think this is the slow part of your algorithm  
or are you just going on a gut reaction?  There is certainly overhead  
in calling a method but I am pretty sure that it isn't that  
significant, depends on how many sequences you are reading in I guess.

Just write a next_seq_array method and have it put the seqs onto an  
array within the method and do a benchmark test to show that it is  
faster.
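
Something along these lines would do for the comparison. This is only a
sketch: read_fasta_plain is a hypothetical plain-Perl baseline (it is
not Bio::SeqIO and does no format checking), test.fa is a made-up file
name, and timethis comes from the core Benchmark module.

```perl
use strict;
use warnings;
use Benchmark qw(timethis);

# Plain-Perl baseline reader (NOT Bio::SeqIO): returns the sequence
# strings of a FASTA file as a list, one element per record.
sub read_fasta_plain {
    my ($file) = @_;
    open my $fh, '<', $file or die "$0: Can't open file $file: $!";
    my @seqs;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>/) { push @seqs, ''; next; }   # new record
        next unless @seqs;      # skip anything before the first header
        $line =~ s/\s+//g;      # drop stray whitespace and \r
        $seqs[-1] .= $line;
    }
    close $fh;
    return @seqs;
}

# Example timing, with a hypothetical file name:
# timethis(20, sub { my @s = read_fasta_plain('test.fa') });
# timethis(20, sub { my @s = get_sequence_from_fasta('test.fa') });
```

If the two timings are close on your real data, the method-call
overhead is not where the time is going.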

-jason
>
> sub get_sequence_from_fasta {
>      my $file = shift;
>      my @seqs= ();
>
>      open INFILE, "<$file" or die "$0:  Can't open file $file: $!";
>      my $in = Bio::SeqIO->new(-format => 'fasta',
>                               -noclose => 1 ,
>                               -fh => \*INFILE);
>
>      while ( my $seq = $in->next_seq() ) {
>         push @seqs, $seq->seq();
>      }
>      return @seqs;
> }
>
>
> BTW, I also have tried to do this. I thought
> this might be a better way to do the above job.
> but it doesn't work.
>
> sub get_sequence_from_fasta_that_doesnot_work {
>      my $file = shift;
>       open my fh, "<$file" or die "$0:  Can't open file $file: $!";
>      my $in = Bio::SeqIO->newFh( -format => 'fasta', -fh => $fh );
>      return <$in>;
> }
>
> Hope to hear from you again.
>
> --
> Regards,
> Edward WIJAYA
> SINGAPORE

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Tue Feb 28 13:50:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 28 Feb 2006 12:50:50 -0600
Subject: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence file(Fasta)
	into Array
In-Reply-To: <1207BA95-F0E8-4049-8AAE-4B84964D147B@duke.edu>
Message-ID: <002001c63c97$e57f20c0$15327e82@pyrimidine>

Is there any particular reason why you aren't opening the file directly with
Bio::SeqIO?  

 sub get_sequence_from_fasta {
      my $file = shift;
      my @seqs= ();
      my $in = Bio::SeqIO->new(-format => 'fasta',
                               -file => "<$file");
      while ( my $seq = $in->next_seq() ) {
         push @seqs, $seq->seq();
      }
      return @seqs;
 }

I'm not completely sure of your intent here, but I think if you want to use
a globbed filehandle this way you need to open the file before entering the
sub then pass the filehandle to the sub.  I'm not sure why you pass the file
name, open the file, attach the file handle, parse the seqs, then return an
array?  Or am I missing something here?

Also, read:

http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples

which explains that loading arrays can be memory-intensive if the seqs are
big.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Tuesday, February 28, 2006 11:37 AM
> To: Edward WIJAYA
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence
> file(Fasta) into Array
> 
> 
> On Feb 28, 2006, at 5:01 AM, Edward WIJAYA wrote:
> 
> > Hi,
> >
> > Does Bio::SeqIO has a method  specially designed for
> > reading all the sequences from a fasta file into array.
> >
> no but feel free to contribute one.
> > What I have currently is this subroutine, it seems to me
> > __very inefficient__. I was wondering
> > is there a better way to achieve it.
> >
> Do you have a reason to think this is the slow part of your algorithm
> or are you just going on a gut reaction?  There is certainly overhead
> in calling a method but I am pretty sure that it isn't that
> significant, depends on how many sequences you are reading in I guess.
> 
> Just write a next_seq_array method and have it put the seqs onto an
> array within the method and do a benchmark test to show that it is
> faster.
> 
> -jason
> >
> > sub get_sequence_from_fasta {
> >      my $file = shift;
> >      my @seqs= ();
> >
> >      open INFILE, "<$file" or die "$0:  Can't open file $file: $!";
> >      my $in = Bio::SeqIO->new(-format => 'fasta',
> >                               -noclose => 1 ,
> >                               -fh => \*INFILE);
> >
> >      while ( my $seq = $in->next_seq() ) {
> >         push @seqs, $seq->seq();
> >      }
> >      return @seqs;
> > }
> >
> >
> > BTW, I also have tried to do this. I thought
> > this might be a better way to do the above job.
> > but it doesn't work.
> >
> > sub get_sequence_from_fasta_that_doesnot_work {
> >      my $file = shift;
> >       open my fh, "<$file" or die "$0:  Can't open file $file: $!";
> >      my $in = Bio::SeqIO->newFh( -format => 'fasta', -fh => $fh );
> >      return <$in>;
> > }
> >
> > Hope to hear from you again.
> >
> > --
> > Regards,
> > Edward WIJAYA
> > SINGAPORE
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 



From pterry2 at unlnotes.unl.edu  Tue Feb 28 13:53:11 2006
From: pterry2 at unlnotes.unl.edu (Philip M Terry)
Date: Tue, 28 Feb 2006 12:53:11 -0600
Subject: [Bioperl-l] Bioperl use question
Message-ID: 


Hello,

Is this an appropriate mailing list for this question?

I am trying Test 4 from the Tisdale book, p-299, "Mastering Perl for
Bioinformatics".

Comparing screen output from p-303 of the Tisdale book for bp1.pl with
mine:

philip-terrys-power-mac-g5:~/perl_prac/pgms/source mterry$ ./bp1.pl
Sequence name is AI129902
Sequence acc  is AI129902
First 5 bases is CTCCG

-------------------- WARNING ---------------------
MSG: acc (gb|3598416) does not exist
---------------------------------------------------
Submitted Blast for [ROA1_HUMAN]
philip-terrys-power-mac-g5:~/perl_prac/pgms/source mterry$

Two questions:
i. why the warning message in my screen output?
ii. my Blast fails, that is,
--I don't see "dots" on the output line on screen following "Submitted
Blast for [ROA1_HUMAN]"?
--my output file, blast.out has 0 KB in it?

My computer system:
Power Mac G5, OS X 10.4.5, installed "core" bioperl, that is,
sudo perl -MCPAN -e shell;
cpan> force install B/BI/BIRNEY/bioperl-1.4.tar.gz

Can you comment?

Thanks,
Philip M. Terry, Ph.D.
University of Nebraska-Lincoln



From staffa at niehs.nih.gov  Tue Feb 28 15:01:42 2006
From: staffa at niehs.nih.gov (staffa)
Date: Tue, 28 Feb 2006 15:01:42 -0500
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
References: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
Message-ID: 

Hello,
Does anyone know if Bio::Tools::SeqWords
count_words
or
count_overlap_words
will do DNA pattern searches and honor ambiguity symbols
like exist in some restriction enzyme pattern definitions,
e.g. GGnnCC


> Thank you.
>
> Nick Staffa
> Telephone: 919-316-4569  (NIEHS: 6-4569)
> Scientific Computing Support Group
> NIEHS Information Technology Support Services Contract
> (Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
> National Institute of Environmental Health Sciences
> National Institutes of Health
> Research Triangle Park, North Carolina

From torsten.seemann at infotech.monash.edu.au  Tue Feb 28 16:45:16 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 01 Mar 2006 08:45:16 +1100
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: 
References: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
	
Message-ID: <4404C46C.4010005@infotech.monash.edu.au>

Nick

> Does anyone know if Bio::Tools::SeqWords
> *count_words
> or
> count_overlap_words
> will do DNA pattern searches and honor ambiguity symbols
> like exist in some restriction enzyme pattern definitions,
> e.g. GGnnCC*

Examination of the code

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqWords.html#CODE4

suggests that all it does is count N-mers of any set of letters,
and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
the same N-mer.

So no it does not handle ambiguity symbols in any special manner.

What would you like it to do?
If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
it could be?
If it has 2 "N"s in it, does it count toward all 16 possible 
non-ambiguous N-mers?
And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From torsten.seemann at infotech.monash.edu.au  Tue Feb 28 17:01:38 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 01 Mar 2006 09:01:38 +1100
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D08507@NIHCESMLBX6.nih.gov>
References: <7930EE6CD7CA354D93B444D0433C061101D08507@NIHCESMLBX6.nih.gov>
Message-ID: <4404C842.2050608@infotech.monash.edu.au>

Staffa, Nick (NIH/NIEHS) [C] wrote:
> Yes 
> N matches any of the four bases.

It's still not clear to me what you want.

For simplicity, let's say we are counting words of length 1,
(which means overlapping and non-overlapping are the same)
and our sequence is "AGTN" (ie. 4 letters long)

The module would return the following
{ A=>1, G=>1, T=>1, N=>1 }    # sum of counts = 4

But you want it to return this?
{ A=>2, G=>2, T=>2, C=>1 }    # sum of counts = 7
ie. the N contributes 1 A, 1 G, 1 T and 1 C (and 0 N)

And correspondingly for all the possible ambiguity codes?

And if the word length was 2, then if we encountered an "NN"
it would add 16 to the total count, i.e. 1 AA, 1 AT, 1 AC etc.?
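
The counting scheme being asked about can be sketched in plain Perl. This is only an illustration of the idea, not how Bio::Tools::SeqWords behaves: the sub name count_words_expanding_n is hypothetical, only N is treated as ambiguous, and each N is assumed to contribute one full count to each of the four bases.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Tools::SeqWords.
# Counts overlapping words of length $k. Every N inside a word is expanded
# to A, C, G and T, and each expanded word receives one full count.
sub count_words_expanding_n {
    my ($seq, $k) = @_;
    my %count;
    $seq = uc $seq;
    for my $i (0 .. length($seq) - $k) {
        my @words = ('');
        for my $base (split //, substr($seq, $i, $k)) {
            my @choices = $base eq 'N' ? qw(A C G T) : ($base);
            my @extended;
            for my $word (@words) {
                push @extended, $word . $_ for @choices;
            }
            @words = @extended;
        }
        $count{$_}++ for @words;
    }
    return \%count;
}

my $counts = count_words_expanding_n('AGTN', 1);
for my $word (sort keys %$counts) {
    print "$word=>$counts->{$word} ";
}
print "\n";
```

For "AGTN" with word length 1 this gives A=>2, C=>1, G=>2, T=>2, which is the second of the two results above. Other ambiguity codes (R, Y and so on) would need their own entries in the expansion step.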

>>Does anyone know if Bio::Tools::SeqWords
>>*count_words
>>or
>>count_overlap_words
>>will do DNA pattern searches and honor ambiguity symbols
>>like exist in some restriction enzyme pattern definitions,
>>e.g. GGnnCC*

> suggests that all it does is count N-mers of any set of letters,
> and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
> the same N-mer.
> So no it does not handle ambiguity symbols in any special manner.
> What would you like it to do?
> If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
> it could be?
> If it has 2 "N"s in it, does it count toward all 16 possible 
> non-ambiguous N-mers?
> And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From staffa at niehs.nih.gov  Tue Feb 28 16:46:30 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Tue, 28 Feb 2006 16:46:30 -0500
Subject: [Bioperl-l] seq_word and pattern counts
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D08507@NIHCESMLBX6.nih.gov>

Yes 
N matches any of the four bases.

Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field (field1 at niehs.nih.gov ))
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina




-----Original Message-----
From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
Sent: Tuesday, February 28, 2006 4:45 PM
To: Staffa, Nick (NIH/NIEHS) [C]
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] seq_word and pattern counts


Nick

> Does anyone know if Bio::Tools::SeqWords
> *count_words
> or
> count_overlap_words
> will do DNA pattern searches and honor ambiguity symbols
> like exist in some restriction enzyme pattern definitions,
> e.g. GGnnCC*

Examination of the code

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqWords.html#CODE4

suggests that all it does is count N-mers of any set of letters,
and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
the same N-mer.

So no it does not handle ambiguity symbols in any special manner.

What would you like it to do?
If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
it could be?
If it has 2 "N"s in it, does it count toward all 16 possible 
non-ambiguous N-mers?
And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From staffa at niehs.nih.gov  Tue Feb 28 17:08:40 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Tue, 28 Feb 2006 17:08:40 -0500
Subject: [Bioperl-l] seq_word and pattern counts
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D08508@NIHCESMLBX6.nih.gov>

The real problem is this:
We want to count sites in a long sequence where a restriction enzyme would cut.
This restriction enzyme, in the example I gave will recognize GGnnCC,
that is two G separated by two of any bases followed by two C.

The GCG program findpatterns will do this, but bioperl makes certain statistics easy.
I'm sure there is some module somewhere for this purpose. 





Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field (field1 at niehs.nih.gov ))
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina




-----Original Message-----
From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
Sent: Tuesday, February 28, 2006 5:02 PM
To: Staffa, Nick (NIH/NIEHS) [C]
Cc: bioperl-l
Subject: Re: [Bioperl-l] seq_word and pattern counts


Staffa, Nick (NIH/NIEHS) [C] wrote:
> Yes 
> N matches any of the four bases.

It's still not clear to me what you want.

For simplicity, let's say we are counting words of length 1,
(which means overlapping and non-overlapping are the same)
and our sequence is "AGTN" (ie. 4 letters long)

The module would return the following
{ A=>1, G=>1, T=>1, N=>1 }    # sum of counts = 4

But you want it to return this?
{ A=>2, G=>2, T=>2, C=>1 }    # sum of counts = 7
ie. the N contributes 1 A, 1 G, 1 T and 1 C (and 0 N)

And correspondingly for all the possible ambiguity codes?

And if the word length was 2, then if we encountered an "NN"
it would add 16 to the total count, i.e. 1 AA, 1 AT, 1 AC etc.?

>>Does anyone know if Bio::Tools::SeqWords
>>*count_words
>>or
>>count_overlap_words
>>will do DNA pattern searches and honor ambiguity symbols
>>like exist in some restriction enzyme pattern definitions,
>>e.g. GGnnCC*

> suggests that all it does is count N-mers of any set of letters,
> and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
> the same N-mer.
> So no it does not handle ambiguity symbols in any special manner.
> What would you like it to do?
> If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
> it could be?
> If it has 2 "N"s in it, does it count toward all 16 possible 
> non-ambiguous N-mers?
> And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From torsten.seemann at infotech.monash.edu.au  Tue Feb 28 17:47:01 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 01 Mar 2006 09:47:01 +1100
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D08508@NIHCESMLBX6.nih.gov>
References: <7930EE6CD7CA354D93B444D0433C061101D08508@NIHCESMLBX6.nih.gov>
Message-ID: <4404D2E5.4090405@infotech.monash.edu.au>

Staffa, Nick (NIH/NIEHS) [C] wrote:
> The real problem is this:
> We want to count sites in a long sequence where a restriction enzyme would cut.
> This restriction enzyme, in the example I gave will recognize GGnnCC,
> that is two G separated by two of any bases followed by two C.
> The GCG program findpatterns will do this, but bioperl makes certain statistics easy.
> I'm sure there is some module somewhere for this purpose. 

(Nick - please respond to me AND the bioperl-l at bioperl.org mailing list 
ie. "Reply All", so others can benefit from the Q&A - I've re-sent your 
past responses already).

Perhaps this module?

http://doc.bioperl.org/bioperl-live/Bio/Tools/RestrictionEnzyme.html

With this code?

use Bio::Tools::RestrictionEnzyme;

my $enz = "GGNNCC";
my $re = Bio::Tools::RestrictionEnzyme->new(-NAME =>"NicksResEnz--$enz",
	  			  	 -MAKE =>'custom');
# $seqobj is assumed to be an existing sequence object
my @fragments = $re->cut_seq($seqobj);
print "$enz cuts ", $seqobj->display_id, " into ", scalar(@fragments), " fragments.\n";
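
If only the number of matching sites is needed, a plain regular expression may be enough. The sketch below is an assumption-laden alternative, not BioPerl code: the names %IUPAC, site_to_regex and count_sites are made up for illustration, and only the given strand is searched (GGNNCC is its own reverse complement, so that is sufficient for this particular site).

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; they are not BioPerl functions.
# IUPAC nucleotide codes mapped to regex character classes.
my %IUPAC = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    R => '[AG]',  Y => '[CT]',  S => '[CG]',  W => '[AT]',
    K => '[GT]',  M => '[AC]',  B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Turn a recognition site such as "GGnnCC" into a compiled regex.
sub site_to_regex {
    my ($site) = @_;
    my $pattern = '';
    for my $symbol (split //, uc $site) {
        die "Unknown IUPAC symbol '$symbol'\n" unless exists $IUPAC{$symbol};
        $pattern .= $IUPAC{$symbol};
    }
    return qr/$pattern/;
}

# Count every start position where the site matches.
# The lookahead lets overlapping sites be counted too.
sub count_sites {
    my ($seq, $site) = @_;
    my $re  = site_to_regex($site);
    my $dna = uc $seq;
    my $n   = 0;
    $n++ while $dna =~ /(?=$re)/g;
    return $n;
}

print count_sites('AAGGTTCCGGAACCTT', 'GGnnCC'), " site(s) found\n";
```

With a Bio::Seq object, the string passed in would be $seqobj->seq. For a non-palindromic site the reverse complement would have to be searched as well.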

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From cjfields at uiuc.edu  Tue Feb 28 21:41:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 28 Feb 2006 20:41:08 -0600
Subject: [Bioperl-l] WGS sequences through Bio::DB::GenBank
Message-ID: <000001c63cd9$98988520$15327e82@pyrimidine>

I know that a recent post showed that you could retrieve CONTIG sequences
from GenBank files fairly easily:

http://bioperl.org/pipermail/bioperl-l/2006-February/020891.html

I'm driving myself a bit buggy looking for this, and I may be blind to it,
but can the same be done with WGS files?  I've tried Bio::DB::GenBank and a
few other Bio::DB* modules to see if it's been implemented but haven't had
any luck yet.  I may try getting around it using Bio::DB::Query::GenBank,
but just trying to find a more direct route.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From chandan.kr.singh at gmail.com  Thu Feb  2 07:26:09 2006
From: chandan.kr.singh at gmail.com (CHANDAN SINGH)
Date: Thu, 2 Feb 2006 12:56:09 +0530
Subject: [Bioperl-l] Sorry, failure in post on the net,
	so still via email
In-Reply-To: <001001c62793$bef08f70$93656785@zhur>
References: <001001c62793$bef08f70$93656785@zhur>
Message-ID: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>

Hi
It seems that it's not a proxy problem. I tried today and faced the same
problem. It has been months since my last try and therefore something might
have changed.
Try reading more on this problem.
I myself will try to do it.
Regards
Chandan

On 2/2/06, Huang Jian  wrote:
>
> I tried  some "Quick getting started scripts" in bptutorial.
>
> use Bio::Perl;
>   $seq = get_sequence('swiss',"ROA1_HUMAN");
>   # uses the default database - nr in this case
>   $blast_result = blast_sequence($seq);
>   write_blast(">roa1.blast",$blast_result);
>
> It returns "Submitted Blast for [ROA1_HUMAN] "
> It does not return me any error after I run the script.  However, it does
> not
> return me any result either.  The file "roa1.blast" is created but is
> always
> empty.
>
> I found the return is like the code below in function "blast_sequence"
>  if( $verbose ) {
>  print STDERR "Submitted Blast for [".$seq->id."] ";
>     }
>     sleep 5;
> ....
> I have tested "( env_proxy => 1 )" ...The problem remains the same...
>
> Help! By the way, could you send me an invitation letter of gmail, I want
> to have a gmail account too... :-)
>
> Best Regards!
> Jian Huang
>
>



From osborne1 at optonline.net  Thu Feb  2 22:06:25 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 02 Feb 2006 17:06:25 -0500
Subject: [Bioperl-l] Sorry, failure in post on the net,
	so still via email
In-Reply-To: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
Message-ID: 

Chandan,

I'd be interested in what you find. This is not a new problem, this same
code snippet has been mentioned many times, but for many others, like me,
the code always works.

Brian O.


On 2/2/06 2:26 AM, "CHANDAN SINGH"  wrote:

> Hi
> It seems that its not a proxy problem. I tried today and faced the same
> problem. It has been months since my last try and therefore something might
> have changed.
> Try reading more on this problem.
> I myself will try to do it.
> Regards
> Chandan
> 
> On 2/2/06, Huang Jian  wrote:
>> 
>> I tried  some "Quick getting started scripts" in bptutorial.
>> 
>> use Bio::Perl;
>>   $seq = get_sequence('swiss',"ROA1_HUMAN");
>>   # uses the default database - nr in this case
>>   $blast_result = blast_sequence($seq);
>>   write_blast(">roa1.blast",$blast_result);
>> 
>> It returns "Submitted Blast for [ROA1_HUMAN] "
>> It does not return me any error after I run the script.  However, it does
>> not
>> return me any result either.  The file "roa1.blast" is created but is
>> always
>> empty.
>> 
>> I found the return is like the code below in function "blast_sequence"
>>  if( $verbose ) {
>>  print STDERR "Submitted Blast for [".$seq->id."] ";
>>     }
>>     sleep 5;
>> ....
>> I have tested "( env_proxy => 1 )" ...The problem remains the same...
>> 
>> Help! By the way, could you send me an invitation letter of gmail, I want
>> to have a gmail account too... :-)
>> 
>> Best Regards!
>> Jian Huang
>> 
>> 
> 




From nagesh.chakka at anu.edu.au  Fri Feb  3 01:23:50 2006
From: nagesh.chakka at anu.edu.au (Nagesh Chakka)
Date: Fri, 03 Feb 2006 12:23:50 +1100
Subject: [Bioperl-l] RemoteBlast.pm version 1.28
In-Reply-To: <003901c6285e$d1b36670$93656785@zhur>
References: 
	<43E28C39.2060308@anu.edu.au> <003901c6285e$d1b36670$93656785@zhur>
Message-ID: <43E2B0A6.7000307@anu.edu.au>

Hi Huang,
Thanks for the message. The older version of RemoteBlast.pm checks the 
size of the temporary file to determine whether the Blast results are 
ready. That condition is no longer being satisfied, possibly because of 
changes made by NCBI. I had this problem recently and found that the 
solution was to use the latest version, which fixes it (it no longer 
relies on the file size) but is not yet included in the BioPerl package.
Cheers
Nagesh

Huang Jian wrote:

> Dear Nagesh,
>
> I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent 
> me. Now it works perfectly!!!
>
> Thank you!!
>
> Huang
>
> ----- Original Message ----- From: "Nagesh Chakka" 
> 
> To: "Huang Jian" ; "bioperl-l" 
> 
> Sent: Friday, February 03, 2006 7:48 AM
> Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still 
> via email
>
>
>> Hi Huang,
>> I see that you are submitting a sequence for a remote blast search. Can
>> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If
>> not, I have attached it to this email; try replacing the old one, which
>> has a bug, with it.
>> Let me know if it works.
>> Nagesh
>
>
>
   


From cjfields at uiuc.edu  Fri Feb  3 15:45:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 3 Feb 2006 09:45:23 -0600
Subject: [Bioperl-l] RemoteBlast.pm version 1.28
In-Reply-To: <43E2B0A6.7000307@anu.edu.au>
Message-ID: <001501c628d8$d91cd430$15327e82@pyrimidine>

Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS.  It will
work for saving text output.  However, it will not parse anything using
next_result (it will likely hang) and will not save XML format.  See these
bugs:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

for explanations and possible fixes (changes to RemoteBlast and
Bio::SearchIO::blast).  Note that these haven't been checked in yet so are
still not included in bioperl-live; they may be further modified before
committing to CVS.  If you're not worried about XML, you could just try the
first fix, which is a change to SearchIO::blast.

Nagesh, I remember you posting to the list a month ago using a script which
had problems; the script you used saves the output but doesn't actually
parse it (i.e. you don't use next_result() to go through the data).  Is the
version of BLAST in your text output 2.2.12 or 2.2.13?  Have you tried
parsing the output using "-readmethod => SearchIO" or "-readmethod => blast"
using your version of RemoteBlast and method next_result()? Like below (from
perldoc):  

        while ( my @rids = $factory->each_rid ) {
          foreach my $rid ( @rids ) {
            my $rc = $factory->retrieve_blast($rid);
            if( !ref($rc) ) {
              if( $rc < 0 ) {
                $factory->remove_rid($rid);
              }
              print STDERR "." if ( $v > 0 );
              sleep 5;
            } else {
              # parsing starts here
              my $result = $rc->next_result();   # it should hang here
              # save the output
              my $filename = $result->query_name()."\.out";
              $factory->save_output($filename);
              $factory->remove_rid($rid);
              print "\nQuery Name: ", $result->query_name(), "\n";
              while ( my $hit = $result->next_hit ) {
                next unless ( $v > 0);
                print "\thit name is ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                  print "\t\tscore is ", $hsp->score, "\n";
                }
              }
            }
          }
        }


My script hung if I used next_result() in any way prior to the fixes.  I
want to see how many others are having the same issues with parsing using
the CVS version of bioperl-live.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> Sent: Thursday, February 02, 2006 7:24 PM
> To: Huang Jian; bioperl-l
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> 
> Hi Huang,
> Thanks for the message. The older version of RemoteBlast.pm works on the
> logic of checking the temporary file size to determine whether the Blast
> results are ready. This condition is not getting satisfied may be due to
> some changes brought about by NCBI. I had this problem recently and
> figured out that the solution was to use the latest version which has
> this problem fixed (does not use file size logic any more) which is not
> yet included in the BioPerl package.
> Cheers
> Nagesh
> 
> Huang Jian wrote:
> 
> > Dear Nagesh,
> >
> > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send
> > me. Now it works perfectly!!!
> >
> > Thank you!!
> >
> > Huang
> >
> > ----- Original Message ----- From: "Nagesh Chakka"
> > 
> > To: "Huang Jian" ; "bioperl-l"
> > 
> > Sent: Friday, February 03, 2006 7:48 AM
> > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > via email
> >
> >
> >> Hi Huang,
> >> I see that you are submitting a sequence for a remote blast search. Can
> >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If
> >> not I have attached it with this email, try to replace it with the old
> >> one which has a bug.
> >> Let me know if it works.
> >> Nagesh
> >
> >
> >
> 



From osborne1 at optonline.net  Fri Feb  3 18:05:44 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 03 Feb 2006 13:05:44 -0500
Subject: [Bioperl-l] Documentation in the Bioperl package
Message-ID: 

bioperl-l,

The recent work on the Bioperl Wiki moved much of the Bioperl documentation
online. Since we cannot maintain 2 locations for all of this we'll be
removing a number of files from the package, specifically:

biodatabases.pod   
biodesign.pod    
bioperl.pod   
bioscripts.pod
doc/howto/*
doc/faq/*
FAQ

Rest assured that all of these files have been gone over in detail to make
sure that no important information was lost during the migration. All of
this will be replaced by a single file, such as "README.docs", that explains
where all the documentation is. It's not entirely clear what will happen to
bptutorial.pl. Moving its content to different online locations is possible
but in this case we lose its functionality as a script.

Are there any comments or questions or concerns?

Brian O.




From saldroubi at yahoo.com  Fri Feb  3 18:38:26 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Fri, 3 Feb 2006 10:38:26 -0800 (PST)
Subject: [Bioperl-l] Gibbs sampling algorithm?
Message-ID: <20060203183826.41696.qmail@web34306.mail.mud.yahoo.com>

Hi everyone,

I am wondering if anyone has implemented the Gibbs sampling algorithm for finding motifs, in BioPerl or otherwise.  I saw Bio::Tools::Run::PiseApplication::gibbs, which I believe calls a Gibbs program that is not free or open source, as far as I know.  I would prefer not to write my own Gibbs sampling algorithm if one is already out there.  Any comments are appreciated.

Thank you

Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From cjfields at uiuc.edu  Fri Feb  3 19:34:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 3 Feb 2006 13:34:27 -0600
Subject: [Bioperl-l] Gibbs sampling algorithm?
In-Reply-To: <20060203183826.41696.qmail@web34306.mail.mud.yahoo.com>
Message-ID: <001901c628f8$d89917b0$15327e82@pyrimidine>

Do you mean this Gibbs program?

ftp://ncbi.nlm.nih.gov/pub/neuwald/ 

You can also request a license from the Gibbs Motif Sampler homepage, which
is more up to date:

http://bayesweb.wadsworth.org/gibbs/gibbs.html.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sam Al-Droubi
> Sent: Friday, February 03, 2006 12:38 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Gibbs sampling algorithm?
> 
> Hi everyone,
> 
> I am wondering if anyone has implemented the Gibbs sampling algorithm in
> BioPerl or otherwise for finding motifs.  I saw
> Bio::Tools::Run::PiseApplication::gibbs which I believe calls a Gibbs
> program which is not free open source, I think.   I prefer not to write my
> one Gibbs sampling algorithm if it is already out there.  Any comments are
> appreciated.
> 
> Thank you
> 
> Sincerely,
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com



From gyang at plantbio.uga.edu  Fri Feb  3 19:44:50 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Fri, 03 Feb 2006 14:44:50 -0500
Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
In-Reply-To: <001501c628d8$d91cd430$15327e82@pyrimidine>
Message-ID: <20060203194450.792e8d4e@dogwood.plantbio.uga.edu>

Hi, Everybody,  
I saw this post and am wondering whether this is the reason our web server is malfunctioning. We set up a web server named MAK for MITE sequence analysis. It worked very well until around November 2005, when it stopped returning any results (the site itself is fine and seems to be doing something after submission).  In the CGI script, I used RemoteBlast to do the searches (that work was done in 2003). I currently do not have access to the server because I have moved. Quite a few people have sent us emails about the malfunction. Are there any suggestions for fixing the problem?  Should I simply ask for RemoteBlast.pm to be replaced with the new version?  
Thanks a lot,  
Guojun

Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun
      _____  

  From: Chris Fields [mailto:cjfields at uiuc.edu]
To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian' [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-l at bioperl.org]
Sent: Fri, 03 Feb 2006 10:45:23 -0500
Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28

Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It will
work for saving text output. However, it will not parse anything using
next_result (it will likely hang) and will not save XML format. See these
bugs:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

for explanations and possible fixes (changes to RemoteBlast and
Bio::SearchIO::blast). Note that these haven't been checked in yet so are
still not included in bioperl-live; they may be further modified before
committing to CVS. If you're not worried about XML, you could just try the
first fix, which is a change to SearchIO::blast.

Nagesh, I remember you posting to the list a month ago using a script which
had problems; the script you used saves the output but doesn't actually
parse it (i.e. you don't use next_result() to go through the data). Is the
version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
parsing the output using "-readmethod => SearchIO" or "-readmethod => blast"
using your version of RemoteBlast and method next_result()? Like below (from
perldoc): 

while ( my @rids = $factory->each_rid ) {
  foreach my $rid ( @rids ) {
    my $rc = $factory->retrieve_blast($rid);
    if( !ref($rc) ) {
      if( $rc < 0 ) {
        $factory->remove_rid($rid);
      }
      print STDERR "." if ( $v > 0 );
      sleep 5;
    } else {
      # parsing starts here
      my $result = $rc->next_result();   # it should hang here
      # save the output
      my $filename = $result->query_name()."\.out";
      $factory->save_output($filename);
      $factory->remove_rid($rid);
      print "\nQuery Name: ", $result->query_name(), "\n";
      while ( my $hit = $result->next_hit ) {
        next unless ( $v > 0);
        print "\thit name is ", $hit->name, "\n";
        while( my $hsp = $hit->next_hsp ) {
          print "\t\tscore is ", $hsp->score, "\n";
        }
      }
    }
  }
}


My script hanged if I used next_result() in any way prior to the fixes. I
want to see how many others are having the same issues with parsing using
the CVS version of bioperl-live.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> Sent: Thursday, February 02, 2006 7:24 PM
> To: Huang Jian; bioperl-l
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> 
> Hi Huang,
> Thanks for the message. The older version of RemoteBlast.pm works on the
> logic of checking the temporary file size to determine whether the Blast
> results are ready. This condition is not getting satisfied may be due to
> some changes brought about by NCBI. I had this problem recently and
> figured out that the solution was to use the latest version which has
> this problem fixed (does not use file size logic any more) which is not
> yet included in the BioPerl package.
> Cheers
> Nagesh
> 
> Huang Jian wrote:
> 
> > Dear Nagesh,
> >
> > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send
> > me. Now it works perfectly!!!
> >
> > Thank you!!
> >
> > Huang
> >
> > ----- Original Message ----- From: "Nagesh Chakka"
> > 
> > To: "Huang Jian" ; "bioperl-l"
> > 
> > Sent: Friday, February 03, 2006 7:48 AM
> > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > via email
> >
> >
> >> Hi Huang,
> >> I see that you are submitting a sequence for a remote blast search. Can
> >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If
> >> not I have attached it with this email, try to replace it with the old
> >> one which has a bug.
> >> Let me know if it works.
> >> Nagesh
> >
> >
> >
> 
      
   
 


From gbazykin at Princeton.EDU  Fri Feb  3 20:38:04 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Fri, 3 Feb 2006 15:38:04 -0500
Subject: [Bioperl-l] proposed additions to Tree and cladogram
In-Reply-To: <148174979677.20051026172707@princeton.edu>
References: <148174979677.20051026172707@princeton.edu>
Message-ID: <8010525745.20060203153804@princeton.edu>

Hi all,

a while ago, I mailed to bioperl-l some proposed additions to
phylogeny-related modules (see below). I am doing a project on HIV
phylogeny now, and rely on these additions heavily. They expand on
what was already present in the corresponding modules. I expected them
to be also of general usage (at least the first one).

However, I never got any answer, so I assumed that these additions
were considered superfluous by most.

I am now working on an addition to Tree::Draw::Cladogram module. For
my project, I need to color individual tree edges (including internal)
into colors from red to blue (according to the nonsynonymous/synonymous
ratios of these branches). This should be technically easy (I guess I
will add -Rcolor, -Gcolor and -Bcolor tags to nodes and use them in
Cladogram to color preceding edges), but I have two questions:

    - will this add-on be of general interest - should I try to do it
    "the right way", updating the pods etc.;
    
    - in general, are there any guidelines about how specific an issue
    a method should address to be included in bioperl distribution?
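
The coloring step itself could look roughly like the sketch below. It is only an illustration under assumptions: the sub name ratio_to_rgb is made up, the gradient direction (low ratio blue, high ratio red) and the linear interpolation are arbitrary choices, and nothing here touches Tree::Draw::Cladogram. The three returned values are what would be stored under the proposed -Rcolor, -Gcolor and -Bcolor tags.

```perl
use strict;
use warnings;

# Hypothetical helper: map a ratio onto a blue-to-red gradient.
# Values outside [$min, $max] are clamped to the ends of the gradient.
# Returns (R, G, B), each in the range 0..255.
sub ratio_to_rgb {
    my ($ratio, $min, $max) = @_;
    die "max must exceed min\n" unless $max > $min;
    my $t = ($ratio - $min) / ($max - $min);
    $t = 0 if $t < 0;
    $t = 1 if $t > 1;
    my $red  = int(255 * $t + 0.5);   # round to the nearest integer
    my $blue = 255 - $red;
    return ($red, 0, $blue);
}

for my $ratio (0, 0.5, 1, 2) {
    printf "ratio %.1f -> (%d, %d, %d)\n", $ratio, ratio_to_rgb($ratio, 0, 2);
}
```

A logarithmic scale may suit such ratios better than a linear one, since values below and above 1 are not symmetric; that would only change how $t is computed.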

Thanks,
Yegor Bazykin



This is a forwarded message
From: Georgii Bazykin 
To: bioperl-l at bioperl.org
Date: Wednesday, October 26, 2005, 4:27:07 PM
Subject: suggestions for additions to Tree

===8<==============Original message text===============
Hi,

here are some tree-related methods I needed and added to my bioperl.
Hope someone else finds any of them useful as well.

Yegor Bazykin



=============================================
To NodeI:


# modified from total_branch_length in Tree:Tree module
# gets sum of branches in the subtree - descendents of given node

=head2 children_branch_length

 Title   : children_branch_length
 Usage   : my $size = $node->children_branch_length
 Function: Returns the sum of the length of all branches of the subtree which starts at given node
 Returns : integer
 Args    : none

=cut

sub children_branch_length {
   my ($self) = @_;
   
   return 0 if($self -> is_Leaf) ;

   my $sum = 0;

   for ($self -> get_all_Descendents) {
       $sum += $_->branch_length || 0;
   }

   return $sum;
}


-----------------------------------

=head2 height_nodes

 Title   : height_nodes
 Usage   : my $len = $node->height_nodes
 Function: Returns the height of the tree starting at this
           node, counted in nodes rather than branch length: the
           maximum number of steps needed to reach a tip.
 Returns : The longest path to a leaf, in nodes
 Args    : none

=cut

sub height_nodes{
   my ($self) = @_;
   
   return 0 if( $self->is_Leaf );

   my $max = 0;
   foreach my $subnode ( $self->each_Descendent ) { 
       my $s = $subnode->height_nodes + 1;
       if( $s > $max ) { $max = $s; }
   }
   return $max;
}



----------------------------------

=head2 get_all_Descendent_Leaves

 Title   : get_all_Descendent_Leaves($sortby)
 Usage   : my @nodes = $node->get_all_Descendent_Leaves;
 Function: Recursively fetch all the nodes and their descendents, only selecting leaves
           *NOTE* This is different from each_Descendent
 Returns : Array of Bio::Tree::NodeI objects
 Args    : $sortby [optional] "height", "creation" or coderef to be used
           to sort the order of children nodes.

=cut

sub get_all_Descendent_Leaves{
   my ($self, $sortby) = @_;
   $sortby ||= 'height';   
   my @nodes;
   foreach my $node ( $self->each_Descendent($sortby) ) {
       if ($node->is_Leaf) {
           push @nodes, $node;
       }
       else {
           # recurse into the leaves-only method so internal nodes are skipped
           push @nodes, ($node->get_all_Descendent_Leaves($sortby));
       }
   }
   return @nodes;
} 
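
To make the two recursive ideas above concrete (height counted in nodes, and collecting only leaves), here is a small standalone sketch. It deliberately avoids Bio::Tree and uses a plain hash as a toy tree, so the names %children, height_in_nodes and descendent_leaves are illustrative assumptions, not BioPerl API. Note that the leaf collector recurses into itself, so internal nodes never end up in the result.

```perl
use strict;
use warnings;

# Toy tree as a parent => [children] map; nodes without an entry are leaves.
#         root
#        /    \
#       A      B
#      / \
#     C   D
#        / \
#       E   F
my %children = (root => ['A', 'B'], A => ['C', 'D'], D => ['E', 'F']);

# Longest path from $node down to a leaf, counted in nodes (edges walked).
sub height_in_nodes {
    my ($children, $node) = @_;
    my $max = 0;
    for my $child (@{ $children->{$node} || [] }) {
        my $h = height_in_nodes($children, $child) + 1;
        $max = $h if $h > $max;
    }
    return $max;
}

# All leaves below $node; internal nodes are descended into, never returned.
sub descendent_leaves {
    my ($children, $node) = @_;
    my @leaves;
    for my $child (@{ $children->{$node} || [] }) {
        if ($children->{$child}) {
            push @leaves, descendent_leaves($children, $child);
        } else {
            push @leaves, $child;
        }
    }
    return @leaves;
}

print "height of root: ", height_in_nodes(\%children, 'root'), "\n";
print "leaves below root: ", join(',', descendent_leaves(\%children, 'root')), "\n";
```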

=====================================================
To Tree:

=head2 total_internal_branch_length

 Title   : total_internal_branch_length
 Usage   : my $size = $tree->total_internal_branch_length
 Function: Returns the sum of the length of all branches, excluding branches leading to leaves
 Returns : integer
 Args    : none

=cut

sub total_internal_branch_length {
   my ($self) = @_;
   my $sum = 0;
   if( defined $self->get_root_node ) {
       for ( $self->get_root_node->get_Descendents() ) {
           unless ($_->is_Leaf) {       # YB: THIS IS ALL I ADDED
               $sum += $_->branch_length || 0;
           }
       }
   }
   return $sum;
} 


=================================================

To TreeFunctionsI:

=head2 distance_nodes

 Title   : distance_nodes
 Usage   : distance_nodes(-nodes => \@nodes )
 Function: returns the distance between two given nodes in numbers of nodes
 Returns : numerical distance
 Args    : -nodes => arrayref of nodes to test

=cut


# YB: distance_nodes is very similar to distance method in TreeFunctionsI except that 
# it estimates distances between nodes in numbers of nodes (e.g., 1 between mother and 
# daughter, 2 between two sisters, etc.)


sub distance_nodes {
    my ($self,@args) = @_;
    my ($nodes) = $self->_rearrange([qw(NODES)],@args);
    if( ! defined $nodes ) {
        $self->warn("Must supply -nodes parameter to distance_nodes() method");
        return undef;
    }
    my ($node1,$node2) = $self->_check_two_nodes($nodes);
    # algorithm:

    # Find lca: Start with first node, find and save every node from it
    # to root, saving cumulative distance. Then start with second node;
    # for it and each of its ancestor nodes, check to see if it's in
    # the first node's ancestor list - if so it is the lca. Return sum
    # of (cumul. distance from node1 to lca) and (cumul. distance from
    # node2 to lca)

    # find and save every ancestor of node1 (including itself)

    my %node1_ancestors;        # keys are internal ids, values are objects
    my %node1_cumul_dist;       # keys are internal ids, values 
    # are cumulative distance from node1 to given node
    my $place = $node1;         # start at node1
    my $cumul_dist = 0;

    while ( $place ){
        $node1_ancestors{$place->internal_id} = $place;
        $node1_cumul_dist{$place->internal_id} = $cumul_dist;
        $cumul_dist++;                                                # YB
#YB     if ($place->branch_length) {
#YB         $cumul_dist += $place->branch_length; # include current branch
#YB                                               # length in next iteration
#YB     }
        $place = $place->ancestor;
    }

    # now climb up node2, for each node checking whether 
    # it's in node1_ancestors
    $place = $node2;  # start at node2
    $cumul_dist = 0;
    while ( $place ){
        foreach my $key ( keys %node1_ancestors ){ # ugh
            if ( $place->internal_id == $key){ # we're at lca
                return $node1_cumul_dist{$key} + $cumul_dist;
            }
        }
        # include current branch length in next iteration
#YB     $cumul_dist += $place->branch_length || 0; 
        $cumul_dist++;                                                 # YB
        $place = $place->ancestor;
    }
    $self->warn("Could not find distance!"); # should never execute, 
    # if so, there's a problem
    return undef;
}
===8<===========End of original message text===========
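
The node-counting distance in the forwarded message can be illustrated without BioPerl. The sketch below uses a toy child => parent hash (an assumption made only so the example runs on its own) and replaces the inner loop over all ancestor keys with a direct hash lookup. The name distance_in_nodes is hypothetical.

```perl
use strict;
use warnings;

# Toy tree as a child => parent map; the root has no entry.
#        root
#       /    \
#      A      B
#     / \
#    C   D
my %parent = (A => 'root', B => 'root', C => 'A', D => 'A');

sub distance_in_nodes {
    my ($parent, $n1, $n2) = @_;

    # Steps from $n1 up to each of its ancestors ($n1 itself is 0 steps).
    my %steps_from_n1;
    my $steps = 0;
    for (my $p = $n1; defined $p; $p = $parent->{$p}) {
        $steps_from_n1{$p} = $steps++;
    }

    # Climb from $n2; the first node already seen from $n1 is the LCA.
    $steps = 0;
    for (my $p = $n2; defined $p; $p = $parent->{$p}) {
        return $steps_from_n1{$p} + $steps if exists $steps_from_n1{$p};
        $steps++;
    }
    return;    # no common ancestor: the nodes are not in the same tree
}

print "C to A: ", distance_in_nodes(\%parent, 'C', 'A'), "\n";
print "C to D: ", distance_in_nodes(\%parent, 'C', 'D'), "\n";
print "C to B: ", distance_in_nodes(\%parent, 'C', 'B'), "\n";
```

This matches the convention stated in the comment above the method: 1 between mother and daughter, 2 between two sisters.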





From cjfields at uiuc.edu  Fri Feb  3 21:07:29 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 3 Feb 2006 15:07:29 -0600
Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
In-Reply-To: <20060203194450.792e8d4e@dogwood.plantbio.uga.edu>
Message-ID: <001a01c62905$d7ef0920$15327e82@pyrimidine>

I would say give the new code a try, but realize that it hasn't been checked
in (like I said below).  I will try going over the modified
Bio::SearchIO::blast again this weekend to see if there is anything I might
have missed.  The changed order in the header of BLAST text output has me a
bit worried that it might not catch everything, but it at least doesn't hang
in the while() loop I described in the bug report below (bug #1934) and
seems to process everything fine.

If you want more stability in the code, you might consider changing over to
XML output and parsing with Bio::SearchIO::blastxml.  There are some changes
in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML
output, but I believe it parses everything regardless.  If you look back the
last month or so there has been a bit of discussion here about it.  Jason
describes a bit on how to set up RemoteBlast for XML:

http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Friday, February 03, 2006 1:45 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> 
> Hi, Everybody,
> I see this post and am wondering if this is the reason for the
> malfunctioning of my webserver. We set up a webserver named MAK, for MITE
> sequence analysis. It was working very well until around November 2005,
> when it stopped returning any result (the site is fine and seems to be
> doing something after submission).  In the CGI script, I used RemoteBlast
> (that work was done in 2003) to do searches. I currently do not have access
> to the server because I moved. Quite a few people sent emails to us about
> its malfunctioning. Is there any suggestion on fixing the problem?  Should
> I simply ask that RemoteBlast.pm be replaced with the new version?
> Thanks a lot,
> Guojun
> 
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
>       _____
> 
>   From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-
> l at bioperl.org]
> Sent: Fri, 03 Feb 2006 10:45:23 -0500
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> 
> Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> will
> work for saving text output. However, it will not parse anything using
> next_result (it will likely hang) and will not save XML format. See these
> bugs:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> 
> for explanations and possible fixes (changes to RemoteBlast and
> Bio::SearchIO::blast). Note that these haven't been checked in yet so are
> still not included in bioperl-live; they may be further modified before
> committing to CVS. If you're not worried about XML, you could just try the
> first fix, which is a change to SearchIO::blast.
> 
> Nagesh, I remember you posting to the list a month ago using a script
> which
> had problems; the script you used saves the output but doesn't actually
> parse it (i.e. you don't use next_result() to go through the data). Is the
> version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
> parsing the output using "-readmethod => SearchIO" or "-readmethod =>
> blast"
> using your version of RemoteBlast and method next_result()? Like below
> (from
> perldoc):
> 
> while ( my @rids = $factory->each_rid ) {
>     foreach my $rid ( @rids ) {
>         my $rc = $factory->retrieve_blast($rid);
>         if( !ref($rc) ) {
>             if( $rc < 0 ) {
>                 $factory->remove_rid($rid);
>             }
>             print STDERR "." if ( $v > 0 );
>             sleep 5;
>         } else {                               # parsing starts here
>             my $result = $rc->next_result();   # it should hang here
>             # save the output
>             my $filename = $result->query_name()."\.out";
>             $factory->save_output($filename);
>             $factory->remove_rid($rid);
>             print "\nQuery Name: ", $result->query_name(), "\n";
>             while ( my $hit = $result->next_hit ) {
>                 next unless ( $v > 0);
>                 print "\thit name is ", $hit->name, "\n";
>                 while( my $hsp = $hit->next_hsp ) {
>                     print "\t\tscore is ", $hsp->score, "\n";
>                 }
>             }
>         }
>     }
> }
> 
> 
> My script hanged if I used next_result() in any way prior to the fixes. I
> want to see how many others are having the same issues with parsing using
> the CVS version of bioperl-live.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > Sent: Thursday, February 02, 2006 7:24 PM
> > To: Huang Jian; bioperl-l
> > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> >
> > Hi Huang,
> > Thanks for the message. The older version of RemoteBlast.pm works on the
> > logic of checking the temporary file size to determine whether the Blast
> > results are ready. This condition is not getting satisfied may be due to
> > some changes brought about by NCBI. I had this problem recently and
> > figured out that the solution was to use the latest version which has
> > this problem fixed (does not use file size logic any more) which is not
> > yet included in the BioPerl package.
> > Cheers
> > Nagesh
> >
> > Huang Jian wrote:
> >
> > > Dear Nagesh,
> > >
> > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28 you send
> > > me. Now it works perfectly!!!
> > >
> > > Thank you!!
> > >
> > > Huang
> > >
> > > ----- Original Message ----- From: "Nagesh Chakka"
> > > 
> > > To: "Huang Jian" ; "bioperl-l"
> > > 
> > > Sent: Friday, February 03, 2006 7:48 AM
> > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > > via email
> > >
> > >
> > >> Hi Huang,
> > >> I see that you are submitting a sequence for a remote blast search.
> Can
> > >> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09). If
> > >> not I have attached it with this email, try to replace it with the
> old
> > >> one which has a bug.
> > >> Let me know if it works.
> > >> Nagesh
> > >
> > >
> > >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 



From hlapp at gmx.net  Fri Feb  3 23:11:03 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 3 Feb 2006 15:11:03 -0800
Subject: [Bioperl-l] Documentation in the Bioperl package
In-Reply-To: 
References: 
Message-ID: 

Just to be sure, the wiki will be able to handle versions (releases)?
(documentation and APIs may change between releases and hence a more
recent doc page may not apply to an earlier release)

  -hilmar

On 2/3/06, Brian Osborne  wrote:
> bioperl-l,
>
> The recent work on the Bioperl Wiki moved much of the Bioperl documentation
> online. Since we cannot maintain 2 locations for all of this we?ll be
> removing a number of files from the package, specifically:
>
> biodatabases.pod
> biodesign.pod
> bioperl.pod
> bioscripts.pod
> doc/howto/*
> doc/faq/*
> FAQ
>
> Rest assured that all of these files have been gone over in detail to make
> sure that no important information was lost during the migration. All of
> this will be replaced by a single file, such as 'README.docs', that explains
> where all the documentation is. It's not entirely clear what will happen to
> bptutorial.pl. Moving its content to different online locations is possible
> but in this case we lose its functionality as a script.
>
> Are there any comments or questions or concerns?
>
> Brian O.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From hubert.prielinger at gmx.at  Fri Feb  3 22:47:37 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 03 Feb 2006 16:47:37 -0600
Subject: [Bioperl-l] standalone blast composition based statistics parameter
Message-ID: <43E3DD89.7080903@gmx.at>

Hi,
Does anybody know whether it is possible to perform a database search
with standalone blast with the composition-based statistics parameter
switched on, and what the abbreviation for that parameter is?

thanks
Hubert


From osborne1 at optonline.net  Sat Feb  4 03:32:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 03 Feb 2006 22:32:18 -0500
Subject: [Bioperl-l] Documentation in the Bioperl package
In-Reply-To: 
Message-ID: 

Hilmar,

MediaWiki supports such things as rollback based on date but it is not CVS
where an entire set of pages are tagged by version. It is also scriptable so
it may be possible to emulate this type of tagging by script, but I'm not
entirely sure (see WWW::Mediawiki::Client, Jason pointed this out to me).

So the simple answer is probably "no". But let's be honest: synchrony
between code and documentation wasn't achieved using the previous approach,
CVS, either. 

What Jason, Torsten, and I appreciated when adding content to this new site
was that it was relatively easy, our hope is that this approach will get
more people involved. The assumption is that more involvement will lead to
better documentation - Jason made this assumption when electing to move the
site to MediaWiki and I have to say that I completely agree with this
assumption.

Jason, any thoughts on this question? An interesting one...

Brian O.



On 2/3/06 6:11 PM, "Hilmar Lapp"  wrote:

> Just to be sure, the wiki will be able to handle versions (releases)?
> (documentation and APIs may change between releases and hence a more
> recent doc page may not apply to an earlier release)
> 
>   -hilmar
> 
> On 2/3/06, Brian Osborne  wrote:
>> bioperl-l,
>> 
>> The recent work on the Bioperl Wiki moved much of the Bioperl documentation
>> online. Since we cannot maintain 2 locations for all of this we'll be
>> removing a number of files from the package, specifically:
>> 
>> biodatabases.pod
>> biodesign.pod
>> bioperl.pod
>> bioscripts.pod
>> doc/howto/*
>> doc/faq/*
>> FAQ
>> 
>> Rest assured that all of these files have been gone over in detail to make
>> sure that no important information was lost during the migration. All of
>> this will be replaced by a single file, such as 'README.docs', that explains
>> where all the documentation is. It's not entirely clear what will happen to
>> bptutorial.pl. Moving its content to different online locations is possible
>> but in this case we lose its functionality as a script.
>> 
>> Are there any comments or questions or concerns?
>> 
>> Brian O.
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
> 
> 
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l





From shameer at ncbs.res.in  Sat Feb  4 10:15:33 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Sat, 4 Feb 2006 15:45:33 +0530 (IST)
Subject: [Bioperl-l] Calpha to Co-ordinates Program
In-Reply-To: <2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
References: <001001c62793$bef08f70$93656785@zhur>
	<2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
Message-ID: <47205.192.168.1.176.1139048133.squirrel@192.168.1.176>

Dear All,

Is anyone aware of a Perl script / BioPerl module that can be used to
construct full atomic coordinates of a protein from a given C(alpha) trace
and optimize the side chain geometry?

I tried the original program MaxSprout from Holm's group, but it is not
giving me proper results (I am getting errors like segmentation fault -
backbonchain failed etc.)

Since I need to use it as part of a web server, I would appreciate it if
anyone could let me know about a Perl script for the same.

Thanks and cheers in advance,
-- 
Mr. Shameer Khadar (JRF)
Dr. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241
F - 91-080-23636662/23636675
W - http://www.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."
MM




From torsten.seemann at infotech.monash.edu.au  Sun Feb  5 03:34:35 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Sun, 05 Feb 2006 14:34:35 +1100
Subject: [Bioperl-l] standalone blast composition based statistics
	parameter
In-Reply-To: <43E3DD89.7080903@gmx.at>
References: <43E3DD89.7080903@gmx.at>
Message-ID: <43E5724B.5070007@infotech.monash.edu.au>

Hubert,

> Does anybody know whether it is possible to perform a with the 
> standalone blast a database search where the composition based 
> statistics parameter is on
> and what's the abbreviation for the parameter

The StandAloneBlast only runs the "blastall" binary on your system. It 
accepts all the command line options (like "-d" etc.) that "blastall" 
does but just passes them as-is; it doesn't do anything special.

On a Unix system, type "blastall -" to list all the options that your 
BLAST binary supports.

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From fernan at iib.unsam.edu.ar  Sun Feb  5 04:34:27 2006
From: fernan at iib.unsam.edu.ar (Fernan Aguero)
Date: Sun, 5 Feb 2006 01:34:27 -0300
Subject: [Bioperl-l] standalone blast composition based statistics
	parameter
In-Reply-To: <43E3DD89.7080903@gmx.at>
References: <43E3DD89.7080903@gmx.at>
Message-ID: <20060205043427.GB39264@iib.unsam.edu.ar>

+----[ Hubert Prielinger  (03.Feb.2006 21:06):
|
| Hi,
| Does anybody know whether it is possible to perform a with the 
| standalone blast a database search where the composition based 
| statistics parameter is on
| and what's the abbreviation for the parameter
| 
| thanks
| Hubert
|
+----]

only for tblastn.

As Torsten said, 'blastall' with no arguments would have
revealed it: 

[ ... ]
  -C  Use composition-based statistics for tblastn:
      D or d: default (equivalent to F)
      0 or F or f: no composition-based statistics
      1 or T or t: Composition-based statistics as in NAR 29:2994-3005, 2001
      2: Composition-based score adjustment as in Bioinformatics 21:902-911,
          2005, conditioned on sequence properties
      3: Composition-based score adjustment as in Bioinformatics 21:902-911,
          2005, unconditionally
      For programs other than tblastn, must either be absent or be D, F or 0.
      [String]
    default = D

Fernan

PS: this is using the latest BLAST 2.2.13 (ncbi-toolkit 20051206)


From hubert.prielinger at gmx.at  Mon Feb  6 02:56:07 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sun, 05 Feb 2006 20:56:07 -0600
Subject: [Bioperl-l] standalone blast composition based
	statistics	parameter
In-Reply-To: <20060205043427.GB39264@iib.unsam.edu.ar>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar>
Message-ID: <43E6BAC7.5050707@gmx.at>

Hi,
thank you very much, If I use the tblastn instead of blastp, I get the 
following error message

[blastall] WARNING: : Unable to open nr.00.nin

I looked up in the folder, but I don't have that file, and if I download 
the database and extract the file, it isn't there either...

thanks

Hubert

Fernan Aguero wrote:

>+----[ Hubert Prielinger  (03.Feb.2006 21:06):
>|
>| Hi,
>| Does anybody know whether it is possible to perform a with the 
>| standalone blast a database search where the composition based 
>| statistics parameter is on
>| and what's the abbreviation for the parameter
>| 
>| thanks
>| Hubert
>|
>+----]
>
>only for tblastn.
>
>As Torsten said, 'blastall' with no arguments would have
>revealed it: 
>
>[ ... ]
>  -C  Use composition-based statistics for tblastn:
>      D or d: default (equivalent to F)
>      0 or F or f: no composition-based statistics
>      1 or T or t: Composition-based statistics as in NAR 29:2994-3005, 2001
>      2: Composition-based score adjustment as in Bioinformatics 21:902-911,
>          2005, conditioned on sequence properties
>      3: Composition-based score adjustment as in Bioinformatics 21:902-911,
>          2005, unconditionally
>      For programs other than tblastn, must either be absent or be D, F or 0.
>      [String]
>    default = D
>
>Fernan
>
>PS: this is using the latest BLAST 2.2.13 (ncbi-toolkit 20051206)
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>  
>



From torsten.seemann at infotech.monash.edu.au  Mon Feb  6 04:29:11 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 06 Feb 2006 15:29:11 +1100
Subject: [Bioperl-l] standalone blast composition
	based	statistics	parameter
In-Reply-To: <43E6BAC7.5050707@gmx.at>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar>
	<43E6BAC7.5050707@gmx.at>
Message-ID: <43E6D097.7080304@infotech.monash.edu.au>

Hubert

> thank you very much, If I use the tblastn instead of blastp, I get the 
> following error message
> [blastall] WARNING: : Unable to open nr.00.nin
> I looked up in the folder, but I don't have that file, and if I download 
> the database and extract the file, it isn't there either...

"tblastn" requires a NUCLEOTIDE database to search. It appears that you 
have specified a PROTEIN database with "-d nr" ("nr" is protein). You 
probably want to install the "nt" blast database and use that instead.
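
The program/database pairing behind this error can be written down as a small lookup. The sketch below is plain Perl and only assembles a blastall argument list without running anything; the helper names check_db and blastall_command and the file name query.fa are made up for illustration, while -p, -d and -i are the usual blastall options for program, database and query file.

```perl
use strict;
use warnings;

# Query and database sequence types expected by each classic BLAST program.
my %blast_types = (
    blastn  => { query => 'nucleotide', db => 'nucleotide' },
    blastp  => { query => 'protein',    db => 'protein'    },
    blastx  => { query => 'nucleotide', db => 'protein'    },
    tblastn => { query => 'protein',    db => 'nucleotide' },
    tblastx => { query => 'nucleotide', db => 'nucleotide' },
);

# Types of the two NCBI databases mentioned in this thread.
my %db_type = (nr => 'protein', nt => 'nucleotide');

# True if $db holds the kind of sequence that $program searches against.
sub check_db {
    my ($program, $db) = @_;
    my $want = $blast_types{$program} or die "unknown program '$program'\n";
    my $have = $db_type{$db}          or die "unknown database '$db'\n";
    return $want->{db} eq $have;
}

# Assemble (but do not run) a blastall command line.
sub blastall_command {
    my ($program, $db, $query_file) = @_;
    die "$program needs a $blast_types{$program}{db} database, but $db is $db_type{$db}\n"
        unless check_db($program, $db);
    return ('blastall', '-p', $program, '-d', $db, '-i', $query_file);
}

print join(' ', blastall_command('tblastn', 'nt', 'query.fa')), "\n";
```

Calling blastall_command('tblastn', 'nr', 'query.fa') dies with a message instead of producing a command that would fail later inside blastall.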

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/


From hubert.prielinger at gmx.at  Mon Feb  6 04:12:27 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Sun, 05 Feb 2006 22:12:27 -0600
Subject: [Bioperl-l] standalone blast
	composition	based	statistics	parameter
In-Reply-To: <43E6D097.7080304@infotech.monash.edu.au>
References: <43E3DD89.7080903@gmx.at>
	<20060205043427.GB39264@iib.unsam.edu.ar>	<43E6BAC7.5050707@gmx.at>
	<43E6D097.7080304@infotech.monash.edu.au>
Message-ID: <43E6CCAB.2060107@gmx.at>

dear torsten,
thanks for your quick reply, I have looked up at the ftp server and 
there are nt.00 to nt.04. Do I have to download all of them, are there 
differences?

thanks
Hubert


Torsten Seemann wrote:

>Hubert
>
>  
>
>>thank you very much, If I use the tblastn instead of blastp, I get the 
>>following error message
>>[blastall] WARNING: : Unable to open nr.00.nin
>>I looked up in the folder, but I don't have that file, and if I download 
>>the database and extract the file, it isn't there either...
>>    
>>
>
>"tblastn" requires a NUCLEOTIDE database to search. It appears that you 
>have specified a PROTEIN database with "-d nr" ("nr" is protein). You 
>probably want to install the "nt" blast database and use that instead.
>
>  
>



From torsten.seemann at infotech.monash.edu.au  Mon Feb  6 05:22:09 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 06 Feb 2006 16:22:09 +1100
Subject: [Bioperl-l] standalone blast
	composition	based	statistics	parameter
In-Reply-To: <43E6CCAB.2060107@gmx.at>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar>
	<43E6BAC7.5050707@gmx.at> <43E6D097.7080304@infotech.monash.edu.au>
	<43E6CCAB.2060107@gmx.at>
Message-ID: <43E6DD01.2010600@infotech.monash.edu.au>

Hubert

> thanks for your quick reply, I have looked up at the ftp server and 
> there are nt.00 to nt.04. Do I have to download all of them, are there 
> differences?

You have to download them all. The "nt" database (actually the index 
files) is very big, and it is split up into gigabyte (?) parts. Although 
they are called "nt.00" "nt.01" etc, you still pass "-d nt" to 
"blastall", because together these parts are one "nt" database. The 
"blastall" program will automatically use the separate parts; you do not 
have to join them.

You should read http://www.ncbi.nlm.nih.gov/BLAST/ to make sure you are 
using the correct BLAST search for your problem.

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From shameer at ncbs.res.in  Mon Feb  6 08:27:50 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon, 6 Feb 2006 13:57:50 +0530 (IST)
Subject: [Bioperl-l] Need a  slogan for OBF
In-Reply-To: <47205.192.168.1.176.1139048133.squirrel@192.168.1.176>
References: <001001c62793$bef08f70$93656785@zhur>
	<2d4f320602012326x1742a7d7u13ccd550f2d2e0e4@mail.gmail.com>
	<47205.192.168.1.176.1139048133.squirrel@192.168.1.176>
Message-ID: <2888.192.168.4.38.1139214470.squirrel@192.168.4.38>

Dear All,

As we are moving to the all new look wiki-style-web - why dont we think
about a unique logo +  slogan that can express our spirit and excitement
???

For Example we can have a logo with O|B|F its full form and the slogan -
any body is interested - i would be happy to design logos once we have
done with the logo.

I have a couple of suggestions -I hope all OBF members can sent much more
powerful slogans than mine

'Let's Code for Life'
'Let's Decode Life'
'Let's Recode Life'
'Code your Life '

Happy O|B|!!!
-- 
Mr. Shameer Khadar (JRF)
Dr. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241
F - 91-080-23636662/23636675
W - http://www.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."
MM




From olsonbr2 at msu.edu  Fri Feb  3 20:54:22 2006
From: olsonbr2 at msu.edu (Bradley J. S. C. Olson)
Date: Fri, 3 Feb 2006 15:54:22 -0500
Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter the
	method?
Message-ID: <005e01c62904$02b2ad30$db4c0a23@dihedral>

I have been working with the RemoteBlast.pm module and have found that it is
a bit clunky to use loops to keep checking to see if your RID has finished.

 

For example, every time you write a script, you need to add a code block
(see example in the documentation) in order to keep checking if @rid is
finished.

 

Would it be better to maybe write this in as a method in the RemoteBlast
module?  It seems like it would be better for remoteblast to have a method
we could call say retrieve_when_done that would return the blast report when
the value of retrieve_blast is no longer 0.

 

The only issue may be report parsing, but I wonder if it might be better to
separate out submittal/retrieval of BLAST requests from the parsing step and
make these more discrete processes?  Since NCBI seems to be not supporting
text results as a standard, maybe the module should work exclusively with
XML and we could change report handling away from the headaches of text
processing and just allow Bio::SeqIO or blastxml handle the task of making a
blast reports into different forms (such as HTML, text etc).

 

This would definitely simplify coding using the RemoteBlast.pm module as
then you could treat the report retrieval process as an object and just wait
for the object to return its value, instead of coding in a bunch of test
loops to see if it is done.  This may also help keep bugs out of the module
and make the module longer lasting and not require module users to rewrite
their code every time NCBI makes changes.

 

Any thoughts or ideas?

 

Is anyone working on this?

 

Thanks

 

Brad Olson
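
A rough sketch of what such a method might look like, written as a standalone function so it can be tried without network access. The name retrieve_when_done and the -interval and -max_tries options are hypothetical; the return-value convention of retrieve_blast (a reference when the report is ready, 0 while still running, negative on failure) is taken from the documented polling loop. MockFactory is a stand-in invented for this example and is not part of BioPerl.

```perl
use strict;
use warnings;

# Poll $factory->retrieve_blast($rid) until it returns a report object.
# Returns the report, or nothing if the job failed or we ran out of tries.
sub retrieve_when_done {
    my ($factory, $rid, %opt) = @_;
    my $interval  = defined $opt{-interval}  ? $opt{-interval}  : 5;
    my $max_tries = defined $opt{-max_tries} ? $opt{-max_tries} : 120;

    for my $try (1 .. $max_tries) {
        my $rc = $factory->retrieve_blast($rid);
        return $rc if ref $rc;              # report is ready
        if ($rc < 0) {                      # job failed on the server side
            $factory->remove_rid($rid);
            return;
        }
        sleep $interval if $interval > 0;   # still running: wait and retry
    }
    return;                                 # gave up after $max_tries polls
}

# Stand-in for a RemoteBlast factory so the sketch runs offline.
package MockFactory;
sub new {
    my ($class, @replies) = @_;
    return bless { replies => [@replies], removed => [] }, $class;
}
sub retrieve_blast { my ($self) = @_; return shift @{ $self->{replies} } }
sub remove_rid     { my ($self, $rid) = @_; push @{ $self->{removed} }, $rid; return }

package main;

# Two "not ready yet" replies, then a fake report.
my $factory = MockFactory->new(0, 0, { report => 'ready' });
my $report  = retrieve_when_done($factory, 'RID123', -interval => 0);
print "report retrieved\n" if $report;
```

With a real factory the caller would then hand the returned object to next_result() as usual; a cap on the number of polls keeps a stuck RID from blocking a script forever.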

 

 




From cjfields at uiuc.edu  Mon Feb  6 17:27:56 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 6 Feb 2006 11:27:56 -0600
Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter
	themethod?
In-Reply-To: <005e01c62904$02b2ad30$db4c0a23@dihedral>
Message-ID: <002c01c62b42$ab7671a0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Bradley J. S. C. Olson
> Sent: Friday, February 03, 2006 2:54 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter
> themethod?
> 
> I have been working with the RemoteBlast.pm module and have found that it
> is
> a bit clunky to use loops to keep checking to see if your RID has finished.
> 
> 
> 
> For example, every time you write a script, you need to add a code block
> (see example in the documentation) in order to keep checking if @rid is
> finished.
> 
> Would it be better to maybe write this in as a method in the RemoteBlast
> module?  It seems like it would be better for remoteblast to have a method
> we could call say retrieve_when_done that would return the blast report
> when
> the value of retrieve_blast is no longer 0.

Sounds reasonable, though I'm not sure how easy it would be to implement.
Why not drop by Bugzilla (http://bugzilla.bioperl.org/) and submit this as
an enhancement?

> The only issue may be report parsing, but I wonder if it might be better
> to
> separate out submittal/retrieval of BLAST requests from the parsing step
> and
> make these more discrete processes?  Since NCBI seems to be not supporting
> text results as a standard, maybe the module should work exclusively with
> XML and we could change report handling away from the headaches of text
> processing and just allow Bio::SeqIO or blastxml handle the task of making
> a
> blast reports into different forms (such as HTML, text etc).

They are separated.  RemoteBlast executes BLAST remotely (via HTTP).
Results are parsed via various Bio::SearchIO modules depending on what you
set '-readmethod' to.  This is from perldoc:

From Bio::Tools::Run::RemoteBlast
________________________________________________________

DESCRIPTION
    Class for remote execution of the NCBI Blast via HTTP.

    For a description of the many CGI parameters see:
    http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html

    Various additional options and input formats are available.

________________________________________________________

From Bio::SearchIO
________________________________________________________

DESCRIPTION
    This is a driver for instantiating a parser for report files from
    sequence database searches. This object serves as a wrapper for the
    format parsers in Bio::SearchIO::* - you should not need to ever use
    those format parsers directly. (For people used to the SeqIO system,
    we are deliberately using the same pattern.)

    Once you get a SearchIO object, calling next_result() gives you back a
    Bio::Search::Result::ResultI compliant object, which is an object that
    represents one Blast/Fasta/HMMER whatever report.

    A list of module names and formats is below:

      blast      BLAST (WUBLAST, NCBIBLAST,bl2seq)
      fasta      FASTA -m9 and -m0
      blasttable BLAST -m9 or -m8 output (NCBI not WUBLAST tabular)
      megablast  MEGABLAST
      psl        UCSC PSL format
      waba       WABA output
      axt        AXT format
      sim4       Sim4
      hmmer      HMMER hmmpfam and hmmsearch
      exonerate  Exonerate CIGAR and VULGAR format
      blastxml   NCBI BLAST XML
      wise       Genewise -genesf format

    See the SearchIO HOWTO linked from http://bioperl.org/HOWTOs/

________________________________________________________

This is also in the wiki online now:

http://www.bioperl.org/wiki/Module:Bio::SearchIO 
http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast

I think the current line of thought is to make XML the default, but I also
know you would irritate a LOT of people out there by cutting off text output
parsing completely.  Roger Hall or Jason pointed out that doing so will
break many scripts out there.  

Furthermore, the problems with text output parsing are usually minimal.  For
instance, the last one was a small change which broke a regex, causing an
infinite loop; the actual bug was in Bio::SearchIO::blast and not in
RemoteBlast.  A simple addition to the regex fixed it.  The only change to
RemoteBlast was to implement the option of saving XML formatted BLAST
output.

I do like the idea of using XML output to build custom (bioperl-specific)
BLAST reports, but that also requires more work, likely a lot more work.
Again, maybe add that as an enhancement in Bugzilla or, better yet, submit
some sample code maybe as an example.  

> This would definitely simplify coding using the RemoteBlast.pm module,
> as then you could treat the report retrieval process as an object and just
> wait for the object to return its value, instead of coding in a bunch of
> test loops to see if it is done.  This may also help keep bugs out of the
> module and make the module longer lasting and not require module users to
> rewrite their code every time NCBI makes changes.

I think the most stable way of submitting jobs is by using the netblast
client (blastcl3) and parsing the results from that.  No CGI, no HTML, just
saving to a temp file and parsing through SearchIO.

RemoteBlast was designed, I believe, with the idea of letting researchers
with some basic knowledge of perl use an interface familiar to them (i.e.
the BLAST interface at NCBI) and retrieve results on a regular basis.  The
results are parsed via SearchIO::blast/blastxml/blasttable.  The problem is,
though convenient, RemoteBlast is also reliant on the powers that be at NCBI
not changing anything dramatically.  It is possible that NCBI could modify
the HTML code from the BLAST retrieval process, thus breaking RemoteBlast.
Text output could change again, even more dramatically, thus severely
breaking Bio::SearchIO::blast.  Thus, we adapt to those changes by modifying
the broken modules.  It's evolution at its finest.  It's also a fact of life
that code breaks and needs to be fixed every once in a while to stay
current.

Okay, I'm waxing philosophical now so I know I've definitely had too much
coffee.  Must get back to work...

> 
> Any thoughts or ideas?
> 
> Is anyone working on this?
> 
> Thanks
> 
> Brad Olson


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign



From roger at iosea.com  Mon Feb  6 18:14:11 2006
From: roger at iosea.com (Roger Hall)
Date: Mon, 6 Feb 2006 12:14:11 -0600
Subject: [Bioperl-l] RemoteBlast.pm getting RID requests-make/alter
	the	method?
In-Reply-To: <005e01c62904$02b2ad30$db4c0a23@dihedral>
Message-ID: <000f01c62b49$25732d30$4301a8c0@LIBERAL>

Brad,

I decided to fix this module about ten days ago, and then was out all of
last week with Strep plus a virus or two - it's one of the advantages of
having young kids.

I see that there have been quite a few messages about this module in just
the last week. I am sitting down now to read through them.

I'll get back to you (and the list) ASAP.

If you have any other questions or suggestions about RemoteBlast, feel free
to bug me with 'em. 

Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock





From barry.m.dancis at gsk.com  Mon Feb  6 17:17:13 2006
From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com)
Date: Mon, 6 Feb 2006 12:17:13 -0500
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: <003701c625c4$5527d790$2f01a8c0@GOLHARMOBILE1>
Message-ID: 

Hi --

        Are there any classes for manipulating miRNA's with functions such 
as parsing the name, storing and interlinking pri/pre/mat sequences, etc?

Thanks,

Barry


From hubert.prielinger at gmx.at  Mon Feb  6 23:16:01 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 06 Feb 2006 17:16:01 -0600
Subject: [Bioperl-l] no results with standalone tblastn
In-Reply-To: <43E6DD01.2010600@infotech.monash.edu.au>
References: <43E3DD89.7080903@gmx.at>
	<20060205043427.GB39264@iib.unsam.edu.ar>	<43E6BAC7.5050707@gmx.at>
	<43E6D097.7080304@infotech.monash.edu.au>	<43E6CCAB.2060107@gmx.at>
	<43E6DD01.2010600@infotech.monash.edu.au>
Message-ID: <43E7D8B1.5030307@gmx.at>

Dear Torsten,
I have downloaded all the databases, as you recommended. The search now
runs, but I don't get any results; if I try it online it works fine.
My result file looks like this:

TBLASTN 2.2.13 [Nov-27-2005]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query=
         (8 letters)

Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
           3,749,503 sequences; 16,556,997,203 total letters

Searching..................................................done

                                                                
                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value



The program code for it looks like this:

#!/usr/local/bin/perl -w
BEGIN {
    $ENV{BLASTDIR}     = "/home/Hubert/blast/blast-2.2.13/bin";
    $ENV{BLASTDATADIR} = "/home/Hubert/blast/blast-2.2.13/data";
}

use Bio::Tools::Run::StandAloneBlast;
use Bio::Seq;
use Bio::SeqIO;
use strict;

print "Please insert matrix:\t";
my $matrix_STD = <STDIN>;
chomp $matrix_STD;

print "Please insert count:\t";
my $count_STD = <STDIN>;
chomp $count_STD;



# parameters
my $expect_value = 20000;
#my $filter_query_sequence = 'T';
my $one_line_description = 1000;
my $alignments = 1000;
#my $matrix = 'BLOSUM80';
my $gapcost = 10;
my $gapextend = 1;
my $wordsize = 2;
#my $compbasedStat = '1';
#my $count = 1;
# my $strands = 1;

my @params = ('program' => 'tblastn','database' => 'nt');
#my $progress_interval = 100;


my $seqio_obj = Bio::SeqIO->new(
  -file   => "aloneblosum62.txt",
  -format => "raw",
);

# create factory object and set parameters

my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);
print "submitted parameters successfully \n";

$factory->e($expect_value);
#$factory->F($filter_query_sequence);
$factory->v($one_line_description);
$factory->b($alignments);
$factory->M($matrix_STD);
$factory->G($gapcost);
$factory->E($gapextend);
$factory->W($wordsize);
#$factory->C($compbasedStat);
#$factory->S($strands);

print "changed parameters successfully \n";
print "\n";


# get query

while ( my $query = $seqio_obj->next_seq ) {
    print "entered while loop \n";
    # set the output file before running, so this query's report lands in it
    $factory->outfile("nucleo80$count_STD.txt");
    my $blast_report = $factory->blastall($query);
#    print "$blast_report\n";
    $count_STD++;
    print $query->seq;
    print "\n";
}



thanks
Hubert



Torsten Seemann wrote:

> Hubert
>
>> thanks for your quick reply, I have looked up at the ftp server and
>> there are nt.00 to nt.04. Do I have to download all of them, are there
>> differences?
>
> You have to download them all. The "nt" database (actually the index
> files) is very big, and it is split up into gigabyte (?) parts. Although
> they are called "nt.00" "nt.01" etc, you still pass "-d nt" to
> "blastall", because together these parts are one "nt" database. The
> "blastall" program will automatically use the separate parts; you do not
> have to join them.
>
> You should read http://www.ncbi.nlm.nih.gov/BLAST/ to make sure you are
> using the correct BLAST search for your problem.



From torsten.seemann at infotech.monash.edu.au  Tue Feb  7 02:17:40 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 07 Feb 2006 13:17:40 +1100
Subject: [Bioperl-l] no results with standalone tblastn
In-Reply-To: <43E7D8B1.5030307@gmx.at>
References: <43E3DD89.7080903@gmx.at> <20060205043427.GB39264@iib.unsam.edu.ar>
	<43E6BAC7.5050707@gmx.at> <43E6D097.7080304@infotech.monash.edu.au>
	<43E6CCAB.2060107@gmx.at> <43E6DD01.2010600@infotech.monash.edu.au>
	<43E7D8B1.5030307@gmx.at>
Message-ID: <43E80344.5090207@infotech.monash.edu.au>


> I have downloaded all the databases, as you recommended me. And it is 
> working, but I don't get any results, if I try it online it works fine.
> my result file looks like that:
> 
> TBLASTN 2.2.13 [Nov-27-2005]
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
> Query=
>          (8 letters)
> Database: All GenBank+EMBL+DDBJ+PDB sequences (but no EST, STS,
> GSS,environmental samples or phase 0, 1 or 2 HTGS sequences)
>            3,749,503 sequences; 16,556,997,203 total letters
> Searching..................................................done
>                                                                  Score    E
> Sequences producing significant alignments:                      (bits) Value

Is your query only 8 amino acids long?

This report looks like it did have alignments that were not displayed, 
otherwise it would print "**** No hits ****".

This mailing list is not here to solve your BLAST problems unless it is 
a problem with the Perl module running BLAST.

You first need to try and get your problem working on the command line 
*without* Perl. eg.

/home/Hubert/blast/blast-2.2.13/bin/blastall -p tblastn -d nt -i 
YOUR_FASTA_FILE_WITH_SEQUENCE_IN_IT -o OUTPUT_FILE.txt -e 0.001
...

where "..." is the rest of the options you are setting in your Perl 
script. If it doesn't work that way, it will never work in Perl.
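
As a rough aid for that comparison, the sketch below turns the settings from
the Perl script into a blastall argument list that can be printed and pasted
into a shell. The helper name build_blastall_args, the file names and the
BLOSUM62 value are placeholders invented for this example; the single-letter
flags are the ones already used in this thread.

```perl
use strict;
use warnings;

# Translate a hash of single-letter blastall options into an argument list.
# build_blastall_args is a hypothetical helper, not part of BioPerl.
sub build_blastall_args {
    my (%opt) = @_;
    my @args;
    for my $flag (sort keys %opt) {          # sorted only to get a stable order
        next unless defined $opt{$flag};
        push @args, "-$flag", $opt{$flag};
    }
    return @args;
}

my @args = build_blastall_args(
    p => 'tblastn', d => 'nt',
    i => 'query.fasta',                      # placeholder input file
    o => 'output.txt',                       # placeholder output file
    e => 20000, v => 1000, b => 1000,        # values copied from the script
    M => 'BLOSUM62',                         # placeholder matrix name
    G => 10, E => 1, W => 2,
);
print join(' ', 'blastall', @args), "\n";

# To run it directly instead of printing:
#   system('/path/to/blastall', @args) == 0 or die "blastall failed: $?";
```

If the printed command gives the same empty report in the shell, the issue
lies with the search parameters or database rather than with the Perl module.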

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/


From rahall2 at ualr.edu  Tue Feb  7 02:46:44 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Mon, 6 Feb 2006 20:46:44 -0600
Subject: [Bioperl-l] RemoteBlast users - potentially major changes - please
	reply
Message-ID: <002001c62b90$bb9dbe00$4301a8c0@LIBERAL>

To everyone who uses RemoteBlast.pm:

Would anyone object to RemoteBlast being rewritten in a way that requires
NCBI's blastcl3 executable?

Binary downloads of blastcl3 (column "netblast") are available for numerous
platforms at: http://ncbi.nih.gov/BLAST/download.shtml

Does anyone require or desire a "pure perl" implementation? If so, please
explain the advantage you see with such an implementation.

Thanks!

Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock
(501) 569-8074



From osborne1 at optonline.net  Tue Feb  7 17:05:56 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 07 Feb 2006 12:05:56 -0500
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: 

Barry,

If the sequence information is in one of the formats that Bioperl
understands (Genbank, Swissprot flat, and so on) then the answer is yes.
This assumes that the details on sequence that you mentioned are found in
some sequence feature section in the file. But it looks to me like there's
no specialized parser for miRNA sequence per se; I'll be corrected if I'm
wrong.

Brian O.


On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com"  wrote:

> Hi --
> 
>         Are there any classes for manipulating miRNA's with functions such
> as parsing the name, storing and interlinking pri/pre/mat sequences, etc?
> 
> Thanks,
> 
> Barry




From barry.m.dancis at gsk.com  Tue Feb  7 20:26:27 2006
From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com)
Date: Tue, 7 Feb 2006 15:26:27 -0500
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: 

It's the parser in particular that I need




From deep.raman at gmail.com  Tue Feb  7 20:16:48 2006
From: deep.raman at gmail.com (Raman Deep Singh)
Date: Wed, 8 Feb 2006 01:46:48 +0530
Subject: [Bioperl-l] Needed help
Message-ID: 

Hi all
     I have a huge task of retrieving a number of sequences from the
Swiss-Prot database based on some fixed criteria. For that I want to index
the Swiss-Prot database on my local disk. I have downloaded the whole
Swiss-Prot database (the January 2006 release) to my local disk.

  I am currently using BioPerl on a Linux machine. I am using the
code listed below


=======================

    use Bio::Index::Swissprot;

    my $Index_File_Name = shift;
    my $inx = Bio::Index::Swissprot->new('-filename' => $Index_File_Name,
 '-write_flag' => 'WRITE');
    $inx->make_index(@ARGV);
-----------------------------------------
    # Print out several sequences present in the index
    # in gcg format
    use Bio::Index::Swissprot;
    use Bio::SeqIO;

    my $out = Bio::SeqIO->new( '-format' => 'gcg', '-fh' => \*STDOUT );
    my $Index_File_Name = shift;
    my $inx = Bio::Index::Swissprot->new('-filename' => $Index_File_Name);

    foreach my $id (@ARGV) {
        my $seq = $inx->fetch($id); # Returns Bio::Seq object
        $out->write_seq($seq);
    }

    # alternatively

    my $seq1 = $inx->get_Seq_by_id($id);
    my $seq2 = $inx->get_Seq_by_acc($acc);


-- -------------------------------
I am running the script as

 perl getseqfromid.pl sample.dat

from the shell

and I am getting this error repeatedly

------------- EXCEPTION  -------------
MSG: Can't open 'DB_File' dbm file 'swiss100.dat' : No such file or directory
STACK Bio::Index::Abstract::open_dbm
/usr/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:389
STACK Bio::Index::Abstract::new
/usr/lib/perl5/site_perl/5.8.5/Bio/Index/Abstract.pm:150
STACK Bio::Index::AbstractSeq::new
/usr/lib/perl5/site_perl/5.8.5/Bio/Index/AbstractSeq.pm:91
STACK toplevel i.pl:6


--------------------------
Somewhere online I also found a document saying that some variables
need to be exported. I did that as well but still got the same errors.

Kindly help
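
One possible reading of the exception above is that the index file is being
opened for reading before it has been built. The sketch below separates the
two steps and fails early with a clearer message. The wrapper name
open_swissprot_index and the file names in the comments are invented for
this example; the Bio::Index::Swissprot calls are the ones already used in
the script above, and the existence check assumes the DB_File backend named
in the error, which stores the index in a single file.

```perl
use strict;
use warnings;

# Open a Swissprot index either for building ('write') or for lookups ('read').
# open_swissprot_index is a hypothetical wrapper, not part of BioPerl.
sub open_swissprot_index {
    my ($index_file, $mode) = @_;
    my @args = (-filename => $index_file);
    if ($mode eq 'write') {
        push @args, (-write_flag => 'WRITE');
    }
    elsif (!-e $index_file) {
        # Fail early with a clearer message than the DB_File exception.
        die "Index '$index_file' not found; build it first in write mode\n";
    }
    require Bio::Index::Swissprot;    # loaded lazily; needs BioPerl installed
    return Bio::Index::Swissprot->new(@args);
}

# Step 1 (run once):  my $inx = open_swissprot_index('swiss.idx', 'write');
#                     $inx->make_index('/path/to/swissprot.dat');
# Step 2 (lookups):   my $inx = open_swissprot_index('swiss.idx', 'read');
#                     my $seq = $inx->fetch($id);
```

Keeping the build step and the lookup step in separate scripts, each taking
the index file name as its first argument, avoids mixing up the index name
and the data file name on the command line.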




Ramandeep Singh



From cjfields at uiuc.edu  Tue Feb  7 22:40:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Feb 2006 16:40:15 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: <007701c62c37$7914af60$15327e82@pyrimidine>

Are you talking about sequences or text output from a specific program?  If
you are talking about sequences in a particular format, then listen to
Brian.  If you are talking about output, then we need to know which program
you're using, as a parser may exist or could be built.  

There are a few modules in Bio::Tools that handle RNA (like QRNA,
tRNAscan-SE), so check those out first.  I'm currently finishing up a
Bio::Tools module for RNAMotif and have plans for making an ERPIN parser.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> barry.m.dancis at gsk.com
> Sent: Tuesday, February 07, 2006 2:26 PM
> To: bioperl-l; bioperl-l-bounces at lists.open-bio.org
> Subject: Re: [Bioperl-l] Handling miRNA's
> 
> It's the parser in particular that I need



From cjfields at uiuc.edu  Tue Feb  7 23:06:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Feb 2006 17:06:21 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: <000001c62c3b$1c6017b0$15327e82@pyrimidine>

Sorry if this gets posted twice.

Are you talking about sequences or text output from a specific program?  If
you are talking about sequences in a particular format, then Brian's right.
If you are talking about output, then we need to know which program you're
using, as a parser may exist, or probably could be built from an existing
one.

There are a few modules in Bio::Tools that handle RNA (like QRNA,
tRNAscan-SE), so check those out first.  I'm currently finishing up a
Bio::Tools module for RNAMotif and have plans for making an ERPIN parser.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> barry.m.dancis at gsk.com
> Sent: Tuesday, February 07, 2006 2:26 PM
> To: bioperl-l; bioperl-l-bounces at lists.open-bio.org
> Subject: Re: [Bioperl-l] Handling miRNA's
> 
> It's the parser in particular that I need



From paul.boutros at utoronto.ca  Wed Feb  8 01:38:42 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Tue,  7 Feb 2006 20:38:42 -0500
Subject: [Bioperl-l] (no subject)
Message-ID: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>

Hi Roger,

I would definitely prefer a fully Perl-based implementation.  For starters, I have not 
been successful in compiling the Toolkit that contains netblast for some platforms (e.g. 
AIX 5.2 w/gcc 4.0).

I haven't been following the discussion: is there some compelling reason to prefer a 
netblast-based system that's come up recently?  I'm guessing that adding a new non-perl 
dependency would only be done if there was considerable justification for this type of 
change, but I'm not clear from your message what that justification is.

Paul








From cjfields at uiuc.edu  Wed Feb  8 04:52:36 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 7 Feb 2006 22:52:36 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
Message-ID: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>

I want to submit a module for parsing RNAMotif output  
(Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning  
output and returning Bio::SeqFeature::Generic objects with added tags  
for descriptors/sequences/file info.  I'm in the process of writing  
up tests and going through biodesign to make sure everything's  
kosher, but the module itself is essentially ready-to-go.  What  
should I do next?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From rahall2 at ualr.edu  Wed Feb  8 05:16:44 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Tue, 7 Feb 2006 23:16:44 -0600
Subject: [Bioperl-l] RemoteBlast  [was: (no subject)]
In-Reply-To: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>
Message-ID: <004401c62c6e$da906a40$4301a8c0@LIBERAL>

Paul,

I think that most core Bioperl folks have long since moved away from
RemoteBlast and are using the functionality in StandAloneBlast to run their
own local servers. More importantly, they are, in general, researchers who
are coming to Bioinformatics from the life sciences side, and are
particularly tired of dealing with the technical issues that RemoteBlast
consistently generates due to changes in the text-formatted BLAST reports. 

They aren't code-for-code-sake geeks like me. ;}

When RemoteBlast was written, XML was barely on the technology radar, and
XML-formatted BLAST reports weren't even available. It seems that everyone
recognizes that the XML reports now generated by NCBI's blast server are the
wave of the future, but I think there is still some concern that not every
flavor of BLAST produces XML yet. Even so, the XML parser is considered to
be very strong, and only helps hasten the end of text-formatted support,
since parsing text-formatted reports is the primary source of pain. 

In discussing the shift from old to new, I think the idea of relying on
NCBI's application (and NCBI's issue system and NCBI's developers) entered
the realm of possibility, so as the guy who just showed up to adopt
RemoteBlast, I am trying to air all options and beg for all requirements. 

Personally, I am okay with the idea of maintaining text-formatted report
parsing, but like I said, I'm pound foolish about code sometimes. Additional
foolishness arises from the fact that the first money I earned in
Bioinformatics was on a contract gig where I relied on RemoteBlast (and the
related text parsers).

For my money, I just needed anyone, anywhere, to say they desired a pure
perl implementation to meet my personal threshold. So far, you're the
second. ;}

I do, however, see the advantage in shifting to XML-formatted reporting and
parsing *only* as soon as every BLAST flavor supports it, if not before.
(Anyone - is this still an issue? Please educate me.)

At the moment, I'm leaning towards adding an option to RemoteBlast. The
default (no option) would use a "pure perl" implementation, and the
enhancement (with explicit option) would merely wrap the NCBI executable.
However, there are other issues (queuing, batches) that I don't fully
understand in context, so I haven't zeroed in on a complete recommendation
yet. Additionally, the end of text-formatted reports, while drawing near, is
not yet agreed, although it is pretty clear that the only way text support
will be continued is if I insist on it and then deliver the support myself.
:}

In any case, I am very interested in a pure perl implementation for exactly
the two reasons stated thus far: it's one less thing for a newbie to worry
about, and it will run on every platform that runs perl. 

Thanks much for the input!

Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock
(501) 569-8074








From heikki at sanbi.ac.za  Wed Feb  8 06:53:58 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 8 Feb 2006 08:53:58 +0200
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
Message-ID: <200602080853.58889.heikki@sanbi.ac.za>

Chris,

Post your files to bugzilla (ticket type enhancement, add files to ticket 
after creation)  and someone with commit ability will add them to CVS once 
the code is in satisfactory condition. 

Thanks,

	-Heikki

On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> I want to submit a module for parsing RNAMotif output
> (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning
> output and returning Bio::SeqFeature::Generic objects with added tags
> for descriptors/sequences/file info.  I'm in the process of writing
> up tests and going through biodesign to make sure everything's
> kosher, but the module itself is essentially ready-to-go.  What
> should I do next?
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From hlapp at gmx.net  Wed Feb  8 05:48:40 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 7 Feb 2006 21:48:40 -0800
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
Message-ID: 

I presume you don't have a cvs write account yet - if you do just add
and commit the module and test. Otherwise could you post the POD to
the list please; either somebody with an account will hopefully
volunteer or Jason or I or Heikki or Aaron will assume mentorship and
commit the code with feedback to you. Unless you completely refuse to
heed any and all advice ;) that person will then soon try to absolve
him/herself of having to do this again for you and support you for
receiving a cvs write account of your own.

   -hilmar

On 2/7/06, Chris Fields  wrote:
> I want to submit a module for parsing RNAMotif output
> (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning
> output and returning Bio::SeqFeature::Generic objects with added tags
> for descriptors/sequences/file info.  I'm in the process of writing
> up tests and going through biodesign to make sure everything's
> kosher, but the module itself is essentially ready-to-go.  What
> should I do next?
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From cjfields at uiuc.edu  Wed Feb  8 12:57:46 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 06:57:46 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: 
References: <7A0355B9-303E-4324-A21A-61484D8627EE@uiuc.edu>
	
Message-ID: 

I'll probably go with Heikki's advice and post the module (with
POD, tests, and test file) to bugzilla as an enhancement.  That way
it can be looked through before committing.  I will likely have a few
more modules for ERPIN and maybe Infernal in the next few months (if
I can get it up and running).

Also, completely off-topic, I'll post what I have written up for
installing bioperl-db on WinXP here soon.  I think it should probably
be included in the wiki in some way, maybe as a link from the
bioperl-db wiki page.

Thanks Hilmar, Heikki!

Chris


On Feb 7, 2006, at 11:48 PM, Hilmar Lapp wrote:

> I presume you don't have a cvs write account yet - if you do just add
> and commit the module and test. Otherwise could you post the POD to
> the list please; either somebody with an account will hopefully
> volunteer or Jason or I or Heikki or Aaron will assume mentorship and
> commit the code with feedback to you. Unless you completely refuse to
> heed any and all advice ;) that person will then soon try to absolve
> him/herself of having to do this again for you and support you for
> receiving a cvs write account of your own.
>
>    -hilmar
>
> On 2/7/06, Chris Fields  wrote:
>> I want to submit a module for parsing RNAMotif output
>> (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning
>> output and returning Bio::SeqFeature::Generic objects with added tags
>> for descriptors/sequences/file info.  I'm in the process of writing
>> up tests and going through biodesign to make sure everything's
>> kosher, but the module itself is essentially ready-to-go.  What
>> should I do next?
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From cjfields at uiuc.edu  Wed Feb  8 15:32:25 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 09:32:25 -0600
Subject: [Bioperl-l] RemoteBlast  [was: (no subject)]
In-Reply-To: <004401c62c6e$da906a40$4301a8c0@LIBERAL>
Message-ID: <000401c62cc4$de0cc9b0$15327e82@pyrimidine>

Roger, 

It might be better to build a wrapper for the blastcl3 and make it a
separate Bio::Tools::Run module, maybe branch it off from RemoteBlast or,
better yet, StandAloneBlast.  All the put/get parameters in the BEGIN{}
block for RemoteBlast look like they are configured for NCBI's HTTP
submission via CGI; I don't think you can use these for blastcl3.  Ergo,
you'll have to create a whole new set of hashes or parameter arrays inside
RemoteBlast just for blastcl3 since everything is passed via command-line
flags, like so (from http://www.ncbi.nlm.nih.gov/blast/docs/netblast.html):

blastcl3 -p blastp -d nr -i MY_QUERY -o MY_QUERY.out

However, StandAloneBlast looks like it has all the parameters mapped out in
the BEGIN{} block.  And it looks like the command line options support just
about everything you get via the web version.  It probably wouldn't take
much modification from StandAloneBlast to get it to run blastcl3.

As for queueing, I don't think it's supported, though you can send in a
FASTA file with multiple sequences for multiple BLAST queries (I tried this
and it works).  You could also create a queue using a sequence factory,
sending them to the netblast client one at a time, though I'd suggest
putting a delay in between cycles in that case so as not to make the guys at
NCBI cranky.
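
The one-at-a-time queue with a delay could be sketched roughly as follows.
This is only an illustration, not an existing Bioperl module: the sub names
blastcl3_command and run_queue are made up, blastcl3 is assumed to be on the
PATH, and only the -p, -d, -i and -o flags from the netblast example above
are used. The runner is passed in as a code ref so the loop can be tried out
without contacting NCBI.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the argument list for one blastcl3 run. The -p/-d/-i/-o flags are the
# ones shown in the netblast example; the sub itself is hypothetical.
sub blastcl3_command {
    my (%arg) = @_;
    my $exe = $arg{exe} || 'blastcl3';    # assumed to be on the PATH
    return ($exe,
            '-p', $arg{program},
            '-d', $arg{database},
            '-i', $arg{infile},
            '-o', $arg{outfile});
}

# Run a list of query files one at a time, pausing between submissions.
# $runner is a code ref so the loop can be exercised without the executable;
# it must return true on success. Returns the input files that failed.
sub run_queue {
    my ($files, $delay, $runner) = @_;
    $runner ||= sub { system(@_) == 0 };
    my @failed;
    for my $i (0 .. $#$files) {
        my $in  = $files->[$i];
        my @cmd = blastcl3_command(program  => 'blastp',
                                   database => 'nr',
                                   infile   => $in,
                                   outfile  => "$in.out");
        push @failed, $in unless $runner->(@cmd);
        sleep $delay if $delay && $i < $#$files;    # pause between queries
    }
    return @failed;
}

# Dry run: print the commands instead of executing them.
my @failed = run_queue([ 'seq1.fa', 'seq2.fa' ], 0,
                       sub { print join(' ', @_), "\n"; 1 });
print "failed: @failed\n" if @failed;
```

Dropping the third argument makes run_queue call system() on the real
executable; a delay of several seconds between queries seems a reasonable
starting point.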

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Roger Hall
> Sent: Tuesday, February 07, 2006 11:17 PM
> To: Paul.Boutros at utoronto.ca; 'BioPerl Mailing List'
> Subject: Re: [Bioperl-l] RemoteBlast [was: (no subject)]
> 
> Paul,
> 
> I think that most core Bioperl folks have long since moved 
> away from RemoteBlast and are using the functionality in 
> StandAloneBlast to run their own local servers. More 
> importantly, they are, in general, researchers who are coming 
> to Bioinformatics from the life sciences side, and are 
> particularly tired of dealing with the technical issues that 
> RemoteBlast consistently generates due to changes in the 
> text-formatted BLAST reports. 
> 
> They aren't code-for-code-sake geeks like me. ;}
> 
> When RemoteBlast was written, XML was barely on the 
> technology radar, and XML-formatted BLAST reports weren't 
> even available. It seems that everyone recognizes that the 
> XML reports now generated by NCBI's blast server are the wave 
> of the future, but I think there is still some concern that 
> not every flavor of BLAST produces XML yet. Even so, the XML 
> parser is considered to be very strong, and only helps hasten 
> the end of text-formatted support, since parsing 
> text-formatted reports is the primary source of pain. 
> 
> In discussing the shift from old to new, I think the idea of 
> relying on NCBI's application (and NCBI's issue system and 
> NCBI's developers) entered the realm of possibility, so as 
> the guy who just showed up to adopt RemoteBlast, I am trying 
> to air all options and beg for all requirements. 
> 
> Personally, I am okay with the idea of maintaining 
> text-formatted report parsing, but like I said, I'm pound 
> foolish about code sometimes. Additional foolishness arises 
> from the fact that the first money I earned in Bioinformatics 
> was on a contract gig where I relied on RemoteBlast (and the 
> related text parsers).
> 
> For my money, I just needed anyone, anywhere, to say they 
> desired a pure perl implementation to meet my personal 
> threshold. So far, you're the second. ;}
> 
> I do, however, see the advantage in shifting to XML-formatted 
> reporting and parsing *only* as soon as every BLAST flavor 
> supports it, if not before.
> (Anyone - is this still an issue? Please educate me.)
> 
> At the moment, I'm leaning towards adding an option to 
> RemoteBlast. The default (no option) would use a "pure perl" 
> implementation, and the enhancement (with explicit option) 
> would merely wrap the NCBI executable.
> However, there are other issues (queuing, batches) that I 
> don't fully understand in context, so I haven't zeroed in on 
> a complete recommendation yet. Additionally, the end of 
> text-formatted reports, while drawing near, is not yet 
> agreed, although it is pretty clear that the only way text 
> support will be continued is if I insist on it and then 
> deliver the support myself.
> :}
> 
> In any case, I am very interested in a pure perl 
> implementation for exactly the two reasons stated thus far: 
> it's one less thing for a newbie to worry about, and it will 
> run on every platform that runs perl. 
> 
> Thanks much for the input!
> 
> Roger Hall
> Technical Director
> MidSouth Bioinformatics Center
> University of Arkansas at Little Rock
> (501) 569-8074



From hubert.prielinger at gmx.at  Wed Feb  8 20:51:41 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Wed, 08 Feb 2006 14:51:41 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast output
Message-ID: <43EA59DD.1030608@gmx.at>

Hi,
When I try to parse Blast output (version 2.2.12) with Bio::SearchIO,
I get the following error message:

MSG: no data for midline Query  1   WWWKWRW  7
STACK Bio::SearchIO::blast::next_result 
/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
STACK toplevel 
/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21

Is that a bug?

If I try to parse Blast output (version 2.2.13), I don't get anything at all.
I'm using bioperl 1.4.

Before I installed bioperl 1.4, parsing Blast output (version 2.2.12) worked
fine, but I don't remember which bioperl version I had installed then.

thanks in advance

Hubert





From cjfields at uiuc.edu  Wed Feb  8 22:15:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 16:15:23 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <43EA59DD.1030608@gmx.at>
Message-ID: <001101c62cfd$28605df0$15327e82@pyrimidine>

My guess is you're running into text parsing problems in
Bio::SearchIO::blast.  Upgrade to the latest developer version (1.5.1) or
bioperl-live (CVS), then see the bug below. 

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

I think the first problem you ran into is solved in bioperl 1.5.1, the last
problem (more recent, not related to the first) has been fixed but hasn't
been committed to bioperl-live yet.  The fixed SearchIO::blast is available
in the link above, but realize it hasn't been committed yet and may change.
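
When versions may be mixed, it can help to see which copy of a module perl
actually loads. The snippet below is a small sketch for that; the sub name
module_info is made up, and it assumes that Bio::Root::Version is present in
the installed Bioperl (if it is not, the script simply reports that).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report the version and on-disk location of a module, or undef if it
# cannot be loaded. Works for any module name, not only Bioperl ones.
sub module_info {
    my ($module) = @_;
    (my $path = "$module.pm") =~ s{::}{/}g;    # Bio::SearchIO -> Bio/SearchIO.pm
    return undef unless eval { require $path; 1 };
    my $version = eval { $module->VERSION };   # may be undef for some modules
    return { version => $version, file => $INC{$path} };
}

for my $module ('Bio::Root::Version', 'Bio::SearchIO::blast') {
    my $info = module_info($module);
    if ($info) {
        printf "%-22s version %-8s %s\n", $module,
               defined $info->{version} ? $info->{version} : 'unknown',
               $info->{file};
    } else {
        print "$module could not be loaded\n";
    }
}
```

The file column shows whether blast.pm comes from the directory you expect,
which matters if an older copy is still sitting earlier in @INC.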

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Hubert Prielinger
> Sent: Wednesday, February 08, 2006 2:52 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing Blast output
> 
> Hi,
> If I want to parse a Blast Output (Version 2.2.12) with 
> Bio::SearchIO, I get the following error message:
> 
> MSG: no data for midline Query  1   WWWKWRW  7
> STACK Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> STACK toplevel
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> 
> is that a bug......
> 
> If I want to parse Blast Output (version 2.2.13), I don't get 
> anything.....
> I'm using bioperl 1.4
> 
> before, I have installed bioperl 1.4, it worked fine parsing 
> Blast Output (version 2.2.12), but I don't remember which 
> bioperl version I had installed
> 
> thanks in advance
> 
> Hubert



From hubert.prielinger at gmx.at  Wed Feb  8 21:41:04 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Wed, 08 Feb 2006 15:41:04 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <001101c62cfd$28605df0$15327e82@pyrimidine>
References: <001101c62cfd$28605df0$15327e82@pyrimidine>
Message-ID: <43EA6570.9070909@gmx.at>

hi chris,
thanks, I have upgraded to version 1.5.1 but it still isn't working. Do
you have any other idea? The problem is that I have to parse a
lot of text files....
Or shall I look for another option to parse those files?

regards
Hubert



Chris Fields wrote:

>My guess is you're running into text parsing problems in
>Bio::SearchIO::blast.  Upgrade to the latest developer version (1.5.1) or
>bioperl-live (CVS), then see the bug below. 
>
>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>
>I think the first problem you ran into is solved in bioperl 1.5.1, the last
>problem (more recent, not related to the first) has been fixed but hasn't
>been committed to bioperl-live yet.  The fixed SearchIO::blast is available
>in the link above, but realize it hasn't been committed yet and may change.
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept. of Biochemistry
>University of Illinois Urbana-Champaign  


From cjfields at uiuc.edu  Wed Feb  8 23:00:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 17:00:21 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <43EA6570.9070909@gmx.at>
Message-ID: <001201c62d03$703178c0$15327e82@pyrimidine>

Make sure you ran a full installation of bioperl-1.5.1 or bioperl-live (not
just the modules you want; mixing bioperl versions might work, but you might
run into interoperability problems).  Then replace the Bio::SearchIO::blast
with the one in Bugzilla.  The 'other option' you mentioned might be trying
XML instead of text, which is more stable in the long run.  You will still
need to run a full upgrade to bioperl 1.5.1 for that; make sure you read
this:

http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast

If you're using SearchIO directly instead of RemoteBlast, you should be able
to set the '-format' flag to 'blastxml'.
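
A minimal sketch of reading an XML report with SearchIO is below. It assumes
the report was saved in BLAST XML format and that the XML parser modules
needed by the blastxml reader are installed alongside Bioperl; as far as I
know SearchIO picks its parser from the -format argument, so that is what the
sketch uses. The sub name summarize_report and the script name are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Print one tab-delimited line per HSP from a BLAST report.
# $format is 'blast' for text reports or 'blastxml' for XML reports.
sub summarize_report {
    my ($file, $format) = @_;
    require Bio::SearchIO;    # loaded lazily, only when a report is parsed
    my $in = Bio::SearchIO->new(-format => $format, -file => $file);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                print join("\t", $result->query_name, $hit->name,
                           $hsp->evalue, $hsp->hit_string), "\n";
            }
        }
    }
}

# Hypothetical usage: perl summarize.pl report.xml
summarize_report($ARGV[0], 'blastxml') if @ARGV;
```

Switching the second argument back to 'blast' runs the same loop over a text
report, which makes it easy to compare the two parsers on the same search.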

It also wouldn't hurt to know what OS you're using or see some code.  Roger
is out there somewhere (I think) and may also have some input.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at] 
> Sent: Wednesday, February 08, 2006 3:41 PM
> To: Chris Fields; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing Blast output
> 
> hi chris,
> thanks, I have upgraded to version 1.5.1 but it still isn't 
> working. Do you have any other idea? The problem is 
> that I have to parse a lot of text files....
> Or shall I look for another option to parse those files?
> 
> regards
> Hubert



From hubert.prielinger at gmx.at  Wed Feb  8 22:22:44 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Wed, 08 Feb 2006 16:22:44 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <001201c62d03$703178c0$15327e82@pyrimidine>
References: <001201c62d03$703178c0$15327e82@pyrimidine>
Message-ID: <43EA6F34.4090007@gmx.at>

hi,
I have installed Core, Run and Ext from the following page:
http://news.open-bio.org/archives/2005_10.html
I'm only using SearchIO, without the RemoteBlast module, because I
already have all my Blast output files.
My operating system is fedora core 9.

Code:

#!/usr/bin/perl -w

use Bio::SearchIO;

print "start program\n";
my $directory = "/home/Hubert/installed/eclipse/workspace/Database_Search/result_4";
opendir(DIR, $directory) || die("Cannot open directory");
print "opened directory\n";

foreach my $file (readdir(DIR)) {
    print "read file\n";

    my $search = new Bio::SearchIO (-format => 'blast',
                                    -file   => $file);

    my $cutoff_len = 10;

    #iterate over each query sequence
    while (my $result = $search->next_result) {
        print "entered 1st while loop\n";

        #iterate over each hit on the query sequence
        while (my $hit = $result->next_hit) {

            #iterate over each HSP in the hit
            while (my $hsp = $hit->next_hsp) {

                if ($hsp->length('sbjct') <= $cutoff_len) {
                    #print $hsp->hit_string, "\n";
                    for ($hsp->hit_string) {

                        if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 || tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {

                            # Print some tab-delimited data about this HSP
                            open (bigShot, ">>BlastOutputTrial.txt") || die ("Could not open file. $!");
                            #print $result->query_name, "\t";
                            #print $hit->significance, "\t";
                            print bigShot $hit->name, "-->";
                            print bigShot $hit->description, "\n";
                            #print bigShot "Query:   ", $hsp->start('query'), "  ", $hsp->query_string, "  ", $hsp->end('query'), "\n";
                            print bigShot "Seq:     ", $hsp->start('hit'), "  ", $hsp->hit_string, "  ", $hsp->end('hit'), "\n";
                            #print $hsp->rank, "\t";
                            #print $hsp->percent_identity, "\t";
                            #print $hsp->evalue, "\t";
                            #print $hsp->hsp_length, "\n";
                            close (bigShot);
                        }
                    }
                }
            }
        }
    }
}

closedir(DIR);
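
One thing worth checking in a loop like the one above, independent of the
Bioperl version: readdir() returns bare file names, including '.' and '..',
so each name only opens correctly if the script runs inside that directory.
The sketch below shows one way to build full paths and to spell out the
K/R/W filter with explicit parentheses. It is an illustration under those
assumptions, not a confirmed diagnosis; the sub names report_files and
wanted_hit_string are made up, and names starting with a dot are skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return the full paths of the plain files in $dir, sorted by name.
# readdir() yields bare names, so each one is joined to the directory
# before it is handed to anything that opens files.
sub report_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    my @names = grep { !/^\./ } readdir($dh);    # drops '.', '..' and dotfiles
    closedir($dh);
    my @paths = map { File::Spec->catfile($dir, $_) } sort @names;
    return grep { -f $_ } @paths;
}

# Count K, R and W in an HSP string and apply the same filter as the script
# above, with the precedence of && over || written out explicitly.
sub wanted_hit_string {
    my ($s) = @_;
    my $k = ($s =~ tr/K//);
    my $r = ($s =~ tr/R//);
    my $w = ($s =~ tr/W//);
    return $k >= 2
        || ($r >= 2 && $w >= 2)
        || ($k == 1 && $r == 1 && $w >= 2);
}

# Hypothetical usage: list the report files found in the given directory.
if (@ARGV) {
    print "$_\n" for report_files($ARGV[0]);
}
```

Each path returned by report_files can be passed as the -file argument to
Bio::SearchIO, and wanted_hit_string can replace the long tr condition if
the grouping shown here is the intended one.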


Chris Fields wrote:

>Make sure you ran a full installation of bioperl-1.5.1 or bioperl-live (not
>just the modules you want; mixing bioperl versions might work, but you might
>run into interoperability problems).  Then replace the Bio::SearchIO::blast
>with the one in Bugzilla.  The 'other option' you mentioned might be trying
>XML instead of text, which is more stable in the long run.  You will still
>need to run a full upgrade to bioperl 1.5.1 for that; make sure you read
>this:
>
>http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast
>
>If you're using SearchIO directly instead of RemoteBlast, you should be able
>to set the '-format' flag to 'blastxml'.
>
>It also wouldn't hurt to know what OS you're using or see some code.  Roger
>is out there somewhere (I think) and may also have some input.
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept. of Biochemistry
>University of Illinois Urbana-Champaign  



From rahall2 at ualr.edu  Wed Feb  8 23:34:45 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Wed, 8 Feb 2006 17:34:45 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <43EA6F34.4090007@gmx.at>
Message-ID: <000401c62d08$3ede6b70$4301a8c0@LIBERAL>

Hubert,

Give me a bit to look over your code and think this through. I am still
re-familiarizing myself with the relevant modules, so I can't give an answer
off the top of my head.

Also, please send me one or more of your blast reports (zipped) if you don't
mind (and maybe avoid including the list in your reply). Let's take this
"offline" relative to the list - we'll include the list again if there is a
Bioperl issue and solution. (In case you are concerned at all, I promise not
to share or study the actual BLAST results.)

I'm not particularly familiar with the Fedora distributions, but I'm sure I
can either chase down the perl problem or at least eliminate everything else
but Fedora as the culprit. ;}

(Chris - I'm not quite paying attention on an hourly basis yet, but I do
intend to help support these issues for the foreseeable future. Thanks as
always for the assist.)

Thanks!

Roger Hall
Technical Director
MidSouth Bioinformatics Center
University of Arkansas at Little Rock
(501) 569-8074



-----Original Message-----
From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at] 
Sent: Wednesday, February 08, 2006 4:23 PM
To: Chris Fields; bioperl-l at bioperl.org; rahall2 at ualr.edu
Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
output

hi,
I installed Core, Run and Ext from the following page:
http://news.open-bio.org/archives/2005_10.html
I'm using only SearchIO, without the RemoteBlast module, because I already
have all my Blast output files.
My operating system is Fedora Core 4.

Code:

#!/usr/bin/perl -w

use Bio::SearchIO;

print "start program\n";
my $directory = 
"/home/Hubert/installed/eclipse/workspace/Database_Search/result_4";
opendir(DIR, $directory) || die("Cannot open directory");
print "opened directory\n";

foreach my $file (readdir(DIR))  {
print "read file\n";

my $search = new Bio::SearchIO (-format => 'blast',
                                -file => $file);
                               
my $cutoff_len = 10;
                               


#iterate over each query sequence
while (my $result = $search->next_result) {
print "entered 1st while loop\n";
   
    #iterate over each hit on the query sequence
    while (my $hit = $result->next_hit) {
       
        #iterate over each HSP in the hit
        while (my $hsp = $hit->next_hsp) {
           
            if ($hsp->length('sbjct') <= $cutoff_len) {
                #print $hsp->hit_string, "\n";
                for ($hsp->hit_string) {
               
                   
                    if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 || 
tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {
                       
                        # Print some tab-delimited data about this HSP
           
                           open (bigShot, ">>BlastOutputTrial.txt") || 
die ("Could not open file. $!");
                                #print $result->query_name, "\t";
           
#                        print $hit->significance, "\t";
                         print bigShot $hit->name, "-->";
                         print bigShot $hit->description, "\n";
                         #print bigShot "Query:   ", $hsp->start('query'), "  ", $hsp->query_string, "  ", $hsp->end('query'), "\n";
                         print bigShot "Seq:     ", $hsp->start('hit'), 
"  ", $hsp->hit_string, "  ", $hsp->end('hit'), "\n";
                          
#                        print $hsp->rank, "\t";
#                        print $hsp->percent_identity, "\t";
#                        print $hsp->evalue, "\t";
#                        print $hsp->hsp_length, "\n";
                   
                        close (bigShot);
                       
                    };
               
           
            }
        }
        }
    }
}

}

closedir(DIR);


Chris Fields wrote:

>Make sure you ran a full installation of bioperl-1.5.1 or bioperl-live (not
>just the modules you want; mixing bioperl versions might work, but you might
>run into interoperability problems).  Then replace the Bio::SearchIO::blast
>with the one in Bugzilla.  The 'other option' you mentioned might be trying
>XML instead of text, which is more stable in the long run.  You will still
>need to run a full upgrade to bioperl 1.5.1 for that; make sure you read
>this:
>
>http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast
>
>If you're using SearchIO directly instead of Remoteblast, you should be able
>to set the '-readmethod' flag to 'blastxml'.
>
>It also wouldn't hurt to know what OS you're using or see some code.  Roger
>is out there somewhere (I think) and may also have some input.
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept. of Biochemistry
>University of Illinois Urbana-Champaign  
>
>  
>
>>-----Original Message-----
>>From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at] 
>>Sent: Wednesday, February 08, 2006 3:41 PM
>>To: Chris Fields; bioperl-l at bioperl.org
>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>parsing Blast output
>>
>>hi chris,
>>thanks, I have upgraded to version 1.5.1 but it still isn't
>>working, do you have any other idea, the problem I have is
>>that I have to parse a lot of textfiles....
>>or shall I look for another option to parse those files...
>>
>>regards
>>Hubert
>>
>>
>>
>>Chris Fields wrote:
>>
>>    
>>
>>>My guess is you're running into text parsing problems in
>>>Bio::SearchIO::blast.  Upgrade to the latest developer version (1.5.1)
>>>or bioperl-live (CVS), then see the bug below.
>>>
>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>
>>>I think the first problem you ran into is solved in bioperl 1.5.1, the
>>>last problem (more recent, not related to the first) has been fixed but
>>>hasn't been committed to bioperl-live yet.  The fixed SearchIO::blast
>>>is available in the link above, but realize it hasn't been committed yet
>>>and may change.
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>> 
>>>
>>>      
>>>
>>>>-----Original Message-----
>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>Prielinger
>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>To: bioperl-l at bioperl.org
>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>output
>>>>
>>>>Hi,
>>>>If I want to parse a Blast Output (Version 2.2.12) with Bio::SearchIO,
>>>>I get the following error message:
>>>>
>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>STACK Bio::SearchIO::blast::next_result
>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>STACK toplevel
>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>
>>>>is that a bug......
>>>>
>>>>If I want to parse Blast Output (version 2.2.13), I don't get
>>>>anything.....
>>>>I'm using bioperl 1.4
>>>>
>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>Output (version 2.2.12), but I don't remember which bioperl version I
>>>>had installed
>>>>
>>>>thanks in advance
>>>>
>>>>Hubert
>>>>
>>>>_______________________________________________
>>>>Bioperl-l mailing list
>>>>Bioperl-l at lists.open-bio.org
>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l




From injunjoel at hotmail.com  Thu Feb  9 00:54:26 2006
From: injunjoel at hotmail.com (Joel Steele)
Date: Wed, 08 Feb 2006 16:54:26 -0800
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	Blastoutput
In-Reply-To: <43EA6F34.4090007@gmx.at>
Message-ID: 

Greetings,
I'm not well versed in Bio::SearchIO but there are a few comments about your 
code that may or may not be relevant...

first thing:

=-=-=-=-=code snippet=-=-=-=-=

#!/usr/bin/perl -w
use strict;   #save yourself the headaches and force yourself to write clean 
code.

=-=-=-=-=code snippet=-=-=-=-=

next thing:
when you are reading the files from the directory you are not doing any sort 
of filtering as to what is returned. If you are on a Unix flavored system 
you may be getting the '.' and '..' entries from your readdir(DIR) call. I 
would suggest placing a grep in there somewhere to get only blast files.
something like:

=-=-=-=-=code snippet=-=-=-=-=

#assuming the file extension for blast files is .bls
#the -e and -f are filetests; you could probably get away with just
#-f. Here is a link for reference on the filetests available in Perl.
#
# http://www.perlmonks.org/?node_id=370

my @files_to_parse = grep{/\w+\.bls/ && -e && -f} readdir(DIR);
closedir(DIR);

#then proceed with your foreach but over @files_to_parse

foreach my $file(@files_to_parse){
     #do cool stuff here...
}

=-=-=-=-=code snippet=-=-=-=-=
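
One caveat with the snippet above: readdir returns bare file names, so file
tests such as -f are resolved against the current working directory unless
the directory is prepended. A minimal sketch that returns full paths,
assuming the reports end in .bls (the sub name blast_files is made up for
illustration):

```perl
use strict;
use warnings;

# Return the full paths of the regular files in $dir whose names end in $ext.
sub blast_files {
    my ($dir, $ext) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    # Prepend $dir before the -f test; readdir only gives the bare name.
    my @names = grep { /\Q$ext\E$/ && -f "$dir/$_" } readdir($dh);
    closedir($dh);
    my @paths = map { "$dir/$_" } sort @names;
    return @paths;
}
```

The returned paths can be handed straight to the -file argument of
Bio::SearchIO, which avoids opening '.' and '..' by accident.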

Hope that helps.
-Joel Steele


"The surest way to corrupt a youth is to instruct him to hold in higher 
regard those who think alike than those who think differently." -Nietzsche

"I do not feel obliged to believe that the same God who endowed us with 
sense, reason and intellect has intended us to forego their use." -Galileo




>From: Hubert Prielinger 
>To: Chris Fields , bioperl-l at bioperl.org, 
>rahall2 at ualr.edu
>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing 
>Blastoutput
>Date: Wed, 08 Feb 2006 16:22:44 -0600
>
>hi,
>I have installed from the following page:
>http://news.open-bio.org/archives/2005_10.html,  the Core, Run and Ext.
>I'm using only the SearchIO without remoteblast module, because I have
>already all my Blast output files.
>My operating system is Fedora Core 4.
>
>Code:
>
>#!/usr/bin/perl -w
>
>use Bio::SearchIO;
>
>print "start program\n";
>my $directory =
>"/home/Hubert/installed/eclipse/workspace/Database_Search/result_4";
>opendir(DIR, $directory) || die("Cannot open directory");
>print "opened directory\n";
>
>foreach my $file (readdir(DIR))  {
>print "read file\n";
>
>my $search = new Bio::SearchIO (-format => 'blast',
>                                 -file => $file);
>
>my $cutoff_len = 10;
>
>
>
>#iterate over each query sequence
>while (my $result = $search->next_result) {
>print "entered 1st while loop\n";
>
>     #iterate over each hit on the query sequence
>     while (my $hit = $result->next_hit) {
>
>         #iterate over each HSP in the hit
>         while (my $hsp = $hit->next_hsp) {
>
>             if ($hsp->length('sbjct') <= $cutoff_len) {
>                 #print $hsp->hit_string, "\n";
>                 for ($hsp->hit_string) {
>
>
>                     if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 ||
>tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {
>
>                         # Print some tab-delimited data about this HSP
>
>                            open (bigShot, ">>BlastOutputTrial.txt") ||
>die ("Could not open file. $!");
>                                 #print $result->query_name, "\t";
>
>#                        print $hit->significance, "\t";
>                          print bigShot $hit->name, "-->";
>                          print bigShot $hit->description, "\n";
>                          #print bigShot "Query:   ",
>$hsp->start('query'), "  ", $hsp->query_string, "  ",
>$hsp->end('query'), "\n";
>                          print bigShot "Seq:     ", $hsp->start('hit'),
>"  ", $hsp->hit_string, "  ", $hsp->end('hit'), "\n";
>
>#                        print $hsp->rank, "\t";
>#                        print $hsp->percent_identity, "\t";
>#                        print $hsp->evalue, "\t";
>#                        print $hsp->hsp_length, "\n";
>
>                         close (bigShot);
>
>                     };
>
>
>             }
>         }
>         }
>     }
>}
>
>}
>
>closedir(DIR);
>
>




From saldroubi at yahoo.com  Thu Feb  9 01:12:16 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Wed, 8 Feb 2006 17:12:16 -0800 (PST)
Subject: [Bioperl-l] Documentation link?
Message-ID: <20060209011216.39949.qmail@web34311.mail.mud.yahoo.com>

All,
  
  Forgive me, but I don't see the documentation link on the new website. I only see a link to the HOWTOs. I think I am looking for the Pdoc link.
  
  Thank you. 
  


Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From saldroubi at yahoo.com  Thu Feb  9 01:24:23 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Wed, 8 Feb 2006 17:24:23 -0800 (PST)
Subject: [Bioperl-l] Count or weight matrix in bioperl?
Message-ID: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com>

All,
  
  Say I have an array of nucleotide sequences, each of length N. I want to calculate the count matrix (weight matrix). That is, for each position 1..N, I want to know how many As, Cs, Ts and Gs there are. Is there code already written in bioperl that builds this matrix if I pass it those strings?
  
  Please excuse my lack of knowledge, as I am a newcomer to bioinformatics.
  
  Thank you. 
  

Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From osborne1 at optonline.net  Thu Feb  9 01:44:56 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Wed, 08 Feb 2006 20:44:56 -0500
Subject: [Bioperl-l] Documentation link?
In-Reply-To: <20060209011216.39949.qmail@web34311.mail.mud.yahoo.com>
Message-ID: 

Sam,

http://bioperl.open-bio.org/wiki/Main_Page

Look for the API Docs under "main links".

Brian O.


On 2/8/06 8:12 PM, "Sam Al-Droubi"  wrote:

> All,
>   
>  Forgive me but I don't see the documentation link on the  new website.  I
> only see a link to the HOWTO's. I think I am  looking for the Pdoc link.
>   
>   Thank you. 
>   
> 
> 
> Sincerely, 
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From torsten.seemann at infotech.monash.edu.au  Thu Feb  9 02:54:39 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 09 Feb 2006 13:54:39 +1100
Subject: [Bioperl-l] Count or weight matrix in bioperl?
In-Reply-To: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com>
References: <20060209012423.88400.qmail@web34305.mail.mud.yahoo.com>
Message-ID: <43EAAEEF.3000304@infotech.monash.edu.au>

>   Say I have an array of nucleotide sequences, each of length N. I want to calculate the count matrix (weight matrix). That is, for each position 1..N, I want to know how many As, Cs, Ts and Gs there are. Is there code already written in bioperl that builds this matrix if I pass it those strings?
>   Please excuse my lack of knowledge, as I am a newcomer to bioinformatics.

Use the Bio::Tools::SeqStats module. The PDoc documentation even has an 
example similar to what you want to do:

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html
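
Note that count_monomers in SeqStats reports totals per sequence rather than
per column. If per-position counts are what is needed, a plain-Perl sketch
could look like the following. This is not a BioPerl API: count_matrix is a
hypothetical helper, and it assumes all sequences have the same length.

```perl
use strict;
use warnings;

# Build a per-position count matrix from equal-length nucleotide strings.
# Returns an array ref with one hash per column: { A => n, C => n, G => n, T => n }.
sub count_matrix {
    my @seqs = map { uc $_ } @_;
    return [] unless @seqs;
    my $len = length $seqs[0];

    my @matrix;
    push @matrix, { A => 0, C => 0, G => 0, T => 0 } for 1 .. $len;

    for my $seq (@seqs) {
        die "sequences must all have length $len\n" unless length($seq) == $len;
        for my $i (0 .. $len - 1) {
            my $base = substr($seq, $i, 1);
            # Anything other than A, C, G or T (N, gaps, ...) is skipped.
            $matrix[$i]{$base}++ if exists $matrix[$i]{$base};
        }
    }
    return \@matrix;
}

# Print one row per base, one column per position.
my $m = count_matrix(qw(ACGT ACGA TCGT));
for my $base (qw(A C G T)) {
    print join("\t", $base, map { $_->{$base} } @$m), "\n";
}
```

Dividing each column by the number of sequences would turn the counts into
frequencies, which is usually the first step toward a weight matrix.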

--Torsten Seemann



From cjfields at uiuc.edu  Thu Feb  9 05:07:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 8 Feb 2006 23:07:15 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	Blastoutput
In-Reply-To: 
References: 
Message-ID: 


On Feb 8, 2006, at 6:54 PM, Joel Steele wrote:

> Greetings,
> Im not well versed in Bio::SearchIO but there are a few comments  
> about your
> code that may or may not be relevant...
>
> first thing:
>
> =-=-=-=-=code snippet=-=-=-=-=
>
> #!/usr/bin/perl -w
> use strict;   #save yourself the headaches and force yourself to  
> write clean
> code.
>
> =-=-=-=-=code snippet=-=-=-=-=
>

Tread very carefully here.  Just about every book on perl suggests  
'use strict' and adding warnings for code development (ex. the Camel,  
the Llama, and others); in fact, these are the very books most  
beginners start from.  Some would consider NOT using -w or 'use  
strict' a bad habit; everybody has an opinion (I would repeat an oft- 
heard Texas saying, but I'll refrain).  Just remember: try to be a  
little more constructive in your critique and insert a little less  
about your personal coding style.  If you hit the wrong person, you  
might get flamed.

Here's a link that may help a bit here:

http://bioperl.org/Core/Latest/biodesign.html#respect_people_s_code__in_particular_if_it_works_

> next thing:
> when you are reading the files from the directory you are not doing  
> any sort
> of filtering as to what is returned. If you are on a Unix flavored  
> system
> you may be getting the '.' and '..' entries from your readdir(DIR)  
> call. I
> would suggest placing a grep in there somewhere to get only blast  
> files.
> something like:
>

I agree here.  You could probably also use something like File::Find  
here to make things a bit easier with the file names as well; works  
wonderfully, esp. when traversing a directory tree.
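
A rough sketch of the File::Find approach, assuming the reports share a known
extension (find_reports and the .bls extension are illustrative only):

```perl
use strict;
use warnings;
use File::Find;

# Collect every regular file under $top (recursively) whose name ends in $ext.
sub find_reports {
    my ($top, $ext) = @_;
    my @found;
    find(
        sub {
            # Inside the callback $_ is the bare file name (File::Find has
            # changed into that file's directory) and $File::Find::name is
            # the path including $top.
            push @found, $File::Find::name if -f $_ && /\Q$ext\E$/;
        },
        $top
    );
    @found = sort @found;
    return @found;
}
```

Because $File::Find::name already carries the directory, the results can be
opened directly without any extra path handling.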

> =-=-=-=-=code snippet=-=-=-=-=
>
> #assuming the file extension for blast files is .bls
> #the -e and -f are filetests; you could probably get away with just
> #-f. Here is a link for reference on the filetests available in Perl.
> #
> # http://www.perlmonks.org/?node_id=370
>
> my @files_to_parse = grep{/\w+\.bls/ && -e && -f} readdir(DIR);
> closedir(DIR);
>
> #then proceed with your foreach but over @files_to_parse
>
> foreach my $file(@files_to_parse){
>      #do cool stuff here...
> }
>

Again, agreed.  But, does it really solve the main problem, which is  
an issue with SearchIO::blast?  It seemed to try parsing a blast file...
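
One way to narrow this down is to wrap each file in an eval, so a parser
exception is recorded against the file that triggered it instead of ending
the run. A sketch, assuming BioPerl is installed for the SearchIO part
(try_reports and parse_with_searchio are made-up names):

```perl
use strict;
use warnings;

# Run $parse on each path and collect the error message for any that fails,
# instead of letting the first exception end the whole run.
# $parse is a code ref that parses one file and dies on failure.
sub try_reports {
    my ($parse, @paths) = @_;
    my %failed;
    for my $path (@paths) {
        my $ok = eval { $parse->($path); 1 };
        $failed{$path} = $@ || 'unknown error' unless $ok;
    }
    return \%failed;
}

# Parser built on Bio::SearchIO; the module is loaded only when this is called.
sub parse_with_searchio {
    my ($path) = @_;
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $path);
    while (my $result = $in->next_result) {
        printf "%s\t%s\t%d hits\n", $path, $result->query_name, $result->num_hits;
    }
}

# Report which of the files named on the command line could not be parsed.
my $failed = try_reports(\&parse_with_searchio, @ARGV);
print STDERR "FAILED $_: $failed->{$_}\n" for sort keys %$failed;
```

If only '.' and '..' show up as failures, the problem is the directory
listing; if real reports fail, the message points back at the parser.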

> =-=-=-=-=code snippet=-=-=-=-=
>
> Hope that helps.
> -Joel Steele
>
>
> "The surest way to corrupt a youth is to instruct him to hold in  
> higher
> regard those who think alike than those who think differently." - 
> Nietzsche
>
> "I do not feel obliged to believe that the same God who endowed us  
> with
> sense, reason and intellect has intended us to forego their use." - 
> Galileo
>
>
>
>
>> From: Hubert Prielinger 
>> To: Chris Fields , bioperl-l at bioperl.org,
>> rahall2 at ualr.edu
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>> Blastoutput
>> Date: Wed, 08 Feb 2006 16:22:44 -0600
>>
>> hi,
>> I have installed from the following page:
>> http://news.open-bio.org/archives/2005_10.html,  the Core, Run and  
>> Ext.
>> I'm using only the SearchIO without remoteblast module, because I  
>> have
>> already all my Blast output files.
>> My operating system is Fedora Core 4.
>>
>> Code:
>>
>> #!/usr/bin/perl -w
>>
>> use Bio::SearchIO;
>>
>> print "start program\n";
>> my $directory =
>> "/home/Hubert/installed/eclipse/workspace/Database_Search/result_4";
>> opendir(DIR, $directory) || die("Cannot open directory");
>> print "opened directory\n";
>>
>> foreach my $file (readdir(DIR))  {
>> print "read file\n";
>>
>> my $search = new Bio::SearchIO (-format => 'blast',
>>                                 -file => $file);
>>
>> my $cutoff_len = 10;
>>
>>
>>
>> #iterate over each query sequence
>> while (my $result = $search->next_result) {
>> print "entered 1st while loop\n";
>>
>>     #iterate over each hit on the query sequence
>>     while (my $hit = $result->next_hit) {
>>
>>         #iterate over each HSP in the hit
>>         while (my $hsp = $hit->next_hsp) {
>>
>>             if ($hsp->length('sbjct') <= $cutoff_len) {
>>                 #print $hsp->hit_string, "\n";
>>                 for ($hsp->hit_string) {
>>
>>
>>                     if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 ||
>> tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {
>>
>>                         # Print some tab-delimited data about this  
>> HSP
>>
>>                            open (bigShot,  
>> ">>BlastOutputTrial.txt") ||
>> die ("Could not open file. $!");
>>                                 #print $result->query_name, "\t";
>>
>> #                        print $hit->significance, "\t";
>>                          print bigShot $hit->name, "-->";
>>                          print bigShot $hit->description, "\n";
>>                          #print bigShot "Query:   ",
>> $hsp->start('query'), "  ", $hsp->query_string, "  ",
>> $hsp->end('query'), "\n";
>>                          print bigShot "Seq:     ", $hsp->start 
>> ('hit'),
>> "  ", $hsp->hit_string, "  ", $hsp->end('hit'), "\n";
>>
>> #                        print $hsp->rank, "\t";
>> #                        print $hsp->percent_identity, "\t";
>> #                        print $hsp->evalue, "\t";
>> #                        print $hsp->hsp_length, "\n";
>>
>>                         close (bigShot);
>>
>>                     };
>>
>>
>>             }
>>         }
>>         }
>>     }
>> }
>>
>> }
>>
>> closedir(DIR);
>>
>>
>> Chris Fields wrote:
>>
>>> Make sure you ran a full installation of bioperl-1.5.1 or bioperl- 
>>> live
>> (not
>>> just the modules you want; mixing bioperl versions might work,  
>>> but you
>> might
>>> run into interoperability problems).  Then replace the
>> Bio::SearchIO::blast
>>> with the one in Bugzilla.  The 'other option' you mentioned might be
>> trying
>>> XML instead of text, which is more stable in the long run.  You will
>> still
>>> need to run a full upgrade to bioperl 1.5.1 for that; make sure  
>>> you read
>>> this:
>>>
>>> http://bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast
>>>
>>> If you're using SearchIO directly instead of Remoteblast, you  
>>> should be
>> able
>>> to set the '-readmethod' flag to 'blastxml'.
>>>
>>> It also wouldn't hurt to know what OS you're using or see some code.
>> Roger
>>> is out there somewhere (I think) and may also have some input.
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept. of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Hubert Prielinger [mailto:hubert.prielinger at gmx.at]
>>>> Sent: Wednesday, February 08, 2006 3:41 PM
>>>> To: Chris Fields; bioperl-l at bioperl.org
>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>> parsing Blast output
>>>>
>>>> hi chris,
>>>> thanks, I have upgraded to version 1.5.1 but it isn't still
>>>> working, do you have any ohter idea, the problem I have is
>>>> that I have to parse a lot of textfiles....
>>>> or shall I look for another option to parse those files...
>>>>
>>>> regards
>>>> Hubert
>>>>
>>>>
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>
>>>>
>>>>> My guess is you're running into text parsing problems in
>>>>> Bio::SearchIO::blast.  Upgrade to the latest developer
>>>>>
>>>>>
>>>> version (1.5.1)
>>>>
>>>>
>>>>> or bioperl-live (CVS), then see the bug below.
>>>>>
>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>
>>>>> I think the first problem you ran into is solved in bioperl
>>>>>
>>>>>
>>>> 1.5.1, the
>>>>
>>>>
>>>>> last problem (more recent, not related to the first) has
>>>>>
>>>>>
>>>> been fixed but
>>>>
>>>>
>>>>> hasn't been committed to bioperl-live yet.  The fixed
>>>>>
>>>>>
>>>> SearchIO::blast
>>>>
>>>>
>>>>> is available in the link above, but realize it hasn't been
>>>>>
>>>>>
>>>> committed yet and may change.
>>>>
>>>>
>>>>> Christopher Fields
>>>>> Postdoctoral Researcher - Switzer Lab
>>>>> Dept. of Biochemistry
>>>>> University of Illinois Urbana-Champaign
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>> Prielinger
>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>> To: bioperl-l at bioperl.org
>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>> output
>>>>>>
>>>>>> Hi,
>>>>>> If I want to parse a Blast Output (Version 2.2.12) with Bio::SearchIO,
>>>>>> I get the following error message:
>>>>>>
>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>> STACK toplevel
>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>
>>>>>> is that a bug......
>>>>>>
>>>>>> If I want to parse Blast Output (version 2.2.13), I don't get
>>>>>> anything.....
>>>>>> I'm using bioperl 1.4
>>>>>>
>>>>>> before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>> Output (version 2.2.12), but I don't remember which bioperl version I
>>>>>> had installed
>>>>>>
>>>>>> thanks in advance
>>>>>>
>>>>>> Hubert
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From golharam at umdnj.edu  Thu Feb  9 04:46:43 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Wed, 08 Feb 2006 23:46:43 -0500
Subject: [Bioperl-l] Tool to mutate DNA sequence
Message-ID: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>

Does anyone know of a tool to mutate a DNA sequence by a specified amount?
For instance, say I have a DNA sequence 1000 bases long, and I want to
simulate mutations to make it 75% (or 80%, etc) similar to the original.


Ryan



From torsten.seemann at infotech.monash.edu.au  Thu Feb  9 11:15:28 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 09 Feb 2006 22:15:28 +1100
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <43EB2450.6000606@infotech.monash.edu.au>

Ryan,

> Does anyone know of tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.

The EMBOSS suite comes with a tool called "msbar" which can controllably 
mutate sequences:

http://emboss.sourceforge.net/apps/msbar.html

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/


From cjfields at uiuc.edu  Thu Feb  9 16:16:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 9 Feb 2006 10:16:28 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <57361AED-AFEA-4927-AF29-9944E7F0895B@duke.edu>
Message-ID: <001b01c62d94$2e8bee50$15327e82@pyrimidine>


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
> Sent: Thursday, February 09, 2006 9:13 AM
> To: Hubert Prielinger
> Cc: Chris Fields; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing Blast output
> 
> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> > hi chris,
> > thanks, I have upgraded to version 1.5.1 but it still isn't working,
> > do you have any other idea, the problem I have is that I have to parse
> > a lot of textfiles....
> > or shall I look for another option to parse those files...
> >
> > regards
> > Hubert
> 
> 
> The code from Bioperl 1.5.1 works fine for me for blast 
> 2.2.13 reports but unless you post your blast report we can't 
> really determine the problem.
> 
> If you are still getting the same error like this I am not 
> convinced you have upgraded to 1.5.1 which includes a fix in 
> the fact that NCBI changed the HSP result format to remove 
> the ':' from the Query/Sbjct prefixes.  We fixed this as soon 
> as it was apparent sometime in September.
> 
> >>> MSG: no data for midline Query  1   WWWKWRW  7
> >>> STACK Bio::SearchIO::blast::next_result
> >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>> STACK toplevel
> >>> 
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> 
> If you are just getting no results but also no warnings wrt 
> parsing, are you sure your logic is correct?
> 
> If you remove your filters do you see all the HSPS?
> 
> 
> while (my $result = $search->next_result) {
>     print $result->query_name, "\n";
>     # iterate over each hit on the query sequence
>     while (my $hit = $result->next_hit) {
>         print $hit->name, "\n";
>         # iterate over each HSP in the hit
>         while (my $hsp = $hit->next_hsp) {
>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>                 $hsp->hit_string, "\n";
>         }
>     }
> }

I tested some of the BLAST results that Hubert sent Roger and me with a
similar script to the above.  I removed the file parsing logic and it seemed
to work just fine.  It may very well be a logic issue or that he hasn't
installed the latest fix.
    
It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), even
though the returned output was from nr, the top of the blast output showed
that it was v2.2.12:  

BLASTP 2.2.12 [Aug-07-2005]

I double-checked my local version and it's definitely v.2.2.13:
-------------------------------------
C:\Perl\Scripts>blastcl3 -

blastcl3 2.2.13   arguments:...
-------------------------------------

If you use RemoteBlast using the same settings, the version in the header
looks like this:

BLASTP 2.2.13 [Nov-27-2005]

I'm wondering if all the blast executables (blast and netblast) from NCBI
have text output like v.2.2.12, while the wwwblast outputs a new format
(2.2.13).  I'll ask blast-help at NCBI about this.

> 
> To clarify some stuff -
> Chris I don't necessarily think the XML is best way forward 
> for BLAST reports generated locally, it isn't as detailed as 
> the Text format and it is what most people expect to be able 
> to scroll through and parse -- it is also harder for the 
> format to change dramatically if you have a static binary on 
> your machine =).  I think for remoteblast the XML format 
> should be the way forward but I expect Bioperl to maintain 
> support of any plain text BLAST report format that people use 
> on a regular basis.
> 

Does XML lack some specific info that text output has?  Didn't know that.  I
believe that XML should be default in RemoteBlast since it will not break,
but I agree with you about text output.  I also agree that it will need
somebody to maintain it constantly, much like RemoteBlast.

> -jason
> >
> >
> > Chris Fields wrote:
> >
> >> My guess is you're running into text parsing problems in 
> >> Bio::SearchIO::blast.  Upgrade to the latest developer version
> >> (1.5.1) or
> >> bioperl-live (CVS), then see the bug below.
> >>
> >> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>
> >> I think the first problem you ran into is solved in bioperl 1.5.1, 
> >> the last problem (more recent, not related to the first) has been 
> >> fixed but hasn't been committed to bioperl-live yet.  The fixed 
> >> SearchIO::blast is available in the link above, but 
> realize it hasn't 
> >> been committed yet and may change.
> >>
> >> Christopher Fields
> >> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
> >> University of Illinois Urbana-Champaign
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: bioperl-l-bounces at lists.open-bio.org
> >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert 
> >>> Prielinger
> >>> Sent: Wednesday, February 08, 2006 2:52 PM
> >>> To: bioperl-l at bioperl.org
> >>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing Blast 
> >>> output
> >>>
> >>> Hi,
> >>> If I want to parse a Blast Output (Version 2.2.12) with 
> >>> Bio::SearchIO, I get the following error message:
> >>>
> >>> MSG: no data for midline Query  1   WWWKWRW  7
> >>> STACK Bio::SearchIO::blast::next_result
> >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>> STACK toplevel
> >>> 
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>>
> >>> is that a bug......
> >>>
> >>> If I want to parse Blast Output (version 2.2.13), I don't get 
> >>> anything.....
> >>> I'm using bioperl 1.4
> >>>
> >>> before, I have installed bioperl 1.4, it worked fine 
> parsing Blast 
> >>> Output (version 2.2.12), but I don't remember which 
> bioperl version 
> >>> I had installed
> >>>
> >>> thanks in advance
> >>>
> >>> Hubert
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  



From cjfields at uiuc.edu  Thu Feb  9 17:53:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 9 Feb 2006 11:53:24 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <200602080853.58889.heikki@sanbi.ac.za>
Message-ID: <000001c62da1$ba346ba0$15327e82@pyrimidine>

Heikki, 

I've added the Bio::Tools::RNAMotif module with test suite (24 tests) and
two test data files to bugzilla.  The first data file is needed for normal
tests, the second is for testing parsing with modified data in the score tag
(using sprintf() in the RNAMotif descriptor).  I ran 'perl t\RNAMotif.t' and
they all passed.

Thanks!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Heikki Lehvaslaiho
> Sent: Wednesday, February 08, 2006 12:54 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Chris Fields
> Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> 
> Chris,
> 
> Post your files to bugzilla (ticket type enhancement, add 
> files to ticket after creation)  and someone with commit 
> ability will add them to CVS once the code is in satisfactory 
> condition. 
> 
> Thanks,
> 
> 	-Heikki
> 
> On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> > I want to submit a module for parsing RNAMotif output 
> > (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning 
> > output and returning Bio::SeqFeature::Generic objects with 
> added tags 
> > for descriptors/sequences/file info.  I'm in the process of 
> writing up 
> > tests and going through biodesign to make sure everything's kosher, 
> > but the module itself is essentially ready-to-go.  What should I do 
> > next?
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> -- 
> ______ _/      _/_____________________________________________________
>       _/      _/
>      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
>    _/  _/  _/  SANBI, South African National Bioinformatics Institute
>   _/  _/  _/  University of Western Cape, South Africa
>      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From jason.stajich at duke.edu  Thu Feb  9 15:13:09 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 9 Feb 2006 10:13:09 -0500
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
	output
In-Reply-To: <43EA6570.9070909@gmx.at>
References: <001101c62cfd$28605df0$15327e82@pyrimidine>
	<43EA6570.9070909@gmx.at>
Message-ID: <57361AED-AFEA-4927-AF29-9944E7F0895B@duke.edu>

On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> hi chris,
> thanks, I have upgraded to version 1.5.1 but it still isn't working, do
> you have any other idea, the problem I have is that I have to parse a
> lot of textfiles....
> or shall I look for another option to parse those files...
>
> regards
> Hubert


The code from Bioperl 1.5.1 works fine for me for blast 2.2.13
reports, but unless you post your blast report we can't really
determine the problem.

If you are still getting the same error as below, I am not convinced
you have upgraded to 1.5.1, which includes a fix for the fact that NCBI
changed the HSP result format to remove the ':' from the Query/Sbjct
prefixes.  We fixed this as soon as it was apparent, sometime in
September.
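
To make the format change concrete, here is a minimal sketch (not the
actual Bio::SearchIO::blast code) of a pattern that accepts both the older
'Query: 1 ...' alignment lines and the newer colon-less 'Query  1 ...'
lines. The sub name parse_hsp_line and the returned field names are made
up for illustration; only the colon-less sample line comes from the error
message in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split one HSP alignment line
# into its label, start coordinate, residues and end coordinate.
# The ':' after Query/Sbjct is optional, so both layouts are accepted.
sub parse_hsp_line {
    my ($line) = @_;
    if ($line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/) {
        return { label => $1, start => $2, seq => $3, end => $4 };
    }
    return;    # not an alignment line
}

for my $line ('Query: 1   WWWKWRW  7', 'Query  1   WWWKWRW  7') {
    my $f = parse_hsp_line($line);
    print "$f->{label} $f->{start}-$f->{end} $f->{seq}\n" if $f;
}
```

A parser that insists on the ':' would reject the second line, which is
consistent with the "no data for midline" error quoted below.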

>>> MSG: no data for midline Query  1   WWWKWRW  7
>>> STACK Bio::SearchIO::blast::next_result
>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>> STACK toplevel
>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21

If you are just getting no results but also no warnings wrt parsing,  
are you sure your logic is correct?

If you remove your filters do you see all the HSPs?


while (my $result = $search->next_result) {
    print $result->query_name, "\n";
    # iterate over each hit on the query sequence
    while (my $hit = $result->next_hit) {
        print $hit->name, "\n";
        # iterate over each HSP in the hit
        while (my $hsp = $hit->next_hsp) {
            print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
                $hsp->hit_string, "\n";
        }
    }
}


To clarify some stuff -
Chris, I don't necessarily think XML is the best way forward for BLAST
reports generated locally. It isn't as detailed as the text format, and
the text format is what most people expect to be able to scroll through
and parse -- it is also harder for the format to change dramatically if
you have a static binary on your machine =).  I think for RemoteBlast
the XML format should be the way forward, but I expect Bioperl to
maintain support of any plain text BLAST report format that people
use on a regular basis.

-jason
>
>
> Chris Fields wrote:
>
>> My guess is you're running into text parsing problems in
>> Bio::SearchIO::blast.  Upgrade to the latest developer version  
>> (1.5.1) or
>> bioperl-live (CVS), then see the bug below.
>>
>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>
>> I think the first problem you ran into is solved in bioperl 1.5.1,  
>> the last
>> problem (more recent, not related to the first) has been fixed but  
>> hasn't
>> been committed to bioperl-live yet.  The fixed SearchIO::blast is  
>> available
>> in the link above, but realize it hasn't been committed yet and  
>> may change.
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>> Hubert Prielinger
>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>> To: bioperl-l at bioperl.org
>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>> parsing Blast output
>>>
>>> Hi,
>>> If I want to parse a Blast Output (Version 2.2.12) with
>>> Bio::SearchIO, I get the following error message:
>>>
>>> MSG: no data for midline Query  1   WWWKWRW  7
>>> STACK Bio::SearchIO::blast::next_result
>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>> STACK toplevel
>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>> is that a bug......
>>>
>>> If I want to parse Blast Output (version 2.2.13), I don't get
>>> anything.....
>>> I'm using bioperl 1.4
>>>
>>> before, I have installed bioperl 1.4, it worked fine parsing
>>> Blast Output (version 2.2.12), but I don't remember which
>>> bioperl version I had installed
>>>
>>> thanks in advance
>>>
>>> Hubert
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>>
>>
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From barry.m.dancis at gsk.com  Wed Feb  8 21:44:55 2006
From: barry.m.dancis at gsk.com (barry.m.dancis at gsk.com)
Date: Wed, 8 Feb 2006 16:44:55 -0500
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: <007701c62c37$7914af60$15327e82@pyrimidine>
Message-ID: 

Hi Chris--

        The problem I am solving is: given a mature miRNA name, how do I
use it to search for its pre/pri-miRNA, and vice versa? For example, how to
go from mir-102a* to hsa-mir-102a-1*. Yes, I can write a parser for it,
but I'm hoping that someone else has already done it and has some bells
and whistles to go with it.  Below is a hierarchy chart of a data
structure to hold the naming information. The parsing is not trivial, and
given data in that structure there could be all kinds of neat functions
that return various aspects of the names.
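
As a rough illustration of the kind of name parsing meant here, the sketch
below splits a name such as hsa-mir-102a-1* into parts. The pattern, the
field names (species, type, number, variant, copy, star) and both sub names
are assumptions made for this example; they are not an official naming
grammar and not an existing BioPerl class.

```perl
use strict;
use warnings;

# Assumed name layout: [species-]type-number[variant][-copy][*]
my $MIRNA_NAME = qr{
    \A
    (?: ([a-z]{3,4}) - )?   # optional species prefix such as hsa
    (mir|miR|let) -         # mir for precursor, miR for mature, let for let-7
    (\d+)                   # family number
    ([a-z]?)                # optional variant letter
    (?: - (\d+) )?          # optional copy number of the locus
    (\*?)                   # optional star marking the minor product
    \z
}x;

# Hypothetical helper: return a hash ref of name parts, or nothing on failure.
sub parse_mirna_name {
    my ($name) = @_;
    my @f = $name =~ $MIRNA_NAME or return;
    my %part;
    @part{qw(species type number variant copy star)} = @f;
    $part{star} = $part{star} eq '*' ? 1 : 0;
    return \%part;
}

# Two names are candidates for the same miRNA if number and variant agree.
sub same_mirna {
    my ($x, $y) = map { scalar parse_mirna_name($_) } @_;
    return $x && $y
        && $x->{number} == $y->{number}
        && $x->{variant} eq $y->{variant};
}

my $p = parse_mirna_name('hsa-mir-102a-1*');
print join(", ", map { "$_=" . ($p->{$_} // '-') }
    qw(species type number variant copy star)), "\n";
print "same family\n" if same_mirna('mir-102a*', 'hsa-mir-102a-1*');
```

Real names have more cases than this pattern covers (for example -5p/-3p
suffixes), so it is only a starting point.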

Barry












"Chris Fields"  
Sent by: bioperl-l-bounces at lists.open-bio.org
07-Feb-2006 17:40

To: barry.m.dancis at gsk.com, "'bioperl-l'"
cc:
Subject: Re: [Bioperl-l] Handling miRNA's






Are you talking about sequences or text output from a specific program? If
you are talking about sequences in a particular format, then listen to
Brian.  If you are talking about output, then we need to know which
program you're using, as a parser may exist or could be built.

There are a few modules in Bio::Tools that handle RNA (like QRNA,
tRNAscan-SE), so check those out first.  I'm currently finishing up a
Bio::Tools module for RNAMotif and have plans for making an ERPIN parser.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> barry.m.dancis at gsk.com
> Sent: Tuesday, February 07, 2006 2:26 PM
> To: bioperl-l; bioperl-l-bounces at lists.open-bio.org
> Subject: Re: [Bioperl-l] Handling miRNA's
> 
> It's the parser in particular that I need
> 
> 
> 
> 
> "Brian Osborne"  Sent by: 
> bioperl-l-bounces at lists.open-bio.org
> 07-Feb-2006 12:05
> 
> To
> barry.m.dancis at gsk.com, "bioperl-l" , 
> bioperl-l-bounces at lists.open-bio.org
> cc
> 
> Subject
> Re: [Bioperl-l] Handling miRNA's
> 
> 
> 
> 
> 
> 
> Barry,
> 
> If the sequence information is in one of the formats that 
> Bioperl understands (Genbank, Swissprot flat, and so on) then 
> the answer is yes.
> This assumes that the details on sequence that you mentioned 
> are found in some sequence feature section in the file. But 
> it looks to me like there's no specialized parser for miRNA 
> sequence per se, I'll be corrected if I'm wrong.
> 
> Brian O.
> 
> 
> On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" 
> wrote:
> 
> > Hi --
> > 
> >         Are there any classes for manipulating miRNA's with functions
> > such as parsing the name, storing and interlinking pri/pre/mat
> > sequences, etc?
> > 
> > Thanks,
> > 
> > Barry
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: image/gif
Size: 8775 bytes
Desc: not available
URL: 

From pmr at ebi.ac.uk  Thu Feb  9 08:25:24 2006
From: pmr at ebi.ac.uk (pmr at ebi.ac.uk)
Date: Thu, 9 Feb 2006 08:25:24 -0000 (GMT)
Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <2714.86.132.216.50.1139473524.squirrel@webmail.ebi.ac.uk>

Ryan Golhar writes:

> Does anyone know of tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.

EMBOSS has the msbar program ("mutate sequence beyond all recognition")
which allows you to select the number and type of changes.

With some tuning of options to match the sequence length you should be
able to get results that match whatever your definition of 75% similar
might be (amazing how much more similarity you can get by adding gaps in
an alignment :-)

If you can specify a clear and generally useful way to define what you
need we could of course add a "percent change" option to the msbar program
for a future release.

Hope that helps,

Peter



From sofia at neuro.utah.edu  Thu Feb  9 18:00:05 2006
From: sofia at neuro.utah.edu (Sofia Robb)
Date: Thu, 09 Feb 2006 11:00:05 -0700
Subject: [Bioperl-l] Bio::Assembly::IO::phrap and Bio::Assembly::IO::ace
	with large files
Message-ID: <43EB8325.6050501@neuro.utah.edu>

I am having trouble parsing large (2030 contigs) phrap.out and ace.1
files.  I have no problem with small files (1 contig).  Here are the
errors I get when I try the code that is at the end of my email.  My
script fails on this line:  my $assembly = $in->next_assembly;  I think
it may have something to do with BTREE in Collection.pm, but I have been
unable to correct my errors.

-------

file with 2030 contigs
Bio::Assembly::IO::ace
Can't call method "get_dup" on an undefined value at 
/Library/Perl/5.8.6/Bio/SeqFeature/Collection.pm line 359,  line 
17699.

line 17699 of my ace file is the last line of the record for Contig253

------

file with 2030 contigs
Bio::Assembly::IO::phrap
Can't call method "put" on an undefined value at 
/Library/Perl/5.8.6/Bio/SeqFeature/Collection.pm line 225,  line 
39839. 

line 39839 of my phrap.out file is first line of the record for Contig253

------

use Bio::Assembly::IO;

my $filename = $ARGV[0];

my $in = Bio::Assembly::IO->new(-file=>"$filename",
                                -format=>"phrap"    # or -format=>"ace" for ace.1 files
                                );
my $assembly = $in->next_assembly;
my @contigs = $assembly->all_contigs();
foreach my $contig ($assembly->all_contigs){
        my $id = $contig->id();
        print "contig id = $id ";
        my $seqObj = $contig->get_consensus_sequence();
        my $seq = $seqObj->seq();
        print "is $seq\n";
}
my $id = $assembly->id();
print "$id\n";       

-----

Thanks for any input,
Sofia

Sofia Robb
Molecular Biology Ph.D Program
Sanchez Laboratory
Department of Neurobiology and Anatomy
University of Utah
http://planaria.neuro.utah.edu





From hubert.prielinger at gmx.at  Thu Feb  9 17:32:39 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 09 Feb 2006 11:32:39 -0600
Subject: [Bioperl-l] zip file
In-Reply-To: 
References: <43EA75FF.7010504@gmx.at>
	
Message-ID: <43EB7CB7.7040602@gmx.at>

Hi Chris,
It doesn't work with the simple input line either. I have also tried my
script on the command line with the file scanning part: it runs, but it
takes more than 10 minutes to read one file and it doesn't create the
output file, so there is no output. Before that, I ran the script in the
Eclipse IDE.
I'm trying to upgrade to bioperl 1.5.1 once more; hopefully that's the
problem. I have installed the core, run and ext packages from bioperl.org.
The output as you got it is just fine, but I still need the script with
the file scanning part, because I have a lot of files.

To Roger: I have tried it with different files, but always get the same
result: it reads the files, but takes a very long time and produces no
output file.


Hubert




Chris Fields wrote:

> Hubert,
>
> I tried this script out it and it managed to parse your reports.  I  
> removed the file scanning and replaced it with a simple arg line  
> input (i.e. script.pl blast_file).   I attached one of the output files.
>
> Chris
>
>
>
> #!perl
>
> $file = shift @ARGV;
>
> use Bio::SearchIO;
> my $cutoff_len = 10;
> my $searchio = Bio::SearchIO->new( -format => 'blast',
>                                    -file   =>  $file );
> while ( my $result = $searchio->next_result() ) {
>       while( my $hit = $result->next_hit ) {
>           while(my $hsp = $hit->next_hsp) {
>             if ($hsp->length('sbjct') <= $cutoff_len) {
>                 for ($hsp->hit_string) {
>                     if (tr/K// >= 2 || tr/R// >= 2 && tr/W// >= 2 ||
>                         tr/K// == 1 && tr/R// == 1 && tr/W// >= 2) {
>                          #Print some tab-delimited data about this HSP
>                            open (bigShot, ">>BlastOutputTrial.txt") ||
>                                  die ("Could not open file. $!");
>                          #print $result->query_name, "\t";
>                          #print $hit->significance, "\t";
>                          print bigShot $hit->name, "-->";
>                          print bigShot $hit->description, "\n";
>                          print bigShot "Query:   ",
>                             $hsp->start('query'), "  ", $hsp->query_string,
>                             "  ", $hsp->end('query'), "\n";
>                          print bigShot "Seq:     ", $hsp->start('hit'),
>                             "  ", $hsp->hit_string, "  ",
>                             $hsp->end('hit'), "\n";
> #                        print $hsp->rank, "\t";
> #                        print $hsp->percent_identity, "\t";
> #                        print $hsp->evalue, "\t";
> #                        print $hsp->hsp_length, "\n";
>
>                         close (bigShot);
>
>                     };
>
>
>             }
>         }
>         }
>     }
> }
>
>------------------------------------------------------------------------
>
>  
>



From heikki at sanbi.ac.za  Thu Feb  9 14:54:30 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 9 Feb 2006 16:54:30 +0200
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <200602091654.30890.heikki@sanbi.ac.za>

Ryan,

I should have made this very clear in my first reply:

You have to plan very carefully what rules you use when you mutate your
sequence, because they will directly affect the resulting sequences. Of
course, all that depends on what you will be using the sequences for. If
you are going to draw evolutionary conclusions from those sequences, you
must mutate them in a way that simulates evolutionary principles.

My earlier pseudocode, for example, should really allow mutations at every
location, including ones that have already changed. Mutations do occur
multiple times at the same place as sequences get saturated by mutations.
Also, you should decide the relative occurrence of transversions versus
transitions. Then there are indels; do you want to take those into account?
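
The first two points can be sketched in plain Perl as follows. This is only
an illustration under stated assumptions: the sub name mutate_with_bias and
its parameters are invented here, sites are drawn with replacement so that
a position can be hit repeatedly (and can even revert), $p_ts is the
probability that an event is a transition, and indels are not modelled at
all.

```perl
use strict;
use warnings;

# Transitions stay within purines (A<->G) or pyrimidines (C<->T);
# transversions switch between the two classes.
my %TRANSITION   = (A => 'G', G => 'A', C => 'T', T => 'C');
my %TRANSVERSION = (A => [qw(C T)], G => [qw(C T)],
                    C => [qw(A G)], T => [qw(A G)]);

# Hypothetical helper: apply $events substitutions, each at a random site.
sub mutate_with_bias {
    my ($dna, $events, $p_ts) = @_;
    $dna = uc $dna;
    for (1 .. $events) {
        my $pos = int(rand(length($dna)));
        my $old = substr($dna, $pos, 1);
        next unless exists $TRANSITION{$old};    # skip N and other symbols
        my $new = rand() < $p_ts
            ? $TRANSITION{$old}
            : $TRANSVERSION{$old}[ int(rand(2)) ];
        substr($dna, $pos, 1) = $new;
    }
    return $dna;
}

# Example: 300 events on a 1000 bp sequence, two thirds of them transitions.
my $orig   = join '', map { (qw(A C G T))[ int(rand(4)) ] } 1 .. 1000;
my $mutant = mutate_with_bias($orig, 300, 2 / 3);
my $diff   = grep { substr($orig, $_, 1) ne substr($mutant, $_, 1) }
             0 .. length($orig) - 1;
print "$diff of ", length($orig), " positions differ after 300 events\n";
```

Because of repeated hits, the number of differing positions is usually
lower than the number of events, so the event count has to be tuned to
reach a given percent identity.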

Also, check the EMBOSS program 'msbar'.

You did not ask this, but... I remember that during the early days of Celera,
one of the tools that enabled them to estimate the feasibility of whole
genome shotgun sequence assembly was a very complete program to 'synthesize'
in silico the whole complexity of the human genome. I have no idea if that
program is generally available now.

Yours,

    -Heikki

On Thursday 09 February 2006 06:46, Ryan Golhar wrote:
> Does anyone know of tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Thu Feb  9 11:31:20 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 9 Feb 2006 13:31:20 +0200
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <200602091331.21690.heikki@sanbi.ac.za>

Ryan,

Instructions in pseudo code:

take the sequence string out of the object
use a hash to store changed locations
repeat
    pick a location in the string randomly
    if the location is not in the hash (i.e. it has not been changed already)
        change it into something else
        add the changed location to the hash
    if enough locations have been changed (scalar keys %hash), exit loop
put the sequence string back into the seq object
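
A minimal Perl sketch of the steps above, working on a plain string so it
runs without BioPerl. The sub name mutate_to_identity is made up for this
example, and it treats "75% similar" as 75% identical positions with
substitutions only, which is just one possible definition. With a Bio::Seq
object the string would be read with $seq_obj->seq and written back with
$seq_obj->seq($mutant).

```perl
use strict;
use warnings;

# Hypothetical helper: substitute bases at distinct random positions until
# the result has the requested fraction of identical positions.
# $identity is expected to be between 0 and 1.
sub mutate_to_identity {
    my ($dna, $identity) = @_;
    my @bases  = qw(A C G T);
    my $target = int(length($dna) * (1 - $identity) + 0.5);  # sites to change
    my %changed;                                 # positions already changed
    while (keys(%changed) < $target) {
        my $pos = int(rand(length($dna)));
        next if $changed{$pos};                  # change each site only once
        my $old = uc substr($dna, $pos, 1);
        my @alt = grep { $_ ne $old } @bases;    # always pick a different base
        substr($dna, $pos, 1) = $alt[ int(rand(scalar @alt)) ];
        $changed{$pos} = 1;
    }
    return $dna;
}

my $orig   = join '', map { (qw(A C G T))[ int(rand(4)) ] } 1 .. 1000;
my $mutant = mutate_to_identity($orig, 0.75);
my $diff   = grep { substr($orig, $_, 1) ne substr($mutant, $_, 1) }
             0 .. length($orig) - 1;
print "positions changed: $diff of ", length($orig), "\n";
```

Since every chosen site is changed exactly once and always to a different
base, the example should report 250 changed positions out of 1000.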

   -Heikki   

On Thursday 09 February 2006 06:46, Ryan Golhar wrote:
> Does anyone know of tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From jason.stajich at duke.edu  Thu Feb  9 19:10:54 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 9 Feb 2006 14:10:54 -0500
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>

Depending on whether or not you want to use evolutionary realistic  
models...
* evolver which comes with PAML lets you evolve sequences on a tree
* SeqGen from Andrew Rambaut http://evolve.zoo.ox.ac.uk/software.html?id=seqgen
also lets you do this
I believe there are PISE interfaces to both of these at the pasteur  
bioweb site - http://bioweb.pasteur.fr/

-jason
On Feb 8, 2006, at 11:46 PM, Ryan Golhar wrote:

> Does anyone know of tool to mutate a DNA sequence by a specified  
> amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the  
> original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From heikki at sanbi.ac.za  Thu Feb  9 14:54:30 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 9 Feb 2006 16:54:30 +0200
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <200602091654.30890.heikki@sanbi.ac.za>

Ryan,

I should have made this very clear in my first reply:

You have to plan very carefully what rules you use when you mutate your 
sequence because it will affect directly the resulting sequences. Of course,  
all that depends on what you will be using the sequences for. If you are 
going to draw evolutionary conclusions from those sequences, you must mutate 
them in a way that simulates evolutionary principles.

My earlier pseudocode, for example, should really allow mutations at every 
location. Mutations do occur multiple times in the same places as sequences 
get saturated by mutations. Also, you should decide the relative occurrence of 
transversions versus transitions. Then there are indels; do you want to take 
those into account?
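
As a rough sketch of those two points (repeated hits at the same site and a
chosen transition/transversion balance), something like the following could
be a starting point. The name mutate_with_hits() and its parameters are
invented for illustration, indels are left out entirely, and this is not an
evolutionary model in the sense of evolver or SeqGen.

```perl
use strict;
use warnings;

# Transition partners (purine <-> purine, pyrimidine <-> pyrimidine).
my %transition = (A => 'G', G => 'A', C => 'T', T => 'C');
# Transversion choices (purine <-> pyrimidine).
my %transversion = (
    A => [qw(C T)], G => [qw(C T)],
    C => [qw(A G)], T => [qw(A G)],
);

# Apply $n_events substitution events; a site may be hit more than once.
# $ts_prob is the probability that a single event is a transition.
# mutate_with_hits() and its parameters are hypothetical names.
sub mutate_with_hits {
    my ($dna, $n_events, $ts_prob) = @_;
    for (1 .. $n_events) {
        my $pos = int(rand(length $dna));
        my $old = substr($dna, $pos, 1);
        next unless exists $transition{$old};   # leave N and other symbols alone
        my $new = rand() < $ts_prob
            ? $transition{$old}
            : $transversion{$old}[ int(rand(2)) ];
        substr($dna, $pos, 1) = $new;
    }
    return $dna;
}

my $original = 'ACGT' x 250;
my $mutated  = mutate_with_hits($original, 300, 2 / 3);

# Identity is measured afterwards, since repeated hits make it unpredictable.
my $same = grep { substr($original, $_, 1) eq substr($mutated, $_, 1) }
           0 .. length($original) - 1;
printf "%.1f%% identity after 300 events\n", 100 * $same / length $original;
```

Since a site can be hit twice, or mutate back to its original base, the
observed number of differences is at most the number of events, so the
identity has to be measured after the fact rather than set in advance.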

Also, check the EMBOSS program 'msbar'.

You did not ask this, but... I remember that during the early days of Celera, 
one of the tools that enabled them to estimate the feasibility of whole 
genome shotgun sequence assembly was a very complete program to 'synthesize' 
in silico the whole complexity of the human genome. I have no idea if that 
program is generally available now.

Yours,

    -Heikki

On Thursday 09 February 2006 06:46, Ryan Golhar wrote:
> Does anyone know of tool to mutate a DNA sequence by a specified amount?
> For instance, say I have a DNA sequence 1000 bases long, and I want to
> simulate mutations to make it 75% (or 80%, etc) similar to the original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Thu Feb  9 19:41:33 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 9 Feb 2006 21:41:33 +0200
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <000001c62da1$ba346ba0$15327e82@pyrimidine>
References: <000001c62da1$ba346ba0$15327e82@pyrimidine>
Message-ID: <200602092141.34401.heikki@sanbi.ac.za>

Chris,

I committed your file. All tests pass; the code looks like it was written by a 
long-term bioperl contributor! Impressive.

I truncated the larger test file from 270K to 20K (200 lines), to not bloat 
the distribution unnecessarily. Tests pass, which is the main thing. Shout if 
you disagree.

Great job!

	-Heikki
 

On Thursday 09 February 2006 19:53, Chris Fields wrote:
> Heikki,
>
> I've added the Bio::Tools::RNAMotif module with test suite (24 tests) and
> two test data files to bugzilla.  The first data file is needed for normal
> tests, the second is for testing parsing with modified data in the score
> tag (using sprintf() in the RNAMotif descriptor).  I ran 'perl
> t\RNAMotif.t' and they all passed.
>
> Thanks!
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
> > Heikki Lehvaslaiho
> > Sent: Wednesday, February 08, 2006 12:54 AM
> > To: bioperl-l at lists.open-bio.org
> > Cc: Chris Fields
> > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> >
> > Chris,
> >
> > Post your files to bugzilla (ticket type enhancement, add
> > files to ticket after creation)  and someone with commit
> > ability will add them to CVS once the code is in satisfactory
> > condition.
> >
> > Thanks,
> >
> > 	-Heikki
> >
> > On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> > > I want to submit a module for parsing RNAMotif output
> > > (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning
> > > output and returning Bio::SeqFeature::Generic objects with
> >
> > added tags
> >
> > > for descriptors/sequences/file info.  I'm in the process of
> >
> > writing up
> >
> > > tests and going through biodesign to make sure everything's kosher,
> > > but the module itself is essentially ready-to-go.  What should I do
> > > next?
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher
> > > Lab of Dr. Robert Switzer
> > > Dept of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From hubert.prielinger at gmx.at  Thu Feb  9 20:13:31 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 09 Feb 2006 14:13:31 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	Blast	output
In-Reply-To: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
References: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
Message-ID: <43EBA26B.4010907@gmx.at>

Dear Roger,
I got this error message when I tried to parse Blast output (version 
2.2.12) with bioperl 1.5.1, but that doesn't matter much, because I have a 
lot of Blast output files from version 2.2.13, and for those I don't get 
any error message ... it just doesn't work.

Hubert



Roger Hall wrote:

>Guys - I'm looking at the error message:
>
>MSG: no data for midline Query  1   WWWKWRW  7
>STACK Bio::SearchIO::blast::next_result
>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>STACK toplevel
>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>
>This is my line of thought:
>1. "no data for midline $_" is a unique message generated by blast.pm in one
>location only at the point of a. reading three lines b. dropping lines with
>spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3)
>2. There is a regexp match that fails in order to reach that error message
>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>4. It does anyway
>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>reports
>
>I suspect a newline/chomp/metacharacter issue. Not finding the string
>anywhere has me thoroughly confused - I asked Hubert for the additional
>file, assuming that I didn't have it.
>
>My next thought is to write a quick script to test perl behavior on "Fedora
>Core 9".
>
>Thoughts?
>
>Did I misread the issue entirely? :}
>
>Roger
>
>
>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>Sent: Thursday, February 09, 2006 10:16 AM
>To: 'Jason Stajich'; 'Hubert Prielinger'
>Cc: bioperl-l at bioperl.org
>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>output
>
>
>  
>
>>-----Original Message-----
>>From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>>Sent: Thursday, February 09, 2006 9:13 AM
>>To: Hubert Prielinger
>>Cc: Chris Fields; bioperl-l at bioperl.org
>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>parsing Blast output
>>
>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>    
>>
>>>hi chris,
>>>thanks, I have upgraded to version 1.5.1 but it isn't still 
>>>      
>>>
>>working, 
>>    
>>
>>>do you have any ohter idea, the problem I have is that I 
>>>      
>>>
>>have to parse 
>>    
>>
>>>a lot of textfiles....
>>>or shall I look for another option to parse those files...
>>>
>>>regards
>>>Hubert
>>>      
>>>
>>The code from Bioperl 1.5.1 works fine for me for blast 
>>2.2.13 reports but unless you post your blast report we can't 
>>really determine the problem.
>>
>>If you are still getting the same error like this I am not 
>>convinced you have upgraded to 1.5.1 which includes a fix in 
>>the fact that NCBI changed the HSP result format to remove 
>>the ':' from the Query/Sbjct prefixes.  We fixed this as soon 
>>as it was apparent sometime in September.
>>
>>    
>>
>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>STACK toplevel
>>>>>
>>>>>          
>>>>>
>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>
>>If you are just getting no results but also no warnings wrt 
>>parsing, are you sure your logic is correct?
>>
>>If you remove your filters do you see all the HSPS?
>>
>>
>>while (my $result = $search->next_result) {
>>    print $result->query_name, "\n";
>>    # iterate over each hit on the query sequence
>>    while (my $hit = $result->next_hit) {
>>        print $hit->name, "\n";
>>        # iterate over each HSP in the hit
>>        while (my $hsp = $hit->next_hsp) {
>>            print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>                $hsp->hit_string, "\n";
>>        }
>>    }
>>}
>>    
>>
>
>I tested some of the BLAST results that Hubert sent Roger and me with a
>similar script to the above.  I removed the file parsing logic and it seemed
>to work just fine.  It may very well be a logic issue or that he hasn't
>installed the latest fix.
>    
>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), even
>though the returned output was from nr, the top of the blast output showed
>that it was v2.2.12:  
>
>BLASTP 2.2.12 [Aug-07-2005]
>
>I double-checked my local version and it's definitely v.2.2.13:
>-------------------------------------
>C:\Perl\Scripts>blastcl3 -
>
>blastcl3 2.2.13   arguments:...
>-------------------------------------
>
>If you use RemoteBlast using the same settings, the version in the header
>looks like this:
>
>BLASTP 2.2.13 [Nov-27-2005]
>
>I'm wondering if all the blast executables (blast and netblast) from NCBI
>have text output like v.2.2.12, while the wwwblast outputs a new format
>(2.2.13).  I'll ask blast-help at NCBI about this.
>
>  
>
>>To clarify some stuff -
>>Chris I don't necessarily think the XML is best way forward 
>>for BLAST reports generated locally, it isn't as detailed as 
>>the Text format and it is what most people expect to be able 
>>to scroll through and parse -- it is also harder for the 
>>format to change dramatically if you have a static binary on 
>>your machine =).  I think for remoteblast the XML format 
>>should be the way forward but I expect Bioperl to maintain 
>>support of any plain text BLAST report format that people use 
>>on a regular basis.
>>
>>    
>>
>
>Does XML lack some specific info that text output has?  Didn't know that.  I
>believe that XML should be default in RemoteBlast since it will not break,
>but I agree with you about text output.  I also agree that it will need
>somebody to maintain it constantly, much like RemoteBlast.
>
>  
>
>>-jason
>>    
>>
>>>Chris Fields wrote:
>>>
>>>      
>>>
>>>>My guess is you're running into text parsing problems in 
>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>(1.5.1) or
>>>>bioperl-live (CVS), then see the bug below.
>>>>
>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>
>>>>I think the first problem you ran into is solved in bioperl 1.5.1, 
>>>>the last problem (more recent, not related to the first) has been 
>>>>fixed but hasn't been committed to bioperl-live yet.  The fixed 
>>>>SearchIO::blast is available in the link above, but 
>>>>        
>>>>
>>realize it hasn't 
>>    
>>
>>>>been committed yet and may change.
>>>>
>>>>Christopher Fields
>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
>>>>University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>        
>>>>
>>>>>-----Original Message-----
>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert 
>>>>>Prielinger
>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>To: bioperl-l at bioperl.org
>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>>>>          
>>>>>
>>parsing Blast 
>>    
>>
>>>>>output
>>>>>
>>>>>Hi,
>>>>>If I want to parse a Blast Output (Version 2.2.12) with 
>>>>>Bio::SearchIO, I get the following error message:
>>>>>
>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>STACK toplevel
>>>>>
>>>>>          
>>>>>
>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>    
>>
>>>>>is that a bug......
>>>>>
>>>>>If I want to parse Blast Output (version 2.2.13), I don't get 
>>>>>anything.....
>>>>>I'm using bioperl 1.4
>>>>>
>>>>>before, I have installed bioperl 1.4, it worked fine 
>>>>>          
>>>>>
>>parsing Blast 
>>    
>>
>>>>>Output (version 2.2.12), but I don't remember which 
>>>>>          
>>>>>
>>bioperl version 
>>    
>>
>>>>>I had installed
>>>>>
>>>>>thanks in advance
>>>>>
>>>>>Hubert
>>>>>
>>>>>
>>>>>
>>>>>_______________________________________________
>>>>>Bioperl-l mailing list
>>>>>Bioperl-l at lists.open-bio.org
>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>
>>>>
>>>>        
>>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l at lists.open-bio.org
>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>      
>>>
>>--
>>Jason Stajich
>>Duke University
>>http://www.duke.edu/~jes12
>>
>>    
>>
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept. of Biochemistry
>University of Illinois Urbana-Champaign  
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>  
>



From rahall2 at ualr.edu  Thu Feb  9 20:09:52 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Thu, 09 Feb 2006 14:09:52 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	Blast	output
In-Reply-To: <001b01c62d94$2e8bee50$15327e82@pyrimidine>
Message-ID: <004301c62db4$c9bcbab0$d416a790@LIBERAL>

Guys - I'm looking at the error message:

MSG: no data for midline Query  1   WWWKWRW  7
STACK Bio::SearchIO::blast::next_result
/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
STACK toplevel
/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21

This is my line of thought:
1. "no data for midline $_" is a unique message generated by blast.pm in one
location only at the point of a. reading three lines b. dropping lines with
spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3)
2. There is a regexp match that fails in order to reach that error message
3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
4. It does anyway
5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
reports

I suspect a newline/chomp/metacharacter issue. Not finding the string
anywhere has me thoroughly confused - I asked Hubert for the additional
file, assuming that I didn't have it.

My next thought is to write a quick script to test perl behavior on "Fedora
Core 9".
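
A quick test along those lines might look like the sketch below. The two
patterns are made up for illustration and are NOT copied from
Bio::SearchIO::blast; they only contrast a pattern that insists on the ':'
after Query/Sbjct with one that treats it as optional, and also try a line
carrying a stray carriage return.

```perl
use strict;
use warnings;

# Hypothetical patterns for illustration; not the actual blast.pm regexps.
my $old_style = qr/^(Query|Sbjct):\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;   # requires ':'
my $new_style = qr/^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;  # ':' optional

my @lines = (
    "Query: 1   WWWKWRW  7",      # prefix with a colon
    "Query  1   WWWKWRW  7",      # line as reported in the error message
    "Query  1   WWWKWRW  7\r",    # same line with a trailing carriage return
);

for my $line (@lines) {
    (my $shown = $line) =~ s/\r/\\r/g;    # make control characters visible
    printf "%-28s old:%-5s new:%s\n", "'$shown'",
        ($line =~ $old_style ? 'match' : 'FAIL'),
        ($line =~ $new_style ? 'match' : 'FAIL');
}
```

With these patterns the colon-less lines fail only the first expression, and
the trailing carriage return makes no difference because \s* absorbs it. If
the real parser behaves like the first pattern, that would point at the
prefix format rather than at newline handling.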

Thoughts?

Did I misread the issue entirely? :}

Roger


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Thursday, February 09, 2006 10:16 AM
To: 'Jason Stajich'; 'Hubert Prielinger'
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
output


> -----Original Message-----
> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
> Sent: Thursday, February 09, 2006 9:13 AM
> To: Hubert Prielinger
> Cc: Chris Fields; bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing Blast output
> 
> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> > hi chris,
> > thanks, I have upgraded to version 1.5.1 but it isn't still 
> working, 
> > do you have any ohter idea, the problem I have is that I 
> have to parse 
> > a lot of textfiles....
> > or shall I look for another option to parse those files...
> >
> > regards
> > Hubert
> 
> 
> The code from Bioperl 1.5.1 works fine for me for blast 
> 2.2.13 reports but unless you post your blast report we can't 
> really determine the problem.
> 
> If you are still getting the same error like this I am not 
> convinced you have upgraded to 1.5.1 which includes a fix in 
> the fact that NCBI changed the HSP result format to remove 
> the ':' from the Query/Sbjct prefixes.  We fixed this as soon 
> as it was apparent sometime in September.
> 
> >>> MSG: no data for midline Query  1   WWWKWRW  7
> >>> STACK Bio::SearchIO::blast::next_result
> >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>> STACK toplevel
> >>> 
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> 
> If you are just getting no results but also no warnings wrt 
> parsing, are you sure your logic is correct?
> 
> If you remove your filters do you see all the HSPS?
> 
> 
> while (my $result = $search->next_result) {
>     print $result->query_name, "\n";
>     # iterate over each hit on the query sequence
>     while (my $hit = $result->next_hit) {
>         print $hit->name, "\n";
>         # iterate over each HSP in the hit
>         while (my $hsp = $hit->next_hsp) {
>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>                 $hsp->hit_string, "\n";
>         }
>     }
> }

I tested some of the BLAST results that Hubert sent Roger and me with a
similar script to the above.  I removed the file parsing logic and it seemed
to work just fine.  It may very well be a logic issue or that he hasn't
installed the latest fix.
    
It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), even
though the returned output was from nr, the top of the blast output showed
that it was v2.2.12:  

BLASTP 2.2.12 [Aug-07-2005]

I double-checked my local version and it's definitely v.2.2.13:
-------------------------------------
C:\Perl\Scripts>blastcl3 -

blastcl3 2.2.13   arguments:...
-------------------------------------

If you use RemoteBlast using the same settings, the version in the header
looks like this:

BLASTP 2.2.13 [Nov-27-2005]

I'm wondering if all the blast executables (blast and netblast) from NCBI
have text output like v.2.2.12, while the wwwblast outputs a new format
(2.2.13).  I'll ask blast-help at NCBI about this.
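
For anyone wanting to check what version a pile of text reports claims to
be, a small sketch like this could pull it from the first header line. The
helper name blast_header_info() is invented here, and the pattern assumes
the header looks like the two examples above (program, version, bracketed
date).

```perl
use strict;
use warnings;

# Return (program, version, date) from a BLAST text header line,
# or an empty list if the line does not look like a header.
# blast_header_info() is a hypothetical helper, not part of BioPerl.
sub blast_header_info {
    my ($line) = @_;
    return $line =~ /^(T?BLAST[NPX])\s+(\d+(?:\.\d+)+)\s+\[([^\]]+)\]/
        ? ($1, $2, $3)
        : ();
}

for my $header ('BLASTP 2.2.12 [Aug-07-2005]', 'BLASTP 2.2.13 [Nov-27-2005]') {
    my ($program, $version, $date) = blast_header_info($header);
    print "$program version $version dated $date\n";
}
```

Run over the first line of each report, this would make it easy to sort the
files by the version they claim before deciding which ones the parser has
trouble with.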

> 
> To clarify some stuff -
> Chris I don't necessarily think the XML is best way forward 
> for BLAST reports generated locally, it isn't as detailed as 
> the Text format and it is what most people expect to be able 
> to scroll through and parse -- it is also harder for the 
> format to change dramatically if you have a static binary on 
> your machine =).  I think for remoteblast the XML format 
> should be the way forward but I expect Bioperl to maintain 
> support of any plain text BLAST report format that people use 
> on a regular basis.
> 

Does XML lack some specific info that text output has?  Didn't know that.  I
believe that XML should be default in RemoteBlast since it will not break,
but I agree with you about text output.  I also agree that it will need
somebody to maintain it constantly, much like RemoteBlast.

> -jason
> >
> >
> > Chris Fields wrote:
> >
> >> My guess is you're running into text parsing problems in 
> >> Bio::SearchIO::blast.  Upgrade to the latest developer version
> >> (1.5.1) or
> >> bioperl-live (CVS), then see the bug below.
> >>
> >> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>
> >> I think the first problem you ran into is solved in bioperl 1.5.1, 
> >> the last problem (more recent, not related to the first) has been 
> >> fixed but hasn't been committed to bioperl-live yet.  The fixed 
> >> SearchIO::blast is available in the link above, but 
> realize it hasn't 
> >> been committed yet and may change.
> >>
> >> Christopher Fields
> >> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
> >> University of Illinois Urbana-Champaign
> >>
> >>
> >>
> >>> -----Original Message-----
> >>> From: bioperl-l-bounces at lists.open-bio.org
> >>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert 
> >>> Prielinger
> >>> Sent: Wednesday, February 08, 2006 2:52 PM
> >>> To: bioperl-l at bioperl.org
> >>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing Blast 
> >>> output
> >>>
> >>> Hi,
> >>> If I want to parse a Blast Output (Version 2.2.12) with 
> >>> Bio::SearchIO, I get the following error message:
> >>>
> >>> MSG: no data for midline Query  1   WWWKWRW  7
> >>> STACK Bio::SearchIO::blast::next_result
> >>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>> STACK toplevel
> >>> 
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>>
> >>> is that a bug......
> >>>
> >>> If I want to parse Blast Output (version 2.2.13), I don't get 
> >>> anything.....
> >>> I'm using bioperl 1.4
> >>>
> >>> before, I have installed bioperl 1.4, it worked fine 
> parsing Blast 
> >>> Output (version 2.2.12), but I don't remember which 
> bioperl version 
> >>> I had installed
> >>>
> >>> thanks in advance
> >>>
> >>> Hubert
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> >>
> >>
> >>
> >>
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l



From Lalancettec at AGR.GC.CA  Thu Feb  9 20:53:10 2006
From: Lalancettec at AGR.GC.CA (Lalancette, Claudia)
Date: Thu, 9 Feb 2006 15:53:10 -0500
Subject: [Bioperl-l] module for finding restriction site in batch of
	sequences?
Message-ID: 

Greetings,

I need to find a way to look for a specific restriction enzyme site in
hundreds of sequences.  I have been looking at Bio::Restriction, but I am
not sure whether it will work.  Any suggestions?

Thanks,

Claudia




From cjfields at uiuc.edu  Thu Feb  9 21:25:01 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 9 Feb 2006 15:25:01 -0600
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <200602092141.34401.heikki@sanbi.ac.za>
Message-ID: <000901c62dbf$49bfae20$15327e82@pyrimidine>

Thanks!  I think, as long as the tests pass everything is fine with me.  I
may be submitting another module or two in the next few weeks; just depends
on how much time I can spend on them.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: Heikki Lehvaslaiho [mailto:heikki at sanbi.ac.za] 
> Sent: Thursday, February 09, 2006 1:42 PM
> To: bioperl-l at lists.open-bio.org
> Cc: Chris Fields
> Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> 
> Chris,
> 
> I committed your file. All tests pass; code looks like 
> written by a long term bioperl contributor! Impressive.
> 
> I truncated the larger test file from 270K to 20K (200 
> lines), to not bloat the distribution unnecessarily. Tests 
> pass which is the main thing. Shout if if you disagree.
> 
> Great job!
> 
> 	-Heikki
>  
> 
> On Thursday 09 February 2006 19:53, Chris Fields wrote:
> > Heikki,
> >
> > I've added the Bio::Tools::RNAMotif module with test suite 
> (24 tests) 
> > and two test data files to bugzilla.  The first data file is needed 
> > for normal tests, the second is for testing parsing with 
> modified data 
> > in the score tag (using sprintf() in the RNAMotif 
> descriptor).  I ran 
> > 'perl t\RNAMotif.t' and they all passed.
> >
> > Thanks!
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org
> > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki 
> > > Lehvaslaiho
> > > Sent: Wednesday, February 08, 2006 12:54 AM
> > > To: bioperl-l at lists.open-bio.org
> > > Cc: Chris Fields
> > > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> > >
> > > Chris,
> > >
> > > Post your files to bugzilla (ticket type enhancement, add 
> files to 
> > > ticket after creation)  and someone with commit ability will add 
> > > them to CVS once the code is in satisfactory condition.
> > >
> > > Thanks,
> > >
> > > 	-Heikki
> > >
> > > On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> > > > I want to submit a module for parsing RNAMotif output 
> > > > (Bio::Tools::RNAMotif).  It is capable, at the moment, 
> of scanning 
> > > > output and returning Bio::SeqFeature::Generic objects with
> > >
> > > added tags
> > >
> > > > for descriptors/sequences/file info.  I'm in the process of
> > >
> > > writing up
> > >
> > > > tests and going through biodesign to make sure everything's 
> > > > kosher, but the module itself is essentially ready-to-go.  What 
> > > > should I do next?
> > > >
> > > > Christopher Fields
> > > > Postdoctoral Researcher
> > > > Lab of Dr. Robert Switzer
> > > > Dept of Biochemistry
> > > > University of Illinois Urbana-Champaign
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >



From golharam at umdnj.edu  Thu Feb  9 21:19:46 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Thu, 09 Feb 2006 16:19:46 -0500
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <200602091654.30890.heikki@sanbi.ac.za>
Message-ID: <002801c62dbe$8d4d7e20$e6028a0a@GOLHARMOBILE1>

Thanks all.  The responses I got were definitely more than helpful.  FYI
- I did initially look at msbar, but I overlooked the "Number of times to
perform mutation operations" option, which is what I was looking for.

I'm looking to statistically test some simple scoring matrices.  I think
msbar will do.

Ryan

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki
Lehvaslaiho
Sent: Thursday, February 09, 2006 9:55 AM
To: bioperl-l at lists.open-bio.org; golharam at umdnj.edu
Cc: 'The general forum at Bioinformatics.Org'; 'bioperl-l';
emboss at emboss.open-bio.org
Subject: Re: [Bioperl-l] Tool to mutate DNA sequence


Ryan,

I should have made this very clear in my first reply:

You have to plan very carefully what rules you use when you mutate your
sequence, because they will directly affect the resulting sequences. Of
course, all that depends on what you will be using the sequences for. If
you are going to draw evolutionary conclusions from those sequences, you
must mutate them in a way that simulates evolutionary principles.

My earlier pseudocode, for instance, should allow mutations at every
location. Mutations do occur multiple times in the same places as
sequences get saturated by mutations. Also, you should decide the
relative occurrence of transversions versus transitions. Then there are
indels; do you want to take those into account?
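
To make the points above concrete, here is a minimal Perl sketch, not a
validated evolutionary model. It assumes substitutions only (no indels),
draws sites with replacement so the same position can be hit repeatedly,
and takes an assumed transition probability. The names mutate,
percent_identity and $ts_prob are hypothetical and come from neither
BioPerl nor EMBOSS.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Transition partner (purine<->purine, pyrimidine<->pyrimidine) for each base.
my %transition = (A => 'G', G => 'A', C => 'T', T => 'C');

# The two possible transversion outcomes for each base.
my %transversions = (
    A => [qw(C T)], G => [qw(C T)],
    C => [qw(A G)], T => [qw(A G)],
);

# Hypothetical helper: apply $n_events substitution events to $seq.
# Sites are drawn with replacement, so multiple hits and back-mutations
# can occur. $ts_prob is the assumed probability that an event is a
# transition rather than a transversion.
sub mutate {
    my ($seq, $n_events, $ts_prob) = @_;
    my @bases = split //, uc $seq;
    for (1 .. $n_events) {
        my $pos  = int(rand(scalar @bases));
        my $base = $bases[$pos];
        # Ambiguity codes such as N are left alone; the event is still used up.
        next unless exists $transition{$base};
        if (rand() < $ts_prob) {
            $bases[$pos] = $transition{$base};
        }
        else {
            my $choices = $transversions{$base};
            $bases[$pos] = $choices->[ int(rand(scalar @$choices)) ];
        }
    }
    return join '', @bases;
}

# Percentage of positions that are identical in two equal-length strings.
sub percent_identity {
    my ($x, $y) = @_;
    my $same = grep { substr($x, $_, 1) eq substr($y, $_, 1) }
               0 .. length($x) - 1;
    return 100 * $same / length($x);
}

# Usage: a random 1000-base sequence hit by 300 events, 2/3 of them transitions.
my $original = join '', map { (qw(A C G T))[ int(rand(4)) ] } 1 .. 1000;
my $mutant   = mutate($original, 300, 2 / 3);
printf "%.1f%% identical\n", percent_identity($original, $mutant);
```

Because sites can be hit more than once and can mutate back, 300 events
on 1000 bases leave the sequences more than 70% identical, so the number
of events and the resulting similarity are not the same thing.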

Also, check the EMBOSS program 'msbar'.

You did not ask this, but... I remember that during the early days of
Celera, one of the tools that enabled them to estimate the feasibility
of the whole genome shotgun sequence assembly was a very complete
program to 'synthesize' in silico the whole complexity of the human
genome. I have no idea if that program is generally available now.

Yours,

    -Heikki

On Thursday 09 February 2006 06:46, Ryan Golhar wrote:
> Does anyone know of a tool to mutate a DNA sequence by a specified
> amount? For instance, say I have a DNA sequence 1000 bases long, and I
> want to simulate mutations to make it 75% (or 80%, etc) similar to the
> original.
>
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org 
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l



From injunjoel at hotmail.com  Thu Feb  9 21:33:45 2006
From: injunjoel at hotmail.com (Joel Steele)
Date: Thu, 09 Feb 2006 13:33:45 -0800
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsingBlast
	output
In-Reply-To: <43EBA26B.4010907@gmx.at>
Message-ID: 

Greetings again,
It's the colon...
Observe.

-=Code Snippet=-
#!/usr/bin/perl -w
use strict;

#the string as reported from your error.
my $string1 = 'Query  1   WWWKWRW  7';

#your string with a colon thrown in for testing.
my $string2 = 'Query:  1   WWWKWRW  7';

foreach ($string1, $string2){
	if(/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/){
		print "Match Found in $_\n";
		print $1."\n";
		print $2."\n";
		print $3."\n";
		print $4."\n";
		print $5."\n";
	}else{
		print "no Match for $_\n";
	}
}

-=End Code=-

The Output

-=Code Snippet=-
no Match for Query  1   WWWKWRW  7
Match Found in Query:  1   WWWKWRW  7
Query:  1
Query
1
WWWKWRW
7

-=End Code=-


Now I would suggest changing the regexp

From:
/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/

To:
/^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/

in Bio::SearchIO::blast.
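
As a quick check of the suggested pattern, the sketch below wraps it in
a small helper and runs it over both line styles. The helper name
parse_hsp_line and the returned hash keys are hypothetical and only for
illustration; this is not how blast.pm itself is organized.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): parse one Query/Sbjct
# alignment line, accepting the prefix with or without a trailing colon.
sub parse_hsp_line {
    my ($line) = @_;
    if ($line =~ /^((Query|Sbjct):?\s+(-?\d+)\s*)(\S+)\s+(-?\d+)/) {
        return +{
            label  => $2,            # Query or Sbjct
            start  => $3,            # start coordinate
            seq    => $4,            # aligned residues
            end    => $5,            # end coordinate
            offset => length($1),    # column where the residues begin
        };
    }
    return;    # undef in scalar context when the line does not match
}

for my $line ('Query  1   WWWKWRW  7',
              'Query:  1   WWWKWRW  7',
              'Sbjct: 12  WWWKWRW  18') {
    my $f = parse_hsp_line($line);
    unless ($f) {
        print "no match: $line\n";
        next;
    }
    print "$f->{label} $f->{start}-$f->{end} $f->{seq}\n";
}
```

With the optional colon, both the old "Query:" style and the newer
"Query" style are parsed into the same fields.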

General suggestion:
Again, I would like to suggest that everyone get used to using the strict
pragma. Though it may not be applicable to this particular problem, it becomes
essential if you wish to progress in your use of Perl.
It is a core pragma, so there is nothing to download from CPAN. It helps with
development and once your code can run without warnings and errors you can
remove it. This is not a targeted attack, as some may interpret it, but rather a
general FYI for those out there new to Perl or programming in general.
Better to start learning the rules early before bad habits creep in.
One more thing: there is a wonderfully supportive Perl community available
to anyone who wants to join at PerlMonks.org. Check it out; who knows, you may
even catch a glimpse of Larry Wall while you're there.

-Joel Steele

"The surest way to corrupt a youth is to instruct him to hold in higher 
regard those who think alike than those who think differently." -Nietzsche

"I do not feel obliged to believe that the same God who endowed us with 
sense, reason and intellect has intended us to forego their use." -Galileo




>From: Hubert Prielinger 
>To: rahall2 at ualr.edu, bioperl-l at bioperl.org, Chris Fields 
>,        Jason Stajich 
>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>parsingBlast	output
>Date: Thu, 09 Feb 2006 14:13:31 -0600
>
>dear roger,
>this error message I got, when I tried to parse Blast output (version
>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have a lot
>of Blast output files
>with version 2.2.13 and for that I don't get any error message.....it
>just doesn't work
>
>Hubert
>
>
>
>Roger Hall wrote:
>
> >Guys - I'm looking at the error message:
> >
> >MSG: no data for midline Query  1   WWWKWRW  7
> >STACK Bio::SearchIO::blast::next_result
> >/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >STACK toplevel
> >/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >
> >This is my line of thought:
> >1. "no data for midline $_" is a unique message generated by blast.pm in 
>one
> >location only at the point of a. reading three lines b. dropping lines 
>with
> >spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 
>3)
> >2. There is a regexp match that fails in order to reach that error 
>message
> >3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
> >4. It does anyway
> >5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
> >reports
> >
> >I suspect a newline/chomp/metacharacter issue. Not finding the string
> >anywhere has me thoroughly confused - I asked Hubert for the additional
> >file, assuming that I didn't have it.
> >
> >My next thought is to write a quick script to test perl behavior on 
>"Fedora
> >Core 9".
> >
> >Thoughts?
> >
> >Did I misread the issue entirely? :}
> >
> >Roger
> >
> >
> >-----Original Message-----
> >From: bioperl-l-bounces at lists.open-bio.org
> >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >Sent: Thursday, February 09, 2006 10:16 AM
> >To: 'Jason Stajich'; 'Hubert Prielinger'
> >Cc: bioperl-l at bioperl.org
> >Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
> >output
> >
> >
> >
> >
> >>-----Original Message-----
> >>From: Jason Stajich [mailto:jason.stajich at duke.edu]
> >>Sent: Thursday, February 09, 2006 9:13 AM
> >>To: Hubert Prielinger
> >>Cc: Chris Fields; bioperl-l at bioperl.org
> >>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
> >>parsing Blast output
> >>
> >>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> >>
> >>
> >>>hi chris,
> >>>thanks, I have upgraded to version 1.5.1 but it isn't still
> >>>
> >>>
> >>working,
> >>
> >>
> >>>do you have any ohter idea, the problem I have is that I
> >>>
> >>>
> >>have to parse
> >>
> >>
> >>>a lot of textfiles....
> >>>or shall I look for another option to parse those files...
> >>>
> >>>regards
> >>>Hubert
> >>>
> >>>
> >>The code from Bioperl 1.5.1 works fine for me for blast
> >>2.2.13 reports but unless you post your blast report we can't
> >>really determine the problem.
> >>
> >>If you are still getting the same error like this I am not
> >>convinced you have upgraded to 1.5.1 which includes a fix in
> >>the fact that NCBI changed the HSP result format to remove
> >>the ':' from the Query/Sbjct prefixes.  We fixed this as soon
> >>as it was apparent sometime in September.
> >>
> >>
> >>
> >>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>STACK toplevel
> >>>>>
> >>>>>
> >>>>>
> >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>
> >>If you are just getting no results but also no warnings wrt
> >>parsing, are you sure your logic is correct?
> >>
> >>If you remove your filters do you see all the HSPS?
> >>
> >>
> >>while (my $result = $search->next_result) {
> >>     print $result->query_name, "\n";
> >>     #iterate over each hit on the query sequence
> >>     while (my $hit = $result->next_hit) {
> >>	print $hit->name, "\n";
> >>         #iterate over each HSP in the hit
> >>         while (my $hsp = $hit->next_hsp) {
> >>	 print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp-
> >> >hit_string, "\n";
> >>        }
> >>    }
> >>}
> >>
> >>
> >
> >I tested some of the BLAST results that Hubert sent Roger and me with a
> >similar script to the above.  I removed the file parsing logic and it 
>seemed
> >to work just fine.  It may very well be a logic issue or that he hasn't
> >installed the latest fix.
> >
> >It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), 
>even
> >though the returned output was from nr, the top of the blast output 
>showed
> >that it was v2.2.12:
> >
> >BLASTP 2.2.12 [Aug-07-2005]
> >
> >I double-checked my local version and it's definitely v.2.2.13:
> >-------------------------------------
> >C:\Perl\Scripts>blastcl3 -
> >
> >blastcl3 2.2.13   arguments:...
> >-------------------------------------
> >
> >If you use RemoteBlast using the same settings, the version in the header
> >looks like this:
> >
> >BLASTP 2.2.13 [Nov-27-2005]
> >
> >I'm wondering if all the blast executables (blast and netblast) from NCBI
> >have text output like v.2.2.12, while the wwwblast outputs a new format
> >(2.2.13).  I'll ask blast-help at NCBI about this.
> >
> >
> >
> >>To clarify some stuff -
> >>Chris I don't necessarily think the XML is best way forward
> >>for BLAST reports generated locally, it isn't as detailed as
> >>the Text format and it is what most people expect to be able
> >>to scroll through and parse -- it is also harder for the
> >>format to change dramatically if you have a static binary on
> >>your machine =).  I think for remoteblast the XML format
> >>should be the way forward but I expect Bioperl to maintain
> >>support of any plain text BLAST report format that people use
> >>on a regular basis.
> >>
> >>
> >>
> >
> >Does XML lack some specific info that text output has?  Didn't know that. 
>  I
> >believe that XML should be default in RemoteBlast since it will not 
>break,
> >but I agree with you about text output.  I also agree that it will need
> >somebody to maintain it constantly, much like RemoteBlast.
> >
> >
> >
> >>-jason
> >>
> >>
> >>>Chris Fields wrote:
> >>>
> >>>
> >>>
> >>>>My guess is you're running into text parsing problems in
> >>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
> >>>>(1.5.1) or
> >>>>bioperl-live (CVS), then see the bug below.
> >>>>
> >>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>
> >>>>I think the first problem you ran into is solved in bioperl 1.5.1,
> >>>>the last problem (more recent, not related to the first) has been
> >>>>fixed but hasn't been committed to bioperl-live yet.  The fixed
> >>>>SearchIO::blast is available in the link above, but
> >>>>
> >>>>
> >>realize it hasn't
> >>
> >>
> >>>>been committed yet and may change.
> >>>>
> >>>>Christopher Fields
> >>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
> >>>>University of Illinois Urbana-Champaign
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>-----Original Message-----
> >>>>>From: bioperl-l-bounces at lists.open-bio.org
> >>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
> >>>>>Prielinger
> >>>>>Sent: Wednesday, February 08, 2006 2:52 PM
> >>>>>To: bioperl-l at bioperl.org
> >>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
> >>>>>
> >>>>>
> >>parsing Blast
> >>
> >>
> >>>>>output
> >>>>>
> >>>>>Hi,
> >>>>>If I want to parse a Blast Output (Version 2.2.12) with
> >>>>>Bio::SearchIO, I get the following error message:
> >>>>>
> >>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>STACK toplevel
> >>>>>
> >>>>>
> >>>>>
> >>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>
> >>
> >>>>>is that a bug......
> >>>>>
> >>>>>If I want to parse Blast Output (version 2.2.13), I don't get
> >>>>>anything.....
> >>>>>I'm using bioperl 1.4
> >>>>>
> >>>>>before, I have installed bioperl 1.4, it worked fine
> >>>>>
> >>>>>
> >>parsing Blast
> >>
> >>
> >>>>>Output (version 2.2.12), but I don't remember which
> >>>>>
> >>>>>
> >>bioperl version
> >>
> >>
> >>>>>I had installed
> >>>>>
> >>>>>thanks in advance
> >>>>>
> >>>>>Hubert
> >>>>>
> >>>>>
> >>>>>
> >>>>>_______________________________________________
> >>>>>Bioperl-l mailing list
> >>>>>Bioperl-l at lists.open-bio.org
> >>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>_______________________________________________
> >>>Bioperl-l mailing list
> >>>Bioperl-l at lists.open-bio.org
> >>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> >>--
> >>Jason Stajich
> >>Duke University
> >>http://www.duke.edu/~jes12
> >>
> >>
> >>
> >
> >Christopher Fields
> >Postdoctoral Researcher - Switzer Lab
> >Dept. of Biochemistry
> >University of Illinois Urbana-Champaign
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at lists.open-bio.org
> >http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> >
> >
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l




From jason.stajich at duke.edu  Thu Feb  9 22:13:16 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 9 Feb 2006 17:13:16 -0500
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsingBlast
	output
In-Reply-To: 
References: 
Message-ID: 

Uh, that was done in September; see the CVS log...

On Feb 9, 2006, at 4:33 PM, Joel Steele wrote:

> Greetings again,
> Its the colon...
> observe.
>
> -=Code Snippet=-
> #!/usr/bin/perl -w
> use strict;
>
> #the string as reported from your error.
> my $string1 = 'Query  1   WWWKWRW  7';
>
> #your string with a colon thrown in for testing.
> my $string2 = 'Query:  1   WWWKWRW  7';
>
> foreach ($string1, $string2){
> 	if(/^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/){
> 		print "Match Found in $_\n";
> 		print $1."\n";
> 		print $2."\n";
> 		print $3."\n";
> 		print $4."\n";
> 		print $5."\n";
> 	}else{
> 		print "no Match for $_\n";
> 	}
> }
>
> -=End Code=-
>
> The Output
>
> -=Code Snippet=-
> no Match for Query  1   WWWKWRW  7
> Match Found in Query:  1   WWWKWRW  7
> Query:  1
> Query
> 1
> WWWKWRW
> 7
>
> -=End Code=-
>
>
> Now I would suggest changing the regexp
>
> From:
> /^((Query|Sbjct):\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/
>
> To:
> /^((Query|Sbjct):?\s+(\-?\d+)\s*)(\S+)\s+(\-?\d+)/
>
> in SearchIO::Blast.
>
> General suggestion:
> Again I would like to suggest that everyone get use to using the  
> strict
> pragma. Though it may not applicable to this particular problem it  
> becomes
> essential if you wish progress in your use of Perl.
> It is a core module so there is nothing to download from CPAN. It  
> helps with
> development and once your code can run without warnings and errors  
> you can
> remove it. This is not a targeted attack as some may interpret it,  
> rather a
> general FYI for those out there new to Perl or programming in general.
> Better to start learning the rules early before bad habits creep in.
> One more thing. There is a wonderfully supportive Perl community  
> available
> to anyone who wants to join at PerlMonks.org check it out, who  
> knows you may
> even catch a glimpse of Larry Wall while youre there.
>
> -Joel Steele
>
> "The surest way to corrupt a youth is to instruct him to hold in  
> higher
> regard those who think alike than those who think differently." - 
> Nietzsche
>
> "I do not feel obliged to believe that the same God who endowed us  
> with
> sense, reason and intellect has intended us to forego their use." - 
> Galileo
>
>
>
>
>> From: Hubert Prielinger 
>> To: rahall2 at ualr.edu, bioperl-l at bioperl.org, Chris Fields
>> ,        Jason Stajich 
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>> parsingBlast	output
>> Date: Thu, 09 Feb 2006 14:13:31 -0600
>>
>> dear roger,
>> this error message I got, when I tried to parse Blast output (version
>> 2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have  
>> a lot
>> of Blast output files
>> with version 2.2.13 and for that I don't get any error message.....it
>> just doesn't work
>>
>> Hubert
>>
>>
>>
>> Roger Hall wrote:
>>
>>> Guys - I'm looking at the error message:
>>>
>>> MSG: no data for midline Query  1   WWWKWRW  7
>>> STACK Bio::SearchIO::blast::next_result
>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>> STACK toplevel
>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>> This is my line of thought:
>>> 1. "no data for midline $_" is a unique message generated by  
>>> blast.pm in
>> one
>>> location only at the point of a. reading three lines b. dropping  
>>> lines
>> with
>>> spaces only c. identifying the Query, Midline, and Match lines (0  
>>> <= $i <
>> 3)
>>> 2. There is a regexp match that fails in order to reach that error
>> message
>>> 3. The $_ value "Query  1   WWWKWRW  7" should not fail the  
>>> expression
>>> 4. It does anyway
>>> 5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in  
>>> the blast
>>> reports
>>>
>>> I suspect a newline/chomp/metacharacter issue. Not finding the  
>>> string
>>> anywhere has me thoroughly confused - I asked Hubert for the  
>>> additional
>>> file, assuming that I didn't have it.
>>>
>>> My next thought is to write a quick script to test perl behavior on
>> "Fedora
>>> Core 9".
>>>
>>> Thoughts?
>>>
>>> Did I misread the issue entirely? :}
>>>
>>> Roger
>>>
>>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris  
>>> Fields
>>> Sent: Thursday, February 09, 2006 10:16 AM
>>> To: 'Jason Stajich'; 'Hubert Prielinger'
>>> Cc: bioperl-l at bioperl.org
>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>> parsing Blast
>>> output
>>>
>>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>> Sent: Thursday, February 09, 2006 9:13 AM
>>>> To: Hubert Prielinger
>>>> Cc: Chris Fields; bioperl-l at bioperl.org
>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>> parsing Blast output
>>>>
>>>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>
>>>>
>>>>> hi chris,
>>>>> thanks, I have upgraded to version 1.5.1 but it isn't still
>>>>>
>>>>>
>>>> working,
>>>>
>>>>
>>>>> do you have any ohter idea, the problem I have is that I
>>>>>
>>>>>
>>>> have to parse
>>>>
>>>>
>>>>> a lot of textfiles....
>>>>> or shall I look for another option to parse those files...
>>>>>
>>>>> regards
>>>>> Hubert
>>>>>
>>>>>
>>>> The code from Bioperl 1.5.1 works fine for me for blast
>>>> 2.2.13 reports but unless you post your blast report we can't
>>>> really determine the problem.
>>>>
>>>> If you are still getting the same error like this I am not
>>>> convinced you have upgraded to 1.5.1 which includes a fix in
>>>> the fact that NCBI changed the HSP result format to remove
>>>> the ':' from the Query/Sbjct prefixes.  We fixed this as soon
>>>> as it was apparent sometime in September.
>>>>
>>>>
>>>>
>>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>> STACK toplevel
>>>>>>>
>>>>>>>
>>>>>>>
>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ 
>>>> Blast.pl:21
>>>>
>>>> If you are just getting no results but also no warnings wrt
>>>> parsing, are you sure your logic is correct?
>>>>
>>>> If you remove your filters do you see all the HSPS?
>>>>
>>>>
>>>> while (my $result = $search->next_result) {
>>>>     print $result->query_name, "\n";
>>>>     #iterate over each hit on the query sequence
>>>>     while (my $hit = $result->next_hit) {
>>>> 	print $hit->name, "\n";
>>>>         #iterate over each HSP in the hit
>>>>         while (my $hsp = $hit->next_hsp) {
>>>> 	 print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp-
>>>>> hit_string, "\n";
>>>>        }
>>>>    }
>>>> }
>>>>
>>>>
>>>
>>> I tested some of the BLAST results that Hubert sent Roger and me  
>>> with a
>>> similar script to the above.  I removed the file parsing logic  
>>> and it
>> seemed
>>> to work just fine.  It may very well be a logic issue or that he  
>>> hasn't
>>> installed the latest fix.
>>>
>>> It's a funny thing, though.  When I tried using blastcl3 (v.  
>>> 2.2.13),
>> even
>>> though the returned output was from nr, the top of the blast output
>> showed
>>> that it was v2.2.12:
>>>
>>> BLASTP 2.2.12 [Aug-07-2005]
>>>
>>> I double-checked my local version and it's definitely v.2.2.13:
>>> -------------------------------------
>>> C:\Perl\Scripts>blastcl3 -
>>>
>>> blastcl3 2.2.13   arguments:...
>>> -------------------------------------
>>>
>>> If you use RemoteBlast using the same settings, the version in  
>>> the header
>>> looks like this:
>>>
>>> BLASTP 2.2.13 [Nov-27-2005]
>>>
>>> I'm wondering if all the blast executables (blast and netblast)  
>>> from NCBI
>>> have text output like v.2.2.12, while the wwwblast outputs a new  
>>> format
>>> (2.2.13).  I'll ask blast-help at NCBI about this.
>>>
>>>
>>>
>>>> To clarify some stuff -
>>>> Chris I don't necessarily think the XML is best way forward
>>>> for BLAST reports generated locally, it isn't as detailed as
>>>> the Text format and it is what most people expect to be able
>>>> to scroll through and parse -- it is also harder for the
>>>> format to change dramatically if you have a static binary on
>>>> your machine =).  I think for remoteblast the XML format
>>>> should be the way forward but I expect Bioperl to maintain
>>>> support of any plain text BLAST report format that people use
>>>> on a regular basis.
>>>>
>>>>
>>>>
>>>
>>> Does XML lack some specific info that text output has?  Didn't  
>>> know that.
>>  I
>>> believe that XML should be default in RemoteBlast since it will not
>> break,
>>> but I agree with you about text output.  I also agree that it  
>>> will need
>>> somebody to maintain it constantly, much like RemoteBlast.
>>>
>>>
>>>
>>>> -jason
>>>>
>>>>
>>>>> Chris Fields wrote:
>>>>>
>>>>>
>>>>>
>>>>>> My guess is you're running into text parsing problems in
>>>>>> Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>> (1.5.1) or
>>>>>> bioperl-live (CVS), then see the bug below.
>>>>>>
>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>
>>>>>> I think the first problem you ran into is solved in bioperl  
>>>>>> 1.5.1,
>>>>>> the last problem (more recent, not related to the first) has been
>>>>>> fixed but hasn't been committed to bioperl-live yet.  The fixed
>>>>>> SearchIO::blast is available in the link above, but
>>>>>>
>>>>>>
>>>> realize it hasn't
>>>>
>>>>
>>>>>> been committed yet and may change.
>>>>>>
>>>>>> Christopher Fields
>>>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
>>>>>> University of Illinois Urbana-Champaign
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of  
>>>>>>> Hubert
>>>>>>> Prielinger
>>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>> To: bioperl-l at bioperl.org
>>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>>>
>>>>>>>
>>>> parsing Blast
>>>>
>>>>
>>>>>>> output
>>>>>>>
>>>>>>> Hi,
>>>>>>> If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>> Bio::SearchIO, I get the following error message:
>>>>>>>
>>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>> STACK toplevel
>>>>>>>
>>>>>>>
>>>>>>>
>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/ 
>>>> Blast.pl:21
>>>>
>>>>
>>>>>>> is that a bug......
>>>>>>>
>>>>>>> If I want to parse Blast Output (version 2.2.13), I don't get
>>>>>>> anything.....
>>>>>>> I'm using bioperl 1.4
>>>>>>>
>>>>>>> before, I have installed bioperl 1.4, it worked fine
>>>>>>>
>>>>>>>
>>>> parsing Blast
>>>>
>>>>
>>>>>>> Output (version 2.2.12), but I don't remember which
>>>>>>>
>>>>>>>
>>>> bioperl version
>>>>
>>>>
>>>>>>> I had installed
>>>>>>>
>>>>>>> thanks in advance
>>>>>>>
>>>>>>> Hubert
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Bioperl-l mailing list
>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>> --
>>>> Jason Stajich
>>>> Duke University
>>>> http://www.duke.edu/~jes12
>>>>
>>>>
>>>>
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept. of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From boris.steipe at utoronto.ca  Thu Feb  9 21:54:53 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Thu, 9 Feb 2006 16:54:53 -0500
Subject: [Bioperl-l] Tool to mutate DNA sequence
In-Reply-To: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
Message-ID: <1B7E8DA9-86F5-4411-B16C-E6573E5E8C36@utoronto.ca>

Golf, anyone?


#!/usr/bin/perl -nl
for(split//){push @a,$_}
END{
   while($n/@a<0.5) {
     $p=rand(@a);
     if($a[$p]=~/[A-Z]/){$a[$p]=lc((grep!/$a[$p]/,split//,"ACGT")[rand(3)]);
       $n++;
     }
   }
print @a;
}

(144, not counting \s and the # !line )

:-)
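
For readers who would rather not decode the golfed version, here is a longer sketch of the same idea: pick distinct positions at random and replace each with a different base, written in lower case so the changes are easy to spot. The helper name mutate_dna and the 0.75 identity value are illustrative assumptions (the 75% comes from Ryan's question quoted below); this is not a BioPerl function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# mutate_dna() is a hypothetical helper, not part of BioPerl.
# It returns a copy of $dna in which about (1 - $identity) of the
# positions carry a different base, shown in lower case.
sub mutate_dna {
    my ($dna, $identity) = @_;
    my @bases  = split //, uc $dna;
    my $target = int( (1 - $identity) * scalar(@bases) + 0.5 );

    # Only positions holding A, C, G or T are candidates for mutation.
    my @candidates = grep { $bases[$_] =~ /[ACGT]/ } 0 .. $#bases;
    $target = scalar(@candidates) if $target > scalar(@candidates);

    # Partial Fisher-Yates shuffle: choose $target distinct positions.
    for my $i (0 .. $target - 1) {
        my $j = $i + int( rand( scalar(@candidates) - $i ) );
        @candidates[$i, $j] = @candidates[$j, $i];
        my $pos    = $candidates[$i];
        my @others = grep { $_ ne $bases[$pos] } qw(A C G T);
        $bases[$pos] = lc $others[ int( rand( scalar(@others) ) ) ];
    }
    return join '', @bases;
}

my $original = 'ACGT' x 250;                 # 1000 bases of toy sequence
my $mutant   = mutate_dna($original, 0.75);  # aim for 75% identity

# Count positions that differ, ignoring case.
my $diff = grep { substr($original, $_, 1) ne uc substr($mutant, $_, 1) }
           0 .. length($original) - 1;
print "$diff of ", length($original), " positions changed\n";
```

Because every chosen position is distinct and always receives a different base, the number of changed positions is exact rather than approximate. Note that this models substitutions only, with no insertions or deletions.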


B.



>> Does anyone know of tool to mutate a DNA sequence by a specified
>> amount?
>> For instance, say I have a DNA sequence 1000 bases long, and I  
>> want to
>> simulate mutations to make it 75% (or 80%, etc) similar to the
>> original.
>>
>>
>> Ryan
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From hubert.prielinger at gmx.at  Thu Feb  9 22:20:46 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 09 Feb 2006 16:20:46 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast
	output
In-Reply-To: <000e01c62dca$bc66df60$15327e82@pyrimidine>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>
Message-ID: <43EBC03E.4040900@gmx.at>

Hi Chris,
I'm incredibly sorry for causing so much inconvenience. Yes, you are 
right: I only had to change the blast.pm file, and it is working fine now. 
Thank you very much. You are also right that you had mentioned changing 
the file earlier... ;)

But I have another question: does it work with WU-BLAST output too?

regards
Hubert


Chris Fields wrote:

>Ha!  I come back from meeting and there's a billion emails!  What have we
>started? ;p .  Sorry about this Jason; I know you're busy.
>
>Hubert, if you're out there, I sent you an email with an attachment.  You
>said the output looks like what you were expecting.  So I think we have two
>problems:
>
>1)  I haven't delved into the file scanning, but the fact that it takes so
>long should tell you something's seriously wrong there.  Strip that part out
>and start with a simple script, say, like the one Jason or that I sent you;
>the script I used to generate that output works fine (on two OS's, WinXP and
>Mac OS X).  Use it on one file at a time.  Do everything on command line
>(not through Eclipse).  IDE's can be notoriously flaky about running
>scripts, esp. when they run debugging.  
>
>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still
>not work whenever the text blast output has the following header, which
>comes from the new web version of BLAST:
>
>-----------------------------------------------------
>BLASTP 2.2.13 [Nov-27-2005]
>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, 
>Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman 
>(1997), "Gapped BLAST and PSI-BLAST: a new generation of 
>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>
>RID: 1139501210-857-165793005128.BLASTQ1
>
>
>Database: All non-redundant GenBank CDS
>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>           3,292,813 sequences; 1,128,164,434 total letters
>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>tuberculosis 
>H37Rv].
>Length=193
>.......
>-----------------------------------------------------
>
>It will work if the text output has the following header (or is an older
>version of BLAST):
>
>-----------------------------------------------------
>BLASTP 2.2.12 [Aug-07-2005]
>
>
>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, 
>Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), 
>"Gapped BLAST and PSI-BLAST: a new generation of protein database search
>programs",  Nucleic Acids Res. 25:3389-3402.
>
>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>tuberculosis H37Rv].
>         (193 letters)
>
>Database: All non-redundant GenBank CDS
>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>           2,895,325 sequences; 997,103,285 total letters
>-----------------------------------------------------
>You have the former (2.2.13) version.  I know b/c I have your BLAST files.
>Therefore, even bioperl-1.5.1 will not work!
>
>If you want the really gory details on why this is a problem, look here:
>
>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>
>So, any text output with the above header will not work; it will either hang
>or end abruptly (depending on OS, perl version, memory, patience).  If you
>look in the above, I have added a preliminary fix for this.  I'll reiterate
>for the billionth time, it hasn't been committed yet, so don't kill me if it
>blows your computer up ;>
>
>Here's the direct link:
>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>This is a modified version of Bio::SearchIO::blast.pm (it says it's version
>1.90, but it's lying, I didn't change the version, only the regex; sorry
>Jason).  From what you've been posting it doesn't sound like you've tried
>this, and I believe I've suggested this fix before.
>
>Replace the one in your Bio/SearchIO directory (which looks like
>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev.
>message) with this file.  Make sure the filename stays the same (blast.pm).
>
>Run everything again, one file at a time.  Make sure you use Jason's script
>as well as the one I sent you.  Do NOT rely on running through multiple
>files yet.  Fix one bug at a time.  And heed Joel's words about file checks.
>
>
>Here's a small chunk of output from one of your blast files using the
>modified script I sent you:
>
>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>Query:   1  RWKWKRKK  8
>Seq:     542  RWAWRRKK  549
>
>Look familiar?
>
>Christopher Fields
>Postdoctoral Researcher - Switzer Lab
>Dept. of Biochemistry
>University of Illinois Urbana-Champaign  
>
>  
>
>>-----Original Message-----
>>From: Roger Hall [mailto:rahall2 at ualr.edu] 
>>Sent: Thursday, February 09, 2006 3:24 PM
>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>parsing Blast output
>>
>>In other words, yes, I'm on the wrong trail. :}
>>
>>Sorry - I'll look at the output issue this evening (or 
>>realize that Chris already solved the issue).  ;}
>>
>>Thanks!
>>
>>Roger
>>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org
>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
>>Hubert Prielinger
>>Sent: Thursday, February 09, 2006 2:14 PM
>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; 
>>Jason Stajich
>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>>parsing Blast output
>>
>>dear roger,
>>this error message I got, when I tried to parse Blast output (version
>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I 
>>have a lot of Blast output files with version 2.2.13 and for 
>>that I don't get any error message.....it just doesn't work
>>
>>Hubert
>>
>>
>>
>>Roger Hall wrote:
>>
>>>Guys - I'm looking at the error message:
>>>
>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>STACK Bio::SearchIO::blast::next_result
>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>STACK toplevel
>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>
>>>This is my line of thought:
>>>1. "no data for midline $_" is a unique message generated by blast.pm in
>>>one location only, at the point of a. reading three lines b. dropping
>>>lines with spaces only c. identifying the Query, Midline, and Match lines
>>>(0 <= $i < 3)
>>>2. There is a regexp match that fails in order to reach that error message
>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>4. It does anyway
>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>>reports
>>>
>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>anywhere has me thoroughly confused - I asked Hubert for the additional
>>>file, assuming that I didn't have it.
>>>
>>>My next thought is to write a quick script to test perl behavior on 
>>>"Fedora Core 9".
>>>
>>>Thoughts?
>>>
>>>Did I misread the issue entirely? :}
>>>
>>>Roger
>>>
>>>
>>>-----Original Message-----
>>>From: bioperl-l-bounces at lists.open-bio.org
>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>Cc: bioperl-l at bioperl.org
>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing 
>>>Blast output
>>>
>>>>-----Original Message-----
>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>To: Hubert Prielinger
>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing 
>>>>Blast output
>>>>
>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>
>>>>>hi chris,
>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still working,
>>>>>do you have any other idea, the problem I have is that I have to parse
>>>>>a lot of textfiles....
>>>>>or shall I look for another option to parse those files...
>>>>>
>>>>>regards
>>>>>Hubert
>>>>>
>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>2.2.13 reports but unless you post your blast report we can't really
>>>>determine the problem.
>>>>
>>>>If you are still getting the same error like this I am not convinced
>>>>you have upgraded to 1.5.1 which includes a fix in the fact that NCBI
>>>>changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>prefixes.  We fixed this as soon as it was apparent sometime in
>>>>September.
>>>>
>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>
>>>>If you are just getting no results but also no warnings wrt parsing,
>>>>are you sure your logic is correct?
>>>>
>>>>If you remove your filters do you see all the HSPS?
>>>>
>>>>
>>>>while (my $result = $search->next_result) {
>>>>    print $result->query_name, "\n";
>>>>    # iterate over each hit on the query sequence
>>>>    while (my $hit = $result->next_hit) {
>>>>        print $hit->name, "\n";
>>>>        # iterate over each HSP in the hit
>>>>        while (my $hsp = $hit->next_hsp) {
>>>>            print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>                $hsp->hit_string, "\n";
>>>>        }
>>>>    }
>>>>}
>>>>
>>>I tested some of the BLAST results that Hubert sent Roger and me with a
>>>similar script to the above.  I removed the file parsing logic and it
>>>seemed to work just fine.  It may very well be a logic issue or that he
>>>hasn't installed the latest fix.
>>>
>>>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>>even though the returned output was from nr, the top of the blast
>>>output showed that it was v2.2.12:
>>>
>>>BLASTP 2.2.12 [Aug-07-2005]
>>>
>>>I double-checked my local version and it's definitely v.2.2.13:
>>>-------------------------------------
>>>C:\Perl\Scripts>blastcl3 -
>>>
>>>blastcl3 2.2.13   arguments:...
>>>-------------------------------------
>>>
>>>If you use RemoteBlast using the same settings, the version in the
>>>header looks like this:
>>>
>>>BLASTP 2.2.13 [Nov-27-2005]
>>>
>>>I'm wondering if all the blast executables (blast and netblast) from
>>>NCBI have text output like v.2.2.12, while the wwwblast outputs a new
>>>format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>
>>>>To clarify some stuff -
>>>>Chris I don't necessarily think the XML is best way forward for BLAST
>>>>reports generated locally, it isn't as detailed as the Text format and
>>>>it is what most people expect to be able to scroll through and parse
>>>>-- it is also harder for the format to change dramatically if you have
>>>>a static binary on your machine =).  I think for remoteblast the XML
>>>>format should be the way forward but I expect Bioperl to maintain
>>>>support of any plain text BLAST report format that people use on a
>>>>regular basis.
>>>>
>>>Does XML lack some specific info that text output has?  Didn't know that.
>>>I believe that XML should be default in RemoteBlast since it will not
>>>break, but I agree with you about text output.  I also agree that it
>>>will need somebody to maintain it constantly, much like RemoteBlast.
>>>
>>>>-jason
>>>>
>>>>>Chris Fields wrote:
>>>>>
>>>>>>My guess is you're running into text parsing problems in
>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>(1.5.1) or bioperl-live (CVS), then see the bug below.
>>>>>>
>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>
>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>the last problem (more recent, not related to the first) has been
>>>>>>fixed but hasn't been committed to bioperl-live yet.  The fixed
>>>>>>SearchIO::blast is available in the link above, but realize it hasn't
>>>>>>been committed yet and may change.
>>>>>>
>>>>>>Christopher Fields
>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
>>>>>>University of Illinois Urbana-Champaign
>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>Prielinger
>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>output
>>>>>>>
>>>>>>>Hi,
>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>
>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>>is that a bug......
>>>>>>>
>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't get
>>>>>>>anything.....
>>>>>>>I'm using bioperl 1.4
>>>>>>>
>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>I had installed
>>>>>>>
>>>>>>>thanks in advance
>>>>>>>
>>>>>>>Hubert
>>>>>>>
>>>>--
>>>>Jason Stajich
>>>>Duke University
>>>>http://www.duke.edu/~jes12
>>>>
>>>>   
>>>>
>>>>        
>>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>



From olenka.m at gmail.com  Thu Feb  9 22:49:48 2006
From: olenka.m at gmail.com (Olena Morozova)
Date: Thu, 9 Feb 2006 17:49:48 -0500
Subject: [Bioperl-l] Bio::TreeIO
Message-ID: <259a224c0602091449u353e4bf1g5a3cfbb46297217a@mail.gmail.com>

Hi all,

Probably a very stupid question, but the get_lca function does not
work for unrooted trees, does it?
I am trying to get the LCA for a set of nodes in a phylip tree, and I
am using the script in the HOWTO.
Thanks,
Olena

On 2/8/06, Hubert Prielinger  wrote:
> Hi,
> If I want to parse a Blast Output (Version 2.2.12) with Bio::SearchIO,
> I get the following error message:
>
> MSG: no data for midline Query  1   WWWKWRW  7
> STACK Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> STACK toplevel
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>
> is that a bug......
>
> If I want to parse Blast Output (version 2.2.13), I don't get anything.....
> I'm using bioperl 1.4
>
> before, I have installed bioperl 1.4, it worked fine parsing Blast
> Output (version 2.2.12), but I don't remember which bioperl version I
> had installed
>
> thanks in advance
>
> Hubert
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



From victor.ruotti at gmail.com  Thu Feb  9 23:22:11 2006
From: victor.ruotti at gmail.com (Victor)
Date: Thu, 9 Feb 2006 17:22:11 -0600
Subject: [Bioperl-l] Running BLAT with BioPerl
Message-ID: <36d7e5550602091522g114728a2w57f2a1cb7c1383ee@mail.gmail.com>

Hi,
Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module is up to date
in the latest bioperl release?



use Bio::Tools::Run::Alignment::Blat;
my $factory = Bio::Tools::Run::Alignment::Blat->new();
my $seq =
"TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA";

my @feats = $factory->run( $seq);

Here is what I get when trying to use it:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Blat call (/usr/local/bin/blat/blat -out=blast  TGAAATAAAACTCAGTA
/tmp/fB09bp5F76) crashed: -1

Notice that it is using "blat" twice in the path. The way that I fixed this
is by going to the Blat.pm module and changing the following lines:
#my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
my $str= Bio::Root::IO->catfile($self->program_name);

Any ideas? Maybe I'm missing an environment variable somewhere?
I'd like to avoid making this change. Also, does anyone have a working synopsis
for this Blat module (where to set the parameters, and whether it allows you
to have a config file)?
I'll be happy to add a better synopsis to the module if needed.

Thanks in advance,
Victor
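
The doubled "blat/blat" in the exception above is the result one would expect if the value returned by executable() is already the full path to the binary and the program name is then appended to it. The sketch below reproduces that effect with the core File::Spec module. The two path values are taken from the error message; whether Bio::Root::IO->catfile behaves exactly like File::Spec->catfile here is an assumption, not something confirmed in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Assumption: executable() already returned the full path to the binary.
my $executable   = '/usr/local/bin/blat';
my $program_name = 'blat';

# Appending the program name to a full path duplicates the last component.
my $doubled = File::Spec->catfile($executable, $program_name);
print "doubled: $doubled\n";

# Joining a directory with the program name gives the intended path.
my $bindir = '/usr/local/bin';
my $wanted = File::Spec->catfile($bindir, $program_name);
print "wanted:  $wanted\n";
```

On a Unix-like system the first join yields the same "/usr/local/bin/blat/blat" seen in the exception, which suggests checking what the executable path (or any related environment setting) resolves to before editing the module.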



From osborne1 at optonline.net  Fri Feb 10 01:37:39 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 09 Feb 2006 20:37:39 -0500
Subject: [Bioperl-l] module for finding restriction site in batch of
 sequences?
In-Reply-To: 
Message-ID: 

Claudia,

Yes, Bio::Restriction does this; see bptutorial.pl for code examples. Note
the statement "@fragments = $analysis->fragments($enzyme)". If the array
@fragments has more than 1 element, your sequence has a site for
the enzyme in question.

Alternatively it sounds like you could use some kind of regular expression.
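
A minimal sketch of the regular expression route, using plain Perl and no BioPerl modules. The EcoRI site GAATTC and the two toy sequences are assumptions chosen for illustration, and find_sites is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# find_sites() is a hypothetical helper: it returns the 1-based start
# positions of every non-overlapping occurrence of $site in $seq.
sub find_sites {
    my ($seq, $site) = @_;
    my @starts;
    while ($seq =~ /\Q$site\E/gi) {
        push @starts, $-[0] + 1;    # $-[0] is the 0-based match offset
    }
    return @starts;
}

# Toy input; in practice these would be read from a sequence file.
my %seqs = (
    seq1 => 'ATGCGAATTCGGCTAGAATTCAA',
    seq2 => 'ATGCGGCTAGCTAGCTAGCAA',
);

for my $id (sort keys %seqs) {
    my @hits = find_sites($seqs{$id}, 'GAATTC');   # EcoRI, as an example
    printf "%s: %s\n", $id, @hits ? "site at @hits" : 'no site';
}
```

GAATTC is palindromic, so searching one strand is enough. For a non-palindromic site the reverse complement would have to be searched as well, and sites with ambiguity codes would need a character class per ambiguous position.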

Brian O.


On 2/9/06 3:53 PM, "Lalancette, Claudia"  wrote:

> Greetings,
> 
>  
> 
> I need to find a way to look for a specific restriction enzyme site in
> hundreds of sequences.  Been looking at Bio::Restriction, but not sure
> if will work...  Any suggestions?
> 
>  
> 
> Thanks,
> 
> Claudia
> 
>  
> 
>  
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From cjfields at uiuc.edu  Fri Feb 10 01:52:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 9 Feb 2006 19:52:34 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast
	output
In-Reply-To: <43EBC03E.4040900@gmx.at>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>
	<43EBC03E.4040900@gmx.at>
Message-ID: 

 From 'perldoc Bio::SearchIO::blast':

DESCRIPTION
        This object encapsulated the necessary methods for generating events
        suitable for building Bio::Search objects from a BLAST report file.
        Read the Bio::SearchIO for more information about how to use this.

        This driver can parse:

        o   NCBI produced plain text BLAST reports from blastall, this also
            includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.  NCBI
            XML BLAST output is parsed with the blastxml SearchIO driver

        o   WU-BLAST all reports

        o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)

        o   BLAST-like output from Paracel BTK output

So, it should.  Let us know if it doesn't.
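
Since the trouble in this thread came down to which header layout a report uses, a quick check before parsing can save time. The sketch below is plain Perl and only distinguishes the two layouts quoted further down in this message: the newer web layout with a "Length=193" line and the older one with "(193 letters)". The helper name blast_header_style is hypothetical, and this is not how Bio::SearchIO::blast itself decides anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# blast_header_style() is a hypothetical helper. It looks only for the
# query-length line, which differs between the two layouts in this thread.
sub blast_header_style {
    my ($text) = @_;
    return 'new'     if $text =~ /^Length=\d+/m;              # "Length=193"
    return 'old'     if $text =~ /^\s*\([\d,]+ letters\)/m;   # "(193 letters)"
    return 'unknown';
}

# Abbreviated headers based on the examples quoted in this thread.
my $new_header = <<'END';
BLASTP 2.2.13 [Nov-27-2005]
Query=  NP_215895 pyrimidine regulatory protein PyrR
Length=193
END

my $old_header = <<'END';
BLASTP 2.2.12 [Aug-07-2005]
Query= NP_215895 pyrimidine regulatory protein PyrR
         (193 letters)
END

print "2.2.13 sample: ", blast_header_style($new_header), "\n";
print "2.2.12 sample: ", blast_header_style($old_header), "\n";
```

A report flagged as 'new' is the kind that needed the patched blast.pm from the bugzilla attachment mentioned below; other report types are not covered by this check.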

On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:

> Hi Chris,
> I'm incredibly sorry for causing so much inconvenience. Yes, you are
> right: I only had to change the blast.pm file, and it is working fine
> now. Thank you very much. You are also right that you had mentioned
> changing the file earlier... ;)
>
> But I have another question: does it work with WU-BLAST output too?
> regards
> Hubert
>
>
> Chris Fields wrote:
>
>> Ha!  I come back from meeting and there's a billion emails!  What  
>> have we
>> started? ;p .  Sorry about this Jason; I know you're busy.
>>
>> Hubert, if you're out there, I sent you an email with an  
>> attachment.  You
>> said the output looks like what you were expecting.  So I think we  
>> have two
>> problems:
>>
>> 1)  I haven't delved into the file scanning, but the fact that it  
>> takes so
>> long should tell you something's seriously wrong there.  Strip  
>> that part out
>> and start with a simple script, say, like the one Jason or that I  
>> sent you;
>> the script I used to generate that output works fine (on two OS's,  
>> WinXP and
>> Mac OS X).  Use it on one file at a time.  Do everything on  
>> command line
>> (not through Eclipse).  IDE's can be notoriously flaky about running
>> scripts, esp. when they run debugging.
>> 2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast  
>> will still
>> not work whenever the text blast output has the following header,  
>> which
>> comes from the new web version of BLAST:
>>
>> -----------------------------------------------------
>> BLASTP 2.2.13 [Nov-27-2005]
>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>> Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>> Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>> protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>
>> RID: 1139501210-857-165793005128.BLASTQ1
>>
>>
>> Database: All non-redundant GenBank CDS
>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>           3,292,813 sequences; 1,128,164,434 total letters
>> Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>> tuberculosis H37Rv].
>> Length=193
>> .......
>> -----------------------------------------------------
>>
>> It will work if the text output has the following header (or is an  
>> older
>> version of BLAST):
>>
>> -----------------------------------------------------
>> BLASTP 2.2.12 [Aug-07-2005]
>>
>>
>> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>> Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>> Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>> protein database search
>> programs",  Nucleic Acids Res. 25:3389-3402.
>>
>> Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>> tuberculosis H37Rv].
>>         (193 letters)
>>
>> Database: All non-redundant GenBank CDS
>> translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>           2,895,325 sequences; 997,103,285 total letters
>> -----------------------------------------------------
>> You have the former (2.2.13) version.  I know b/c I have your  
>> BLAST files.
>> Therefore, even bioperl-1.5.1 will not work!
>>
>> If you want the really gory details on why this is a problem, look  
>> here:
>>
>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>
>> So, any text output with the above header will not work; it will  
>> either hang
>> or end abruptly (depending on OS, perl version, memory,  
>> patience).  If you
>> look in the above, I have added a preliminary fix for this.  I'll  
>> reiterate
>> for the billionth time, it hasn't been committed yet, so don't  
>> kill me if
>> blows your computer up ;>
>> Here's the direct link:
>> http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>> This is a modified version of Bio::SearchIO::blast.pm (it says  
>> it's version
>> 1.90, but it's lying, I didn't change the version, only the regex;  
>> sorry
>> Jason).  From what you've been posting it doesn't sound like  
>> you've tried
>> this, and I believe I've suggested this fix before.
>>
>> Replace the one in your Bio/SearchIO directory (which looks like
>> '/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your  
>> prev.
>> message) with this file.  Make sure the filename stays the same  
>> (blast.pm).
>>
>> Run everything again, one file at a time.  Make sure you use  
>> Jason's script
>> as well as the one I sent you.  Do NOT rely on running through  
>> multiple
>> files yet.  Fix one bug at a time.  And heed Joel's words about  
>> file checks.
>>
>>
>> Here's a small chunk of output from one of your blast files using the
>> modified script I sent you:
>>
>> sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>> Query:   1  RWKWKRKK  8
>> Seq:     542  RWAWRRKK  549
>>
>> Look familiar?
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>> -----Original Message-----
>>> From: Roger Hall [mailto:rahall2 at ualr.edu]
>>> Sent: Thursday, February 09, 2006 3:24 PM
>>> To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>> Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>> parsing Blast output
>>>
>>> In other words, yes, I'm on the wrong trail. :}
>>>
>>> Sorry - I'll look at the output issue this evening (or realize  
>>> that Chris already solved the issue).  ;}
>>>
>>> Thanks!
>>>
>>> Roger
>>>
>>> -----Original Message-----
>>> From: bioperl-l-bounces at lists.open-bio.org
>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert  
>>> Prielinger
>>> Sent: Thursday, February 09, 2006 2:14 PM
>>> To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason  
>>> Stajich
>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>> parsing Blast output
>>>
>>> dear roger,
>>> this error message I got, when I tried to parse Blast output  
>>> (version
>>> 2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have  
>>> a lot of Blast output files with version 2.2.13 and for that I  
>>> don't get any error message.....it just doesn't work
>>>
>>> Hubert
>>>
>>>
>>>
>>> Roger Hall wrote:
>>>
>>>
>>>> Guys - I'm looking at the error message:
>>>>
>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>> STACK Bio::SearchIO::blast::next_result
>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>> STACK toplevel
>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>
>>>> This is my line of thought:
>>>> 1. "no data for midline $_" is a unique message generated by blast.pm in
>>>> one location only, at the point of a. reading three lines b. dropping
>>>> lines with spaces only c. identifying the Query, Midline, and Match lines
>>>> (0 <= $i < 3)
>>>> 2. There is a regexp match that fails in order to reach that error message
>>>> 3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>> 4. It does anyway
>>>> 5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>>> reports
>>>>
>>>> I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>> anywhere has me thoroughly confused - I asked Hubert for the additional
>>>> file, assuming that I didn't have it.
>>>>
>>>> My next thought is to write a quick script to test perl behavior  
>>>> on "Fedora Core 9".
>>>>
>>>> Thoughts?
>>>>
>>>> Did I misread the issue entirely? :}
>>>>
>>>> Roger
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>> Sent: Thursday, February 09, 2006 10:16 AM
>>>> To: 'Jason Stajich'; 'Hubert Prielinger'
>>>> Cc: bioperl-l at bioperl.org
>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>> parsing Blast output
>>>>
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>> Sent: Thursday, February 09, 2006 9:13 AM
>>>>> To: Hubert Prielinger
>>>>> Cc: Chris Fields; bioperl-l at bioperl.org
>>>>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>> parsing Blast output
>>>>>
>>>>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>
>>>>>> hi chris,
>>>>>> thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>>>> do you have any other idea, the problem I have is that I have to parse
>>>>>> a lot of textfiles....
>>>>>> or shall I look for another option to parse those files...
>>>>>>
>>>>>> regards
>>>>>> Hubert
>>>>>>
>>>>> The code from Bioperl 1.5.1 works fine for me for blast
>>>>> 2.2.13 reports but unless you post your blast report we can't really
>>>>> determine the problem.
>>>>>
>>>>> If you are still getting the same error like this I am not convinced
>>>>> you have upgraded to 1.5.1, which includes a fix for the fact that NCBI
>>>>> changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>> prefixes.  We fixed this as soon as it was apparent sometime in
>>>>> September.
>>>>>
>>>>>
>>>>>
>>>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>> STACK toplevel
>>>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>
>>>>> If you are just getting no results but also no warnings wrt parsing,
>>>>> are you sure your logic is correct?
>>>>>
>>>>> If you remove your filters do you see all the HSPS?
>>>>>
>>>>>
>>>>> while (my $result = $search->next_result) {
>>>>>     print $result->query_name, "\n";
>>>>>     # iterate over each hit on the query sequence
>>>>>     while (my $hit = $result->next_hit) {
>>>>>         print $hit->name, "\n";
>>>>>         # iterate over each HSP in the hit
>>>>>         while (my $hsp = $hit->next_hsp) {
>>>>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>>                   $hsp->hit_string, "\n";
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>> I tested some of the BLAST results that Hubert sent Roger and me with a
>>>> similar script to the above.  I removed the file parsing logic and it
>>>> seemed to work just fine.  It may very well be a logic issue or that he
>>>> hasn't installed the latest fix.
>>>>   It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>>> even though the returned output was from nr, the top of the
>>>> blast output showed that it was v2.2.12:
>>>>
>>>> BLASTP 2.2.12 [Aug-07-2005]
>>>>
>>>> I double-checked my local version and it's definitely v.2.2.13:
>>>> -------------------------------------
>>>> C:\Perl\Scripts>blastcl3 -
>>>>
>>>> blastcl3 2.2.13   arguments:...
>>>> -------------------------------------
>>>>
>>>> If you use RemoteBlast using the same settings, the version in  
>>>> the header looks like this:
>>>>
>>>> BLASTP 2.2.13 [Nov-27-2005]
>>>>
>>>> I'm wondering if all the blast executables (blast and netblast)
>>>> from NCBI have text output like v.2.2.12, while the wwwblast outputs a
>>>> new format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>>
>>>>
>>>>
>>>>> To clarify some stuff -
>>>>> Chris I don't necessarily think the XML is best way forward for BLAST
>>>>> reports generated locally, it isn't as detailed as the Text format and
>>>>> it is what most people expect to be able to scroll through and parse
>>>>> -- it is also harder for the format to change dramatically if you have
>>>>> a static binary on your machine =).  I think for remoteblast the XML
>>>>> format should be the way forward but I expect Bioperl to
>>>>> maintain support of any plain text BLAST report format that
>>>>> people use on a regular basis.
>>>>>
>>>>>
>>>>>
>>>> Does XML lack some specific info that text output has?  Didn't know
>>>> that.  I believe that XML should be default in RemoteBlast since it will
>>>> not break, but I agree with you about text output.  I also agree
>>>> that it will need somebody to maintain it constantly, much like
>>>> RemoteBlast.
>>>>
>>>>
>>>>
>>>>> -jason
>>>>>
>>>>>
>>>>>> Chris Fields wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>>> My guess is you're running into text parsing problems in  
>>>>>>> Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>> (1.5.1) or
>>>>>>> bioperl-live (CVS), then see the bug below.
>>>>>>>
>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>
>>>>>>> I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>> the last problem (more recent, not related to the first) has
>>>>>>> been fixed but hasn't been committed to bioperl-live yet.
>>>>>>> The fixed SearchIO::blast is available in the link above, but
>>>>>>> realize it hasn't been committed yet and may change.
>>>>>>>
>>>>>>> Christopher Fields
>>>>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry  
>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>> Prielinger
>>>>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>> To: bioperl-l at bioperl.org
>>>>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>> output
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>> If I want to parse a Blast Output (Version 2.2.12) with  
>>>>>>>> Bio::SearchIO, I get the following error message:
>>>>>>>>
>>>>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>> STACK Bio::SearchIO::blast::next_result
>>>>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>> STACK toplevel
>>>>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>
>>>>>>>> is that a bug......
>>>>>>>>
>>>>>>>> If I want to parse Blast Output (version 2.2.13), I don't
>>>>>>>> get anything.....
>>>>>>>> I'm using bioperl 1.4
>>>>>>>>
>>>>>>>> before I installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>> Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>> I had installed
>>>>>>>>
>>>>>>>> thanks in advance
>>>>>>>>
>>>>>>>> Hubert
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>> --
>>>>> Jason Stajich
>>>>> Duke University
>>>>> http://www.duke.edu/~jes12
>>>>>
>>>>>
>>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher - Switzer Lab
>>>> Dept. of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign






From heikki at sanbi.ac.za  Fri Feb 10 04:47:42 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 10 Feb 2006 06:47:42 +0200
Subject: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
In-Reply-To: <000901c62dbf$49bfae20$15327e82@pyrimidine>
References: <000901c62dbf$49bfae20$15327e82@pyrimidine>
Message-ID: <200602100647.43173.heikki@sanbi.ac.za>

On Thursday 09 February 2006 23:25, Chris Fields wrote:
> Thanks!  I think, as long as the tests pass everything is fine with me.  I
> may be submitting another module or two in the next few weeks; just depends
> on how much time I can spend on them.

Looking forward to them!

	-Heikki

> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> > -----Original Message-----
> > From: Heikki Lehvaslaiho [mailto:heikki at sanbi.ac.za]
> > Sent: Thursday, February 09, 2006 1:42 PM
> > To: bioperl-l at lists.open-bio.org
> > Cc: Chris Fields
> > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> >
> > Chris,
> >
> > I committed your file. All tests pass; the code looks like it was
> > written by a long-term bioperl contributor! Impressive.
> >
> > I truncated the larger test file from 270K to 20K (200
> > lines), to not bloat the distribution unnecessarily. Tests
> > pass which is the main thing. Shout if you disagree.
> >
> > Great job!
> >
> > 	-Heikki
> >
> > On Thursday 09 February 2006 19:53, Chris Fields wrote:
> > > Heikki,
> > >
> > > I've added the Bio::Tools::RNAMotif module with test suite (24 tests)
> > > and two test data files to bugzilla.  The first data file is needed
> > > for normal tests, the second is for testing parsing with modified data
> > > in the score tag (using sprintf() in the RNAMotif descriptor).  I ran
> > > 'perl t\RNAMotif.t' and they all passed.
> > >
> > > Thanks!
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > > > -----Original Message-----
> > > > From: bioperl-l-bounces at lists.open-bio.org
> > > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Heikki
> > > > Lehvaslaiho
> > > > Sent: Wednesday, February 08, 2006 12:54 AM
> > > > To: bioperl-l at lists.open-bio.org
> > > > Cc: Chris Fields
> > > > Subject: Re: [Bioperl-l] RNAMotif module (Bio::Tools::RNAMotif)
> > > >
> > > > Chris,
> > > >
> > > > Post your files to bugzilla (ticket type enhancement, add files to
> > > > ticket after creation)  and someone with commit ability will add
> > > > them to CVS once the code is in satisfactory condition.
> > > >
> > > > Thanks,
> > > >
> > > > 	-Heikki
> > > >
> > > > On Wednesday 08 February 2006 06:52, Chris Fields wrote:
> > > > > I want to submit a module for parsing RNAMotif output
> > > > > (Bio::Tools::RNAMotif).  It is capable, at the moment, of scanning
> > > > > output and returning Bio::SeqFeature::Generic objects with added tags
> > > > > for descriptors/sequences/file info.  I'm in the process of writing up
> > > > > tests and going through biodesign to make sure everything's
> > > > > kosher, but the module itself is essentially ready-to-go.  What
> > > > > should I do next?
> > > > >
> > > > > Christopher Fields
> > > > > Postdoctoral Researcher
> > > > > Lab of Dr. Robert Switzer
> > > > > Dept of Biochemistry
> > > > > University of Illinois Urbana-Champaign
> > > > >
> > > > >
> > > > >
> > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > > > --
> > > > ______ _/
> >
> > _/_____________________________________________________
> >
> > > >       _/      _/
> > > >      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> > > >     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
> > > >    _/  _/  _/  SANBI, South African National
> >
> > Bioinformatics Institute
> >
> > > >   _/  _/  _/  University of Western Cape, South Africa
> > > >      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > > > ___
> > > > _/_/_/_/_/________________________________________________________
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > ______ _/      _/_____________________________________________________
> >       _/      _/
> >      _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >     _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
> >    _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >   _/  _/  _/  University of Western Cape, South Africa
> >      _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___
> > _/_/_/_/_/________________________________________________________

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Fri Feb 10 04:51:11 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 10 Feb 2006 06:51:11 +0200
Subject: [Bioperl-l] module for finding restriction site in batch of
	sequences?
In-Reply-To: 
References: 
Message-ID: <200602100651.12028.heikki@sanbi.ac.za>


It should:

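use Bio::Restriction::Analysis;
foreach my $seq (@seqs) {    # loop over each seq (assumes @seqs holds your sequence objects)
    my $ra = Bio::Restriction::Analysis->new(-seq => $seq);
    my @cuts = $ra->fragments('EcoRI'); # or call some other method
}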

or is it something else you are trying to do?
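
If all you need to know is which sequences contain the site and where, plain
string matching may already be enough. What follows is only a minimal sketch
that does not use Bio::Restriction at all: find_sites() and the two example
sequences are made up for illustration, and GAATTC is the EcoRI recognition
site.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the 1-based start
# position of every occurrence of a recognition site in a sequence string.
sub find_sites {
    my ($seq, $site) = @_;
    my @positions;
    my $upper = uc $seq;
    my $pos   = 0;
    while (($pos = index($upper, uc $site, $pos)) != -1) {
        push @positions, $pos + 1;
        $pos++;    # advance by one base so overlapping sites are also found
    }
    return @positions;
}

# Made-up example data; in practice these would come from your own files.
my %seqs = (
    seq1 => 'AAGAATTCGGGAATTCTT',
    seq2 => 'CCCCGGGGTTTT',
);

for my $id (sort keys %seqs) {
    my @hits = find_sites($seqs{$id}, 'GAATTC');
    print "$id: ", (@hits ? join(',', @hits) : 'no site'), "\n";
}
```

Because GAATTC is palindromic, searching one strand is sufficient here; for a
non-palindromic site the reverse complement would have to be searched too.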

Yours,
	-Heikki


On Thursday 09 February 2006 22:53, Lalancette, Claudia wrote:
> Greetings,
>
> I need to find a way to look for a specific restriction enzyme site in
> hundreds of sequences.  Been looking at Bio::Restriction, but not sure
> if it will work...  Any suggestions?
>
> Thanks,
>
> Claudia

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Fri Feb 10 07:06:11 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Fri, 10 Feb 2006 09:06:11 +0200
Subject: [Bioperl-l] planning sequence mutating modules
Message-ID: <200602100906.11885.heikki@sanbi.ac.za>


Ryan Golhar's mail got me thinking that we should have a simple framework for 
mutating sequences to a desired level. The model can then be extended to 
necessary complexity when needed by subclassing.

To start with, I have been planning:


Bio::SeqEvolution::EvolutionI - interface file
Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
        (defaults to Bio::PrimarySeq)
Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
Bio::SeqEvolution::EvolutionI::each_seqs($count) 
       - returns an array of $count seqs
Bio::SeqEvolution::EvolutionI::_generate_seq() 
Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
      converted to probabilities of change internally

  various methods to define the extent of divergence:
  only one to start with:
Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
   (= 100% - identity)

Bio::SeqEvolution::Factory - core class to call,
         instantiates subclasses, Bio::SeqEvolution::DNASimple for nucleotides
Bio::SeqEvolution::EvolutionI::type() - evolution model,
      defaults to Bio::SeqEvolution::DNASimple for nucleotides


Bio::SeqEvolution::DNASimple - default for nucleotides
Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
        e.g. 5 => 5:1, defaults to 1:1
        simple alternative to a scoring matrix
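
To make the DNASimple idea a bit more concrete, here is a rough standalone
sketch of what the core could look like. Nothing in it exists in BioPerl:
mutate_dna() and its arguments are hypothetical, and it assumes that the rate
is read as transitions per transversion (5 => 5:1) and that pam is the
percentage of positions to change.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Transition partner for each base (purine<->purine, pyrimidine<->pyrimidine)
my %TRANSITION = (A => 'G', G => 'A', C => 'T', T => 'C');
# The two possible transversion targets for each base
my %TRANSVERSION = (
    A => [qw(C T)], G => [qw(C T)],
    C => [qw(A G)], T => [qw(A G)],
);

# mutate_dna() is a hypothetical stand-in for what DNASimple might do.
#   $seq       - DNA string
#   $pam       - percentage of positions to change (100% - identity)
#   $ts_per_tv - assumed reading of the rate: transitions per transversion
sub mutate_dna {
    my ($seq, $pam, $ts_per_tv) = @_;
    $ts_per_tv = 1 unless defined $ts_per_tv;    # 1:1 default

    my @bases = split //, uc $seq;
    # Only unambiguous bases are candidates for change
    my @candidates = grep { exists $TRANSITION{ $bases[$_] } } 0 .. $#bases;
    my $n_changes  = int(@candidates * $pam / 100 + 0.5);

    # Pick distinct sites so the requested divergence is reached exactly
    my @sites = (shuffle @candidates)[0 .. $n_changes - 1];
    for my $i (@sites) {
        my $base = $bases[$i];
        if (rand() < $ts_per_tv / ($ts_per_tv + 1)) {
            $bases[$i] = $TRANSITION{$base};
        }
        else {
            $bases[$i] = $TRANSVERSION{$base}[ int rand 2 ];
        }
    }
    return join '', @bases;
}

my $original = 'ACGT' x 25;
my $mutated  = mutate_dna($original, 10, 5);
print "$original\n$mutated\n";
```

Choosing distinct sites up front keeps the result at exactly the requested
divergence; a model that allows multiple hits per site would need a different
loop.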


I am soliciting the usual comments and suggestions about naming and minimal 
functionality.


   -Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From Pieter.Monsieurs at esat.kuleuven.be  Fri Feb 10 08:53:43 2006
From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs)
Date: Fri, 10 Feb 2006 09:53:43 +0100
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
	blast	output
In-Reply-To: 
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>	<43EBC03E.4040900@gmx.at>
	
Message-ID: <43EC5497.3050505@esat.kuleuven.be>

Hi Chris,

Parsing the Blast output still doesn't work for me with the bug-fix 
download of blast.pm.
The module keeps cycling in the while loop at line 487, looking 
for a database or query size:

while( defined ($_) ) {
	if( /^Database:/ ) {
		$self->_pushback($_);
		last;
	}
	chomp;               
	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
		$size = $1;
		$size =~ s/,//g;
		last;
	} else {
		$q .= " $_";
		$q =~ s/ +/ /g;
		$q =~ s/^ | $//g;
	}
	$_ = $self->_readline;
}


The loop keeps looking for the database information but, as you 
mentioned, the new Blast output format gives this information before the 
query line.
As a result, all hits and HSPs end up in the query description 
($hit->query_description), no hits are found, and query_length is 0.
Because you already adapted the module to retrieve the database 
information elsewhere, deleting the while loop and adding the following 
lines after $_ = $self->_readline (line 486) worked fine for me (using 
blastn and blastp):

if (/Length=([\d,]+)/) {
	$size = $1;
	$size =~ s/,//g;
}
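
To show the difference between the two header layouts, here is a small
standalone sketch (plain Perl, no BioPerl needed). query_length() is a
hypothetical helper, not a blast.pm method; the two patterns are the ones
from the loop quoted above, and the sample lines are taken from the 2.2.12
and 2.2.13 headers.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the query length out of a single header line.
# Returns the length with thousands separators removed, or undef when the
# line carries no length information.
sub query_length {
    my ($line) = @_;
    if (   $line =~ /\((-?[\d,]+)\s+letters.*\)/    # old style: "(193 letters)"
        || $line =~ /^Length=(-?[\d,]+)/ )          # new style: "Length=193"
    {
        (my $size = $1) =~ s/,//g;
        return $size;
    }
    return;
}

for my $line (
    '        (193 letters)',
    'Length=193',
    '          3,292,813 sequences; 1,128,164,434 total letters',
    )
{
    my $len = query_length($line);
    printf "%-60s => %s\n", "'$line'", defined $len ? $len : 'no length';
}
```

The database summary line does not match either pattern (it has no
parentheses), so it cannot be mistaken for the query length.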


Regards,
Pieter



Chris Fields wrote:

> From 'perldoc Bio::SearchIO::blast':
>
>DESCRIPTION
>        This object encapsulated the necessary methods for generating events
>        suitable for building Bio::Search objects from a BLAST report file.
>        Read the Bio::SearchIO for more information about how to use this.
>
>        This driver can parse:
>
>        o   NCBI produced plain text BLAST reports from blastall, this also
>            includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.  NCBI
>            XML BLAST output is parsed with the blastxml SearchIO driver
>
>        o   WU-BLAST all reports
>
>        o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
>
>        o   BLAST-like output from Paracel BTK output
>
>So, it should.  Let us know if it doesn't.
>
>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>
>>Hi Chris,
>>I'm incredibly sorry for causing so much inconvenience, yes you are
>>right, I had only to change the blast.pm file, it is working very
>>fine, thank you very much, and you are right, you have mentioned it
>>earlier either to change the file... ;)
>>
>>but I have another question: does it work with the WU-Blast output
>>too?
>>regards
>>Hubert
>>
>>Chris Fields wrote:
>>
>>>Ha!  I come back from meeting and there's a billion emails!  What have we
>>>started? ;p .  Sorry about this Jason; I know you're busy.
>>>
>>>Hubert, if you're out there, I sent you an email with an attachment.  You
>>>said the output looks like what you were expecting.  So I think we have two
>>>problems:
>>>
>>>1)  I haven't delved into the file scanning, but the fact that it takes so
>>>long should tell you something's seriously wrong there.  Strip that part out
>>>and start with a simple script, say, like the one Jason or that I sent you;
>>>the script I used to generate that output works fine (on two OS's, WinXP and
>>>Mac OS X).  Use it on one file at a time.  Do everything on command line
>>>(not through Eclipse).  IDE's can be notoriously flaky about running
>>>scripts, esp. when they run debugging.
>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still
>>>not work whenever the text blast output has the following header, which
>>>comes from the new web version of BLAST:
>>>
>>>-----------------------------------------------------
>>>BLASTP 2.2.13 [Nov-27-2005]
>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>
>>>RID: 1139501210-857-165793005128.BLASTQ1
>>>
>>>
>>>Database: All non-redundant GenBank CDS
>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>          3,292,813 sequences; 1,128,164,434 total letters
>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>tuberculosis H37Rv].
>>>Length=193
>>>.......
>>>-----------------------------------------------------
>>>
>>>It will work if the text output has the following header (or is an older
>>>version of BLAST):
>>>
>>>-----------------------------------------------------
>>>BLASTP 2.2.12 [Aug-07-2005]
>>>
>>>
>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>protein database search
>>>programs",  Nucleic Acids Res. 25:3389-3402.
>>>
>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>tuberculosis H37Rv].
>>>        (193 letters)
>>>
>>>Database: All non-redundant GenBank CDS
>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>          2,895,325 sequences; 997,103,285 total letters
>>>-----------------------------------------------------
>>>You have the former (2.2.13) version.  I know b/c I have your BLAST files.
>>>Therefore, even bioperl-1.5.1 will not work!
>>>
>>>If you want the really gory details on why this is a problem, look here:
>>>
>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>
>>>So, any text output with the above header will not work; it will either hang
>>>or end abruptly (depending on OS, perl version, memory, patience).  If you
>>>look in the above, I have added a preliminary fix for this.  I'll reiterate
>>>for the billionth time, it hasn't been committed yet, so don't kill me if
>>>it blows your computer up ;>
>>>Here's the direct link:
>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>>This is a modified version of Bio::SearchIO::blast.pm (it says it's version
>>>1.90, but it's lying, I didn't change the version, only the regex; sorry
>>>Jason).  From what you've been posting it doesn't sound like you've tried
>>>this, and I believe I've suggested this fix before.
>>>
>>>Replace the one in your Bio/SearchIO directory (which looks like
>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev.
>>>message) with this file.  Make sure the filename stays the same (blast.pm).
>>>
>>>Run everything again, one file at a time.  Make sure you use Jason's script
>>>as well as the one I sent you.  Do NOT rely on running through multiple
>>>files yet.  Fix one bug at a time.  And heed Joel's words about file checks.
>>>
>>>
>>>Here's a small chunk of output from one of your blast files using the
>>>modified script I sent you:
>>>
>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>>>Query:   1  RWKWKRKK  8
>>>Seq:     542  RWAWRRKK  549
>>>
>>>Look familiar?
>>>
>>>Christopher Fields
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>      
>>>
>>>>-----Original Message-----
>>>>From: Roger Hall [mailto:rahall2 at ualr.edu]
>>>>Sent: Thursday, February 09, 2006 3:24 PM
>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>parsing Blast output
>>>>
>>>>In other words, yes, I'm on the wrong trail. :}
>>>>
>>>>Sorry - I'll look at the output issue this evening (or realize  
>>>>that Chris already solved the issue).  ;}
>>>>
>>>>Thanks!
>>>>
>>>>Roger
>>>>
>>>>-----Original Message-----
>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert  
>>>>Prielinger
>>>>Sent: Thursday, February 09, 2006 2:14 PM
>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason  
>>>>Stajich
>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>parsing Blast output
>>>>
>>>>dear roger,
>>>>this error message I got, when I tried to parse Blast output (version
>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have
>>>>a lot of Blast output files with version 2.2.13 and for that I
>>>>don't get any error message.....it just doesn't work
>>>>
>>>>Hubert
>>>>
>>>>
>>>>
>>>>Roger Hall wrote:
>>>>
>>>>
>>>>        
>>>>
>>>>>Guys - I'm looking at the error message:
>>>>>
>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>STACK toplevel
>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ 
>>>>>Blast.pl:21
>>>>>
>>>>>This is my line of thought:
>>>>>1. "no data for midline $_" is a unique message generated by
>>>>>          
>>>>>
>>>>blast.pm
>>>>        
>>>>
>>>>>in
>>>>>
>>>>>          
>>>>>
>>>>one
>>>>
>>>>        
>>>>
>>>>>location only at the point of a. reading three lines b.
>>>>>          
>>>>>
>>>>dropping lines
>>>>        
>>>>
>>>>>with spaces only c. identifying the Query, Midline, and
>>>>>          
>>>>>
>>>>Match lines (0
>>>>        
>>>>
>>>>><= $i <
>>>>>
>>>>>          
>>>>>
>>>>3)
>>>>
>>>>        
>>>>
>>>>>2. There is a regexp match that fails in order to reach that
>>>>>          
>>>>>
>>>>error message
>>>>
>>>>        
>>>>
>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the
>>>>>          
>>>>>
>>>>expression
>>>>
>>>>        
>>>>
>>>>>4. It does anyway
>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere
>>>>>          
>>>>>
>>>>in the blast
>>>>
>>>>        
>>>>
>>>>>reports
>>>>>
>>>>>I suspect a newline/chomp/metacharacter issue. Not finding
>>>>>          
>>>>>
>>>>the string
>>>>        
>>>>
>>>>>anywhere has me thoroughly confused - I asked Hubert for the
>>>>>          
>>>>>
>>>>additional
>>>>        
>>>>
>>>>>file, assuming that I didn't have it.
>>>>>
>>>>>My next thought is to write a quick script to test perl behavior  
>>>>>on "Fedora Core 9".
>>>>>
>>>>>Thoughts?
>>>>>
>>>>>Did I misread the issue entirely? :}
>>>>>
>>>>>Roger
>>>>>
>>>>>
>>>>>-----Original Message-----
>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>>          
>>>>>
>>>>Chris Fields
>>>>
>>>>        
>>>>
>>>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>>>Cc: bioperl-l at bioperl.org
>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>parsing Blast output
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>>>To: Hubert Prielinger
>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>parsing Blast output
>>>>>>
>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>hi chris,
>>>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still
>>>>>>>
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>working,
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>do you have any ohter idea, the problem I have is that I
>>>>>>>
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>have to parse
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>a lot of textfiles....
>>>>>>>or shall I look for another option to parse those files...
>>>>>>>
>>>>>>>regards
>>>>>>>Hubert
>>>>>>>
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>>>2.2.13 reports but unless you post your blast report we
>>>>>>            
>>>>>>
>>>>can't really
>>>>        
>>>>
>>>>>>determine the problem.
>>>>>>
>>>>>>If you are still getting the same error like this I am not
>>>>>>            
>>>>>>
>>>>convinced
>>>>        
>>>>
>>>>>>you have upgraded to 1.5.1 which includes a fix in the fact
>>>>>>            
>>>>>>
>>>>that NCBI
>>>>        
>>>>
>>>>>>changed the HSP result format to remove the ':' from the
>>>>>>            
>>>>>>
>>>>Query/Sbjct
>>>>        
>>>>
>>>>>>prefixes.  We fixed this as soon as it was apparent sometime in  
>>>>>>September.
>>>>>>
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>STACK toplevel
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                  
>>>>>>>>>
>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ 
>>>>>>Blast.pl:21
>>>>>>
>>>>>>If you are just getting no results but also no warnings wrt
>>>>>>            
>>>>>>
>>>>parsing,
>>>>        
>>>>
>>>>>>are you sure your logic is correct?
>>>>>>
>>>>>>If you remove your filters do you see all the HSPS?
>>>>>>
>>>>>>
>>>>>>while (my $result = $search->next_result) {
>>>>>>   print $result->query_name, "\n";
>>>>>>   #iterate over each hit on the query sequence
>>>>>>   while (my $hit = $result->next_hit) {
>>>>>>	print $hit->name, "\n";
>>>>>>       #iterate over each HSP in the hit
>>>>>>       while (my $hsp = $hit->next_hsp) {
>>>>>>	 print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp-
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>hit_string, "\n";	
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>      }
>>>>>>  }
>>>>>>}
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>I tested some of the BLAST results that Hubert sent Roger
>>>>>          
>>>>>
>>>>and me with a
>>>>        
>>>>
>>>>>similar script to the above.  I removed the file parsing logic  
>>>>>and it
>>>>>
>>>>>          
>>>>>
>>>>seemed
>>>>
>>>>        
>>>>
>>>>>to work just fine.  It may very well be a logic issue or
>>>>>          
>>>>>
>>>>that he hasn't
>>>>        
>>>>
>>>>>installed the latest fix.
>>>>>  It's a funny thing, though.  When I tried using blastcl3 (v.
>>>>>          
>>>>>
>>>>2.2.13),
>>>>        
>>>>
>>>>>even though the returned output was from nr, the top of the  
>>>>>blast output showed that it was v2.2.12:
>>>>>
>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>
>>>>>I double-checked my local version and it's definitely v.2.2.13:
>>>>>-------------------------------------
>>>>>C:\Perl\Scripts>blastcl3 -
>>>>>
>>>>>blastcl3 2.2.13   arguments:...
>>>>>-------------------------------------
>>>>>
>>>>>If you use RemoteBlast using the same settings, the version in  
>>>>>the header looks like this:
>>>>>
>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>
>>>>>I'm wondering if all the blast executables (blast and netblast)
>>>>>from NCBI have text output like v.2.2.12, while the wwwblast
>>>>>outputs a new format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>>>
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>>>To clarify some stuff -
>>>>>>Chris I don't necessarily think the XML is best way forward for BLAST
>>>>>>reports generated locally, it isn't as detailed as the Text format and
>>>>>>it is what most people expect to be able to scroll through and parse
>>>>>>-- it is also harder for the format to change dramatically if you have
>>>>>>a static binary on your machine =).  I think for remoteblast the XML
>>>>>>format should be the way forward but I expect Bioperl to
>>>>>>maintain support of any plain text BLAST report format that
>>>>>>people use on a regular basis.
>>>>>>
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>Does XML lack some specific info that text output has?  Didn't know that.
>>>>>I believe that XML should be default in RemoteBlast since it will
>>>>>not break, but I agree with you about text output.  I also agree
>>>>>that it will need somebody to maintain it constantly, much like
>>>>>RemoteBlast.
>>>>>
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>>>-jason
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>Chris Fields wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>My guess is you're running into text parsing problems in  
>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>>>(1.5.1) or
>>>>>>>>bioperl-live (CVS), then see the bug below.
>>>>>>>>
>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>
>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>>>the last problem (more recent, not related to the first) has
>>>>>>>>been fixed but hasn't been committed to bioperl-live yet.
>>>>>>>>The fixed SearchIO::blast is available in the link above, but
>>>>>>>>realize it hasn't been committed yet and may change.
>>>>>>>>
>>>>>>>>Christopher Fields
>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry  
>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                
>>>>>>>>
>>>>>>>>>-----Original Message-----
>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>>>Prielinger
>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>>>output
>>>>>>>>>
>>>>>>>>>Hi,
>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with  
>>>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>>>
>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>STACK toplevel
>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>>>is that a bug......
>>>>>>>>>
>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't  
>>>>>>>>>get anything.....
>>>>>>>>>I'm using bioperl 1.4
>>>>>>>>>
>>>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>>>I had installed
>>>>>>>>>
>>>>>>>>>thanks in advance
>>>>>>>>>
>>>>>>>>>Hubert
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>_______________________________________________
>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                  
>>>>>>>>>
>>>>>>>>                
>>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>Bioperl-l mailing list
>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>--
>>>>>>Jason Stajich
>>>>>>Duke University
>>>>>>http://www.duke.edu/~jes12
>>>>>>
>>>>>>
>>>>>>
>>>>>>            
>>>>>>
>>>>>Christopher Fields
>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>Dept. of Biochemistry
>>>>>University of Illinois Urbana-Champaign
>>>>>
>>>>>_______________________________________________
>>>>>Bioperl-l mailing list
>>>>>Bioperl-l at lists.open-bio.org
>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>          
>>>>>
>>>>_______________________________________________
>>>>Bioperl-l mailing list
>>>>Bioperl-l at lists.open-bio.org
>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>>        
>>>>
>>>
>>>      
>>>
>
>Christopher Fields
>Postdoctoral Researcher
>Lab of Dr. Robert Switzer
>Dept of Biochemistry
>University of Illinois Urbana-Champaign
>
>
>
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>  
>


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



From Pieter.Monsieurs at esat.kuleuven.be  Fri Feb 10 09:44:10 2006
From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs)
Date: Fri, 10 Feb 2006 10:44:10 +0100
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
	parsing	blast	output
In-Reply-To: <43EC5497.3050505@esat.kuleuven.be>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>	<43EBC03E.4040900@gmx.at>	
	<43EC5497.3050505@esat.kuleuven.be>
Message-ID: <43EC606A.20003@esat.kuleuven.be>

Sorry for the disturbance. It now works correctly with Chris's bug fix.
Thanx,
Pieter
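
For anyone following the thread, the difference between the two header layouts quoted below (query length given as "(193 letters)" in the older reports and as "Length=193" in the newer ones) can be sketched in plain Perl, without BioPerl. This is only an illustration under my own assumptions: the function name parse_query_length is hypothetical and is not part of Bio::SearchIO::blast, and the real module does considerably more than this.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of BioPerl: return the query length found
# in the header lines of a plain-text BLAST report, or nothing if absent.
sub parse_query_length {
    my @lines = @_;
    my $in_query = 0;
    for my $line (@lines) {
        # Only look for a length once the "Query=" line has been seen, so
        # that the "total letters" count of the Database block is ignored.
        $in_query = 1 if $line =~ /^Query=/;
        next unless $in_query;
        # Older layout: "(193 letters)"; newer layout: "Length=193".
        if ( $line =~ /\(([\d,]+)\s+letters\)/ || $line =~ /^Length=([\d,]+)/ ) {
            ( my $size = $1 ) =~ s/,//g;    # drop thousands separators
            return $size;
        }
    }
    return;
}
```

Called with the header lines of either report, e.g. parse_query_length(@header_lines), it should give the same number regardless of whether the Database block comes before or after the Query= line.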

Pieter Monsieurs wrote:

>Hi Chris,
>
>The parsing of the Blast output still doesn't work for me with the bug-fix 
>download of blast.pm.
>The module keeps cycling in the while loop at line 487, looking 
>for a database or query size:
>
>while( defined ($_) ) {
>	if( /^Database:/ ) {
>		$self->_pushback($_);
>		last;
>	}
>	chomp;               
>	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
>		$size = $1;
>		$size =~ s/,//g;
>		last;
>	} else {
>		$q .= " $_";
>		$q =~ s/ +/ /g;
>		$q =~ s/^ | $//g;
>	}
>	$_ = $self->_readline;
>}
>
>
>The code keeps looking for the database information; however, as you 
>mentioned, this information is given before the query line in the new 
>Blast output format.
>As a result, all hits and HSPs end up in the query description 
>($result->query_description), no hits are found, and query_length is 0.
>Because you already adapted the module to retrieve the database information 
>at another position, deleting the while loop and adding 
>the following lines after $_ = $self->_readline (line 486) worked fine 
>for me (using blastn and blastp):
>
>if (/Length=([\d,]+)/) {
>	$size = $1;
>	$size =~ s/,//g;
>}
>
>
>Regards,
>Pieter
>
>
>
>Chris Fields wrote:
>
>  
>
>>From 'perldoc Bio::SearchIO::blast':
>>
>>DESCRIPTION
>>       This object encapsulated the necessary methods for generating events
>>       suitable for building Bio::Search objects from a BLAST report file.
>>       Read the Bio::SearchIO for more information about how to use this.
>>
>>       This driver can parse:
>>
>>       o   NCBI produced plain text BLAST reports from blastall, this also
>>           includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.  NCBI
>>           XML BLAST output is parsed with the blastxml SearchIO driver
>>
>>       o   WU-BLAST all reports
>>
>>       o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
>>
>>       o   BLAST-like output from Paracel BTK output
>>
>>So, it should.  Let us know if it doesn't.
>>
>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>>
>> 
>>
>>    
>>
>>>Hi Chris,
>>>I'm incredibly sorry for causing so much inconvenience. Yes, you are
>>>right, I only had to change the blast.pm file; it is working very
>>>well now, thank you very much. And you are right, you mentioned
>>>earlier that I should change the file... ;)
>>>
>>>but I have another question: does it work with the WU-Blast output
>>>too?
>>>regards
>>>Hubert
>>>
>>>
>>>Chris Fields wrote:
>>>
>>>   
>>>
>>>      
>>>
>>>>Ha!  I come back from a meeting and there's a billion emails!  What have
>>>>we started? ;p .  Sorry about this Jason; I know you're busy.
>>>>
>>>>Hubert, if you're out there, I sent you an email with an attachment.  You
>>>>said the output looks like what you were expecting.  So I think we have two
>>>>problems:
>>>>
>>>>1)  I haven't delved into the file scanning, but the fact that it takes so
>>>>long should tell you something's seriously wrong there.  Strip that part out
>>>>and start with a simple script, say, like the one Jason or I sent you;
>>>>the script I used to generate that output works fine (on two OS's, WinXP and
>>>>Mac OS X).  Use it on one file at a time.  Do everything on the command line
>>>>(not through Eclipse).  IDE's can be notoriously flaky about running
>>>>scripts, esp. when they run debugging.
>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast will still
>>>>not work whenever the text blast output has the following header, which
>>>>comes from the new web version of BLAST:
>>>>
>>>>-----------------------------------------------------
>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>>
>>>>RID: 1139501210-857-165793005128.BLASTQ1
>>>>
>>>>
>>>>Database: All non-redundant GenBank CDS
>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>         3,292,813 sequences; 1,128,164,434 total letters
>>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>tuberculosis H37Rv].
>>>>Length=193
>>>>.......
>>>>-----------------------------------------------------
>>>>
>>>>It will work if the text output has the following header (or is an older
>>>>version of BLAST):
>>>>
>>>>-----------------------------------------------------
>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>
>>>>
>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>protein database search
>>>>programs",  Nucleic Acids Res. 25:3389-3402.
>>>>
>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>tuberculosis H37Rv].
>>>>       (193 letters)
>>>>
>>>>Database: All non-redundant GenBank CDS
>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>         2,895,325 sequences; 997,103,285 total letters
>>>>-----------------------------------------------------
>>>>You have the former (2.2.13) version.  I know b/c I have your BLAST files.
>>>>Therefore, even bioperl-1.5.1 will not work!
>>>>
>>>>If you want the really gory details on why this is a problem, look here:
>>>>
>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>
>>>>So, any text output with the 2.2.13 header above will not work; it will
>>>>either hang or end abruptly (depending on OS, perl version, memory,
>>>>patience).  If you look in the above, I have added a preliminary fix for
>>>>this.  I'll reiterate for the billionth time, it hasn't been committed yet,
>>>>so don't kill me if it blows your computer up ;>
>>>>Here's the direct link:
>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>>>This is a modified version of Bio::SearchIO::blast.pm (it says it's version
>>>>1.90, but it's lying, I didn't change the version, only the regex; sorry
>>>>Jason).  From what you've been posting it doesn't sound like you've tried
>>>>this, and I believe I've suggested this fix before.
>>>>
>>>>Replace the one in your Bio/SearchIO directory (which looks like
>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your prev.
>>>>message) with this file.  Make sure the filename stays the same (blast.pm).
>>>>
>>>>Run everything again, one file at a time.  Make sure you use Jason's script
>>>>as well as the one I sent you.  Do NOT rely on running through multiple
>>>>files yet.  Fix one bug at a time.  And heed Joel's words about file checks.
>>>>
>>>>
>>>>Here's a small chunk of output from one of your blast files using the
>>>>modified script I sent you:
>>>>
>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>>>>Query:   1  RWKWKRKK  8
>>>>Seq:     542  RWAWRRKK  549
>>>>
>>>>Look familiar?
>>>>
>>>>Christopher Fields
>>>>Postdoctoral Researcher - Switzer Lab
>>>>Dept. of Biochemistry
>>>>University of Illinois Urbana-Champaign
>>>>
>>>>     
>>>>
>>>>        
>>>>
>>>>>-----Original Message-----
>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu]
>>>>>Sent: Thursday, February 09, 2006 3:24 PM
>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>parsing Blast output
>>>>>
>>>>>In other words, yes, I'm on the wrong trail. :}
>>>>>
>>>>>Sorry - I'll look at the output issue this evening (or realize
>>>>>that Chris already solved the issue).  ;}
>>>>>
>>>>>Thanks!
>>>>>
>>>>>Roger
>>>>>
>>>>>-----Original Message-----
>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>Prielinger
>>>>>Sent: Thursday, February 09, 2006 2:14 PM
>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason
>>>>>Stajich
>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>parsing Blast output
>>>>>
>>>>>dear roger,
>>>>>I got this error message when I tried to parse Blast output (version
>>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have
>>>>>a lot of Blast output files with version 2.2.13 and for those I
>>>>>don't get any error message.....it just doesn't work
>>>>>
>>>>>Hubert
>>>>>
>>>>>
>>>>>
>>>>>Roger Hall wrote:
>>>>>
>>>>>
>>>>>       
>>>>>
>>>>>          
>>>>>
>>>>>>Guys - I'm looking at the error message:
>>>>>>
>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>STACK toplevel
>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>
>>>>>>This is my line of thought:
>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm
>>>>>>in one location only, at the point of a. reading three lines b. dropping
>>>>>>lines with spaces only c. identifying the Query, Midline, and Match
>>>>>>lines (0 <= $i < 3)
>>>>>>2. There is a regexp match that fails in order to reach that error
>>>>>>message
>>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
>>>>>>4. It does anyway
>>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
>>>>>>reports
>>>>>>
>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
>>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional
>>>>>>file, assuming that I didn't have it.
>>>>>>
>>>>>>My next thought is to write a quick script to test perl behavior  
>>>>>>on "Fedora Core 9".
>>>>>>
>>>>>>Thoughts?
>>>>>>
>>>>>>Did I misread the issue entirely? :}
>>>>>>
>>>>>>Roger
>>>>>>
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
>>>>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>>>>Cc: bioperl-l at bioperl.org
>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>>parsing Blast output
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>         
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>>>>To: Hubert Prielinger
>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>>parsing Blast output
>>>>>>>
>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>hi chris,
>>>>>>>>thanks, I have upgraded to version 1.5.1 but it still isn't working,
>>>>>>>>do you have any other idea, the problem I have is that I have to parse
>>>>>>>>a lot of textfiles....
>>>>>>>>or shall I look for another option to parse those files...
>>>>>>>>
>>>>>>>>regards
>>>>>>>>Hubert
>>>>>>>>
>>>>>>>>
>>>>>>>>             
>>>>>>>>
>>>>>>>>                
>>>>>>>>
>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>>>>2.2.13 reports but unless you post your blast report we can't really
>>>>>>>determine the problem.
>>>>>>>
>>>>>>>If you are still getting the same error like this I am not convinced
>>>>>>>you have upgraded to 1.5.1, which includes a fix for the fact that NCBI
>>>>>>>changed the HSP result format to remove the ':' from the Query/Sbjct
>>>>>>>prefixes.  We fixed this as soon as it was apparent sometime in
>>>>>>>September.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>STACK toplevel
>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>>If you are just getting no results but also no warnings wrt
>>>>>>>parsing, are you sure your logic is correct?
>>>>>>>
>>>>>>>If you remove your filters do you see all the HSPS?
>>>>>>>
>>>>>>>
>>>>>>>while (my $result = $search->next_result) {
>>>>>>>  print $result->query_name, "\n";
>>>>>>>  #iterate over each hit on the query sequence
>>>>>>>  while (my $hit = $result->next_hit) {
>>>>>>>	print $hit->name, "\n";
>>>>>>>      #iterate over each HSP in the hit
>>>>>>>      while (my $hsp = $hit->next_hsp) {
>>>>>>>          print $hsp->evalue, " ", $hsp->length('sbjct'), " ", $hsp->hit_string, "\n";
>>>>>>>     }
>>>>>>> }
>>>>>>>}
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a
>>>>>>similar script to the above.  I removed the file parsing logic and it
>>>>>>seemed to work just fine.  It may very well be a logic issue or that he
>>>>>>hasn't installed the latest fix.
>>>>>>
>>>>>>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
>>>>>>even though the returned output was from nr, the top of the
>>>>>>blast output showed that it was v2.2.12:
>>>>>>
>>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>>
>>>>>>I double-checked my local version and it's definitely v.2.2.13:
>>>>>>-------------------------------------
>>>>>>C:\Perl\Scripts>blastcl3 -
>>>>>>
>>>>>>blastcl3 2.2.13   arguments:...
>>>>>>-------------------------------------
>>>>>>
>>>>>>If you use RemoteBlast using the same settings, the version in  
>>>>>>the header looks like this:
>>>>>>
>>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>>
>>>>>>I'm wondering if all the blast executables (blast and netblast)
>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast
>>>>>>outputs a new format (2.2.13).  I'll ask blast-help at NCBI about this.
>>>>>>
>>>>>>
>>>>>>
>>>>>>         
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>To clarify some stuff -
>>>>>>>Chris I don't necessarily think the XML is best way forward for BLAST
>>>>>>>reports generated locally, it isn't as detailed as the Text format and
>>>>>>>it is what most people expect to be able to scroll through and parse
>>>>>>>-- it is also harder for the format to change dramatically if you have
>>>>>>>a static binary on your machine =).  I think for remoteblast the XML
>>>>>>>format should be the way forward but I expect Bioperl to
>>>>>>>maintain support of any plain text BLAST report format that
>>>>>>>people use on a regular basis.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>Does XML lack some specific info that text output has?  Didn't know that.
>>>>>>I believe that XML should be default in RemoteBlast since it will
>>>>>>not break, but I agree with you about text output.  I also agree
>>>>>>that it will need somebody to maintain it constantly, much like
>>>>>>RemoteBlast.
>>>>>>
>>>>>>
>>>>>>
>>>>>>         
>>>>>>
>>>>>>            
>>>>>>
>>>>>>>-jason
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>Chris Fields wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>             
>>>>>>>>
>>>>>>>>                
>>>>>>>>
>>>>>>>>>My guess is you're running into text parsing problems in  
>>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>>>>(1.5.1) or
>>>>>>>>>bioperl-live (CVS), then see the bug below.
>>>>>>>>>
>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>
>>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>>>>the last problem (more recent, not related to the first) has
>>>>>>>>>been fixed but hasn't been committed to bioperl-live yet.
>>>>>>>>>The fixed SearchIO::blast is available in the link above, but
>>>>>>>>>realize it hasn't been committed yet and may change.
>>>>>>>>>
>>>>>>>>>Christopher Fields
>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry  
>>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>               
>>>>>>>>>
>>>>>>>>>                  
>>>>>>>>>
>>>>>>>>>>-----Original Message-----
>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>>>>Prielinger
>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>>>>output
>>>>>>>>>>
>>>>>>>>>>Hi,
>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with  
>>>>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>>>>
>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>STACK toplevel
>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>>>>>is that a bug......
>>>>>>>>>>
>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't  
>>>>>>>>>>get anything.....
>>>>>>>>>>I'm using bioperl 1.4
>>>>>>>>>>
>>>>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>>>>I had installed
>>>>>>>>>>
>>>>>>>>>>thanks in advance
>>>>>>>>>>
>>>>>>>>>>Hubert
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>_______________________________________________
>>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>                 
>>>>>>>>>>
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>>>>               
>>>>>>>>>
>>>>>>>>>                  
>>>>>>>>>
>>>>>>>>_______________________________________________
>>>>>>>>Bioperl-l mailing list
>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>>
>>>>>>>>             
>>>>>>>>
>>>>>>>>                
>>>>>>>>
>>>>>>>--
>>>>>>>Jason Stajich
>>>>>>>Duke University
>>>>>>>http://www.duke.edu/~jes12
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>           
>>>>>>>
>>>>>>>              
>>>>>>>
>>>>>>Christopher Fields
>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>Dept. of Biochemistry
>>>>>>University of Illinois Urbana-Champaign
>>>>>>
>>>>>>_______________________________________________
>>>>>>Bioperl-l mailing list
>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>         
>>>>>>
>>>>>>            
>>>>>>
>>>>>_______________________________________________
>>>>>Bioperl-l mailing list
>>>>>Bioperl-l at lists.open-bio.org
>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>       
>>>>>
>>>>>          
>>>>>
>>>>     
>>>>
>>>>        
>>>>
>>Christopher Fields
>>Postdoctoral Researcher
>>Lab of Dr. Robert Switzer
>>Dept of Biochemistry
>>University of Illinois Urbana-Champaign
>>
>>
>>
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> 
>>
>>    
>>
>
>
>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>  
>


Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm



From andrej.kastrin at guest.arnes.si  Fri Feb 10 14:28:19 2006
From: andrej.kastrin at guest.arnes.si (Andrej Kastrin)
Date: Fri, 10 Feb 2006 15:28:19 +0100
Subject: [Bioperl-l] Medline to XML
Message-ID: <43ECA303.8090904@guest.arnes.si>

Dear users,

my problem is not directly related to this list, but I hope you can help 
me. Is there any tool to convert a standard Medline record to XML format? 
I know there is a built-in function (med2xml) in Pubmed, but I'm looking 
for an independent Perl script.

Thanks in advance for any suggestions or pointers.

Cheers, Andrej


From cjfields at uiuc.edu  Fri Feb 10 17:01:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 10 Feb 2006 11:01:27 -0600
Subject: [Bioperl-l] Handling miRNA's
In-Reply-To: 
Message-ID: <001801c62e63$a4a71090$15327e82@pyrimidine>

I don't think there's anything like this in Bioperl, and I'm unfamiliar with
the naming scheme you're using.  If you're searching for specific miRNA's, a
good resource looks like the miRNA database, which seems to be updated
regularly (http://microrna.sanger.ac.uk/sequences/) and uses the same system
for RNA annotation that you use (which, I'm guessing, is a standardized
annotation scheme of some sort).  I believe the database is downloadable and
searchable by name, so you could probably build a querying scheme using LWP
or HTTP::Request (if the web interface allows for this).  I know that Sean
Eddy's Rfam database (http://www.sanger.ac.uk/Software/Rfam/) also has
information on miRNA's, but it's somewhat limited. 
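
For the name-matching part of the question (going from mir-102a* to
hsa-mir-102a-1*), here is a minimal sketch in plain Perl. Nothing in it is
part of Bioperl: parse_mirna_name, family_key and related_names are
hypothetical helpers, and the name grammar (optional species prefix,
number, optional variant letter, optional locus index, optional arm,
optional star) is only my assumption based on the two examples above.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bioperl API): split a miRNA name into fields.
# Returns a hash reference, or undef if the name does not fit the pattern.
sub parse_mirna_name {
    my ($name) = @_;
    return unless defined $name;
    my ($species, $kind, $number, $variant, $locus, $arm, $star) = $name =~ m{
        \A
        (?: ([a-z]{3,4}) - )?   # optional species prefix such as hsa
        (mir|miR) -             # keyword, either capitalisation
        (\d+)                   # family number
        ([a-z])?                # optional variant letter
        (?: - (\d+) )?          # optional locus index
        (?: - ([35]p) )?        # optional arm
        (\*)?                   # optional star marker
        \z
    }x;
    return unless defined $kind;
    return {
        species => $species,
        kind    => $kind,
        number  => $number,
        variant => $variant,
        locus   => $locus,
        arm     => $arm,
        star    => defined $star ? 1 : 0,
    };
}

# Family number plus variant letter, e.g. "102a".
sub family_key {
    my ($parsed) = @_;
    return $parsed->{number} . ($parsed->{variant} // '');
}

# Return the candidate names that share the query's family key and, when
# the query names a species, the same species.
sub related_names {
    my ($query, @candidates) = @_;
    my $q = parse_mirna_name($query) or return;
    return grep {
        my $c = parse_mirna_name($_);
        $c
          && family_key($c) eq family_key($q)
          && (!defined $q->{species}
              || (defined $c->{species} && $c->{species} eq $q->{species}));
    } @candidates;
}

print join(', ', related_names('mir-102a*',
    qw(hsa-mir-102a-1* hsa-mir-102a-2 hsa-mir-102b))), "\n";
```

Matching on the family key alone is deliberately loose; a real mapping
between mature and precursor entries would have to come from the database
itself rather than from the names.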


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> barry.m.dancis at gsk.com
> Sent: Wednesday, February 08, 2006 3:45 PM
> To: 'bioperl-l'; bioperl-l-bounces at lists.open-bio.org
> Cc: James.R.Brown at gsk.com
> Subject: Re: [Bioperl-l] Handling miRNA's
> 
> Hi Chris--
> 
>         The problem I am solving is: given a mature miRNA 
> name, how do I use it to search for its pre/pri miRNA and 
> vice versa? For example, how to go from mir-102a* to 
> hsa-mir-102a-1*. Yes, I can write a parser for it, but I'm 
> hoping that someone else has already done it and has some 
> bells and whistles to go with it.  Below is a hierarchy chart 
> of a data structure to hold the naming information. The 
> parsing is not trivial and given data in that structure there 
> could be all kinds of neat functions that return various 
> aspects of the names.
> 
> Barry
> 
> 
> "Chris Fields"
> Sent by: bioperl-l-bounces at lists.open-bio.org
> 07-Feb-2006 17:40
> 
> To: barry.m.dancis at gsk.com, "'bioperl-l'"
> Subject: Re: [Bioperl-l] Handling miRNA's
> 
> Are you talking about sequences or text output from a 
> specific program? If you are talking about sequences in a 
> particular format, then listen to Brian.  If you are talking 
> about output, then we need to know which program you're 
> using, as a parser may exist or could be built. 
> 
> There are a few modules in Bio::Tools that handle RNA (like 
> QRNA, tRNAscan-SE), so check those out first.  I'm currently 
> finishing up a Bio::Tools module for RNAMotif and have plans 
> for making an ERPIN parser.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> > barry.m.dancis at gsk.com
> > Sent: Tuesday, February 07, 2006 2:26 PM
> > To: bioperl-l; bioperl-l-bounces at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Handling miRNA's
> > 
> > It's the parser in particular that I need
> > 
> > "Brian Osborne"
> > Sent by: bioperl-l-bounces at lists.open-bio.org
> > 07-Feb-2006 12:05
> > 
> > To: barry.m.dancis at gsk.com, "bioperl-l",
> > bioperl-l-bounces at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Handling miRNA's
> > 
> > Barry,
> > 
> > If the sequence information is in one of the formats that Bioperl
> > understands (Genbank, Swissprot flat, and so on) then the answer is
> > yes. This assumes that the details on sequence that you mentioned are
> > found in some sequence feature section in the file. But it looks to me
> > like there's no specialized parser for miRNA sequence per se, I'll be
> > corrected if I'm wrong.
> > 
> > Brian O.
> > 
> > 
> > On 2/6/06 12:17 PM, "barry.m.dancis at gsk.com" wrote:
> > 
> > > Hi --
> > > 
> > >         Are there any classes for manipulating miRNA's with
> > > functions such as parsing the name, storing and interlinking
> > > pri/pre/mat sequences, etc?
> > > 
> > > Thanks,
> > > 
> > > Barry
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 
> > 
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> 



From allenday at ucla.edu  Fri Feb 10 16:13:39 2006
From: allenday at ucla.edu (Allen Day)
Date: Fri, 10 Feb 2006 08:13:39 -0800 (PST)
Subject: [Bioperl-l] Medline to XML
In-Reply-To: <43ECA303.8090904@guest.arnes.si>
References: <43ECA303.8090904@guest.arnes.si>
Message-ID: 

why not just retrieve xml directly from the eutils service?
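
A minimal sketch of that approach, assuming the NCBI E-utilities efetch
endpoint with db=pubmed and retmode=xml, and using the core HTTP::Tiny
module (https requests additionally need IO::Socket::SSL installed). The
helper names efetch_url and fetch_pubmed_xml are made up for this example.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Base URL of the efetch service (assumed to be the current https address).
my $EFETCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi';

# Hypothetical helper: build the efetch URL for a list of PubMed IDs.
sub efetch_url {
    my @pmids = @_;
    die "need at least one PMID\n" unless @pmids;
    for my $id (@pmids) {
        die "not a numeric PMID: $id\n" unless $id =~ /^\d+\z/;
    }
    return "$EFETCH?db=pubmed&retmode=xml&id=" . join(',', @pmids);
}

# Hypothetical helper: fetch the records and return the XML text.
sub fetch_pubmed_xml {
    my @pmids = @_;
    my $res = HTTP::Tiny->new(timeout => 30)->get(efetch_url(@pmids));
    die "efetch failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# The network is only contacted when PMIDs are given on the command line.
print fetch_pubmed_xml(@ARGV) if @ARGV;
```

Run it as "perl fetch.pl PMID [PMID ...]", where fetch.pl is whatever you
call the script.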

-allen

On Fri, 10 Feb 2006, Andrej Kastrin wrote:

> Dear users,
> 
> my problem is not directly related to this list, but I hope you can help
> me. Is there any tool to convert a standard Medline record to XML format?
> I know there is a built-in function (med2xml) in PubMed, but I'm looking
> for an independent Perl script.
> 
> Thanks in advance for any suggestions or pointers.
> 
> Cheers, Andrej
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From jason.stajich at duke.edu  Fri Feb 10 17:15:17 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 10 Feb 2006 12:15:17 -0500
Subject: [Bioperl-l] Remote BLAST support discussion
In-Reply-To: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>
References: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>
Message-ID: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu>

Paul -

The reason for suggesting a change has to do with the instability of
the CGI interface and of the format of the returned data. The text
format returned by the webserver is not stable, and reportedly it will
cease to be reliably parseable.  Yes, we can keep hacking the blast
parser code to handle this, but the bioperl release cycle is certainly
not tied to the NCBI blast release cycle, so I find it unsatisfying to
know that we are going to have broken code when they change the output
formats (but not know when).

Mostly I think we need to try and support something that will
"ALWAYS" work, so that individuals setting up webservices which rely
on remote blast functionality have something dependable.  In theory,
netblast/blastcl3 should always work since NCBI has to update the exe
when they change their server setup.

In terms of the web-based queues - I think the best change we can
make is to have XML be the preferred retrieval method.

I also see value in providing a wrapper for netblast since it should
look an awful lot like running blast locally.

Ideally I'd like to see a more extensible system, something like (and  
please feel free to come up with better names for the modules!):

Bio::Tools::Run::Blast
  -->  StandAlone (support for both WU-BLAST and NCBI-BLAST local
       binaries and MPI-BLAST too if simple)
  -->  RemoteNCBI (currently the RemoteBlast server)
  -->  RemoteEBISOAP (EBI has a nice SOAP interface that works quite
       well, but may not provide all the same databases as what people
       expect from NCBI)
  -->  RemoteNetBlast (blastcl3 or netblast local executable)
  (other things that people want)
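
To make the shape of such a layout concrete, here is a rough sketch of one
front-end class dispatching to interchangeable backends. All package names
(My::Blast and friends) and the -backend argument are placeholders invented
for this example, not existing Bioperl modules, and each run() only returns
a marker string instead of actually running BLAST.

```perl
use strict;
use warnings;

# Hypothetical base class: every backend must provide run($query).
package My::Blast::Backend;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub run { my $self = shift; die ref($self) . " does not implement run()\n" }

# Placeholder backends; a real one would call a binary or a web service.
package My::Blast::StandAlone;
our @ISA = ('My::Blast::Backend');
sub run { my ($self, $query) = @_; return "standalone:$query" }

package My::Blast::RemoteNCBI;
our @ISA = ('My::Blast::Backend');
sub run { my ($self, $query) = @_; return "remotencbi:$query" }

package My::Blast::RemoteNetBlast;
our @ISA = ('My::Blast::Backend');
sub run { my ($self, $query) = @_; return "remotenetblast:$query" }

# Front end: choose the backend class from the -backend argument.
package My::Blast;
my %BACKEND = (
    standalone     => 'My::Blast::StandAlone',
    remotencbi     => 'My::Blast::RemoteNCBI',
    remotenetblast => 'My::Blast::RemoteNetBlast',
);

sub new {
    my ($class, %args) = @_;
    my $type = lc(delete($args{-backend}) // 'standalone');
    my $impl = $BACKEND{$type} or die "unknown backend '$type'\n";
    return $impl->new(%args);
}

package main;

my $blast = My::Blast->new(-backend => 'RemoteNetBlast');
print ref($blast), "\n";
```

The point of the indirection is that calling code only ever sees new() and
run(), so a backend can be swapped or added without touching the callers.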

[note: Whether or not these ideas are appealing, someone should archive
the discussions and decisions on the wiki page so we can rely less
on people searching the mailing archives for how a decision was
made.  Perhaps Roger can do this sort of editing in addition to the
planning for support of this module].

-jason

On Feb 7, 2006, at 8:38 PM, Paul Boutros wrote:

> Hi Roger,
>
> I would definitely prefer a fully Perl-based implementation.  For
> starters, I have not been successful in compiling the Toolkit that
> contains netblast for some platforms (e.g. AIX 5.2 w/gcc 4.0).
>
> I haven't been following the discussion: is there some compelling
> reason to prefer a netblast-based system that's come up recently?
> I'm guessing that adding a new non-perl dependency would only be done
> if there was considerable justification for this type of change, but
> I'm not clear from your message what that justification is.
>
> Paul
>
>
>
> ------------------------------
>
> Message: 12
> Date: Mon, 6 Feb 2006 20:46:44 -0600
> From: "Roger Hall" 
> Subject: [Bioperl-l] RemoteBlast users - potentially major changes -
>         please        reply
> To: 
> Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL>
> Content-Type: text/plain;        charset="us-ascii"
>
> To everyone who uses RemoteBlast.pm:
>
> Would anyone object to RemoteBlast being rewritten in a way that
> requires NCBI's blastcl3 executable?
>
> Binary downloads of blastcl3 (column "netblast") are available for
> numerous platforms at: http://ncbi.nih.gov/BLAST/download.shtml
>
> Does anyone require or desire a "pure perl" implementation? If so,
> please explain the advantage you see with such an implementation.
>
> Thanks!
>
>
> Roger Hall
> Technical Director
> MidSouth Bioinformatics Center
> University of Arkansas at Little Rock
> (501) 569-8074
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From hubert.prielinger at gmx.at  Fri Feb 10 16:26:47 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 10 Feb 2006 10:26:47 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
	parsing	blast	output
In-Reply-To: <43EC606A.20003@esat.kuleuven.be>
References: <000e01c62dca$bc66df60$15327e82@pyrimidine>	<43EBC03E.4040900@gmx.at>	
	<43EC5497.3050505@esat.kuleuven.be>
	<43EC606A.20003@esat.kuleuven.be>
Message-ID: <43ECBEC7.7040506@gmx.at>

Hi,
I'm sorry for disturbing once more. Yesterday the script was working;
today it isn't working at all, although I didn't change anything. I get
the following error message:

------------- EXCEPTION  -------------
MSG: Could not open comp80swiss2114.txt: No such file or directory
STACK Bio::Root::IO::_initialize_io /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167
STACK toplevel ./Blast.pl:14

--------------------------------------

The file exists, and I applied the bug fix yesterday.
Thanks for help

Hubert
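
Since the exception above is raised while opening the file, and a relative
name such as comp80swiss2114.txt is resolved against the directory the
script is started from, a small pre-check can show which path is really
being tried. This is only a sketch with a made-up helper name
(report_problem); it uses core modules and does not touch Bio::SearchIO.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Spec;

# Hypothetical helper: return a description of why a report file cannot be
# used, or undef if it exists, is a plain readable file and is not empty.
sub report_problem {
    my ($file) = @_;
    my $abs = File::Spec->rel2abs($file);
    return "no such file: $abs (cwd is " . getcwd() . ")" unless -e $abs;
    return "not a plain file: $abs" unless -f _;
    return "not readable: $abs"     unless -r _;
    return "file is empty: $abs"    unless -s _;
    return;
}

# Check every file named on the command line before parsing any of them.
for my $file (@ARGV) {
    my $problem = report_problem($file);
    print defined $problem ? "SKIP $problem\n" : "OK   $file\n";
}
```

Only the files reported as OK would then be handed to the parser.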




Pieter Monsieurs wrote:

> Sorry for disturbing. It now works correctly with the bug fix of Chris. 
> Thanx,
> Pieter
>
> Pieter Monsieurs wrote:
>
>>Hi Chris,
>>
>>The parsing of the Blast output still doesn't work for me with the bug 
>>fix download of blast.pm.
>>The module keeps turning around in the while loop at line 487 looking 
>>for a database or query-size:
>>
>>while( defined ($_) ) {
>>	if( /^Database:/ ) {
>>		$self->_pushback($_);
>>		last;
>>	}
>>	chomp;               
>>	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
>>		$size = $1;
>>		$size =~ s/,//g;
>>		last;
>>	} else {
>>		$q .= " $_";
>>		$q =~ s/ +/ /g;
>>		$q =~ s/^ | $//g;
>>	}
>>	$_ = $self->_readline;
>>}
>>
>>
>>The code keeps looking for the database information, however - as you 
>>mentioned - this information is given before the query line in the new 
>>Blast output format.
>>This way, all hits and hsps are stored in the query_description 
>>($hit->query_description), no hits are found and query_length is 0.
>>Because you already adapted the module to retrieve database information 
>>at another position in the module, deleting the while loop and adding 
>>the following lines after $_ = $self->_readline (line 486), worked fine 
>>for me (using blastn and blastp):
>>
>>if (/Length=([\d,]+)/) {
>>	$size = $1;
>>	$size =~ s/,//g;
>>}
>>
>>
>>Regards,
>>Pieter
>>
>>
>>
>>Chris Fields wrote:
>>
>>  
>>
>>>From 'perldoc Bio::SearchIO::blast':
>>>
>>>DESCRIPTION
>>>       This object encapsulated the necessary methods for generating events
>>>       suitable for building Bio::Search objects from a BLAST report file.
>>>       Read the Bio::SearchIO for more information about how to use this.
>>>
>>>       This driver can parse:
>>>
>>>       o   NCBI produced plain text BLAST reports from blastall, this also
>>>           includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq reports.  NCBI
>>>           XML BLAST output is parsed with the blastxml SearchIO driver
>>>
>>>       o   WU-BLAST all reports
>>>
>>>       o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
>>>
>>>       o   BLAST-like output from Paracel BTK output
>>>
>>>So, it should.  Let us know if it doesn't.
>>>
>>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>>>
>>>>Hi Chris,
>>>>I'm incredibly sorry for causing so much inconvenience. Yes, you are
>>>>right, I only had to change the blast.pm file; it is working very
>>>>well now, thank you very much. And you are right, you mentioned
>>>>earlier that I should change the file... ;)
>>>>
>>>>but I have another question: does it work with the WU-Blast output
>>>>too?
>>>>regards
>>>>Hubert
>>>>
>>>>
>>>>Chris Fields wrote:
>>>>
>>>>>Ha!  I come back from meeting and there's a billion emails!  What
>>>>>have we started? ;p .  Sorry about this Jason; I know you're busy.
>>>>>
>>>>>Hubert, if you're out there, I sent you an email with an attachment.
>>>>>You said the output looks like what you were expecting.  So I think
>>>>>we have two problems:
>>>>>
>>>>>1)  I haven't delved into the file scanning, but the fact that it
>>>>>takes so long should tell you something's seriously wrong there.
>>>>>Strip that part out and start with a simple script, say, like the one
>>>>>Jason or that I sent you; the script I used to generate that output
>>>>>works fine (on two OS's, WinXP and Mac OS X).  Use it on one file at
>>>>>a time.  Do everything on command line (not through Eclipse).  IDE's
>>>>>can be notoriously flaky about running scripts, esp. when they run
>>>>>debugging.
>>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast
>>>>>will still not work whenever the text blast output has the following
>>>>>header, which comes from the new web version of BLAST:
>>>>>
>>>>>-----------------------------------------------------
>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>>>
>>>>>RID: 1139501210-857-165793005128.BLASTQ1
>>>>>
>>>>>
>>>>>Database: All non-redundant GenBank CDS
>>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>>         3,292,813 sequences; 1,128,164,434 total letters
>>>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>>tuberculosis H37Rv].
>>>>>Length=193
>>>>>.......
>>>>>-----------------------------------------------------
>>>>>
>>>>>It will work if the text output has the following header (or is an
>>>>>older version of BLAST):
>>>>>
>>>>>-----------------------------------------------------
>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>
>>>>>
>>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>>protein database search
>>>>>programs",  Nucleic Acids Res. 25:3389-3402.
>>>>>
>>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>>tuberculosis H37Rv].
>>>>>       (193 letters)
>>>>>
>>>>>Database: All non-redundant GenBank CDS
>>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>>         2,895,325 sequences; 997,103,285 total letters
>>>>>-----------------------------------------------------
>>>>>You have the former (2.2.13) version.  I know b/c I have your BLAST
>>>>>files.  Therefore, even bioperl-1.5.1 will not work!
>>>>>
>>>>>If you want the really gory details on why this is a problem, look
>>>>>here:
>>>>>
>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>
>>>>>So, any text output with the above header will not work; it will
>>>>>either hang or end abruptly (depending on OS, perl version, memory,
>>>>>patience).  If you look in the above, I have added a preliminary fix
>>>>>for this.  I'll reiterate for the billionth time, it hasn't been
>>>>>committed yet, so don't kill me if it blows your computer up ;>
>>>>>Here's the direct link:
>>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>>>>This is a modified version of Bio::SearchIO::blast.pm (it says it's
>>>>>version 1.90, but it's lying, I didn't change the version, only the
>>>>>regex; sorry Jason).  From what you've been posting it doesn't sound
>>>>>like you've tried this, and I believe I've suggested this fix before.
>>>>>
>>>>>Replace the one in your Bio/SearchIO directory (which looks like
>>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your
>>>>>prev. message) with this file.  Make sure the filename stays the same
>>>>>(blast.pm).
>>>>>
>>>>>Run everything again, one file at a time.  Make sure you use Jason's
>>>>>script as well as the one I sent you.  Do NOT rely on running through
>>>>>multiple files yet.  Fix one bug at a time.  And heed Joel's words
>>>>>about file checks.
>>>>>
>>>>>
>>>>>Here's a small chunk of output from one of your blast files using the
>>>>>modified script I sent you:
>>>>>
>>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>>>>>Query:   1  RWKWKRKK  8
>>>>>Seq:     542  RWAWRRKK  549
>>>>>
>>>>>Look familiar?
>>>>>
>>>>>Christopher Fields
>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>Dept. of Biochemistry
>>>>>University of Illinois Urbana-Champaign
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu]
>>>>>>Sent: Thursday, February 09, 2006 3:24 PM
>>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>>>>>>Blast output
>>>>>>
>>>>>>In other words, yes, I'm on the wrong trail. :}
>>>>>>
>>>>>>Sorry - I'll look at the output issue this evening (or realize that
>>>>>>Chris already solved the issue).  ;}
>>>>>>
>>>>>>Thanks!
>>>>>>
>>>>>>Roger
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>Prielinger
>>>>>>Sent: Thursday, February 09, 2006 2:14 PM
>>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason
>>>>>>Stajich
>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>>>>>>Blast output
>>>>>>
>>>>>>dear roger,
>>>>>>this error message I got, when I tried to parse Blast output (version
>>>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have a
>>>>>>lot of Blast output files with version 2.2.13 and for that I don't
>>>>>>get any error message.....it just doesn't work
>>>>>>
>>>>>>Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>>Roger Hall wrote:
>>>>>>
>>>>>>
>>>>>>>Guys - I'm looking at the error message:
>>>>>>>
>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>>This is my line of thought:
>>>>>>>1. "no data for midline $_" is a unique message generated by
>>>>>>>blast.pm in one location only, at the point of a. reading three
>>>>>>>lines b. dropping lines with spaces only c. identifying the Query,
>>>>>>>Midline, and Match lines (0 <= $i < 3)
>>>>>>>2. There is a regexp match that fails in order to reach that error
>>>>>>>message
>>>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the
>>>>>>>expression
>>>>>>>4. It does anyway
>>>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the
>>>>>>>blast reports
>>>>>>>
>>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the
>>>>>>>string anywhere has me thoroughly confused - I asked Hubert for the
>>>>>>>additional file, assuming that I didn't have it.
>>>>>>>
>>>>>>>My next thought is to write a quick script to test perl behavior
>>>>>>>on "Fedora Core 9".
>>>>>>>
>>>>>>>Thoughts?
>>>>>>>
>>>>>>>Did I misread the issue entirely? :}
>>>>>>>
>>>>>>>Roger
>>>>>>>
>>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris
>>>>>>>Fields
>>>>>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>>>>>Cc: bioperl-l at bioperl.org
>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>>>>>>>Blast output
>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>>>>>To: Hubert Prielinger
>>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>>>parsing Blast output
>>>>>>>>
>>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>>>>
>>>>>>>>>hi chris,
>>>>>>>>>thanks, I have upgraded to version 1.5.1 but it still isn't
>>>>>>>>>working. do you have any other idea? the problem I have is that
>>>>>>>>>I have to parse a lot of textfiles....
>>>>>>>>>or shall I look for another option to parse those files...
>>>>>>>>>
>>>>>>>>>regards
>>>>>>>>>Hubert
>>>>>>>>>
>>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast 2.2.13
>>>>>>>>reports but unless you post your blast report we can't really
>>>>>>>>determine the problem.
>>>>>>>>
>>>>>>>>If you are still getting the same error like this I am not
>>>>>>>>convinced you have upgraded to 1.5.1, which includes a fix for the
>>>>>>>>fact that NCBI changed the HSP result format to remove the ':'
>>>>>>>>from the Query/Sbjct prefixes.  We fixed this as soon as it was
>>>>>>>>apparent sometime in September.
>>>>>>>>
>>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>>STACK toplevel
>>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>
>>>>>>>>If you are just getting no results but also no warnings wrt
>>>>>>>>parsing, are you sure your logic is correct?
>>>>>>>>
>>>>>>>>If you remove your filters do you see all the HSPS?
>>>>>>>>
>>>>>>>>
>>>>>>>>while (my $result = $search->next_result) {
>>>>>>>>  print $result->query_name, "\n";
>>>>>>>>  # iterate over each hit on the query sequence
>>>>>>>>  while (my $hit = $result->next_hit) {
>>>>>>>>    print $hit->name, "\n";
>>>>>>>>    # iterate over each HSP in the hit
>>>>>>>>    while (my $hsp = $hit->next_hsp) {
>>>>>>>>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>>>>>            $hsp->hit_string, "\n";
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>>}
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>I tested some of the BLAST results that Hubert sent Roger and me
>>>>>>>with a similar script to the above.  I removed the file parsing
>>>>>>>logic and it seemed to work just fine.  It may very well be a logic
>>>>>>>issue or that he hasn't installed the latest fix.
>>>>>>>
>>>>>>>It's a funny thing, though.  When I tried using blastcl3 (v.
>>>>>>>2.2.13), even though the returned output was from nr, the top of
>>>>>>>the blast output showed that it was v2.2.12:
>>>>>>>
>>>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>>>
>>>>>>>I double-checked my local version and it's definitely v.2.2.13:
>>>>>>>-------------------------------------
>>>>>>>C:\Perl\Scripts>blastcl3 -
>>>>>>>
>>>>>>>blastcl3 2.2.13   arguments:...
>>>>>>>-------------------------------------
>>>>>>>
>>>>>>>If you use RemoteBlast using the same settings, the version in the
>>>>>>>header looks like this:
>>>>>>>
>>>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>>>
>>>>>>>I'm wondering if all the blast executables (blast and netblast)
>>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast
>>>>>>>outputs a new format (2.2.13).  I'll ask blast-help at NCBI about
>>>>>>>this.
>>>>>>>
>>>>>>>>To clarify some stuff -
>>>>>>>>Chris I don't necessarily think the XML is best way forward for
>>>>>>>>BLAST reports generated locally, it isn't as detailed as the Text
>>>>>>>>format and it is what most people expect to be able to scroll
>>>>>>>>through and parse -- it is also harder for the format to change
>>>>>>>>dramatically if you have a static binary on your machine =).  I
>>>>>>>>think for remoteblast the XML format should be the way forward but
>>>>>>>>I expect Bioperl to maintain support of any plain text BLAST
>>>>>>>>report format that people use on a regular basis.
>>>>>>>>
>>>>>>>Does XML lack some specific info that text output has?  Didn't know
>>>>>>>that.  I believe that XML should be default in RemoteBlast since it
>>>>>>>will not break, but I agree with you about text output.  I also
>>>>>>>agree that it will need somebody to maintain it constantly, much
>>>>>>>like RemoteBlast.
>>>>>>>
>>>>>>>>-jason
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>>>Chris Fields wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             
>>>>>>>>>
>>>>>>>>>                
>>>>>>>>>
>>>>>>>>>>My guess is you're running into text parsing problems in
>>>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>>>>>(1.5.1) or bioperl-live (CVS), then see the bug below.
>>>>>>>>>>
>>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>>
>>>>>>>>>>I think the first problem you ran into is solved in bioperl
>>>>>>>>>>1.5.1, the last problem (more recent, not related to the first)
>>>>>>>>>>has been fixed but hasn't been committed to bioperl-live yet.
>>>>>>>>>>The fixed SearchIO::blast is available in the link above, but
>>>>>>>>>>realize it hasn't been committed yet and may change.
>>>>>>>>>>
>>>>>>>>>>Christopher Fields
>>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
>>>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>>>
>>>>>>>>>>>-----Original Message-----
>>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>>>>>>>>Hubert Prielinger
>>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
>>>>>>>>>>>Blast output
>>>>>>>>>>>
>>>>>>>>>>>Hi,
>>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with  
>>>>>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>>>>>
>>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>>STACK toplevel
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                 
>>>>>>>>>>>
>>>>>>>>>>>                    
>>>>>>>>>>>
>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/ 
>>>>>>>>Blast.pl:21
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>>>>>is that a bug......
>>>>>>>>>>>
>>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't  
>>>>>>>>>>>get anything.....
>>>>>>>>>>>I'm using bioperl 1.4
>>>>>>>>>>>
>>>>>>>>>>>before, I have installed bioperl 1.4, it worked fine
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                 
>>>>>>>>>>>
>>>>>>>>>>>                    
>>>>>>>>>>>
>>>>>>>>parsing Blast
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>>>>>Output (version 2.2.12), but I don't remember which
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                 
>>>>>>>>>>>
>>>>>>>>>>>                    
>>>>>>>>>>>
>>>>>>>>bioperl version
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>>>>>I had installed
>>>>>>>>>>>
>>>>>>>>>>>thanks in advance
>>>>>>>>>>>
>>>>>>>>>>>Hubert
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>_______________________________________________
>>>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                 
>>>>>>>>>>>
>>>>>>>>>>>                    
>>>>>>>>>>>
>>>>>>>>>>               
>>>>>>>>>>
>>>>>>>>>>                  
>>>>>>>>>>
>>>>>>>>>_______________________________________________
>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>             
>>>>>>>>>
>>>>>>>>>                
>>>>>>>>>
>>>>>>>>--
>>>>>>>>Jason Stajich
>>>>>>>>Duke University
>>>>>>>>http://www.duke.edu/~jes12
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>Christopher Fields
>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>Dept. of Biochemistry
>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>
>>>>>>>_______________________________________________
>>>>>>>Bioperl-l mailing list
>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>            
>>>>>>>
>>>>>>_______________________________________________
>>>>>>Bioperl-l mailing list
>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>       
>>>>>>
>>>>>>          
>>>>>>
>>>>>     
>>>>>
>>>>>        
>>>>>
>>>Christopher Fields
>>>Postdoctoral Researcher
>>>Lab of Dr. Robert Switzer
>>>Dept of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>>_______________________________________________
>>>Bioperl-l mailing list
>>>Bioperl-l at lists.open-bio.org
>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> 
>>>
>>>    
>>>
>>
>>
>>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>>
>>_______________________________________________
>>Bioperl-l mailing list
>>Bioperl-l at lists.open-bio.org
>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>  
>>
>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm for more 
> information.
>



From cjfields at uiuc.edu  Fri Feb 10 17:45:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 10 Feb 2006 11:45:32 -0600
Subject: [Bioperl-l] Remote BLAST support discussion
In-Reply-To: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu>
Message-ID: <002201c62e69$ca8363d0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Jason Stajich
> Sent: Friday, February 10, 2006 11:15 AM
> To: Paul.Boutros at utoronto.ca
> Cc: BioPerl Mailing List
> Subject: [Bioperl-l] Remote BLAST support discussion
> 
> Paul -
> 
> The reason for suggesting a change has to do with the 
> instability of the CGI interface/format of the returned data, 
> the text format is not a stable format from the webserver 
> which reportedly will cease to be reliably parsed.  Yes we 
> can keep hacking the blast parser code to handle this, but 
> the bioperl release cycle is certainly not tied to the NCBI 
> blast release cycle so I find it unsatisfying to know that we 
> are going to have broken code when they change the output 
> formats (but not know when).
> 
> Mostly I think we need to try and support something that will 
> "ALWAYS" work so that individuals can set up webservices 
> which rely on remote blast functionality.  In theory, 
> netblast/blastcl3 should always work since NCBI has to update 
> the exe when they change their server setup.
> 
> In terms of the web-based queues - I think the best change we 
> can make is have the XML be the preferred retrieval method.
> 
> I also see value in providing a wrapper for netblast since it 
> should look an awful lot like running blast locally.
> 
> Ideally I'd like to see a more extensible system, something 
> like (and please feel free to come up with better names for 
> the modules!):
> 
> Bio::Tools::Run::Blast
>   -->             StandAlone (support for both WU-BLAST and NCBI-BLAST
>                   local binaries and MPI-BLAST too if simple)
>   -->             RemoteNCBI (currently the RemoteBlast server)
>   -->             RemoteEBISOAP (EBI has a nice SOAP interface that works
>                   quite well, but may not provide all the same databases
>                   as what people expect from NCBI)
>   -->             RemoteNetBlast (blastcl3 or netblast local executable)
>   (other things that people want)

Sounds good to me.  I think any wrapper for netblast could most easily be
based on StandAloneBlast; the parameters look pretty much identical, though
it'll probably need a little configuring as a quick text search through
StandAloneBlast didn't show any 'xml' tags.  Roger seemed to agree on this.
 
> [note: Whether these ideas are appealing or not, someone should 
> archive the discussions on the wiki page so 
> we can rely less on people searching the mailing archives for 
> how a decision was made.  Perhaps Roger can do this sort of 
> editing in addition to the planning for support of this module].
> 
> -jason
> 
> On Feb 7, 2006, at 8:38 PM, Paul Boutros wrote:
> 
> > Hi Roger,
> >
> > I would definitely prefer a fully Perl-based implementation.  For 
> > starters, I have not been successful in compiling the Toolkit that 
> > contains netblast for some platforms (e.g.
> > AIX 5.2 w/gcc 4.0).
> >
> > I haven't been following the discussion: is there some compelling 
> > reason to prefer a netblast-based system that's come up 
> recently?  I'm 
> > guessing that adding a new non-perl dependency would only 
> be done if 
> > there was considerable justification for this type of 
> change, but I'm 
> > not clear from your message what that justification is.
> >
> > Paul
> >
> >
> >
> > ------------------------------
> >
> > Message: 12
> > Date: Mon, 6 Feb 2006 20:46:44 -0600
> > From: "Roger Hall" 
> > Subject: [Bioperl-l] RemoteBlast users - potentially major changes -
> >         please        reply
> > To: 
> > Message-ID: <002001c62b90$bb9dbe00$4301a8c0 at LIBERAL>
> > Content-Type: text/plain;        charset="us-ascii"
> >
> > To everyone who uses RemoteBlast.pm:
> >
> > Would anyone object to RemoteBlast being rewritten in a way that 
> > requires NCBI's blastcl3 executable?
> >
> > Binary downloads of blastcl3 (column "netblast") are available for 
> > numerous platforms at: http://ncbi.nih.gov/BLAST/download.shtml
> >
> > Does anyone require or desire a "pure perl" implementation? If so, 
> > please explain the advantage you see with such an implementation.
> >
> > Thanks!
> >
> >
> > Roger Hall
> >
> > Technical Director
> >
> > MidSouth Bioinformatics Center
> >
> > University of Arkansas at Little Rock
> >
> > (501) 569-8074
> >
> >
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  



From rahall2 at ualr.edu  Fri Feb 10 17:54:23 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Fri, 10 Feb 2006 11:54:23 -0600
Subject: [Bioperl-l] Remote BLAST support discussion
In-Reply-To: <002201c62e69$ca8363d0$15327e82@pyrimidine>
Message-ID: <002501c62e6b$0686be30$d416a790@LIBERAL>

It seems so obvious now. :}

The only issue I see is likely obvious to those of you who have maintained
this over the years - no backward compatibility, but I can live with that if
y'all can.

I will document this on the wiki as suggested and then build the RemoteNCBI
module described. After that is tested and committed, I will contact Torsten
to see if I can help with the rest.

Thanks!

Roger 





From rahall2 at ualr.edu  Fri Feb 10 18:00:51 2006
From: rahall2 at ualr.edu (Roger Hall)
Date: Fri, 10 Feb 2006 12:00:51 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output
In-Reply-To: <43ECBEC7.7040506@gmx.at>
Message-ID: <002701c62e6b$edd845b0$d416a790@LIBERAL>

Hubert,

I got the same message when I first ran your script. The issue for me was
that "readdir(DIR)" doesn't return the full path, only the file name.

I edited your script to include:

	$file = $directory . '/' . $file;

just before the Bio::SearchIO call.
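
The fix above can be tried on its own, without BioPerl. What follows is only a
minimal sketch: the helper name report_paths is made up for illustration, and
File::Spec->catfile stands in for the plain string concatenation shown above.
It lists a directory and turns each bare name from readdir into a path that
can actually be opened.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Return full paths for every plain file in $directory.
# readdir() yields bare names only, so each name is joined to the directory.
# (report_paths is a hypothetical helper, not part of BioPerl.)
sub report_paths {
    my ($directory) = @_;
    opendir(my $dh, $directory) or die "Cannot open $directory: $!";
    my @names = readdir $dh;
    closedir $dh;

    my @paths;
    for my $name (sort @names) {
        my $path = File::Spec->catfile($directory, $name);
        next unless -f $path;    # skips '.', '..' and subdirectories
        push @paths, $path;
    }
    return @paths;
}

# Demonstration with a throwaway directory and an empty stand-in file.
my $dir = tempdir(CLEANUP => 1);
open(my $fh, '>', File::Spec->catfile($dir, 'comp80swiss2114.txt'))
    or die "Cannot create test file: $!";
close $fh;

print "$_\n" for report_paths($dir);
```

Each returned path could then be handed to the Bio::SearchIO call in place of
the bare file name.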

Roger


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
Sent: Friday, February 10, 2006 10:27 AM
To: Pieter Monsieurs; bioperl-l at bioperl.org; Chris Fields; rahall2 at ualr.edu
Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast
output

Hi,
I'm sorry for disturbing once more. Yesterday the script was working, 
today it isn't working at all, but I didn't change anything, I get the 
following error message:

------------- EXCEPTION  -------------
MSG: Could not open comp80swiss2114.txt: No such file or directory
STACK Bio::Root::IO::_initialize_io 
/usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167
STACK toplevel ./Blast.pl:14

--------------------------------------

the file exists and the bug I have fixed yesterday
thanks for help

Hubert




Pieter Monsieurs wrote:

> Sorry for disturbing. It now works correctly with the bug fix of Chris. 
> Thanx,
> Pieter
>
> Pieter Monsieurs wrote:
>
>>Hi Chris,
>>
>>The parsing of the Blast output still doesn't work for me with the bug 
>>fix download of blast.pm.
>>The module keeps turning around in the while loop at line 487 looking 
>>for a database or query-size:
>>
>>while( defined ($_) ) {
>>	if( /^Database:/ ) {
>>		$self->_pushback($_);
>>		last;
>>	}
>>	chomp;               
>>	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
>>		$size = $1;
>>		$size =~ s/,//g;
>>		last;
>>	} else {
>>		$q .= " $_";
>>		$q =~ s/ +/ /g;
>>		$q =~ s/^ | $//g;
>>	}
>>	$_ = $self->_readline;
>>}
>>
>>
>>The code keeps looking for the database information, however - as you 
>>mentioned - this information is given before the query line in the new 
>>Blast output format.
>>This way, all hits and hsps are stored in the query_description 
>>($hit->query_description), no hits are found and query_length is 0.
>>Because you already adapted the module to retrieve database information 
>>at another position in the module, deleting the while loop and adding 
>>the following lines after $_ = $self->_readline (line 486), worked fine 
>>for me (using blastn and blastp):
>>
>>if (/Length=([\d,]+)/) {
>>	$size = $1;
>>	$size =~ s/,//g;
>>}
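
The two length patterns from the loop quoted above can be exercised outside
blast.pm. This is a rough sketch only: the function name query_length is
invented here, and it tests single lines rather than reading a report through
the parser's _readline. It accepts both the "(193 letters)" style and the
"Length=193" style.

```perl
use strict;
use warnings;

# Pull the query length out of one header line, accepting both layouts
# quoted in this thread: "(193 letters)" and "Length=193".
# Returns undef when the line carries no length.
# (query_length is a hypothetical helper, not a BioPerl method.)
sub query_length {
    my ($line) = @_;
    if (   $line =~ /\((-?[\d,]+)\s+letters.*\)/
        || $line =~ /^Length=(-?[\d,]+)/ )
    {
        (my $size = $1) =~ s/,//g;    # drop thousands separators
        return $size;
    }
    return;
}

for my $line ('       (193 letters)', 'Length=193', 'Query= NP_215895') {
    my $len = query_length($line);
    printf "%-25s => %s\n", "'$line'", defined $len ? $len : 'no length';
}
```

Keeping both alternatives in one test means the same code path serves either
header style, which is the property the original while loop was after.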
>>
>>
>>Regards,
>>Pieter
>>
>>
>>
>>Chris Fields wrote:
>>
>>  
>>
>>>From 'perldoc Bio::SearchIO::blast':
>>>
>>>DESCRIPTION
>>>       This object encapsulated the necessary methods for generating
>>>       events suitable for building Bio::Search objects from a BLAST
>>>       report file.  Read the Bio::SearchIO for more information about
>>>       how to use this.
>>>
>>>       This driver can parse:
>>>
>>>       o   NCBI produced plain text BLAST reports from blastall, this
>>>           also includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq
>>>           reports.  NCBI XML BLAST output is parsed with the blastxml
>>>           SearchIO driver
>>>
>>>       o   WU-BLAST all reports
>>>
>>>       o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
>>>
>>>       o   BLAST-like output from Paracel BTK output
>>>
>>>So, it should.  Let us know if it doesn't.
>>>
>>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
>>>
>>> 
>>>
>>>    
>>>
>>>>Hi Chris,
>>>>I'm incredibly sorry for causing so much inconvenience, yes you are  
>>>>right, I had only to change the blast.pm file, it is working very  
>>>>fine, thank you very much, and you are right, you have mentioned it  
>>>>earlier either to change the file... ;)
>>>>
>>>>but I have another question: does it work with the WU-Blast output  
>>>>too?
>>>>regards
>>>>Hubert
>>>>
>>>>
>>>>Chris Fields wrote:
>>>>
>>>>   
>>>>
>>>>      
>>>>
>>>>>Ha!  I come back from meeting and there's a billion emails!  What  
>>>>>have we
>>>>>started? ;p .  Sorry about this Jason; I know you're busy.
>>>>>
>>>>>Hubert, if you're out there, I sent you an email with an  
>>>>>attachment.  You
>>>>>said the output looks like what you were expecting.  So I think we  
>>>>>have two
>>>>>problems:
>>>>>
>>>>>1)  I haven't delved into the file scanning, but the fact that it  
>>>>>takes so
>>>>>long should tell you something's seriously wrong there.  Strip  
>>>>>that part out
>>>>>and start with a simple script, say, like the one Jason or that I  
>>>>>sent you;
>>>>>the script I used to generate that output works fine (on two OS's,  
>>>>>WinXP and
>>>>>Mac OS X).  Use it on one file at a time.  Do everything on  
>>>>>command line
>>>>>(not through Eclipse).  IDE's can be notoriously flaky about running
>>>>>scripts, esp. when they run debugging.
>>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast  
>>>>>will still
>>>>>not work whenever the text blast output has the following header,  
>>>>>which
>>>>>comes from the new web version of BLAST:
>>>>>
>>>>>-----------------------------------------------------
>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
>>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
>>>>>
>>>>>RID: 1139501210-857-165793005128.BLASTQ1
>>>>>
>>>>>
>>>>>Database: All non-redundant GenBank CDS
>>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>>         3,292,813 sequences; 1,128,164,434 total letters
>>>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>>tuberculosis H37Rv].
>>>>>Length=193
>>>>>.......
>>>>>-----------------------------------------------------
>>>>>
>>>>>It will work if the text output has the following header (or is an  
>>>>>older
>>>>>version of BLAST):
>>>>>
>>>>>-----------------------------------------------------
>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>
>>>>>
>>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
>>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.  
>>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of  
>>>>>protein database search
>>>>>programs",  Nucleic Acids Res. 25:3389-3402.
>>>>>
>>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
>>>>>tuberculosis H37Rv].
>>>>>       (193 letters)
>>>>>
>>>>>Database: All non-redundant GenBank CDS
>>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
>>>>>         2,895,325 sequences; 997,103,285 total letters
>>>>>-----------------------------------------------------
>>>>>You have the former (2.2.13) version.  I know b/c I have your  
>>>>>BLAST files.
>>>>>Therefore, even bioperl-1.5.1 will not work!
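
One way to tell the two header styles apart before parsing is to check which
of "Database:" and "Query=" comes first. The sketch below assumes only what
the two sample headers above show; the function name header_layout and its
return labels are made up for this example and are not part of BioPerl.

```perl
use strict;
use warnings;

# Classify a text BLAST report header by which of "Database:" and
# "Query=" appears first. In the 2.2.13 web sample the database block
# precedes the query; in the 2.2.12 sample the query comes first.
# (header_layout and its labels are hypothetical.)
sub header_layout {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        return 'database-first' if $line =~ /^Database:/;
        return 'query-first'    if $line =~ /^Query=/;
    }
    return 'unknown';
}

my $new_style = <<'END';
BLASTP 2.2.13 [Nov-27-2005]
RID: 1139501210-857-165793005128.BLASTQ1
Database: All non-redundant GenBank CDS
Query=  NP_215895 pyrimidine regulatory protein PyrR
Length=193
END

my $old_style = <<'END';
BLASTP 2.2.12 [Aug-07-2005]
Query= NP_215895 pyrimidine regulatory protein PyrR
       (193 letters)
Database: All non-redundant GenBank CDS
END

print header_layout($new_style), "\n";
print header_layout($old_style), "\n";
```

A batch script could use a check like this to set aside database-first
reports until the patched blast.pm is installed.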
>>>>>
>>>>>If you want the really gory details on why this is a problem, look  
>>>>>here:
>>>>>
>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>
>>>>>So, any text output with the above header will not work; it will  
>>>>>either hang
>>>>>or end abruptly (depending on OS, perl version, memory,  
>>>>>patience).  If you
>>>>>look in the above, I have added a preliminary fix for this.  I'll  
>>>>>reiterate
>>>>>for the billionth time, it hasn't been committed yet, so don't
>>>>>kill me if it blows your computer up ;>
>>>>>Here's the direct link:
>>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
>>>>>This is a modified version of Bio::SearchIO::blast.pm (it says  
>>>>>it's version
>>>>>1.90, but it's lying, I didn't change the version, only the regex;  
>>>>>sorry
>>>>>Jason).  From what you've been posting it doesn't sound like  
>>>>>you've tried
>>>>>this, and I believe I've suggested this fix before.
>>>>>
>>>>>Replace the one in your Bio/SearchIO directory (which looks like
>>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your  
>>>>>prev.
>>>>>message) with this file.  Make sure the filename stays the same  
>>>>>(blast.pm).
>>>>>
>>>>>Run everything again, one file at a time.  Make sure you use  
>>>>>Jason's script
>>>>>as well as the one I sent you.  Do NOT rely on running through  
>>>>>multiple
>>>>>files yet.  Fix one bug at a time.  And heed Joel's words about  
>>>>>file checks.
>>>>>
>>>>>
>>>>>Here's a small chunk of output from one of your blast files using the
>>>>>modified script I sent you:
>>>>>
>>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
>>>>>Query:   1  RWKWKRKK  8
>>>>>Seq:     542  RWAWRRKK  549
>>>>>
>>>>>Look familiar?
>>>>>
>>>>>Christopher Fields
>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>Dept. of Biochemistry
>>>>>University of Illinois Urbana-Champaign
>>>>>
>>>>>     
>>>>>
>>>>>        
>>>>>
>>>>>>-----Original Message-----
>>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu] Sent: Thursday,  
>>>>>>February 09, 2006 3:24 PM
>>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
>>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>parsing Blast output
>>>>>>
>>>>>>In other words, yes, I'm on the wrong trail. :}
>>>>>>
>>>>>>Sorry - I'll look at the output issue this evening (or realize  
>>>>>>that Chris already solved the issue).  ;}
>>>>>>
>>>>>>Thanks!
>>>>>>
>>>>>>Roger
>>>>>>
>>>>>>-----Original Message-----
>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert  
>>>>>>Prielinger
>>>>>>Sent: Thursday, February 09, 2006 2:14 PM
>>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason  
>>>>>>Stajich
>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>parsing Blast output
>>>>>>
>>>>>>dear roger,
>>>>>>this error message I got, when I tried to parse Blast output  
>>>>>>(version
>>>>>>2.2.12) with bioperl 1.5.1, but it doesn't matter, because I have  
>>>>>>a lot of Blast output files with version 2.2.13 and for that I  
>>>>>>don't get any error message.....it just doesn't work
>>>>>>
>>>>>>Hubert
>>>>>>
>>>>>>
>>>>>>
>>>>>>Roger Hall wrote:
>>>>>>
>>>>>>
>>>>>>       
>>>>>>
>>>>>>          
>>>>>>
>>>>>>>Guys - I'm looking at the error message:
>>>>>>>
>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>STACK toplevel
>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>
>>>>>>>This is my line of thought:
>>>>>>>1. "no data for midline $_" is a unique message generated by
>>>>>>>blast.pm in one location only at the point of a. reading three
>>>>>>>lines b. dropping lines with spaces only c. identifying the Query,
>>>>>>>Midline, and Match lines (0 <= $i < 3)
>>>>>>>2. There is a regexp match that fails in order to reach that
>>>>>>>error message
>>>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the
>>>>>>>expression
>>>>>>>4. It does anyway
>>>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the
>>>>>>>blast reports
>>>>>>>
>>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the
>>>>>>>string anywhere has me thoroughly confused - I asked Hubert for the
>>>>>>>additional file, assuming that I didn't have it.
>>>>>>>
>>>>>>>My next thought is to write a quick script to test perl behavior
>>>>>>>on "Fedora Core 9".
>>>>>>>
>>>>>>>Thoughts?
>>>>>>>
>>>>>>>Did I misread the issue entirely? :}
>>>>>>>
>>>>>>>Roger
>>>>>>>
>>>>>>>
>>>>>>>-----Original Message-----
>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>>>>Chris Fields
>>>>>>>Sent: Thursday, February 09, 2006 10:16 AM
>>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
>>>>>>>Cc: bioperl-l at bioperl.org
>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
>>>>>>>parsing Blast output
>>>>>>>
>>>>>>>>-----Original Message-----
>>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
>>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
>>>>>>>>To: Hubert Prielinger
>>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
>>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work  
>>>>>>>>parsing Blast output
>>>>>>>>
>>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>>>>>>>
>>>>>>>>>hi chris,
>>>>>>>>>thanks, I have upgraded to version 1.5.1 but it still isn't
>>>>>>>>>working, do you have any other idea, the problem I have is that I
>>>>>>>>>have to parse a lot of textfiles....
>>>>>>>>>or shall I look for another option to parse those files...
>>>>>>>>>
>>>>>>>>>regards
>>>>>>>>>Hubert
>>>>>>>>>
>>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
>>>>>>>>2.2.13 reports but unless you post your blast report we
>>>>>>>>can't really determine the problem.
>>>>>>>>
>>>>>>>>If you are still getting the same error like this I am not
>>>>>>>>convinced you have upgraded to 1.5.1 which includes a fix for the
>>>>>>>>fact that NCBI changed the HSP result format to remove the ':'
>>>>>>>>from the Query/Sbjct prefixes.  We fixed this as soon as it was
>>>>>>>>apparent sometime in September.
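
The ':' change described above is easy to see with a small pattern that
treats the colon as optional. This is an illustrative sketch, not the
expression used inside Bio::SearchIO::blast; the function name
parse_alignment_line and the hash keys are assumptions made for the example.

```perl
use strict;
use warnings;

# Split an alignment line such as "Query: 1 RWKWKRKK 8" (older style)
# or "Query  1   WWWKWRW  7" (newer style, no colon) into its parts.
# The ':' after the label is optional, so both styles are accepted.
# (parse_alignment_line is a hypothetical helper, not BioPerl's regex.)
sub parse_alignment_line {
    my ($line) = @_;
    if ($line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/) {
        return { label => $1, start => $2, seq => $3, end => $4 };
    }
    return;
}

for my $line ('Query: 1 RWKWKRKK 8', 'Query  1   WWWKWRW  7') {
    my $parts = parse_alignment_line($line) or next;
    print join(' ', @{$parts}{qw(label start seq end)}), "\n";
}
```

A parser that insists on the colon would reject the second line, which is the
kind of failure that surfaces as "no data for midline".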
>>>>>>>>
>>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>>STACK toplevel
>>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>
>>>>>>>>If you are just getting no results but also no warnings wrt parsing,
>>>>>>>>are you sure your logic is correct?
>>>>>>>>
>>>>>>>>If you remove your filters do you see all the HSPS?
>>>>>>>>
>>>>>>>>while (my $result = $search->next_result) {
>>>>>>>>  print $result->query_name, "\n";
>>>>>>>>  # iterate over each hit on the query sequence
>>>>>>>>  while (my $hit = $result->next_hit) {
>>>>>>>>    print $hit->name, "\n";
>>>>>>>>    # iterate over each HSP in the hit
>>>>>>>>    while (my $hsp = $hit->next_hsp) {
>>>>>>>>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>>>>>>>            $hsp->hit_string, "\n";
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>>}
>>>>>>>>
>>>>>>>I tested some of the BLAST results that Hubert sent Roger and me
>>>>>>>with a similar script to the above.  I removed the file parsing
>>>>>>>logic and it seemed to work just fine.  It may very well be a logic
>>>>>>>issue or that he hasn't installed the latest fix.
>>>>>>>
>>>>>>>It's a funny thing, though.  When I tried using blastcl3
>>>>>>>(v. 2.2.13), even though the returned output was from nr, the top
>>>>>>>of the blast output showed that it was v2.2.12:
>>>>>>>
>>>>>>>BLASTP 2.2.12 [Aug-07-2005]
>>>>>>>
>>>>>>>I double-checked my local version and it's definitely v.2.2.13:
>>>>>>>-------------------------------------
>>>>>>>C:\Perl\Scripts>blastcl3 -
>>>>>>>
>>>>>>>blastcl3 2.2.13   arguments:...
>>>>>>>-------------------------------------
>>>>>>>
>>>>>>>If you use RemoteBlast using the same settings, the version in
>>>>>>>the header looks like this:
>>>>>>>
>>>>>>>BLASTP 2.2.13 [Nov-27-2005]
>>>>>>>
>>>>>>>I'm wondering if all the blast executables (blast and netblast)
>>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast
>>>>>>>outputs a new format (2.2.13).  I'll ask blast-help at NCBI about
>>>>>>>this.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>            
>>>>>>>
>>>>>>>>To clarify some stuff -
>>>>>>>>Chris I don't necessarily think the XML is the best way forward
>>>>>>>>for BLAST reports generated locally, it isn't as detailed as the
>>>>>>>>Text format, and the Text format is what most people expect to be
>>>>>>>>able to scroll through and parse -- it is also harder for the
>>>>>>>>format to change dramatically if you have a static binary on your
>>>>>>>>machine =).  I think for remoteblast the XML format should be the
>>>>>>>>way forward but I expect Bioperl to maintain support of any plain
>>>>>>>>text BLAST report format that people use on a regular basis.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>Does XML lack some specific info that text output has?  Didn't
>>>>>>>know that.  I believe that XML should be default in RemoteBlast
>>>>>>>since it will not break, but I agree with you about text output.
>>>>>>>I also agree that it will need somebody to maintain it constantly,
>>>>>>>much like RemoteBlast.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>         
>>>>>>>
>>>>>>>            
>>>>>>>
>>>>>>>>-jason
>>>>>>>>
>>>>>>>>>Chris Fields wrote:
>>>>>>>>>
>>>>>>>>>>My guess is you're running into text parsing problems in
>>>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>>>>>>>>(1.5.1) or bioperl-live (CVS), then see the bug below.
>>>>>>>>>>
>>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>>
>>>>>>>>>>I think the first problem you ran into is solved in bioperl 1.5.1,
>>>>>>>>>>the last problem (more recent, not related to the first) has
>>>>>>>>>>been fixed but hasn't been committed to bioperl-live yet.
>>>>>>>>>>The fixed SearchIO::blast is available in the link above, but
>>>>>>>>>>realize it hasn't been committed yet and may change.
>>>>>>>>>>
>>>>>>>>>>Christopher Fields
>>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry
>>>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>>>
>>>>>>>>>>>-----Original Message-----
>>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
>>>>>>>>>>>Prielinger
>>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
>>>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>>>>>>>>output
>>>>>>>>>>>
>>>>>>>>>>>Hi,
>>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with
>>>>>>>>>>>Bio::SearchIO, I get the following error message:
>>>>>>>>>>>
>>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
>>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
>>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>>>>>>>>STACK toplevel
>>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>>>>>>>
>>>>>>>>>>>is that a bug......
>>>>>>>>>>>
>>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't
>>>>>>>>>>>get anything.....
>>>>>>>>>>>I'm using bioperl 1.4
>>>>>>>>>>>
>>>>>>>>>>>before, I have installed bioperl 1.4, it worked fine parsing Blast
>>>>>>>>>>>Output (version 2.2.12), but I don't remember which bioperl version
>>>>>>>>>>>I had installed
>>>>>>>>>>>
>>>>>>>>>>>thanks in advance
>>>>>>>>>>>
>>>>>>>>>>>Hubert
>>>>>>>>>>>
>>>>>>>>>>>_______________________________________________
>>>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>--
>>>>>>>>Jason Stajich
>>>>>>>>Duke University
>>>>>>>>http://www.duke.edu/~jes12
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>           
>>>>>>>>
>>>>>>>>              
>>>>>>>>
>>>>>>>Christopher Fields
>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>Dept. of Biochemistry
>>>>>>>University of Illinois Urbana-Champaign
>>>Christopher Fields
>>>Postdoctoral Researcher
>>>Lab of Dr. Robert Switzer
>>>Dept of Biochemistry
>>>University of Illinois Urbana-Champaign
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm for more 
> information.
>

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l




From cjfields at uiuc.edu  Fri Feb 10 18:08:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 10 Feb 2006 12:08:37 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing blast output
In-Reply-To: <002701c62e6b$edd845b0$d416a790@LIBERAL>
Message-ID: <002501c62e6d$04158530$15327e82@pyrimidine>

Makes sense.  I didn't see this since I passed the files directly from
command-line.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign  

> -----Original Message-----
> From: Roger Hall [mailto:rahall2 at ualr.edu] 
> Sent: Friday, February 10, 2006 12:01 PM
> To: 'Hubert Prielinger'; 'Pieter Monsieurs'; 
> bioperl-l at bioperl.org; 'Chris Fields'
> Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing blast output
> 
> Hubert,
> 
> I got the same message when I first ran your script. The 
> issue for me was that "readdir(DIR)" doesn't return the full 
> path, only the file name.
> 
> I edited your script to include:
> 
> 	$file = $directory . '/' . $file;
> 
> just before the Bio::SearchIO call.
> 
> Roger
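
A minimal sketch of the same fix, assuming all the reports sit in one directory. The helper name report_paths is hypothetical (it is not part of Hubert's script or of BioPerl), and File::Spec is swapped in for the string concatenation shown above so the separator is not hard-coded:

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of every plain file in $directory.
# readdir() yields bare names such as "comp80swiss2114.txt", so each
# name must be joined to the directory before it can be opened.
sub report_paths {
    my ($directory) = @_;
    opendir(my $dh, $directory) or die "Cannot open $directory: $!";
    my @names = readdir $dh;
    closedir $dh;

    my @paths;
    for my $name (sort @names) {
        my $path = File::Spec->catfile($directory, $name);
        next unless -f $path;    # skips ".", ".." and subdirectories
        push @paths, $path;
    }
    return @paths;
}
```

Each returned path could then be handed to Bio::SearchIO->new(-format => 'blast', -file => $path) one file at a time.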
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Hubert Prielinger
> Sent: Friday, February 10, 2006 10:27 AM
> To: Pieter Monsieurs; bioperl-l at bioperl.org; Chris Fields; 
> rahall2 at ualr.edu
> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
> parsing blast output
> 
> Hi,
> I'm sorry for disturbing once more. Yesterday the script was 
> working, today it isn't working at all, but I didn't change 
> anything, I get the following error message:
> 
> ------------- EXCEPTION  -------------
> MSG: Could not open comp80swiss2114.txt: No such file or directory
> STACK Bio::Root::IO::_initialize_io
> /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:273
> STACK Bio::Root::IO::new 
> /usr/lib/perl5/site_perl/5.8.6/Bio/Root/IO.pm:213
> STACK Bio::SearchIO::new 
> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:135
> STACK Bio::SearchIO::new 
> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO.pm:167
> STACK toplevel ./Blast.pl:14
> 
> --------------------------------------
> 
> The file exists, and I fixed the bug yesterday; thanks for the help.
> 
> Hubert
> 
> 
> 
> 
> Pieter Monsieurs wrote:
> 
> > Sorry for disturbing. It now works correctly with the bug fix of Chris.
> > Thanx,
> > Pieter
> >
> > Pieter Monsieurs wrote:
> >
> >>Hi Chris,
> >>
> >>The parsing of the Blast output still doesn't work for me with the bug
> >>fix download of blast.pm.
> >>The module keeps turning around in the while loop at line 487 looking
> >>for a database or query-size:
> >>
> >>while( defined ($_) ) {
> >>	if( /^Database:/ ) {
> >>		$self->_pushback($_);
> >>		last;
> >>	}
> >>	chomp;               
> >>	if( /\((\-?[\d,]+)\s+letters.*\)/ || /^Length=(\-?[\d,]+)/ ) {
> >>		$size = $1;
> >>		$size =~ s/,//g;
> >>		last;
> >>	} else {
> >>		$q .= " $_";
> >>		$q =~ s/ +/ /g;
> >>		$q =~ s/^ | $//g;
> >>	}
> >>	$_ = $self->_readline;
> >>}
> >>
> >>
> >>The code keeps looking for the database information, however - as you
> >>mentioned - this information is given before the query line in the new
> >>Blast output format.
> >>This way, all hits and hsps are stored in the query_description
> >>($hit->query_description), no hits are found and query_length is 0.
> >>Because you already adapted the module to retrieve database
> >>information at another position in the module, deleting the while loop
> >>and adding the following lines after $_ = $self->_readline (line 486),
> >>worked fine for me (using blastn and blastp):
> >>
> >>if (/Length=([\d,]+)/) {
> >>	$size = $1;
> >>	$size =~ s/,//g;
> >>}
> >>
> >>
> >>Regards,
> >>Pieter
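
The two query-length layouts discussed in this thread, "(193 letters)" in the older text reports and "Length=193" in the 2.2.13 web reports, can be handled by one small stand-alone helper. This is only a sketch under that assumption, not the code in blast.pm; the name query_length is hypothetical:

```perl
use strict;
use warnings;

# Pull the query length out of a BLAST text header.
# Handles "(193 letters)" and "Length=193"; commas are stripped,
# so "(1,234 letters)" yields 1234.
sub query_length {
    my ($header) = @_;
    for my $line (split /\n/, $header) {
        if (   $line =~ /\(([\d,]+)\s+letters\)/
            || $line =~ /^Length=([\d,]+)/ )
        {
            (my $size = $1) =~ s/,//g;
            return $size;
        }
    }
    return;    # no length line found
}
```

The parenthesised pattern does not match the database summary line ("... total letters"), because that line carries no parentheses.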
> >>
> >>
> >>
> >>Chris Fields wrote:
> >>
> >>  
> >>
> >>>From 'perldoc Bio::SearchIO::blast':
> >>>
> >>>DESCRIPTION
> >>>       This object encapsulated the necessary methods for generating
> >>>       events suitable for building Bio::Search objects from a BLAST
> >>>       report file.  Read the Bio::SearchIO for more information about
> >>>       how to use this.
> >>>
> >>>       This driver can parse:
> >>>
> >>>       o   NCBI produced plain text BLAST reports from blastall, this
> >>>           also includes PSIBLAST, PSITBLASTN, RPSBLAST, and bl2seq
> >>>           reports.  NCBI XML BLAST output is parsed with the blastxml
> >>>           SearchIO driver
> >>>
> >>>       o   WU-BLAST all reports
> >>>
> >>>       o   Jim Kent's BLAST-like output from his programs (BLASTZ, BLAT)
> >>>
> >>>       o   BLAST-like output from Paracel BTK output
> >>>
> >>>So, it should.  Let us know if it doesn't.
> >>>
> >>>On Feb 9, 2006, at 4:20 PM, Hubert Prielinger wrote:
> >>>
> >>> 
> >>>
> >>>    
> >>>
> >>>>Hi Chris,
> >>>>I'm incredibly sorry for causing so much inconvenience, yes you are
> >>>>right, I had only to change the blast.pm file, it is working very
> >>>>fine, thank you very much, and you are right, you have mentioned it
> >>>>earlier either to change the file... ;)
> >>>>
> >>>>but I have another question: does it work with the WU-Blast output
> >>>>too?
> >>>>regards
> >>>>Hubert
> >>>>
> >>>>
> >>>>Chris Fields wrote:
> >>>>
> >>>>   
> >>>>
> >>>>      
> >>>>
> >>>>>Ha!  I come back from meeting and there's a billion emails!  What
> >>>>>have we started? ;p .  Sorry about this Jason; I know you're busy.
> >>>>>
> >>>>>Hubert, if you're out there, I sent you an email with an
> >>>>>attachment.  You said the output looks like what you were
> >>>>>expecting.  So I think we have two
> >>>>>problems:
> >>>>>
> >>>>>1)  I haven't delved into the file scanning, but the fact that it
> >>>>>takes so long should tell you something's seriously wrong there.
> >>>>>Strip that part out and start with a simple script, say, like the
> >>>>>one Jason or that I sent you; the script I used to generate that
> >>>>>output works fine (on two OS's, WinXP and Mac OS X).  Use it on one
> >>>>>file at a time.  Do everything on command line (not through
> >>>>>Eclipse).  IDE's can be notoriously flaky about running scripts,
> >>>>>esp. when they run debugging.
> >>>>>
> >>>>>2) Even if you have bioperl-1.5.1 installed, Bio::SearchIO::blast
> >>>>>will still not work whenever the text blast output has the
> >>>>>following header, which comes from the new web version of BLAST:
> >>>>>
> >>>>>-----------------------------------------------------
> >>>>>BLASTP 2.2.13 [Nov-27-2005]
> >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
> >>>>>Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
> >>>>>protein database search programs", Nucleic Acids Res. 25:3389-3402.
> >>>>>
> >>>>>RID: 1139501210-857-165793005128.BLASTQ1
> >>>>>
> >>>>>
> >>>>>Database: All non-redundant GenBank CDS
> >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> >>>>>         3,292,813 sequences; 1,128,164,434 total letters
> >>>>>Query=  NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
> >>>>>tuberculosis H37Rv].
> >>>>>Length=193
> >>>>>.......
> >>>>>-----------------------------------------------------
> >>>>>
> >>>>>It will work if the text output has the following header (or is an
> >>>>>older version of BLAST):
> >>>>>
> >>>>>-----------------------------------------------------
> >>>>>BLASTP 2.2.12 [Aug-07-2005]
> >>>>>
> >>>>>
> >>>>>Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.
> >>>>>Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
> >>>>>Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
> >>>>>protein database search programs",  Nucleic Acids Res.
> >>>>>25:3389-3402.
> >>>>>
> >>>>>Query= NP_215895 pyrimidine regulatory protein PyrR [Mycobacterium
> >>>>>tuberculosis H37Rv].
> >>>>>       (193 letters)
> >>>>>
> >>>>>Database: All non-redundant GenBank CDS
> >>>>>translations+PDB+SwissProt+PIR+PRF excluding environmental samples
> >>>>>         2,895,325 sequences; 997,103,285 total letters
> >>>>>-----------------------------------------------------
> >>>>>You have the former (2.2.13) version.  I know b/c I have your BLAST
> >>>>>files.
> >>>>>Therefore, even bioperl-1.5.1 will not work!
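
One way to tell the two layouts apart before parsing is to check whether "Database:" or "Query=" appears first in the report. The sketch below assumes only what the two sample headers above show; header_layout and its return labels are hypothetical names, not anything in BioPerl:

```perl
use strict;
use warnings;

# Classify a BLAST text report by which header line comes first.
# In the 2.2.13 web sample above "Database:" precedes "Query=";
# in the 2.2.12 sample the order is reversed.
sub header_layout {
    my ($text) = @_;
    for my $line (split /\n/, $text) {
        return 'database-first' if $line =~ /^Database:/;
        return 'query-first'    if $line =~ /^Query=/;
    }
    return 'unknown';
}
```

A report classified as 'database-first' would be one to run through the patched blast.pm, or to request again as XML.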
> >>>>>
> >>>>>If you want the really gory details on why this is a problem, look
> >>>>>here:
> >>>>>
> >>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>>
> >>>>>So, any text output with the 2.2.13 header above will not work; it will
> >>>>>either hang or end abruptly (depending on OS, perl version, memory,
> >>>>>patience).  If you look in the above, I have added a preliminary
> >>>>>fix for this.  I'll reiterate for the billionth time, it hasn't
> >>>>>been committed yet, so don't kill me if it blows your computer up ;>
> >>>>>Here's the direct link:
> >>>>>http://bugzilla.bioperl.org/attachment.cgi?id=267&action=view
> >>>>>This is a modified version of Bio::SearchIO::blast.pm (it says it's
> >>>>>version 1.90, but it's lying, I didn't change the version, only the
> >>>>>regex; sorry Jason).  From what you've been posting it doesn't
> >>>>>sound like you've tried this, and I believe I've suggested this fix
> >>>>>before.
> >>>>>
> >>>>>Replace the one in your Bio/SearchIO directory (which looks like
> >>>>>'/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/', judging from your
> >>>>>prev. message) with this file.  Make sure the filename stays the same
> >>>>>(blast.pm).
> >>>>>
> >>>>>Run everything again, one file at a time.  Make sure you use
> >>>>>Jason's script as well as the one I sent you.  Do NOT rely on
> >>>>>running through multiple files yet.  Fix one bug at a time.  And
> >>>>>heed Joel's words about file checks.
> >>>>>
> >>>>>
> >>>>>Here's a small chunk of output from one of your blast files using
> >>>>>the modified script I sent you:
> >>>>>
> >>>>>sp|Q10264|PSO2_SCHPO-->DNA cross-link repair protein pso2/snm1
> >>>>>Query:   1  RWKWKRKK  8
> >>>>>Seq:     542  RWAWRRKK  549
> >>>>>
> >>>>>Look familiar?
> >>>>>
> >>>>>Christopher Fields
> >>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
> >>>>>University of Illinois Urbana-Champaign
> >>>>>
> >>>>>     
> >>>>>
> >>>>>        
> >>>>>
> >>>>>>-----Original Message-----
> >>>>>>From: Roger Hall [mailto:rahall2 at ualr.edu]
> >>>>>>Sent: Thursday, February 09, 2006 3:24 PM
> >>>>>>To: 'Hubert Prielinger'; 'Chris Fields'; 'Jason Stajich'
> >>>>>>Subject: RE: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
> >>>>>>Blast output
> >>>>>>
> >>>>>>In other words, yes, I'm on the wrong trail. :}
> >>>>>>
> >>>>>>Sorry - I'll look at the output issue this evening (or realize 
> >>>>>>that Chris already solved the issue).  ;}
> >>>>>>
> >>>>>>Thanks!
> >>>>>>
> >>>>>>Roger
> >>>>>>
> >>>>>>-----Original Message-----
> >>>>>>From: bioperl-l-bounces at lists.open-bio.org
> >>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert
> >>>>>>Prielinger
> >>>>>>Sent: Thursday, February 09, 2006 2:14 PM
> >>>>>>To: rahall2 at ualr.edu; bioperl-l at bioperl.org; Chris Fields; Jason
> >>>>>>Stajich
> >>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing
> >>>>>>Blast output
> >>>>>>
> >>>>>>dear roger,
> >>>>>>this error message I got, when I tried to parse Blast output
> >>>>>>(version 2.2.12) with bioperl 1.5.1, but it doesn't matter, because I
> >>>>>>have a lot of Blast output files with version 2.2.13 and for that I
> >>>>>>don't get any error message.....it just doesn't work
> >>>>>>
> >>>>>>Hubert
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>Roger Hall wrote:
> >>>>>>
> >>>>>>
> >>>>>>       
> >>>>>>
> >>>>>>          
> >>>>>>
> >>>>>>>Guys - I'm looking at the error message:
> >>>>>>>
> >>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>>>STACK toplevel
> >>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>>>>>>
> >>>>>>>This is my line of thought:
> >>>>>>>1. "no data for midline $_" is a unique message generated by blast.pm
> >>>>>>>in one location only at the point of a. reading three lines b. dropping
> >>>>>>>lines with spaces only c. identifying the Query, Midline, and Match
> >>>>>>>lines (0 <= $i < 3)
> >>>>>>>2. There is a regexp match that fails in order to reach that error
> >>>>>>>message
> >>>>>>>3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
> >>>>>>>4. It does anyway
> >>>>>>>5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the
> >>>>>>>blast reports
> >>>>>>>
> >>>>>>>I suspect a newline/chomp/metacharacter issue. Not finding the string
> >>>>>>>anywhere has me thoroughly confused - I asked Hubert for the additional
> >>>>>>>file, assuming that I didn't have it.
> >>>>>>>
> >>>>>>>My next thought is to write a quick script to test perl behavior
> >>>>>>>on "Fedora Core 4".
> >>>>>>>
> >>>>>>>Thoughts?
> >>>>>>>
> >>>>>>>Did I misread the issue entirely? :}
> >>>>>>>
> >>>>>>>Roger
> >>>>>>>
> >>>>>>>
> >>>>>>>-----Original Message-----
> >>>>>>>From: bioperl-l-bounces at lists.open-bio.org
> >>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> >>>>>>>Sent: Thursday, February 09, 2006 10:16 AM
> >>>>>>>To: 'Jason Stajich'; 'Hubert Prielinger'
> >>>>>>>Cc: bioperl-l at bioperl.org
> >>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
> >>>>>>>parsing Blast output
> >>>>>>>
> >>>>>>>>-----Original Message-----
> >>>>>>>>From: Jason Stajich [mailto:jason.stajich at duke.edu]
> >>>>>>>>Sent: Thursday, February 09, 2006 9:13 AM
> >>>>>>>>To: Hubert Prielinger
> >>>>>>>>Cc: Chris Fields; bioperl-l at bioperl.org
> >>>>>>>>Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
> >>>>>>>>parsing Blast output
> >>>>>>>>
> >>>>>>>>On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
> >>>>>>>>
> >>>>>>>>>hi chris,
> >>>>>>>>>thanks, I have upgraded to version 1.5.1 but it isn't still working,
> >>>>>>>>>do you have any other idea, the problem I have is that I have to parse
> >>>>>>>>>a lot of textfiles....
> >>>>>>>>>or shall I look for another option to parse those files...
> >>>>>>>>>
> >>>>>>>>>regards
> >>>>>>>>>Hubert
> >>>>>>>>>
> >>>>>>>>The code from Bioperl 1.5.1 works fine for me for blast
> >>>>>>>>2.2.13 reports but unless you post your blast report we can't really
> >>>>>>>>determine the problem.
> >>>>>>>>
> >>>>>>>>If you are still getting the same error like this I am not convinced
> >>>>>>>>you have upgraded to 1.5.1 which includes a fix for the fact that NCBI
> >>>>>>>>changed the HSP result format to remove the ':' from the Query/Sbjct
> >>>>>>>>prefixes.  We fixed this as soon as it was apparent sometime in
> >>>>>>>>September.
> >>>>>>>>
> >>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>>>>>>>STACK toplevel
> >>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>>>>>>>
> >>>>>>>>If you are just getting no results but also no warnings wrt parsing,
> >>>>>>>>are you sure your logic is correct?
> >>>>>>>>
> >>>>>>>>If you remove your filters do you see all the HSPS?
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>while (my $result = $search->next_result) {
> >>>>>>>>  print $result->query_name, "\n";
> >>>>>>>>  #iterate over each hit on the query sequence
> >>>>>>>>  while (my $hit = $result->next_hit) {
> >>>>>>>>    print $hit->name, "\n";
> >>>>>>>>    #iterate over each HSP in the hit
> >>>>>>>>    while (my $hsp = $hit->next_hsp) {
> >>>>>>>>      print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
> >>>>>>>>            $hsp->hit_string, "\n";
> >>>>>>>>    }
> >>>>>>>>  }
> >>>>>>>>}
> >>>>>>>>
> >>>>>>>I tested some of the BLAST results that Hubert sent Roger and me with a
> >>>>>>>similar script to the above.  I removed the file parsing logic
> >>>>>>>and it seemed to work just fine.  It may very well be a logic issue or
> >>>>>>>that he hasn't installed the latest fix.
> >>>>>>>
> >>>>>>>It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13),
> >>>>>>>even though the returned output was from nr, the top of the blast
> >>>>>>>output showed that it was v2.2.12:
> >>>>>>>
> >>>>>>>BLASTP 2.2.12 [Aug-07-2005]
> >>>>>>>
> >>>>>>>I double-checked my local version and it's definitely v.2.2.13:
> >>>>>>>-------------------------------------
> >>>>>>>C:\Perl\Scripts>blastcl3 -
> >>>>>>>
> >>>>>>>blastcl3 2.2.13   arguments:...
> >>>>>>>-------------------------------------
> >>>>>>>
> >>>>>>>If you use RemoteBlast using the same settings, the version in
> >>>>>>>the header looks like this:
> >>>>>>>
> >>>>>>>BLASTP 2.2.13 [Nov-27-2005]
> >>>>>>>
> >>>>>>>I'm wondering if all the blast executables (blast and netblast)
> >>>>>>>from NCBI have text output like v.2.2.12, while the wwwblast outputs a
> >>>>>>>new format (2.2.13).  I'll ask blast-help at NCBI about this.
> >>>>>>>
> >>>>>>>>To clarify some stuff -
> >>>>>>>>Chris I don't necessarily think the XML is best way forward
> >>>>>>>>           
> >>>>>>>>
> >>>>>>>>              
> >>>>>>>>
> >>>>>>for BLAST
> >>>>>>       
> >>>>>>
> >>>>>>          
> >>>>>>
> >>>>>>>>reports generated locally, it isn't as detailed as the Text
> >>>>>>>>           
> >>>>>>>>
> >>>>>>>>              
> >>>>>>>>
> >>>>>>format and
> >>>>>>       
> >>>>>>
> >>>>>>          
> >>>>>>
> >>>>>>>>it is what most people expect to be able to scroll through
> >>>>>>>>           
> >>>>>>>>
> >>>>>>>>              
> >>>>>>>>
> >>>>>>and parse
> >>>>>>       
> >>>>>>
> >>>>>>          
> >>>>>>
> >>>>>>>>-- it is also harder for the format to change 
> dramatically        
> >>>>>>>>           
> >>>>>>>>
> >>>>>>>>              
> >>>>>>>>
> >>>>>>if you have
> >>>>>>       
> >>>>>>
> >>>>>>          
> >>>>>>
> >>>>>>>>a static binary on your machine =).  I think for
> >>>>>>>>           
> >>>>>>>>
> >>>>>>>>              
> >>>>>>>>
> >>>>>>remoteblast the XML
> >>>>>>       
> >>>>>>
> >>>>>>          
> >>>>>>
> >>>>>>>>format should be the way forward but I expect Bioperl to 
> >>>>>>>>maintain support of any plain text BLAST report format that 
> >>>>>>>>people use on a regular basis.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>           
> >>>>>>>>
> >>>>>>>>              
> >>>>>>>>
> >>>>>>>Does XML lack some specific info that text output has?
> >>>>>>>         
> >>>>>>>
> >>>>>>>            
> >>>>>>>
> >>>>>>Didn't know that.
> >>>>>>I
> >>>>>>
> >>>>>>       
> >>>>>>
> >>>>>>          
> >>>>>>
> >>>>>>>believe that XML should be default in RemoteBlast 
> since it will 
> >>>>>>>not break, but I agree with you about text output.  I 
> also agree 
> >>>>>>>that it will need somebody to maintain it constantly, 
> much like 
> >>>>>>>RemoteBlast.
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>         
> >>>>>>>
> >>>>>>>            
> >>>>>>>
> >>>>>>>>-jason
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>           
> >>>>>>>>
> >>>>>>>>              
> >>>>>>>>
> >>>>>>>>>Chris Fields wrote:
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>             
> >>>>>>>>>
> >>>>>>>>>                
> >>>>>>>>>
> >>>>>>>>>>My guess is you're running into text parsing problems in 
> >>>>>>>>>>Bio::SearchIO::blast.  Upgrade to the latest 
> developer version
> >>>>>>>>>>(1.5.1) or
> >>>>>>>>>>bioperl-live (CVS), then see the bug below.
> >>>>>>>>>>
> >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>>>>>>>
> >>>>>>>>>>I think the first problem you ran into is solved in
> >>>>>>>>>>               
> >>>>>>>>>>
> >>>>>>>>>>                  
> >>>>>>>>>>
> >>>>>>bioperl 1.5.1,
> >>>>>>       
> >>>>>>
> >>>>>>          
> >>>>>>
> >>>>>>>>>>the last problem (more recent, not related to the 
> first) has  
> >>>>>>>>>>been fixed but hasn't been committed to bioperl-live yet.   
> >>>>>>>>>>The fixed SearchIO::blast is available in the link 
> above, but
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>               
> >>>>>>>>>>
> >>>>>>>>>>                  
> >>>>>>>>>>
> >>>>>>>>realize it hasn't
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>           
> >>>>>>>>
> >>>>>>>>              
> >>>>>>>>
> >>>>>>>>>>been committed yet and may change.
> >>>>>>>>>>
> >>>>>>>>>>Christopher Fields
> >>>>>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
> >>>>>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>               
> >>>>>>>>>>
> >>>>>>>>>>                  
> >>>>>>>>>>
> >>>>>>>>>>>-----Original Message-----
> >>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org
> >>>>>>>>>>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf
> >>>>>>>>>>>                 
> >>>>>>>>>>>
> >>>>>>>>>>>                    
> >>>>>>>>>>>
> >>>>>>Of Hubert
> >>>>>>       
> >>>>>>
> >>>>>>          
> >>>>>>
> >>>>>>>>>>>Prielinger
> >>>>>>>>>>>Sent: Wednesday, February 08, 2006 2:52 PM
> >>>>>>>>>>>To: bioperl-l at bioperl.org
> >>>>>>>>>>>Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>                 
> >>>>>>>>>>>
> >>>>>>>>>>>                    
> >>>>>>>>>>>
> >>>>>>>>parsing Blast
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>           
> >>>>>>>>
> >>>>>>>>              
> >>>>>>>>
> >>>>>>>>>>>output
> >>>>>>>>>>>
> >>>>>>>>>>>Hi,
> >>>>>>>>>>>If I want to parse a Blast Output (Version 2.2.12) with 
> >>>>>>>>>>>Bio::SearchIO, I get the following error message:
> >>>>>>>>>>>
> >>>>>>>>>>>MSG: no data for midline Query  1   WWWKWRW  7
> >>>>>>>>>>>STACK Bio::SearchIO::blast::next_result
> >>>>>>>>>>>/usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> >>>>>>>>>>>STACK toplevel
> >>>>>>>>>>>/home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
> >>>>>>>>>>>
> >>>>>>>>>>>is that a bug......
> >>>>>>>>>>>
> >>>>>>>>>>>If I want to parse Blast Output (version 2.2.13), I don't get
> >>>>>>>>>>>anything.....
> >>>>>>>>>>>I'm using bioperl 1.4
> >>>>>>>>>>>
> >>>>>>>>>>>Before I installed bioperl 1.4, parsing Blast Output
> >>>>>>>>>>>(version 2.2.12) worked fine, but I don't remember which
> >>>>>>>>>>>bioperl version I had installed
> >>>>>>>>>>>
> >>>>>>>>>>>thanks in advance
> >>>>>>>>>>>
> >>>>>>>>>>>Hubert
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>_______________________________________________
> >>>>>>>>>>>Bioperl-l mailing list
> >>>>>>>>>>>Bioperl-l at lists.open-bio.org
> >>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>                 
> >>>>>>>>>>>
> >>>>>>>>>>>                    
> >>>>>>>>>>>
> >>>>>>>>>>               
> >>>>>>>>>>
> >>>>>>>>>>                  
> >>>>>>>>>>
> >>>>>>>>>_______________________________________________
> >>>>>>>>>Bioperl-l mailing list
> >>>>>>>>>Bioperl-l at lists.open-bio.org
> >>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>             
> >>>>>>>>>
> >>>>>>>>>                
> >>>>>>>>>
> >>>>>>>>--
> >>>>>>>>Jason Stajich
> >>>>>>>>Duke University
> >>>>>>>>http://www.duke.edu/~jes12
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>           
> >>>>>>>>
> >>>>>>>>              
> >>>>>>>>
> >>>>>>>Christopher Fields
> >>>>>>>Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
> >>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>
> >>>>>>>_______________________________________________
> >>>>>>>Bioperl-l mailing list
> >>>>>>>Bioperl-l at lists.open-bio.org
> >>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>         
> >>>>>>>
> >>>>>>>            
> >>>>>>>
> >>>>>>_______________________________________________
> >>>>>>Bioperl-l mailing list
> >>>>>>Bioperl-l at lists.open-bio.org
> >>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>
> >>>>>>
> >>>>>>       
> >>>>>>
> >>>>>>          
> >>>>>>
> >>>>>     
> >>>>>
> >>>>>        
> >>>>>
> >>>Christopher Fields
> >>>Postdoctoral Researcher
> >>>Lab of Dr. Robert Switzer
> >>>Dept of Biochemistry
> >>>University of Illinois Urbana-Champaign
> >>>
> >>>
> >>>
> >>>
> >>>_______________________________________________
> >>>Bioperl-l mailing list
> >>>Bioperl-l at lists.open-bio.org
> >>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>> 
> >>>
> >>>    
> >>>
> >>
> >>
> >>Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
> >>
> >>_______________________________________________
> >>Bioperl-l mailing list
> >>Bioperl-l at lists.open-bio.org
> >>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >>  
> >>
> >
> >
> > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm for more
> > information.
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From victor.ruotti at gmail.com  Fri Feb 10 20:09:16 2006
From: victor.ruotti at gmail.com (Victor)
Date: Fri, 10 Feb 2006 14:09:16 -0600
Subject: [Bioperl-l] Running BLAT with BioPerl
In-Reply-To: 
References: 
	
Message-ID: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com>

Hi Jason,
Well, in my env. BLATDIR was not setup at all. When setting BLATDIR to
/usr/local/bin, I get the same problem. I think this might have to do with
the _run internal method/sub. If you look at that subroutine, you'll see
that it is using both $self->executable and $self->program_name. The test
passes fine, but we might need to write a better test for this particular
case.

Instead of saying:
     my $str = Bio::Root::IO->catfile($self->executable, $self->program_name);
I think the author meant to say:
     my $str = Bio::Root::IO->catfile($self->program_dir, $self->program_name);

I quickly used Data::Dumper on both executable and program_name and this is
what I get:
$VAR1 = 'blat';
$VAR1 = 'blat';

So the path ends up hardcoded as /usr/local/bin/blat/blat when calling run
through the factory.
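
To make the doubled path concrete, here is a minimal sketch. It assumes
that Bio::Root::IO->catfile joins path components the same way the core
File::Spec module does, and it uses File::Spec::Unix directly so that it
runs without BioPerl installed. The three variables are illustrative
stand-ins for what the accessors would return, not values taken from the
module.

```perl
use strict;
use warnings;
use File::Spec::Unix;

# Hypothetical stand-ins for the accessor return values.
my $executable   = '/usr/local/bin/blat';  # full path to the program
my $program_dir  = '/usr/local/bin';       # directory only
my $program_name = 'blat';                 # bare program name

# Joining a full path with the program name repeats the last component.
my $doubled = File::Spec::Unix->catfile($executable, $program_name);

# Joining the directory with the program name gives the intended path.
my $intended = File::Spec::Unix->catfile($program_dir, $program_name);

print "$doubled\n";
print "$intended\n";
```

The first join reproduces the path seen in the exception message; the
second is what the program_dir version of the line would build.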

I'd like to change the constructor a bit to deal with the params a little
better and include a config file using Config::General. Also, I noticed that
there is another Blat.pm module, a parser module. Should we integrate this
parser with the blat run module?

Brian/Jason. Does that sound like a good idea?

Victor


On 2/10/06, Jason Stajich  wrote:
>
> brian -   just FYI -
>
> The AUTOLOAD stuff is present a great number of the run modules so  this
> is standard per se in that set.
>
> I think Victor's problem may have been the BLATDIR env variable pointing
> to /usr/local/bin/blat instead of /usr/local/bin - is that the case victor?
>
> tests passed for me before I did the 1.5.1 release for  this module so it
> basically works.   It definitely needs a carekeeper as lot of these run
> modules were built during the fugu group annotation project and never got
> audited/re-vised after that.
>
>
> -jason
> On Feb 10, 2006, at 11:34 AM, Brian Osborne wrote:
>
> Victor,
>
> Fantastic, this is certainly a module in need, in fact there was already a
> note on this in the Wiki, I'll update it:
>
> http://bioperl.open-bio.org/wiki/Orphan_modules
>
> So all I did was:
>
> >cd bioperl-run
> >perl -I. -w t/Blat.t
>
> This is the most recent bioperl-run, the live version, and all tests
> passed. I'd downloaded the most recent binaries and put them in my
> /usr/local/bin, already in my PATH. That's it.
>
> That's the saddest looking new() I've ever seen in Bioperl, a mixture of
> named and unnamed parameters like that, how bizarre. The "proper" way, of
> course, is to use _rearrange, and not use AUTOLOAD.
>
> Thanks again,
>
> Brian O.
>
>
> On 2/10/06 11:02 AM, "Victor"  wrote:
>
> Brian,
> I'd be happy to do that. Can you send me a quick snap on how you got it to
> work first. I'd like to see what is working first, before I start fixing
> things.
>
> And yes I'll take a look at the Blat.t to see more on it.
>
> Victor
>
>
> On 2/9/06, *Brian Osborne*  wrote:
>
> Victor,
>
> Yes, it may be that blat is not in your path, bioperl-run/t/Blat.t is
> working for me even though I haven't set BLATDIR. This is using the latest
> blat, v. 33.
>
> There is a problem here though, you can see it if you read Blat.t. The
> constructor does not look like your usual new():
>
> my $factory = Bio::Tools::Run::Alignment::Blat->new('quiet'  => 1,
>                                                     -verbose => $verbose,
>                                                     "DB"     => $db);
>
> Unfortunate - would you be willing to do more than add a useful SYNOPSIS and
> actually fix new()? There is a subtext here, we're trying to find people who
> would be willing to maintain useful modules like these, the ideal person in
> this case would be someone who'd regularly use the module.
>
> Brian O.
>
>
> On 2/9/06 6:22 PM, "Victor"  wrote:
>
> > Hi,
> > Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module is up to date
> > in the latest bioperl release?
> >
> >
> >
> > use Bio::Tools::Run::Alignment::Blat;
> > my $factory = Bio::Tools::Run::Alignment::Blat->new();
> > my $seq =
> > "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA";
> >
> > my @feats = $factory->run( $seq);
> >
> > Here is what I get when trying to use it:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: Blat call (/usr/local/bin/blat/blat -out=blast  TGAAATAAAACTCAGTA
> > /tmp/fB09bp5F76) crashed: -1
> >
> > Notice that it is using "blat" twice in the path. The way that I fixed this
> > is by going to the blat.pm module and changing the following lines:
> > #my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
> > my $str= Bio::Root::IO->catfile($self->program_name);
> >
> > Any ideas, maybe I'm missing the $ENV variable somewhere?
> > I'd like to avoid making this change. Also does anyone have a known synopsis
> > of this blat module (where to set the parameters, and whether it allows you
> > to have a config file).
> > I'll be happy to add a better synopsis to the module if needed.
> >
> > Thanks in advance,
> > Victor
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>
>
>
>
>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12 
>
>
>



From jason.stajich at duke.edu  Fri Feb 10 20:36:04 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 10 Feb 2006 15:36:04 -0500
Subject: [Bioperl-l] Running BLAT with BioPerl
In-Reply-To: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com>
References: 
	
	<36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com>
Message-ID: <7F520AFA-84C9-485B-A408-7A9DEFC1186E@duke.edu>


On Feb 10, 2006, at 3:09 PM, Victor wrote:

> Hi Jason,
> Well, in my env. BLATDIR was not setup at all. When setting BLATDIR  
> to /usr/local/bin, I get the same problem. I think this might have  
> to do with the _run internal method/sub. If you look at that  
> subroutine, you'll see that it is using both $self->executable and  
> $self->program_name. The test passes fine, but we might need to  
> write a better test for this particular case.
>
> Instead of saying:
>      my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
> I think the author meant to say:
>      my $str= Bio::Root::IO->catfile($self->program_dir,$self->program_name);
>
> I quickly used Data::Dumper on both executate and program_name and  
> this is what I get:
> $VAR1 = 'blat';
> $VAR1 = 'blat';
>
> So the path is hardcoded to be /usr/local/bin/blat/blat when  
> calling run though factory.
>
Hmm are you sure you are looking at the 1.5.1 code and/or what is in  
CVS?

> I'd like to change the constructor a bit to deal with the params a  
> little better and include a config file using
> Config::General. Also, I noticed that there is a another Blat.pm  
> module, a parser module. Should we integrate this parser with the  
> blat run module?
>
Well maybe as another parser option - I believe I added/edited it to
use the PSL parser in Bio::SearchIO; is that not what you see?

Ick, there are also some system commands in this module which need
to be replaced with File::Copy, or we need to figure out how to
remove them altogether.
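
As a rough illustration of that replacement (a sketch only, not the code
in the module): copying a file with the core File::Copy module instead of
shelling out to cp. The helper name copy_file and the file names are made
up for the example.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Temp qw(tempdir);
use File::Spec;

# Copy a file without a system() call; dies with the OS error on failure.
sub copy_file {
    my ($from, $to) = @_;
    copy($from, $to) or die "Could not copy $from to $to: $!";
    return 1;
}

# Work in a scratch directory that is removed when the script exits.
my $dir = tempdir(CLEANUP => 1);
my $src = File::Spec->catfile($dir, 'query.fa');
my $dst = File::Spec->catfile($dir, 'query_copy.fa');

# Write a small FASTA file to copy.
open my $out, '>', $src or die "Cannot write $src: $!";
print {$out} ">seq1\nTGAAATAAAACTCAGTA\n";
close $out or die "Cannot close $src: $!";

copy_file($src, $dst);
print "copied ", (-s $dst), " bytes\n";
```

Compared with system("cp ..."), this avoids depending on a shell and on
cp being present, and the failure reason is available in $!.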


> Brian/Jason. Does that sound like a good idea?

But yes, it needs some TLC.
I'm not sure I know enough about Config::General to say yes or no,
but all of the run modules need some help in standardization, so I
would propose trying to integrate some changes into the base class
(WrapperBase) that can be utilized by all the sub-classes. If you
want to use this as a model for how to do it, that would be great too.

thx,
-j
>
> Victor
>
>
> On 2/10/06, Jason Stajich  wrote:
> brian -
>   just FYI -
>
> The AUTOLOAD stuff is present a great number of the run modules so   
> this is standard per se in that set.
>
> I think Victor's problem may have been the BLATDIR env variable  
> pointing to /usr/local/bin/blat instead of /usr/local/bin - is that  
> the case victor?
>
> tests passed for me before I did the 1.5.1 release for  this module  
> so it basically works.   It definitely needs a carekeeper as lot of  
> these run modules were built during the fugu group annotation  
> project and never got audited/re-vised after that.
>
>
> -jason
>
> On Feb 10, 2006, at 11:34 AM, Brian Osborne wrote:
>
>> Victor,
>>
>> Fantastic, this is certainly a module in need, in fact there was  
>> already a note on this in the Wiki, I'll update it:
>>
>> http://bioperl.open-bio.org/wiki/Orphan_modules
>>
>> So all I did was:
>>
>> >cd bioperl-run
>> >perl -I. -w t/Blat.t
>>
>> This is the most recent bioperl-run, the live version, and all  
>> tests passed. I'd downloaded the most recent binaries and put them  
>> in my /usr/local/bin, already in my PATH. That's it.
>>
>> That's the saddest looking new() I've ever seen in Bioperl, a  
>> mixture of named and unnamed parameters like that, how bizarre.  
>> The "proper" way, of course, is to use _rearrange, and not use  
>> AUTOLOAD.
>>
>> Thanks again,
>>
>> Brian O.
>>
>>
>> On 2/10/06 11:02 AM, "Victor"  wrote:
>>
>>> Brian,
>>> I'd be happy to do that. Can you send me a quick snap on how you  
>>> got it to work first. I'd like to see what is working first,  
>>> before I start fixing things.
>>>
>>> And yes I'll take a look at the Blat.t to see more on it.
>>>
>>> Victor
>>>
>>>
>>> On 2/9/06, Brian Osborne < osborne1 at optonline.net> wrote:
>>>> Victor,
>>>>
>>>> Yes, it may be that blat is not in your path, bioperl-run/t/ 
>>>> Blat.t is
>>>> working for me even though I haven't set BLATDIR. This is using  
>>>> the latest
>>>> blat, v. 33.
>>>>
>>>> There is a problem here though, you can see it if you read  
>>>> Blat.t. The
>>>> constructor does not look like your usual new():
>>>>
>>>> my $factory = Bio::Tools::Run::Alignment::Blat->new('quiet'  => 1,
>>>>                                                     -verbose => $verbose,
>>>>                                                     "DB"     => $db);
>>>>
>>>> Unfortunate - would you be willing to do more than add a useful  
>>>> SYNOPSIS and
>>>> actually fix new()? There is a subtext here, we're trying to  
>>>> find people who
>>>> would be willing to maintain useful modules like these, the  
>>>> ideal person in
>>>> this case would be someone who'd regularly use the module.
>>>>
>>>> Brian O.
>>>>
>>>>
>>>> On 2/9/06 6:22 PM, "Victor"  wrote:
>>>>
>>>> > Hi,
>>>> > Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module  
>>>> is up to date
>>>> > in the lastest bioperl release?
>>>> >
>>>> >
>>>> >
>>>> > use Bio::Tools::Run::Alignment::Blat;
>>>> > my $factory = Bio::Tools::Run::Alignment::Blat->new();
>>>> > my $seq =
>>>> > "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA";
>>>> >
>>>> > my @feats = $factory->run( $seq);
>>>> >
>>>> > Here is what I get when tring to use it:
>>>> >
>>>> > ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> > MSG: Blat call (/usr/local/bin/blat/blat -out=blast   
>>>> TGAAATAAAACTCAGTA
>>>> > /tmp/fB09bp5F76) crashed: -1
>>>> >
>>>> > Notice that it is using "blat' twice in the path. The way that  
>>>> I fixed this
>>>> > is by going to the blat.pm   module and  
>>>> changing the following lines:
>>>> > #my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
>>>> > my $str= Bio::Root::IO->catfile($self->program_name);
>>>> >
>>>> > Any ideas, maybe I'm missing the $ENV variable somewhere?
>>>> > I'd like to avoid making this change. Also does anyone have a  
>>>> known synopsis
>>>> > of this blat module (where to set the parameters, and whether  
>>>> it allows you
>>>> > to have a config file).
>>>> > I'll be happy to add a better synopsis to the module if needed.
>>>> >
>>>> > Thanks in advance,
>>>> > Victor
>>>> >
>>>> > _______________________________________________
>>>> > Bioperl-l mailing list
>>>> > Bioperl-l at lists.open-bio.org
>>>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>
>>>
>>
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12





From hlapp at gmx.net  Fri Feb 10 21:39:39 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Fri, 10 Feb 2006 13:39:39 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <000001c62e60$9acecca0$c2987ca5@pc13>
References: <000001c62e60$9acecca0$c2987ca5@pc13>
Message-ID: 

Sohel,

please allow me to copy the list in my response. There's many good and 
insightful people on the list who may have something to add or 
different ideas.

I've come across that problem myself, for instance with InterPro. What 
I've done so far simply is to stick it unstructured into the definition 
slot, which is not helpful if your purpose goes further than just 
displaying it in an unstructured fashion.

I'm not sure you would want to create another class for this (like 
AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the 
implementation, probably not the interface) annotatable (i.e., 
implement Bio::AnnotatableI), which supposedly would be simple to do 
(Bio::Annotation::Collection is already implemented, you'd just return an 
instance of it).

Even though tag/value pairs sound like quick&fast way to go I'm leaning 
against it; in essence we're moving away from that elsewhere 
(SeqFeatureI) and hence I don't think we should restart it here.

I'm not giving a definitive answer here, just my (initial) thoughts. 
Hope that helps nonetheless. Can you fancy yourself trying the 
Annotatable approach and let us know how it goes?
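
The shape of that approach could be sketched roughly as below. This is
plain Perl with made-up classes so that it runs without BioPerl:
My::HeaderCollection is only a stand-in for an annotation collection, and
My::Ontology is a hypothetical ontology class, not
Bio::Ontology::Ontology. The header values are invented for the example.

```perl
use strict;
use warnings;

# Stand-in for an annotation collection (hypothetical, not the BioPerl class).
package My::HeaderCollection;
sub new { return bless { by_tag => {} }, shift }
sub add {
    my ($self, $tag, $value) = @_;
    push @{ $self->{by_tag}{$tag} }, $value;
    return $self;
}
sub get {
    my ($self, $tag) = @_;
    return @{ $self->{by_tag}{$tag} || [] };
}

# Hypothetical ontology class that is "annotatable": it exposes a single
# annotation() accessor and lazily creates the collection on first use.
package My::Ontology;
sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name} }, $class;
}
sub name { return $_[0]{name} }
sub annotation {
    my ($self, $coll) = @_;
    $self->{annotation} = $coll if defined $coll;
    $self->{annotation} ||= My::HeaderCollection->new;
    return $self->{annotation};
}

package main;

my $ont = My::Ontology->new(name => 'example ontology');
$ont->annotation->add(version => '1.2');          # made-up header values
$ont->annotation->add(date    => '2006-02-10');

my ($version) = $ont->annotation->get('version');
print $ont->name, " version $version\n";
```

The point of the design is that the ontology class itself gains only one
accessor; everything about storing and looking up headers lives in the
collection object it returns.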

	-hilmar


On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:

> Hi Hilmar,
>   How are you doing? I am Sohel Merchant, a programmer at dictyBase, 
> Northwestern University. I am working on a parser for an ontology 
> file. I really like the ontology object model which you have 
> contributed to Bioperl. I think it's just awesome!! One of the things which 
> I thought would be great to capture is the ontology headers. Right now 
> one can specify only the name and authority information. I was wondering 
> if there is any way I could also capture other ontology file headers, 
> like the version of the file and the date when that ontology file was made. I was 
> thinking of making a header class, or alternatively it could go as a hash 
> of values in the Bio::Ontology::Ontology class itself. I wanted to 
> know what your thoughts are on this.
>
> Thanks,
> Sohel Merchant
> dictyBase
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------





From osborne1 at optonline.net  Fri Feb 10 21:49:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 10 Feb 2006 16:49:18 -0500
Subject: [Bioperl-l] Running BLAT with BioPerl
In-Reply-To: <36d7e5550602101209j76df385dse5706d95b2103b63@mail.gmail.com>
Message-ID: 

Victor,

Just a note on "convention", excuse me if this is obvious. A few different
greps on the modules in bioperl-run show that executable() gets or sets the
full path to the program in question, and program() or program_name() gets or
sets the name of the app (e.g. "blat"). program_dir() does what it sounds
like. So you're right, "($self->executable,$self->program_name)" doesn't
make sense.

I can't speak to Config::General, but I'd say that my first concern would be
that the thing works in the normal way, either by naming parameters or by
passing an array of arguments, but not a mixture of both!
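
For what "named parameters only" might look like, here is a minimal
plain-Perl sketch. It does not use _rearrange or any BioPerl code;
My::BlatFactory, its parameter names and the file name are all
hypothetical and chosen just for the example.

```perl
use strict;
use warnings;

package My::BlatFactory;   # hypothetical class, not the BioPerl module

# Accept only named parameters; names are case-insensitive and may
# carry a leading dash (-db, DB and db all mean the same thing).
sub new {
    my ($class, @args) = @_;
    die "new() expects name => value pairs\n" if @args % 2;

    my %known = map { $_ => 1 } qw(db quiet verbose);
    my $self  = bless { quiet => 0, verbose => 0 }, $class;

    while (my ($key, $value) = splice @args, 0, 2) {
        (my $name = lc $key) =~ s/^-//;
        die "Unknown parameter '$key'\n" unless $known{$name};
        $self->{$name} = $value;
    }
    return $self;
}

# Explicit accessors instead of AUTOLOAD.
sub db      { return $_[0]{db} }
sub quiet   { return $_[0]{quiet} }
sub verbose { return $_[0]{verbose} }

package main;

my $factory = My::BlatFactory->new(-db => 'genome.2bit', -quiet => 1);
print "db=", $factory->db, " quiet=", $factory->quiet, "\n";
```

A call that mixes a bare value in with the pairs, or misspells a
parameter name, dies immediately instead of being silently accepted.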

Of course you're right in thinking that tying execution to parsing is a good
idea, and it looks like this is done already, just glancing at t/Blat.t.

Brian O.


On 2/10/06 3:09 PM, "Victor"  wrote:

> Hi Jason,
> Well, in my env. BLATDIR was not setup at all. When setting BLATDIR to
> /usr/local/bin, I get the same problem. I think this might have to do with
> the _run internal method/sub. If you look at that subroutine, you'll see
> that it is using both $self->executable and $self->program_name. The test
> passes fine, but we might need to write a better test for this particular
> case.
> 
> Instead of saying:
>      my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
> I think the author meant to say:
>      my $str=
> Bio::Root::IO->catfile($self->program_dir,$self->program_name);
> 
> I quickly used Data::Dumper on both executate and program_name and this is
> what I get:
> $VAR1 = 'blat';
> $VAR1 = 'blat';
> 
> So the path is hardcoded to be /usr/local/bin/blat/blat when calling run
> though factory.
> 
> I'd like to change the constructor a bit to deal with the params a little
> better and include a config file using
> Config::General. Also, I noticed that there is a another Blat.pm module, a
> parser module. Should we integrate this parser with the blat run module?
> 
> Brian/Jason. Does that sound like a good idea?
> 
> Victor
> 
> 
> On 2/10/06, Jason Stajich  wrote:
>> 
>> brian -   just FYI -
>> 
>> The AUTOLOAD stuff is present a great number of the run modules so  this
>> is standard per se in that set.
>> 
>> I think Victor's problem may have been the BLATDIR env variable pointing
>> to /usr/local/bin/blat instead of /usr/local/bin - is that the case victor?
>> 
>> tests passed for me before I did the 1.5.1 release for  this module so it
>> basically works.   It definitely needs a carekeeper as lot of these run
>> modules were built during the fugu group annotation project and never got
>> audited/re-vised after that.
>> 
>> 
>> -jason
>> On Feb 10, 2006, at 11:34 AM, Brian Osborne wrote:
>> 
>> Victor,
>> 
>> Fantastic, this is certainly a module in need, in fact there was already a
>> note on this in the Wiki, I'll update it:
>> 
>> http://bioperl.open-bio.org/wiki/Orphan_modules
>> 
>> So all I did was:
>> 
>>> cd bioperl-run
>>> perl -I. -w t/Blat.t
>> 
>> This is the most recent bioperl-run, the live version, and all tests
>> passed. I'd downloaded the most recent binaries and put them in my
>> /usr/local/bin, already in my PATH. That's it.
>> 
>> That's the saddest looking new() I've ever seen in Bioperl, a mixture of
>> named and unnamed parameters like that, how bizarre. The "proper" way, of
>> course, is to use _rearrange, and not use AUTOLOAD.
>> 
>> Thanks again,
>> 
>> Brian O.
>> 
>> 
>> On 2/10/06 11:02 AM, "Victor"  wrote:
>> 
>> Brian,
>> I'd be happy to do that. Can you send me a quick snap on how you got it to
>> work first. I'd like to see what is working first, before I start fixing
>> things.
>> 
>> And yes I'll take a look at the Blat.t to see more on it.
>> 
>> Victor
>> 
>> 
>> On 2/9/06, *Brian Osborne*  wrote:
>> 
>> Victor,
>> 
>> Yes, it may be that blat is not in your path, bioperl-run/t/Blat.t is
>> working for me even though I haven't set BLATDIR. This is using the latest
>> blat, v. 33.
>> 
>> There is a problem here though, you can see it if you read Blat.t. The
>> constructor does not look like your usual new():
>> 
>> my $factory = Bio::Tools::Run::Alignment::Blat->new('quiet'  => 1,
>>                                                     -verbose => $verbose,
>>                                                     "DB"     => $db);
>> 
>> Unfortunate - would you be willing to do more than add a useful SYNOPSIS
>> and
>> actually fix new()? There is a subtext here, we're trying to find people
>> who
>> would be willing to maintain useful modules like these, the ideal person
>> in
>> this case would be someone who'd regularly use the module.
>> 
>> Brian O.
>> 
>> 
>> On 2/9/06 6:22 PM, "Victor"  wrote:
>> 
>>> Hi,
>>> Does anyone know if the Bio/Tools/Run/Alignment/Blat.pm module is up to
>> date
>>> in the lastest bioperl release?
>>> 
>>> 
>>> 
>>> use Bio::Tools::Run::Alignment::Blat;
>>> my $factory = Bio::Tools::Run::Alignment::Blat->new();
>>> my $seq =
>>> "TGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTATGAAATAAAACTCAGTA";
>>> 
>>> my @feats = $factory->run( $seq);
>>> 
>>> Here is what I get when tring to use it:
>>> 
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: Blat call (/usr/local/bin/blat/blat -out=blast  TGAAATAAAACTCAGTA
>>> /tmp/fB09bp5F76) crashed: -1
>>> 
>>> Notice that it is using "blat' twice in the path. The way that I fixed
>> this
>>> is by going to the blat.pm    module and
>> changing the following lines:
>>> #my $str= Bio::Root::IO->catfile($self->executable,$self->program_name);
>>> my $str= Bio::Root::IO->catfile($self->program_name);
>>> 
>>> Any ideas, maybe I'm missing the $ENV variable somewhere?
>>> I'd like to avoid making this change. Also does anyone have a known
>> synopsis
>>> of this blat module (where to set the parameters, and whether it allows
>> you
>>> to have a config file).
>>> I'll be happy to add a better synopsis to the module if needed.
>>> 
>>> Thanks in advance,
>>> Victor
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12 
>> 
>> 
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l





From heikki at sanbi.ac.za  Sat Feb 11 06:54:51 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Sat, 11 Feb 2006 08:54:51 +0200
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: 
References: <000001c62e60$9acecca0$c2987ca5@pc13>
	
Message-ID: <200602110854.52116.heikki@sanbi.ac.za>


I second Hilmar's suggestion to use Bio::Annotation::Collection for database 
(ontology database in this case) metadata. While you are at it, why not 
define or use an existing (?) public ontology to do that? ;-)

	-Heikki

On Friday 10 February 2006 23:39, Hilmar Lapp wrote:
> Sohel,
>
> please allow me to copy the list in my response. There's many good and
> insightful people on the list who may have something to add or
> different ideas.
>
> I've come across that problem myself, for instance with InterPro. What
> I've done so far simply is to stick it unstructured into the definition
> slot, which is not helpful if your purpose goes further than just
> displaying it in an unstructured fashion.
>
> I'm not sure you would want to create another class for this (like
> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
> implementation, probably not the interface) annotatable (i.e.,
> implement Bio::Annotatable), which supposedly would be simple to do
> (AnnotationCollection is already implemented, you'd just return an
> instance of it).
>
> Even though tag/value pairs sound like quick&fast way to go I'm leaning
> against it; in essence we're moving away from that elsewhere
> (SeqFeatureI) and hence I don't think we should restart it here.
>
> I'm not giving a definitive answer here, just my (initial) thoughts.
> Hope that helps nonetheless. Can you fancy yourself trying the
> Annotatable approach and let us know how it goes?
>
> 	-hilmar
>
> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
> > Hi Hilmar,
> >   How are you doing? I am Sohel Merchant, a programmer at dictyBase,
> > Northwestern University. I am working on a parser for an ontology
> > file. I really like the ontology object model which you have
> > contributed to Bioperl. I think its just Awesome!! One of things which
> > I thought would be great to capture is the ontology headers. Right now
> > one can specify only the name, authority information. I was wondering
> > if there is any way, I could also capture other ontology file headers
> > like version of the file, date when that ontology file was made. I was
> > thinking of making a header class or alternatively it could go as Hash
> > of values in the Bio::Ontology::Ontology class itself. I wanted to
> > know whets your thoughts about on this.
> >
> > Thanks,
> > Sohel Merchant
> > dictyBase

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________



From hlapp at gmx.net  Sun Feb 12 05:10:35 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 11 Feb 2006 21:10:35 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <000001c62e9a$4f82eee0$c2987ca5@pc13>
References: <000001c62e9a$4f82eee0$c2987ca5@pc13>
Message-ID: <3666b00b7322d2bfe4d82129b047e5ce@gmx.net>

Sohel, please do keep the discussion on the list, in your own interest 
as there's a multitude of people who can respond to you.

SimpleValue would probably be what I'd use too. As Heikki hinted you 
might even create an ontology for annotating ontologies, which would 
allow you to use Annotation::OntologyTerm for annotation, but then 
there's no qualifier value ...

Bioperl 1.5.1 was released last year; please check the website.

	-hilmar

On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote:

> Hi Hilmar,
>   I really like your suggestion of implementing the Bio::AnnotatableI
> interface in the Bio::Ontology::Ontology class. I am going to implement
> this and play around a little with it. I am planning to use
> Bio::Annotation::SimpleValue for annotating the header as it provides a
> good way of specifying the Tag/value pair. What are your thoughts on
> using this?
>
>   Also, I was wondering if you have any idea about the scheduled date
> for the Bioperl 1.51 release. I would like to contribute some stuff in
> the next release.
>
> Thanks,
> Sohel.
>
> -----Original Message-----
> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> Sent: Friday, February 10, 2006 3:40 PM
> To: Sohel Merchant
> Cc: Bioperl
> Subject: Re: Bio::Ontology::Ontology
>
> Sohel,
>
> please allow me to copy the list in my response. There's many good and
> insightful people on the list who may have something to add or
> different ideas.
>
> I've come across that problem myself, for instance with InterPro. What
> I've done so far simply is to stick it unstructured into the definition
> slot, which is not helpful if your purpose goes further than just
> displaying it in an unstructured fashion.
>
> I'm not sure you would want to create another class for this (like
> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
> implementation, probably not the interface) annotatable (i.e.,
> implement Bio::Annotatable), which supposedly would be simple to do
> (AnnotationCollection is already implemented, you'd just return an
> instance of it).
>
> Even though tag/value pairs sound like quick&fast way to go I'm leaning
> against it; in essence we're moving away from that elsewhere
> (SeqFeatureI) and hence I don't think we should restart it here.
>
> I'm not giving a definitive answer here, just my (initial) thoughts.
> Hope that helps nonetheless. Can you fancy yourself trying the
> Annotatable approach and let us know how it goes?
>
> 	-hilmar
>
>
> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
>
>> Hi Hilmar,
>>   How are you doing? I am Sohel Merchant, a programmer at dictyBase,
>> Northwestern University. I am working on a parser for an ontology
>> file. I really like the ontology object model which you have
>> contributed to Bioperl. I think its just Awesome!! One of things which
>
>> I thought would be great to capture is the ontology headers. Right now
>
>> one can specify only the name, authority information. I was wondering
>> if there is any way, I could also capture other ontology file headers
>> like version of the file, date when that ontology file was made. I was
>
>> thinking of making a header class or alternatively it could go as Hash
>
>> of values in the Bio::Ontology::Ontology class itself. I wanted to
>> know whets your thoughts about on this.
>> ?
>> Thanks,
>> Sohel Merchant
>> dictyBase
>>
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------
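
The annotation-collection idea discussed above could be tried out along
the following lines. This is only a rough sketch, not code from
Bio::Ontology::Ontology: it shows header fields being stored in a
Bio::Annotation::Collection instead of a bare hash. The tag names
("version", "date") and their values are made up for illustration, and how
the collection would be attached to the ontology object is left open.

```perl
use strict;
use warnings;
use Bio::Annotation::Collection;
use Bio::Annotation::SimpleValue;

# Header fields as they might be read from an ontology file.
# Tag names and values are illustrative assumptions only.
my %header = (version => '1.2', date => '2006-02-10');

# Store each header field as a SimpleValue annotation under its tag.
my $coll = Bio::Annotation::Collection->new();
for my $tag (sort keys %header) {
    $coll->add_Annotation(
        $tag,
        Bio::Annotation::SimpleValue->new(-value => $header{$tag}),
    );
}

# Read the fields back, one tag at a time.
for my $tag ($coll->get_all_annotation_keys) {
    my ($ann) = $coll->get_Annotations($tag);
    print "$tag: ", $ann->value, "\n";
}
```

An annotatable ontology implementation would then only need to hand out
such a collection from its annotation() method.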





From hjm at tacgi.com  Sun Feb 12 06:46:38 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Sat, 11 Feb 2006 22:46:38 -0800
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
Message-ID: <200602112246.38926.hjm@tacgi.com>

Hi All,

After perusing the tutorial and other docs for an evening, I still can't 
find the answer to this.  Forgive me if I've missed something obvious.

This should not be a novel request, but I've not found it answered.  If 
bioperl isn't the best way to do this, I'd be grateful for a pointer to a 
better way, especially if it includes an illuminating bit of code.

The problem is to retrieve genomic sequences plus & minus some offset from a 
locus determined by HUGO keyword or GeneID.  This would be a common followup 
chore for some extra analysis from a gene expression expt.  Or maybe this is 
in the DBFetch routines, but I've missed the sequence type to specify...?


TIA!

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 


From osborne1 at optonline.net  Sun Feb 12 16:37:39 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 12 Feb 2006 11:37:39 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602112246.38926.hjm@tacgi.com>
Message-ID: 

Harry,

Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
from its documentation:

  use Bio::DB::Fasta;

  # create database from directory of fasta files
  my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');

  # simple access (for those without Bioperl)
  my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
  my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
  my @ids     = $db->ids;
  my $length   = $db->length('CHROMOSOME_I');
  my $alphabet = $db->alphabet('CHROMOSOME_I');
  my $header   = $db->header('CHROMOSOME_I');

  # Bioperl-style access (reuses the $db handle created above)
  my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
  my $seqstr  = $obj->seq;
  my $subseq  = $obj->subseq(4_000_000 => 4_100_000);

Do you already have the offsets?
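
If the locus coordinates are already known, the "plus and minus some
offset" part is plain arithmetic. The helper below is a hypothetical
sketch (flank_window is not a Bioperl function); it does not resolve HUGO
names or GeneIDs to coordinates, which would have to come from elsewhere.
It only widens a 1-based interval and clamps it to the chromosome length.

```perl
use strict;
use warnings;

# Hypothetical helper: widen a locus by $offset bases on each side,
# clamped to the 1-based range [1, $chr_len].
sub flank_window {
    my ($start, $end, $offset, $chr_len) = @_;
    ($start, $end) = ($end, $start) if $start > $end;   # normalise order
    my $from = $start - $offset;
    $from = 1 if $from < 1;
    my $to = $end + $offset;
    $to = $chr_len if $to > $chr_len;
    return ($from, $to);
}

my ($from, $to) = flank_window(4_000_000, 4_100_000, 5_000, 15_000_000);
print "$from-$to\n";    # prints 3995000-4105000

# With an indexed FASTA directory the window can then be handed on, e.g.:
#   my $db  = Bio::DB::Fasta->new('/path/to/fasta/files');
#   my $len = $db->length('CHROMOSOME_I');
#   my $seq = $db->seq('CHROMOSOME_I', $from => $to);
```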

Brian O.


On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:

> Hi All,
> 
> After perusing the tutorial and other docs for an evening, I still can't
> find the answer to this.  Forgive me if I've missed something obvious.
> 
> This should not be a novel request, but I've not found it answered.  If
> bioperl isn't the best way to do this, I'd be grateful for a pointer to a
> better way, especially if it includes an illuminating bit of code.
> 
> The problem is to retrieve genomic sequences plus & minus some offset from a
> locus determined by HUGO keyword or GeneID.  This would be a common followup
> chore for some extra analysis from a gene expression expt.  Or maybe this is
> in the DBFetch routines, but I've missed the sequence type to specify...?
> 
> 
> TIA!




From pmiguel at purdue.edu  Sun Feb 12 20:05:47 2006
From: pmiguel at purdue.edu (Phillip SanMiguel)
Date: Sun, 12 Feb 2006 15:05:47 -0500
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
	parsing	Blast	output
In-Reply-To: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
References: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
Message-ID: <43EF951B.4030601@purdue.edu>

Roger,
Just a data point, but in case you were not already aware of it, the 
characters W, K and R may be included in some DNA sequences. 'W' means 
'A' or 'T', [AT], 'K' means [TG] and 'R' means [AG] if I remember 
correctly. These are ambiguous bases, where a basecaller isn't sure, for 
example, whether a particular peak is an A or a T. Although I see these 
ambiguous bases less frequently these days, even common modern 
basecallers (such as Applied Biosystems basecallers) can generally be 
configured so they will generate them. Downstream applications may not 
like them, however.
    I may be just stating the obvious, or this might be irrelevant to 
the issue at hand. If so, my apologies.
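
For reference, the ambiguity codes mentioned above follow the standard
IUPAC nucleotide table. The sketch below is plain Perl, not Bioperl code;
the function name iupac_to_regex is made up here. It turns a sequence
containing such codes into a regular expression over the plain bases.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',  B => 'CGT', D => 'AGT',
    H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Hypothetical helper: expand each code into a character class.
sub iupac_to_regex {
    my ($seq) = @_;
    my $re = '';
    for my $code (split //, uc $seq) {
        my $bases = $IUPAC{$code};
        die "not an IUPAC nucleotide code: $code\n" unless defined $bases;
        $re .= length($bases) == 1 ? $bases : "[$bases]";
    }
    return $re;
}

print iupac_to_regex('WKR'), "\n";    # prints [AT][GT][AG]
```

Note that W, K and R are also valid amino acid letters, so a string such
as WWWKWRW could equally well be a short peptide.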

Phillip
Roger Hall wrote:
> Guys - I'm looking at the error message:
>
> MSG: no data for midline Query  1   WWWKWRW  7
> STACK Bio::SearchIO::blast::next_result
> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
> STACK toplevel
> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>
> This is my line of thought:
> 1. "no data for midline $_" is a unique message generated by blast.pm in one
> location only at the point of a. reading three lines b. dropping lines with
> spaces only c. identifying the Query, Midline, and Match lines (0 <= $i < 3)
> 2. There is a regexp match that fails in order to reach that error message
> 3. The $_ value "Query  1   WWWKWRW  7" should not fail the expression
> 4. It does anyway
> 5. I cannot find the value "Query  1   WWWKWRW  7" anywhere in the blast
> reports
>
> I suspect a newline/chomp/metacharacter issue. Not finding the string
> anywhere has me thoroughly confused - I asked Hubert for the additional
> file, assuming that I didn't have it.
>
> My next thought is to write a quick script to test perl behavior on "Fedora
> Core 9".
>
> Thoughts?
>
> Did I misread the issue entirely? :}
>
> Roger
>
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Thursday, February 09, 2006 10:16 AM
> To: 'Jason Stajich'; 'Hubert Prielinger'
> Cc: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
> output
>
>
>   
>> -----Original Message-----
>> From: Jason Stajich [mailto:jason.stajich at duke.edu] 
>> Sent: Thursday, February 09, 2006 9:13 AM
>> To: Hubert Prielinger
>> Cc: Chris Fields; bioperl-l at bioperl.org
>> Subject: Re: [Bioperl-l] bioperl 1.4 SearchIO doesn't work 
>> parsing Blast output
>>
>> On Feb 8, 2006, at 4:41 PM, Hubert Prielinger wrote:
>>
>>> hi chris,
>>> thanks, I have upgraded to version 1.5.1 but it still isn't working.
>>> do you have any other idea? the problem I have is that I have to parse
>>> a lot of textfiles....
>>> or shall I look for another option to parse those files...
>>>
>>> regards
>>> Hubert
>>>
>> The code from Bioperl 1.5.1 works fine for me for blast 
>> 2.2.13 reports but unless you post your blast report we can't 
>> really determine the problem.
>>
>> If you are still getting the same error like this I am not 
>> convinced you have upgraded to 1.5.1, which includes a fix for 
>> the fact that NCBI changed the HSP result format to remove 
>> the ':' from the Query/Sbjct prefixes.  We fixed this as soon 
>> as it was apparent, sometime in September.
>>
>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>> STACK Bio::SearchIO::blast::next_result
>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>> STACK toplevel
>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>
>> If you are just getting no results but also no warnings wrt 
>> parsing, are you sure your logic is correct?
>>
>> If you remove your filters do you see all the HSPS?
>>
>>
>> while (my $result = $search->next_result) {
>>     print $result->query_name, "\n";
>>     # iterate over each hit on the query sequence
>>     while (my $hit = $result->next_hit) {
>>         print $hit->name, "\n";
>>         # iterate over each HSP in the hit
>>         while (my $hsp = $hit->next_hsp) {
>>             print $hsp->evalue, " ", $hsp->length('sbjct'), " ",
>>                   $hsp->hit_string, "\n";
>>         }
>>     }
>> }
>>
>
> I tested some of the BLAST results that Hubert sent Roger and me with a
> similar script to the above.  I removed the file parsing logic and it seemed
> to work just fine.  It may very well be a logic issue or that he hasn't
> installed the latest fix.
>     
> It's a funny thing, though.  When I tried using blastcl3 (v. 2.2.13), even
> though the returned output was from nr, the top of the blast output showed
> that it was v2.2.12:  
>
> BLASTP 2.2.12 [Aug-07-2005]
>
> I double-checked my local version and it's definitely v.2.2.13:
> -------------------------------------
> C:\Perl\Scripts>blastcl3 -
>
> blastcl3 2.2.13   arguments:...
> -------------------------------------
>
> If you use RemoteBlast using the same settings, the version in the header
> looks like this:
>
> BLASTP 2.2.13 [Nov-27-2005]
>
> I'm wondering if all the blast executables (blast and netblast) from NCBI
> have text output like v.2.2.12, while the wwwblast outputs a new format
> (2.2.13).  I'll ask blast-help at NCBI about this.
>
>   
>> To clarify some stuff -
>> Chris I don't necessarily think the XML is best way forward 
>> for BLAST reports generated locally, it isn't as detailed as 
>> the Text format and it is what most people expect to be able 
>> to scroll through and parse -- it is also harder for the 
>> format to change dramatically if you have a static binary on 
>> your machine =).  I think for remoteblast the XML format 
>> should be the way forward but I expect Bioperl to maintain 
>> support of any plain text BLAST report format that people use 
>> on a regular basis.
>>
>>     
>
> Does XML lack some specific info that text output has?  Didn't know that.  I
> believe that XML should be default in RemoteBlast since it will not break,
> but I agree with you about text output.  I also agree that it will need
> somebody to maintain it constantly, much like RemoteBlast.
>
>   
>> -jason
>>     
>>> Chris Fields wrote:
>>>
>>>       
>>>> My guess is you're running into text parsing problems in 
>>>> Bio::SearchIO::blast.  Upgrade to the latest developer version
>>>> (1.5.1) or bioperl-live (CVS), then see the bug below.
>>>>
>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>
>>>> I think the first problem you ran into is solved in bioperl 1.5.1; 
>>>> the last problem (more recent, not related to the first) has been 
>>>> fixed but hasn't been committed to bioperl-live yet.  The fixed 
>>>> SearchIO::blast is available in the link above, but realize it
>>>> hasn't been committed yet and may change.
>>>>
>>>> Christopher Fields
>>>> Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry 
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>         
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hubert 
>>>>> Prielinger
>>>>> Sent: Wednesday, February 08, 2006 2:52 PM
>>>>> To: bioperl-l at bioperl.org
>>>>> Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work parsing Blast
>>>>> output
>>>>>
>>>>> Hi,
>>>>> If I want to parse a Blast Output (Version 2.2.12) with 
>>>>> Bio::SearchIO, I get the following error message:
>>>>>
>>>>> MSG: no data for midline Query  1   WWWKWRW  7
>>>>> STACK Bio::SearchIO::blast::next_result
>>>>> /usr/lib/perl5/site_perl/5.8.6/Bio/SearchIO/blast.pm:1151
>>>>> STACK toplevel
>>>>> /home/Hubert/installed/eclipse/workspace/Database_Search/Blast.pl:21
>>>>>
>>>>> is that a bug......
>>>>>
>>>>> If I want to parse Blast Output (version 2.2.13), I don't get 
>>>>> anything.....
>>>>> I'm using bioperl 1.4
>>>>>
>>>>> before I installed bioperl 1.4, it worked fine parsing Blast
>>>>> Output (version 2.2.12), but I don't remember which bioperl version
>>>>> I had installed
>>>>>
>>>>> thanks in advance
>>>>>
>>>>> Hubert
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>>           
>>>>
>>>>
>>>>         
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>       
>> --
>> Jason Stajich
>> Duke University
>> http://www.duke.edu/~jes12
>>
>>     
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign  
>   
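
The format change discussed in this thread (NCBI dropping the ':' after
the Query/Sbjct labels in alignment lines) can be illustrated with a
pattern that accepts both styles. This is a standalone sketch and NOT the
regular expression used in Bio::SearchIO::blast; the function name
parse_alignment_line is made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one Query/Sbjct alignment line, with or
# without the ':' that older BLAST text reports put after the label.
sub parse_alignment_line {
    my ($line) = @_;
    return unless $line =~ /^(Query|Sbjct):?\s+(\d+)\s+(\S+)\s+(\d+)\s*$/;
    return { label => $1, start => $2, seq => $3, end => $4 };
}

for my $line ('Query: 1   WWWKWRW  7', 'Query  1   WWWKWRW  7') {
    my $f   = parse_alignment_line($line);
    my $msg = $f
        ? "$f->{label} $f->{start}..$f->{end} $f->{seq}\n"
        : "no match: $line\n";
    print $msg;
}
```

A parser that insists on the colon would reject the second line, which is
one way an older parser could stumble over a newer report.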



From cjfields at uiuc.edu  Sun Feb 12 22:30:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 12 Feb 2006 16:30:07 -0600
Subject: [Bioperl-l] bioperl 1.4 SearchIO doesn't work
	parsing	Blast	output
In-Reply-To: <43EF951B.4030601@purdue.edu>
References: <004301c62db4$c9bcbab0$d416a790@LIBERAL>
	<43EF951B.4030601@purdue.edu>
Message-ID: <855DEC6F-8057-47BA-9D1D-9BDC16D1D83B@uiuc.edu>

Sequences are converted to FASTA format in RemoteBlast using
Bio::SeqIO, which I think accepts IUPAC base and amino acid
ambiguity codes like the ones you mention, so my guess is that any errors
(like odd non-IUPAC letters in nucleotide or aa queries) are likely caught
there.  As long as a sequence passes Bio::SeqIO it shouldn't be a problem.
I haven't tried this myself, though, so I can't say that with absolute
certainty.

Chris



On Feb 12, 2006, at 2:05 PM, Phillip SanMiguel wrote:

> Roger,
> Just a data point, but in case you were not already aware of it, the
> characters W, K and R may be included in some DNA sequences. 'W' means
> 'A' or 'T', [AT], 'K' means [TG] and 'R' means [AG] if I remember
> correctly. These are ambiguous bases, where a basecaller isn't sure, for
> example, whether a particular peak is an A or a T. Although I see these
> ambiguous bases less frequently these days, even common modern
> basecallers (such as Applied Biosystems basecallers) can generally be
> configured so they will generate them. Downstream applications may not
> like them, however.
>     I may be just stating the obvious, or this might be irrelevant to
> the issue at hand. If so, my apologies.
>
> Phillip

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From torsten.seemann at infotech.monash.edu.au  Sun Feb 12 23:56:32 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 13 Feb 2006 10:56:32 +1100
Subject: [Bioperl-l] RemoteBlast
In-Reply-To: <004401c62c6e$da906a40$4301a8c0@LIBERAL>
References: <004401c62c6e$da906a40$4301a8c0@LIBERAL>
Message-ID: <1139788592.29375.13.camel@chauvel.csse.monash.edu.au>

Roger,

> I think that most core Bioperl folks have long since moved away from
> RemoteBlast and are using the functionality in StandAloneBlast to run their
> own local servers. 

Agreed. Even smaller centres like my workplace need the throughput that
a local PC, SMP system or Cluster can provide.

> wave of the future, but I think there is still some concern that not every
> flavor of BLAST produces XML yet. Even so, the XML parser is considered to
> be very strong, and only helps hasten the end of text-formatted support,
> since parsing text-formatted reports is the primary source of pain. 

If BioPerl switches primarily to XML parsing, the tool authors will soon
add support for XML (not very difficult really) due to BioPerl's
pervasiveness?

> I do, however, see the advantage in shifting to XML-formatted reporting and
> parsing *only* as soon as every BLAST flavor supports it, if not before.
> (Anyone - is this still an issue. Please educate me.)

The four BLAST flavours I utilise all support XML output: 
1) NCBI BLAST 2) WU-BLAST 3) MPI-BLAST 4) FSA-BLAST.

> At the moment, I'm leaning towards adding an option to RemoteBlast. The
> default (no option) would use a "pure perl" implementation, and the
> enhancement (with explicit option) would merely wrap the NCBI executable.

If the API is done correctly both of these could co-exist with very
little redundant code. (I personally rarely use remote blast).

-- 
Torsten Seemann 
Victorian Bioinformatics Consortium



From torsten.seemann at infotech.monash.edu.au  Mon Feb 13 00:35:06 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Mon, 13 Feb 2006 11:35:06 +1100
Subject: [Bioperl-l] Remote BLAST support discussion
In-Reply-To: <1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu>
References: <1139362722.43e94ba29ebcc@webmail.utoronto.ca>
	<1B57290D-EB25-4D88-81BD-08F735FF643C@duke.edu>
Message-ID: <1139790906.29375.27.camel@chauvel.csse.monash.edu.au>

> Mostly I think we need to try and support something that will  
> "ALWAYS" work so that individuals setting up webservices which rely  
> on remote blast functionality.  In theory, netblast/blastcl3 should  
> always work since NCBI has to update the exe when they change their  
> server setup.

What usually happens when an older 'blastcl3' binary is used on a newer
server setup? I guess it fails in a deterministic manner so the BioPerl
user can throw a useful exception.

> I also see value in providing a wrapper for netblast since it should  
> look an awful lot like running blast locally.

Agreed - they are virtually indistinguishable.

> Ideally I'd like to see a more extensible system, something like (and  
> please feel free to come up with better names for the modules!):

Do BioPerl coding standards require "::Blast" over "::BLAST" ?
(not important anyway)

> Bio::Tools::Run::Blast
>   -->             StandAlone (support for [..as many flavours as poss])
>   -->             RemoteNCBI (currently the RemoteBlast server)
>   -->             RemoteEBISOAP (EBI has a nice SOAP interface that  
>   -->             RemoteNetBlast (blastcl3 or netblast local executable)
>   (other things that people want)

Looks reasonable. I assume there are some interfaces in there like
Bio::Tools::Blast::BlastI etc.

Could probably call "RemoteNetBlast" just "RemoteNet" because it is
already in the Blast:: namespace. (not important though)

My only suggestion for StandAlone (and RemoteNetBlast) is that they both
do a generic "run a local binary with env. vars and parameters and
capture the stdout, stderr and return code". This needs to be abstracted
away (or re-use existing code from bioperl-run?). Jason mentioned
Ensembl::Runnable as a source of code we could incorporate into Bioperl.
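
A generic "run a binary, capture stdout, stderr and return code" routine
could be sketched with core Perl modules only, as below. This is an
illustration of the idea, not existing bioperl-run or Ensembl code;
run_binary and its argument layout are assumptions made for the example.

```perl
use strict;
use warnings;
use IPC::Open3;
use IO::Select;
use Symbol 'gensym';

# Hypothetical helper: run a command (array ref) with extra environment
# variables (hash ref); return (stdout, stderr, exit code).
# open3 raises an exception if the program cannot be started.
sub run_binary {
    my ($cmd, $env) = @_;
    $env ||= {};
    local @ENV{ keys %$env } = values %$env;   # restored on return

    my ($child_in, $child_out);
    my $child_err = gensym;                    # separate handle for stderr
    my $pid = open3($child_in, $child_out, $child_err, @$cmd);
    close $child_in;                           # nothing to send on stdin

    # Read both pipes as data arrives so neither can fill up and block.
    my ($stdout, $stderr) = ('', '');
    my $sel = IO::Select->new($child_out, $child_err);
    while (my @ready = $sel->can_read) {
        for my $fh (@ready) {
            my $n = sysread($fh, my $buf, 4096);
            if (!$n) { $sel->remove($fh); next; }   # EOF or read error
            if ($fh == $child_out) { $stdout .= $buf }
            else                   { $stderr .= $buf }
        }
    }
    waitpid($pid, 0);
    return ($stdout, $stderr, $? >> 8);
}

# Demonstration using the running perl itself as the "binary".
my ($out, $err, $rc) =
    run_binary([$^X, '-e', 'print "hello"; print STDERR "oops"; exit 2']);
print "stdout=$out stderr=$err rc=$rc\n";
```

Both a StandAlone and a RemoteNet style wrapper could then build their
command lines and delegate the actual execution to one such routine.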

-- 
Torsten Seemann 
Victorian Bioinformatics Consortium



From cjfields at uiuc.edu  Mon Feb 13 16:45:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 13 Feb 2006 10:45:14 -0600
Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
In-Reply-To: <20060213152603.ed3f3118@dogwood.plantbio.uga.edu>
Message-ID: <001801c630bc$dd35bff0$15327e82@pyrimidine>

If you're using RemoteBlast 1.28, then you've likely updated from CVS, which
doesn't have the latest fix.

Make sure that you check the following:

1) Always post to the mailing list:
http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .

2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed
first.  Perform a clean installation; do not upgrade only
Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee
that mixing modules from old and new distributions (1.4 and 1.5.1, for
instance) will work.  A bioperl-1.5.1 or bioperl-live installation will
allow text output from BLAST v.2.2.12 to be saved and parsed; it will not
parse the newest BLAST text output from NCBI (v2.2.13) but it should still
save it. I believe as long as next_result() isn't called, it will work.

3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output
are NOT in CVS; they haven't been cleared and checked in by Roger Hall
(who's now taking care of RemoteBlast) and the powers that be (Jason or
whoever is in charge of Bio::SearchIO).  They can be found in Bugzilla:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of saving
XML output, so it isn't necessary if you don't plan on using this option.  And,
remember, they haven't been committed yet to CVS, which means that the final
version may still change.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

  _____  

From: Guojun Yang [mailto:gyang at plantbio.uga.edu] 
Sent: Monday, February 13, 2006 9:26 AM
To: Chris Fields
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

 

Hi, Chris

Thanks for your suggestion; however, it doesn't seem to work for my cgi even
after I replace both blast.pm and RemoteBlast.pm. I didn't even get any RID.
Is there any suggestion?

Guojun



Guojun Yang
Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun

  _____  

From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
Sent: Fri, 03 Feb 2006 16:07:29 -0500
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

I would say give the new code a try, but realize that it hasn't been checked
in (like I said below). I will try going over the modified
Bio::SearchIO::blast again this weekend to see if there is anything I might
have missed. The changed order in the header of BLAST text output has me a
bit worried that it might not catch everything, but it at least doesn't hang
in the while() loop I described in the bug report below (bug #1934) and
seems to process everything fine.

If you want more stability in the code, you might consider changing over to
XML output and parsing with Bio::SearchIO::blastxml. There are some changes
in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving XML
output, but I believe it parses everything regardless. If you look back the
last month or so there has been a bit of discussion here about it. Jason
describes a bit on how to set up RemoteBlast for XML:

http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Friday, February 03, 2006 1:45 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> 
> Hi, Everybody,
> I saw this post and am wondering if this is the reason my webserver is
> malfunctioning. We set up a webserver named MAK for MITE sequence
> analysis. It was working very well until around November 2005, when it
> stopped returning any results (the site is up and seems to be doing
> something after submission). In the CGI script, I used RemoteBlast (that
> work was done in 2003) to do searches. I currently do not have access to
> the server because I moved. Several people have sent us emails about the
> malfunction. Are there any suggestions for fixing the problem? Should I
> simply ask for RemoteBlast.pm to be replaced with the new version?
> Thanks a lot,
> Guojun
> 
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
> _____
> 
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-
> l at bioperl.org]
> Sent: Fri, 03 Feb 2006 10:45:23 -0500
> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> 
> Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> will work for saving text output. However, it will not parse anything
> using next_result (it will likely hang) and will not save XML format. See
> these bugs:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> 
> for explanations and possible fixes (changes to RemoteBlast and
> Bio::SearchIO::blast). Note that these haven't been checked in yet so are
> still not included in bioperl-live; they may be further modified before
> committing to CVS. If you're not worried about XML, you could just try the
> first fix, which is a change to SearchIO::blast.
> 
> Nagesh, I remember you posting to the list a month ago about a script
> which had problems; the script you used saves the output but doesn't
> actually parse it (i.e. you don't use next_result() to go through the
> data). Is the version of BLAST in your text output 2.2.12 or 2.2.13? Have
> you tried parsing the output using "-readmethod => SearchIO" or
> "-readmethod => blast" with your version of RemoteBlast and the method
> next_result()? Like below (from perldoc):
> 
> while ( my @rids = $factory->each_rid ) {
>   foreach my $rid ( @rids ) {
>     my $rc = $factory->retrieve_blast($rid);
>     if( !ref($rc) ) {
>       # not a report object yet: negative means error, otherwise keep waiting
>       if( $rc < 0 ) {
>         $factory->remove_rid($rid);
>       }
>       print STDERR "." if ( $v > 0 );
>       sleep 5;
>     } else {
>       # parsing starts here
>       my $result = $rc->next_result();   # it should hang here
>       # save the output
>       my $filename = $result->query_name()."\.out";
>       $factory->save_output($filename);
>       $factory->remove_rid($rid);
>       print "\nQuery Name: ", $result->query_name(), "\n";
>       while ( my $hit = $result->next_hit ) {
>         next unless ( $v > 0);
>         print "\thit name is ", $hit->name, "\n";
>         while( my $hsp = $hit->next_hsp ) {
>           print "\t\tscore is ", $hsp->score, "\n";
>         }
>       }
>     }
>   }
> }
> 
> 
> My script hung if I used next_result() in any way prior to the fixes. I
> want to see how many others are having the same issues with parsing using
> the CVS version of bioperl-live.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > Sent: Thursday, February 02, 2006 7:24 PM
> > To: Huang Jian; bioperl-l
> > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> >
> > Hi Huang,
> > Thanks for the message. The older version of RemoteBlast.pm relies on
> > checking the temporary file size to determine whether the Blast results
> > are ready. This condition is no longer being satisfied, maybe due to
> > some changes brought about by NCBI. I had this problem recently and
> > figured out that the solution was to use the latest version, which has
> > this problem fixed (it does not use the file size logic any more) but
> > is not yet included in the BioPerl package.
> > Cheers
> > Nagesh
> >
> > Huang Jian wrote:
> >
> > > Dear Nagesh,
> > >
> > > I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent
> > > me. Now it works perfectly!!!
> > >
> > > Thank you!!
> > >
> > > Huang
> > >
> > > ----- Original Message ----- From: "Nagesh Chakka"
> > > 
> > > To: "Huang Jian" ; "bioperl-l"
> > > 
> > > Sent: Friday, February 03, 2006 7:48 AM
> > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > > via email
> > >
> > >
> > >> Hi Huang,
> > >> I see that you are submitting a sequence for a remote blast search.
> > >> Can you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09)?
> > >> If not, I have attached it to this email; try using it to replace the
> > >> old one, which has a bug.
> > >> Let me know if it works.
> > >> Nagesh

From gyang at plantbio.uga.edu  Mon Feb 13 18:32:14 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 13 Feb 2006 13:32:14 -0500
Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
In-Reply-To: <001801c630bc$dd35bff0$15327e82@pyrimidine>
Message-ID: <20060213183214.342b90da@dogwood.plantbio.uga.edu>

Hi, Chris,  
I do have different versions of bioperl on my Linux machine (1.4 and 1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or do I need to uninstall and remove the previous versions first? I could not find any hints on uninstalling bioperl on Linux. Could you please give me some suggestions?
Thanks,  
Guojun

Department of Plant Biology
University of Georgia
      _____  

  From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
Sent: Mon, 13 Feb 2006 11:45:14 -0500
Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28

  
  
If you're using RemoteBlast 1.28, then you've likely updated from CVS, which isn't the latest fix.

Make sure that you check the following:

1) Always post to the mailing list: http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .

2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS) installed first.  Perform a clean installation; do not upgrade only Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing modules from old and new distributions (1.4 and 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live installation will allow text output from BLAST v2.2.12 to be saved and parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13) but it should still save it. I believe as long as next_result() isn't called, it will work.

3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output are NOT in CVS; they haven't been cleared and checked in by Roger Hall (who's now taking care of RemoteBlast) and the powers that be (Jason or whoever is in charge of Bio::SearchIO).  They can be found in Bugzilla:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934
http://bugzilla.bioperl.org/show_bug.cgi?id=1935

The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of saving XML output, so it isn't necessary if you don't plan on using this option.  And, remember, they haven't been committed yet to CVS, which means that the final version will change to reflect the new version.
  

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign   
  
  

From cjfields at uiuc.edu  Mon Feb 13 20:39:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 13 Feb 2006 14:39:38 -0600
Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
In-Reply-To: <20060213183214.342b90da@dogwood.plantbio.uga.edu>
Message-ID: <000901c630dd$9be54f40$15327e82@pyrimidine>

How do you know two versions are installed (i.e. how are you checking the
version)?  Do you have two complete bioperl distributions (in two
separate directories) or are you looking in modules?  Here's the way to
check the version (from the FAQ):

perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'

If you have two full bioperl distributions on your computer, normally only
one will be in use unless you have explicitly set the environment variable
PERL5LIB.  The PERL5LIB  directories will be searched first before your
normal perl directory list (@INC) is searched.  You MAY get some mixing
then, but only if perl can't find a particular module in the path designated
in PERL5LIB; then it will progress through the directories listed in @INC.
This may happen if a module is unique to a particular release, but shouldn't
happen for the majority of modules, including RemoteBlast.  You can check
what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will differ
depending on your OS, perl build, etc.
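
To make the search order concrete, here is a minimal sketch (not part of
Bioperl; the helper name find_module_copies is made up for illustration)
that walks @INC in order and lists every copy of a module it finds.  The
first path printed should be the one perl actually loads; any others are
shadowed.  It only looks at plain directories, so treat it as a rough
diagnostic rather than a definitive answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every copy of $module found under @dirs,
# in search order. The first entry is the one perl would load.
sub find_module_copies {
    my ($module, @dirs) = @_;
    # Turn 'Bio::Tools::Run::RemoteBlast' into Bio/Tools/Run/RemoteBlast.pm
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    my @found;
    for my $dir (@dirs) {
        next if ref $dir;    # skip code-ref hooks that can appear in @INC
        my $path = File::Spec->catfile($dir, @parts);
        push @found, $path if -f $path;
    }
    return @found;
}

# Module to look for: first command-line argument, else RemoteBlast.
my $module = shift(@ARGV) || 'Bio::Tools::Run::RemoteBlast';
my @copies = find_module_copies($module, @INC);
if (@copies) {
    print "perl will load: $copies[0]\n";
    print "shadowed copy:  $_\n" for @copies[1 .. $#copies];
} else {
    print "$module not found in \@INC\n";
}
```

Running it with a module name as the only argument (for example
Bio::SearchIO::blast) checks that module instead; more than one line of
output suggests that two distributions are visible to perl.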

Regardless, if you follow the directions for installing bioperl for your
system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you
explicitly change the installation directory when using 'perl Makefile.PL'),
then 'uninstalling' Bioperl shouldn't be a problem as it will install the
Bioperl distribution you downloaded over the old version in @INC.  See this
page:

http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL

for more details.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Monday, February 13, 2006 12:32 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> 
> Hi, Chris,
> I do have different versions of bioperl on my Linux machine (1.4. and
> 1.5.0), this may be the problem. Should I just install bioperl-1.5.1 or I
> need to uninstall and remove the previous versions. I could not find any
> hint on uninstalling bioperl on linux. Could you please give me some
> suggestion?
> Thanks,
> Guojun
> 
> Department of Plant Biology
> University of Georgia



From gyang at plantbio.uga.edu  Mon Feb 13 21:00:11 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 13 Feb 2006 16:00:11 -0500
Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
Message-ID: <20060213160011.1e89108c@dogwood.plantbio.uga.edu>

Thanks, Chris,
I installed version 1.5.1 and replaced the blast.pm file with the one from your bug report. The command you sent me reports the running version as 1.5. But when I tried the script, not much changed. The relevant portion of my RemoteBlast code is here:

sub search {
local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}= 'no';
local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
my $query = Bio::Seq -> new ( -seq=>"$_[0]",
			      -id=>"query",
			      -desc=>"new seq");
my $len=$query->length();
@db=('nr','htgs','wgs');
foreach my $db (@db) {
my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
						'-data' =>"$db",
					        '-expect'=>"$E_value");


my $blast_report = $factory->submit_blast($query);

my @rids = $factory->each_rid();
foreach my $rid ( @rids ) {
    print STDERR "$rid\n";
}
# RID = Remote Blast ID (e.g: 1017772174-16400-6638)
print STDERR "waiting...";
sleep 60;

foreach my $rid ( @rids ) {
    my $rc = $factory->retrieve_blast($rid);
    while (!ref($rc) ) {
	if( $rc < 0 ) {
# retrieve_blast returns -1 on error
	    $factory->remove_rid($rid);
	    print "Error!\n";
	    send_error($email,$function,$seqname,$queryname[$ST]);
	    die "Can't retrieve $rid";
	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
	    sleep 60;
	    $rc = $factory->retrieve_blast($rid);
	}	
    }
    if (ref($rc)) {
	print STDERR "Done.\n";
	 while( my $result = $rc->next_result) {
	    while( my $hit = $result->next_hit()) {
	    	$hit_name=$hit->name;
		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
		$name=$1;
		@left_plus_start=();
		@left_plus_end=();
		@left_minus_start=();
		@left_minus_end=();
		@right_plus_start=();
		@right_plus_end=();
		@right_minus_start=();
		@right_minus_end=();

		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
		while( my $hsp = $hit->next_hsp()) { 
......

It was working quite well until around October last year, but it has not worked since. When a submission is sent via the web page, the CGI starts to work and uses ~20 MB of memory. Then it hangs; eventually the expected email is received, but without real results, although it does contain output from other parts of the script. Apparently the search sub did not return anything (I know something should have been returned). Is it also possible that the format of the NCBI output for each result has changed?
Thank you,
Guojun


Department of Plant Biology
University of Georgia
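
One thing that may be worth ruling out in a script like the one above: the while (!ref($rc)) loop has no upper bound, so a job that never becomes ready keeps the CGI waiting indefinitely. Below is a minimal sketch of a bounded polling loop. It assumes the return convention stated in the script's own comments (a reference on success, 0 for 'job not finished', a negative value on error). poll_until_ready and the fake fetcher are hypothetical names used only for illustration; in a real script the code reference would wrap $factory->retrieve_blast($rid).

```perl
use strict;
use warnings;

# Poll a job until it is ready, has failed, or the retry budget is spent.
# $fetch is a code reference standing in for
#   sub { $factory->retrieve_blast($rid) }
# and is assumed to return a reference on success, 0 while the job is
# still running, and a negative number on error.
sub poll_until_ready {
    my ($fetch, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $fetch->();
        return ('ready', $rc)   if ref $rc;
        return ('error', undef) if $rc < 0;
        sleep $delay if $delay && $try < $max_tries;
    }
    return ('timeout', undef);
}

# Hypothetical stand-in for a remote job that becomes ready on the third poll.
my $calls = 0;
my $fake_fetch = sub { return ++$calls < 3 ? 0 : +{ report => 'done' } };

my ($status, $report) = poll_until_ready($fake_fetch, 10, 0);
print "status=$status after $calls polls\n";
```

With a limit in place the caller can report 'timeout' separately from 'error', which makes it easier to tell a slow queue from a parsing or retrieval failure.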



----- Original Message -----
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28


> How do you know two versions are installed (i.e. how are you checking the
> version)?  Do you have two complete bioperl distributions (in two
> separate directories) or are you looking in modules?  Here's the way to
> check the version (from the FAQ):
>
> perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
>
> If you have two full bioperl distributions on your computer, normally only
> one will be in use unless you have explicitly set the environment variable
> PERL5LIB.  The PERL5LIB  directories will be searched first before your
> normal perl directory list (@INC) is searched.  You MAY get some mixing
> then, but only if perl can't find a particular module in the path designated
> in PERL5LIB; then it will progress through the directories listed in @INC.
> This may happen if a module is unique to a particular release, but shouldn't
> happen for the majority of modules, including RemoteBlast.  You can check
> what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will differ
> depending on your OS, perl build, etc.
>
> Regardless, if you follow the directions for installing bioperl for your
> system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you
> explicitly change the installation directory when using 'perl Makefile.PL'),
> then 'uninstalling' Bioperl shouldn't be a problem as it will install the
> Bioperl distribution you downloaded over the old version in @INC.  See this
> page:
>
> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
>
> for more details.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
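
To see which copy of a module perl actually picks up when several distributions are installed, the %INC hash can be inspected after loading it. This is a minimal sketch, not part of BioPerl: module_path is a hypothetical helper, and File::Spec appears only as a core module that should always be found. The Bio:: modules are the ones named in this thread and are reported as not found if BioPerl is not on @INC.

```perl
use strict;
use warnings;

# Return the file a module was loaded from, or undef if it cannot be loaded.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Foo::Bar -> Foo/Bar.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

for my $module (qw(File::Spec Bio::Root::Version Bio::Tools::Run::RemoteBlast)) {
    my $path = module_path($module);
    printf "%-32s %s\n", $module,
        defined $path ? $path : '(not found in @INC)';
}
```

If the path printed for RemoteBlast.pm sits in a different directory tree from the one printed for Bio::Root::Version, modules from two distributions are probably being mixed.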
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > Sent: Monday, February 13, 2006 12:32 PM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >
> > Hi, Chris,
> > I do have different versions of bioperl on my Linux machine (1.4 and
> > 1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or do
> > I need to uninstall and remove the previous versions? I could not find any
> > hint on uninstalling bioperl on Linux. Could you please give me some
> > suggestions?
> > Thanks,
> > Guojun
> >
> > Department of Plant Biology
> > University of Georgia
> >       _____
> >
> > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> > 1.28
> >
> > If you're using RemoteBlast 1.28, then you've likely updated from CVS,
> > which isn't the latest fix.
> >
> > Make sure that you check the following:
> >
> > 1) Always post to the mailing list:
> > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> >
> > 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> > installed first.  Perform a clean installation; do not upgrade only
> > Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> > guarantee that mixing modules from old and new distributions (1.4 and
> > 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > installation will allow text output from BLAST v.2.2.12 to be saved and
> > parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13)
> > but it should still save it. I believe as long as next_result() isn't
> > called, it will work.
> >
> > 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output
> > are NOT in CVS; they haven't been cleared and checked in by Roger Hall
> > (who's now taking care of RemoteBlast) and the powers that be (Jason or
> > whoever is in charge of Bio::SearchIO).  They can be found in Bugzilla:
> >
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> >
> > The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> > saving XML output, so isn't necessary if you don't plan on using this
> > option.  And, remember, they haven't been committed yet to CVS, which
> > means that the final version will change to reflect the new version.
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >     _____
> >
> > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > Sent: Monday, February 13, 2006 9:26 AM
> > To: Chris Fields
> > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> > 1.28
> >
> > Hi, Chris
> >
> > Thanks for your suggestion, however, it doesn't seem to work for my cgi
> > even after I replace both blast.pm and RemoteBlast.pm. I didn't even get
> > any RID. Is there any suggestion?
> >
> > Guojun
> >
> > Guojun Yang
> > Department of Plant Biology
> > University of Georgia
> > Tel: 706-542-1857
> > Fax: 706-542-1805
> > http://www.arches.uga.edu/~guojun
> >     _____
> >
> > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> > 1.28
> >
> > I would say give the new code a try, but realize that it hasn't been
> > checked in (like I said below). I will try going over the modified
> > Bio::SearchIO::blast again this weekend to see if there is anything I
> > might have missed. The changed order in the header of BLAST text output
> > has me a bit worried that it might not catch everything, but it at least
> > doesn't hang in the while() loop I described in the bug report below
> > (bug #1934) and seems to process everything fine.
> >
> > If you want more stability in the code, you might consider changing over
> > to XML output and parsing with Bio::SearchIO::blastxml. There are some
> > changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
> > saving XML output, but I believe it parses everything regardless. If you
> > look back the last month or so there has been a bit of discussion here
> > about it. Jason describes a bit on how to set up RemoteBlast for XML:
> >
> > http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > Sent: Friday, February 03, 2006 1:45 PM
> > > To: bioperl-l at bioperl.org
> > > Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> > >
> > > Hi, Everybody,
> > > I see this post and am wondering if this is the reason for the
> > > malfunctioning of my webserver. We set up a webserver named MAK, for
> > > MITE sequence analysis. It was working very well until around November
> > > 2005, when it stopped returning any results (the site is fine and seems
> > > to be doing something after submission). In the CGI script, I used
> > > RemoteBlast (that work was done in 2003) to do searches. I currently do
> > > not have access to the server because I moved. Several people have sent
> > > emails to us about its malfunctioning. Is there any suggestion on fixing
> > > the problem? Should I simply ask that RemoteBlast.pm be replaced with
> > > the new version?
> > > Thanks a lot,
> > > Guojun
> > >
> > > Department of Plant Biology
> > > University of Georgia
> > > Tel: 706-542-1857
> > > Fax: 706-542-1805
> > > http://www.arches.uga.edu/~guojun
> > > _____
> > >
> > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l' [mailto:bioperl-
> > > l at bioperl.org]
> > > Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >
> > > Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> > > will work for saving text output. However, it will not parse anything
> > > using next_result (it will likely hang) and will not save XML format.
> > > See these bugs:
> > >
> > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > >
> > > for explanations and possible fixes (changes to RemoteBlast and
> > > Bio::SearchIO::blast). Note that these haven't been checked in yet so
> > > are still not included in bioperl-live; they may be further modified
> > > before committing to CVS. If you're not worried about XML, you could
> > > just try the first fix, which is a change to SearchIO::blast.
> > >
> > > Nagesh, I remember you posting to the list a month ago using a script
> > > which had problems; the script you used saves the output but doesn't
> > > actually parse it (i.e. you don't use next_result() to go through the
> > > data). Is the version of BLAST in your text output 2.2.12 or 2.2.13?
> > > Have you tried parsing the output using "-readmethod => SearchIO" or
> > > "-readmethod => blast" using your version of RemoteBlast and method
> > > next_result()? Like below (from perldoc):
> > >
> > > while ( my @rids = $factory->each_rid ) {
> > >     foreach my $rid ( @rids ) {
> > >         my $rc = $factory->retrieve_blast($rid);
> > >         if( !ref($rc) ) {
> > >             if( $rc < 0 ) {
> > >                 $factory->remove_rid($rid);
> > >             }
> > >             print STDERR "." if ( $v > 0 );
> > >             sleep 5;
> > >         } else {    # parsing starts here
> > >             my $result = $rc->next_result();    # it should hang here
> > >             # save the output
> > >             my $filename = $result->query_name()."\.out";
> > >             $factory->save_output($filename);
> > >             $factory->remove_rid($rid);
> > >             print "\nQuery Name: ", $result->query_name(), "\n";
> > >             while ( my $hit = $result->next_hit ) {
> > >                 next unless ( $v > 0);
> > >                 print "\thit name is ", $hit->name, "\n";
> > >                 while( my $hsp = $hit->next_hsp ) {
> > >                     print "\t\tscore is ", $hsp->score, "\n";
> > >                 }
> > >             }
> > >         }
> > >     }
> > > }
> > >
> > >
> > > My script hung if I used next_result() in any way prior to the fixes.
> > > I want to see how many others are having the same issues with parsing
> > > using the CVS version of bioperl-live.
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > > > -----Original Message-----
> > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > > > Sent: Thursday, February 02, 2006 7:24 PM
> > > > To: Huang Jian; bioperl-l
> > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > >
> > > > Hi Huang,
> > > > Thanks for the message. The older version of RemoteBlast.pm works on
> > > > the logic of checking the temporary file size to determine whether the
> > > > Blast results are ready. This condition is no longer being satisfied,
> > > > maybe due to some changes brought about by NCBI. I had this problem
> > > > recently and figured out that the solution was to use the latest
> > > > version, which has this problem fixed (it does not use the file size
> > > > logic any more) but is not yet included in the BioPerl package.
> > > > Cheers
> > > > Nagesh
> > > >
> > > > Huang Jian wrote:
> > > >
> > > > > Dear Nagesh,
> > > > >
> > > > > I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you
> > > > > sent me. Now it works perfectly!!!
> > > > >
> > > > > Thank you!!
> > > > >
> > > > > Huang
> > > > >
> > > > > ----- Original Message ----- From: "Nagesh Chakka"
> > > > > 
> > > > > To: "Huang Jian" ; "bioperl-l"
> > > > > 
> > > > > Sent: Friday, February 03, 2006 7:48 AM
> > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > > > > via email
> > > > >
> > > > >
> > > > >> Hi Huang,
> > > > >> I see that you are submitting a sequence for a remote blast search.
> > > > >> Can you check if the RemoteBlast.pm being used is v 1.28
> > > > >> (2005/12/09)? If not, I have attached it to this email; try
> > > > >> replacing the old one, which has a bug, with it.
> > > > >> Let me know if it works.
> > > > >> Nagesh
> > > > >
> > > > >
> > > > >
> > > >
> > > > _______________________________________________
> > > > Bioperl-l mailing list
> > > > Bioperl-l at lists.open-bio.org
> > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > > > > > > > > > > > > > > > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > 



From akarger at CGR.Harvard.edu  Mon Feb 13 20:57:08 2006
From: akarger at CGR.Harvard.edu (Amir Karger)
Date: Mon, 13 Feb 2006 15:57:08 -0500
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
Message-ID: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>

I'm trying to get the sequences of each exon in a gene. I have a genbank
file with mRNA and exon features (among others) that look like: 
     mRNA            join(complement(22257..22386),complement(22067..22186),
                     complement(16753..17101),complement(13840..13962),
                     complement(10649..10820),complement(502..3028))
                     /gene="ENSG00000005812"
                     /note="transcript_id=ENST00000355619"
     exon            complement(13840..13962)
                     /note="exon_id=ENSE00000802462"

I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
the mRNA above. I tried writing the below code, but it doesn't do what I
want. (You'll note that the code is stolen from the Bio::Seq and Feature
HOWTOs.)

my $inseq = Bio::SeqIO->new(-file   => "<$file", -format => $format );
while (my $seq = $inseq->next_seq) {
    my @features = $seq->get_SeqFeatures(); # just top level
    foreach my $feat ( @features ) {
        my $type = $feat->primary_tag;
        if ($type eq "mRNA") {
                print "Feature ",$feat->primary_tag,
                      " starts ",$feat->start," ends ", $feat->end,
                      " strand ",$feat->strand,"\n";
                my @feats = $feat->get_SeqFeatures();
                print "Found ", scalar @feats, " sub-features\n";
        } elsif ($type eq "exon") {
                print "Feature ",$feat->primary_tag,
                      " starts ",$feat->start," ends ", $feat->end,
                      " strand ",$feat->strand,"\n";
        }
     }
}

When I run the above, it says that the mRNA features have no sub-features.
So how do I pull out the 6 sequences?

Thanks,
- Amir Karger
Computational Biology Group
Bauer Center for Genomics Research
Harvard University
617-496-0626


From cjfields at uiuc.edu  Mon Feb 13 23:18:24 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 13 Feb 2006 17:18:24 -0600
Subject: [Bioperl-l] INSTALL.WIN in wiki
Message-ID: <000001c630f3$c9efa5f0$15327e82@pyrimidine>

I just added "Installing Bioperl on Windows" to the wiki.  It needs some
major updating and changes in formatting:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

Jason has mentioned changing up some of the INSTALL docs for the wiki
(http://www.bioperl.org/wiki/Talk:Getting_BioPerl).  Any thoughts?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From osborne1 at optonline.net  Tue Feb 14 01:38:30 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 13 Feb 2006 20:38:30 -0500
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
In-Reply-To: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
Message-ID: 

Amir,

The idea is to look at the sub-locations in the SplitLocation object, this
is discussed in FAQ 5.2:

http://www.bioperl.org/wiki/FAQ#How_do_I_parse_the_CDS_join_or_complement_statements_in_GenBank_or_EMBL_files_to_get_the_sub-locations.3F

The sequence of the feature itself can be obtained by using the entire_seq()
method:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#Getting_Sequences


Brian O.


On 2/13/06 3:57 PM, "Amir Karger"  wrote:

> I'm trying to get the sequences of each exon in a gene. I have a genbank
> file with mRNA and exon features (among others) that look like:
>      mRNA            join(complement(22257..22386),complement(22067..22186),
>                      complement(16753..17101),complement(13840..13962),
>                      complement(10649..10820),complement(502..3028))
>                      /gene="ENSG00000005812"
>                      /note="transcript_id=ENST00000355619"
>      exon            complement(13840..13962)
>                      /note="exon_id=ENSE00000802462"
> 
> I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
> the mRNA above. I tried writing the below code, but it doesn't do what I
> want. (You'll note that the code is stolen from the Bio::Seq and Feature
> HOWTOs.)
> 
> my $inseq = Bio::SeqIO->new(-file   => "<$file", -format => $format );
> while (my $seq = $inseq->next_seq) {
>     my @features = $seq->get_SeqFeatures(); # just top level
>     foreach my $feat ( @features ) {
>         my $type = $feat->primary_tag;
>         if ($type eq "mRNA") {
>                 print "Feature ",$feat->primary_tag,
>                       " starts ",$feat->start," ends ", $feat->end,
>                       " strand ",$feat->strand,"\n";
>                 my @feats = $feat->get_SeqFeatures();
>                 print "Found ", scalar @feats, " sub-features\n";
>         } elsif ($type eq "exon") {
>                 print "Feature ",$feat->primary_tag,
>                       " starts ",$feat->start," ends ", $feat->end,
>                       " strand ",$feat->strand,"\n";
>         }
>      }
> }
> 
> When I run the above, it says that the mRNA features have no sub-features.
> So how do I pull out the 6 sequences?
> 
> Thanks,
> - Amir Karger
> Computational Biology Group
> Bauer Center for Genomics Research
> Harvard University
> 617-496-0626
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From hlapp at gmx.net  Mon Feb 13 23:58:46 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 13 Feb 2006 15:58:46 -0800
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
In-Reply-To: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
References: <339D68B133EAD311971E009027DC47970423A24A@MONTECARLO>
Message-ID: 

Why do you want subfeatures? This is genbank format you're parsing,
right? Your mRNA features will have a split location. Loop over
$feat->location->each_Location() and get $seq->subseq() with the start
and end of each sublocation. If you don't know how to do this check
out the implementation of $feature->spliced_seq().

This should be in the HOWTO. Is it not?

    -hilmar
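
For the archive, a minimal sketch of the each_Location() loop described above. It assumes BioPerl is installed. The toy sequence, coordinates, and FASTA names are made up for illustration, and the feature is built in memory rather than read from a GenBank file, so with real data the outer loop would run over the sequences returned by Bio::SeqIO instead. Reverse-complementing minus-strand pieces by hand is this sketch's choice, not something the thread prescribes.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Location::Simple;
use Bio::Location::Split;

# Toy 30-base sequence with a two-exon mRNA on the minus strand.
my $seq = Bio::Seq->new(-id  => 'toy',
                        -seq => 'AAAAACCCCCGGGGGTTTTTACGTACGTAC');

my $split = Bio::Location::Split->new();
$split->add_sub_Location(
    Bio::Location::Simple->new(-start => 11, -end => 15, -strand => -1));
$split->add_sub_Location(
    Bio::Location::Simple->new(-start => 1, -end => 5, -strand => -1));

$seq->add_SeqFeature(
    Bio::SeqFeature::Generic->new(-primary_tag => 'mRNA',
                                  -location    => $split));

my @exons;
for my $feat ($seq->get_SeqFeatures) {
    next unless $feat->primary_tag eq 'mRNA';
    # One sub-location per exon of the join(...) statement.
    for my $loc ($feat->location->each_Location) {
        my $str = $seq->subseq($loc->start, $loc->end);
        if (defined $loc->strand && $loc->strand == -1) {
            $str = reverse $str;            # reverse ...
            $str =~ tr/ACGTacgt/TGCAtgca/;  # ... and complement
        }
        push @exons, [ $loc->start, $loc->end, $str ];
    }
}

# One FASTA record per exon.
printf ">exon_%d_%d\n%s\n", @$_ for @exons;
```

The same loop also works for a feature with a simple (unsplit) location, since each_Location() then returns the single location itself.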


On 2/13/06, Amir Karger  wrote:
> I'm trying to get the sequences of each exon in a gene. I have a genbank
> file with mRNA and exon features (among others) that look like:
>      mRNA            join(complement(22257..22386),complement(22067..22186),
>                      complement(16753..17101),complement(13840..13962),
>                      complement(10649..10820),complement(502..3028))
>                      /gene="ENSG00000005812"
>                      /note="transcript_id=ENST00000355619"
>      exon            complement(13840..13962)
>                      /note="exon_id=ENSE00000802462"
>
> I want to make a FASTA file with 6 sequences corresponding to the 6 exons in
> the mRNA above. I tried writing the below code, but it doesn't do what I
> want. (You'll note that the code is stolen from the Bio::Seq and Feature
> HOWTOs.)
>
> my $inseq = Bio::SeqIO->new(-file   => "<$file", -format => $format );
> while (my $seq = $inseq->next_seq) {
>     my @features = $seq->get_SeqFeatures(); # just top level
>     foreach my $feat ( @features ) {
>         my $type = $feat->primary_tag;
>         if ($type eq "mRNA") {
>                 print "Feature ",$feat->primary_tag,
>                       " starts ",$feat->start," ends ", $feat->end,
>                       " strand ",$feat->strand,"\n";
>                 my @feats = $feat->get_SeqFeatures();
>                 print "Found ", scalar @feats, " sub-features\n";
>         } elsif ($type eq "exon") {
>                 print "Feature ",$feat->primary_tag,
>                       " starts ",$feat->start," ends ", $feat->end,
>                       " strand ",$feat->strand,"\n";
>         }
>      }
> }
>
> When I run the above, it says that the mRNA features have no sub-features.
> So how do I pull out the 6 sequences?
>
> Thanks,
> - Amir Karger
> Computational Biology Group
> Bauer Center for Genomics Research
> Harvard University
> 617-496-0626
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From osborne1 at optonline.net  Tue Feb 14 02:11:33 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 13 Feb 2006 21:11:33 -0500
Subject: [Bioperl-l] Pulling exons out of a Genbank mRNA
In-Reply-To: 
Message-ID: 

Hilmar,

It could be spelled out a bit more explicitly.

Brian O.


On 2/13/06 6:58 PM, "Hilmar Lapp"  wrote:

> This should be in the HOWTO. Is it not?




From rmb32 at cornell.edu  Mon Feb 13 22:12:10 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 13 Feb 2006 17:12:10 -0500
Subject: [Bioperl-l] game xml SeqIO
Message-ID: <43F1043A.2000205@cornell.edu>

Hi all,

Currently, the SeqIO for doing GAME XML does not seem to support writing 
(or reading?)  elements.  Am I correct?

If I am, are there any plans to add this functionality?  Can I help / do it?

If there are plans to add this, how would one distinguish SeqFeatures 
that should be rendered as  from SeqFeatures 
that should be rendered as ?  Would we do that with 
Bio::SeqFeature::Computation?  I assume that a given Seq can have 
SeqFeatures of different types associated with it (I don't know, I'm a 
bioperl newb).

Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 607-255-2360
rmb32 at cornell.edu
http://www.sgn.cornell.edu




From heikki at sanbi.ac.za  Tue Feb 14 06:59:29 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 14 Feb 2006 08:59:29 +0200
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602100906.11885.heikki@sanbi.ac.za>
References: <200602100906.11885.heikki@sanbi.ac.za>
Message-ID: <200602140859.30136.heikki@sanbi.ac.za>

I've committed an interim solution to the sequence evolution problem:

    $newseq = Bio::SeqUtils-> evolve
        ($seq, $similarity, $transition_transversion_rate);

I will go on to transform this code to fully OO, extensible solution.

   -Heikki
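
To make the idea concrete for readers of the archive, here is a plain-Perl sketch of mutating a DNA string to a requested similarity with a transition:transversion bias. It is not the Bio::SeqUtils code and does not use BioPerl. mutate_to_similarity is a hypothetical helper, and it assumes the input contains only A, C, G and T.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my %TRANSITION    = (A => 'G', G => 'A', C => 'T', T => 'C');
my %TRANSVERSIONS = (A => [qw(C T)], G => [qw(C T)],
                     C => [qw(A G)], T => [qw(A G)]);

# Substitute exactly round(length * (1 - similarity)) distinct positions,
# so the result has the requested identity to the input. $ts_tv is the
# transition:transversion ratio, e.g. 2 means transitions are picked
# twice as often as transversions.
sub mutate_to_similarity {
    my ($dna, $similarity, $ts_tv) = @_;
    my @bases = split //, uc $dna;
    my $n_mut = int(@bases * (1 - $similarity) + 0.5);
    my @sites = (shuffle 0 .. $#bases)[0 .. $n_mut - 1];
    for my $i (@sites) {
        my $old = $bases[$i];
        if (rand() < $ts_tv / ($ts_tv + 1)) {
            $bases[$i] = $TRANSITION{$old};
        } else {
            my $choices = $TRANSVERSIONS{$old};
            $bases[$i] = $choices->[ int rand @$choices ];
        }
    }
    return join '', @bases;
}

my $orig    = 'ACGT' x 25;    # 100 bases
my $mutated = mutate_to_similarity($orig, 0.75, 2);
my $diffs   = grep { substr($orig, $_, 1) ne substr($mutated, $_, 1) }
              0 .. length($orig) - 1;
print "$diffs of ", length($orig), " positions differ\n";
```

Picking distinct sites without replacement keeps the observed identity equal to the target. A model that allows multiple hits at a site would give a higher identity than the nominal value.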


On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
> Ryan Golhar's mail got me thinking that we should have a simple framework
> for mutating sequences to a desired level. The model can then be extended
> to necessary complexity when needed by subclassing.
>
> To start with, I have been planning:
>
>
> Bio::SeqEvolution::EvolutionI - interface file
> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>         (defaults to Bio::PrimarySeq)
> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
> Bio::SeqEvolution::EvolutionI::each_seqs($count)
>        - returns an array of $count seqs
> Bio::SeqEvolution::EvolutionI::_generate_seq()
> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
>       converted to probabilities of change internally
>
>   various methods to define the extent of divergence:
>   only one to start with:
> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
>    (= 100% - identity)
>
> Bio::SeqEvolution::Factory - core class to call,
>          instantiates subclasses, Bio::SeqEvolution::DNASimple for nucleotides
> Bio::SeqEvolution::EvolutionI::type() - evolution model,
>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
>
>
> Bio::SeqEvolution::DNASimple - default for nucleotides
> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>         e.g. 5 => 5:1, defaults to 1:1
>         simple alternative to a scoring matrix
>
>
> I am soliciting usual comments and suggestions about naming and minimal
> functionality.
>
>
>    -Heikki

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From gbazykin at Princeton.EDU  Tue Feb 14 14:34:54 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Tue, 14 Feb 2006 09:34:54 -0500
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602140859.30136.heikki@sanbi.ac.za>
References: <200602100906.11885.heikki@sanbi.ac.za>
	<200602140859.30136.heikki@sanbi.ac.za>
Message-ID: <214316262.20060214093454@princeton.edu>

Hi,

Just a thought: I really think that in perspective, it would be nice
to be able to evolve the sequence along a tree of given shape. I think
PAML's "evolver" has this functionality. I've already been doing this
in my scripts, but I am not sure how to couple the tree and the
sequence data properly.

Yegor (George) Bazykin


------------------------------
Tuesday, February 14, 2006, 1:59:29 AM, you wrote:

> I've committed an interim solution to the sequence evolution problem:

>     $newseq = Bio::SeqUtils-> evolve
>         ($seq, $similarity, $transition_transversion_rate);

> I will go on to transform this code to fully OO, extensible solution.

>    -Heikki


> On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
>> Ryan Golhar's mail got me thinking that we should have a simple framework
>> for mutating sequences to a desired level. The model can then be extended
>> to necessary complexity when needed by subclassing.
>>
>> To start with, I have been planning:
>>
>>
>> Bio::SeqEvolution::EvolutionI - interface file
>> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
>> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>>         (defaults to Bio::PrimarySeq)
>> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
>> Bio::SeqEvolution::EvolutionI::each_seqs($count)
>>        - returns an array of $count seqs
>> Bio::SeqEvolution::EvolutionI::_generate_seq()
>> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
>>       converted to probabilities of change internally
>>
>>   various methods to define the extent of divergence:
>>   only one to start with:
>> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
>>    (= 100% - identity)
>>
>> Bio::SeqEvolution::Factory - core class to call,
>>          instantiates subclasses, Bio::SeqEvolution::DNASimple for nucleotides
>> Bio::SeqEvolution::EvolutionI::type() - evolution model,
>>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
>>
>>
>> Bio::SeqEvolution::DNASimple - default for nucleotides
>> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>>         e.g. 5 => 5:1, defaults to 1:1
>>         simple alternative to a scoring matrix
>>
>>
>> I am soliciting usual comments and suggestions about naming and minimal
>> functionality.
>>
>>
>>    -Heikki




From maximilianh at gmail.com  Tue Feb 14 10:11:42 2006
From: maximilianh at gmail.com (Maximilian Haeussler)
Date: Tue, 14 Feb 2006 11:11:42 +0100
Subject: [Bioperl-l] [BiO BB] Re:  Tool to mutate DNA sequence
In-Reply-To: <0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<0B84EE38-0BA5-4E56-B35F-C8CBAA342AC4@duke.edu>
Message-ID: <76f031ae0602140211n2a0bbf4fl@mail.gmail.com>

The tool ROSE also evolves sequences on a tree. There is a web
interface and downloadable source at
http://bibiserv.techfak.uni-bielefeld.de/rose/

Max

On 09/02/06, Jason Stajich  wrote:
> Depending on whether or not you want to use evolutionary realistic
> models...
> * evolver which comes with PAML lets you evolve sequences on a tree
> * SeqGen from Andrew Rambaut http://evolve.zoo.ox.ac.uk/software.html?id=seqgen
> also lets you do this
> I believe there are PISE interfaces to both of these at the pasteur
> bioweb site - http://bioweb.pasteur.fr/
>
> -jason
> On Feb 8, 2006, at 11:46 PM, Ryan Golhar wrote:
>
> > Does anyone know of tool to mutate a DNA sequence by a specified
> > amount?
> > For instance, say I have a DNA sequence 1000 bases long, and I want to
> > simulate mutations to make it 75% (or 80%, etc) similar to the
> > original.
> >
> >
> > Ryan
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioinformatics.Org general forum  -  BiO_Bulletin_Board at bioinformatics.org
> https://bioinformatics.org/mailman/listinfo/bio_bulletin_board
>


--
Maximilian Haeussler,
CNRS Gif-sur-Yvette, Paris
tel: +33 6 12 82 76 16
icq: 3825815  -- msn: maximilian.haeussler at hpi.uni-potsdam.de
skype: maximilianhaeussler



From heikki at sanbi.ac.za  Tue Feb 14 16:09:27 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 14 Feb 2006 18:09:27 +0200
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <214316262.20060214093454@princeton.edu>
References: <200602100906.11885.heikki@sanbi.ac.za>
	<200602140859.30136.heikki@sanbi.ac.za>
	<214316262.20060214093454@princeton.edu>
Message-ID: <200602141809.28057.heikki@sanbi.ac.za>


Yegor,

As you said, there are examples of how it is done. It should be possible to
evolve sequences based on a rooted tree. You just walk the tree and evolve
each sequence from its parent. If there is an agreement on how the branch
lengths get translated to mutations, even that could be done. Do you have
any suggestions?

	-Heikki
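
A plain-Perl sketch of the tree walk described above, for illustration only. The nested-hash tree, the node names, and both helper subs are hypothetical. The translation of branch length into mutations is exactly the open question raised here: this sketch simply treats a branch length as the per-site probability of a substitution along that branch, capped at 1.

```perl
use strict;
use warnings;

# Assumption for this sketch only: branch length = per-site substitution
# probability along the branch (capped at 1), all substitutions equally likely.
sub mutate_branch {
    my ($dna, $branch_length) = @_;
    my $p = $branch_length > 1 ? 1 : $branch_length;
    my @out;
    for my $base (split //, $dna) {
        my $new = $base;
        if (rand() < $p) {
            my @others = grep { $_ ne $base } qw(A C G T);
            $new = $others[ int rand @others ];
        }
        push @out, $new;
    }
    return join '', @out;
}

# Walk a rooted tree depth-first; every node evolves from its parent's
# sequence. A node is { name => ..., length => ..., children => [ ... ] }.
sub evolve_tree {
    my ($node, $parent_seq, $result) = @_;
    my $seq = mutate_branch($parent_seq, $node->{length} || 0);
    $result->{ $node->{name} } = $seq;
    evolve_tree($_, $seq, $result) for @{ $node->{children} || [] };
    return $result;
}

my $tree = {
    name     => 'root',
    length   => 0,
    children => [
        { name => 'A', length => 0.1 },
        { name => 'inner', length => 0.05, children => [
            { name => 'B', length => 0.2 },
            { name => 'C', length => 0.2 },
        ] },
    ],
};

my $seqs = evolve_tree($tree, 'ACGT' x 10, {});
print "$_\t$seqs->{$_}\n" for sort keys %$seqs;
```

Because B and C both start from the sequence generated for 'inner', they share whatever substitutions occurred on that internal branch, which is the point of evolving along the tree rather than mutating each leaf independently from the root.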



On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
> Hi,
>
> Just a thought: I really think that in perspective, it would be nice
> to be able to evolve the sequence along a tree of given shape. I think
> PAML's "evolver" has this functionality. I've already been doing this
> in my scripts, but I am not sure how to couple the tree and the
> sequence data properly.
>
> Yegor (George) Bazykin
>
>
> ------------------------------
>
> Tuesday, February 14, 2006, 1:59:29 AM, you wrote:
> > I've committed an interim solution to the sequence evolution problem:
> >
> >     $newseq = Bio::SeqUtils-> evolve
> >         ($seq, $similarity, $transition_transversion_rate);
> >
> > I will go on to transform this code into a fully OO, extensible solution.
> >
> >    -Heikki
> >
> > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
> >> Ryan Golhar's mail got me thinking that we should have a simple
> >> framework for mutating sequences to a desired level. The model can then
> >> be extended to necessary complexity when needed by subclassing.
> >>
> >> To start with, I have been planning:
> >>
> >>
> >> Bio::SeqEvolution::EvolutionI - interface file
> >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
> >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
> >>         (defaults to Bio::PrimarySeq)
> >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
> >> Bio::SeqEvolution::EvolutionI::each_seqs($count)
> >>        - returns an array of $count seqs
> >> Bio::SeqEvolution::EvolutionI::_generate_seq()
> >> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
> >>       converted to probabilities of change internally
> >>
> >>   various methods to define the extent of divergence:
> >>   only one to start with:
> >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
> >>    (= 100% - identity)
> >>
> >> Bio::SeqEvolution::Factory - core class to call,
> >>          instantiates subclasses, Bio::SeqEvolution::DNASimple for
> >>          nucleotides
> >> Bio::SeqEvolution::EvolutionI::type() - evolution model,
> >>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
> >>
> >>
> >> Bio::SeqEvolution::DNASimple - default for nucleotides
> >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
> >>         e.g. 5 => 5:1, defaults to 1:1
> >>         simple alternative to a scoring matrix
> >>
> >>
> >> I am soliciting usual comments and suggestions about naming and minimal
> >> functionality.
> >>
> >>
> >>    -Heikki
>

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From golharam at umdnj.edu  Tue Feb 14 17:01:38 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 14 Feb 2006 12:01:38 -0500
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602141809.28057.heikki@sanbi.ac.za>
Message-ID: <016401c63188$52c9d4b0$2f01a8c0@GOLHARMOBILE1>

Here are my two cents....

1.  Allow sequences to be mutated by some percent amount.
2.  Use mutation patterns implied by PAM matrices or some known models
of mutation.
3.  Have the output show the original sequence and the mutated sequence
so you can easily identify what was mutated and what is conserved.

Ryan
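
Points 1 and 3 above can be sketched in plain Perl as follows. This is only
an illustration under stated assumptions: substitutions are uniform (no PAM
or other model, so point 2 is not covered), there are no indels, and the
function names mutate_to_identity() and show_pair() are invented here rather
than taken from any BioPerl module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Substitute exactly enough distinct positions that the result has the
# requested percent identity to the input (substitutions only, no indels).
sub mutate_to_identity {
    my ($seq, $identity_pct) = @_;
    my @bases   = split //, uc $seq;
    my $n_sites = int( @bases * (100 - $identity_pct) / 100 + 0.5 );
    my @targets = (shuffle 0 .. $#bases)[ 0 .. $n_sites - 1 ];
    for my $i (@targets) {
        # Always pick a base different from the current one.
        my @others = grep { $_ ne $bases[$i] } qw(A C G T);
        $bases[$i] = $others[ int rand @others ];
    }
    return join '', @bases;
}

# Print original and mutated sequence with '|' under conserved positions.
sub show_pair {
    my ($orig, $mut) = @_;
    my $marks = join '', map {
        substr($orig, $_, 1) eq substr($mut, $_, 1) ? '|' : ' '
    } 0 .. length($orig) - 1;
    return "orig $orig\n     $marks\nmut  $mut\n";
}

my $orig = 'ACGTTGCAAC' x 4;                 # 40 bases
my $mut  = mutate_to_identity($orig, 75);    # 10 positions substituted
print show_pair($orig, $mut);
```

Choosing distinct positions and always changing the base keeps the identity
exact; a per-site random model would only hit the target on average.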





From hjm at tacgi.com  Tue Feb 14 17:15:11 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Tue, 14 Feb 2006 09:15:11 -0800
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: 
References: 
Message-ID: <200602140915.11604.hjm@tacgi.com>

Hi Brian,

Thanks very much for the pointers and the speed of your reply and apologies 
for the speed of mine.

This looks good, but what I was looking for was a bioP approach for hooking to 
an API at NCBI or EBI so I could get this info and seqs from them.  In this 
case, speed of retrieval is not critical and I'd rather not download the 
entirety of the sequences to a local disk to hack at them.

I've determined a screen-scraping approach to get them and could script that,
but I thought that bioP had a method for using NCBI's external APIs, tho it
may be that my memory is faulty or the approach is no longer supported due to
overload.

Does NCBI make such APIs available anymore?  I searched a bit for docs on them
but couldn't find anything (unless it's buried in the NCBI toolkit, which I
haven't started to excavate).

Failing that, would SEALS provide such a service? Any PerlPinipeds listening?

Harry






On Sunday 12 February 2006 08:37, Brian Osborne wrote:
> Harry,
>
> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
> from its documentation:
>
>   use Bio::DB::Fasta;
>
>   # create database from directory of fasta files
>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>
>   # simple access (for those without Bioperl)
>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>   my @ids     = $db->ids;
>   my $length   = $db->length('CHROMOSOME_I');
>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>   my $header   = $db->header('CHROMOSOME_I');
>
>   # Bioperl-style access
>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>
>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>   my $seq     = $obj->seq;
>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>
> Do you already have the offsets?
>
> Brian O.
>
> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
> > Hi All,
> >
> > After perusing the tutorial and other docs for an evening, I still
> > can't find the answer to this.  Forgive me if I've missed something
> > obvious.
> >
> > This should not be a novel request, but I've not found it answered.  If
> > bioperl isn't the best way to do this, I'd be grateful to a pointer to a
> > better way, especially if it includes an illuminating bit of code.
> >
> > The problem is to retrieve genomic sequences plus & minus some offset
> > from a locus determined by HUGO keyword or GeneID.  This would be a
> > common followup chore for some extra analysis from a gene expression
> > expt.  Or maybe this is in the DBFetch routines, but I've missed the
> > sequence type to specify...?
> >
> >
> > TIA!

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 


From jason.stajich at duke.edu  Tue Feb 14 18:25:21 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 14 Feb 2006 13:25:21 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
References: 
	<200602140915.11604.hjm@tacgi.com>
Message-ID: <13B3724F-3716-4C4B-95A7-6849EF167A80@duke.edu>

Are you working with species that are in Ensembl?  Is what you need not
provided by Ensembl/EnsMart? Seems like they are doing the best job
integrating gene ids to a central place.

It is not exactly clear what API you are referring to - you can query  
Entrez via Bio::DB::Query::GenBank so if you can construct your query  
via the Entrez syntax you can access and retrieve it in bioperl.

-jason
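
A rough sketch of the Entrez route, assuming the gene symbol and organism
are used as the query terms. The helper names entrez_gene_query() and
fetch_records() and the example symbol are made up for illustration. The
query returns nucleotide records annotated with that gene, not a genomic
slice with flanking offsets, so it is only a starting point. The network
part runs only when a symbol is given on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build an Entrez query string for a gene symbol in a given organism,
# using the [Gene Name] and [Organism] field tags.
sub entrez_gene_query {
    my ($symbol, $organism) = @_;
    return sprintf '%s[Gene Name] AND %s[Organism]', $symbol, $organism;
}

# Run the query through BioPerl and list the matching records.
# Needs BioPerl and network access, so the modules are loaded lazily.
sub fetch_records {
    my ($query_string) = @_;
    require Bio::DB::Query::GenBank;
    require Bio::DB::GenBank;
    my $query = Bio::DB::Query::GenBank->new(-db    => 'nucleotide',
                                             -query => $query_string);
    print "Matches: ", $query->count, "\n";
    my $stream = Bio::DB::GenBank->new->get_Stream_by_query($query);
    while (my $seq = $stream->next_seq) {
        printf "%s\t%d bp\t%s\n",
            $seq->accession_number, $seq->length, $seq->desc;
    }
}

# 'HNRNPA1' is just an example symbol (an assumption, not from the thread).
my $symbol = @ARGV ? $ARGV[0] : 'HNRNPA1';
my $q = entrez_gene_query($symbol, 'Homo sapiens');
print "$q\n";
fetch_records($q) if @ARGV;
```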
On Feb 14, 2006, at 12:15 PM, Harry Mangalam wrote:

> Hi Brian,
>
> Thanks very much for the pointers and the speed of your reply and apologies
> for the speed of mine.
>
> This looks good, but what I was looking for was a bioP approach for hooking to
> an API at NCBI or EBI so I could get this info and seqs from them.  In this
> case, speed of retrieval is not critical and I'd rather not download the
> entirety of the sequences to a local disk to hack at them.
>
> I've determined a screen-scraping approach to get them and could script that,
> but I thought that bioP had a method for using NCBI's external APIs, tho it
> may be that my memory is faulty or the approach is no longer supported due to
> overload.
>
> Does NCBI make such APIs available anymore?  I searched a bit for docs on them
> but couldn't find anything (unless it's buried in the NCBI toolkit, which I
> haven't started to excavate).
>
> Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
> Harry
>
>
>
>
>
>
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>>
>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
>> from its documentation:
>>
>>   use Bio::DB::Fasta;
>>
>>   # create database from directory of fasta files
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>
>>   # simple access (for those without Bioperl)
>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>   my @ids     = $db->ids;
>>   my $length   = $db->length('CHROMOSOME_I');
>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>   my $header   = $db->header('CHROMOSOME_I');
>>
>>   # Bioperl-style access
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>
>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>   my $seq     = $obj->seq;
>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>
>> Do you already have the offsets?
>>
>> Brian O.
>>
>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>> Hi All,
>>>
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this.  Forgive me if I've missed something
>>> obvious.
>>>
>>> This should not be a novel request, but I've not found it answered.  If
>>> bioperl isn't the best way to do this, I'd be grateful to a pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>>
>>> The problem is to retrieve genomic sequences plus & minus some offset
>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>>
>>>
>>> TIA!
>
> -- 
> Cheers, Harry
> Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Tue Feb 14 18:40:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 14 Feb 2006 12:40:31 -0600
Subject: [Bioperl-l] FW:  more on RemoteBlast.pm version 1.2
Message-ID: <000e01c63196$225159d0$15327e82@pyrimidine>

Sorry, forgot to add that I didn't see the regex issue that you mentioned.
It could be a perl-related issue.  Try the fixes I mentioned and see what
happens.
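
The two fixes (quoted string values, dash-prefixed parameter names) can be
seen in isolation with a small stand-in. check_params() below is a
hypothetical function written for this note, not the RemoteBlast source; the
anchored form of the t?blast[pnx] pattern is also an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a constructor that takes named arguments.
# It looks the program up under the key '-prog' and checks its value.
sub check_params {
    my %args = @_;
    my $prog = $args{-prog};
    die "no -prog given (keys seen: @{[ sort keys %args ]})\n"
        unless defined $prog;
    die "'$prog' does not match t?blast[pnx]\n"
        unless $prog =~ /^t?blast[pnx]$/;
    return $prog;
}

# Quoted values and dash-prefixed keys: accepted.
# (With 'use strict', an unquoted value such as blastp would not compile.)
my @good = ( -prog => 'blastp', -data => 'swissprot', -expect => '1e-10' );
print "ok: ", check_params(@good), "\n";

# Keys without the leading dash are different hash keys, so '-prog'
# is never found and the call is rejected.
my @bad = ( prog => 'blastp', data => 'swissprot' );
eval { check_params(@bad); 1 } or print "rejected: $@";
```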

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Tuesday, February 14, 2006 12:36 PM
> To: 'gyang at plantbio.uga.edu'
> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> 
> It's a good habit to always add single quotes around words.  The perl
> interpreter may think a single bare word is a subroutine or perlfunc
> called with no args so will try to find a subroutine named blastp().  My
> debugger actually gives the error that the bare word blastp may conflict
> with a future reserved word.  Like you said, 'use strict' will point that
> out.
> 
> As for the regex, it should match all the blast programs at NCBI (blastp,
> blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
> else passes through.
> 
> So, if you are using the script below, there are several errors.  The bare
> words for $prog and $db need quotes, and the flags for your @params array
> don't have a dash before them.  I get this after adding quotes but before
> adding the dashes to @params:
> 
> C:\Perl\Scripts>test_blast.pl
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG:
> STACK: Error::throw
> STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> STACK: C:\Perl\Scripts\test_blast.pl:15
> -----------------------------------------------------------
> 
> The last line indicates a problem with this line:
> 
> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> 
> Changing the @params to this:
> 
> my @params=( -prog=>$prog,
> 	-data=>$db,
> 	-expect=>$e_val,
> 	-readmethod=>'SearchIO');
> 
> fixes it, and I get output as expected.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> > -----Original Message-----
> > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > Sent: Tuesday, February 14, 2006 11:48 AM
> > To: Chris Fields; bioperl-l at lists.open-bio.org
> > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >
> > Hi, Chris,
> > When I tried with the perldoc script, it did not work either. First it
> > says $prog cannot be a bare word if I "use strict". I added quotes on the
> > words, then it says the value for $prog does not match expression
> > t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The script
> > is shown below. Why is the expression "t?blast[pnx]"?
> >
> > #!/usr/bin/perl
> >
> > use Bio::SeqIO;
> > use Bio::Seq;
> > use Bio::Tools::Run::RemoteBlast;
> > use Bio::SearchIO;
> >
> >
> > my $prog=blastp;
> > my $db=swissprot;
> > my $e_val=1e-10;
> > my @params=( prog=>$prog,
> > 	data=>$db,
> > 	expect=>$e_val,
> > 	readmethod=>'SearchIO');
> > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >
> > my $v = 1;
> >
> > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> >
> > while (my $input = $str->next_seq()){
> >   #Blast a sequence against a database:
> >   #Alternatively, you could  pass in a file with many
> >   #sequences rather than loop through sequence one at a time
> >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> >   #and swap the two lines below for an example of that.
> >   my $r = $factory->submit_blast($input);
> >   #my $r = $factory->submit_blast('amino.fa');
> >   print STDERR "waiting..." if( $v > 0 );
> >   while ( my @rids = $factory->each_rid ) {
> >     foreach my $rid ( @rids ) {
> >       my $rc = $factory->retrieve_blast($rid);
> >       if( !ref($rc) ) {
> >         if( $rc < 0 ) {
> >           $factory->remove_rid($rid);
> >         }
> >         print STDERR "." if ( $v > 0 );
> >         sleep 5;
> >       } else {
> >         my $result = $rc->next_result();
> >         #save the output
> >         my $filename = $result->query_name()."\.out";
> >         $factory->save_output($filename);
> >         $factory->remove_rid($rid);
> >         print "\nQuery Name: ", $result->query_name(), "\n";
> >         while ( my $hit = $result->next_hit ) {
> >           next unless ( $v > 0);
> >           print "\thit name is ", $hit->name, "\n";
> >           while( my $hsp = $hit->next_hsp ) {
> >             print "\t\tscore is ", $hsp->score, "\n";
> >           }
> >         }
> >       }
> >     }
> >   }
> > }
> >
> > Thank you for your help!
> >
> >
> > Guojun
> > Department of Plant Biology
> > University of Georgia
> >
> > ----- Original Message -----
> > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > To: gyang at plantbio.uga.edu
> > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >
> >
> > > Try two things:
> > >
> > > 1)  Use a much simpler script, like the one in 'perldoc
> > > Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something wrong
> > > with the logic in your subroutine:
> > >
> > > my $v = 1;
> > >
> > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >
> > > while (my $input = $str->next_seq()){
> > >   #Blast a sequence against a database:
> > >   #Alternatively, you could  pass in a file with many
> > >   #sequences rather than loop through sequence one at a time
> > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >   #and swap the two lines below for an example of that.
> > >   my $r = $factory->submit_blast($input);
> > >   #my $r = $factory->submit_blast('amino.fa');
> > >   print STDERR "waiting..." if( $v > 0 );
> > >   while ( my @rids = $factory->each_rid ) {
> > >     foreach my $rid ( @rids ) {
> > >       my $rc = $factory->retrieve_blast($rid);
> > >       if( !ref($rc) ) {
> > >         if( $rc < 0 ) {
> > >           $factory->remove_rid($rid);
> > >         }
> > >         print STDERR "." if ( $v > 0 );
> > >         sleep 5;
> > >       } else {
> > >         my $result = $rc->next_result();
> > >         #save the output
> > >         my $filename = $result->query_name()."\.out";
> > >         $factory->save_output($filename);
> > >         $factory->remove_rid($rid);
> > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > >         while ( my $hit = $result->next_hit ) {
> > >           next unless ( $v > 0);
> > >           print "\thit name is ", $hit->name, "\n";
> > >           while( my $hsp = $hit->next_hsp ) {
> > >             print "\t\tscore is ", $hsp->score, "\n";
> > >           }
> > >         }
> > >       }
> > >     }
> > >   }
> > > }
> > >
> > > 2) Try the RemoteBlast from Bugzilla and see if that works.  It really
> > > shouldn't make that much of a difference, but I noticed that the CVS
> > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> > > released; the Bugzilla version is based off CVS.
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > > > -----Original Message-----
> > > > From: bioperl-l-bounces at lists.open-bio.org
> > > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > Sent: Monday, February 13, 2006 3:00 PM
> > > > To: bioperl-l at lists.open-bio.org
> > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >
> > > > Thanks, Chris,
> > > > I installed version 1.5.1 and replaced the blast.pm file with the one from
> > > > your bug report. The running version is 1.5 when I use the command you
> > > > sent me. But when I tried the script, it doesn't change much. My
> > > > remoteblast code (portion) is here:
> > > >
> > > > sub search {
> > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
> > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > > > 			      -id=>"query",
> > > > 			      -desc=>"new seq");
> > > > my $len=$query->length();
> > > > @db=('nr','htgs','wgs');
> > > > foreach my $db (@db) {
> > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> > > > 						'-data' =>"$db",
> > > > 						'-expect'=>"$E_value");
> > > >
> > > > my $blast_report = $factory->submit_blast($query);
> > > >
> > > > my @rids = $factory->each_rid();
> > > > foreach my $rid ( @rids ) {
> > > >     print STDERR "$rid\n";
> > > > }
> > > > # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > > > print STDERR "waiting...";
> > > > sleep 60;
> > > >
> > > > foreach my $rid ( @rids ) {
> > > >     my $rc = $factory->retrieve_blast($rid);
> > > >     while (!ref($rc) ) {
> > > > 	if( $rc < 0 ) {
> > > > # retrieve_blast returns -1 on error
> > > > 	    $factory->remove_rid($rid);
> > > > 	    print "Error!\n";
> > > > 	    send_error($email,$function,$seqname,$queryname[$ST]);
> > > > 	    die "Can't retrieve $rid";
> > > > 	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
> > > > 	    sleep 60;
> > > > 	    $rc = $factory->retrieve_blast($rid);
> > > > 	}
> > > >     }
> > > >     if (ref($rc)) {
> > > > 	print STDERR "Done.\n";
> > > > 	 while( my $result = $rc->next_result) {
> > > > 	    while( my $hit = $result->next_hit()) {
> > > > 	    	$hit_name=$hit->name;
> > > > 		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > > > 		$name=$1;
> > > > 		@left_plus_start=();
> > > > 		@left_plus_end=();
> > > > 		@left_minus_start=();
> > > > 		@left_minus_end=();
> > > > 		@right_plus_start=();
> > > > 		@right_plus_end=();
> > > > 		@right_minus_start=();
> > > > 		@right_minus_end=();
> > > >
> > > > 		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > > > 		while( my $hsp = $hit->next_hsp()) {
> > > > ......
> > > >
> > > > It was working quite well before around October last year, but it has
> > > > stopped since then. When a submission is sent via a webpage, the cgi
> > > > starts to work and use a memory of ~20 Mb. Then it hangs there, finally
> > > > the expected email is received but without real results although it does
> > > > contain something from other parts of the script. Apparently the search
> > > > sub did not return anything (I know there is something that should be
> > > > returned). Is it also possible the format of the NCBI output for each
> > > > result has changed?
> > > > Thank you,
> > > > Guojun
> > > >
> > > > Department of Plant Biology
> > > > University of Georgia
> > > >
> > > > ----- Original Message -----
> > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >
> > > > > How do you know two versions are installed (i.e. how are you checking
> > > > > the version)?  Do you have two complete bioperl distributions (in two
> > > > > separate directories) or are you looking in modules?  Here's the way
> > > > > to check the version (from the FAQ):
> > > > >
> > > > > perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> > > > >
> > > > > If you have two full bioperl distributions on your computer, normally
> > > > > only one will be in use unless you have explicitly set the environment
> > > > > variable PERL5LIB.  The PERL5LIB directories will be searched first
> > > > > before your normal perl directory list (@INC) is searched.  You MAY
> > > > > get some mixing then, but only if perl can't find a particular module
> > > > > in the path designated in PERL5LIB; then it will progress through the
> > > > > directories listed in @INC.  This may happen if a module is unique to
> > > > > a particular release, but shouldn't happen for the majority of
> > > > > modules, including RemoteBlast.  You can check what @INC and PERL5LIB
> > > > > are set to by using 'perl -V'.  @INC will differ depending on your OS,
> > > > > perl build, etc.
> > > > >
> > > > > Regardless, if you follow the directions for installing bioperl for
> > > > > your system ('perl Makefile.PL', 'make', 'make test', 'make install',
> > > > > unless you explicitly change the installation directory when using
> > > > > 'perl Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a
> > > > > problem as it will install the Bioperl distribution you downloaded
> > > > > over the old version in @INC.  See this page:
> > > > >
> > > > > http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > > > >
> > > > > for more details.
> > > > >
> > > > > Christopher Fields
> > > > > Postdoctoral Researcher - Switzer Lab
> > > > > Dept. of Biochemistry
> > > > > University of Illinois Urbana-Champaign
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: bioperl-l-bounces at lists.open-bio.org
> > > > > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > Sent: Monday, February 13, 2006 12:32 PM
> > > > > > To: bioperl-l at lists.open-bio.org
> > > > > > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > >
> > > > > > Hi, Chris,
> > > > > > I do have different versions of bioperl on my Linux machine (1.4. and
> > > > > > 1.5.0), this may be the problem. Should I just install bioperl-1.5.1
> > > > > > or I need to uninstall and remove the previous versions. I could not
> > > > > > find any hint on uninstalling bioperl on linux. Could you please give
> > > > > > me some suggestion?
> > > > > > Thanks,
> > > > > > Guojun
> > > > > >
> > > > > > Department of Plant Biology
> > > > > > University of Georgia
> > > > > >       _____
> > > > > >
> > > > > >   From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > > Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > > > version 1.28
> > > > > >
> > > > > > If you're using RemoteBlast 1.28, then you've likely updated from CVS
> > > > > > which isn't the latest fix.
> > > > > >
> > > > > > Make sure that you check the following:
> > > > > >
> > > > > > 1) Always post to the mailing list:
> > > > > > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > > > > >
> > > > > > 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> > > > > > installed first.  Perform a clean installation; do not upgrade only
> > > > > > Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> > > > > > guarantee that mixing modules from old and new distributions (1.4 and
> > > > > > 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > > > > > installation will allow text output from BLAST v.2.2.12 to be saved
> > > > > > and parsed; it will not parse the newest BLAST text output from NCBI
> > > > > > (v2.2.13) but it should still save it. I believe as long as
> > > > > > next_results() isn't called, it will work.
> > > > > >
> > > > > > 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text
> > > > > > output are NOT in CVS; they haven't been cleared and checked in by
> > > > > > Roger Hall (who's now taking care of RemoteBlast) and the powers that
> > > > > > be (Jason or whomever is in charge of Bio::SearchIO).  They can be
> > > > > > found in Bugzilla:
> > > > > >
> > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > >
> > > > > > The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> > > > > > saving XML output, so isn't necessary if you don't plan on using this
> > > > > > option.  And, remember, they haven't been committed yet to CVS, which
> > > > > > means that the final version will change to reflect the new version.
> > > > > >
> > > > > > Christopher Fields
> > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > Dept. of Biochemistry
> > > > > > University of Illinois Urbana-Champaign
> > > > > >
> > > > > >     _____
> > > > > >
> > > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > > > > Sent: Monday, February 13, 2006 9:26 AM
> > > > > > To: Chris Fields
> > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > > > version 1.28
> > > > > >
> > > > > > Hi, Chris
> > > > > >
> > > > > > Thanks for your suggestion, however, it doesn't seem to work for my
> > > > > > cgi even after I replace both blast.pm and RemoteBlast.pm. I didn't
> > > > > > even get any RID. Is there any suggestion?
> > > > > >
> > > > > > Guojun
> > > > > >
> > > > > > Guojun Yang
> > > > > > Department of Plant Biology
> > > > > > University of Georgia
> > > > > > Tel: 706-542-1857
> > > > > > Fax: 706-542-1805
> > > > > > http://www.arches.uga.edu/~guojun
> > > > > >     _____
> > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > > > > > Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > version
> > > > > > 1.28
> > > > > > >
> > > > > > > I would say give the new code a try, but realize that it hasn't been
> > > > > > > checked in (like I said below). I will try going over the modified
> > > > > > > Bio::SearchIO::blast again this weekend to see if there is anything I
> > > > > > > might have missed. The changed order in the header of BLAST text
> > > > > > > output has me a bit worried that it might not catch everything, but it
> > > > > > > at least doesn't hang in the while() loop I described in the bug
> > > > > > > report below (bug #1934) and seems to process everything fine.
> > > > > > >
> > > > > > > If you want more stability in the code, you might consider changing
> > > > > > > over to XML output and parsing with Bio::SearchIO::blastxml. There are
> > > > > > > some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that
> > > > > > > accommodate saving XML output, but I believe it parses everything
> > > > > > > regardless. If you look back the last month or so there has been a bit
> > > > > > > of discussion here about it. Jason describes a bit on how to set up
> > > > > > > RemoteBlast for XML:
> > > > > > >
> > > > > > > http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> > > > > > > > Christopher Fields
> > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > Dept. of Biochemistry
> > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > -----Original Message-----
> > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > Sent: Friday, February 03, 2006 1:45 PM
> > > > > > > To: bioperl-l at bioperl.org
> > > > > > > Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> > version
> > > > 1.28
> > > > > > >
> > > > > > > > Hi, Everybody,
> > > > > > > > I see this post and am wondering if this is the reason for the
> > > > > > > > malfunctioning of my webserver. We set up a webserver named MAK, for
> > > > > > > > MITE sequence analysis. It was working very well until around
> > > > > > > > November 2005, when it stopped returning any result (the site is fine
> > > > > > > > and seems to be doing something after submission). In the CGI script,
> > > > > > > > I used remoteblast (that work was done in 2003) to do searches. I
> > > > > > > > currently do not have access to the server because I moved. Quite a
> > > > > > > > few people sent emails to us about its malfunctioning. Is there any
> > > > > > > > suggestion on fixing the problem? Should I simply ask that
> > > > > > > > remoteblast.pm be replaced with the new version?
> > > > > > > > Thanks a lot,
> > > > > > > > Guojun
> > > > > > >
> > > > > > > Department of Plant Biology
> > > > > > > University of Georgia
> > > > > > > Tel: 706-542-1857
> > > > > > > Fax: 706-542-1805
> > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > > _____
> > > > > > >
> > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
> > Jian'
> > > > > > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> [mailto:bioperl-
> > > > > > > l at bioperl.org]
> > > > > > > Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > >
> > > > > > > > Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS.
> > > > > > > > It will work for saving text output. However, it will not parse
> > > > > > > > anything using next_result (it will likely hang) and will not save
> > > > > > > > XML format. See these bugs:
> > > > > > > >
> > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > > >
> > > > > > > > for explanations and possible fixes (changes to RemoteBlast and
> > > > > > > > Bio::SearchIO::blast). Note that these haven't been checked in yet so
> > > > > > > > are still not included in bioperl-live; they may be further modified
> > > > > > > > before committing to CVS. If you're not worried about XML, you could
> > > > > > > > just try the first fix, which is a change to SearchIO::blast.
> > > > > > >
> > > > > > > > Nagesh, I remember you posting to the list a month ago using a script
> > > > > > > > which had problems; the script you used saves the output but doesn't
> > > > > > > > actually parse it (i.e. you don't use next_result() to go through the
> > > > > > > > data). Is the version of BLAST in your text output 2.2.12 or 2.2.13?
> > > > > > > > Have you tried parsing the output using "-readmethod => SearchIO" or
> > > > > > > > "-readmethod => blast" using your version of RemoteBlast and method
> > > > > > > > next_result()? Like below (from perldoc):
> > > > > > >
> > > > > > > > while ( my @rids = $factory->each_rid ) {
> > > > > > > >   foreach my $rid ( @rids ) {
> > > > > > > >     my $rc = $factory->retrieve_blast($rid);
> > > > > > > >     if( !ref($rc) ) {
> > > > > > > >       if( $rc < 0 ) {
> > > > > > > >         $factory->remove_rid($rid);
> > > > > > > >       }
> > > > > > > >       print STDERR "." if ( $v > 0 );
> > > > > > > >       sleep 5;
> > > > > > > >     } else {                           # parsing starts here
> > > > > > > >       my $result = $rc->next_result(); # it should hang here
> > > > > > > >       #save the output
> > > > > > > >       my $filename = $result->query_name()."\.out";
> > > > > > > >       $factory->save_output($filename);
> > > > > > > >       $factory->remove_rid($rid);
> > > > > > > >       print "\nQuery Name: ", $result->query_name(), "\n";
> > > > > > > >       while ( my $hit = $result->next_hit ) {
> > > > > > > >         next unless ( $v > 0);
> > > > > > > >         print "\thit name is ", $hit->name, "\n";
> > > > > > > >         while( my $hsp = $hit->next_hsp ) {
> > > > > > > >           print "\t\tscore is ", $hsp->score, "\n";
> > > > > > > >         }
> > > > > > > >       }
> > > > > > > >     }
> > > > > > > >   }
> > > > > > > > }
> > > > > > >
> > > > > > >
> > > > > > > > My script hung if I used next_result() in any way prior to the fixes.
> > > > > > > > I want to see how many others are having the same issues with parsing
> > > > > > > > using the CVS version of bioperl-live.
> > > > > > >
> > > > > > > Christopher Fields
> > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > Dept. of Biochemistry
> > > > > > > University of Illinois Urbana-Champaign
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> l-
> > > > > > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > > > > > > > Sent: Thursday, February 02, 2006 7:24 PM
> > > > > > > > To: Huang Jian; bioperl-l
> > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > >
> > > > > > > > Hi Huang,
> > > > > > > > > Thanks for the message. The older version of RemoteBlast.pm works on
> > > > > > > > > the logic of checking the temporary file size to determine whether
> > > > > > > > > the Blast results are ready. This condition is not getting satisfied,
> > > > > > > > > maybe due to some changes brought about by NCBI. I had this problem
> > > > > > > > > recently and figured out that the solution was to use the latest
> > > > > > > > > version, which has this problem fixed (it does not use the file size
> > > > > > > > > logic any more) and which is not yet included in the BioPerl package.
> > > > > > > > Cheers
> > > > > > > > Nagesh
> > > > > > > >
> > > > > > > > Huang Jian wrote:
> > > > > > > >
> > > > > > > > > Dear Nagesh,
> > > > > > > > >
> > > > > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you
> > > > > > > > > > sent me. Now it works perfectly!!!
> > > > > > > > >
> > > > > > > > > Thank you!!
> > > > > > > > >
> > > > > > > > > Huang
> > > > > > > > >
> > > > > > > > > ----- Original Message ----- From: "Nagesh Chakka"
> > > > > > > > > 
> > > > > > > > > To: "Huang Jian" ; "bioperl-l"
> > > > > > > > > 
> > > > > > > > > Sent: Friday, February 03, 2006 7:48 AM
> > > > > > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the
> net,
> > so
> > > > still
> > > > > > > > > via email
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > >> Hi Huang,
> > > > > > > > > >> I see that you are submitting a sequence for a remote blast search.
> > > > > > > > > >> Can you check if the RemoteBlast.pm being used is v 1.28
> > > > > > > > > >> (2005/12/09)? If not, I have attached it to this email; try
> > > > > > > > > >> replacing the old one, which has a bug, with it.
> > > > > > > > > >> Let me know if it works.
> > > > > > > > > >> Nagesh
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Bioperl-l mailing list
> > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > >



From sdavis2 at mail.nih.gov  Tue Feb 14 20:02:59 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Tue, 14 Feb 2006 15:02:59 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
Message-ID: 

You can get the upstream regions for genes via the table browser at
UCSC.  If you want to do it yourself, just download their refGene table (as
a tab-delimited text file) that includes the HUGO gene name.  Then, use the
method given by Brian to look up the locations.  The genome just isn't THAT
big to download and to store locally.  Note that most of the big sites (like
NCBI, for example) impose restrictions on the number and timing of hits, so
utilizing them for high-throughput analysis (like for gene expression
studies) is not always feasible.  I have found that having the data locally
is almost always better.
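
A minimal sketch of that lookup step, assuming the usual refGene.txt column
order (chromosome, strand, txStart and txEnd in columns 3 to 6, gene symbol in
column 13). The sample row, the 500 bp flank and the helper name
upstream_window are made up for illustration, so check the layout against the
table's .sql description before relying on it:

```perl
use strict;
use warnings;

# Assumed refGene.txt layout (0-based columns): 2 = chrom, 3 = strand,
# 4 = txStart (0-based), 5 = txEnd, 12 = gene symbol (name2).
sub upstream_window {
    my ($line, $flank) = @_;
    chomp $line;
    my @f = split /\t/, $line;
    return unless @f > 12;                      # not a full refGene row
    my ($chrom, $strand, $tx_start, $tx_end, $symbol) = @f[2, 3, 4, 5, 12];

    # Work out 1-based, inclusive coordinates of the flank that lies
    # upstream of the transcription start on the gene's own strand.
    my ($from, $to);
    if ($strand eq '+') {
        ($from, $to) = ($tx_start - $flank + 1, $tx_start);
        $from = 1 if $from < 1;
        return if $to < 1;                      # gene starts at base 1
    } else {
        # May run past the chromosome end; clip against its length if known.
        ($from, $to) = ($tx_end + 1, $tx_end + $flank);
    }
    return ($symbol, $chrom, $strand, $from, $to);
}

# Hypothetical row, for illustration only.
my $row = join "\t", 0, 'NM_000000', 'chr1', '+', 1000, 5000, 1200, 4800,
                     1, '1000,', '5000,', 0, 'GENE1';
print join("\t", upstream_window($row, 500)), "\n";
```

The chromosome name and coordinates can then be handed to the Bio::DB::Fasta
lookup Brian described; for a minus-strand gene, passing the larger coordinate
first should return the reverse strand, as in his $revseq example.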

Sean
 


On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:

> Hi Brian,
> 
> Thanks very much for the pointers and the speed of your reply and apologies
> for the speed of mine.
> 
> This looks good, but what I was looking for was a bioP approach for hooking to
> an API at NCBI or EBI so I could get this info and seqs from them.  In this
> case, speed of retrieval is not critical and I'd rather not download the
> entirety of the sequences to a local disk to hack at them.
> 
> I've determined a screen-scraping approach to get them and could script that,
> but I thought that bioP had a method for using NCBI's external API's, tho it
> may be that my memory is faulty or the approach is no longer supported due to
> overload.  
> 
> Does NCBI make such APIs available anymore?  I searched a bit for docs on them
> but couldn't find anything (unless it's buried in the NCBI toolkit, which I
> haven't started to excavate).
> 
> Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
> 
> Harry
> 
> 
> 
> 
> 
> 
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>> 
>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
>> from its documentation:
>> 
>>   use Bio::DB::Fasta;
>> 
>>   # create database from directory of fasta files
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   # simple access (for those without Bioperl)
>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>   my @ids     = $db->ids;
>>   my $length   = $db->length('CHROMOSOME_I');
>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>   my $header   = $db->header('CHROMOSOME_I');
>> 
>>   # Bioperl-style access
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>   my $seq     = $obj->seq;
>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>> 
>> Do you already have the offsets?
>> 
>> Brian O.
>> 
>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>> Hi All,
>>> 
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this.  Forgive me if I've missed something
>>> obvious.
>>> 
>>> This should not be a novel request, but I've not found it answered.  If
>>> bioperl isn't the best way to do this, I'd be grateful for a pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>> 
>>> The problem is to retrieve genomic sequences plus & minus some offset
>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>> 
>>> 
>>> TIA!




From cjfields at uiuc.edu  Tue Feb 14 20:32:42 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 14 Feb 2006 14:32:42 -0600
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki,
	problems with bioperl-db
Message-ID: <001201c631a5$ce7496f0$15327e82@pyrimidine>

Hilmar, 

Good News: I've added a section to the bioperl wiki on installing bioperl-db
in Windows:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db

Bad News:  There's a new problem now. I updated from CVS yesterday; I walked
through the steps and ran 'nmake test', with everything passing fine.
However, load_seqdatabase.pl is extremely slow; it's loading a sequence
every 5 minutes or so.  I noticed (when using '-debug') that it is hanging
up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a database,
load the biosql schema, and load sequences w/o loading taxonomy, the problem
goes away.

Here's the debugging output (I cut it off at the point it hangs up):
----------------------------------------------------------------------------
-------------------------
C:\Perl\src\bioperl\bioperl-db\scripts\biosql>load_seqdatabase.pl -driver
mysql -namespace test -dbname biosql -dbuser root -dbpass ********** -format
genbank  -debug NP_252217.gpt
Loading NP_252217.gpt ...
attempting to load adaptor class for Bio::Seq::RichSeq
        attempting to load module Bio::DB::BioSQL::RichSeqAdaptor
attempting to load adaptor class for Bio::Seq
        attempting to load module Bio::DB::BioSQL::SeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqAdaptor
attempting to load adaptor class for Bio::Species
        attempting to load module Bio::DB::BioSQL::SpeciesAdaptor
instantiating adaptor class Bio::DB::BioSQL::SpeciesAdaptor
attempting to load adaptor class for Bio::Annotation::Collection
        attempting to load module Bio::DB::BioSQL::CollectionAdaptor
attempting to load adaptor class for Bio::Root::Root
        attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::Root::RootI
        attempting to load module Bio::DB::BioSQL::RootIAdaptor
        attempting to load module Bio::DB::BioSQL::RootAdaptor
attempting to load adaptor class for Bio::AnnotationCollectionI
        attempting to load module
Bio::DB::BioSQL::AnnotationCollectionIAdaptor
        attempting to load module
Bio::DB::BioSQL::AnnotationCollectionAdaptor
instantiating adaptor class Bio::DB::BioSQL::AnnotationCollectionAdaptor
attempting to load adaptor class for Bio::Annotation::TypeManager
        attempting to load module Bio::DB::BioSQL::TypeManagerAdaptor
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for Bio::Annotation::SimpleValue
        attempting to load module Bio::DB::BioSQL::SimpleValueAdaptor
instantiating adaptor class Bio::DB::BioSQL::SimpleValueAdaptor
attempting to load adaptor class for Bio::Annotation::Reference
        attempting to load module Bio::DB::BioSQL::ReferenceAdaptor
instantiating adaptor class Bio::DB::BioSQL::ReferenceAdaptor
attempting to load adaptor class for Bio::Annotation::Comment
        attempting to load module Bio::DB::BioSQL::CommentAdaptor
instantiating adaptor class Bio::DB::BioSQL::CommentAdaptor
attempting to load adaptor class for Bio::Annotation::DBLink
        attempting to load module Bio::DB::BioSQL::DBLinkAdaptor
instantiating adaptor class Bio::DB::BioSQL::DBLinkAdaptor
attempting to load adaptor class for Bio::PrimarySeq
        attempting to load module Bio::DB::BioSQL::PrimarySeqAdaptor
instantiating adaptor class Bio::DB::BioSQL::PrimarySeqAdaptor
attempting to load adaptor class for Bio::SeqFeature::Generic
        attempting to load module Bio::DB::BioSQL::GenericAdaptor
attempting to load adaptor class for Bio::SeqFeatureI
        attempting to load module Bio::DB::BioSQL::SeqFeatureIAdaptor
        attempting to load module Bio::DB::BioSQL::SeqFeatureAdaptor
instantiating adaptor class Bio::DB::BioSQL::SeqFeatureAdaptor
attempting to load adaptor class for Bio::Location::Simple
        attempting to load module Bio::DB::BioSQL::SimpleAdaptor
attempting to load adaptor class for Bio::Location::Atomic
        attempting to load module Bio::DB::BioSQL::AtomicAdaptor
attempting to load adaptor class for Bio::LocationI
        attempting to load module Bio::DB::BioSQL::LocationIAdaptor
        attempting to load module Bio::DB::BioSQL::LocationAdaptor
instantiating adaptor class Bio::DB::BioSQL::LocationAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load adaptor class for BioNamespace
        attempting to load module Bio::DB::BioSQL::BioNamespaceAdaptor
instantiating adaptor class Bio::DB::BioSQL::BioNamespaceAdaptor
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
attempting to load driver for adaptor class
Bio::DB::BioSQL::BioNamespaceAdaptor
attempting to load driver for adaptor class
Bio::DB::BioSQL::BasePersistenceAdaptor
Using Bio::DB::BioSQL::mysql::BasePersistenceAdaptorDriver as driver peer
for Bio::DB::BioSQL::BioNamespaceAdaptor
preparing UK select statement: SELECT biodatabase.biodatabase_id,
biodatabase.name, biodatabase.authority FROM biodatabase WHERE name = ?
BioNamespaceAdaptor: binding UK column 1 to "test" (namespace)
preparing INSERT statement: INSERT INTO biodatabase (name, authority) VALUES
(?, ?)
BioNamespaceAdaptor::insert: binding column 1 to "test" (namespace)
BioNamespaceAdaptor::insert: binding column 2 to "" (authority)
attempting to load driver for adaptor class Bio::DB::BioSQL::SpeciesAdaptor
Using Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver as driver peer for
Bio::DB::BioSQL::SpeciesAdaptor
preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND ncbi_taxon_id =
?
SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)  
----------------------------------------------------------------------------
-------------------------

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From osborne1 at optonline.net  Tue Feb 14 21:32:42 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 14 Feb 2006 16:32:42 -0500
Subject: [Bioperl-l] game xml SeqIO
In-Reply-To: <43F1043A.2000205@cornell.edu>
Message-ID: 

Robert,

It looks like you're right that this data isn't handled by SeqIO/game. If
you'd like to add this then feel free to do it, the modified files or
patches can be submitted to bugzilla.bioperl.org. If you take this on then
please add a test or 2 to t/game.t as well.

Yes, Bio::SeqFeature::Computation sounds right - does it match the data
you're trying to parse? SeqFeature::Generic is the most commonly used, and
it's flexible, but if another type of SeqFeature fits your data more
precisely then that's the one you should use.

Brian O.


On 2/13/06 5:12 PM, "Robert Buels"  wrote:

> Hi all,
> 
> Currently, the SeqIO for doing GAME XML does not seem to support writing
> (or reading?)  elements.  Am I correct?
> 
> If I am, are there any plans to add this functionality?  Can I help / do it?
> 
> If there are plans to add this, how would one distinguish SeqFeatures
> that should be rendered as  from SeqFeatures
> that should be rendered as ?  Would we do that with
> Bio::SeqFeature::Computation?  I assume that a given Seq can have
> SeqFeatures of different types associated with it (I don't know, I'm a
> bioperl newb).
> 
> Rob




From saldroubi at yahoo.com  Wed Feb 15 03:54:42 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Tue, 14 Feb 2006 19:54:42 -0800 (PST)
Subject: [Bioperl-l] Error using Bio::Matrix::GenericMatrix
Message-ID: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com>

All,
 
 I am trying to use Bio::Matrix::GenericMatrix module.  
 I simply put this line in my program:
     use Bio::Matrix::GenericMatrix;
 
 but I get the following error:
 
 Can't locate Bio/Matrix/GenericMatrix.pm in @INC (@INC contains: /usr/lib/perl5/5.8.6/i586-linux-thread-multi /usr/lib/perl5/5.8.6 /usr/lib/perl5/site_perl/5.8.6/i586-linux-thread-multi /usr/lib/perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl/5.8.6/i586-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl .) at sf.pl line 18.
 BEGIN failed--compilation aborted at sf.pl line 18.
 
 Using find, I located a module called Generic.pm in this directory
     /usr/lib/perl5/site_perl/5.8.6/Bio/Matrix
 
 Could someone tell me why it is not working?  I have no trouble including these modules in my file.  
     use Bio::SeqIO;
     use Bio::DB::GenBank;
 
 Thank you. 
 
   

Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From jason.stajich at duke.edu  Wed Feb 15 04:10:56 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 14 Feb 2006 23:10:56 -0500
Subject: [Bioperl-l] Error using Bio::Matrix::GenericMatrix
In-Reply-To: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com>
References: <20060215035442.45215.qmail@web34313.mail.mud.yahoo.com>
Message-ID: 

try:
use Bio::Matrix::Generic;

Apparently I screwed up the SYNOPSIS.  Fixed that just now.
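
For anyone hitting the same "Can't locate ..." message: Perl derives the file
it looks for directly from the package name, so it helps to check which name
actually exists under @INC. A small sketch in plain Perl (module_path and
find_module are hypothetical helper names, not part of BioPerl):

```perl
use strict;
use warnings;

# Turn a package name into the relative path that "use" searches for.
sub module_path {
    my ($module) = @_;
    (my $path = $module) =~ s{::}{/}g;
    return "$path.pm";
}

# Return the first plain directory in @INC holding the module, or undef.
sub find_module {
    my ($module) = @_;
    my $rel = module_path($module);
    for my $dir (grep { !ref } @INC) {
        return "$dir/$rel" if -e "$dir/$rel";
    }
    return;
}

for my $name ('Bio::Matrix::Generic', 'Bio::Matrix::GenericMatrix') {
    my $found = find_module($name);
    printf "%-28s %s\n", $name, defined $found ? $found : 'not found in @INC';
}
```

What it prints depends on the local installation; the point is only that the
name in the use line has to match a file name such as Bio/Matrix/Generic.pm.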

-jason
On Feb 14, 2006, at 10:54 PM, Sam Al-Droubi wrote:

> All,
>
>  I am trying to use Bio::Matrix::GenericMatrix module.
>  I simply put this line in my program:
>      use Bio::Matrix::GenericMatrix;
>
>  but I get the following error:
>
>  Can't locate Bio/Matrix/GenericMatrix.pm in @INC (@INC contains: / 
> usr/lib/perl5/5.8.6/i586-linux-thread-multi /usr/lib/perl5/5.8.6 / 
> usr/lib/perl5/site_perl/5.8.6/i586-linux-thread-multi /usr/lib/ 
> perl5/site_perl/5.8.6 /usr/lib/perl5/site_perl /usr/lib/perl5/ 
> vendor_perl/5.8.6/i586-linux-thread-multi /usr/lib/perl5/ 
> vendor_perl/5.8.6 /usr/lib/perl5/vendor_perl .) at sf.pl line 18.
>  BEGIN failed--compilation aborted at sf.pl line 18.
>
>  I found this module using find which is called Generic.pm in this  
> directory
>      /usr/lib/perl5/site_perl/5.8.6/Bio/Matrix
>
>  Could someone tell me why it is not working.  I have no trouble  
> including these modules in my file.
>      use Bio::SeqIO;
>      use Bio::DB::GenBank;
>
>  Thank you.
>
>
>
> Sincerely,
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From daniel.lang at biologie.uni-freiburg.de  Wed Feb 15 10:35:40 2006
From: daniel.lang at biologie.uni-freiburg.de (Daniel Lang)
Date: Wed, 15 Feb 2006 11:35:40 +0100
Subject: [Bioperl-l] distmat matrix
Message-ID: <43F303FC.9000806@biologie.uni-freiburg.de>

Hi,

I need to go through an uncorrected distmat matrix (EMBOSS, run locally)
to filter sequences from an MSA.
I had a look around and didn't find an obvious candidate. Before I start
writing something of my own...
Is there a bioperl parser for reading distmat matrices or can I trick
the Bio::MapIO parsers for scoring or PHYLIP into doing so?
Of course, if anyone knows a tool supported by bioperl that generates an
uncorrected distance matrix of protein MSAs, that would also be OK
for me :)

I have no experience with the Pise
(Bio::Tools::Run::PiseApplication::distmat) stuff, but as I understand
it it's only to execute the application on a remote web server? Or can I
solve my task with Pise?

Thanks in advance!

Daniel



From praveecbt at yahoo.co.in  Wed Feb 15 08:57:44 2006
From: praveecbt at yahoo.co.in (Praveen Raj)
Date: Wed, 15 Feb 2006 08:57:44 +0000 (GMT)
Subject: [Bioperl-l] Help
Message-ID: <20060215085744.14911.qmail@web8711.mail.in.yahoo.com>

Dear Peter Schattner Sir,

I have one problem with the profile_align() method of the Clustalw object.

I have written the code like this:
   ......
  12 @seq_array=($seqobj1,$seqobj2,$seqobj3);
  13 $seq_array_ref=\@seq_array;
  14 $aln=$factory->align($seq_array_ref);
  15 print $out $aln;   # this works fine
  16 $sen = Bio::Seq->new(-display_id => '>gi|userdata|',
  17                      -seq => "MTKKPGGPGKNRA....",
  18                      -format => "fasta");
  19 $aln=$factory->profile_align($aln,$sen); #problem here
  20 print $out1 $aln;

I get this error at line 19:

  ERROR: Could not open sequence file (-profile)
  No. of seqs. read = -1. No alignment!

How can I solve this problem?
I hope you can provide a solution.

Thanking you,
Praveen Raj,
Project Student,
NIV, India.

				


From jason.stajich at duke.edu  Wed Feb 15 13:19:41 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 15 Feb 2006 08:19:41 -0500
Subject: [Bioperl-l] distmat matrix
In-Reply-To: <43F303FC.9000806@biologie.uni-freiburg.de>
References: <43F303FC.9000806@biologie.uni-freiburg.de>
Message-ID: <550C115C-1216-4285-8BE5-EC217C3F1BE9@duke.edu>

Bioperl can parse PHYLIP distance matrices, see Bio::Matrix::IO.  I
didn't write an EMBOSS distmat result parser but that would be nice
to have (but check first that EMBOSS doesn't already allow output in
phylip format).

There is a pure-perl distance matrix calculation from an MSA for DNA
sequences in
Bio::Align::DNAStatistics
and for protein in
Bio::Align::ProteinStatistics
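
If a plain uncorrected distance is all that is needed, it can also be computed
by hand. The sketch below is not taken from either module above and is not
guaranteed to reproduce EMBOSS distmat's numbers (the gap handling and the
0..1 scaling are assumptions); it counts mismatches over gap-free columns for
each pair in a toy alignment, and p_distance is a hypothetical helper name:

```perl
use strict;
use warnings;

# Uncorrected distance: mismatches / compared columns, skipping any column
# where either sequence has a gap ('-' or '.').
sub p_distance {
    my ($s1, $s2) = @_;
    die "aligned sequences must have equal length\n"
        unless length($s1) == length($s2);
    my ($compared, $mismatch) = (0, 0);
    for my $i (0 .. length($s1) - 1) {
        my ($x, $y) = (uc substr($s1, $i, 1), uc substr($s2, $i, 1));
        next if $x =~ /[-.]/ || $y =~ /[-.]/;
        $compared++;
        $mismatch++ if $x ne $y;
    }
    # undef when the two sequences share no gap-free column
    return $compared ? $mismatch / $compared : undef;
}

# Hypothetical toy alignment: id => gapped sequence.
my %aln = (
    seqA => 'MTKKPG-PGK',
    seqB => 'MTKRPGGPGK',
    seqC => 'MSKRPG-PAK',
);

# Print a square matrix, one row per sequence.
my @ids = sort keys %aln;
for my $p (@ids) {
    print join("\t", $p,
               map { sprintf '%.3f', p_distance($aln{$p}, $aln{$_}) } @ids),
          "\n";
}
```

Filtering the MSA then comes down to dropping every sequence whose distance
to a chosen reference exceeds a cutoff.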

There is some initial discussion here on the website, but it could
certainly use some more details.

http://bioperl.org/wiki/Phylogenetics
http://bioperl.org/wiki/HOWTO:Trees
http://bioperl.org/wiki/Module:Bio::Align::DNAStatistics


-jason
On Feb 15, 2006, at 5:35 AM, Daniel Lang wrote:

> Hi,
>
> I need to go through a uncorrected distmat matrix (EMBOSS, run  
> locally)
> to filter sequences from an MSA.
> I had a look around and didn't find an obvious candidate. Before I  
> start
> writing something my own...
> Is there a bioperl parser for reading distmat matrices or can I trick
> the Bio::MapIO parsers for scoring or PHYLIP in doing so?
> If anyone knows of course a tool to generate an uncorrected distance
> matrix of protein MSAs that is supported by bioperl, would be also OK
> for me:)
>
> I have no experience with the Pise
> (Bio::Tools::Run::PiseApplication::distmat) stuff, but as I understand
> it it's only to execute the application on a remote web server? Or  
> can I
> solve my task with Pise?
>
> Thanks in advance!
>
> Daniel
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From michael.watson at bbsrc.ac.uk  Wed Feb 15 15:06:29 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Wed, 15 Feb 2006 15:06:29 -0000
Subject: [Bioperl-l] Website issues
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

The links on the left of bioperl.org don't work in konqueror 3.1.1,
which is a real b*gger because that's the browser I use on Linux... :-S

Mick



From rmb32 at cornell.edu  Wed Feb 15 16:01:07 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Wed, 15 Feb 2006 11:01:07 -0500
Subject: [Bioperl-l] Bio::Tools::GFF parsing error
Message-ID: <43F35043.7070705@cornell.edu>

Hi all,

I'm parsing a GFF2 file with Bio::Tools::GFF (I would be using 
FeatureIO, except it purports not to support gff 2), and the file looks 
like:

##gff-version 2
##date 2006-02-13
##sequence-region C01HBa0088L02.seq 1 120525
C01HBa0088L02   RepeatMasker    similarity      3537    4267     3.3    -       .       Target "Motif:bac_end_repeat_family_345" 1 740
C01HBa0088L02   RepeatMasker    similarity      4172    4279     2.9    +       .       Target "Motif:HRSiTERT00100141" 1 104
C01HBa0088L02   RepeatMasker    similarity      4267    4323     0.0    -       .       Target "Motif:k_29" 150 206
C01HBa0088L02   RepeatMasker    similarity      4322    4492    26.6    +       .       Target "Motif:PRSiTERT00300001" 1960 2129
C01HBa0088L02   RepeatMasker    similarity      4557    5124    29.5    +       .       Target "Motif:PRSiTERT00300001" 2142 2711

Notice the score column is padded with spaces.

Bio::Tools::GFF does not like this, and says that ' 3.3' is not a valid 
score.  My question is, who is wrong here, my input file or 
Bio::Tools::GFF?  Should Bio::Tools::GFF be able to read this file?

Rob

-- 
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY  14853
Tel: 607-255-2360
rmb32 at cornell.edu
http://www.sgn.cornell.edu




From jason.stajich at duke.edu  Wed Feb 15 16:12:59 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 15 Feb 2006 11:12:59 -0500
Subject: [Bioperl-l] Website issues
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: <93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu>

Okay, I guess someone will have to look into that.  Can you normally
browse Wikipedia?  We're just using their software, so maybe it is a
JavaScript problem.

Please send a system bug request to our helpdesk:
support at open-bio.org

-jason
On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote:

> Hi
>
> The links on the left of bioperl.org don't work in konqueror 3.1.1,
> which is a real b*gger because that's the browser I use on  
> Linux... :-S
>
> Mick
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From Marc.Logghe at DEVGEN.com  Wed Feb 15 16:13:16 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Wed, 15 Feb 2006 17:13:16 +0100
Subject: [Bioperl-l] Bio::Tools::GFF parsing error
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B2E@ANTARESIA.be.devgen.com>

Hi Rob,
According to the GFF Specifications Document @
http://www.sanger.ac.uk/Software/formats/GFF/GFF_Spec.shtml :

All of the above described fields should be separated by TAB characters
('\t'). All values of the mandatory fields should not include whitespace
(i.e. the strings for <seqname>, <source> and <feature> fields).

Reading that, I am afraid you have to pre-process your gff input file
...
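
The pre-processing could look roughly like the sketch below. It assumes
the columns really are TAB-separated and that only the padding inside
them is wrong; the sub name normalize_gff_line is made up for this
example and is not part of BioPerl.

```perl
use strict;
use warnings;

# Trim stray whitespace from the eight mandatory, tab-separated GFF2 columns.
# Pragma/comment lines (starting with '#') and blank lines pass through as-is.
# NOTE: normalize_gff_line is a hypothetical helper, not a BioPerl function.
sub normalize_gff_line {
    my ($line) = @_;
    chomp $line;
    return $line if $line =~ /^#/ || $line !~ /\S/;

    # Limit the split to 9 parts so the attributes column is kept in one piece.
    my @fields = split /\t/, $line, 9;
    my $last = $#fields < 7 ? $#fields : 7;
    for my $i (0 .. $last) {
        $fields[$i] =~ s/^\s+|\s+$//g;
    }
    return join "\t", @fields;
}

# Example: the first record from Rob's file, with the padded score ' 3.3'.
my $padded = join "\t", 'C01HBa0088L02', 'RepeatMasker', 'similarity',
                        '3537', '4267', ' 3.3', '-', '.',
                        'Target "Motif:bac_end_repeat_family_345" 1 740';
print normalize_gff_line($padded), "\n";
```

Run over every line of the file, this should leave the attributes
column untouched and hand Bio::Tools::GFF a score without the padding.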
HTH,
Marc


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Robert Buels
> Sent: Wednesday, February 15, 2006 5:01 PM
> To: bioperl-l at bioperl.org
> Subject: [Bioperl-l] Bio::Tools::GFF parsing error
> 
> Hi all,
> 
> I'm parsing a GFF2 file with Bio::Tools::GFF (I would be 
> using FeatureIO, except it purports not to support gff 2), 
> and the file looks
> like:
> 
> ##gff-version 2
> ##date 2006-02-13
> ##sequence-region C01HBa0088L02.seq 1 120525
> C01HBa0088L02   RepeatMasker    similarity      3537    4267  
>    3.3    
> -       .       Target "Motif:bac_end_repeat_family_345" 1 740
> C01HBa0088L02   RepeatMasker    similarity      4172    4279  
>    2.9    
> +       .       Target "Motif:HRSiTERT00100141" 1 104
> C01HBa0088L02   RepeatMasker    similarity      4267    4323  
>    0.0    
> -       .       Target "Motif:k_29" 150 206
> C01HBa0088L02   RepeatMasker    similarity      4322    4492  
>   26.6    
> +       .       Target "Motif:PRSiTERT00300001" 1960 2129
> C01HBa0088L02   RepeatMasker    similarity      4557    5124  
>   29.5    
> +       .       Target "Motif:PRSiTERT00300001" 2142 2711
> 
> Notice the score column is padded with spaces.
> 
> Bio::Tools::GFF does not like this, and says that ' 3.3' is 
> not a valid score.  My question is, who is wrong here, my 
> input file or Bio::Tools::GFF?  Should Bio::Tools::GFF be 
> able to read this file?
> 
> Rob
> 
> --
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY  14853
> Tel: 607-255-2360
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 



From jason.stajich at duke.edu  Wed Feb 15 16:29:14 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Wed, 15 Feb 2006 11:29:14 -0500
Subject: [Bioperl-l] Website issues
In-Reply-To: <93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu>
References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
	<93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu>
Message-ID: <82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu>

I can replicate it with konqueror 3.1.4-9.legacy RedHat (from KDE  
3.1.4-9)

But it works fine for me on 3.2.2-8.FC2 ....

So I'm going to go with this being a konqueror bug, sorry to say, but  
feel free to still report the bug to the helpdesk.
	
-jason
On Feb 15, 2006, at 11:12 AM, Jason Stajich wrote:

> Okay I guess someone will have to look into that.  Can you normally
> browse on wikipedia, we're just using their software, maybe it is a
> javascript problem?
>
> Please send a system bug request to our helpdesk:
> support at open-bio.org
>
> -jason
> On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote:
>
>> Hi
>>
>> The links on the left of bioperl.org don't work in konqueror 3.1.1,
>> which is a real b*gger because that's the browser I use on
>> Linux... :-S
>>
>> Mick
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Wed Feb 15 16:57:13 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Feb 2006 10:57:13 -0600
Subject: [Bioperl-l] Added 'Installing Bioperl for Unix' to wiki
Message-ID: <000301c63250$de506120$15327e82@pyrimidine>

I added an Installing Bioperl for Unix page, 

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix

which is a quick redo of the INSTALL text file in the bioperl distribution.
It's in workable shape but still needs links, revisions, etc.

Please leave any comments on the discussion pages here.  

http://www.bioperl.org/wiki/Talk:Getting_BioPerl
http://www.bioperl.org/wiki/Talk:Installing_Bioperl_for_Unix

Thanks to Brian for helping out with the Windows install doc!

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From khoueiry at ibdm.univ-mrs.fr  Wed Feb 15 17:23:21 2006
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Wed, 15 Feb 2006 18:23:21 +0100
Subject: [Bioperl-l] Website issues
In-Reply-To: <82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu>
References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
	<93280C50-1ECE-468F-BC53-F1639D7F5C25@duke.edu>
	<82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu>
Message-ID: <1140024202.2689.45.camel@localhost>

I tested it on konqueror 3.4.2 and it works well!

From heikki at sanbi.ac.za  Wed Feb 15 18:55:07 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 15 Feb 2006 20:55:07 +0200
Subject: [Bioperl-l] Website issues
In-Reply-To: <1140024202.2689.45.camel@localhost>
References: <8975119BCD0AC5419D61A9CF1A923E95030080F4@iahce2ksrv1.iah.bbsrc.ac.uk>
	<82FCD448-0A00-4BF8-B957-928F089E8157@duke.edu>
	<1140024202.2689.45.camel@localhost>
Message-ID: <200602152055.07667.heikki@sanbi.ac.za>

Konqueror 3.5.1 has no problems either. Clearly, older Konqueror versions
had a bug that has since been fixed.

Michael, time for you to upgrade.

	-Heikki

On Wednesday 15 February 2006 19:23, khoueiry wrote:
> I test it on konqueror 3.4.2 and it works well !!!
>
> On Wed, 2006-02-15 at 11:29 -0500, Jason Stajich wrote:
> > I can replicate it with konqueror 3.1.4-9.legacy RedHat (from KDE
> > 3.1.4-9)
> >
> > But it works fine for me on 3.2.2-8.FC2 ....
> >
> > So I'm going to go with this being a konqueror bug, sorry to say, but
> > feel free to still report the bug to the helpdesk.
> >
> > -jason
> >
> > On Feb 15, 2006, at 11:12 AM, Jason Stajich wrote:
> > > Okay I guess someone will have to look into that.  Can you normally
> > > browse on wikipedia, we're just using their software, maybe it is a
> > > javascript problem?
> > >
> > > Please send a system bug request to our helpdesk:
> > > support at open-bio.org
> > >
> > > -jason
> > >
> > > On Feb 15, 2006, at 10:06 AM, michael watson ((IAH-C)) wrote:
> > >> Hi
> > >>
> > >> The links on the left of bioperl.org don't work in konqueror 3.1.1,
> > >> which is a real b*gger because that's the browser I use on
> > >> Linux... :-S
> > >>
> > >> Mick
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > --
> > > Jason Stajich
> > > Duke University
> > > http://www.duke.edu/~jes12
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > Jason Stajich
> > Duke University
> > http://www.duke.edu/~jes12
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From gyang at plantbio.uga.edu  Wed Feb 15 19:39:41 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Wed, 15 Feb 2006 14:39:41 -0500
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
	version 1.28
Message-ID: <20060215143941.54e91487@dogwood.plantbio.uga.edu>

Hi, Chris,
Finally, the RemoteBlast test script works for the amino.fa query, but when I try a nucleic acid sequence (see below), an error occurs:
"
waiting........
------------- EXCEPTION  -------------
MSG: no data for midline  Features flanking this part of subject sequence:
STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
STACK toplevel remoteblast_test:40
"
The query sequence is:
CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG

The script (basically the same as the RemoteBlast test; I only changed the database to 'nr', the program to 'blastn' and the filename to 'ost3'):
#!/usr/bin/perl

use Bio::SeqIO;
use Bio::Seq;
use Bio::Tools::Run::RemoteBlast;
use Bio::SearchIO;
use strict;
my $prog='blastn';
my $db='nr';
my $e_val=1e-10;
my @params=( -prog=>$prog,
	-data=>$db,
	-expect=>$e_val,
	-readmethod=>'SearchIO');
my $factory=Bio::Tools::Run::RemoteBlast->new(@params);

my $v = 1;

my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );

while (my $input = $str->next_seq()){
  #Blast a sequence against a database:
  #Alternatively, you could  pass in a file with many
  #sequences rather than loop through sequence one at a time
  #Remove the loop starting 'while (my $input = $str->next_seq())'
  #and swap the two lines below for an example of that.
  my $r = $factory->submit_blast($input);
  #my $r = $factory->submit_blast('amino.fa');
  print STDERR "waiting..." if( $v > 0 );
  while ( my @rids = $factory->each_rid ) {
    foreach my $rid ( @rids ) {
      my $rc = $factory->retrieve_blast($rid);
      if( !ref($rc) ) {
        if( $rc < 0 ) {
          $factory->remove_rid($rid);
        }
        print STDERR "." if ( $v > 0 );
        sleep 5;
      } else {
        my $result = $rc->next_result();
        #save the output
        my $filename = $result->query_name()."\.out";
        $factory->save_output($filename);
        $factory->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), "\n";
        while ( my $hit = $result->next_hit ) {
          next unless ( $v > 0);
          print "\thit name is ", $hit->name, "\n";
          while( my $hsp = $hit->next_hsp ) {
            print "\t\tscore is ", $hsp->score, "\n";
          }
        }
      }
    }
  }
}


Do you think something in the NCBI output format might still be causing this?
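
While the cause is being tracked down, one way to keep the polling loop
alive when the parser throws is to wrap the next_result() call in eval.
This is only a sketch: try_next_result and FailingParser are made-up
names for illustration, and the stand-in parser just imitates the
exception above so the example runs without contacting NCBI.

```perl
use strict;
use warnings;

# Call next_result() on a report parser and trap any exception it throws,
# so one unparseable report does not abort the whole polling loop.
# Returns ($result, $error); $error is '' when no exception was thrown.
# NOTE: try_next_result is a hypothetical helper, not a BioPerl method.
sub try_next_result {
    my ($parser) = @_;
    my $result = eval { $parser->next_result() };
    my $error  = $@;
    return ($result, $error);
}

# Stand-in parser, used only to demonstrate the helper without a network call.
package FailingParser;
sub new         { return bless {}, shift }
sub next_result { die "MSG: no data for midline\n" }

package main;

my ($result, $error) = try_next_result(FailingParser->new);
print STDERR "could not parse report: $error" if $error;
```

In the script above, $rc would be passed instead of the stand-in
object. On error the raw report could still be kept with
$factory->save_output() under a filename built from $rid, since
$result->query_name is not available then; that save_output works
before parsing in version 1.28 is my assumption, not something I have
checked.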

Thank you,
Guojun




Guojun Yang
Department of Plant Biology
University of Georgia
Tel: 706-542-1857
Fax: 706-542-1805
http://www.arches.uga.edu/~guojun



----- Original Message -----
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2


> Sorry, forgot to add that I didn't see the regex issue that you mentioned.
> It could be a perl-related issue.  Try the fixes I mentioned and see what
> happens.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > Sent: Tuesday, February 14, 2006 12:36 PM
> > To: 'gyang at plantbio.uga.edu'
> > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >
> > It's a good habit to always add single quotes around words.  The perl
> > interpreter may think a single bare word is a subroutine or perlfunc
> > called with no args so will try to find a subroutine named blastp().  My
> > debugger actually gives the error that the bare word blastp may conflict
> > with a future reserved word.  Like you said, 'use strict' will point that
> > out.
> >
> > As for the regex, it should match all the blast programs at NCBI (blastp,
> > blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
> > else passes through.
> >
> > So, if you are using the script below, there are several errors.  The bare
> > words for $prog and $db need quotes, and the flags for your @params array
> > don't have a dash before them.  I get this after adding quotes but before
> > adding the dashes to @params:
> >
> > C:\Perl\Scripts>test_blast.pl
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG:
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> > STACK: C:\Perl\Scripts\test_blast.pl:15
> > -----------------------------------------------------------
> >
> > The last line indicates a problem with this line:
> >
> > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >
> > Changing the @params to this:
> >
> > my @params=( -prog=>$prog,
> > 	-data=>$db,
> > 	-expect=>$e_val,
> > 	-readmethod=>'SearchIO');
> >
> > fixes it, and I get output as expected.
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > > -----Original Message-----
> > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > Sent: Tuesday, February 14, 2006 11:48 AM
> > > To: Chris Fields; bioperl-l at lists.open-bio.org
> > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >
> > > Hi, Chris,
> > > When I tried with the perldoc script, it did not work either. First it
> > > says $prog can not be bare word if I "use strict". I added quotes on the
> > > words, then it says the value for $prog does not match expression
> > > t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The script
> > > is shown below. Why is the expression "t?blast[pnx]"?
> > > #!/usr/bin/perl
> > >
> > > use Bio::SeqIO;
> > > use Bio::Seq;
> > > use Bio::Tools::Run::RemoteBlast;
> > > use Bio::SearchIO;
> > >
> > >
> > > my $prog=blastp;
> > > my $db=swissprot;
> > > my $e_val=1e-10;
> > > my @params=( prog=>$prog,
> > > 	data=>$db,
> > > 	expect=>$e_val,
> > > 	readmethod=>'SearchIO');
> > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >
> > > my $v = 1;
> > >
> > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >
> > > while (my $input = $str->next_seq()){
> > >   #Blast a sequence against a database:
> > >   #Alternatively, you could  pass in a file with many
> > >   #sequences rather than loop through sequence one at a time
> > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >   #and swap the two lines below for an example of that.
> > >   my $r = $factory->submit_blast($input);
> > >   #my $r = $factory->submit_blast('amino.fa');
> > >   print STDERR "waiting..." if( $v > 0 );
> > >   while ( my @rids = $factory->each_rid ) {
> > >     foreach my $rid ( @rids ) {
> > >       my $rc = $factory->retrieve_blast($rid);
> > >       if( !ref($rc) ) {
> > >         if( $rc < 0 ) {
> > >           $factory->remove_rid($rid);
> > >         }
> > >         print STDERR "." if ( $v > 0 );
> > >         sleep 5;
> > >       } else {
> > >         my $result = $rc->next_result();
> > >         #save the output
> > >         my $filename = $result->query_name()."\.out";
> > >         $factory->save_output($filename);
> > >         $factory->remove_rid($rid);
> > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > >         while ( my $hit = $result->next_hit ) {
> > >           next unless ( $v > 0);
> > >           print "\thit name is ", $hit->name, "\n";
> > >           while( my $hsp = $hit->next_hsp ) {
> > >             print "\t\tscore is ", $hsp->score, "\n";
> > >           }
> > >         }
> > >       }
> > >     }
> > >   }
> > > }
> > >
> > > Thank you for your help!
> > >
> > >
> > > Guojun
> > > Department of Plant Biology
> > > University of Georgia
> > >
> > > ----- Original Message -----
> > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > To: gyang at plantbio.uga.edu
> > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >
> > >
> > > > Try two things:
> > > >
> > > > 1)  Use a much simpler script, like the one in 'perldoc
> > > > Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something
> > > > wrong with the logic in your subroutine:
> > > >
> > > > my $v = 1;
> > > >
> > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > > >
> > > > while (my $input = $str->next_seq()){
> > > >   #Blast a sequence against a database:
> > > >   #Alternatively, you could  pass in a file with many
> > > >   #sequences rather than loop through sequence one at a time
> > > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >   #and swap the two lines below for an example of that.
> > > >   my $r = $factory->submit_blast($input);
> > > >   #my $r = $factory->submit_blast('amino.fa');
> > > >   print STDERR "waiting..." if( $v > 0 );
> > > >   while ( my @rids = $factory->each_rid ) {
> > > >     foreach my $rid ( @rids ) {
> > > >       my $rc = $factory->retrieve_blast($rid);
> > > >       if( !ref($rc) ) {
> > > >         if( $rc < 0 ) {
> > > >           $factory->remove_rid($rid);
> > > >         }
> > > >         print STDERR "." if ( $v > 0 );
> > > >         sleep 5;
> > > >       } else {
> > > >         my $result = $rc->next_result();
> > > >         #save the output
> > > >         my $filename = $result->query_name()."\.out";
> > > >         $factory->save_output($filename);
> > > >         $factory->remove_rid($rid);
> > > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > > >         while ( my $hit = $result->next_hit ) {
> > > >           next unless ( $v > 0);
> > > >           print "\thit name is ", $hit->name, "\n";
> > > >           while( my $hsp = $hit->next_hsp ) {
> > > >             print "\t\tscore is ", $hsp->score, "\n";
> > > >           }
> > > >         }
> > > >       }
> > > >     }
> > > >   }
> > > > }
> > > >
> > > > 2) Try the RemoteBlast from Bugzilla and see if that works.  It really
> > > > shouldn't make that much of a difference, but I noticed that the CVS
> > > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> > > > released; the Bugzilla version is based off CVS.
> > > >
> > > > Christopher Fields
> > > > Postdoctoral Researcher - Switzer Lab
> > > > Dept. of Biochemistry
> > > > University of Illinois Urbana-Champaign
> > > >
> > > > > -----Original Message-----
> > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > Sent: Monday, February 13, 2006 3:00 PM
> > > > > To: bioperl-l at lists.open-bio.org
> > > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > >
> > > > > Thanks, Chris,
> > > > > I installed version 1.5.1 and replaced the blast.pm file with the one
> > > > > from your bug report. The running version is 1.5 when I use the command
> > > > > you sent me. But when I tried the script, it doesn't change much. My
> > > > > remoteblast code (portion) is here:
> > > > >
> > > > > sub search {
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
> > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > > > > 			      -id=>"query",
> > > > > 			      -desc=>"new seq");
> > > > > my $len=$query->length();
> > > > > @db=('nr','htgs','wgs');
> > > > > foreach my $db (@db) {
> > > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> > > > > 						'-data' =>"$db",
> > > > > 						'-expect'=>"$E_value");
> > > > >
> > > > > my $blast_report = $factory->submit_blast($query);
> > > > >
> > > > > my @rids = $factory->each_rid();
> > > > > foreach my $rid ( @rids ) {
> > > > >     print STDERR "$rid\n";
> > > > > }
> > > > > # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > > > > print STDERR "waiting...";
> > > > > sleep 60;
> > > > >
> > > > > foreach my $rid ( @rids ) {
> > > > >     my $rc = $factory->retrieve_blast($rid);
> > > > >     while (!ref($rc) ) {
> > > > > 	if( $rc < 0 ) {
> > > > > # retrieve_blast returns -1 on error
> > > > > 	    $factory->remove_rid($rid);
> > > > > 	    print "Error!\n";
> > > > > 	    send_error($email,$function,$seqname,$queryname[$ST]);
> > > > > 	    die "Can't retrieve $rid";
> > > > > 	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
> > > > > 	    sleep 60;
> > > > > 	    $rc = $factory->retrieve_blast($rid);
> > > > > 	}
> > > > >     }
> > > > >     if (ref($rc)) {
> > > > > 	print STDERR "Done.\n";
> > > > > 	 while( my $result = $rc->next_result) {
> > > > > 	    while( my $hit = $result->next_hit()) {
> > > > > 	    	$hit_name=$hit->name;
> > > > > 		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > > > > 		$name=$1;
> > > > > 		@left_plus_start=();
> > > > > 		@left_plus_end=();
> > > > > 		@left_minus_start=();
> > > > > 		@left_minus_end=();
> > > > > 		@right_plus_start=();
> > > > > 		@right_plus_end=();
> > > > > 		@right_minus_start=();
> > > > > 		@right_minus_end=();
> > > > >
> > > > > 		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > > > > 		while( my $hsp = $hit->next_hsp()) {
> > > > > ......
> > > > >
> > > > > It was working quite well until around October last year, but it has
> > > > > stopped since then. When a submission is sent via a webpage, the cgi
> > > > > starts to work and uses ~20 Mb of memory. Then it hangs there; finally
> > > > > the expected email is received but without real results, although it
> > > > > does contain something from other parts of the script. Apparently the
> > > > > search sub did not return anything (I know there is something that
> > > > > should be returned). Is it also possible the format of the NCBI output
> > > > > for each result has changed?
> > > > > Thank you,
> > > > > Guojun
> > > > >
> > > > > Department of Plant Biology
> > > > > University of Georgia
> > > > >
> > > > > ----- Original Message -----
> > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > >
> > > > > > How do you know two versions are installed (i.e. how are you checking
> > > > > > the version)?  Do you have two complete bioperl distributions (in two
> > > > > > separate directories) or are you looking in modules?  Here's the way to
> > > > > > check the version (from the FAQ):
> > > > > >
> > > > > > perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> > > > > >
> > > > > > If you have two full bioperl distributions on your computer, normally
> > > > > > only one will be in use unless you have explicitly set the environment
> > > > > > variable PERL5LIB.  The PERL5LIB directories will be searched first,
> > > > > > before your normal perl directory list (@INC) is searched.  You MAY get
> > > > > > some mixing then, but only if perl can't find a particular module in the
> > > > > > path designated in PERL5LIB; then it will progress through the
> > > > > > directories listed in @INC.  This may happen if a module is unique to a
> > > > > > particular release, but shouldn't happen for the majority of modules,
> > > > > > including RemoteBlast.  You can check what @INC and PERL5LIB are set to
> > > > > > by using 'perl -V'.  @INC will differ depending on your OS, perl build,
> > > > > > etc.
> > > > > >
> > > > > > Regardless, if you follow the directions for installing bioperl for
> > > > > > your system ('perl Makefile.PL', 'make', 'make test', 'make install',
> > > > > > unless you explicitly change the installation directory when using
> > > > > > 'perl Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a problem
> > > > > > as it will install the Bioperl distribution you downloaded over the old
> > > > > > version in @INC.  See this page:
> > > > > >
> > > > > > http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > > > > >
> > > > > > for more details.
> > > > > >
> > > > > > Christopher Fields
> > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > Dept. of Biochemistry
> > > > > > University of Illinois Urbana-Champaign
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: bioperl-l-bounces at lists.open-bio.org
> > > > > > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > Sent: Monday, February 13, 2006 12:32 PM
> > > > > > > To: bioperl-l at lists.open-bio.org
> > > > > > > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > >
> > > > > > > Hi, Chris,
> > > > > > > I do have different versions of bioperl on my Linux machine (1.4 and
> > > > > > > 1.5.0); this may be the problem. Should I just install bioperl-1.5.1,
> > > > > > > or do I need to uninstall and remove the previous versions? I could not
> > > > > > > find any hint on uninstalling bioperl on linux. Could you please give
> > > > > > > me some suggestion?
> > > > > > > Thanks,
> > > > > > > Guojun
> > > > > > >
> > > > > > > Department of Plant Biology
> > > > > > > University of Georgia
> > > > > > >       _____
> > > > > > >
> > > > > > >   From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > > > Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> > > > > > > 1.28
> > > > > > >
> > > > > > > If you're using RemoteBlast 1.28, then you've likely updated from CVS
> > > > > > > which isn't the latest fix.
> > > > > > >
> > > > > > > Make sure that you check the following:
> > > > > > >
> > > > > > > 1) Always post to the mailing list:
> > > > > > > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > > > > > >
> > > > > > > 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> > > > > > > installed first.  Perform a clean installation; do not upgrade only
> > > > > > > Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> > > > > > > guarantee that mixing modules from old and new distributions (1.4 and
> > > > > > > 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > > > > > > installation will allow text output from BLAST v.2.2.12 to be saved and
> > > > > > > parsed; it will not parse the newest BLAST text output from NCBI
> > > > > > > (v2.2.13) but it should still save it. I believe as long as
> > > > > > > next_result() isn't called, it will work.
> > > > > > >
> > > > > > > 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text
> > > > > > > output are NOT in CVS; they haven't been cleared and checked in by
> > > > > > > Roger Hall (who's now taking care of RemoteBlast) and the powers that
> > > > > > > be (Jason or whomever is in charge of Bio::SearchIO).  They can be
> > > > > > > found in Bugzilla:
> > > > > > >
> > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > >
> > > > > > > The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> > > > > > > saving XML output, so isn't necessary if you don't plan on using this
> > > > > > > option.  And, remember, they haven't been committed yet to CVS, which
> > > > > > > means that the final version will change to reflect the new version.
> > > > > > >
> > > > > > > Christopher Fields
> > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > Dept. of Biochemistry
> > > > > > > University of Illinois Urbana-Champaign
> > > > > > >
> > > > > > >     _____
> > > > > > >
> > > > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > > > > > Sent: Monday, February 13, 2006 9:26 AM
> > > > > > > To: Chris Fields
> > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> > > > > > > 1.28
> > > > > > >
> > > > > > > Hi, Chris
> > > > > > >
> > > > > > > Thanks for your suggestion, however, it doesn't seem to work for my cgi
> > > > > > > even after I replace both blast.pm and RemoteBlast.pm. I didn't even
> > > > > > > get any RID. Is there any suggestion?
> > > > > > >
> > > > > > > Guojun
> > > > > > >
> > > > > > > Guojun Yang
> > > > > > > Department of Plant Biology
> > > > > > > University of Georgia
> > > > > > > Tel: 706-542-1857
> > > > > > > Fax: 706-542-1805
> > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > >     _____
> > > > > > >
> > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > > > > > > Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version
> > > > > > > 1.28
> > > > > > >
> > > > > > > I would say give the new code a try, but realize that it hasn't been
> > > > > > > checked in (like I said below). I will try going over the modified
> > > > > > > Bio::SearchIO::blast again this weekend to see if there is anything I
> > > > > > > might have missed. The changed order in the header of BLAST text output
> > > > > > > has me a bit worried that it might not catch everything, but it at
> > > > > > > least doesn't hang in the while() loop I described in the bug report
> > > > > > > below (bug #1934) and seems to process everything fine.
> > > > > > >
> > > > > > > If you want more stability in the code, you might consider changing
> > > > > > > over to XML output and parsing with Bio::SearchIO::blastxml. There are
> > > > > > > some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that
> > > > > > > accommodate saving XML output, but I believe it parses everything
> > > > > > > regardless. If you look back the last month or so there has been a bit
> > > > > > > of discussion here about it. Jason describes a bit on how to set up
> > > > > > > RemoteBlast for XML:
> > > > > > >
> > > > > > > http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> > > > > > >
> > > > > > > Christopher Fields
> > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > Dept. of Biochemistry
> > > > > > > University of Illinois Urbana-Champaign
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: bioperl-l-bounces at lists.open-bio.org
> > > > > > > > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > > Sent: Friday, February 03, 2006 1:45 PM
> > > > > > > > To: bioperl-l at bioperl.org
> > > > > > > > Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> > > > > > > >
> > > > > > > > Hi, Everybody,
> > > > > > > > I see this post and am wondering if this is the reason for the
> > > > > > > > > malfunctioning of my webserver. We set up a webserver named
> > > MAK,
> > > > > for
> > > > > > > MITE
> > > > > > > > sequence analysis. It was working very well until around
> > > November
> > > > > 2005,
> > > > > > > > when it stopped returning any result (the site is fine and
> > seems
> > > to
> > > > > be
> > > > > > > > > doing something after submission). In the CGI script, I used
> > > remoteblast
> > > > > (that
> > > > > > > > work was done in 2003) to do searches. I currently do not have
> > > > > access to
> > > > > > > > the server because I moved. Quite several people sent emails
> > to
> > > us
> > > > > about
> > > > > > > > its malfunctioning. Is there any suggestion on fixing the
> > > problem?
> > > > > > > Should
> > > > > > > > > I simply ask that the remoteblast.pm be replaced with the new
> > > version?
> > > > > > > > Thanks a lot,
> > > > > > > > Guojun
> > > > > > > >
> > > > > > > > Department of Plant Biology
> > > > > > > > University of Georgia
> > > > > > > > Tel: 706-542-1857
> > > > > > > > Fax: 706-542-1805
> > > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > > > _____
> > > > > > > >
> > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
> > > Jian'
> > > > > > > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > [mailto:bioperl-
> > > > > > > > l at bioperl.org]
> > > > > > > > Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > >
> > > > > > > > Like Nagesh says, try the latest RemoteBlast from bioperl-live
> > > CVS.
> > > > > It
> > > > > > > > will
> > > > > > > > work for saving text output. However, it will not parse
> > anything
> > > > > using
> > > > > > > > next_result (it will likely hang) and will not save XML
> > format.
> > > See
> > > > > > > these
> > > > > > > > bugs:
> > > > > > > >
> > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > > >
> > > > > > > > for explanations and possible fixes (changes to RemoteBlast
> > and
> > > > > > > > Bio::SearchIO::blast). Note that these haven't been checked in
> > > yet
> > > > > so
> > > > > > > are
> > > > > > > > still not included in bioperl-live; they may be further
> > modified
> > > > > before
> > > > > > > > committing to CVS. If you're not worried about XML, you could
> > > just
> > > > > try
> > > > > > > the
> > > > > > > > first fix, which is a change to SearchIO::blast.
> > > > > > > >
> > > > > > > > Nagesh, I remember you posting to the list a month ago using a
> > > > > script
> > > > > > > > which
> > > > > > > > had problems; the script you used saves the output but doesn't
> > > > > actually
> > > > > > > > parse it (i.e. you don't use next_result() to go through the
> > > data).
> > > > > Is
> > > > > > > the
> > > > > > > > version of BLAST in your text output 2.2.12 or 2.2.13? Have
> > you
> > > > > tried
> > > > > > > > parsing the output using "-readmethod => SearchIO" or "-
> > > readmethod
> > > > > =>
> > > > > > > > blast"
> > > > > > > > using your version of RemoteBlast and method next_result()?
> > Like
> > > > > below
> > > > > > > > (from
> > > > > > > > perldoc):
> > > > > > > >
> > > > > > > > > while ( my @rids = $factory->each_rid ) {
> > > > > > > > >   foreach my $rid ( @rids ) {
> > > > > > > > >     my $rc = $factory->retrieve_blast($rid);
> > > > > > > > >     if( !ref($rc) ) {
> > > > > > > > >       if( $rc < 0 ) {
> > > > > > > > >         $factory->remove_rid($rid);
> > > > > > > > >       }
> > > > > > > > >       print STDERR "." if ( $v > 0 );
> > > > > > > > >       sleep 5;
> > > > > > > > >     } else { # parsing starts here
> > > > > > > > >       my $result = $rc->next_result(); # it should hang here
> > > > > > > > >       #save the output
> > > > > > > > >       my $filename = $result->query_name()."\.out";
> > > > > > > > >       $factory->save_output($filename);
> > > > > > > > >       $factory->remove_rid($rid);
> > > > > > > > >       print "\nQuery Name: ", $result->query_name(), "\n";
> > > > > > > > >       while ( my $hit = $result->next_hit ) {
> > > > > > > > >         next unless ( $v > 0);
> > > > > > > > >         print "\thit name is ", $hit->name, "\n";
> > > > > > > > >         while( my $hsp = $hit->next_hsp ) {
> > > > > > > > >           print "\t\tscore is ", $hsp->score, "\n";
> > > > > > > > >         }
> > > > > > > > >       }
> > > > > > > > >     }
> > > > > > > > >   }
> > > > > > > > > }
> > > > > > > >
> > > > > > > >
> > > > > > > > > My script hung if I used next_result() in any way prior to
> > the
> > > > > fixes.
> > > > > > > I
> > > > > > > > want to see how many others are having the same issues with
> > > parsing
> > > > > > > using
> > > > > > > > the CVS version of bioperl-live.
> > > > > > > >
> > > > > > > > Christopher Fields
> > > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > > Dept. of Biochemistry
> > > > > > > > University of Illinois Urbana-Champaign
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> > l-
> > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > > > > > > > > Sent: Thursday, February 02, 2006 7:24 PM
> > > > > > > > > To: Huang Jian; bioperl-l
> > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > > >
> > > > > > > > > Hi Huang,
> > > > > > > > > Thanks for the message. The older version of RemoteBlast.pm
> > > works
> > > > > on
> > > > > > > the
> > > > > > > > > logic of checking the temporary file size to determine
> > whether
> > > the
> > > > > > > Blast
> > > > > > > > > results are ready. This condition is not getting satisfied
> > may
> > > be
> > > > > due
> > > > > > > to
> > > > > > > > > some changes brought about by NCBI. I had this problem
> > > recently
> > > > > and
> > > > > > > > > figured out that the solution was to use the latest version
> > > which
> > > > > has
> > > > > > > > > this problem fixed (does not use file size logic any more)
> > > which
> > > > > is
> > > > > > > not
> > > > > > > > > yet included in the BioPerl package.
> > > > > > > > > Cheers
> > > > > > > > > Nagesh
> > > > > > > > >
> > > > > > > > > Huang Jian wrote:
> > > > > > > > >
> > > > > > > > > > Dear Nagesh,
> > > > > > > > > >
> > > > > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28
> > > you
> > > > > send
> > > > > > > > > > me. Now it works perfectly!!!
> > > > > > > > > >
> > > > > > > > > > Thank you!!
> > > > > > > > > >
> > > > > > > > > > Huang
> > > > > > > > > >
> > > > > > > > > > ----- Original Message ----- From: "Nagesh Chakka"
> > > > > > > > > > 
> > > > > > > > > > To: "Huang Jian" ; "bioperl-l"
> > > > > > > > > > 
> > > > > > > > > > Sent: Friday, February 03, 2006 7:48 AM
> > > > > > > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the
> > net,
> > > so
> > > > > still
> > > > > > > > > > via email
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >> Hi Huang,
> > > > > > > > > >> I see that you are submitting a sequence for a remote
> > blast
> > > > > search.
> > > > > > > > Can
> > > > > > > > > >> you check if the RemoteBlast.pm being used is v 1.28
> > > > > (2005/12/09).
> > > > > > > If
> > > > > > > > > >> not I have attached it with this email, try to replace it
> > > with
> > > > > the
> > > > > > > > old
> > > > > > > > > >> one which has a bug.
> > > > > > > > > >> Let me know if it works.
> > > > > > > > > >> Nagesh
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > _______________________________________________
> > > > > > > > > Bioperl-l mailing list
> > > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Bioperl-l mailing list
> > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > _______________________________________________
> > > > > > > > Bioperl-l mailing list
> > > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > > > > > > > > > > > > > > > > > > > >
> > > > > _______________________________________________
> > > > > > > Bioperl-l mailing list
> > > > > > > Bioperl-l at lists.open-bio.org
> > > > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > > > >
> > > > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > > >
> > 



From cjfields at uiuc.edu  Wed Feb 15 20:17:27 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Feb 2006 14:17:27 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on
	RemoteBlast.pmversion 1.28
In-Reply-To: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
Message-ID: <000001c6326c$d72dd640$15327e82@pyrimidine>

This looks like a genuine bug and may be something that changed in BLASTN
text output; I'm getting it here, too.  Running verbose shows that text
output is returned, so, from that and from the stack trace it looks like
another error in text parsing in Bio::SearchIO::blast.  Bio::SearchIO::blast
line 1172 throws a conditional exception.  

I'm adding this to bug 1934 in bugzilla (reference to your email and this
response) for now.  I'll try messing around with it when I can; I'm really
busy this week.  I'll also forward this to Roger Hall.
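
In the meantime, one possible stopgap is to wrap the next_result() call in
eval so that a parser exception is caught instead of killing the script.
What follows is only a sketch of that idea, not a fix for the parser:
FakeParser and try_next_result() are hypothetical names made up for this
illustration (FakeParser just dies with the same message as the trace
quoted below), and nothing here is part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a report parser; NOT a BioPerl class.
# With broken => 1 its next_result() dies like the trace in this thread.
package FakeParser;

sub new {
    my ($class, %args) = @_;
    my $self = { %args };
    return bless $self, $class;
}

sub next_result {
    my ($self) = @_;
    die "no data for midline\n" if $self->{broken};
    return { query_name => 'ost3' };
}

package main;

# Call next_result() inside eval; return the result (undef on failure)
# together with the error text (empty string on success).
sub try_next_result {
    my ($parser) = @_;
    my $result = eval { $parser->next_result() };
    my $error  = $@;
    return ($result, $error);
}

my ($ok_result,  $ok_err)  = try_next_result( FakeParser->new( broken => 0 ) );
my ($bad_result, $bad_err) = try_next_result( FakeParser->new( broken => 1 ) );

print "parsed query: $ok_result->{query_name}\n" if $ok_err eq '';
print "parse failed: $bad_err"                   if $bad_err ne '';
```

In a real RemoteBlast script the same pattern would go around
$rc->next_result().  On failure the script could then save the raw report
under a fixed filename (an assumption on my part, since the test script
takes the filename from the parsed result) and remove the RID rather than
dying.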

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Wednesday, February 15, 2006 1:40 PM
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] OK for aa seq but not a na seq on
> RemoteBlast.pmversion 1.28
> 
> Hi, Chris,
> Finally the remoteblast test script works for the amino.fa query, but when
> I try a nucleic acid sequence (see below), an error occurs:
> "
> waiting........
> ------------- EXCEPTION  -------------
> MSG: no data for midline  Features flanking this part of subject sequence:
> STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> STACK toplevel remoteblast_test:40
> "
> The query sequence is:
> CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> 
> The script (basically same as the remoteblast test, I only changed
> database to 'nr' and program to 'blastn' and filename to 'ost3'):
> #!/usr/bin/perl
> 
> use Bio::SeqIO;
> use Bio::Seq;
> use Bio::Tools::Run::RemoteBlast;
> use Bio::SearchIO;
> use strict;
> my $prog='blastn';
> my $db='nr';
> my $e_val=1e-10;
> my @params=( -prog=>$prog,
> 	-data=>$db,
> 	-expect=>$e_val,
> 	-readmethod=>'SearchIO');
> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> 
> my $v = 1;
> 
> my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> 
> while (my $input = $str->next_seq()){
>   #Blast a sequence against a database:
>   #Alternatively, you could  pass in a file with many
>   #sequences rather than loop through sequence one at a time
>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>   #and swap the two lines below for an example of that.
>   my $r = $factory->submit_blast($input);
>   #my $r = $factory->submit_blast('amino.fa');
>   print STDERR "waiting..." if( $v > 0 );
>   while ( my @rids = $factory->each_rid ) {
>     foreach my $rid ( @rids ) {
>       my $rc = $factory->retrieve_blast($rid);
>       if( !ref($rc) ) {
>         if( $rc < 0 ) {
>           $factory->remove_rid($rid);
>         }
>         print STDERR "." if ( $v > 0 );
>         sleep 5;
>       } else {
>         my $result = $rc->next_result();
>         #save the output
>         my $filename = $result->query_name()."\.out";
>         $factory->save_output($filename);
>         $factory->remove_rid($rid);
>         print "\nQuery Name: ", $result->query_name(), "\n";
>         while ( my $hit = $result->next_hit ) {
>           next unless ( $v > 0);
>           print "\thit name is ", $hit->name, "\n";
>           while( my $hsp = $hit->next_hsp ) {
>             print "\t\tscore is ", $hsp->score, "\n";
>           }
>         }
>       }
>     }
>   }
> }
> 
> 
> Do you think there might still be something in the NCBI output format?
> 
> Thank you,
> Guojun
> 
> 
> 
> 
> Guojun Yang
> Department of Plant Biology
> University of Georgia
> Tel: 706-542-1857
> Fax: 706-542-1805
> http://www.arches.uga.edu/~guojun
> 
> 
> 
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> 
> 
> > Sorry, forgot to add that I didn't see the regex issue that you mentioned.
> > It could be a perl-related issue.  Try the fixes I mentioned and see what
> > happens.
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> > > -----Original Message-----
> > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > Sent: Tuesday, February 14, 2006 12:36 PM
> > > To: 'gyang at plantbio.uga.edu'
> > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >
> > > It's a good habit to always add single quotes around words.  The perl
> > > interpreter may think a single bare word is a subroutine or perlfunc
> > > called with no args so will try to find a subroutine named blastp().  My
> > > debugger actually gives the error that the bare word blastp may conflict
> > > with a future reserved word.  Like you said, 'use strict' will point that
> > > out.
> > >
> > > As for the regex, it should match all the blast programs at NCBI (blastp,
> > > blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
> > > else passes through.
> > >
> > > So, if you are using the script below, there are several errors.  The bare
> > > words for $prog and $db need quotes, and the flags for your @params array
> > > don't have a dash before them.  I get this after adding quotes but before
> > > adding the dashes to @params:
> > >
> > > C:\Perl\Scripts>test_blast.pl
> > >
> > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG:
> > > STACK: Error::throw
> > > STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> > > STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > > STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> > > STACK: C:\Perl\Scripts\test_blast.pl:15
> > > -----------------------------------------------------------
> > >
> > > The last line indicates a problem with this line:
> > >
> > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >
> > > Changing the @params to this:
> > >
> > > my @params=( -prog=>$prog,
> > > 	-data=>$db,
> > > 	-expect=>$e_val,
> > > 	-readmethod=>'SearchIO');
> > >
> > > fixes it, and I get output as expected.
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher - Switzer Lab
> > > Dept. of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > > > -----Original Message-----
> > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > > Sent: Tuesday, February 14, 2006 11:48 AM
> > > > To: Chris Fields; bioperl-l at lists.open-bio.org
> > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > >
> > > > Hi, Chris,
> > > > When I tried the perldoc script, it did not work either. First it
> > > > says $prog cannot be a bare word if I "use strict". I added quotes on the
> > > > words, then it says the value for $prog does not match expression
> > > > t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The script
> > > > is shown below. Why is the expression "t?blast[pnx]"?
> > > >
> > > > #!/usr/bin/perl
> > > >
> > > > use Bio::SeqIO;
> > > > use Bio::Seq;
> > > > use Bio::Tools::Run::RemoteBlast;
> > > > use Bio::SearchIO;
> > > >
> > > >
> > > > my $prog=blastp;
> > > > my $db=swissprot;
> > > > my $e_val=1e-10;
> > > > my @params=( prog=>$prog,
> > > > 	data=>$db,
> > > > 	expect=>$e_val,
> > > > 	readmethod=>'SearchIO');
> > > > my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > >
> > > > my $v = 1;
> > > >
> > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > > >
> > > > while (my $input = $str->next_seq()){
> > > >   #Blast a sequence against a database:
> > > >   #Alternatively, you could  pass in a file with many
> > > >   #sequences rather than loop through sequence one at a time
> > > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >   #and swap the two lines below for an example of that.
> > > >   my $r = $factory->submit_blast($input);
> > > >   #my $r = $factory->submit_blast('amino.fa');
> > > >   print STDERR "waiting..." if( $v > 0 );
> > > >   while ( my @rids = $factory->each_rid ) {
> > > >     foreach my $rid ( @rids ) {
> > > >       my $rc = $factory->retrieve_blast($rid);
> > > >       if( !ref($rc) ) {
> > > >         if( $rc < 0 ) {
> > > >           $factory->remove_rid($rid);
> > > >         }
> > > >         print STDERR "." if ( $v > 0 );
> > > >         sleep 5;
> > > >       } else {
> > > >         my $result = $rc->next_result();
> > > >         #save the output
> > > >         my $filename = $result->query_name()."\.out";
> > > >         $factory->save_output($filename);
> > > >         $factory->remove_rid($rid);
> > > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > > >         while ( my $hit = $result->next_hit ) {
> > > >           next unless ( $v > 0);
> > > >           print "\thit name is ", $hit->name, "\n";
> > > >           while( my $hsp = $hit->next_hsp ) {
> > > >             print "\t\tscore is ", $hsp->score, "\n";
> > > >           }
> > > >         }
> > > >       }
> > > >     }
> > > >   }
> > > > }
> > > >
> > > > Thank you for your help!
> > > >
> > > >
> > > > Guojun
> > > > Department of Plant Biology
> > > > University of Georgia
> > > >
> > > > ----- Original Message -----
> > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > To: gyang at plantbio.uga.edu
> > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >
> > > >
> > > > > Try two things:
> > > > > > 1)  Use a much simpler script, like the one in 'perldoc
> > > > > Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's
> something
> > > > wrong
> > > > > with the logic in your subroutine:
> > > > > > my $v = 1;
> > > > > > my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta'
> );
> > > > > > while (my $input = $str->next_seq()){
> > > > >   #Blast a sequence against a database:
> > > > >   #Alternatively, you could  pass in a file with many
> > > > >   #sequences rather than loop through sequence one at a time
> > > > >   #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > > >   #and swap the two lines below for an example of that.
> > > > >   my $r = $factory->submit_blast($input);
> > > > >   #my $r = $factory->submit_blast('amino.fa');
> > > > >   print STDERR "waiting..." if( $v > 0 );
> > > > >   while ( my @rids = $factory->each_rid ) {
> > > > >     foreach my $rid ( @rids ) {
> > > > >       my $rc = $factory->retrieve_blast($rid);
> > > > >       if( !ref($rc) ) {
> > > > >         if( $rc < 0 ) {
> > > > >           $factory->remove_rid($rid);
> > > > >         }
> > > > >         print STDERR "." if ( $v > 0 );
> > > > >         sleep 5;
> > > > >       } else {
> > > > >         my $result = $rc->next_result();
> > > > >         #save the output
> > > > >         my $filename = $result->query_name()."\.out";
> > > > >         $factory->save_output($filename);
> > > > >         $factory->remove_rid($rid);
> > > > >         print "\nQuery Name: ", $result->query_name(), "\n";
> > > > >         while ( my $hit = $result->next_hit ) {
> > > > >           next unless ( $v > 0);
> > > > >           print "\thit name is ", $hit->name, "\n";
> > > > >           while( my $hsp = $hit->next_hsp ) {
> > > > >             print "\t\tscore is ", $hsp->score, "\n";
> > > > >           }
> > > > >         }
> > > > >       }
> > > > >     }
> > > > >   }
> > > > > }
> > > > > > 2) Try the RemoteBlast from Bugzilla and see if that works.  It
> > > really
> > > > > shouldn't make that much of a difference, but I noticed that the
> CVS
> > > > > RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1
> was
> > > > > released; the Bugzilla version is based off CVS.
> > > > > > Christopher Fields
> > > > > Postdoctoral Researcher - Switzer Lab
> > > > > Dept. of Biochemistry
> > > > > University of Illinois Urbana-Champaign
> > > > > > > -----Original Message-----
> > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > Sent: Monday, February 13, 2006 3:00 PM
> > > > > > To: bioperl-l at lists.open-bio.org
> > > > > > Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > > > Thanks, Chris,
> > > > > > I installed version 1.5.1 and replaced the blast.pm file with
> the
> > > one
> > > > from
> > > > > > your bug report. The running version is 1.5 when I use the
> command
> > > you
> > > > > > sent me. But when I tried the script, it doesn't change much. My
> > > > > > remoteblast code (portion) is here:
> > > > > > > > sub search {
> > > > > > local
> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > > > > > local
> $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > > > > > local
> > > > > >
> > > $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}=
> > > > > > 'no';
> > > > > > local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > > > > > my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > > > > > 			      -id=>"query",
> > > > > > 			      -desc=>"new seq");
> > > > > > my $len=$query->length();
> > > > > > @db=('nr','htgs','wgs');
> > > > > > foreach my $db (@db) {
> > > > > > my $factory = Bio::Tools::Run::RemoteBlast->new('-prog'
> =>'blastn',
> > > > > > 						'-data' =>"$db",
> > > > > >
> > '-expect'=>"$E_value");
> > > > > > > > > > my $blast_report = $factory->submit_blast($query);
> > > > > > > > my @rids = $factory->each_rid();
> > > > > > foreach my $rid ( @rids ) {
> > > > > >     print STDERR "$rid\n";
> > > > > > }
> > > > > > # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > > > > > print STDERR "waiting...";
> > > > > > sleep 60;
> > > > > > > > foreach my $rid ( @rids ) {
> > > > > >     my $rc = $factory->retrieve_blast($rid);
> > > > > >     while (!ref($rc) ) {
> > > > > > 	if( $rc < 0 ) {
> > > > > > # retrieve_blast returns -1 on error
> > > > > > 	    $factory->remove_rid($rid);
> > > > > > 	    print "Error!\n";
> > > > > > 	    send_error($email,$function,$seqname,$queryname[$ST]);
> > > > > > 	    die "Can't retrieve $rid";
> > > > > > 	} if ($rc==0) { # retrieve_blast returns 0 on 'job not
> > > finished'
> > > > > > 	    sleep 60;
> > > > > > 	    $rc = $factory->retrieve_blast($rid);
> > > > > > 	}
> > > > > >     }
> > > > > >     if (ref($rc)) {
> > > > > > 	print STDERR "Done.\n";
> > > > > > 	 while( my $result = $rc->next_result) {
> > > > > > 	    while( my $hit = $result->next_hit()) {
> > > > > > 	    	$hit_name=$hit->name;
> > > > > > 		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > > > > > 		$name=$1;
> > > > > > 		@left_plus_start=();
> > > > > > 		@left_plus_end=();
> > > > > > 		@left_minus_start=();
> > > > > > 		@left_minus_end=();
> > > > > > 		@right_plus_start=();
> > > > > > 		@right_plus_end=();
> > > > > > 		@right_minus_start=();
> > > > > > 		@right_minus_end=();
> > > > > > > > 		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i))
{
> > > > > > 		while( my $hsp = $hit->next_hsp()) {
> > > > > > ......
> > > > > > > > It was working quite well until around October last year,
> but
> > > > it has
> > > > > > stopped since then, When a submission is sent via a webpage, the
> cgi
> > > > > > starts to work and use a memory of ~20 Mb. Then it hangs there,
> > > > finally
> > > > > > the expected email is received but without real results although
> it
> > > > does
> > > > > > contain something from other parts of the script. Apparently the
> > > > search
> > > > > > sub did not return anything (I know there is something should be
> > > > > > returned.). Is it also possible the format of the NCBI output
> for
> > > each
> > > > > > result has changed?
> > > > > > Thank you,
> > > > > > Guojun
> > > > > > > > > > Department of Plant Biology
> > > > > > University of Georgia
> > > > > > > > > > > > ----- Original Message -----
> > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > > Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > > > > > > How do you know two versions are installed (i.e. how
> are
> > > you
> > > > checking
> > > > > > the
> > > > > > > version)?  Do you see have two complete bioperl distributions
> (in
> > > > two
> > > > > > > separate directories) or are you looking in modules?  Here's
> the
> > > way
> > > > to
> > > > > > > check the version (from the FAQ):
> > > > > > > > perl -MBio::Root::Version -e 'print
> > > > $Bio::Root::Version::VERSION,"\n"'
> > > > > > > > If you have two full bioperl distributions on your computer,
> > > > normally
> > > > > > only
> > > > > > > one will be in use unless you have explicitly set the
> environment
> > > > > > variable
> > > > > > > PERL5LIB.  The PERL5LIB  directories will be searched first
> before
> > > > your
> > > > > > > normal perl directory list (@INC) is searched.  You MAY get
> some
> > > > mixing
> > > > > > > then, but only if perl can't find a particular module in the
> path
> > > > > > designated
> > > > > > > in PERL5LIB; then it will progress through the directories
> listed
> > > in
> > > > > > @INC.
> > > > > > > This may happen if a module is unique to a particular release,
> but
> > > > > > shouldn't
> > > > > > > happen for the majority of modules, including RemoteBlast.
> You
> > > can
> > > > > > check
> > > > > > > what @INC and PERL5LIB are set to by using 'perl -V'.  @INC
> will
> > > > differ
> > > > > > > depending on your OS, perl build, etc.
> > > > > > > > Regardless, if you follow the directions for installing
> bioperl
> > > > for
> > > > > > your
> > > > > > > system ('perl Makefile.PL', 'make', 'make test', 'make
> install',
> > > > unless
> > > > > > you
> > > > > > > explicitly change the installation directory when using 'perl
> > > > > > Makefile.PL'),
> > > > > > > then 'uninstalling' Bioperl shouldn't be a problem as it will
> > > > install
> > > > > > the
> > > > > > > Bioperl distribution you downloaded over the old version in
> @INC.
> > > > See
> > > > > > this
> > > > > > > page:
> > > > > > > > http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > > > > > > > for more details.
> > > > > > > > Christopher Fields
> > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > Dept. of Biochemistry
> > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > > -----Original Message-----
> > > > > > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> l-
> > > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > > Sent: Monday, February 13, 2006 12:32 PM
> > > > > > > > To: bioperl-l at lists.open-bio.org
> > > > > > > > Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > > > > > > > > Hi, Chris,
> > > > > > > > I do have different versions of bioperl on my Linux machine
> > > (1.4.
> > > > and
> > > > > > > > 1.5.0), this may be the problem. Should I just install
> bioperl-
> > > > 1.5.1
> > > > > > or I
> > > > > > > > need to uninstall and remove the previous versions. I could
> not
> > > > find
> > > > > > any
> > > > > > > > hint on uninstalling bioperl on linux. Could you please give
> me
> > > > some
> > > > > > > > suggestion?
> > > > > > > > Thanks,
> > > > > > > > Guojun
> > > > > > > > > > Department of Plant Biology
> > > > > > > > University of Georgia
> > > > > > > >       _____
> > > > > > > > > >   From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > > > > > > Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > > > > > > > Subject: RE: [Bioperl-l] more question regarding
> RemoteBlast.pm
> > > > > > version
> > > > > > > > 1.28
> > > > > > > > >
> > > > > > > > > If you're using RemoteBlast 1.28, then you've likely updated
> > > > > > > > > from CVS, which doesn't have the latest fix.
> > > > > > > > >
> > > > > > > > > Make sure that you check the following:
> > > > > > > > >
> > > > > > > > > 1) Always post to the mailing list:
> > > > > > > > > http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > > > > > > > >
> > > > > > > > > 2) You must have the complete bioperl-1.5.1 or bioperl-live
> > > > > > > > > (CVS) installed first.  Perform a clean installation; do not
> > > > > > > > > upgrade only Bio::SearchIO::blast and
> > > > > > > > > Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing
> > > > > > > > > modules from old and new distributions (1.4 and 1.5.1, for
> > > > > > > > > instance) will work.  A bioperl-1.5.1 or bioperl-live
> > > > > > > > > installation will allow text output from BLAST v2.2.12 to be
> > > > > > > > > saved and parsed; it will not parse the newest BLAST text
> > > > > > > > > output from NCBI (v2.2.13) but it should still save it. I
> > > > > > > > > believe as long as next_result() isn't called, it will work.
> > > > > > > > >
> > > > > > > > > 3) The bug fixes for the above issue with parsing BLAST 2.2.13
> > > > > > > > > text output are NOT in CVS; they haven't been cleared and
> > > > > > > > > checked in by Roger Hall (who's now taking care of RemoteBlast)
> > > > > > > > > and the powers that be (Jason or whoever is in charge of
> > > > > > > > > Bio::SearchIO).  They can be found in Bugzilla:
> > > > > > > > >
> > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > > > >
> > > > > > > > > The fix in RemoteBlast in Bugzilla (#1935) is to allow the
> > > > > > > > > option of saving XML output, so it isn't necessary if you don't
> > > > > > > > > plan on using this option.  And, remember, they haven't been
> > > > > > > > > committed yet to CVS, which means that the final version will
> > > > > > > > > change to reflect the new version.
> > > > > > > > >
> > > > > > > > > > > > Christopher Fields
> > > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > > Dept. of Biochemistry
> > > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > > > >     _____
> > > > > > > > > > > > From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > > > > > > Sent: Monday, February 13, 2006 9:26 AM
> > > > > > > > To: Chris Fields
> > > > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > > > > > > version 1.28
> > > > > > > > > > > > Hi, Chris
> > > > > > > > >
> > > > > > > > > Thanks for your suggestion; however, it doesn't seem to work
> > > > > > > > > for my CGI even after I replace both blast.pm and
> > > > > > > > > RemoteBlast.pm. I didn't even get any RID. Is there any
> > > > > > > > > suggestion?
> > > > > > > > > > > > > > Guojun
> > > > > > > > > > > > Guojun Yang
> > > > > > > > Department of Plant Biology
> > > > > > > > University of Georgia
> > > > > > > > Tel: 706-542-1857
> > > > > > > > Fax: 706-542-1805
> > > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > > >     _____
> > > > > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > > To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > > > > > > > Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > > > > > > > > Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > > > > > > version 1.28
> > > > > > > > >
> > > > > > > > > I would say give the new code a try, but realize that it hasn't
> > > > > > > > > been checked in (like I said below). I will try going over the
> > > > > > > > > modified Bio::SearchIO::blast again this weekend to see if
> > > > > > > > > there is anything I might have missed. The changed order in the
> > > > > > > > > header of BLAST text output has me a bit worried that it might
> > > > > > > > > not catch everything, but it at least doesn't hang in the
> > > > > > > > > while() loop I described in the bug report below (bug #1934)
> > > > > > > > > and seems to process everything fine.
> > > > > > > > >
> > > > > > > > > If you want more stability in the code, you might consider
> > > > > > > > > changing over to XML output and parsing with
> > > > > > > > > Bio::SearchIO::blastxml. There are some changes in
> > > > > > > > > Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
> > > > > > > > > saving XML output, but I believe it parses everything
> > > > > > > > > regardless. If you look back over the last month or so there
> > > > > > > > > has been a bit of discussion here about it. Jason describes a
> > > > > > > > > bit on how to set up RemoteBlast for XML:
> > > > > > > > >
> > > > > > > > > http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> > > > > > > > >
> > > > > > > > > > Christopher Fields
> > > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > > Dept. of Biochemistry
> > > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-
> > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > > > > > > > Sent: Friday, February 03, 2006 1:45 PM
> > > > > > > > > To: bioperl-l at bioperl.org
> > > > > > > > > > Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> > > > > > > > > > version 1.28
> > > > > > > > >
> > > > > > > > > Hi, Everybody,
> > > > > > > > > > I see this post and am wondering if this is the reason for
> > > > > > > > > > the malfunctioning of my webserver. We set up a webserver
> > > > > > > > > > named MAK, for MITE sequence analysis. It was working very
> > > > > > > > > > well until around November 2005, when it stopped returning
> > > > > > > > > > any result (the site is fine and seems to be doing something
> > > > > > > > > > after submission). In the CGI script, I used remoteblast
> > > > > > > > > > (that work was done in 2003) to do searches. I currently do
> > > > > > > > > > not have access to the server because I moved. Quite a few
> > > > > > > > > > people have sent emails to us about its malfunctioning. Is
> > > > > > > > > > there any suggestion on fixing the problem? Should I simply
> > > > > > > > > > ask that the remoteblast.pm be replaced with the new version?
> > > > > > > > > Thanks a lot,
> > > > > > > > > Guojun
> > > > > > > > >
> > > > > > > > > Department of Plant Biology
> > > > > > > > > University of Georgia
> > > > > > > > > Tel: 706-542-1857
> > > > > > > > > Fax: 706-542-1805
> > > > > > > > > http://www.arches.uga.edu/~guojun
> > > > > > > > > _____
> > > > > > > > >
> > > > > > > > > From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > > > > > > > To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au],
> 'Huang
> > > > Jian'
> > > > > > > > > [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > > [mailto:bioperl-
> > > > > > > > > l at bioperl.org]
> > > > > > > > > Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > > >
> > > > > > > > > > Like Nagesh says, try the latest RemoteBlast from
> > > > > > > > > > bioperl-live CVS. It will work for saving text output.
> > > > > > > > > > However, it will not parse anything using next_result (it
> > > > > > > > > > will likely hang) and will not save XML format. See these
> > > > > > > > > > bugs:
> > > > > > > > > >
> > > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > > > > > > > > http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > > > > > > > >
> > > > > > > > > > for explanations and possible fixes (changes to RemoteBlast
> > > > > > > > > > and Bio::SearchIO::blast). Note that these haven't been
> > > > > > > > > > checked in yet so are still not included in bioperl-live;
> > > > > > > > > > they may be further modified before committing to CVS. If
> > > > > > > > > > you're not worried about XML, you could just try the first
> > > > > > > > > > fix, which is a change to SearchIO::blast.
> > > > > > > > > >
> > > > > > > > > > Nagesh, I remember you posting to the list a month ago using
> > > > > > > > > > a script which had problems; the script you used saves the
> > > > > > > > > > output but doesn't actually parse it (i.e. you don't use
> > > > > > > > > > next_result() to go through the data). Is the version of
> > > > > > > > > > BLAST in your text output 2.2.12 or 2.2.13? Have you tried
> > > > > > > > > > parsing the output using "-readmethod => SearchIO" or
> > > > > > > > > > "-readmethod => blast" using your version of RemoteBlast and
> > > > > > > > > > method next_result()? Like below (from perldoc):
> > > > > > > > >
> > > > > > > > > > while ( my @rids = $factory->each_rid ) {
> > > > > > > > > >     foreach my $rid ( @rids ) {
> > > > > > > > > >         my $rc = $factory->retrieve_blast($rid);
> > > > > > > > > >         if( !ref($rc) ) {
> > > > > > > > > >             if( $rc < 0 ) {
> > > > > > > > > >                 $factory->remove_rid($rid);
> > > > > > > > > >             }
> > > > > > > > > >             print STDERR "." if ( $v > 0 );
> > > > > > > > > >             sleep 5;
> > > > > > > > > >         } else { # parsing starts here
> > > > > > > > > >             my $result = $rc->next_result(); # it should hang here
> > > > > > > > > >             #save the output
> > > > > > > > > >             my $filename = $result->query_name()."\.out";
> > > > > > > > > >             $factory->save_output($filename);
> > > > > > > > > >             $factory->remove_rid($rid);
> > > > > > > > > >             print "\nQuery Name: ", $result->query_name(), "\n";
> > > > > > > > > >             while ( my $hit = $result->next_hit ) {
> > > > > > > > > >                 next unless ( $v > 0);
> > > > > > > > > >                 print "\thit name is ", $hit->name, "\n";
> > > > > > > > > >                 while( my $hsp = $hit->next_hsp ) {
> > > > > > > > > >                     print "\t\tscore is ", $hsp->score, "\n";
> > > > > > > > > >                 }
> > > > > > > > > >             }
> > > > > > > > > >         }
> > > > > > > > > >     }
> > > > > > > > > > }
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > My script hung if I used next_result() in any way prior to
> > > > > > > > > > the fixes. I want to see how many others are having the same
> > > > > > > > > > issues with parsing using the CVS version of bioperl-live.
> > > > > > > > >
> > > > > > > > > Christopher Fields
> > > > > > > > > Postdoctoral Researcher - Switzer Lab
> > > > > > > > > Dept. of Biochemistry
> > > > > > > > > University of Illinois Urbana-Champaign
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-
> > > l-
> > > > > > > > > > bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > > > > > > > > > Sent: Thursday, February 02, 2006 7:24 PM
> > > > > > > > > > To: Huang Jian; bioperl-l
> > > > > > > > > > Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > > > > > > > >
> > > > > > > > > > Hi Huang,
> > > > > > > > > > > Thanks for the message. The older version of RemoteBlast.pm
> > > > > > > > > > > works on the logic of checking the temporary file size to
> > > > > > > > > > > determine whether the Blast results are ready. This
> > > > > > > > > > > condition is no longer being satisfied, maybe due to some
> > > > > > > > > > > changes brought about by NCBI. I had this problem recently
> > > > > > > > > > > and figured out that the solution was to use the latest
> > > > > > > > > > > version, which has this problem fixed (it does not use the
> > > > > > > > > > > file size logic any more) and which is not yet included in
> > > > > > > > > > > the BioPerl package.
> > > > > > > > > > Cheers
> > > > > > > > > > Nagesh
> > > > > > > > > >
> > > > > > > > > > Huang Jian wrote:
> > > > > > > > > >
> > > > > > > > > > > Dear Nagesh,
> > > > > > > > > > >
> > > > > > > > > > > > I have replaced my old RemoteBlast.pm (v 1.17) with the
> > > > > > > > > > > > v 1.28 you sent me. Now it works perfectly!!!
> > > > > > > > > > >
> > > > > > > > > > > Thank you!!
> > > > > > > > > > >
> > > > > > > > > > > Huang
> > > > > > > > > > >
> > > > > > > > > > > ----- Original Message ----- From: "Nagesh Chakka"
> > > > > > > > > > > 
> > > > > > > > > > > To: "Huang Jian" ;
> "bioperl-l"
> > > > > > > > > > > 
> > > > > > > > > > > Sent: Friday, February 03, 2006 7:48 AM
> > > > > > > > > > > Subject: Re: [Bioperl-l] Sorry, failure in post on the
> > > net,
> > > > so
> > > > > > still
> > > > > > > > > > > via email
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >> Hi Huang,
> > > > > > > > > > > >> I see that you are submitting a sequence for a remote
> > > > > > > > > > > >> blast search. Can you check if the RemoteBlast.pm being
> > > > > > > > > > > >> used is v 1.28 (2005/12/09)? If not, I have attached it
> > > > > > > > > > > >> to this email; try replacing the old one, which has a
> > > > > > > > > > > >> bug, with it.
> > > > > > > > > > >> Let me know if it works.
> > > > > > > > > > >> Nagesh
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From sdavis2 at mail.nih.gov  Thu Feb 16 00:39:33 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 16 Feb 2006 00:39:33 -0000
Subject: [Bioperl-l] error running load_seqdatabase.pl
References: 
Message-ID: <000c01c63291$5de08600$6601a8c0@WATSON>


----- Original Message ----- 
From: "Angshu Kar" 
To: "bioperl-l" 
Sent: Thursday, December 29, 2005 5:50 PM
Subject: [Bioperl-l] error running load_seqdatabase.pl


> Hi,
>
> I'm getting the following error while trying to run :
>
> ./load_seqdatabase.pl -host localhost -dbname USBA -dbuser postgres -format genbank NC_003076.gbk
>
> But I have a PostgreSQL db and not a MySQL one... could anyone please help
> me troubleshoot this?

Angshu,

I would probably start with:

perldoc load_seqdatabase.pl

I think that will likely give you your answer.  Again, it is best to exhaust 
the resources at hand and to let the list know that you have done so 
(like--"I read the perldoc and tried this....").

Sean




From cain at cshl.edu  Wed Feb 15 16:07:28 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 15 Feb 2006 11:07:28 -0500
Subject: [Bioperl-l] Bio::Tools::GFF parsing error
In-Reply-To: <43F35043.7070705@cornell.edu>
References: <43F35043.7070705@cornell.edu>
Message-ID: <1140019648.2849.58.camel@localhost.localdomain>

Hi Robert,

No column should ever be padded with spaces; GFF columns should always
be separated by a single tab.  Therefore, I don't think Bio::Tools::GFF
is at fault here.
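
If the file cannot be regenerated, one workaround is to strip the padding
before the lines reach the parser. A minimal sketch, assuming the file is
otherwise tab-delimited; the helper name clean_gff_line is made up for this
illustration and is not part of Bio::Tools::GFF:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): strip space padding from each
# tab-separated column of a GFF2 feature line. Comment and directive lines
# (starting with '#') are returned unchanged apart from the newline.
sub clean_gff_line {
    my ($line) = @_;
    chomp $line;
    return $line if $line =~ /^#/;
    # The -1 limit keeps trailing empty columns.
    my @cols = split /\t/, $line, -1;
    # Remove leading and trailing spaces only; spaces inside a column
    # (such as the attribute column) are left alone.
    s/^ +| +$//g for @cols;
    return join "\t", @cols;
}

# Example: the score column arrives as ' 3.3' and leaves as '3.3'.
my $raw = join "\t", 'C01HBa0088L02', 'RepeatMasker', 'similarity',
    '3537', '4267', ' 3.3', '-', '.',
    'Target "Motif:bac_end_repeat_family_345" 1 740';
my $clean = clean_gff_line($raw);
print "$clean\n";
```

The cleaned lines could then be written to a new file and parsed as usual.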

Scott


On Wed, 2006-02-15 at 11:01 -0500, Robert Buels wrote:
> Hi all,
> 
> I'm parsing a GFF2 file with Bio::Tools::GFF (I would be using 
> FeatureIO, except it purports not to support gff 2), and the file looks 
> like:
> 
> ##gff-version 2
> ##date 2006-02-13
> ##sequence-region C01HBa0088L02.seq 1 120525
> C01HBa0088L02   RepeatMasker    similarity      3537    4267     3.3    -       .       Target "Motif:bac_end_repeat_family_345" 1 740
> C01HBa0088L02   RepeatMasker    similarity      4172    4279     2.9    +       .       Target "Motif:HRSiTERT00100141" 1 104
> C01HBa0088L02   RepeatMasker    similarity      4267    4323     0.0    -       .       Target "Motif:k_29" 150 206
> C01HBa0088L02   RepeatMasker    similarity      4322    4492    26.6    +       .       Target "Motif:PRSiTERT00300001" 1960 2129
> C01HBa0088L02   RepeatMasker    similarity      4557    5124    29.5    +       .       Target "Motif:PRSiTERT00300001" 2142 2711
> 
> Notice the score column is padded with spaces.
> 
> Bio::Tools::GFF does not like this, and says that ' 3.3' is not a valid 
> score.  My question is, who is wrong here, my input file or 
> Bio::Tools::GFF?  Should Bio::Tools::GFF be able to read this file?
> 
> Rob
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From hlapp at gmx.net  Thu Feb 16 01:54:01 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 15 Feb 2006 17:54:01 -0800
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki,
	problems with bioperl-db
In-Reply-To: <001201c631a5$ce7496f0$15327e82@pyrimidine>
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
Message-ID: 


On Feb 14, 2006, at 12:32 PM, Chris Fields wrote:

> Hilmar,
>
> Good News: I've added a section to the bioperl wiki on installing  
> bioperl-db
> in Windows:
>
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db
>
> Bad News:  There's a new problem now. I updated from CVS yesterday; I  
> walked
> through the steps and ran 'nmake test', with everything passing fine.
> However, load_seqdatabase.pl is extremely slow; it's loading a sequence
> every 5 minutes or so.  I noticed (when using '-debug') that it is  
> hanging
> up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a  
> database,
> load the biosql schema, and load sequences w/o loading taxonomy, the  
> problem
> goes away.
>
> Here's the debugging output (I cut it off at the point it hangs up):
> [...]

> preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
> taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
> taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND
> ncbi_taxon_id = ?
> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)

I'm a bit surprised if this is the query where it hangs. Are the  
indexes all there? There should be a primary key index on  
taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on taxon_name  
over (taxon_id,name,name_class). Also, there should be separate indexes  
on taxon_name.taxon_id and taxon_name.name. Are they all there? If you  
reinstantiated the schema from the DDL then it seems unlikely that  
somehow the indexes have vanished except if you messed with the schema  
or the DDL.

Putting an index on taxon_name.name_class really can't make sense, so  
let's assume it can't be that.

So really I suspect this has something to do with the state of the  
database and the version of MySQL. In particular, from some 4.x version  
of MySQL under certain circumstances you have to analyze the statistics  
of the tables in order to get the optimizer to pick up the indexes
properly. Are you on MySQL 4.x and if so, have you done that?

There's the ANALYZE TABLE command:
http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html

Note the comment: "This statement works with MyISAM, BDB, and (as of  
MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?

Also, you can check the execution plan for the query using EXPLAIN.
http://dev.mysql.com/doc/refman/4.1/en/explain.html

This should show you whether the index would be picked up for the query  
or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to  
the db using the mysql shell (mysql).
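
For illustration, the two checks could be prepared like this. It is only a
sketch: the bound values are copied from the debug output quoted above, the
select list is shortened, and the statements are printed so they can be
pasted into the mysql shell rather than executed from the script.

```perl
use strict;
use warnings;

# The slow look-up from the debug output, with the two placeholders filled
# in using the values SpeciesAdaptor reported binding. The NULL columns of
# the original select list are dropped here for brevity.
my $query = join ' ',
    'SELECT taxon_name.taxon_id, taxon.ncbi_taxon_id, taxon_name.name',
    'FROM taxon, taxon_name',
    'WHERE taxon.taxon_id = taxon_name.taxon_id',
    "AND name_class = 'scientific name'",
    'AND ncbi_taxon_id = 208964';

# Statements to paste into the mysql shell:
#   ANALYZE TABLE refreshes the key statistics used by the optimizer;
#   EXPLAIN shows which indexes (if any) would be used for the query.
my @statements = (
    'ANALYZE TABLE taxon, taxon_name;',
    "EXPLAIN $query;",
);

print "$_\n" for @statements;
```

In the EXPLAIN output, the "key" column should name an index for both
tables; a NULL there would point to a full table scan.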

I believe something similarly strange was encountered by someone using  
DB::GFF (or Chado) under MySQL, and if I recall correctly the solution  
was to optimize (analyze) the tables. Maybe someone who was in that  
thread reads this and can comment?

	-hilmar


>
> ----------------------------------------------------------------------- 
> -----
> -------------------------
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From cjfields at uiuc.edu  Thu Feb 16 03:56:14 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 15 Feb 2006 21:56:14 -0600
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki,
	problems with bioperl-db
In-Reply-To: 
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
	
Message-ID: <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>



On Feb 15, 2006, at 7:54 PM, Hilmar Lapp wrote:

>
> On Feb 14, 2006, at 12:32 PM, Chris Fields wrote:
>
>> Hilmar,
>>
>> Good News: I've added a section to the bioperl wiki on installing
>> bioperl-db
>> in Windows:
>>
>> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Installing_bioperl-db
>>
>> Bad News:  There's a new problem now. I updated from CVS yesterday; I
>> walked
>> through the steps and ran 'nmake test', with everything passing fine.
>> However, load_seqdatabase.pl is extremely slow; it's loading a  
>> sequence
>> every 5 minutes or so.  I noticed (when using '-debug') that it is
>> hanging
>> up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a
>> database,
>> load the biosql schema, and load sequences w/o loading taxonomy, the
>> problem
>> goes away.
>>
>> Here's the debugging output (I cut it off at the point it hangs up):
>> [...]
>
>> preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
>> taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
>> taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND
>> ncbi_taxon_id = ?
>> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
>> SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)
>
> I'm a bit surprised if this is the query where it hangs. Are the
> indexes all there? There should be a primary key index on
> taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on  
> taxon_name
> over (taxon_id,name,name_class). Also, there should be separate  
> indexes
> on taxon_name.taxon_id and taxon_name.name. Are they all there? If you
> reinstantiated the schema from the DDL then it seems unlikely that
> somehow the indexes have vanished except if you messed with the schema
> or the DDL.

I looked in the mailing list archives and Barry mentions something here:

http://bioperl.org/pipermail/bioperl-l/2005-January/018093.html

He rebuilt the database from scratch and got it working; no reason  
was given.  I wouldn't be surprised if it is something MySQL-related
that pops up.  The strange thing is that only a few months ago  
everything ran well with this version of MySQL (v.5); this was with  
the first test database I installed on it.  Another strange thing (I  
think I mentioned it) is that NOT loading the taxonomy with  
load_ncbi_taxonomy.pl worked (everything was entered).  I'll try  
rebuilding the database from scratch to see what happens.  I am  
running this on Windows, so this is new territory...

> Putting an index on taxon_name.name_class really can't make sense, so
> let's assume it can't be that.
>
> So really I suspect this has something to do with the state of the
> database and the version of MySQL. In particular, from some 4.x  
> version
> of MySQL under certain circumstances you have to analyze the  
> statistics
> of the tables in order to get the optimizer to pick up the indexes
> properly. Are you on MySQL 4.x and if so, have you done that?
>
> There's the ANALYZE TABLE command:
> http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html
>
> Note the comment: "This statement works with MyISAM, BDB, and (as of
> MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?
>
> Also, you can check the execution plan for the query using EXPLAIN.
> http://dev.mysql.com/doc/refman/4.1/en/explain.html
>
> This should show you whether the index would be picked up for the  
> query
> or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to
> the db using the mysql shell (mysql).

I'll give these a shot and post what I find in the next few days.

> I believe something similarly strange was encountered by someone using
> DB::GFF (or Chado) under MySQL, and if I recall correctly the solution
> was to optimize (analyze) the tables. Maybe someone who was in that
> thread reads this and can comment?
>
> 	-hilmar

I wanted to also mention that we shouldn't check in the modifications
to Bio::Root::Root until I confirm something (I'm at home and
currently can't).  I tried running a script on an unrelated module
using the modified Bio::Root::Root (with the commas added after the
'throw $class' statements).  Everything worked for $self->throw(),
except the thrown message wasn't displayed.  I'll dig into it a bit
more to see what happens.

>
>
>>
>> --------------------------------------------------------------------- 
>> --
>> -----
>> -------------------------
>>
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
> -- 
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From osborne1 at optonline.net  Thu Feb 16 05:16:04 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 00:16:04 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
Message-ID: 

Harry,

It's not clear to me that NCBI's eutils offers this capability directly. You
can probably download Entrez Gene entries and parse them for coordinates but
I know of no way to remotely retrieve genomic sequences like this from NCBI
(ENSEMBL API perhaps?). What I had in mind uses the local approach that some
of us favor and to prove to myself that this is simple to do I wrote a
script that I just added to examples/tools, it's called extract_genes.pl and
it's based on Bio::DB::Fasta. Download the sequence files for a given
species to some dir, download Entrez Gene's gene2accession file, and run. It
creates and stores a hash for lookups, it won't read gene2accession each
time it runs.
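
The caching idea can be sketched as follows. This is not the code of
extract_genes.pl; the function names, the cache file name, and the column
positions assumed for gene2accession (GeneID in column 2; genomic accession,
start, end, and orientation in columns 8, 10, 11, and 12) are assumptions
for illustration and should be checked against the actual file:

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Build a GeneID => [accession, start, end, strand] hash from
# gene2accession-style lines (tab-delimited, '#' header, '-' for missing).
# Column positions are an assumption; verify them against the real file.
sub build_index {
    my ($lines) = @_;
    my %index;
    for my $line (@$lines) {
        next if $line =~ /^#/;
        chomp( my $copy = $line );
        my @f = split /\t/, $copy;
        my ( $gene_id, $acc, $start, $end, $strand ) = @f[ 1, 7, 9, 10, 11 ];
        # Skip short lines and entries without genomic coordinates.
        next unless defined $strand && $acc ne '-' && $start ne '-';
        $index{$gene_id} = [ $acc, $start, $end, $strand ];
    }
    return \%index;
}

# Return the cached hash if the cache file exists; otherwise call the
# builder once and store its result for later runs.
sub load_or_build {
    my ( $cache_file, $builder ) = @_;
    return retrieve($cache_file) if -e $cache_file;
    my $index = $builder->();
    store( $index, $cache_file );
    return $index;
}

# Usage with one made-up record (not real data).
my @demo = (
    join( "\t", 9606, 1234, 'REVIEWED', '-', '-', '-', '-',
        'NC_000001.1', '-', 1000, 2000, '+' ) . "\n",
);
my $cache = "gene_index.$$.tmp";
my $index = load_or_build( $cache, sub { build_index( \@demo ) } );
print join( ' ', @{ $index->{1234} } ), "\n";
unlink $cache;
```

The stored coordinates can then be passed, with whatever flanking offset is
wanted, to a Bio::DB::Fasta look-up like the one quoted below.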

Brian O.


On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:

> Hi Brian,
> 
> Thanks very much for the pointers and the speed of your reply and apologies
> for the speed of mine.
> 
> This looks good, but what I was looking for was a bioP approach for hooking to
> an API at NCBI or EBI so I could get this info and seqs from them.  In this
> case, speed of retrieval is not critical and I'd rather not download the
> entirety of the sequences to a local disk to hack at them.
> 
> I've determined a screen-scraping approach to get them and could script that,
> but I thought that bioP had a method for using NCBI's external API's, tho it
> may be that my memory is faulty or the approach is no longer supported due to
> overload.  
> 
> Does NCBI make such APIs available anymore?  I searched a bit for docs on them
> but couldn't find anything (unless it's buried in the NCBI toolkit, which I
> haven't started to excavate).
> 
> Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
> 
> Harry
> 
> 
> 
> 
> 
> 
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>> 
>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
>> from its documentation:
>> 
>>   use Bio::DB::Fasta;
>> 
>>   # create database from directory of fasta files
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   # simple access (for those without Bioperl)
>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>   my @ids     = $db->ids;
>>   my $length   = $db->length('CHROMOSOME_I');
>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>   my $header   = $db->header('CHROMOSOME_I');
>> 
>>   # Bioperl-style access
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>   my $seq     = $obj->seq;
>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>> 
>> Do you already have the offsets?
>> 
>> Brian O.
>> 
>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>> Hi All,
>>> 
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this.  Forgive me if I've missed something
>>> obvious.
>>> 
>>> This should not be a novel request, but I've not found it answered.  If
>>> bioperl isn't the best way to do this, I'd be grateful for a pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>> 
>>> The problem is to retrieve genomic sequences plus & minus some offset
>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>> 
>>> 
>>> TIA!




From hlapp at gmx.net  Thu Feb 16 06:31:54 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 15 Feb 2006 22:31:54 -0800
Subject: [Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki,
	problems with bioperl-db
In-Reply-To: <12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>
References: <001201c631a5$ce7496f0$15327e82@pyrimidine>
	
	<12B5EFA4-97BD-45BB-B821-46D116BB22CC@uiuc.edu>
Message-ID: 


On Feb 15, 2006, at 7:56 PM, Chris Fields wrote:

> [...]
> I looked in the mailing list archives and Barry mentions something 
> here:
>
> http://bioperl.org/pipermail/bioperl-l/2005-January/018093.html
>
> He rebuilt the database from scratch and got it working; no reason
> was given.  I wouldn't be surprised if it is something MySQL-related
> that pops up.

Note though that he was using PostgreSQL. With Pg you definitely need 
to 'vacuum,' which is their name for analyzing/optimizing the table(s).

>   The strange thing is that only a few months ago
> everything ran well with this version of MySQL (v.5); this was with
> the first test database I installed on it.  Another strange thing (I
> think I mentioned it) is that NOT loading the taxonomy with
> load_ncbi_taxonomy.pl worked (everything was entered).

That's not really strange, it is in fact consistent with the query you 
report as taking a long time. If you don't pre-load the taxonomy then 
the taxon and taxon_name tables are empty or almost empty and look-ups 
and joins of empty tables are amazingly fast :-J

[...]
> I wanted to also mention that we shouldn't check in the modifications
> to Bio::Root::Root until I confirm something (I'm at home and
> currently can't).

OK we'll hold off.

	-hilmar
-- 
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From michael.watson at bbsrc.ac.uk  Thu Feb 16 10:31:54 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 16 Feb 2006 10:31:54 -0000
Subject: [Bioperl-l] CONTIG sequence files from the NCBI
Message-ID: <8975119BCD0AC5419D61A9CF1A923E95030081AC@iahce2ksrv1.iah.bbsrc.ac.uk>

Hi

I have two questions really.  I fetched bacterial genome sequences from
the NCBI using Bio::DB::GenBank.

Some of these sequence entries are CONTIG sequences, ie they just point
to other sequences that need to be joined together to form the entire
genome.

Looking at my downloads, it looks as if bioperl has done all the
necessary joining for me - or maybe it was the NCBI that did the
joining?

OK, so firstly, did bioperl do the joining, and if so, are all the
co-ordinates of the features updated to reflect their new location on
the new, joined sequence?

And secondly, sequence versions... I'm thinking that possibly the
sequence version of the CONTIG may be 1 (as it hasn't changed) yet the
versions of the sequences it refers to might have changed, so when I ask
bioperl if these sequences have been updated, I will be told no because
the CONTIG sequence version is 1, but I should be told yes because the
underlying sequences have...?

Make sense?

Thanks
Mick



From cjfields at uiuc.edu  Thu Feb 16 12:51:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 06:51:50 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
	version 1.28
In-Reply-To: <43F449E1.80605@esat.kuleuven.be>
References: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
	<43F449E1.80605@esat.kuleuven.be>
Message-ID: <369C1D1F-DBCB-4161-A24A-7C3E579D337A@uiuc.edu>

Yeah, it looks like the extra feature information NCBI added broke parsing of
nucleotide BLAST text output.  XML output parsing still works, though (as
expected).  I'll take a look.
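
To make the failure mode concrete, here is a rough sketch in plain Perl of one
possible stop-gap: filter the "Features flanking/in this part of subject
sequence:" blocks out of the saved text report before handing it to a parser.
This is only an illustration under my own assumptions, not the actual change
to Bio::SearchIO::blast; strip_feature_blocks is a hypothetical helper that
does not exist in BioPerl, and the sample report is trimmed from the output
quoted in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of BioPerl): drop the "Features ..."
# annotation blocks that newer NCBI BLAST text reports print before an HSP.
sub strip_feature_blocks {
    my ($report) = @_;
    my @kept;
    my $in_features = 0;
    for my $line ( split /\n/, $report, -1 ) {
        # Start of an annotation block: skip this line and what follows.
        if ( $line =~ /^\s*Features (?:flanking|in) this part of subject sequence:/ ) {
            $in_features = 1;
            next;
        }
        # The HSP header (" Score = ...") ends the annotation block.
        $in_features = 0 if $line =~ /^\s*Score\s*=/;
        push @kept, $line unless $in_features;
    }
    return join "\n", @kept;
}

# Sample trimmed from the report quoted in this thread.
my $sample = join "\n",
    'Length=27492551',
    '',
    ' Features flanking this part of subject sequence:',
    "   3726 bp at 5' side: transposon protein, putative",
    "   2655 bp at 3' side: hypothetical protein",
    '',
    ' Score = 36.2 bits (18),  Expect = 0.22',
    ' Identities = 18/18 (100%), Gaps = 0/18 (0%)',
    ' Strand=Plus/Minus';

print strip_feature_blocks($sample), "\n";
```

The filter keys on the HSP header line to decide where an annotation block
ends, so it assumes every such block is followed by a " Score = " line, as in
the example report.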

Chris

On Feb 16, 2006, at 3:46 AM, Pieter Monsieurs wrote:

> Hi,
>
> I have the same problem with the blast.pm-file.
> The people at NCBI added some extra info to the BLAST output
> (see e.g. "Features flanking this part..." or "Features in
> this part ..."); example added below.
> The blast.pm module starts looking for the HSP alignment
> information, but it dies when it hits this feature information.
>
> Pieter
>
>
>> gi|77552765|gb|DP000011.1| > query.fcgi? 
>> cmd=Retrieve&db=Nucleotide&list_uids=77552765&dopt=GenBank> Oryza  
>> sativa (japonica cultivar-group) chromosome 12, complete
>
> sequence
> Length=27492551
>
> Features flanking this part of subject sequence:
>   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm  
> sub-class  val=77552765&db=Nucleotide&from=19251479&to=19253693&view=gbwithparts>
>   2655 bp at 3' side: hypothetical protein  www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? 
> val=77552765&db=Nucleotide&from=19260091&to=19260600&view=gbwithparts>
>
> Score = 36.2 bits (18),  Expect = 0.22
> Identities = 18/18 (100%), Gaps = 0/18 (0%)
> Strand=Plus/Minus
>
> Query  4         GTACTACTCTACTCTACT  21
>                 ||||||||||||||||||
>
> Sbjct  19257436  GTACTACTCTACTCTACT  19257419
>
>
> Features flanking this part of subject sequence:
>   2991 bp at 5' side: hypothetical protein  www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? 
> val=77552765&db=Nucleotide&from=27003164&to=27003907&view=gbwithparts>
>   1131 bp at 3' side: hypothetical protein
>  val=77552765&db=Nucleotide&from=27008046&to=27010752&view=gbwithparts>
>
> Score = 36.2 bits (18),  Expect = 0.22
> Identities = 18/18 (100%), Gaps = 0/18 (0%)
> Strand=Plus/Minus
>
> Query  2         ATGTACTACTCTACTCTA  19
>                 ||||||||||||||||||
> Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
>
>
>
> Features in this part of subject sequence:
>   DHHC zinc finger domain, putative
>  val=77552765&db=Nucleotide&from=17614825&to=17618687&view=gbwithparts>
>
> Score = 34.2 bits (17),  Expect = 0.87
> Identities = 17/17 (100%), Gaps = 0/17 (0%)
> Strand=Plus/Plus
>
> Query  5         TACTACTCTACTCTACT  21
>                 |||||||||||||||||
> Sbjct  17616437  TACTACTCTACTCTACT  17616453
>
>
>
> Features flanking this part of subject sequence:
>   102 bp at 5' side: bZIP transcription factor, putative
>  val=77552765&db=Nucleotide&from=2774964&to=2775778&view=gbwithparts>
>   3740 bp at 3' side: yeast dcp1, putative  www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? 
> val=77552765&db=Nucleotide&from=2779635&to=2782508&view=gbwithparts>
>
> Score = 32.2 bits (16),  Expect = 3.4
> Identities = 16/16 (100%), Gaps = 0/16 (0%)
> Strand=Plus/Plus
>
> Query  7        CTACTCTACTCTACTC  22
>                ||||||||||||||||
> Sbjct  2775880  CTACTCTACTCTACTC  2775895
>
>
> Features flanking this part of subject sequence:
>
>   21 bp at 5' side: peptide transporter T17F3.11, putative  www.ncbi.nlm.nih.gov/entrez/viewer.fcgi? 
> val=77552765&db=Nucleotide&from=27321354&to=27323117&view=gbwithparts>
>   10230 bp at 3' side: transposon protein, putative, unclassified  
>  val=77552765&db=Nucleotide&from=27333383&to=27334285&view=gbwithparts>
>
> Score = 32.2 bits (16),  Expect = 3.4
> Identities = 16/16 (100%), Gaps = 0/16 (0%)
> Strand=Plus/Minus
>
> Query  7         CTACTCTACTCTACTC  22
>
>                 ||||||||||||||||
> Sbjct  27323153  CTACTCTACTCTACTC  27323138
>
>
>
>
> Guojun Yang wrote:
>
>> Hi, Chris,
>> Finally the remoteblast test script works for the amino.fa query,
>> but when I try a nucleic acid sequence (see below), an error occurs: "
>> waiting........
>> ------------- EXCEPTION  -------------
>> MSG: no data for midline  Features flanking this part of subject sequence:
>> STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
>> STACK toplevel remoteblast_test:40
>> "
>> The query sequence is:
>> CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
>> GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
>> AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
>> AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
>>
>> The script (basically same as the remoteblast test, I only changed  
>> database to 'nr' and program to 'blastn' and filename to 'ost3'):
>> #!/usr/bin/perl
>>
>> use Bio::SeqIO;
>> use Bio::Seq;
>> use Bio::Tools::Run::RemoteBlast;
>> use Bio::SearchIO;
>> use strict;
>> my $prog='blastn';
>> my $db='nr';
>> my $e_val=1e-10;
>> my @params=( -prog=>$prog,
>> 	-data=>$db,
>> 	-expect=>$e_val,
>> 	-readmethod=>'SearchIO');
>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>
>> my $v = 1;
>>
>> my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
>>
>> while (my $input = $str->next_seq()){
>>  #Blast a sequence against a database:
>>  #Alternatively, you could  pass in a file with many
>>  #sequences rather than loop through sequence one at a time
>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>  #and swap the two lines below for an example of that.
>>  my $r = $factory->submit_blast($input);
>>  #my $r = $factory->submit_blast('amino.fa');
>>  print STDERR "waiting..." if( $v > 0 );
>>  while ( my @rids = $factory->each_rid ) {
>>    foreach my $rid ( @rids ) {
>>      my $rc = $factory->retrieve_blast($rid);
>>      if( !ref($rc) ) {
>>        if( $rc < 0 ) {
>>          $factory->remove_rid($rid);
>>        }
>>        print STDERR "." if ( $v > 0 );
>>        sleep 5;
>>      } else {
>>        my $result = $rc->next_result();
>>        #save the output
>>        my $filename = $result->query_name()."\.out";
>>        $factory->save_output($filename);
>>        $factory->remove_rid($rid);
>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>        while ( my $hit = $result->next_hit ) {
>>          next unless ( $v > 0);
>>          print "\thit name is ", $hit->name, "\n";
>>          while( my $hsp = $hit->next_hsp ) {
>>            print "\t\tscore is ", $hsp->score, "\n";
>>          }
>>        }
>>      }
>>    }
>>  }
>> }
>>
>>
>> Do you think there might still be something in the NCBI output  
>> format?
>>
>> Thank you,
>> Guojun
>>
>>
>>
>>
>> Guojun Yang
>> Department of Plant Biology
>> University of Georgia
>> Tel: 706-542-1857
>> Fax: 706-542-1805
>> http://www.arches.uga.edu/~guojun
>>
>>
>>
>> ----- Original Message -----
>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>> Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>
>>
>>
>>> Sorry, forgot to add that I didn't see the regex issue that you  
>>> mentioned.
>>> It could be a perl-related issue.  Try the fixes I mentioned and  
>>> see what
>>> happens.
>>>
>>>> Christopher Fields
>>>>
>>> Postdoctoral Researcher - Switzer Lab
>>> Dept. of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>>>> -----Original Message-----
>>>>>>
>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>> Sent: Tuesday, February 14, 2006 12:36 PM
>>>> To: 'gyang at plantbio.uga.edu'
>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>>>
>>>> It's a good habit to always put quotes around string values.  The perl
>>>> interpreter may treat a single bare word as a subroutine or perlfunc
>>>> called with no args, so it will try to find a subroutine named blastp().
>>>> My debugger actually gives the error that the bare word blastp may
>>>> conflict with a future reserved word.  Like you said, 'use strict' will
>>>> point that out.
>>>>
>>>> As for the regex, it should match all the blast programs at NCBI (blastp,
>>>> blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
>>>> else passes through.
>>>>
>>>> So, if you are using the script below, there are several errors.  The
>>>> bare words for $prog and $db need quotes, and the flags for your @params
>>>> array don't have a dash before them.  I get this after adding quotes but
>>>> before adding the dashes to @params:
>>>>
>>>>>> C:\Perl\Scripts>test_blast.pl
>>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>>>
>>>> MSG:
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
>>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
>>>> STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
>>>> STACK: C:\Perl\Scripts\test_blast.pl:15
>>>> -----------------------------------------------------------
>>>>
>>>> The last line indicates a problem with this line:
>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>>> Changing the @params to this:
>>>> my @params=( -prog=>$prog,
>>>> 	-data=>$db,
>>>> 	-expect=>$e_val,
>>>> 	-readmethod=>'SearchIO');
>>>>
>>>> fixes it, and I get output as expected.
>>>>>> Christopher Fields
>>>>>>
>>>> Postdoctoral Researcher - Switzer Lab
>>>> Dept. of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>>
>>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
>>>>> Sent: Tuesday, February 14, 2006 11:48 AM
>>>>> To: Chris Fields; bioperl-l at lists.open-bio.org
>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>>>>
>>>>> Hi, Chris,
>>>>> When I tried the perldoc script, it did not work either.  First it
>>>>> says $prog cannot be a bare word if I "use strict".  I added quotes to
>>>>> the words; then it says the value for $prog does not match expression
>>>>> t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The
>>>>> script is shown below. Why is the expression "t?blast[pnx]"?
>>>>>
>>>>> #!/usr/bin/perl
>>>>>
>>>>> use Bio::SeqIO;
>>>>> use Bio::Seq;
>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>> use Bio::SearchIO;
>>>>>
>>>>>
>>>>> my $prog=blastp;
>>>>> my $db=swissprot;
>>>>> my $e_val=1e-10;
>>>>> my @params=( prog=>$prog,
>>>>> 	data=>$db,
>>>>> 	expect=>$e_val,
>>>>> 	readmethod=>'SearchIO');
>>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>
>>>>> my $v = 1;
>>>>>
>>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format =>  
>>>>> 'fasta' );
>>>>>
>>>>> while (my $input = $str->next_seq()){
>>>>>  #Blast a sequence against a database:
>>>>>  #Alternatively, you could  pass in a file with many
>>>>>  #sequences rather than loop through sequence one at a time
>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>  #and swap the two lines below for an example of that.
>>>>>  my $r = $factory->submit_blast($input);
>>>>>  #my $r = $factory->submit_blast('amino.fa');
>>>>>  print STDERR "waiting..." if( $v > 0 );
>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>    foreach my $rid ( @rids ) {
>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>      if( !ref($rc) ) {
>>>>>        if( $rc < 0 ) {
>>>>>          $factory->remove_rid($rid);
>>>>>        }
>>>>>        print STDERR "." if ( $v > 0 );
>>>>>        sleep 5;
>>>>>      } else {
>>>>>        my $result = $rc->next_result();
>>>>>        #save the output
>>>>>        my $filename = $result->query_name()."\.out";
>>>>>        $factory->save_output($filename);
>>>>>        $factory->remove_rid($rid);
>>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>        while ( my $hit = $result->next_hit ) {
>>>>>          next unless ( $v > 0);
>>>>>          print "\thit name is ", $hit->name, "\n";
>>>>>          while( my $hsp = $hit->next_hsp ) {
>>>>>            print "\t\tscore is ", $hsp->score, "\n";
>>>>>          }
>>>>>        }
>>>>>      }
>>>>>    }
>>>>>  }
>>>>> }
>>>>>
>>>>> Thank you for your help!
>>>>>
>>>>>
>>>>> Guojun
>>>>> Department of Plant Biology
>>>>> University of Georgia
>>>>>
>>>>> ----- Original Message -----
>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>> To: gyang at plantbio.uga.edu
>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>
>>>>>
>>>>>
>>>>>> Try two things:
>>>>>>
>>>>>> 1)  Use a much simpler script, like the one in 'perldoc
>>>>>> Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something
>>>>>> wrong with the logic in your subroutine:
>>>>>>
>>>>>>
>>>>>>> my $v = 1;
>>>>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format =>  
>>>>>>> 'fasta' );
>>>>>>> while (my $input = $str->next_seq()){
>>>>>>>
>>>>>>  #Blast a sequence against a database:
>>>>>>  #Alternatively, you could  pass in a file with many
>>>>>>  #sequences rather than loop through sequence one at a time
>>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>  #and swap the two lines below for an example of that.
>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>  #my $r = $factory->submit_blast('amino.fa');
>>>>>>  print STDERR "waiting..." if( $v > 0 );
>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>    foreach my $rid ( @rids ) {
>>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>>      if( !ref($rc) ) {
>>>>>>        if( $rc < 0 ) {
>>>>>>          $factory->remove_rid($rid);
>>>>>>        }
>>>>>>        print STDERR "." if ( $v > 0 );
>>>>>>        sleep 5;
>>>>>>      } else {
>>>>>>        my $result = $rc->next_result();
>>>>>>        #save the output
>>>>>>        my $filename = $result->query_name()."\.out";
>>>>>>        $factory->save_output($filename);
>>>>>>        $factory->remove_rid($rid);
>>>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>>        while ( my $hit = $result->next_hit ) {
>>>>>>          next unless ( $v > 0);
>>>>>>          print "\thit name is ", $hit->name, "\n";
>>>>>>          while( my $hsp = $hit->next_hsp ) {
>>>>>>            print "\t\tscore is ", $hsp->score, "\n";
>>>>>>          }
>>>>>>        }
>>>>>>      }
>>>>>>    }
>>>>>>  }
>>>>>> }
>>>>>>
>>>>>> 2) Try the RemoteBlast from Bugzilla and see if that works.  It really
>>>>>> shouldn't make that much of a difference, but I noticed that the CVS
>>>>>> RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
>>>>>> released; the Bugzilla version is based off CVS.
>>>>>>
>>>>>>> Christopher Fields
>>>>>>>
>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>> Dept. of Biochemistry
>>>>>> University of Illinois Urbana-Champaign
>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>>
>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>> Sent: Monday, February 13, 2006 3:00 PM
>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>> Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>>
>>>>>>>>> Thanks, Chris,
>>>>>>>>>
>>>>>>> I installed version 1.5.1 and replaced the blast.pm file with the one
>>>>>>> from your bug report.  The running version is 1.5 when I use the command
>>>>>>> you sent me.  But when I tried the script, it didn't change much.  My
>>>>>>> remoteblast code (portion) is here:
>>>>>>>
>>>>>>>>> sub search {
>>>>>>>>>
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
>>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
>>>>>>> my $query = Bio::Seq -> new ( -seq=>"$_[0]",
>>>>>>> 			      -id=>"query",
>>>>>>> 			      -desc=>"new seq");
>>>>>>> my $len=$query->length();
>>>>>>> @db=('nr','htgs','wgs');
>>>>>>> foreach my $db (@db) {
>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
>>>>>>> 						'-data' =>"$db",
>>>>>>> 						'-expect'=>"$E_value");
>>>>>>> my $blast_report = $factory->submit_blast($query);
>>>>>>> my @rids = $factory->each_rid();
>>>>>>>
>>>>>>> foreach my $rid ( @rids ) {
>>>>>>>    print STDERR "$rid\n";
>>>>>>> }
>>>>>>> # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
>>>>>>> print STDERR "waiting...";
>>>>>>> sleep 60;
>>>>>>>
>>>>>>>>> foreach my $rid ( @rids ) {
>>>>>>>>>
>>>>>>>    my $rc = $factory->retrieve_blast($rid);
>>>>>>>    while (!ref($rc) ) {
>>>>>>> 	if( $rc < 0 ) {
>>>>>>> # retrieve_blast returns -1 on error
>>>>>>> 	    $factory->remove_rid($rid);
>>>>>>> 	    print "Error!\n";
>>>>>>> 	    send_error($email,$function,$seqname,$queryname[$ST]);
>>>>>>> 	    die "Can't retrieve $rid";
>>>>>>> 	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
>>>>>>> 	    sleep 60;
>>>>>>> 	    $rc = $factory->retrieve_blast($rid);
>>>>>>> 	}
>>>>>>>    }
>>>>>>>    if (ref($rc)) {
>>>>>>> 	print STDERR "Done.\n";
>>>>>>> 	 while( my $result = $rc->next_result) {
>>>>>>> 	    while( my $hit = $result->next_hit()) {
>>>>>>> 	    	$hit_name=$hit->name;
>>>>>>> 		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
>>>>>>> 		$name=$1;
>>>>>>> 		@left_plus_start=();
>>>>>>> 		@left_plus_end=();
>>>>>>> 		@left_minus_start=();
>>>>>>> 		@left_minus_end=();
>>>>>>> 		@right_plus_start=();
>>>>>>> 		@right_plus_end=();
>>>>>>> 		@right_minus_start=();
>>>>>>> 		@right_minus_end=();
>>>>>>>
>>>>>>>>> 		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
>>>>>>>>>
>>>>>>> 		while( my $hsp = $hit->next_hsp()) {
>>>>>>> ......
>>>>>>>
>>>>>>> It was working quite well until around October last year, but it has
>>>>>>> stopped since then.  When a submission is sent via a webpage, the cgi
>>>>>>> starts to work and uses ~20 Mb of memory.  Then it hangs there; finally
>>>>>>> the expected email is received but without real results, although it
>>>>>>> does contain something from other parts of the script.  Apparently the
>>>>>>> search sub did not return anything (I know something should be
>>>>>>> returned).  Is it also possible that the format of the NCBI output for
>>>>>>> each result has changed?
>>>>>>> Thank you,
>>>>>>> Guojun
>>>>>>>
>>>>>>>>>>> Department of Plant Biology
>>>>>>>>>>>
>>>>>>> University of Georgia
>>>>>>>
>>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>>>
>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>>>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>>
>>>>>>>> How do you know two versions are installed (i.e. how are you checking
>>>>>>>> the version)?  Do you have two complete bioperl distributions (in two
>>>>>>>> separate directories) or are you looking in modules?  Here's the way to
>>>>>>>> check the version (from the FAQ):
>>>>>>>>
>>>>>>>> perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
>>>>>>>>
>>>>>>>> If you have two full bioperl distributions on your computer, normally
>>>>>>>> only one will be in use unless you have explicitly set the environment
>>>>>>>> variable PERL5LIB.  The PERL5LIB directories will be searched first,
>>>>>>>> before your normal perl directory list (@INC) is searched.  You MAY get
>>>>>>>> some mixing then, but only if perl can't find a particular module in the
>>>>>>>> path designated in PERL5LIB; then it will progress through the
>>>>>>>> directories listed in @INC.  This may happen if a module is unique to a
>>>>>>>> particular release, but shouldn't happen for the majority of modules,
>>>>>>>> including RemoteBlast.  You can check what @INC and PERL5LIB are set to
>>>>>>>> by using 'perl -V'.  @INC will differ depending on your OS, perl build,
>>>>>>>> etc.
>>>>>>>>
>>>>>>>> Regardless, if you follow the directions for installing bioperl for your
>>>>>>>> system ('perl Makefile.PL', 'make', 'make test', 'make install', unless
>>>>>>>> you explicitly change the installation directory when using 'perl
>>>>>>>> Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a problem as it
>>>>>>>> will install the Bioperl distribution you downloaded over the old
>>>>>>>> version in @INC.  See this page:
>>>>>>>>
>>>>>>>> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
>>>>>>>> for more details.
>>>>>>>>> Christopher Fields
>>>>>>>>>
>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>> Dept. of Biochemistry
>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>
>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>>>> Sent: Monday, February 13, 2006 12:32 PM
>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>>>> Hi, Chris,
>>>>>>>>>>>
>>>>>>>>> I do have different versions of bioperl on my Linux machine (1.4
>>>>>>>>> and 1.5.0); this may be the problem.  Should I just install
>>>>>>>>> bioperl-1.5.1, or do I need to uninstall and remove the previous
>>>>>>>>> versions?  I could not find any hint on uninstalling bioperl on Linux.
>>>>>>>>> Could you please give me some suggestions?
>>>>>>>>> Thanks,
>>>>>>>>> Guojun
>>>>>>>>>
>>>>>>>>>>> Department of Plant Biology
>>>>>>>>>>>
>>>>>>>>> University of Georgia
>>>>>>>>>      _____
>>>>>>>>>
>>>>>>>>>>>  From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>>>>
>>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>>>>>>>>> Sent: Mon, 13 Feb 2006 11:45:14 -0500
>>>>>>>>> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>> If you're using RemoteBlast 1.28, then you've likely updated from CVS,
>>>>>>>>> which isn't the latest fix.
>>>>>>>>>
>>>>>>>>> Make sure that you check the following:
>>>>>>>>>
>>>>>>>>> 1) Always post to the mailing list:
>>>>>>>>> http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
>>>>>>>>>
>>>>>>>>> 2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
>>>>>>>>> installed first.  Perform a clean installation; do not upgrade only
>>>>>>>>> Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
>>>>>>>>> guarantee that mixing modules from old and new distributions (1.4 and
>>>>>>>>> 1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
>>>>>>>>> installation will allow text output from BLAST v.2.2.12 to be saved and
>>>>>>>>> parsed; it will not parse the newest BLAST text output from NCBI
>>>>>>>>> (v2.2.13) but it should still save it.  I believe as long as
>>>>>>>>> next_result() isn't called, it will work.
>>>>>>>>>
>>>>>>>>> 3) The bug fixes for the above issue with parsing BLAST 2.2.13 text
>>>>>>>>> output are NOT in CVS; they haven't been cleared and checked in by
>>>>>>>>> Roger Hall (who's now taking care of RemoteBlast) and the powers that
>>>>>>>>> be (Jason or whoever is in charge of Bio::SearchIO).  They can be found
>>>>>>>>> in Bugzilla:
>>>>>>>>>
>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>>>>>>>>>
>>>>>>>>> The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
>>>>>>>>> saving XML output, so it isn't necessary if you don't plan on using this
>>>>>>>>> option.  And, remember, they haven't been committed yet to CVS, which
>>>>>>>>> means that the final version will change to reflect the new version.
>>>>>>>>>
>>>>
>>>>>>>>>>>>> Christopher Fields
>>>>>>>>>>>>>
>>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>>> Dept. of Biochemistry
>>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>>>>>>    _____
>>>>>>>>>>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
>>>>>>>>>>>>>
>>>>>>>>> Sent: Monday, February 13, 2006 9:26 AM
>>>>>>>>> To: Chris Fields
>>>>>>>>> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>>>>>> Hi, Chris
>>>>>>>>>>>>>
>>>>>>>>> Thanks for your suggestion; however, it doesn't seem to work for my cgi
>>>>>>>>> even after I replace both blast.pm and RemoteBlast.pm.  I didn't even
>>>>>>>>> get any RID.  Do you have any suggestions?
>>>>>>>>>
>>>>>>>>>>>>>>> Guojun
>>>>>>>>>>>>>>>
>>>>>>>>>>>>> Guojun Yang
>>>>>>>>>>>>>
>>>>>>>>> Department of Plant Biology
>>>>>>>>> University of Georgia
>>>>>>>>> Tel: 706-542-1857
>>>>>>>>> Fax: 706-542-1805
>>>>>>>>> http://www.arches.uga.edu/~guojun
>>>>>>>>>    _____
>>>>>>>>>
>>>>>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>>>>>>
>>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
>>>>>>>>> Sent: Fri, 03 Feb 2006 16:07:29 -0500
>>>>>>>>> Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>> I would say give the new code a try, but realize that it hasn't been
>>>>>>>>> checked in (like I said below). I will try going over the modified
>>>>>>>>> Bio::SearchIO::blast again this weekend to see if there is anything I
>>>>>>>>> might have missed. The changed order in the header of BLAST text output
>>>>>>>>> has me a bit worried that it might not catch everything, but it at least
>>>>>>>>> doesn't hang in the while() loop I described in the bug report below
>>>>>>>>> (bug #1934) and seems to process everything fine.
>>>>>>>>>
>>>>>>>>> If you want more stability in the code, you might consider changing
>>>>>>>>> over to XML output and parsing with Bio::SearchIO::blastxml. There are
>>>>>>>>> some changes in Bio::Tools::Run::RemoteBlast (bug #1935) that
>>>>>>>>> accommodate saving XML output, but I believe it parses everything
>>>>>>>>> regardless. If you look back the last month or so there has been a bit
>>>>>>>>> of discussion here about it. Jason describes a bit on how to set up
>>>>>>>>> RemoteBlast for XML:
>>>>>>>>>
>>>>>>>>> http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
>>>>>>>>>
>>>>>>>
>>>>>>>>>>> Christopher Fields
>>>>>>>>>>>
>>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>>> Dept. of Biochemistry
>>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>>
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>>>>> Sent: Friday, February 03, 2006 1:45 PM
>>>>>>>>>> To: bioperl-l at bioperl.org
>>>>>>>>>> Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>
>>>>>>>>>> Hi, Everybody,
>>>>>>>>>> I see this post and am wondering if this is the reason for the
>>>>>>>>>> malfunctioning of my webserver. We set up a webserver named MAK for
>>>>>>>>>> MITE sequence analysis. It was working very well until around November
>>>>>>>>>> 2005, when it stopped returning any results (the site is fine and
>>>>>>>>>> seems to be doing something after submission). In the CGI script, I
>>>>>>>>>> used remoteblast (that work was done in 2003) to do searches. I
>>>>>>>>>> currently do not have access to the server because I moved. Quite a
>>>>>>>>>> few people have sent emails to us about its malfunctioning. Is there
>>>>>>>>>> any suggestion on fixing the problem? Should I simply ask that
>>>>>>>>>> remoteblast.pm be replaced with the new version?
>>>>>>>>>> Thanks a lot,
>>>>>>>>>> Guojun
>>>>>>>>>>
>>>>>>>>>> Department of Plant Biology
>>>>>>>>>> University of Georgia
>>>>>>>>>> Tel: 706-542-1857
>>>>>>>>>> Fax: 706-542-1805
>>>>>>>>>> http://www.arches.uga.edu/~guojun
>>>>>>>>>> _____
>>>>>>>>>>
>>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>>> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au],
>>>>>>>>>> 'Huang Jian' [mailto:hjian at kuicr.kyoto-u.ac.jp],
>>>>>>>>>> 'bioperl-l' [mailto:bioperl-l at bioperl.org]
>>>>>>>>>> Sent: Fri, 03 Feb 2006 10:45:23 -0500
>>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>>>>>>>>>>
>>>>>>>>>> Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS.
>>>>>>>>>> It will work for saving text output. However, it will not parse
>>>>>>>>>> anything using next_result (it will likely hang) and will not save
>>>>>>>>>> XML format. See these bugs:
>>>>>>>>>>
>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>>>>>>>>>>
>>>>>>>>>> for explanations and possible fixes (changes to RemoteBlast and
>>>>>>>>>> Bio::SearchIO::blast). Note that these haven't been checked in yet
>>>>>>>>>> so are still not included in bioperl-live; they may be further
>>>>>>>>>> modified before committing to CVS. If you're not worried about XML,
>>>>>>>>>> you could just try the first fix, which is a change to
>>>>>>>>>> SearchIO::blast.
>>>>>>>>>>
>>>>>>>>>> Nagesh, I remember you posting to the list a month ago using a
>>>>>>>>>> script which had problems; the script you used saves the output but
>>>>>>>>>> doesn't actually parse it (i.e. you don't use next_result() to go
>>>>>>>>>> through the data). Is the version of BLAST in your text output
>>>>>>>>>> 2.2.12 or 2.2.13? Have you tried parsing the output using
>>>>>>>>>> "-readmethod => SearchIO" or "-readmethod => blast" using your
>>>>>>>>>> version of RemoteBlast and method next_result()? Like below (from
>>>>>>>>>> perldoc):
>>>>>>>>>>
>>>>>>>>>> while ( my @rids = $factory->each_rid ) {
>>>>>>>>>>   foreach my $rid ( @rids ) {
>>>>>>>>>>     my $rc = $factory->retrieve_blast($rid);
>>>>>>>>>>     if( !ref($rc) ) {
>>>>>>>>>>       if( $rc < 0 ) {
>>>>>>>>>>         $factory->remove_rid($rid);
>>>>>>>>>>       }
>>>>>>>>>>       print STDERR "." if ( $v > 0 );
>>>>>>>>>>       sleep 5;
>>>>>>>>>>     } else { # parsing starts here
>>>>>>>>>>       my $result = $rc->next_result(); # it should hang here
>>>>>>>>>>       #save the output
>>>>>>>>>>       my $filename = $result->query_name()."\.out";
>>>>>>>>>>       $factory->save_output($filename);
>>>>>>>>>>       $factory->remove_rid($rid);
>>>>>>>>>>       print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>>>>>>       while ( my $hit = $result->next_hit ) {
>>>>>>>>>>         next unless ( $v > 0);
>>>>>>>>>>         print "\thit name is ", $hit->name, "\n";
>>>>>>>>>>         while( my $hsp = $hit->next_hsp ) {
>>>>>>>>>>           print "\t\tscore is ", $hsp->score, "\n";
>>>>>>>>>>         }
>>>>>>>>>>       }
>>>>>>>>>>     }
>>>>>>>>>>   }
>>>>>>>>>> }
>>>>>>>>>> } # closes the enclosing 'while (my $input = $str->next_seq())' loop (not shown)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> My script hung if I used next_result() in any way prior to the
>>>>>>>>>> fixes. I want to see how many others are having the same issues
>>>>>>>>>> with parsing using the CVS version of bioperl-live.
>>>>>>>>>>
>>>>>>>>>> Christopher Fields
>>>>>>>>>> Postdoctoral Researcher - Switzer Lab
>>>>>>>>>> Dept. of Biochemistry
>>>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
>>>>>>>>>>> Sent: Thursday, February 02, 2006 7:24 PM
>>>>>>>>>>> To: Huang Jian; bioperl-l
>>>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>>>>>>>>>>>
>>>>>>>>>>> Hi Huang,
>>>>>>>>>>> Thanks for the message. The older version of RemoteBlast.pm works
>>>>>>>>>>> by checking the temporary file size to determine whether the Blast
>>>>>>>>>>> results are ready. This condition is no longer being satisfied,
>>>>>>>>>>> maybe due to some changes brought about by NCBI. I had this
>>>>>>>>>>> problem recently and figured out that the solution was to use the
>>>>>>>>>>> latest version, which has this problem fixed (it does not use the
>>>>>>>>>>> file size logic any more) but is not yet included in the BioPerl
>>>>>>>>>>> package.
>>>>>>>>>>> Cheers
>>>>>>>>>>> Nagesh
>>>>>>>>>>>
>>>>>>>>>>> Huang Jian wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Dear Nagesh,
>>>>>>>>>>>>
>>>>>>>>>>>> I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28
>>>>>>>>>>>> you sent me. Now it works perfectly!!!
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you!!
>>>>>>>>>>>>
>>>>>>>>>>>> Huang
>>>>>>>>>>>>
>>>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>>> From: "Nagesh Chakka"
>>>>>>>>>>>> To: "Huang Jian" ; "bioperl-l"
>>>>>>>>>>>> Sent: Friday, February 03, 2006 7:48 AM
>>>>>>>>>>>> Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so
>>>>>>>>>>>> still via email
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Huang,
>>>>>>>>>>>>> I see that you are submitting a sequence for a remote blast
>>>>>>>>>>>>> search. Can you check whether the RemoteBlast.pm being used is
>>>>>>>>>>>>> v 1.28 (2005/12/09)? If not, I have attached it to this email;
>>>>>>>>>>>>> try replacing the old one, which has a bug, with it.
>>>>>>>>>>>>> Let me know if it works.
>>>>>>>>>>>>> Nagesh
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>>>>
>
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From cjfields at uiuc.edu  Thu Feb 16 12:52:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 06:52:31 -0600
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: 
References: 
Message-ID: 

I think a method was recently implemented in Bio::DB::GenBank to
retrieve a segment of DNA given start and end coordinates in GenBank
format; that should contain the features you need.  I requested it
on the mailing list around November-December but didn't get a chance
to test it.  Would that help?
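
A rough sketch of the coordinate arithmetic only, not of any particular
retrieval call: assuming the gene's start and end on the chromosome are
already known, the window to request could be computed as below. The helper
name flanking_window and the example numbers are hypothetical and are not
part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a locus by $offset bases on each side,
# clamped to the 1-based bounds [1, $seq_length].
sub flanking_window {
    my ($gene_start, $gene_end, $offset, $seq_length) = @_;

    # Accept coordinates given in either order (e.g. minus-strand genes).
    ($gene_start, $gene_end) = ($gene_end, $gene_start)
        if $gene_start > $gene_end;

    my $start = $gene_start - $offset;
    $start = 1 if $start < 1;

    my $end = $gene_end + $offset;
    $end = $seq_length if $end > $seq_length;

    return ($start, $end);
}

# Example values only: a 100 kb locus with 5 kb of flank on each side.
my ($start, $end) = flanking_window(4_000_000, 4_100_000, 5_000, 15_000_000);
print "fetch $start..$end\n";
```

The resulting pair can then be handed to whichever start/end based
retrieval is used, remote or local.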

On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:

> Harry,
>
> It's not clear to me that NCBI's eutils offers this capability  
> directly. You
> can probably download Entrez Gene entries and parse them for  
> coordinates but
> I know of no way to remotely retrieve genomic sequences like this  
> from NCBI
> (ENSEMBL API perhaps?). What I had in mind uses the local approach  
> that some
> of us favor and to prove to myself that this is simple to do I wrote a
> script that I just added to examples/tools, it's called  
> extract_genes.pl and
> it's based on Bio::DB::Fasta. Download the sequence files for a given
> species to some dir, download Entrez Gene's gene2accession file,  
> and run. It
> creates and stores a hash for lookups, it won't read gene2accession  
> each
> time it runs.
>
> Brian O.
>
>
> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
>
>> Hi Brian,
>>
>> Thanks very much for the pointers and the speed of your reply and  
>> apologies
>> for the speed of mine.
>>
>> This looks good, but what I was looking for was a bioP approach  
>> for hooking to
>> an API at NCBI or EBI so I could get this info and seqs from  
>> them.  In this
>> case, speed of retrieval is not critical and I'd rather not  
>> download the
>> entirety of the sequences to a local disk to hack at them.
>>
>> I've determined a screen-scraping approach to get them and could  
>> script that,
>> but I thought that bioP had a method for using NCBI's external  
>> API's, tho it
>> may be that my memory is faulty or the approach is no longer  
>> supported due to
>> overload.
>>
>> Does NCBI make such APIs available anymore?  I searched a bit for  
>> docs on them
>> but couldn't find anything (unless it's buried in the NCBI toolkit,  
>> which I
>> haven't started to excavate).
>>
>> Failing that, would SEALS provide such a service? Any PerlPinipeds  
>> listening?
>>
>> Harry
>>
>>
>>
>>
>>
>>
>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>> Harry,
>>>
>>> Hope you're doing well. The approach could be based on  
>>> Bio::DB::Fasta. So,
>>> from its documentation:
>>>
>>>   use Bio::DB::Fasta;
>>>
>>>   # create database from directory of fasta files
>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>
>>>   # simple access (for those without Bioperl)
>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>   my @ids     = $db->ids;
>>>   my $length   = $db->length('CHROMOSOME_I');
>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>   my $header   = $db->header('CHROMOSOME_I');
>>>
>>>   # Bioperl-style access
>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>
>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>>   my $seq     = $obj->seq;
>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>>
>>> Do you already have the offsets?
>>>
>>> Brian O.
>>>
>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>>> Hi All,
>>>>
>>>> After perusing the tutorial and other docs for an evening, I  
>>>> still
>>>> can't find the answer to this.  Forgive me if I've missed something
>>>> obvious.
>>>>
>>>> This should not be a novel request, but I've not found it  
>>>> answered.  If
>>>> bioperl isn't the best way to do this, I'd be grateful for a  
>>>> pointer to a
>>>> better way, especially if it includes an illuminating bit of code.
>>>>
>>>> The problem is to retrieve genomic sequences plus & minus some  
>>>> offset
>>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>>> common followup chore for some extra analysis from a gene  
>>>> expression
>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed  
>>>> the
>>>> sequence type to specify...?
>>>>
>>>>
>>>> TIA!
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From anst at kvl.dk  Thu Feb 16 09:24:51 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Thu, 16 Feb 2006 10:24:51 +0100
Subject: [Bioperl-l] searchIO bug?
Message-ID: <43F452F30200009B00000EC9@gwia.kvl.dk>

Hi! 
 
 
I am blasting a protein seq against an identical protein.
I am trying to parse the protein header by using the query_description
method in the SearchIO module.
After calling query_description I use split / / in order
to easily access the different header components.
Here I discover that the query_description method is somehow introducing
a space between the fifth comma and the following chromosome position
number in the exon chromosome position list!?
This truncates the list of exon chromosome positions from 7 to 4, later
yielding a wrong intron count.
 
Is this a bug? 
 
Attached is: 
 
testblast1.pl: the blastprogram to run. 
 
Q0045 the seq that is used as both query and database seq. 
(Q0045 has to be formatted in order to be used as a database: formatdb -i
Q0045 -p T -o F)
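
One possible explanation, not confirmed here, is that the header is wrapped
over several lines in the BLAST report and a space ends up where the lines
are joined again. A minimal sketch of a workaround under that assumption:
remove whitespace that follows a comma and split on runs of whitespace
rather than on a single space. The description string below is made up for
illustration and is not the real Q0045 header.

```perl
use strict;
use warnings;

# Made-up description with a stray space after the fifth comma, as could
# happen if a wrapped header line is rejoined with a space.
my $desc = 'Q0045 example 100,200,300,400,500, 600,700';

# Remove whitespace directly after a comma, then split on runs of
# whitespace (split ' ') instead of a single literal space (split / /).
(my $clean = $desc) =~ s/,\s+/,/g;
my @fields = split ' ', $clean;

# The position list is assumed to be the last header component.
my @exons = split /,/, $fields[-1];

print scalar(@exons), " exon positions\n";
```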
 
 
Regards Anders. 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: blastp5.pl
Type: application/octet-stream
Size: 50384 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045
Type: application/octet-stream
Size: 873 bytes
Desc: not available
URL: 

From anst at kvl.dk  Thu Feb 16 10:20:06 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Thu, 16 Feb 2006 11:20:06 +0100
Subject: [Bioperl-l] another searchIO bug?
Message-ID: <43F45FE60200009B00000ED6@gwia.kvl.dk>

Hi! 
 
I am blasting a protein seq (query) against an identical seq with a
deletion of amino acid number 61 (subject).
Then I print out the type and position of each non-matching amino acid.
The mismatch reported for the query seq is G at position 61, which is correct.
The mismatch reported for the subject seq is V at position 60, which is
definitely not correct!?
 
Is this a bug? 
 
testblast2.pl is the program to run 
 
Q0045 is the query seq. 
 
Q0045del61 is the subject seq (it has to be formatted: formatdb -i
Q0045del61 -p T -o F).
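
One reading of this result, offered as an assumption rather than a
diagnosis of SearchIO: at the deleted position the subject line of the
alignment holds a gap, so the subject has no residue of its own in that
column, and a position counter on the subject side still points at the last
residue before the gap. The sketch below walks two toy alignment strings
(not taken from Q0045) to show that behaviour.

```perl
use strict;
use warnings;

# Toy gapped alignment strings: the subject lacks the residue that is
# aligned to query position 5.
my $query_aln = 'MKVLGAST';
my $sbjct_aln = 'MKVL-AST';

my ($qpos, $spos) = (0, 0);
my @report;
for my $col (0 .. length($query_aln) - 1) {
    my $q = substr($query_aln, $col, 1);
    my $s = substr($sbjct_aln, $col, 1);

    # Only real residues advance a sequence's own coordinate.
    $qpos++ if $q ne '-';
    $spos++ if $s ne '-';
    next if $q eq $s;

    # At a gap column the gapped sequence has no residue; its counter
    # still refers to the last residue before the gap.
    push @report, sprintf('query %s%d vs subject %s (last subject residue: %d)',
                          $q, $qpos, $s, $spos);
}
print "$_\n" for @report;
```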
 
Regards Anders. 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045
Type: application/octet-stream
Size: 873 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testblast2.pl
Type: application/octet-stream
Size: 6109 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045del61
Type: application/octet-stream
Size: 872 bytes
Desc: not available
URL: 

From mcoyne at channing.harvard.edu  Wed Feb 15 21:20:17 2006
From: mcoyne at channing.harvard.edu (Michael Coyne)
Date: Wed, 15 Feb 2006 16:20:17 -0500
Subject: [Bioperl-l] Primer maps?
Message-ID: <6.2.0.14.0.20060215155422.01d44a98@localhost>

An HTML attachment was scrubbed...
URL: 

From Pieter.Monsieurs at esat.kuleuven.be  Thu Feb 16 09:46:09 2006
From: Pieter.Monsieurs at esat.kuleuven.be (Pieter Monsieurs)
Date: Thu, 16 Feb 2006 10:46:09 +0100
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
 version 1.28
In-Reply-To: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
References: <20060215143941.54e91487@dogwood.plantbio.uga.edu>
Message-ID: <43F449E1.80605@esat.kuleuven.be>

Hi,

I have the same problem with the blast.pm file.
NCBI has added some extra information to the BLAST output
(see e.g. "Features flanking this part..." or "Features in this part
..."); an example is included below.
The blast.pm module starts looking for the HSP alignment information,
but it dies when it hits this feature information.
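
As a stop-gap, and not as the fix for blast.pm itself, a saved text report
could be pre-filtered before parsing. This is a minimal sketch that assumes
every feature block runs from its "Features ... this part of subject
sequence:" header up to the next " Score =" line; strip_feature_blocks is a
hypothetical helper name and the sample lines are abbreviated.

```perl
use strict;
use warnings;

# Hypothetical pre-filter: drop each feature block, i.e. everything from
# the "Features ..." header line up to (but not including) the next
# " Score =" line.
sub strip_feature_blocks {
    my @lines = @_;
    my @kept;
    my $in_features = 0;
    for my $line (@lines) {
        if ($line =~ /^\s*Features (?:flanking|in) this part of subject sequence:/) {
            $in_features = 1;
            next;
        }
        # A Score line ends the feature block and is itself kept.
        $in_features = 0 if $line =~ /^\s*Score\s*=/;
        push @kept, $line unless $in_features;
    }
    return @kept;
}

my @report = (
    "Length=27492551\n",
    "\n",
    " Features flanking this part of subject sequence:\n",
    "   3726 bp at 5' side: transposon protein, putative\n",
    "   2655 bp at 3' side: hypothetical protein\n",
    "\n",
    " Score = 36.2 bits (18),  Expect = 0.22\n",
);
print strip_feature_blocks(@report);
```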

Pieter


>gi|77552765|gb|DP000011.1|  Oryza sativa (japonica cultivar-group) chromosome 12, complete sequence
Length=27492551

 Features flanking this part of subject sequence:
   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
   2655 bp at 3' side: hypothetical protein

 Score = 36.2 bits (18),  Expect = 0.22
 Identities = 18/18 (100%), Gaps = 0/18 (0%)
 Strand=Plus/Minus

Query  4         GTACTACTCTACTCTACT  21
                 ||||||||||||||||||
Sbjct  19257436  GTACTACTCTACTCTACT  19257419


 Features flanking this part of subject sequence:
   2991 bp at 5' side: hypothetical protein
   1131 bp at 3' side: hypothetical protein

 Score = 36.2 bits (18),  Expect = 0.22
 Identities = 18/18 (100%), Gaps = 0/18 (0%)
 Strand=Plus/Minus

Query  2         ATGTACTACTCTACTCTA  19
                 ||||||||||||||||||
Sbjct  27006915  ATGTACTACTCTACTCTA  27006898



 Features in this part of subject sequence:
   DHHC zinc finger domain, putative
 

 Score = 34.2 bits (17),  Expect = 0.87
 Identities = 17/17 (100%), Gaps = 0/17 (0%)
 Strand=Plus/Plus

Query  5         TACTACTCTACTCTACT  21
                 |||||||||||||||||
Sbjct  17616437  TACTACTCTACTCTACT  17616453



 Features flanking this part of subject sequence:
   102 bp at 5' side: bZIP transcription factor, putative
   3740 bp at 3' side: yeast dcp1, putative

 Score = 32.2 bits (16),  Expect = 3.4
 Identities = 16/16 (100%), Gaps = 0/16 (0%)
 Strand=Plus/Plus

Query  7        CTACTCTACTCTACTC  22
                ||||||||||||||||
Sbjct  2775880  CTACTCTACTCTACTC  2775895


 Features flanking this part of subject sequence:
   21 bp at 5' side: peptide transporter T17F3.11, putative
   10230 bp at 3' side: transposon protein, putative, unclassified

 Score = 32.2 bits (16),  Expect = 3.4
 Identities = 16/16 (100%), Gaps = 0/16 (0%)
 Strand=Plus/Minus

Query  7         CTACTCTACTCTACTC  22
                 ||||||||||||||||
Sbjct  27323153  CTACTCTACTCTACTC  27323138




Guojun Yang wrote:

>Hi, Chris,
>Finally the remoteblast test script works for the amino.fa query. but when I try a nucleic acid sequence (see below), Error occurs: 
>"
>waiting........
>------------- EXCEPTION  -------------
>MSG: no data for midline  Features flanking this part of subject sequence:
>STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
>STACK toplevel remoteblast_test:40
>"
>The query sequence is:
>CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
>GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
>AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
>AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
>
>The script (basically same as the remoteblast test, I only changed database to 'nr' and program to 'blastn' and filename to 'ost3'):
>#!/usr/bin/perl
>
>use Bio::SeqIO;
>use Bio::Seq;
>use Bio::Tools::Run::RemoteBlast;
>use Bio::SearchIO;
>use strict;
>my $prog='blastn';
>my $db='nr';
>my $e_val=1e-10;
>my @params=( -prog=>$prog,
>	-data=>$db,
>	-expect=>$e_val,
>	-readmethod=>'SearchIO');
>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>
>my $v = 1;
>
>my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
>
>while (my $input = $str->next_seq()){
>  #Blast a sequence against a database:
>  #Alternatively, you could  pass in a file with many
>  #sequences rather than loop through sequence one at a time
>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>  #and swap the two lines below for an example of that.
>  my $r = $factory->submit_blast($input);
>  #my $r = $factory->submit_blast('amino.fa');
>  print STDERR "waiting..." if( $v > 0 );
>  while ( my @rids = $factory->each_rid ) {
>    foreach my $rid ( @rids ) {
>      my $rc = $factory->retrieve_blast($rid);
>      if( !ref($rc) ) {
>        if( $rc < 0 ) {
>          $factory->remove_rid($rid);
>        }
>        print STDERR "." if ( $v > 0 );
>        sleep 5;
>      } else {
>        my $result = $rc->next_result();
>        #save the output
>        my $filename = $result->query_name()."\.out";
>        $factory->save_output($filename);
>        $factory->remove_rid($rid);
>        print "\nQuery Name: ", $result->query_name(), "\n";
>        while ( my $hit = $result->next_hit ) {
>          next unless ( $v > 0);
>          print "\thit name is ", $hit->name, "\n";
>          while( my $hsp = $hit->next_hsp ) {
>            print "\t\tscore is ", $hsp->score, "\n";
>          }
>        }
>      }
>    }
>  }
>}
>
>
>Do you think there might still be something in the NCBI output format?
>
>Thank you,
>Guojun
>
>
>
>
>Guojun Yang
>Department of Plant Biology
>University of Georgia
>Tel: 706-542-1857
>Fax: 706-542-1805
>http://www.arches.uga.edu/~guojun
>
>
>
>----- Original Message -----
>From: Chris Fields [mailto:cjfields at uiuc.edu]
>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
>
>
>  
>
>>Sorry, forgot to add that I didn't see the regex issue that you mentioned.
>>It could be a perl-related issue.  Try the fixes I mentioned and see what
>>happens.
>>    
>>
>>>Christopher Fields
>>>      
>>>
>>Postdoctoral Researcher - Switzer Lab
>>Dept. of Biochemistry
>>University of Illinois Urbana-Champaign 
>>    
>>
>>>>>-----Original Message-----
>>>>>          
>>>>>
>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>Sent: Tuesday, February 14, 2006 12:36 PM
>>>To: 'gyang at plantbio.uga.edu'
>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>>      
>>>
>>>>>It's a good habit to always add single quotes around words.  The perl
>>>>>          
>>>>>
>>>interpreter may think a single bare word is a subroutine or perlfunc
>>>called with no args so will try to find a subroutine named blastp().  My
>>>debugger actually gives the error that the bare word blastp may conflict
>>>with a future reserved word.  Like you said, 'use strict' will point that
>>>out.
>>>      
>>>
>>>>>As for the regex, it should match all the blast programs at NCBI (blastp,
>>>>>          
>>>>>
>>>blastn, blastx, tblastn, tblastx) and is built-in to make sure nothing
>>>else passes through.
>>>      
>>>
>>>>>So, if you are using the script below, there are several errors.  The bare
>>>>>          
>>>>>
>>>words for $prog and $db need quotes, and the flags for your @params array
>>>don't have a dash before them.  I get this after adding quotes but before
>>>adding the dashes to @params:
>>>      
>>>
>>>>>C:\Perl\Scripts>test_blast.pl
>>>>>------------- EXCEPTION: Bio::Root::Exception -------------
>>>>>          
>>>>>
>>>MSG:
>>>STACK: Error::throw
>>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-
>>>live/Bio/Root/Root.pm:328
>>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
>>>C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
>>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-
>>>live/Bio/Tools/Run/RemoteBlast.pm:256
>>>STACK: C:\Perl\Scripts\test_blast.pl:15
>>>-----------------------------------------------------------
>>>      
>>>
>>>>>The last line indicates a problem with this line:
>>>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>Changing the @params to this:
>>>>>my @params=( -prog=>$prog,
>>>>>          
>>>>>
>>>	-data=>$db,
>>>	-expect=>$e_val,
>>>	-readmethod=>'SearchIO');
>>>      
>>>
>>>>>fixes it, and I get output as expected.
>>>>>Christopher Fields
>>>>>          
>>>>>
>>>Postdoctoral Researcher - Switzer Lab
>>>Dept. of Biochemistry
>>>University of Illinois Urbana-Champaign
>>>      
>>>
>>>>>>>>-----Original Message-----
>>>>>>>>                
>>>>>>>>
>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
>>>>Sent: Tuesday, February 14, 2006 11:48 AM
>>>>To: Chris Fields; bioperl-l at lists.open-bio.org
>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
>>>>
>>>>Hi, Chris,
>>>>When I tried with the perldoc script, It did not work either. First it
>>>>says $prog can not be bare word if I "use strict". I added quotes on the
>>>>words, then it says the value for $prog does not match expression
>>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The
>>>>        
>>>>
>>>script
>>>      
>>>
>>>>is shown below. Why is the expression "t?blast[pnx]"?
>>>>
>>>>#!/usr/bin/perl
>>>>
>>>>use Bio::SeqIO;
>>>>use Bio::Seq;
>>>>use Bio::Tools::Run::RemoteBlast;
>>>>use Bio::SearchIO;
>>>>
>>>>
>>>>my $prog=blastp;
>>>>my $db=swissprot;
>>>>my $e_val=1e-10;
>>>>my @params=( prog=>$prog,
>>>>	data=>$db,
>>>>	expect=>$e_val,
>>>>	readmethod=>'SearchIO');
>>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
>>>>
>>>>my $v = 1;
>>>>
>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
>>>>
>>>>while (my $input = $str->next_seq()){
>>>>  #Blast a sequence against a database:
>>>>  #Alternatively, you could  pass in a file with many
>>>>  #sequences rather than loop through sequence one at a time
>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>  #and swap the two lines below for an example of that.
>>>>  my $r = $factory->submit_blast($input);
>>>>  #my $r = $factory->submit_blast('amino.fa');
>>>>  print STDERR "waiting..." if( $v > 0 );
>>>>  while ( my @rids = $factory->each_rid ) {
>>>>    foreach my $rid ( @rids ) {
>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>      if( !ref($rc) ) {
>>>>        if( $rc < 0 ) {
>>>>          $factory->remove_rid($rid);
>>>>        }
>>>>        print STDERR "." if ( $v > 0 );
>>>>        sleep 5;
>>>>      } else {
>>>>        my $result = $rc->next_result();
>>>>        #save the output
>>>>        my $filename = $result->query_name()."\.out";
>>>>        $factory->save_output($filename);
>>>>        $factory->remove_rid($rid);
>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>>>        while ( my $hit = $result->next_hit ) {
>>>>          next unless ( $v > 0);
>>>>          print "\thit name is ", $hit->name, "\n";
>>>>          while( my $hsp = $hit->next_hsp ) {
>>>>            print "\t\tscore is ", $hsp->score, "\n";
>>>>          }
>>>>        }
>>>>      }
>>>>    }
>>>>  }
>>>>}
>>>>
>>>>Thank you for your help!
>>>>
>>>>
>>>>Guojun
>>>>Department of Plant Biology
>>>>University of Georgia
>>>>
>>>>----- Original Message -----
>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>To: gyang at plantbio.uga.edu
>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>
>>>>
>>>>        
>>>>
>>>>>Try two things:
>>>>>          
>>>>>
>>>>>>1)  Use a much simpler script, like the one in 'perldoc
>>>>>>            
>>>>>>
>>>>>Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something
>>>>>          
>>>>>
>>>>wrong
>>>>        
>>>>
>>>>>with the logic in your subroutine:
>>>>>          
>>>>>
>>>>>>my $v = 1;
>>>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
>>>>>>while (my $input = $str->next_seq()){
>>>>>>            
>>>>>>
>>>>>  #Blast a sequence against a database:
>>>>>  #Alternatively, you could  pass in a file with many
>>>>>  #sequences rather than loop through sequence one at a time
>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>  #and swap the two lines below for an example of that.
>>>>>  my $r = $factory->submit_blast($input);
>>>>>  #my $r = $factory->submit_blast('amino.fa');
>>>>>  print STDERR "waiting..." if( $v > 0 );
>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>    foreach my $rid ( @rids ) {
>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>      if( !ref($rc) ) {
>>>>>        if( $rc < 0 ) {
>>>>>          $factory->remove_rid($rid);
>>>>>        }
>>>>>        print STDERR "." if ( $v > 0 );
>>>>>        sleep 5;
>>>>>      } else {
>>>>>        my $result = $rc->next_result();
>>>>>        #save the output
>>>>>        my $filename = $result->query_name()."\.out";
>>>>>        $factory->save_output($filename);
>>>>>        $factory->remove_rid($rid);
>>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>        while ( my $hit = $result->next_hit ) {
>>>>>          next unless ( $v > 0);
>>>>>          print "\thit name is ", $hit->name, "\n";
>>>>>          while( my $hsp = $hit->next_hsp ) {
>>>>>            print "\t\tscore is ", $hsp->score, "\n";
>>>>>          }
>>>>>        }
>>>>>      }
>>>>>    }
>>>>>  }
>>>>>}
>>>>>          
>>>>>
>>>>>>2) Try the RemoteBlast from Bugzilla and see if that works.  It
>>>>>>            
>>>>>>
>>>really
>>>      
>>>
>>>>>shouldn't make that much of a difference, but I noticed that the CVS
>>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
>>>>>released; the Bugzilla version is based off CVS.
>>>>>          
>>>>>
>>>>>>Christopher Fields
>>>>>>            
>>>>>>
>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>Dept. of Biochemistry
>>>>>University of Illinois Urbana-Champaign
>>>>>          
>>>>>
>>>>>>>-----Original Message-----
>>>>>>>              
>>>>>>>
>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>Sent: Monday, February 13, 2006 3:00 PM
>>>>>>To: bioperl-l at lists.open-bio.org
>>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>            
>>>>>>
>>>>>>>>Thanks, Chris,
>>>>>>>>                
>>>>>>>>
>>>>>>I installed version 1.5.1 and replaced the blast.pm file with the
>>>>>>            
>>>>>>
>>>one
>>>      
>>>
>>>>from
>>>>        
>>>>
>>>>>>your bug report. The running version is 1.5 when I use the command
>>>>>>            
>>>>>>
>>>you
>>>      
>>>
>>>>>>sent me. But when I tried the script, it doesn't change much. My
>>>>>>remoteblast code (portion) is here:
>>>>>>            
>>>>>>
>>>>>>sub search {
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
>>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
>>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]",
>>>>>>			      -id=>"query",
>>>>>>			      -desc=>"new seq");
>>>>>>my $len=$query->length();
>>>>>>@db=('nr','htgs','wgs');
>>>>>>foreach my $db (@db) {
>>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
>>>>>>						'-data' =>"$db",
>>>>>>						'-expect'=>"$E_value");
>>>>>>my $blast_report = $factory->submit_blast($query);
>>>>>>my @rids = $factory->each_rid();
>>>>>>foreach my $rid ( @rids ) {
>>>>>>    print STDERR "$rid\n";
>>>>>>}
>>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638)
>>>>>>print STDERR "waiting...";
>>>>>>sleep 60;
>>>>>>foreach my $rid ( @rids ) {
>>>>>>    my $rc = $factory->retrieve_blast($rid);
>>>>>>    while (!ref($rc) ) {
>>>>>>	if( $rc < 0 ) {
>>>>>># retrieve_blast returns -1 on error
>>>>>>	    $factory->remove_rid($rid);
>>>>>>	    print "Error!\n";
>>>>>>	    send_error($email,$function,$seqname,$queryname[$ST]);
>>>>>>	    die "Can't retrieve $rid";
>>>>>>	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
>>>>>>	    sleep 60;
>>>>>>	    $rc = $factory->retrieve_blast($rid);
>>>>>>	}
>>>>>>    }
>>>>>>    if (ref($rc)) {
>>>>>>	print STDERR "Done.\n";
>>>>>>	 while( my $result = $rc->next_result) {
>>>>>>	    while( my $hit = $result->next_hit()) {
>>>>>>	    	$hit_name=$hit->name;
>>>>>>		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
>>>>>>		$name=$1;
>>>>>>		@left_plus_start=();
>>>>>>		@left_plus_end=();
>>>>>>		@left_minus_start=();
>>>>>>		@left_minus_end=();
>>>>>>		@right_plus_start=();
>>>>>>		@right_plus_end=();
>>>>>>		@right_minus_start=();
>>>>>>		@right_minus_end=();
>>>>>>		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
>>>>>>		while( my $hsp = $hit->next_hsp()) {
>>>>>>......
>>>>>>
>>>>>>It was working quite well until around October last year, but it has
>>>>>>stopped since then. When a submission is sent via a webpage, the cgi
>>>>>>starts to work and uses ~20 MB of memory. Then it hangs there; finally
>>>>>>the expected email is received but without real results, although it does
>>>>>>contain something from other parts of the script. Apparently the search
>>>>>>sub did not return anything (I know there is something that should be
>>>>>>returned). Is it also possible that the format of the NCBI output for each
>>>>>>result has changed?
>>>>>>Thank you,
>>>>>>Guojun
>>>>>>            
>>>>>>
>>>>>>>>>>Department of Plant Biology
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>University of Georgia
>>>>>>            
>>>>>>
>>>>>>>>>>>>----- Original Message -----
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>            
>>>>>>>How do you know two versions are installed (i.e. how are you checking
>>>>>>>the version)?  Do you have two complete bioperl distributions (in two
>>>>>>>separate directories) or are you looking in modules?  Here's the way to
>>>>>>>check the version (from the FAQ):
>>>>>>>
>>>>>>>perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
>>>>        
>>>>>>>If you have two full bioperl distributions on your computer, normally
>>>>>>>only one will be in use unless you have explicitly set the environment
>>>>>>>variable PERL5LIB.  The PERL5LIB directories will be searched first before
>>>>>>>your normal perl directory list (@INC) is searched.  You MAY get some
>>>>>>>mixing then, but only if perl can't find a particular module in the path
>>>>>>>designated in PERL5LIB; then it will progress through the directories
>>>>>>>listed in @INC.  This may happen if a module is unique to a particular
>>>>>>>release, but shouldn't happen for the majority of modules, including
>>>>>>>RemoteBlast.  You can check what @INC and PERL5LIB are set to by using
>>>>>>>'perl -V'.  @INC will differ depending on your OS, perl build, etc.
>>>>>>>
>>>>>>>Regardless, if you follow the directions for installing bioperl for your
>>>>>>>system ('perl Makefile.PL', 'make', 'make test', 'make install', unless
>>>>>>>you explicitly change the installation directory when using 'perl
>>>>>>>Makefile.PL'), then 'uninstalling' Bioperl shouldn't be a problem as it
>>>>>>>will install the Bioperl distribution you downloaded over the old version
>>>>>>>in @INC.  See this page:
>>>>>>>
>>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
>>>>>>>
>>>>>>>for more details.
>>>>>>>
>>>>>>>>Christopher Fields
>>>>>>>>                
>>>>>>>>
>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>Dept. of Biochemistry
>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>              
>>>>>>>
>>>>>>>>>>-----Original Message-----
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>>>Sent: Monday, February 13, 2006 12:32 PM
>>>>>>>>To: bioperl-l at lists.open-bio.org
>>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
>>>>>>>>                
>>>>>>>>
>>>>>>>>>>Hi, Chris,
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>>>I do have different versions of bioperl on my Linux machine (1.4. and
>>>>>>>>1.5.0), this may be the problem. Should I just install bioperl-1.5.1, or
>>>>>>>>do I need to uninstall and remove the previous versions? I could not find
>>>>>>>>any hint on uninstalling bioperl on linux. Could you please give me some
>>>>>>>>suggestions?
>>>>>>>>Thanks,
>>>>>>>>Guojun
>>>>>>>>                
>>>>>>>>
>>>>>>>>>>Department of Plant Biology
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>>>University of Georgia
>>>>>>>>      _____
>>>>>>>>                
>>>>>>>>
>>>>>>>>>>  From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
>>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500
>>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>
>>>>>>>>If you're using RemoteBlast 1.28, then you've likely updated from CVS,
>>>>>>>>which isn't the latest fix.
>>>>>>>>
>>>>>>>>Make sure that you check the following:
>>>>>>>>
>>>>>>>>1) Always post to the mailing list:
>>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
>>>>>>>>
>>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
>>>>>>>>installed first.  Perform a clean installation; do not upgrade only
>>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
>>>>>>>>guarantee that mixing modules from old and new distributions (1.4 and
>>>>>>>>1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
>>>>>>>>installation will allow text output from BLAST v.2.2.12 to be saved and
>>>>>>>>parsed; it will not parse the newest BLAST text output from NCBI
>>>>>>>>(v2.2.13) but it should still save it. I believe as long as
>>>>>>>>next_result() isn't called, it will work.
>>>>>>>>
>>>>>>>>3) The bug fixes for the above issue with parsing BLAST 2.2.13 text
>>>>>>>>output are NOT in CVS; they haven't been cleared and checked in by Roger
>>>>>>>>Hall (who's now taking care of RemoteBlast) and the powers that be (Jason
>>>>>>>>or whoever is in charge of Bio::SearchIO).  They can be found in Bugzilla:
>>>>>>>>
>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>>>>>>>>
>>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
>>>>>>>>saving XML output, so isn't necessary if you don't plan on using this
>>>>>>>>option.  And, remember, they haven't been committed yet to CVS, which
>>>>>>>>means that the final version will change to reflect the new version.
>>>
>>>>>>>>>>>>Christopher Fields
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>>Dept. of Biochemistry
>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>                
>>>>>>>>
>>>>>>>>>>>>    _____
>>>>>>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>>>>>>>Sent: Monday, February 13, 2006 9:26 AM
>>>>>>>>To: Chris Fields
>>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>
>>>>>>>>>>>>Hi, Chris
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>>>>>>>Thanks for your suggestion; however, it doesn't seem to work for my cgi
>>>>>>>>even after I replace both blast.pm and RemoteBlast.pm. I didn't even get
>>>>>>>>any RID. Is there any suggestion?
>>>>>>>>
>>>>>>>>>>>>>>Guojun
>>>>>>>>>>>>>>                            
>>>>>>>>>>>>>>
>>>>>>>>>>>>Guojun Yang
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>>>>>>>Department of Plant Biology
>>>>>>>>University of Georgia
>>>>>>>>Tel: 706-542-1857
>>>>>>>>Fax: 706-542-1805
>>>>>>>>http://www.arches.uga.edu/~guojun
>>>>>>>>    _____
>>>>>>>>                
>>>>>>>>
>>>>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
>>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500
>>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>
>>>>>>>>I would say give the new code a try, but realize that it hasn't been
>>>>>>>>checked in (like I said below). I will try going over the modified
>>>>>>>>Bio::SearchIO::blast again this weekend to see if there is anything I
>>>>>>>>might have missed. The changed order in the header of BLAST text output
>>>>>>>>has me a bit worried that it might not catch everything, but it at least
>>>>>>>>doesn't hang in the while() loop I described in the bug report below (bug
>>>>>>>>#1934) and seems to process everything fine.
>>>>>>>>
>>>>>>>>If you want more stability in the code, you might consider changing over
>>>>>>>>to XML output and parsing with Bio::SearchIO::blastxml. There are some
>>>>>>>>changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
>>>>>>>>saving XML output, but I believe it parses everything regardless. If you
>>>>>>>>look back over the last month or so there has been a bit of discussion
>>>>>>>>here about it. Jason describes a bit on how to set up RemoteBlast for XML:
>>>>>>>>
>>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
>>>>>>
>>>>>>>>>>Christopher Fields
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>>Dept. of Biochemistry
>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>                
>>>>>>>>
>>>>>>>>>>>-----Original Message-----
>>>>>>>>>>>                      
>>>>>>>>>>>
>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
>>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM
>>>>>>>>>To: bioperl-l at bioperl.org
>>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>>Hi, Everybody,
>>>>>>>>>I see this post and am wondering if this is the reason for the
>>>>>>>>>malfunctioning of my webserver. We set up a webserver named MAK, for MITE
>>>>>>>>>sequence analysis. It was working very well until around November 2005,
>>>>>>>>>when it stopped returning any result (the site is fine and seems to be
>>>>>>>>>doing something after submission). In the CGI script, I used remoteblast
>>>>>>>>>(that work was done in 2003) to do searches. I currently do not have
>>>>>>>>>access to the server because I moved. Several people have sent emails to
>>>>>>>>>us about its malfunctioning. Is there any suggestion on fixing the
>>>>>>>>>problem? Should I simply ask that the remoteblast.pm be replaced with the
>>>>>>>>>new version?
>>>>
>>>>>>>>>Thanks a lot,
>>>>>>>>>Guojun
>>>>>>>>>
>>>>>>>>>Department of Plant Biology
>>>>>>>>>University of Georgia
>>>>>>>>>Tel: 706-542-1857
>>>>>>>>>Fax: 706-542-1805
>>>>>>>>>http://www.arches.uga.edu/~guojun
>>>>>>>>>_____
>>>>>>>>>
>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
>>>>>>>>>                  
>>>>>>>>>
>>>>Jian'
>>>>        
>>>>
>>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
>>>>>>>>>                  
>>>>>>>>>
>>>[mailto:bioperl-
>>>      
>>>
>>>>>>>>>l at bioperl.org]
>>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500
>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>>>>>>>>>
>>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
>>>>>>>>>will work for saving text output. However, it will not parse anything
>>>>>>>>>using next_result (it will likely hang) and will not save XML format. See
>>>>>>>>>these bugs:
>>>>>>>>>
>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
>>>>>>>>>
>>>>>>>>>for explanations and possible fixes (changes to RemoteBlast and
>>>>>>>>>Bio::SearchIO::blast). Note that these haven't been checked in yet so are
>>>>>>>>>still not included in bioperl-live; they may be further modified before
>>>>>>>>>committing to CVS. If you're not worried about XML, you could just try
>>>>>>>>>the first fix, which is a change to SearchIO::blast.
>>>>>>>>>
>>>>>>>>>Nagesh, I remember you posting to the list a month ago using a script
>>>>>>>>>which had problems; the script you used saves the output but doesn't
>>>>>>>>>actually parse it (i.e. you don't use next_result() to go through the
>>>>>>>>>data). Is the version of BLAST in your text output 2.2.12 or 2.2.13? Have
>>>>>>>>>you tried parsing the output using "-readmethod => SearchIO" or
>>>>>>>>>"-readmethod => blast" using your version of RemoteBlast and method
>>>>>>>>>next_result()? Like below (from perldoc):
>>>>>>>>>
>>>>>>>>>while ( my @rids = $factory->each_rid ) {
>>>>>>>>>    foreach my $rid ( @rids ) {
>>>>>>>>>        my $rc = $factory->retrieve_blast($rid);
>>>>>>>>>        if( !ref($rc) ) {
>>>>>>>>>            if( $rc < 0 ) {
>>>>>>>>>                $factory->remove_rid($rid);
>>>>>>>>>            }
>>>>>>>>>            print STDERR "." if ( $v > 0 );
>>>>>>>>>            sleep 5;
>>>>>>>>>        } else { # parsing starts here
>>>>>>>>>            my $result = $rc->next_result(); # it should hang here
>>>>>>>>>            #save the output
>>>>>>>>>            my $filename = $result->query_name()."\.out";
>>>>>>>>>            $factory->save_output($filename);
>>>>>>>>>            $factory->remove_rid($rid);
>>>>>>>>>            print "\nQuery Name: ", $result->query_name(), "\n";
>>>>>>>>>            while ( my $hit = $result->next_hit ) {
>>>>>>>>>                next unless ( $v > 0);
>>>>>>>>>                print "\thit name is ", $hit->name, "\n";
>>>>>>>>>                while( my $hsp = $hit->next_hsp ) {
>>>>>>>>>                    print "\t\tscore is ", $hsp->score, "\n";
>>>>>>>>>                }
>>>>>>>>>            }
>>>>>>>>>        }
>>>>>>>>>    }
>>>>>>>>>}
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>My script hung if I used next_result() in any way prior to the fixes. I
>>>>>>>>>want to see how many others are having the same issues with parsing using
>>>>>>>>>the CVS version of bioperl-live.
>>>>>>>>>
>>>>>>>>>Christopher Fields
>>>>>>>>>Postdoctoral Researcher - Switzer Lab
>>>>>>>>>Dept. of Biochemistry
>>>>>>>>>University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>>                  
>>>>>>>>>
>>>>>>>>>>-----Original Message-----
>>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
>>>>>>>>>>                    
>>>>>>>>>>
>>>l-
>>>      
>>>
>>>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
>>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM
>>>>>>>>>>To: Huang Jian; bioperl-l
>>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
>>>>>>>>>>
>>>>>>>>>>Hi Huang,
>>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm works on the
>>>>>>>>>>logic of checking the temporary file size to determine whether the Blast
>>>>>>>>>>results are ready. This condition is no longer being satisfied, possibly
>>>>>>>>>>due to some changes made by NCBI. I had this problem recently and
>>>>>>>>>>figured out that the solution was to use the latest version, which has
>>>>>>>>>>this problem fixed (it does not use the file size logic any more) but is
>>>>>>>>>>not yet included in the BioPerl package.
>>>>>>>>>>Cheers
>>>>>>>>>>Nagesh
>>>>>>>>>>
>>>>>>>>>>Huang Jian wrote:
>>>>>>>>>>
>>>>>>>>>>                    
>>>>>>>>>>
>>>>>>>>>>>Dear Nagesh,
>>>>>>>>>>>
>>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent
>>>>>>>>>>>me. Now it works perfectly!!!
>>>>>>>>>>>
>>>>>>>>>>>Thank you!!
>>>>>>>>>>>
>>>>>>>>>>>Huang
>>>>>>>>>>>
>>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka"
>>>>>>>>>>>
>>>>>>>>>>>To: "Huang Jian" ; "bioperl-l"
>>>>>>>>>>>
>>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM
>>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM
>>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
>>>>>>>>>>>via email
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>                      
>>>>>>>>>>>
>>>>>>>>>>>>Hi Huang,
>>>>>>>>>>>>I see that you are submitting a sequence for a remote blast search. Can
>>>>>>>>>>>>you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09)? If
>>>>>>>>>>>>not, I have attached it to this email; try using it to replace the old
>>>>>>>>>>>>one, which has a bug.
>>>>>>>>>>>>Let me know if it works.
>>>>>>>>>>>>Nagesh
>>>>>>>>>>>>                        
>>>>>>>>>>>>
>>>>>>>>>>>                      
>>>>>>>>>>>
>>>>>>>>>>_______________________________________________
>>>>>>>>>>Bioperl-l mailing list
>>>>>>>>>>Bioperl-l at lists.open-bio.org
>>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l




From jason.stajich at duke.edu  Thu Feb 16 14:00:01 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 16 Feb 2006 09:00:01 -0500
Subject: [Bioperl-l] searchIO bug?
In-Reply-To: <43F452F30200009B00000EC9@gwia.kvl.dk>
References: <43F452F30200009B00000EC9@gwia.kvl.dk>
Message-ID: <11B49C84-9C04-4F43-9278-A3AA09C9B773@duke.edu>

i think it would be more helpful if you posted the actual report  
rather than the protein since this may be dependent on the version of  
blast you are using.

if you used
split(/\s+/, $header)
  it wouldn't matter how many spaces.
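
A minimal sketch of the difference, assuming a made-up header string (the
field layout and exon positions below are invented for illustration; they are
not taken from the actual Q0045 report):

```perl
use strict;
use warnings;

# Hypothetical description line with a run of two spaces inside the exon list.
my $header = "Q0045 exons:13818,16435,  18954 chr:mito";

# split / / breaks on every single space, so two consecutive spaces
# produce an empty field in the middle of the list.
my @by_single_space = split / /, $header;

# split /\s+/ treats any run of whitespace as one separator.
my @by_whitespace = split /\s+/, $header;

printf "split / /   gives %d fields\n", scalar @by_single_space;   # 5, one of them empty
printf "split /\\s+/ gives %d fields\n", scalar @by_whitespace;    # 4
```

If the string might also start with whitespace, split ' ', $header (the
special one-space string form) additionally skips the leading empty field.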

On Feb 16, 2006, at 4:24 AM, Anders Stegmann wrote:

> Hi!
>
>
> I am blasting a protein seq against an identical protein.
> I am trying to parse the protein header by using the query_description
> method in the SearchIO module.
> After using the query_description method I use split / /      in order
> to easily access the different header components.
> Here I discover that the query_description method is somehow  
> introducing
> a space between number 5 comma and the following chromosome position
> number
> in the exon chromosome position list!?
> This truncates the list of exon chromosome positions from 7 to 4,  
> later
> yielding a wrong number of the introns counted.
>
> Is this a bug?
>
> Attached is:
>
> testblast1.pl: the blastprogram to run.
>
> Q0045 the seq that is used as both query and database seq.
> (Q0045 has to be formated in order to be used as a database:  
> formatdb -i
> Q0045 -p T -o F)
>
>
> Regards Anders.
>
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/




From cjfields at uiuc.edu  Thu Feb 16 15:50:04 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 09:50:04 -0600
Subject: [Bioperl-l] additional error message
In-Reply-To: <20060216100410.54a1a6d5@dogwood.plantbio.uga.edu>
Message-ID: <002901c63310$a7da1b20$15327e82@pyrimidine>

I don't think the apache error is related to the main issue here, but you
could always try upgrading LWP to see if that fixes it.  The second issue is
a text-parsing problem in SearchIO specific to nucleotide BLAST reports,
which I'm looking into.

Jason has posted a bit on using XML.  Basically, do the following:

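use Bio::Tools::Run::RemoteBlast;

my $prog  = 'blastn';
my $db    = 'nr';
my $e_val = 1e-10;
my $v     = 1;    # verbosity flag used by the retrieval loop
my @params = (-prog       => $prog,
              -data       => $db,
              -expect     => $e_val,
              -readmethod => 'xml');

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
# ask NCBI to return the report as XML rather than text
$factory->retrieve_parameter('FORMAT_TYPE', 'XML');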

You'll also need to modify the following line:

my $filename = $result->query_name()."\.out";

b/c the XML tag for this feature is actually part of the rid for some
reason, so you'll get a weird output file name.  This is a problem with
NCBI's XML output, not SearchIO::XML parsing.
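
One possible way to make that change, sketched under the assumption that the
request ID ($rid in the retrieval loop) is acceptable as a file name.  The
helper output_filename is hypothetical and not part of BioPerl; the RID used
below is the example one quoted earlier in this thread:

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): build the output file name
# from the RID instead of from $result->query_name().
sub output_filename {
    my ($rid, $ext) = @_;
    $ext = 'xml' unless defined $ext;
    # replace anything that is awkward in a file name with an underscore
    (my $safe = $rid) =~ s/[^A-Za-z0-9._-]/_/g;
    return "$safe.$ext";
}

print output_filename('1017772174-16400-6638'), "\n";
```

In the retrieval loop this would stand in for the query_name() line, e.g.
my $filename = output_filename($rid); before calling save_output($filename).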

XML BLAST files can be really big (~5 MB and up depending on how much
information is returned), so it may take a little time to go through the
data.  Right now, it is the only consistently reliable way to parse the
output, as NCBI keeps changing the text output, sending us back into
"SearchIO::blast hell," as J.S. puts it.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> Sent: Thursday, February 16, 2006 9:04 AM
> To: Chris Fields; Pieter Monsieurs
> Cc: bioperl-l at lists.open-bio.org
> Subject: additional error message
> 
> when I check my apache error_log, there is one line saying:
> "waiting...Parsing of undecoded UTF-8 will give garbage when decoding
> entities at /usr/lib/perl5/site_perl/5.8.3/LWP/Protocol.pm line 137.,"
> I also see an error saying "MSG: no data for midline  Features flanking
> this part of subject sequence:, " that is mentioned by Pieter.
> Chris, may I have your suggestion on changing it to XML parsing? I read
> Jason's comments/suggestions about it, but could not make it work.
> Thanks
> 
> Guojun
> Department of Plant Biology
> University of Georgia
> 
> 
> 
> ----- Original Message -----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> To: Pieter Monsieurs [mailto:Pieter.Monsieurs at esat.kuleuven.be]
> Cc: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
> 	version 1.28
> 
> 
> > Yeah, looks like it broke text output nucleotide parsing with that.
> > XML output parsing still works though (as expected).  I'll give it a
> > look.
> >
> > Chris
> >
> > On Feb 16, 2006, at 3:46 AM, Pieter Monsieurs wrote:
> >
> > > Hi,
> > >
> > > I have the same problem with the blast.pm-file.
> > > The people of NCBI added some extra info when giving the Blast-
> > > output. (see e.g. "Features flanking this part..." or "Features in
> > > this part ..."), example added.
> > > The blast.pm module starts looking for the hsp-alignement-
> > > information, but it dies when it hits this Feature-information.
> > >
> > > Pieter
> > >
> > >
> > > >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> > > chromosome 12, complete sequence
> > > Length=27492551
> > >
> > > Features flanking this part of subject sequence:
> > >   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm
> > > sub-class  > > val=77552765&db=Nucleotide&from=19251479&to=19253693&view=gbwithparts>
> > >   2655 bp at 3' side: hypothetical protein  > > www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?
> > > val=77552765&db=Nucleotide&from=19260091&to=19260600&view=gbwithparts>
> > >
> > > Score = 36.2 bits (18),  Expect = 0.22
> > > Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > > Strand=Plus/Minus
> > >
> > > Query  4         GTACTACTCTACTCTACT  21
> > >                  ||||||||||||||||||
> > > Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> > >
> > >
> > > Features flanking this part of subject sequence:
> > >   2991 bp at 5' side: hypothetical protein  > > www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?
> > > val=77552765&db=Nucleotide&from=27003164&to=27003907&view=gbwithparts>
> > >   1131 bp at 3' side: hypothetical protein
> > >  > > val=77552765&db=Nucleotide&from=27008046&to=27010752&view=gbwithparts>
> > >
> > > Score = 36.2 bits (18),  Expect = 0.22
> > > Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > > Strand=Plus/Minus
> > >
> > > Query  2         ATGTACTACTCTACTCTA  19
> > >                 ||||||||||||||||||
> > > Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> > >
> > >
> > >
> > > Features in this part of subject sequence:
> > >   DHHC zinc finger domain, putative
> > >  > > val=77552765&db=Nucleotide&from=17614825&to=17618687&view=gbwithparts>
> > >
> > > Score = 34.2 bits (17),  Expect = 0.87
> > > Identities = 17/17 (100%), Gaps = 0/17 (0%)
> > > Strand=Plus/Plus
> > >
> > > Query  5         TACTACTCTACTCTACT  21
> > >                 |||||||||||||||||
> > > Sbjct  17616437  TACTACTCTACTCTACT  17616453
> > >
> > >
> > >
> > > Features flanking this part of subject sequence:
> > >   102 bp at 5' side: bZIP transcription factor, putative
> > >  > > val=77552765&db=Nucleotide&from=2774964&to=2775778&view=gbwithparts>
> > >   3740 bp at 3' side: yeast dcp1, putative  > > www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?
> > > val=77552765&db=Nucleotide&from=2779635&to=2782508&view=gbwithparts>
> > >
> > > Score = 32.2 bits (16),  Expect = 3.4
> > > Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > > Strand=Plus/Plus
> > >
> > > Query  7        CTACTCTACTCTACTC  22
> > >                ||||||||||||||||
> > > Sbjct  2775880  CTACTCTACTCTACTC  2775895
> > >
> > >
> > > Features flanking this part of subject sequence:
> > >
> > >   21 bp at 5' side: peptide transporter T17F3.11, putative  > > www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?
> > > val=77552765&db=Nucleotide&from=27321354&to=27323117&view=gbwithparts>
> > >   10230 bp at 3' side: transposon protein, putative, unclassified
> > >  > > val=77552765&db=Nucleotide&from=27333383&to=27334285&view=gbwithparts>
> > >
> > > Score = 32.2 bits (16),  Expect = 3.4
> > > Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > > Strand=Plus/Minus
> > >
> > > Query  7         CTACTCTACTCTACTC  22
> > >                  ||||||||||||||||
> > > Sbjct  27323153  CTACTCTACTCTACTC  27323138
> > >
> > >
> > >
> > >
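As a stopgap while the text parser is being fixed, the "Features flanking this part of subject sequence:" and "Features in this part of subject sequence:" blocks shown above could be stripped from a saved text report before it is parsed. The following is only a minimal sketch in plain Perl: strip_feature_blocks is a hypothetical helper, not part of BioPerl and not the fix attached to the Bugzilla reports, and it assumes every such block ends at the next "Score =" line, as in the example output.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): drop the "Features ..." blocks
# that the newer NCBI text reports place before an HSP's "Score =" line.
sub strip_feature_blocks {
    my ($report) = @_;
    my @kept;
    my $skipping = 0;
    for my $line ( split /\n/, $report, -1 ) {
        # A feature header starts a block that we want to discard.
        if ( $line =~ /^\s*Features (?:flanking|in) this part of subject sequence:/ ) {
            $skipping = 1;
            next;
        }
        # Assumption: the block always ends at the next "Score =" line.
        $skipping = 0 if $skipping && $line =~ /^\s*Score\s*=/;
        push @kept, $line unless $skipping;
    }
    return join "\n", @kept;
}

# Shortened sample modelled on the report fragment quoted in this thread.
my $report = join "\n",
    'Length=27492551',
    '',
    'Features flanking this part of subject sequence:',
    "   3726 bp at 5' side: transposon protein, putative",
    "   2655 bp at 3' side: hypothetical protein",
    '',
    ' Score = 36.2 bits (18),  Expect = 0.22',
    ' Identities = 18/18 (100%), Gaps = 0/18 (0%)';

my $cleaned = strip_feature_blocks($report);
print $cleaned, "\n";
```

The cleaned text could then be written to a file and handed to Bio::SearchIO in the usual way. Whether the rest of a 2.2.13 report parses cleanly after that is not something this sketch checks.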
> > > Guojun Yang wrote:
> > >
> > >> Hi, Chris,
> > >> Finally the remoteblast test script works for the amino.fa query,
> > >> but when I try a nucleic acid sequence (see below), an error occurs: "
> > >> waiting........
> > >> ------------- EXCEPTION  -------------
> > >> MSG: no data for midline  Features flanking this part of subject
> > >> sequence:
> > >> STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> > >> STACK toplevel remoteblast_test:40
> > >> "
> > >> The query sequence is:
> > >> CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> > >> GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> > >> AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> > >> AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> > >>
> > >> The script (basically same as the remoteblast test, I only changed
> > >> database to 'nr' and program to 'blastn' and filename to 'ost3'):
> > >> #!/usr/bin/perl
> > >>
> > >> use Bio::SeqIO;
> > >> use Bio::Seq;
> > >> use Bio::Tools::Run::RemoteBlast;
> > >> use Bio::SearchIO;
> > >> use strict;
> > >> my $prog='blastn';
> > >> my $db='nr';
> > >> my $e_val=1e-10;
> > >> my @params=( -prog=>$prog,
> > >> 	-data=>$db,
> > >> 	-expect=>$e_val,
> > >> 	-readmethod=>'SearchIO');
> > >> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>
> > >> my $v = 1;
> > >>
> > >> my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> > >>
> > >> while (my $input = $str->next_seq()){
> > >>  #Blast a sequence against a database:
> > >>  #Alternatively, you could  pass in a file with many
> > >>  #sequences rather than loop through sequence one at a time
> > >>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>  #and swap the two lines below for an example of that.
> > >>  my $r = $factory->submit_blast($input);
> > >>  #my $r = $factory->submit_blast('amino.fa');
> > >>  print STDERR "waiting..." if( $v > 0 );
> > >>  while ( my @rids = $factory->each_rid ) {
> > >>    foreach my $rid ( @rids ) {
> > >>      my $rc = $factory->retrieve_blast($rid);
> > >>      if( !ref($rc) ) {
> > >>        if( $rc < 0 ) {
> > >>          $factory->remove_rid($rid);
> > >>        }
> > >>        print STDERR "." if ( $v > 0 );
> > >>        sleep 5;
> > >>      } else {
> > >>        my $result = $rc->next_result();
> > >>        #save the output
> > >>        my $filename = $result->query_name()."\.out";
> > >>        $factory->save_output($filename);
> > >>        $factory->remove_rid($rid);
> > >>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>        while ( my $hit = $result->next_hit ) {
> > >>          next unless ( $v > 0);
> > >>          print "\thit name is ", $hit->name, "\n";
> > >>          while( my $hsp = $hit->next_hsp ) {
> > >>            print "\t\tscore is ", $hsp->score, "\n";
> > >>          }
> > >>        }
> > >>      }
> > >>    }
> > >>  }
> > >> }
> > >>
> > >>
> > >> Do you think there might still be something in the NCBI output
> > >> format?
> > >>
> > >> Thank you,
> > >> Guojun
> > >>
> > >>
> > >>
> > >>
> > >> Guojun Yang
> > >> Department of Plant Biology
> > >> University of Georgia
> > >> Tel: 706-542-1857
> > >> Fax: 706-542-1805
> > >> http://www.arches.uga.edu/~guojun
> > >>
> > >>
> > >>
> > >> ----- Original Message -----
> > >> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >> Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>
> > >>
> > >>
> > >>> Sorry, forgot to add that I didn't see the regex issue that you
> > >>> mentioned.
> > >>> It could be a perl-related issue.  Try the fixes I mentioned and
> > >>> see what
> > >>> happens.
> > >>>
> > >>>> Christopher Fields
> > >>>>
> > >>> Postdoctoral Researcher - Switzer Lab
> > >>> Dept. of Biochemistry
> > >>> University of Illinois Urbana-Champaign
> > >>>>>> -----Original Message-----
> > >>>>>>
> > >>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>> Sent: Tuesday, February 14, 2006 12:36 PM
> > >>>> To: 'gyang at plantbio.uga.edu'
> > >>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>>>
> > >>>> It's a good habit to always put single quotes around bare words.
> > >>>> The perl interpreter may think a single bare word is a subroutine
> > >>>> or perlfunc called with no args, so it will try to find a
> > >>>> subroutine named blastp().  My debugger actually gives the error
> > >>>> that the bare word blastp may conflict with a future reserved
> > >>>> word.  Like you said, 'use strict' will point that out.
> > >>>>
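The bare-word behaviour described above can be reproduced without BioPerl. This is a small sketch, not code from the thread: it compiles the unquoted assignment inside a string eval so that the 'use strict' error can be captured and printed instead of stopping the script.

```perl
use strict;
use warnings;

# Compile the unquoted assignment in a string eval so the strict error
# can be captured instead of aborting this script.
my $ok = eval q{
    use strict;
    my $prog = blastp;    # bare word, as in the original script
    1;
};
my $bareword_error = $ok ? '' : $@;
print "bare word: $bareword_error";

# The quoted form compiles and runs cleanly.
my $prog = eval q{
    use strict;
    my $p = 'blastp';
    $p;
};
print "quoted:    $prog\n";
```

The first eval fails at compile time with a 'Bareword "blastp" not allowed' message, while the quoted version returns the string as intended.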
> > >>>> As for the regex, it should match all the blast programs at NCBI
> > >>>> (blastp, blastn, blastx, tblastn, tblastx) and is built in to make
> > >>>> sure nothing else passes through.
> > >>>>
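To see which program names the pattern t?blast[pnx] quoted in this thread lets through, it can be tried on its own. This is a sketch in plain Perl; is_valid_program is a made-up name, and anchoring the pattern at both ends is an assumption about how the module applies it, not something confirmed here.

```perl
use strict;
use warnings;

# Assumption: the value is checked against the whole string, so the
# pattern from the error message is anchored at both ends here.
my $program_pattern = qr/^t?blast[pnx]$/;

# Hypothetical helper: true if the name looks like an NCBI blast program.
sub is_valid_program {
    my ($name) = @_;
    return $name =~ $program_pattern ? 1 : 0;
}

for my $name (qw(blastp blastn blastx tblastn tblastx megablast blast)) {
    printf "%-10s %s\n", $name,
        is_valid_program($name) ? 'accepted' : 'rejected';
}
```

The optional leading "t" covers tblastn and tblastx, and the character class [pnx] covers the final letter, which is why one short pattern is enough for all five programs.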
> > >>>> So, if you are using the script below, there are several errors.
> > >>>> The bare words for $prog and $db need quotes, and the flags for
> > >>>> your @params array don't have a dash before them.  I get this
> > >>>> after adding quotes but before adding the dashes to @params:
> > >>>>
> > >>>> C:\Perl\Scripts>test_blast.pl
> > >>>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > >>>> MSG:
> > >>>> STACK: Error::throw
> > >>>> STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-live/Bio/Root/Root.pm:328
> > >>>> STACK: Bio::Tools::Run::RemoteBlast::submit_parameter C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > >>>> STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:256
> > >>>> STACK: C:\Perl\Scripts\test_blast.pl:15
> > >>>> -----------------------------------------------------------
> > >>>>
> > >>>> The last line indicates a problem with this line:
> > >>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>>> Changing the @params to this:
> > >>>> my @params=( -prog=>$prog,
> > >>>> 	-data=>$db,
> > >>>> 	-expect=>$e_val,
> > >>>> 	-readmethod=>'SearchIO');
> > >>>>
> > >>>> fixes it, and I get output as expected.
> > >>>>
> > >>>> Christopher Fields
> > >>>>>>
> > >>>> Postdoctoral Researcher - Switzer Lab
> > >>>> Dept. of Biochemistry
> > >>>> University of Illinois Urbana-Champaign
> > >>>>
> > >>>>>>>>> -----Original Message-----
> > >>>>>>>>>
> > >>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > >>>>> Sent: Tuesday, February 14, 2006 11:48 AM
> > >>>>> To: Chris Fields; bioperl-l at lists.open-bio.org
> > >>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>>>>
> > >>>>> Hi, Chris,
> > >>>>> When I tried the perldoc script, it did not work either.  First
> > >>>>> it says $prog cannot be a bare word if I "use strict".  I added
> > >>>>> quotes around the words, then it says the value for $prog does
> > >>>>> not match expression t?blast[pnx]. Rejecting. STACK
> > >>>>> ...RemoteBlast.pm 325 and 256.  The script is shown below.  Why
> > >>>>> is the expression "t?blast[pnx]"?
> > >>>>>
> > >>>>> #!/usr/bin/perl
> > >>>>>
> > >>>>> use Bio::SeqIO;
> > >>>>> use Bio::Seq;
> > >>>>> use Bio::Tools::Run::RemoteBlast;
> > >>>>> use Bio::SearchIO;
> > >>>>>
> > >>>>>
> > >>>>> my $prog=blastp;
> > >>>>> my $db=swissprot;
> > >>>>> my $e_val=1e-10;
> > >>>>> my @params=( prog=>$prog,
> > >>>>> 	data=>$db,
> > >>>>> 	expect=>$e_val,
> > >>>>> 	readmethod=>'SearchIO');
> > >>>>> my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>>>>
> > >>>>> my $v = 1;
> > >>>>>
> > >>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >>>>>
> > >>>>> while (my $input = $str->next_seq()){
> > >>>>>  #Blast a sequence against a database:
> > >>>>>  #Alternatively, you could  pass in a file with many
> > >>>>>  #sequences rather than loop through sequence one at a time
> > >>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>>>>  #and swap the two lines below for an example of that.
> > >>>>>  my $r = $factory->submit_blast($input);
> > >>>>>  #my $r = $factory->submit_blast('amino.fa');
> > >>>>>  print STDERR "waiting..." if( $v > 0 );
> > >>>>>  while ( my @rids = $factory->each_rid ) {
> > >>>>>    foreach my $rid ( @rids ) {
> > >>>>>      my $rc = $factory->retrieve_blast($rid);
> > >>>>>      if( !ref($rc) ) {
> > >>>>>        if( $rc < 0 ) {
> > >>>>>          $factory->remove_rid($rid);
> > >>>>>        }
> > >>>>>        print STDERR "." if ( $v > 0 );
> > >>>>>        sleep 5;
> > >>>>>      } else {
> > >>>>>        my $result = $rc->next_result();
> > >>>>>        #save the output
> > >>>>>        my $filename = $result->query_name()."\.out";
> > >>>>>        $factory->save_output($filename);
> > >>>>>        $factory->remove_rid($rid);
> > >>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>        while ( my $hit = $result->next_hit ) {
> > >>>>>          next unless ( $v > 0);
> > >>>>>          print "\thit name is ", $hit->name, "\n";
> > >>>>>          while( my $hsp = $hit->next_hsp ) {
> > >>>>>            print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>          }
> > >>>>>        }
> > >>>>>      }
> > >>>>>    }
> > >>>>>  }
> > >>>>> }
> > >>>>>
> > >>>>> Thank you for your help!
> > >>>>>
> > >>>>>
> > >>>>> Guojun
> > >>>>> Department of Plant Biology
> > >>>>> University of Georgia
> > >>>>>
> > >>>>> ----- Original Message -----
> > >>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>> To: gyang at plantbio.uga.edu
> > >>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>> Try two things:
> > >>>>>>
> > >>>>>> 1) Use a much simpler script, like the one in 'perldoc
> > >>>>>> Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's
> > >>>>>> something wrong with the logic in your subroutine:
> > >>>>>>
> > >>>>>> my $v = 1;
> > >>>>>> my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >>>>>> while (my $input = $str->next_seq()){
> > >>>>>>  #Blast a sequence against a database:
> > >>>>>>  #Alternatively, you could  pass in a file with many
> > >>>>>>  #sequences rather than loop through sequence one at a time
> > >>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>>>>>  #and swap the two lines below for an example of that.
> > >>>>>>  my $r = $factory->submit_blast($input);
> > >>>>>>  #my $r = $factory->submit_blast('amino.fa');
> > >>>>>>  print STDERR "waiting..." if( $v > 0 );
> > >>>>>>  while ( my @rids = $factory->each_rid ) {
> > >>>>>>    foreach my $rid ( @rids ) {
> > >>>>>>      my $rc = $factory->retrieve_blast($rid);
> > >>>>>>      if( !ref($rc) ) {
> > >>>>>>        if( $rc < 0 ) {
> > >>>>>>          $factory->remove_rid($rid);
> > >>>>>>        }
> > >>>>>>        print STDERR "." if ( $v > 0 );
> > >>>>>>        sleep 5;
> > >>>>>>      } else {
> > >>>>>>        my $result = $rc->next_result();
> > >>>>>>        #save the output
> > >>>>>>        my $filename = $result->query_name()."\.out";
> > >>>>>>        $factory->save_output($filename);
> > >>>>>>        $factory->remove_rid($rid);
> > >>>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>>        while ( my $hit = $result->next_hit ) {
> > >>>>>>          next unless ( $v > 0);
> > >>>>>>          print "\thit name is ", $hit->name, "\n";
> > >>>>>>          while( my $hsp = $hit->next_hsp ) {
> > >>>>>>            print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>>          }
> > >>>>>>        }
> > >>>>>>      }
> > >>>>>>    }
> > >>>>>>  }
> > >>>>>> }
> > >>>>>>
> > >>>>>> 2) Try the RemoteBlast from Bugzilla and see if that works.  It
> > >>>>>> really shouldn't make that much of a difference, but I noticed
> > >>>>>> that the CVS RemoteBlast (1.28) was changed in Dec 2005, after
> > >>>>>> bioperl-1.5.1 was released; the Bugzilla version is based off CVS.
> > >>>>>>
> > >>>>>>> Christopher Fields
> > >>>>>>>
> > >>>>>> Postdoctoral Researcher - Switzer Lab
> > >>>>>> Dept. of Biochemistry
> > >>>>>> University of Illinois Urbana-Champaign
> > >>>>>>
> > >>>>>>>> -----Original Message-----
> > >>>>>>>>
> > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>> Sent: Monday, February 13, 2006 3:00 PM
> > >>>>>>> To: bioperl-l at lists.open-bio.org
> > >>>>>>> Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>>
> > >>>>>>> Thanks, Chris,
> > >>>>>>>
> > >>>>>>> I installed version 1.5.1 and replaced the blast.pm file with
> > >>>>>>> the one from your bug report.  The running version is 1.5 when
> > >>>>>>> I use the command you sent me.  But when I tried the script, it
> > >>>>>>> doesn't change much.  My remoteblast code (portion) is here:
> > >>>>>>>
> > >>>>>>> sub search {
> > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
> > >>>>>>> local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > >>>>>>> my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > >>>>>>> 			      -id=>"query",
> > >>>>>>> 			      -desc=>"new seq");
> > >>>>>>> my $len=$query->length();
> > >>>>>>> @db=('nr','htgs','wgs');
> > >>>>>>> foreach my $db (@db) {
> > >>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> > >>>>>>> 						'-data' =>"$db",
> > >>>>>>> 						'-expect'=>"$E_value");
> > >>>>>>> my $blast_report = $factory->submit_blast($query);
> > >>>>>>> my @rids = $factory->each_rid();
> > >>>>>>> foreach my $rid ( @rids ) {
> > >>>>>>>    print STDERR "$rid\n";
> > >>>>>>> }
> > >>>>>>> # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > >>>>>>> print STDERR "waiting...";
> > >>>>>>> sleep 60;
> > >>>>>>>
> > >>>>>>> foreach my $rid ( @rids ) {
> > >>>>>>>    my $rc = $factory->retrieve_blast($rid);
> > >>>>>>>    while (!ref($rc) ) {
> > >>>>>>> 	if( $rc < 0 ) {
> > >>>>>>> # retrieve_blast returns -1 on error
> > >>>>>>> 	    $factory->remove_rid($rid);
> > >>>>>>> 	    print "Error!\n";
> > >>>>>>> 	    send_error($email,$function,$seqname,$queryname[$ST]);
> > >>>>>>> 	    die "Can't retrieve $rid";
> > >>>>>>> 	} if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
> > >>>>>>> 	    sleep 60;
> > >>>>>>> 	    $rc = $factory->retrieve_blast($rid);
> > >>>>>>> 	}
> > >>>>>>>    }
> > >>>>>>>    if (ref($rc)) {
> > >>>>>>> 	print STDERR "Done.\n";
> > >>>>>>> 	 while( my $result = $rc->next_result) {
> > >>>>>>> 	    while( my $hit = $result->next_hit()) {
> > >>>>>>> 	    	$hit_name=$hit->name;
> > >>>>>>> 		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > >>>>>>> 		$name=$1;
> > >>>>>>> 		@left_plus_start=();
> > >>>>>>> 		@left_plus_end=();
> > >>>>>>> 		@left_minus_start=();
> > >>>>>>> 		@left_minus_end=();
> > >>>>>>> 		@right_plus_start=();
> > >>>>>>> 		@right_plus_end=();
> > >>>>>>> 		@right_minus_start=();
> > >>>>>>> 		@right_minus_end=();
> > >>>>>>>
> > >>>>>>> 		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > >>>>>>> 		while( my $hsp = $hit->next_hsp()) {
> > >>>>>>> ......
> > >>>>>>>
> > >>>>>>> It was working quite well until around October last year, but
> > >>>>>>> it has stopped since then.  When a submission is sent via a
> > >>>>>>> webpage, the cgi starts to work and uses ~20 Mb of memory.  Then
> > >>>>>>> it hangs there; finally the expected email is received but
> > >>>>>>> without real results, although it does contain something from
> > >>>>>>> other parts of the script.  Apparently the search sub did not
> > >>>>>>> return anything (I know there is something that should be
> > >>>>>>> returned).  Is it also possible that the format of the NCBI
> > >>>>>>> output for each result has changed?
> > >>>>>>> Thank you,
> > >>>>>>> Guojun
> > >>>>>>>
> > >>>>>>>>>>> Department of Plant Biology
> > >>>>>>>>>>>
> > >>>>>>> University of Georgia
> > >>>>>>>
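One way to keep a CGI script from hanging indefinitely is to bound the polling loop. The sketch below is plain Perl and does not contact NCBI: make_fake_retriever is a hypothetical stand-in for $factory->retrieve_blast($rid), following the return convention noted in the code comments above (0 for 'job not finished', -1 on error, a reference when results are ready), and poll_for_result is likewise an invented name.

```perl
use strict;
use warnings;

# Hypothetical stand-in for $factory->retrieve_blast($rid): returns 0 while
# the job is pending and a reference once a result is ready.
sub make_fake_retriever {
    my ($pending_polls) = @_;
    return sub {
        return 0 if $pending_polls-- > 0;
        return { rid => $_[0], status => 'ready' };
    };
}

# Poll until a reference comes back, an error is reported, or the attempts
# run out, so the caller never waits forever.
sub poll_for_result {
    my ( $retrieve, $rid, $max_attempts, $delay ) = @_;
    for my $attempt ( 1 .. $max_attempts ) {
        my $rc = $retrieve->($rid);
        return $rc if ref $rc;                     # result object is ready
        die "Can't retrieve $rid\n" if $rc < 0;    # -1 signals an error
        sleep $delay if $delay;                    # 0 means not finished yet
    }
    die "Gave up on $rid after $max_attempts attempts\n";
}

# Ready on the third poll; delay is 0 so the demonstration runs instantly.
my $result = poll_for_result( make_fake_retriever(2), 'RID-EXAMPLE', 5, 0 );
print "status: $result->{status}\n";

# A job that never finishes within the allowed attempts raises an error.
my $finished  = eval { poll_for_result( make_fake_retriever(10), 'RID-EXAMPLE', 3, 0 ); 1 };
my $timed_out = $finished ? 0 : 1;
print "timed out: $timed_out\n";
```

In a real script the closure would wrap the factory call and the delay would be something like the 60 seconds used above. The attempt limit is a design choice for the caller, not a value taken from BioPerl.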
> > >>>>>>>>>>>>> ----- Original Message -----
> > >>>>>>>>>>>>>
> > >>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >>>>>>> Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>>
> > >>>>>>>> How do you know two versions are installed (i.e. how are you
> > >>>>>>>> checking the version)?  Do you have two complete bioperl
> > >>>>>>>> distributions (in two separate directories) or are you looking
> > >>>>>>>> in modules?  Here's the way to check the version (from the FAQ):
> > >>>>>>>>
> > >>>>>>>> perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> > >>>>>
> > >>>>>>>> If you have two full bioperl distributions on your computer,
> > >>>>>>>> normally only one will be in use unless you have explicitly set
> > >>>>>>>> the environment variable PERL5LIB.  The PERL5LIB directories
> > >>>>>>>> will be searched first before your normal perl directory list
> > >>>>>>>> (@INC) is searched.  You MAY get some mixing then, but only if
> > >>>>>>>> perl can't find a particular module in the path designated in
> > >>>>>>>> PERL5LIB; then it will progress through the directories listed
> > >>>>>>>> in @INC.  This may happen if a module is unique to a particular
> > >>>>>>>> release, but shouldn't happen for the majority of modules,
> > >>>>>>>> including RemoteBlast.  You can check what @INC and PERL5LIB are
> > >>>>>>>> set to by using 'perl -V'.  @INC will differ depending on your
> > >>>>>>>> OS, perl build, etc.
> > >>>>>>>>
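Besides 'perl -V', a short script can show the search path and which copy of a module was actually loaded. This is a sketch only: loaded_from is a made-up helper, and the core module File::Spec stands in for a BioPerl module such as Bio::Tools::Run::RemoteBlast so that the example runs on a machine without BioPerl.

```perl
use strict;
use warnings;
use File::Spec;    # core module used here as a stand-in for a BioPerl module

# Directories perl searches, in order; PERL5LIB entries appear near the front.
print "Search path:\n";
print "  $_\n" for @INC;

# %INC maps each loaded module file to the path it was actually read from.
# Hypothetical helper: returns that path, or undef if the module is not loaded.
sub loaded_from {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return $INC{$file};
}

my $path = loaded_from('File::Spec');
print "File::Spec was loaded from: $path\n";
```

With two distributions installed, loading the BioPerl module of interest with 'use' and passing its name to the helper would show which directory the running copy came from.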
> > >>>>>>>> Regardless, if you follow the directions for installing bioperl
> > >>>>>>>> for your system ('perl Makefile.PL', 'make', 'make test', 'make
> > >>>>>>>> install', unless you explicitly change the installation
> > >>>>>>>> directory when using 'perl Makefile.PL'), then 'uninstalling'
> > >>>>>>>> Bioperl shouldn't be a problem as it will install the Bioperl
> > >>>>>>>> distribution you downloaded over the old version in @INC.  See
> > >>>>>>>> this page:
> > >>>>>>>>
> > >>>>>>>> http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > >>>>>>>>
> > >>>>>>>> for more details.
> > >>>>>>>>
> > >>>>>>>> Christopher Fields
> > >>>>>>>>>
> > >>>>>>>> Postdoctoral Researcher - Switzer Lab
> > >>>>>>>> Dept. of Biochemistry
> > >>>>>>>> University of Illinois Urbana-Champaign
> > >>>>>>>>
> > >>>>>>>>>>> -----Original Message-----
> > >>>>>>>>>>>
> > >>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>>>> Sent: Monday, February 13, 2006 12:32 PM
> > >>>>>>>>> To: bioperl-l at lists.open-bio.org
> > >>>>>>>>> Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>>>>
> > >>>>>>>>> Hi, Chris,
> > >>>>>>>>>
> > >>>>>>>>> I do have different versions of bioperl on my Linux machine
> > >>>>>>>>> (1.4. and 1.5.0); this may be the problem.  Should I just
> > >>>>>>>>> install bioperl-1.5.1, or do I need to uninstall and remove the
> > >>>>>>>>> previous versions?  I could not find any hint on uninstalling
> > >>>>>>>>> bioperl on Linux.  Could you please give me some suggestions?
> > >>>>>>>>> Thanks,
> > >>>>>>>>> Guojun
> > >>>>>>>>>
> > >>>>>>>>>>> Department of Plant Biology
> > >>>>>>>>>>>
> > >>>>>>>>> University of Georgia
> > >>>>>>>>>      _____
> > >>>>>>>>>
> > >>>>>>>>>>>  From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>>>
> > >>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >>>>>>>>> Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding
> > >>>>>>>>> RemoteBlast.pm
> > >>>>>>>>>
> > >>>>>>> version
> > >>>>>>>
> > >>>>>>>>> 1.28
> > >>>>>>>>>
> > >>>>>>>>> If you're using RemoteBlast 1.28, then you've likely updated
> > >>>>>>>>> from CVS, which isn't the latest fix.
> > >>>>>>>>>
> > >>>>>>>>> Make sure that you check the following:
> > >>>>>>>>>
> > >>>>>>>>> 1) Always post to the mailing list:
> > >>>>>>>>> http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > >>>>>>>>>
> > >>>>>>>>> 2) You must have the complete bioperl-1.5.1 or bioperl-live
> > >>>>>>>>> (CVS) installed first.  Perform a clean installation; do not
> > >>>>>>>>> upgrade only Bio::SearchIO::blast and
> > >>>>>>>>> Bio::Tools::Run::RemoteBlast, as we can't guarantee that mixing
> > >>>>>>>>> modules from old and new distributions (1.4 and 1.5.1, for
> > >>>>>>>>> instance) will work.  A bioperl-1.5.1 or bioperl-live
> > >>>>>>>>> installation will allow text output from BLAST v.2.2.12 to be
> > >>>>>>>>> saved and parsed; it will not parse the newest BLAST text output
> > >>>>>>>>> from NCBI (v2.2.13) but it should still save it.  I believe as
> > >>>>>>>>> long as next_result() isn't called, it will work.
> > >>>>>>>>>
> > >>>>>>>>> 3) The bug fixes for the above issue with parsing BLAST 2.2.13
> > >>>>>>>>> text output are NOT in CVS; they haven't been cleared and
> > >>>>>>>>> checked in by Roger Hall (who's now taking care of RemoteBlast)
> > >>>>>>>>> and the powers that be (Jason or whoever is in charge of
> > >>>>>>>>> Bio::SearchIO).  They can be found in Bugzilla:
> > >>>>>>>>>
> > >>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > >>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > >>>>>>>>>
> > >>>>>>>>> The fix in RemoteBlast in Bugzilla (#1935) is to allow the
> > >>>>>>>>> option of saving XML output, so isn't necessary if you don't
> > >>>>>>>>> plan on using this option.  And, remember, they haven't been
> > >>>>>>>>> committed yet to CVS, which means that the final version will
> > >>>>>>>>> change to reflect the new version.
> > >>>>>>>>>
> > >>>>>>>>> Christopher Fields
> > >>>>>>>>>>>>>
> > >>>>>>>>> Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>> Dept. of Biochemistry
> > >>>>>>>>> University of Illinois Urbana-Champaign
> > >>>>>>>>>
> > >>>>>>>>>>>>>    _____
> > >>>>>>>>>>>>> From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > >>>>>>>>>>>>>
> > >>>>>>>>> Sent: Monday, February 13, 2006 9:26 AM
> > >>>>>>>>> To: Chris Fields
> > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding
> > >>>>>>>>> RemoteBlast.pm
> > >>>>>>>>>
> > >>>>>>> version
> > >>>>>>>
> > >>>>>>>>> 1.28
> > >>>>>>>>>
> > >>>>>>>>> Hi, Chris
> > >>>>>>>>>
> > >>>>>>>>> Thanks for your suggestion; however, it doesn't seem to work for
> > >>>>>>>>> my cgi even after I replace both blast.pm and RemoteBlast.pm.  I
> > >>>>>>>>> didn't even get any RID.  Is there any suggestion?
> > >>>>>>>>>
> > >>>>>>>>> Guojun
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>> Guojun Yang
> > >>>>>>>>>>>>>
> > >>>>>>>>> Department of Plant Biology
> > >>>>>>>>> University of Georgia
> > >>>>>>>>> Tel: 706-542-1857
> > >>>>>>>>> Fax: 706-542-1805
> > >>>>>>>>> http://www.arches.uga.edu/~guojun
> > >>>>>>>>>    _____
> > >>>>>>>>>
> > >>>>>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>>>>>
> > >>>>>>>>> To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > >>>>>>>>> Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > >>>>>>>>> Subject: RE: [Bioperl-l] more question regarding
> > >>>>>>>>> RemoteBlast.pm
> > >>>>>>>>>
> > >>>>>>> version
> > >>>>>>>
> > >>>>>>>>> 1.28
> > >>>>>>>>>
> > >>>>>>>>> I would say give the new code a try, but realize that it hasn't
> > >>>>>>>>> been checked in (like I said below).  I will try going over the
> > >>>>>>>>> modified Bio::SearchIO::blast again this weekend to see if
> > >>>>>>>>> there is anything I might have missed.  The changed order in
> > >>>>>>>>> the header of BLAST text output has me a bit worried that it
> > >>>>>>>>> might not catch everything, but it at least doesn't hang in the
> > >>>>>>>>> while() loop I described in the bug report below (bug #1934)
> > >>>>>>>>> and seems to process everything fine.
> > >>>>>>>>>
> > >>>>>>>>> If you want more stability in the code, you might consider
> > >>>>>>>>> changing over to XML output and parsing with
> > >>>>>>>>> Bio::SearchIO::blastxml.  There are some changes in
> > >>>>>>>>> Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving
> > >>>>>>>>> XML output, but I believe it parses everything regardless.  If
> > >>>>>>>>> you look back over the last month or so there has been a bit of
> > >>>>>>>>> discussion here about it.  Jason describes a bit on how to set
> > >>>>>>>>> up RemoteBlast for XML:
> > >>>>>>>>>
> > >>>>>>>>> http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> > >>>>>>>>>
> > >>>>>>>>> Christopher Fields
> > >>>>>>>>>>>
> > >>>>>>>>> Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>> Dept. of Biochemistry
> > >>>>>>>>> University of Illinois Urbana-Champaign
> > >>>>>>>>>
> > >>>>>>>>>>>> -----Original Message-----
> > >>>>>>>>>>>>
> > >>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
> > >>>>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>>>>> Sent: Friday, February 03, 2006 1:45 PM
> > >>>>>>>>>> To: bioperl-l at bioperl.org
> > >>>>>>>>>> Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> > >>>>>>>
> > >>>>>>>>>> Hi, Everybody,
> > >>>>>>>>>> I see this post and am wondering if this is the reason for the
> > >>>>>>>>>> malfunctioning of my webserver. We set up a webserver named MAK, for
> > >>>>>>>>>> MITE sequence analysis. It was working very well until around November
> > >>>>>>>>>> 2005, when it stopped returning any result (the site is fine and seems
> > >>>>>>>>>> to be doing something after submission). In the CGI script, I used
> > >>>>>>>>>> remoteblast (that work was done in 2003) to do searches. I currently do
> > >>>>>>>>>> not have access to the server because I moved. Several people have sent
> > >>>>>>>>>> emails to us about its malfunctioning. Is there any suggestion on fixing
> > >>>>>>>>>> the problem? Should I simply ask that remoteblast.pm be replaced with
> > >>>>>>>>>> the new version?
> > >>>>>>>>>> Thanks a lot,
> > >>>>>>>>>> Guojun
> > >>>>>>>>>>
> > >>>>>>>>>> Department of Plant Biology
> > >>>>>>>>>> University of Georgia
> > >>>>>>>>>> Tel: 706-542-1857
> > >>>>>>>>>> Fax: 706-542-1805
> > >>>>>>>>>> http://www.arches.uga.edu/~guojun
> > >>>>>>>>>> _____
> > >>>>>>>>>>
> > >>>>>>>>>> From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>> To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> > >>>>>>>>>> [mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > >>>>>>>>>> [mailto:bioperl-l at bioperl.org]
> > >>>>>>>>>> Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > >>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >>>>>>>>>>
> > >>>>>>>>>> Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> > >>>>>>>>>> will work for saving text output. However, it will not parse anything
> > >>>>>>>>>> using next_result (it will likely hang) and will not save XML format. See
> > >>>>>>>>>> these bugs:
> > >>>>>>>>>>
> > >>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > >>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > >>>>>>>>>>
> > >>>>>>>>>> for explanations and possible fixes (changes to RemoteBlast and
> > >>>>>>>>>> Bio::SearchIO::blast). Note that these haven't been checked in yet so
> > >>>>>>>>>> are still not included in bioperl-live; they may be further modified
> > >>>>>>>>>> before committing to CVS. If you're not worried about XML, you could just
> > >>>>>>>>>> try the first fix, which is a change to SearchIO::blast.
> > >>>>>>>>>>
> > >>>>>>>>>> Nagesh, I remember you posting to the list a month ago using a script
> > >>>>>>>>>> which had problems; the script you used saves the output but doesn't
> > >>>>>>>>>> actually parse it (i.e. you don't use next_result() to go through the
> > >>>>>>>>>> data). Is the version of BLAST in your text output 2.2.12 or 2.2.13? Have
> > >>>>>>>>>> you tried parsing the output using "-readmethod => SearchIO" or
> > >>>>>>>>>> "-readmethod => blast" using your version of RemoteBlast and method
> > >>>>>>>>>> next_result()? Like below (from perldoc):
> > >>>>>>>>>>
> > >>>>>>>>>> while ( my @rids = $factory->each_rid ) {
> > >>>>>>>>>>     foreach my $rid ( @rids ) {
> > >>>>>>>>>>         my $rc = $factory->retrieve_blast($rid);
> > >>>>>>>>>>         if( !ref($rc) ) {
> > >>>>>>>>>>             if( $rc < 0 ) {
> > >>>>>>>>>>                 $factory->remove_rid($rid);
> > >>>>>>>>>>             }
> > >>>>>>>>>>             print STDERR "." if ( $v > 0 );
> > >>>>>>>>>>             sleep 5;
> > >>>>>>>>>>         } else { # parsing starts here
> > >>>>>>>>>>             my $result = $rc->next_result(); # it should hang here
> > >>>>>>>>>>             #save the output
> > >>>>>>>>>>             my $filename = $result->query_name()."\.out";
> > >>>>>>>>>>             $factory->save_output($filename);
> > >>>>>>>>>>             $factory->remove_rid($rid);
> > >>>>>>>>>>             print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>>>>>>             while ( my $hit = $result->next_hit ) {
> > >>>>>>>>>>                 next unless ( $v > 0);
> > >>>>>>>>>>                 print "\thit name is ", $hit->name, "\n";
> > >>>>>>>>>>                 while( my $hsp = $hit->next_hsp ) {
> > >>>>>>>>>>                     print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>>>>>>                 }
> > >>>>>>>>>>             }
> > >>>>>>>>>>         }
> > >>>>>>>>>>     }
> > >>>>>>>>>> }
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> My script hung if I used next_result() in any way prior to the fixes. I
> > >>>>>>>>>> want to see how many others are having the same issues with parsing using
> > >>>>>>>>>> the CVS version of bioperl-live.
> > >>>>>>>>>>
> > >>>>>>>>>> Christopher Fields
> > >>>>>>>>>> Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>>> Dept. of Biochemistry
> > >>>>>>>>>> University of Illinois Urbana-Champaign
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>> -----Original Message-----
> > >>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org
> > >>>>>>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > >>>>>>>>>>> Sent: Thursday, February 02, 2006 7:24 PM
> > >>>>>>>>>>> To: Huang Jian; bioperl-l
> > >>>>>>>>>>> Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >>>>>>>>>>>
> > >>>>>>>>>>> Hi Huang,
> > >>>>>>>>>>> Thanks for the message. The older version of RemoteBlast.pm works on the
> > >>>>>>>>>>> logic of checking the temporary file size to determine whether the Blast
> > >>>>>>>>>>> results are ready. This condition is not getting satisfied, maybe due to
> > >>>>>>>>>>> some changes brought about by NCBI. I had this problem recently and
> > >>>>>>>>>>> figured out that the solution was to use the latest version, which has
> > >>>>>>>>>>> this problem fixed (it does not use the file size logic any more) but is
> > >>>>>>>>>>> not yet included in the BioPerl package.
> > >>>>>>>>>>> Cheers
> > >>>>>>>>>>> Nagesh
> > >>>>>>>>>>>
> > >>>>>>>>>>> Huang Jian wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Dear Nagesh,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> I have replaced my old RemoteBlast.pm (v 1.17) with the v 1.28 you sent
> > >>>>>>>>>>>> me. Now it works perfectly!!!
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thank you!!
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Huang
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> ----- Original Message ----- From: "Nagesh Chakka"
> > >>>>>>>>>>>> 
> > >>>>>>>>>>>> To: "Huang Jian" ; "bioperl-l"
> > >>>>>>>>>>>> 
> > >>>>>>>>>>>> Sent: Friday, February 03, 2006 7:48 AM
> > >>>>>>>>>>>> Subject: Re: [Bioperl-l] Sorry, failure in post on the net, so still
> > >>>>>>>>>>>> via email
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi Huang,
> > >>>>>>>>>>>>> I see that you are submitting a sequence for a remote blast search. Can
> > >>>>>>>>>>>>> you check if the RemoteBlast.pm being used is v 1.28 (2005/12/09)? If
> > >>>>>>>>>>>>> not, I have attached it to this email; try replacing the old one, which
> > >>>>>>>>>>>>> has a bug, with it.
> > >>>>>>>>>>>>> Let me know if it works.
> > >>>>>>>>>>>>> Nagesh
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>> _______________________________________________
> > >>>>>>>>>>> Bioperl-l mailing list
> > >>>>>>>>>>> Bioperl-l at lists.open-bio.org
> > >>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>>>>>>>>>>
> > >
> > > Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
> > >
> > > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> > > > > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >



From Marc.Logghe at DEVGEN.com  Thu Feb 16 15:47:13 2006
From: Marc.Logghe at DEVGEN.com (Marc Logghe)
Date: Thu, 16 Feb 2006 16:47:13 +0100
Subject: [Bioperl-l] Primer maps?
Message-ID: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com>

Hi Mike,
Another route you might take is mapping your primers into
Bio::SeqFeature::Generic objects and adding them to the seq object. Then
you dump the object into a rich sequence format like GenBank and pass
that to EMBOSS's showseq application.
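
A minimal sketch of that first route, for anyone who wants to try it. The
template is the 60-base stretch from Mike's example; the two primer hits
(names, coordinates, strands), the display id and the 'note' tag are made up
for illustration, and 'primer_bind' is simply the GenBank feature key that
seemed the closest fit:

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::SeqIO;

# Template: the 60 bases shown in the example map (hypothetical id).
my $seq = Bio::Seq->new(
    -display_id => 'demo_template',
    -alphabet   => 'dna',
    -seq        => 'CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT',
);

# Hypothetical primer hits, 1-based inclusive coordinates on the template.
my @hits = (
    { name => 'FWD-demo', start => 22, end => 39, strand =>  1 },
    { name => 'REV-demo', start => 43, end => 60, strand => -1 },
);

# One generic feature per primer, attached to the sequence object.
for my $hit (@hits) {
    $seq->add_SeqFeature(
        Bio::SeqFeature::Generic->new(
            -primary_tag => 'primer_bind',
            -start       => $hit->{start},
            -end         => $hit->{end},
            -strand      => $hit->{strand},
            -tag         => { note => $hit->{name} },
        )
    );
}

# Dump as GenBank; the result is what would be handed to showseq.
Bio::SeqIO->new( -format => 'genbank', -fh => \*STDOUT )->write_seq($seq);
```

Redirect the output to a file and give that file to showseq as its input
sequence.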
Or you might do it completely with showseq. Here the only thing you need
is an annotation file containing the positions of the primers, followed
by any text (e.g. primer name).
Then you do:
showseq   -translate - -format 4
-annotation 
Have a look at http://emboss.sourceforge.net/apps/showseq.html for more
options
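
For the showseq-only route you still need the primer positions. A rough
plain-Perl sketch of how they could be found by exact matching on both
strands follows. The primer names and sequences are invented for the example,
find_primer_sites() and revcomp() are hypothetical helpers, and the
"start end text" layout of the output is an assumption about what the
annotation file should look like, so check it against the showseq page above:

```perl
use strict;
use warnings;

# Reverse-complement a plain ACGT string.
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Return [start, end, name, strand] for every exact primer match on
# either strand; coordinates are 1-based and inclusive.
sub find_primer_sites {
    my ( $template, $primers ) = @_;
    my $target = uc $template;
    my @sites;
    for my $name ( sort keys %$primers ) {
        my $fwd   = uc $primers->{$name};
        my %probe = ( '+' => $fwd, '-' => revcomp($fwd) );
        for my $strand ( '+', '-' ) {
            my $pos = -1;
            while ( ( $pos = index( $target, $probe{$strand}, $pos + 1 ) ) >= 0 ) {
                push @sites, [ $pos + 1, $pos + length($fwd), $name, $strand ];
            }
        }
    }
    @sites = sort { $a->[0] <=> $b->[0] } @sites;
    return @sites;
}

# The 60 bases from the example map, plus two made-up primers.
my $template = 'CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT';
my %primers  = (
    'FWD-demo' => 'ATGGAAATAGTATCTACG',
    'REV-demo' => 'ATAATCTTGCAATTCATC',
);

# One annotation record per line: start, end, free text.
for my $site ( find_primer_sites( $template, \%primers ) ) {
    my ( $start, $end, $name, $strand ) = @$site;
    print "$start $end $name ($strand)\n";
}
```

This only finds perfect matches; primers with mismatches or degenerate bases
would need a real alignment step instead of index().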
 
HTH,
Marc
 

Marc Logghe, PhD
Expert Scientist Bioinformatics
deVGen NV
Technologiepark 30
B - 9052 Ghent-Zwijnaarde
Tel. +32 9 324 24 83
Fax. +32 9 324 24 25
Web: www.devgen.com

 --- Disclaimer start ---
This e-mail and any attachments thereto may contain information which is
confidential and/or which is proprietary to the sender. Accordingly,
this e-mail and any attachments thereto, as well as any and all
information contained therein, are intended for the sole use of the
recipient or recipients designated above. Any use of this e-mail, of any
attachments thereto, of any and all information contained therein,
and/or of any part(s) thereof (including, without limitation, total or
partial reproduction, communication and/or distribution in any form) by
persons other than the designated recipient(s) is prohibited. If you
have received this e-mail in error, please notify the sender either by
telephone or by e-mail and delete the material from any computer.
Thank you for your cooperation.
--- Disclaimer end ---
  

 


________________________________

	From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Michael Coyne
	Sent: Wednesday, February 15, 2006 10:20 PM
	To: bioperl-l at lists.open-bio.org
	Subject: [Bioperl-l] Primer maps?
	
	
	Hello all --
	
	I'm having a devil of a time figuring out how to make
restriction maps using BioPerl.  What I'm going for is output similar to
GCG's map program, but instead of using a set of defined restriction
enzymes, I'd like to use a set of primers, to create a primer map rather
than a restriction map.  I do not need a table of restriction enzymes
that cut or don't cut (or primers that match or don't match, in this
case), but an honest-to-goodness map, something like:
	
	                                       FKP-5->
	                                             |
	     CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT
	1921 ---------+---------+---------+---------+---------+---------+ 1980
	     GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA
	 
	a                         M  E  I  V  S  T  F  D  E  L  Q  D  Y  -
	
	I also need translations of orfs, but I can use GenBank files as
input to the program and thus the CDS translations are already there, so
I'm guessing that shouldn't be too hard....  How does one create such a
map using the BioPerl modules?
	
	There are intriguing indications out there that such a thing is
possible (e.g. the Bio::Map:: * and Bio::Restriction:: * modules), but I
can't find a single example of code that creates such a basic,
bread-and-butter thing as a restriction map with orf translations.  The
documentation to these modules is fairly useless to me, consisting
mostly of internal methods and function prototypes.  Perhaps my skills
as a Perl programmer are to blame, but a clear example of how a map like
this is constructed would be a big help.
	
	Right now, I'm generating primer maps with system calls to
EMBOSS's remap, pointing it at a file of primer sequences rather than a
file of restriction enzyme sequences, but the results are less than
desired.  I'm considering trying to adapt tacg 4.1.0 or sequence
extractor 1.1 web-based code to my needs, but this seems like a lot of
work for an operation I suspect is possible in BioPerl.
	
	Any help greatly appreciated...
	
	Mike
	

	
	---------------------------------------------------------------------
	 //=\   Michael J. Coyne                      phone: (617) 525-7820
	 \=//   Channing Laboratory                   FAX:   (617) 264-5193
	  //=\  EBRC, Room 617
	  \=//  221 Longwood Avenue       email:mcoyne at channing.harvard.edu
	   //=\ Boston, MA 02115                mjcoyne at comcast.net
	   \=// 
	---------------------------------------------------------------------
	




From sdavis2 at mail.nih.gov  Thu Feb 16 14:43:45 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Thu, 16 Feb 2006 09:43:45 -0500
Subject: [Bioperl-l] Primer maps?
In-Reply-To: <6.2.0.14.0.20060215155422.01d44a98@localhost>
Message-ID: 

Do you mean that you want to use Bio::Graphics to make a picture, or just
map your primers onto a sequence?

Sean



On 2/15/06 4:20 PM, "Michael Coyne"  wrote:

> Hello all --
> 
> I'm having a devil of a time figuring out how to make restriction maps using
> BioPerl.  What I'm going for is output similar to GCG's map program, but
> instead of using a set of defined restriction enzymes, I'd like to use a set
> of primers, to create a primer map rather than a restriction map.  I do not
> need a table of restriction enzymes that cut or don't cut (or primers that
> match or don't match, in this case), but an honest-to-goodness map, something
> like:
> 
>                                       FKP-5->
>                                             |
>     CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT
> 1921 ---------+---------+---------+---------+---------+---------+ 1980
>     GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA
>  
> a                        M  E  I  V  S  T  F  D  E L  Q  D  Y   -
> 
> I also need translations of orfs, but I can use GenBank files as input to the
> program and thus the CDS translations are already there, so I'm guessing that
> shouldn't be too hard....  How does one create such a map using the BioPerl
> modules?
> 
> There are intriguing indications out there that such a thing is possible (e.g.
> the Bio::Map:: * and Bio::Restriction:: * modules), but I can't find a single
> example of code that creates such a basic, bread-and-butter thing as a
> restriction map with orf translations.  The documentation to these modules is
> fairly useless to me, consisting mostly of internal methods and function
> prototypes.  Perhaps my skills as a Perl programmer are to blame, but a clear
> example of how a map like this is constructed would be a big help.
> 
> Right now, I'm generating primer maps with system calls to EMBOSS's remap,
> pointing it at a file of primer sequences rather than a file of restriction
> enzyme sequences, but the results are less than desired.  I'm considering
> trying to adapt tacg 4.1.0 or sequence extractor 1.1 web-based code to my
> needs, but this seems like a lot of work for an operation I suspect is
> possible in BioPerl.
> 
> Any help greatly appreciated...
> 
> Mike
> 
> ---------------------------------------------------------------------
>  //=\   Michael J. Coyne                      phone: (617) 525-7820
>  \=//   Channing Laboratory                   FAX:   (617) 264-5193
>   //=\  EBRC, Room 617
>   \=//  221 Longwood Avenue       email:mcoyne at channing.harvard.edu
>    //=\ Boston, MA 02115                mjcoyne at comcast.net
>    \=// 
> ---------------------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From osborne1 at optonline.net  Thu Feb 16 16:27:13 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 11:27:13 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602140915.11604.hjm@tacgi.com>
Message-ID: 

Harry,

I've long suspected, but never demonstrated, that the easiest way to do
something like this is through ENSEMBL, and Jason hinted at this as well. In
fact your question is something of a FAQ, and my previous responses always
included a plea to some anonymous ENSEMBL API expert, always unheeded. At
any rate, here is an example script I made:

#!/usr/bin/perl

use strict;
use lib "/Users/bosborne/ensembl/modules";
use DBI;
use Getopt::Long;
use Bio::EnsEMBL::DBSQL::DBAdaptor;

my $name;

GetOptions( "n=s" => \$name );

my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
    -user   => "anonymous",
    -dbname => "homo_sapiens_core_37_35j",
    -host   => "ensembldb.ensembl.org",
    -pass   => "",
    -driver => 'mysql'
);

my $gene_adaptor  = $db->get_GeneAdaptor;
my $slice_adaptor = $db->get_SliceAdaptor;

my @genes = @{$gene_adaptor->fetch_all_by_external_name($name)};

for my $gene (@genes) {
  for my $trans (@{$gene->get_all_Transcripts}) {
      my $seq = $slice_adaptor->fetch_by_region("chromosome",
             $trans->seq_region_name,
             $trans->start,
             $trans->end);

      print "\n",$seq->seq,"\n";
  }
}

There are some issues, the largest of which is that although this script
prints out big sequences, it's completely untested! Another is that it makes
assumptions about transcripts; you should verify for yourself that ENSEMBL's
definition of a transcript fits yours. Finally, the
fetch_all_by_external_name() method does not seem to accept a second
argument, i.e. a namespace, which I found surprising. So if more than one
gene is retrieved for some name or id, you're in a quandary.
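
Harry's original question also asked for sequence plus and minus some offset
around the locus. One way to handle that, sketched here under the assumption
that you know the length of the region: widen the interval first, then pass
the padded coordinates to fetch_by_region() in place of $trans->start and
$trans->end. pad_interval() is a hypothetical helper, not part of the ENSEMBL
API, and the numbers are invented:

```perl
use strict;
use warnings;

# Widen a 1-based inclusive interval by $flank bases on each side,
# clamping to the start of the region and, if given, to its length.
sub pad_interval {
    my ( $start, $end, $flank, $region_length ) = @_;
    ( $start, $end ) = ( $end, $start ) if $start > $end;

    my $padded_start = $start - $flank;
    $padded_start = 1 if $padded_start < 1;

    my $padded_end = $end + $flank;
    $padded_end = $region_length
        if defined $region_length and $padded_end > $region_length;

    return ( $padded_start, $padded_end );
}

# A locus at 1500..4200 on a 5000-base region, padded by 2000 bases.
my ( $from, $to ) = pad_interval( 1_500, 4_200, 2_000, 5_000 );
print "fetch $from..$to\n";    # prints "fetch 1..5000"
```

Leaving out the fourth argument skips the upper clamp, which is only safe if
the fetch call itself tolerates coordinates past the end of the region.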

For more on this API see:

http://www.ensembl.org/info/software/core/core_tutorial.html

There are tons of modules and methods in this API, I've barely scratched the
surface here.


Brian O.




On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:

> Hi Brian,
> 
> Thanks very much for the pointers and the speed of your reply and apologies
> for the speed of mine.
> 
> This looks good, but what I was looking for was a bioP approach for hooking to
> an API at NCBI or EBI so I could get this info and seqs from them.  In this
> case, speed of retrieval is not critical and I'd rather not download the
> entirety of the sequences to a local disk to hack at them.
> 
> I've determined a screen-scraping approach to get them and could script that,
> but I thought that bioP had a method for using NCBI's external API's, tho it
> may be that my memory is faulty or the approach is no longer supported due to
> overload.  
> 
> Does NCBI make such APIs available anymore?  I searched a bit for docs on them
> but couldn't find anything (unless it's buried in the NCBI toolkit, which I
> haven't started to excavate).
> 
> Failing that, would SEALS provide such a service? Any PerlPinipeds listening?
> 
> Harry
> 
> 
> 
> 
> 
> 
> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>> Harry,
>> 
>> Hope you're doing well. The approach could be based on Bio::DB::Fasta. So,
>> from its documentation:
>> 
>>   use Bio::DB::Fasta;
>> 
>>   # create database from directory of fasta files
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   # simple access (for those without Bioperl)
>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>   my @ids     = $db->ids;
>>   my $length   = $db->length('CHROMOSOME_I');
>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>   my $header   = $db->header('CHROMOSOME_I');
>> 
>>   # Bioperl-style access
>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>> 
>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>   my $seq     = $obj->seq;
>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>> 
>> Do you already have the offsets?
>> 
>> Brian O.
>> 
>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>> Hi All,
>>> 
>>> After perusing the tutorial and other docs for an evening, I still
>>> can't find the answer to this.  Forgive me if I've missed something
>>> obvious.
>>> 
>>> This should not be a novel request, but I've not found it answered.  If
>>> bioperl isn't the best way to do this, I'd be grateful for a pointer to a
>>> better way, especially if it includes an illuminating bit of code.
>>> 
>>> The problem is to retrieve genomic sequences plus & minus some offset
>>> from a locus determined by HUGO keyword or GeneID.  This would be a
>>> common followup chore for some extra analysis from a gene expression
>>> expt.  Or maybe this is in the DBFetch routines, but I've missed the
>>> sequence type to specify...?
>>> 
>>> 
>>> TIA!




From heikki at sanbi.ac.za  Thu Feb 16 17:32:51 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 16 Feb 2006 19:32:51 +0200
Subject: [Bioperl-l] Primer maps?
In-Reply-To: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com>
References: <0C528E3670D8CE4B8E013F6749231AA6746B35@ANTARESIA.be.devgen.com>
Message-ID: <200602161932.51552.heikki@sanbi.ac.za>

Mike,

Marc's suggestion is the best I've heard.

We really do not have any kind of pretty print functionality within BioPerl.
I guess there has not been a pressing need.  Bio::Graphics has filled in the 
need for sequence display.

I think Bio::Seq::PrettyPrint could be a great way to design pretty printing
in a very modular way, so that it can print out anything mapped to a sequence
location. The EMBOSS showseq would be a great help there. A student
project?
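
To make the idea concrete, here is a rough plain-Perl sketch of what such a
printer might do for one block of sequence. Nothing like this exists in
BioPerl as far as this thread goes; render_block() and complement() are
hypothetical names, and the FKP-5 position is read off Mike's example map:

```perl
use strict;
use warnings;

# Complement without reversing, so the bottom strand lines up base for
# base under the top strand.
sub complement {
    my ($dna) = @_;
    ( my $c = $dna ) =~ tr/ACGTacgt/TGCAtgca/;
    return $c;
}

# Render one block of a double-stranded map as text. $first is the
# 1-based coordinate of the block's first base; @$marks holds
# [position, label] pairs, each drawn as a label ending above a '|'.
sub render_block {
    my ( $dna, $first, $marks ) = @_;
    my $len    = length $dna;
    my $indent = ' ' x ( length($first) + 1 );
    my @lines;

    for my $mark (@$marks) {
        my ( $pos, $label ) = @$mark;
        my $col = $pos - $first;               # 0-based column in the block
        next if $col < 0 or $col >= $len;      # mark lies outside this block
        my $pad = $col + 1 - length $label;    # right-align label on the mark
        $pad = 0 if $pad < 0;
        push @lines, $indent . ( ' ' x $pad ) . $label;
        push @lines, $indent . ( ' ' x $col ) . '|';
    }

    # Ruler with a '+' at every tenth base.
    my $ruler = join '', map { $_ % 10 == 0 ? '+' : '-' } 1 .. $len;

    push @lines, $indent . $dna;
    push @lines, $first . ' ' . $ruler . ' ' . ( $first + $len - 1 );
    push @lines, $indent . complement($dna);
    return join( "\n", @lines ) . "\n";
}

print render_block(
    'CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT',
    1921,
    [ [ 1961, 'FKP-5->' ] ],
);
```

A real module would take features from the sequence object rather than a
list of pairs, and would add translation tracks and line wrapping; this only
shows that the layout itself is a small amount of code.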

Would anyone be interested? 

   -Heikki




On Thursday 16 February 2006 17:47, Marc Logghe wrote:
> Hi Mike,
> Another route you might take is mapping your primers into
> Bio::SeqFeature::Generic objects and add them to the seq object. Then
> you dump the object into a rich sequence format like genbank and pass
> that to EMBOSS's showseq application
> Or you might do it completely with showseq. Here the only thing you need
> is an annotation file containing the positions of the primers, followed
> by any text (e.g. primer name).
> Then you do:
> showseq   -translate - -format 4
> -annotation 
> Have a look at http://emboss.sourceforge.net/apps/showseq.html for more
> options
>
> HTH,
> Marc
>
>
> Marc Logghe, PhD
> Expert Scientist Bioinformatics
> deVGen NV
> Technologiepark 30
> B - 9052 Ghent-Zwijnaarde
> Tel. +32 9 324 24 83
> Fax. +32 9 324 24 25
> Web: www.devgen.com
>
>
>
>
>
>
> ________________________________
>
> 	From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Michael Coyne
> 	Sent: Wednesday, February 15, 2006 10:20 PM
> 	To: bioperl-l at lists.open-bio.org
> 	Subject: [Bioperl-l] Primer maps?
>
>
> 	Hello all --
>
> 	I'm having a devil of a time figuring out how to make
> restriction maps using BioPerl.  What I'm going for is output similar to
> GCG's map program, but instead of using a set of defined restriction
> enzymes, I'd like to use a set of primers, to create a primer map rather
> than a restriction map.  I do not need a table of restriction enzymes
> that cut or don't cut (or primers that match or don't match, in this
> case), but an honest-to-goodness map, something like:
>
> 	                                       FKP-5->
> 	                                             |
> 	     CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT
> 	1921 ---------+---------+---------+---------+---------+---------+ 1980
> 	     GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA
>
> 	a                         M  E  I  V  S  T  F  D  E  L  Q  D  Y  -
>
> 	I also need translations of orfs, but I can use GenBank files as
> input to the program and thus the CDS translations are already there, so
> I'm guessing that shouldn't be too hard....  How does one create such a
> map using the BioPerl modules?
>
> 	There are intriguing indications out there that such a thing is
> possible (e.g. the Bio::Map:: * and Bio::Restriction:: * modules), but I
> can't find a single example of code that creates such a basic,
> bread-and-butter thing as a restriction map with orf translations.  The
> documentation to these modules is fairly useless to me, consisting
> mostly of internal methods and function prototypes.  Perhaps my skills
> as a Perl programmer are to blame, but a clear example of how a map like
> this is constructed would be a big help.
>
> 	Right now, I'm generating primer maps with system calls to
> EMBOSS's remap, pointing it at a file of primer sequences rather than a
> file of restriction enzyme sequences, but the results are less than
> desired.  I'm considering trying to adapt tacg 4.1.0 or sequence
> extractor 1.1 web-based code to my needs, but this seems like a lot of
> work for an operation I suspect is possible in BioPerl.
>
> 	Any help greatly appreciated...
>
> 	Mike
>
>
>
> 	---------------------------------------------------------------------
> 	 //=\   Michael J. Coyne                      phone: (617) 525-7820
> 	 \=//   Channing Laboratory                   FAX:   (617) 264-5193
> 	  //=\  EBRC, Room 617
> 	  \=//  221 Longwood Avenue       email:mcoyne at channing.harvard.edu
> 	   //=\ Boston, MA 02115                mjcoyne at comcast.net
> 	   \=//
> 	---------------------------------------------------------------------
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From osborne1 at optonline.net  Thu Feb 16 17:59:37 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 12:59:37 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: <200602160823.03534.hjm@tacgi.com>
Message-ID: 

Chris and Harry,

I'm writing a Wiki page on this; it's linked to the FAQ, as the Wiki is
complaining that the FAQ is getting too big. I'll fill in the ENSEMBL API
and Bio::DB::Fasta approaches. If you would comment on the BioPerl/eutils
approach at some point, that would be superb:

http://bioperl.open-bio.org/wiki/Getting_Genomic_Sequences

Brian O.


On 2/16/06 11:23 AM, "Harry Mangalam"  wrote:

> Yes, I'm going to  try this 1st.  Also the pointer to the NCBI eutils page was
> helpful.  They describe the same thing and I think that API will give me what
> I need.  I'll post back to report.
> 
> Sorry for the delay in answering - this is a side project and as such is going
> slow.
> 
> Many thanks to you guys, especially Brian for the example code - much more
> than I had a right to expect.  Virtual Beers all round and real ones should
> we ever meet up.
> 
> Harry
> 
> 
> On Thursday 16 February 2006 04:52, Chris Fields wrote:
>> I think a method was recently implemented in Bio::DB::GenBank to
>> retrieve a segment of DNA given start and end coordinates in GenBank
>> format; that should contain the features you need.  I requested it
>> ~Nov-Dec in the mailing list but didn't get a chance to test it.
>> Would that help?
>> 
>> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
>>> Harry,
>>> 
>>> It's not clear to me that NCBI's eutils offers this capability
>>> directly. You can probably download Entrez Gene entries and parse
>>> them for coordinates but I know of no way to remotely retrieve
>>> genomic sequences like this from NCBI (ENSEMBL API perhaps?). What I
>>> had in mind uses the local approach that some of us favor and to
>>> prove to myself that this is simple to do I wrote a script that I
>>> just added to examples/tools, it's called extract_genes.pl and it's
>>> based on Bio::DB::Fasta. Download the sequence files for a given
>>> species to some dir, download Entrez Gene's gene2accession file, and
>>> run. It creates and stores a hash for lookups, it won't read
>>> gene2accession each time it runs.
>>> 
>>> Brian O.
>>> 
>>> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
>>>> Hi Brian,
>>>> 
>>>> Thanks very much for the pointers and the speed of your reply and
>>>> apologies for the speed of mine.
>>>> 
>>>> This looks good, but what I was looking for was a bioP approach for
>>>> hooking to an API at NCBI or EBI so I could get this info and seqs
>>>> from them.  In this case, speed of retrieval is not critical and I'd
>>>> rather not download the entirety of the sequences to a local disk to
>>>> hack at them.
>>>> 
>>>> I've determined a screen-scraping approach to get them and could
>>>> script that, but I thought that bioP had a method for using NCBI's
>>>> external APIs, tho it may be that my memory is faulty or the approach
>>>> is no longer supported due to overload.
>>>> 
>>>> Does NCBI make such APIs available anymore?  I searched a bit for
>>>> docs on them but couldn't find anything (unless it's buried in the
>>>> NCBI toolkit, which I haven't started to excavate).
>>>> 
>>>> Failing that, would SEALS provide such a service? Any PerlPinipeds
>>>> listening?
>>>> 
>>>> Harry
>>>> 
>>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>>>> Harry,
>>>>> 
>>>>> Hope you're doing well. The approach could be based on
>>>>> Bio::DB::Fasta. So,
>>>>> from its documentation:
>>>>> 
>>>>>   use Bio::DB::Fasta;
>>>>> 
>>>>>   # create database from directory of fasta files
>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>> 
>>>>>   # simple access (for those without Bioperl)
>>>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>>>   my @ids     = $db->ids;
>>>>>   my $length   = $db->length('CHROMOSOME_I');
>>>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>>>   my $header   = $db->header('CHROMOSOME_I');
>>>>> 
>>>>>   # Bioperl-style access
>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>> 
>>>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>>>>   my $seq     = $obj->seq;
>>>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>>>> 
>>>>> Do you already have the offsets?
>>>>> 
>>>>> Brian O.
>>>>> 
>>>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>>>>> Hi All,
>>>>>> 
>>>>>> After perusing the tutorial and other docs for an evening, I still
>>>>>> can't find the answer to this.  Forgive me if I've missed something
>>>>>> obvious.
>>>>>> 
>>>>>> This should not be a novel request, but I've not found it answered.
>>>>>> If bioperl isn't the best way to do this, I'd be grateful for a
>>>>>> pointer to a better way, especially if it includes an illuminating
>>>>>> bit of code.
>>>>>> 
>>>>>> The problem is to retrieve genomic sequences plus & minus some
>>>>>> offset from a locus determined by HUGO keyword or GeneID.  This
>>>>>> would be a common followup chore for some extra analysis from a
>>>>>> gene expression expt.  Or maybe this is in the DBFetch routines, but
>>>>>> I've missed the sequence type to specify...?
>>>>>> 
>>>>>> 
>>>>>> TIA!
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign




From hjm at tacgi.com  Thu Feb 16 17:02:07 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Thu, 16 Feb 2006 09:02:07 -0800
Subject: [Bioperl-l] Primer maps?
In-Reply-To: <6.2.0.14.0.20060215155422.01d44a98@localhost>
References: <6.2.0.14.0.20060215155422.01d44a98@localhost>
Message-ID: <200602160902.07383.hjm@tacgi.com>

A bit off the bioperl topic - if you must have bioperl, ignore this (or just
system() wrap the command) - but you can do exactly this mapping and in-line
translation with a thing I wrote called tacg.  You make a GCG-formatted file
of primers, i.e. for each pattern you need a line like:

   
;         Top                         Bottom
;Name    Offset Recognition Pattern   Offset    ! comments
primer1    0   tcgggywmkkgg               0    ! ...
primer2    0   gcttggctgaggag             0    !
 .
 .
 .
Obviously the offsets can be set to 0 for patterns that aren't REs.
There's no limit to the number of primer patterns (tho I think there's a
compiled-in limit of 30 chars per pattern - easily changed in the header) and
no limit to the amount of seq searched; it handles degeneracies and searches
at ~4 Mbases/s on a 2G opteron (120 patterns).
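
For anyone who wants to stay in plain Perl, here is a minimal sketch of what
degenerate matching amounts to.  It is not how tacg works internally; it
simply expands IUPAC ambiguity codes into a regex and reports 1-based start
positions on the top strand only.  The names primer_to_regex and find_sites
are made up for illustration, and the test sequence is invented.

```perl
use strict;
use warnings;

# IUPAC nucleotide ambiguity codes mapped to regex character classes.
my %IUPAC = (
    a => 'a',     c => 'c',     g => 'g',     t => 't',
    r => '[ag]',  y => '[ct]',  s => '[cg]',  w => '[at]',
    k => '[gt]',  m => '[ac]',
    b => '[cgt]', d => '[agt]', h => '[act]', v => '[acg]',
    n => '[acgt]',
);

# Hypothetical helper: turn a degenerate primer into a compiled regex.
sub primer_to_regex {
    my ($primer) = @_;
    my $pattern = '';
    for my $base (split //, lc $primer) {
        die "Unknown base '$base' in primer\n" unless exists $IUPAC{$base};
        $pattern .= $IUPAC{$base};
    }
    return qr/$pattern/i;
}

# Hypothetical helper: 1-based start positions of every match, including
# overlapping ones, on the top strand of $seq.
sub find_sites {
    my ($seq, $primer) = @_;
    my $re = primer_to_regex($primer);
    my @hits;
    # The zero-width lookahead lets the search resume one base further on.
    while ($seq =~ /(?=$re)/g) {
        push @hits, pos($seq) + 1;
    }
    return @hits;
}

# Invented sequence containing one instance of primer1 from the table above.
my @hits = find_sites('aatcgggctagtggcc', 'tcgggywmkkgg');
print "primer1 hits at: @hits\n";
```

Searching the bottom strand would need the reverse complement of each
primer as well, which this sketch leaves out.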
 
Also does searching with errors (slowly) and regex's (at pcre speeds), and 
matrices.  Other neat stuff, too.

The output is sort of as you describe - replace the RE names with your primer 
labels and you'll have it.

6-frame translation with 3-letter abbreviations:

                   BsrGI    BsrGI AflII                      DraI
                   \        \     \                          \
    121   gtgtgtatttgtacactttgtacacttaagacctacacatttcattgtgtttaaattatt    180
   3453   cacacataaacatgtgaaacatgtgaattctggatgtgtaaagtaacacaaatttaataa   3512
              ^    *    ^    *    ^    *    ^    *    ^    *    ^    *
1         ValCysIleCysThrLeuCysThrLeuLysThrTyrThrPheHisCysValTerIleIle
2          CysValPheValHisPheValHisLeuArgProThrHisPheIleValPheLysLeuLeu
3           ValTyrLeuTyrThrLeuTyrThrTerAspLeuHisIleSerLeuCysLeuAsnTyrTyr

4           HisIleGlnValSerGlnValSerLeuTerValValAsnTerGlnThrTerIleIleVal
5          ThrTyrLysTyrValLysTyrValTerArgSerCysMetGluAsnHisLysPheTerTer
6         HisThrAsnThrCysLysThrCysLysGlyLeuValCysLysMetThrAsnLeuAsnAsn

or 3 frames with 1-letter abbreviations:

                   BsrGI    BsrGI AflII                      DraI
                   \        \     \                          \
    121   gtgtgtatttgtacactttgtacacttaagacctacacatttcattgtgtttaaattatt    180
   3453   cacacataaacatgtgaaacatgtgaattctggatgtgtaaagtaacacaaatttaataa   3512
              ^    *    ^    *    ^    *    ^    *    ^    *    ^    *
1         V  C  I  C  T  L  C  T  L  K  T  Y  T  F  H  C  V  *  I  I
2          C  V  F  V  H  F  V  H  L  R  P  T  H  F  I  V  F  K  L  L
3           V  Y  L  Y  T  L  Y  T  *  D  L  H  I  S  L  C  L  N  Y  Y

Read more at tacg.sf.net, or reply to me for the latest docs and version - I
have to admit the sf site is a bit moldy.

hjm


On Wednesday 15 February 2006 13:20, Michael Coyne wrote:
>  Hello all --
>
>  I'm having a devil of a time figuring out how to make restriction maps
> using BioPerl.  What I'm going for is output similar to GCG's map program,
> but instead of using a set of defined restriction enzymes, I'd like to use
> a set of primers, to create a primer map rather than a restriction map.  I
> do not need a table of restriction enzymes that cut or don't cut (or
> primers that match or don't match, in this case), but an honest-to-goodness
> map, something like:
>
>                                          FKP-5->
>                                               |
>       CGTTCTATCGATATGGGTGCTATGGAAATAGTATCTACGTTTGATGAATTGCAAGATTAT
>  1921 ---------+---------+---------+---------+---------+---------+ 1980
>       GCAAGATAGCTATACCCACGATACCTTTATCATAGATGCAAACTACTTAACGTTCTAATA
>
>  a                         M  E  I  V  S  T  F  D  E  L  Q  D  Y   -
>
>  I also need translations of orfs, but I can use GenBank files as input to
> the program and thus the CDS translations are already there, so I'm
> guessing that shouldn't be too hard....  How does one create such a map
> using the BioPerl modules?
>
>  There are intriguing indications out there that such a thing is possible
> (e.g. the Bio::Map::* and Bio::Restriction::* modules), but I can't find
> a single example of code that creates such a basic, bread-and-butter thing
> as a restriction map with orf translations.  The documentation to these
> modules is fairly useless to me, consisting mostly of internal methods and
> function prototypes.  Perhaps my skills as a Perl programmer are to blame,
> but a clear example of how a map like this is constructed would be a big
> help.
>
>  Right now, I'm generating primer maps with system calls to EMBOSS's remap,
> pointing it at a file of primer sequences rather than a file of restriction
> enzyme sequences, but the results are less than desired.  I'm considering
> trying to adapt tacg 4.1.0 or sequence extractor 1.1 web-based code to my
> needs, but this seems like a lot of work for an operation I suspect is
> possible in BioPerl.
>
>  Any help greatly appreciated...
>
>  Mike
>
>  ---------------------------------------------------------------------
>   //=\   Michael J. Coyne                       phone: (617) 525-7820
>   \=//   Channing Laboratory                    FAX:   (617) 264-5193
>    //=\  EBRC, Room 617
>    \=//  221 Longwood Avenue        email:mcoyne at channing.harvard.edu
>     //=\ Boston, MA 02115                 mjcoyne at comcast.net
>     \=//
>  ---------------------------------------------------------------------

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 
            <>



From hjm at tacgi.com  Thu Feb 16 16:23:02 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Thu, 16 Feb 2006 08:23:02 -0800
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: 
References: 
	
Message-ID: <200602160823.03534.hjm@tacgi.com>

Yes, I'm going to  try this 1st.  Also the pointer to the NCBI eutils page was 
helpful.  They describe the same thing and I think that API will give me what 
I need.  I'll post back to report.  

Sorry for the delay in answering - this is a side project and as such is going 
slow.

Many thanks to you guys, especially Brian for the example code - much more 
than I had a right to expect.  Virtual Beers all round and real ones should 
we ever meet up.

Harry


On Thursday 16 February 2006 04:52, Chris Fields wrote:
> I think a method was recently implemented in Bio::DB::GenBank to
> retrieve a segment of DNA given start and end coordinates in GenBank
> format; that should contain the features you need.  I requested it
> ~Nov-Dec in the mailing list but didn't get a chance to test it.
> Would that help?

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 
            <>


From cjfields at uiuc.edu  Thu Feb 16 21:37:25 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 15:37:25 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
	version 1.28
In-Reply-To: <43F449E1.80605@esat.kuleuven.be>
Message-ID: <000301c63341$2e015d50$15327e82@pyrimidine>

As an update for those interested, I checked on this today, feeding SearchIO
the XML and text output from all of NCBI's BLAST flavors.  Basically, all XML
parses fine.  All text output except blastn and tblastx parses fine.  Those
two have extra lines starting with 'Features in this part of subject
sequence:'.  I'll be checking into SearchIO::blast but don't know when I can
get around to posting a fix.
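
Until the parser is fixed, one possible stopgap is to strip the extra blocks
before handing the text report to SearchIO.  The sketch below assumes each
block starts at a 'Features in/flanking this part ...' line and runs to the
next blank line; that layout is an assumption rather than something
confirmed here, and strip_feature_blocks is a hypothetical helper, not part
of BioPerl.  The report fragment is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every "Features ... this part ..." block,
# assumed to run from its header line through the next blank line.
sub strip_feature_blocks {
    my ($report) = @_;
    my @kept;
    my $skipping = 0;
    for my $line (split /\n/, $report, -1) {
        if ($line =~ /^\s*Features (?:in|flanking) this part/) {
            $skipping = 1;              # start of an extra block
            next;
        }
        if ($skipping) {
            $skipping = 0 if $line =~ /^\s*$/;   # blank line ends the block
            next;
        }
        push @kept, $line;
    }
    return join "\n", @kept;
}

# Made-up fragment, only to exercise the filter.
my $fragment = join "\n",
    ' Score = 46.1 bits (23)',
    ' Features in this part of subject sequence:',
    '   hypothetical protein',
    '',
    'Query: 1 acgtacgt 8',
    '';
print strip_feature_blocks($fragment);
```

The cleaned text could then be written to a temporary file and parsed as
usual; whether that is enough for real reports would need testing.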

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> Sent: Thursday, February 16, 2006 3:46 AM
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at lists.open-bio.org; Chris Fields
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
> version 1.28
> 
> Hi,
> 
> I have the same problem with the blast.pm file.
> The people at NCBI added some extra info to the BLAST output
> (see e.g. "Features flanking this part..." or "Features in this part
> ..."); example added.
> The blast.pm module starts looking for the HSP alignment information,
> but it dies when it hits this feature information.
> 
> Pieter
> 
> 
......







From osborne1 at optonline.net  Thu Feb 16 22:19:16 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 16 Feb 2006 17:19:16 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
 GeneIDs
In-Reply-To: 
Message-ID: 

Chris,

Yes. The question now is where to easily get the coordinates.

Brian O.


On 2/16/06 7:52 AM, "Chris Fields"  wrote:

> I think a method was recently implemented in Bio::DB::GenBank to
> retrieve a segment of DNA given start and end coordinates in GenBank
> format; that should contain the features you need.  I requested it
> ~Nov-Dec in the mailing list but didn't get a chance to test it.
> Would that help?




From cjfields at uiuc.edu  Thu Feb 16 22:29:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 16:29:15 -0600
Subject: [Bioperl-l] Minimal versions requirements/warnings for SearchIO
	text parsing?
Message-ID: <000001c63348$6b8136d0$15327e82@pyrimidine>

I'm floating this to see what people think...

I'm beginning to wonder, especially when I'm wading through the
regex/parsing nightmare in SearchIO::blast, if we should require a
minimal BLAST version number for parsing to work in SearchIO::blast.  I
could either add a '$self->throw("Requires BLAST v2.x.x")' or at least give a
warning if the BLAST version number is below the minimal version, so at least
people will know what the problem is (not us!).

The regexes are really piling up, and the latest changes in blastn and
tblastx will require adding a few more.  I also think that this would help
remind everybody running the latest Bioperl that there are also newer
versions of BLAST.  My current thought is to get it working for the latest
text output from NCBI, check it against the last version of BLAST (v.
2.2.12, which, luckily, blastcl3 generates), and not worry too much about
older ones.

Any thoughts on this?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From cjfields at uiuc.edu  Thu Feb 16 22:45:52 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 16 Feb 2006 16:45:52 -0600
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: 
Message-ID: <000101c6334a$bd80a900$15327e82@pyrimidine>

If I know the start, end, and strand info for a list of features (personal
preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
up), couldn't I try pulling out the surrounding region?  My thought is this,
though I haven't coded it yet:

1)  Draw up a list of SeqFeatures, with accession, start, stop coordinates
(array of hashes) based off what I get from RNAMotif objects.
2)  Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream
and downstream, one at a time, using get_Seq_by_id().  I could add a sleep
in there somewhere to not tick off the NCBI curators.
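
The coordinate arithmetic in step 2 can be sketched as below.  This only
computes the window to request (x bp either side, clamped to the sequence
ends); the remote fetch is left as a comment because the exact
Bio::DB::GenBank call is untested here.  flanking_window, the accessions
and the coordinates are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: widen a feature by $flank bp on each side and clamp
# the result to 1..$seq_length (1-based, inclusive coordinates).
sub flanking_window {
    my ($start, $end, $flank, $seq_length) = @_;
    ($start, $end) = ($end, $start) if $start > $end;   # normalise order
    my $from = $start - $flank;
    $from = 1 if $from < 1;
    my $to = $end + $flank;
    $to = $seq_length if defined $seq_length && $to > $seq_length;
    return ($from, $to);
}

# Made-up feature list (array of hashes), as in step 1.
my @features = (
    { acc => 'ACC_0001', start => 1500, end => 1560, strand =>  1 },
    { acc => 'ACC_0002', start =>  120, end =>  180, strand => -1 },
);

for my $feature (@features) {
    my ($from, $to) =
        flanking_window($feature->{start}, $feature->{end}, 500, 100_000);
    print "$feature->{acc}\t$from..$to\tstrand $feature->{strand}\n";
    # The remote fetch for $from..$to would go here, followed by a
    # sleep between requests to stay polite to the server.
}
```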

Reason I'm interested in this is b/c I want to know where the RNA motif is
in context to surrounding features. If it is very close to a coding region,
then the motif likely indicates translational regulation.  Further away may
indicate transcriptional termination or another mechanism.

The files returned should have the features included as long as they are in
the full length GenBank record.  I tried it out using the web form but not
through Bio::DB::GenBank yet.  If I can get it to work I'll add it to the
page.  

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Thursday, February 16, 2006 4:19 PM
> To: Chris Fields
> Cc: Harry Mangalam; bioperl-l
> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names or
> GeneIDs
> 
> Chris,
> 
> Yes. The question now is where to easily get the coordinates.
> 
> Brian O.
> 
> 
> On 2/16/06 7:52 AM, "Chris Fields"  wrote:
> 
> > I think a method was recently implemented in Bio::DB::GenBank to
> > retrieve a segment of DNA given start and end coordinates in GenBank
> > format; that should contain the features you need.  I requested it
> > ~Nov-Dec in the mailing list but didn't get a chance to test it.
> > Would that help?




From hjm at tacgi.com  Thu Feb 16 23:10:59 2006
From: hjm at tacgi.com (Harry Mangalam)
Date: Thu, 16 Feb 2006 15:10:59 -0800
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names or
	GeneIDs
In-Reply-To: <000101c6334a$bd80a900$15327e82@pyrimidine>
References: <000101c6334a$bd80a900$15327e82@pyrimidine>
Message-ID: <200602161510.59679.hjm@tacgi.com>

This is essentially what I want to do, and my [only in pseudocode] approach is
basically what you describe, except that currently I only have HUGO
descriptors, not GenBank UIDs.  If you know of an index that lists both, that
would be the entire shot.
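
If such an index turns up as a tab-delimited file, loading it into a hash
for lookups is straightforward.  A minimal sketch, assuming a hypothetical
four-column layout (symbol, accession, start, end); the real column order
of any NCBI or HUGO file would need checking, and the rows below are
invented.

```perl
use strict;
use warnings;

# Hypothetical loader: build a symbol => { acc, start, end } lookup from
# tab-delimited text.  Column order is an assumption; adjust as needed.
sub load_symbol_index {
    my ($fh) = @_;
    my %index;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ || $line =~ /^\s*$/;    # skip comments, blanks
        my ($symbol, $acc, $start, $end) = split /\t/, $line;
        next unless defined $end;                     # skip short rows
        $index{ uc $symbol } = { acc => $acc, start => $start, end => $end };
    }
    return \%index;
}

# Made-up rows standing in for a real mapping file.
my $table = "#symbol\taccession\tstart\tend\n"
          . "GENE_A\tACC_0001\t1500\t1560\n"
          . "GENE_B\tACC_0002\t120\t180\n";

open my $fh, '<', \$table or die "Cannot open in-memory table: $!";
my $index = load_symbol_index($fh);
close $fh;

my $hit = $index->{GENE_A};
print "GENE_A => $hit->{acc} $hit->{start}..$hit->{end}\n";
```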

I'm also interested in tracking transcriptional control elements and
cross-correlating them, which is why I wrote the 'rules' chunk of the recently
(self-)promoted tacg.

Best
Harry


On Thursday 16 February 2006 14:45, Chris Fields wrote:
> If I know the start, end, and strand info for a list of features (personal
> preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
> up), couldn't I try pulling out the surrounding region?  My thought is
> this, though I haven't coded it yet:
>
> 1)  Draw up a list of Seqfeatures, with accession, start, stop coordinates
> (array of hashes) based off what I get from RNAMotif objects.
> 2)  Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream
> and downstream, one at a time, using get_Seq_by_ID().  I could add a sleep
> in there somewhere to not tick off the NCBI curators.
>
> Reason I'm interested in this is b/c I want to know where the RNA motif is
> in context to surrounding features. If it is very close to a coding region,
> then the motif likely indicates translational regulation.  Further away may
> indicate transcriptional termination or another mechanism.
>
> The files returned should have the features included as long as they are in
> the full length GenBank record.  I tried it out using the web form but not
> through Bio::DB::GenBank yet.  If I can get it to work I'll add it to the
> page.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
> > -----Original Message-----
> > From: Brian Osborne [mailto:osborne1 at optonline.net]
> > Sent: Thursday, February 16, 2006 4:19 PM
> > To: Chris Fields
> > Cc: Harry Mangalam; bioperl-l
> > Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
> > or GeneIDs
> >
> > Chris,
> >
> > Yes. The question now is where to easily get the coordinates.
> >
> > Brian O.
> >
> > On 2/16/06 7:52 AM, "Chris Fields"  wrote:
> > > I think a method was recently implemented in Bio::DB::GenBank to
> > > retrieve a segment of DNA given start and end coordinates in GenBank
> > > format; that should contain the features you need.  I requested it
> > > ~Nov-Dec in the mailing list but didn't get a chance to test it.
> > > Would that help?
> > >
> > > On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
> > >> Harry,
> > >>
> > >> It's not clear to me that NCBI's eutils offers this capability
> > >> directly. You
> > >> can probably download Entrez Gene entries and parse them for
> > >> coordinates but
> > >> I know of no way to remotely retrieve genomic sequences like this
> > >> from NCBI
> > >> (ENSEMBL API perhaps?). What I had in mind uses the local approach
> > >> that some
> > >> of us favor and to prove to myself that this is simple to do I wrote a
> > >> script that I just added to examples/tools, it's called
> > >> extract_genes.pl and
> > >> it's based on Bio::DB::Fasta. Download the sequence files for a given
> > >> species to some dir, download Entrez Gene's gene2accession file,
> > >> and run. It
> > >> creates and stores a hash for lookups, it won't read gene2accession
> > >> each
> > >> time it runs.
> > >>
> > >> Brian O.
> > >>
> > >> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
> > >>> Hi Brian,
> > >>>
> > >>> Thanks very much for the pointers and the speed of your reply and
> > >>> apologies
> > >>> for the speed of mine.
> > >>>
> > >>> This looks good, but what I was looking for was a bioP approach
> > >>> for hooking to
> > >>> an API at NCBI or EBI so I could get this info and seqs from
> > >>> them.  In this
> > >>> case, speed of retrieval is not critical and I'd rather not
> > >>> download the
> > >>> entirety of the sequences to a local disk to hack at them.
> > >>>
> > >>> I've determined a screen-scraping approach to get them and could
> > >>> script that,
> > >>> but I thought that bioP had a method for using NCBI's external
> > >>> API's, tho it
> > >>> may be that my memory is faulty or the approach is no longer
> > >>> supported due to
> > >>> overload.
> > >>>
> > >>> Does NCBI make such APIs available anymore?  I searched a bit for
> > >>> docs on them
> > >>> but couldn't find anything (unless it's buried in the NCBI toolkit,
> > >>> which I
> > >>> haven't started to excavate).
> > >>>
> > >>> Failing that, would SEALS provide such a service? Any PerlPinipeds
> > >>> listening?
> > >>>
> > >>> Harry
> > >>>
> > >>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
> > >>>> Harry,
> > >>>>
> > >>>> Hope you're doing well. The approach could be based on
> > >>>> Bio::DB::Fasta. So,
> > >>>> from its documentation:
> > >>>>
> > >>>>   use Bio::DB::Fasta;
> > >>>>
> > >>>>   # create database from directory of fasta files
> > >>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
> > >>>>
> > >>>>   # simple access (for those without Bioperl)
> > >>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
> > >>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
> > >>>>   my @ids     = $db->ids;
> > >>>>   my $length   = $db->length('CHROMOSOME_I');
> > >>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
> > >>>>   my $header   = $db->header('CHROMOSOME_I');
> > >>>>
> > >>>>   # Bioperl-style access
> > >>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
> > >>>>
> > >>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
> > >>>>   my $seq     = $obj->seq;
> > >>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
> > >>>>
> > >>>> Do you already have the offsets?
> > >>>>
> > >>>> Brian O.
> > >>>>
> > >>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
> > >>>>> Hi All,
> > >>>>>
> > >>>>> After perusing the tutorial and other docs for an evening, I
> > >>>>> still
> > >>>>> can't find the answer to this.  Forgive me if I've missed something
> > >>>>> obvious.
> > >>>>>
> > >>>>> This should not be a novel request, but I've not found it
> > >>>>> answered.  If
> > >>>>> bioperl isn't the best way to do this, I'd be grateful to a
> > >>>>> pointer to a
> > >>>>> better way, especially if it includes an illuminating bit of code.
> > >>>>>
> > >>>>> The problem is to retrieve genomic sequences plus & minus some
> > >>>>> offset
> > >>>>> from a locus determined by HUGO keyword or GeneID.  This would be a
> > >>>>> common followup chore for some extra analysis from a gene
> > >>>>> expression
> > >>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed
> > >>>>> the
> > >>>>> sequence type to specify...?
> > >>>>>
> > >>>>>
> > >>>>> TIA!
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher
> > > Lab of Dr. Robert Switzer
> > > Dept of Biochemistry
> > > University of Illinois Urbana-Champaign

-- 
Cheers, Harry
Harry J Mangalam - 949 856 2847 (vox; email for fax) - hjm at tacgi.com 


From anst at kvl.dk  Fri Feb 17 09:18:18 2006
From: anst at kvl.dk (Anders Stegmann)
Date: Fri, 17 Feb 2006 10:18:18 +0100
Subject: [Bioperl-l] another searchIO bug? with blast report
In-Reply-To: <43F45FE60200009B00000ED6@gwia.kvl.dk>
References: <43F45FE60200009B00000ED6@gwia.kvl.dk>
Message-ID: <43F5A2EA0200009B00000F45@gwia.kvl.dk>



>>>Anders Stegmann  02/16/06 11:20 am >>>
Hi!

I am blasting a protein sequence (query) against an identical sequence with a
deletion of amino acid number 61 (subject).
Then I print out the residue and position of each mismatch.
The mismatch for the query sequence is G at position 61, which is correct.
The mismatch for the subject sequence is V at position 60, which is definitely
not correct!?

Is this a bug?
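
One way to cross-check the reported positions independently of SearchIO is to walk the two aligned strings of an HSP by hand. The sketch below is plain Perl with no BioPerl calls. The sub name, the toy alignment and the assumption that '-' is the gap character are all illustrative, not taken from testblast2.pl.

```perl
use strict;
use warnings;

# Walk two aligned strings column by column and report every column where
# the residues differ, with the position of each residue in its own
# (ungapped) sequence. A gap character does not advance that sequence's
# counter, so a gapped column is reported with an undefined position.
sub list_differences {
    my ($query_aln, $hit_aln, $query_start, $hit_start) = @_;
    die "aligned strings differ in length\n"
        unless length($query_aln) == length($hit_aln);

    my ($qpos, $hpos) = ($query_start - 1, $hit_start - 1);
    my @diffs;
    for my $col (0 .. length($query_aln) - 1) {
        my $q = substr($query_aln, $col, 1);
        my $h = substr($hit_aln,   $col, 1);
        $qpos++ if $q ne '-';
        $hpos++ if $h ne '-';
        next if $q eq $h;
        push @diffs, {
            query_residue => $q,
            query_pos     => ($q eq '-' ? undef : $qpos),
            hit_residue   => $h,
            hit_pos       => ($h eq '-' ? undef : $hpos),
        };
    }
    return @diffs;
}

# Print 'gap' in place of an undefined position.
sub show_pos { my ($pos) = @_; return defined $pos ? $pos : 'gap'; }

# Toy alignment: the hit lacks the query's 4th residue (G).
for my $diff (list_differences('MKVGLT', 'MKV-LT', 1, 1)) {
    printf "query %s at %s, hit %s at %s\n",
        $diff->{query_residue}, show_pos($diff->{query_pos}),
        $diff->{hit_residue},   show_pos($diff->{hit_pos});
}
```

In the toy case the residue just before the gap in the hit is V at position 3. That is the kind of value a position counter yields if it is read at a gap column without checking for the gap. Whether that is what happens here, in the script or in SearchIO, cannot be told from this message alone.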

testblast2.pl is the program to run

Q0045 is the query seq.

Q0045del61 is the subject seq (it has to be formatted: formatdb -i
Q0045del61 -p T -o F).

Regards Anders.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045
Type: application/octet-stream
Size: 873 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Q0045del61
Type: application/octet-stream
Size: 872 bytes
Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: testblast2.pl
Type: application/octet-stream
Size: 6109 bytes
Desc: not available
URL: 

From saldroubi at yahoo.com  Fri Feb 17 17:49:40 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Fri, 17 Feb 2006 09:49:40 -0800 (PST)
Subject: [Bioperl-l] Count or weight matrix in bioperl?
In-Reply-To: <43EAAEEF.3000304@infotech.monash.edu.au>
Message-ID: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com>


Torsten and all,
 
 I don't think this will work for me, because it only generates statistics for a single sequence.  What I need is a count matrix over a number of DNA sequences.  In other words, if I pass 3 sequences to this function, it should return the count of each nucleotide at each position.
 
 For example, if I pass an array of sequences, say ATC, CCC, TTT,
 then I should get back a matrix holding the count at positions 1, 2, 3 for each of A, C, T and G, like this:
 
 
               1    2    3
      A        1    0    0
      C        1    1    2
      T        1    2    1
      G        0    0    0
 
 Any idea whether this is already built somewhere in bioperl?
 
 Thank you.
 
 
Torsten Seemann wrote:
> > Say I have an array of nucleotide sequences of length N. I want to calculate the count matrix (weight matrix). That is, for each position 1..N, I want to know how many As, Cs, Ts and Gs there are. Is the code to do this already written in bioperl to build this matrix if I pass it those strings?
> > Please excuse my lack of knowledge as I am a newcomer to bioinformatics.
>
> Use the Bio::Tools::SeqStats module. The PDoc documentation even has an
> example similar to what you want to do:
>
> http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html
>
> --Torsten Seemann




Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From muratem at eng.uah.edu  Fri Feb 17 17:45:30 2006
From: muratem at eng.uah.edu (Mike Muratet)
Date: Fri, 17 Feb 2006 11:45:30 -0600 (CST)
Subject: [Bioperl-l] Minimal versions requirements/warnings for SearchIO
 text parsing?
In-Reply-To: <000001c63348$6b8136d0$15327e82@pyrimidine>
References: <000001c63348$6b8136d0$15327e82@pyrimidine>
Message-ID: 



On Thu, 16 Feb 2006, Chris Fields wrote:

> I'm floating this to see what people think...
>
> I'm beginning to wonder, especially when I'm wading through the
> regex/parsing nightmare in SearchIO::blast, if we should either require a
> minimal BLAST version number for parsing to work in SearchIO::blast.  I
> could add a '$self->throw("Requires BLAST v2.x.x")' or at least give a
> warning if the blast version number is below a minimal version, so at least
> people will know what the problem is (not us!).
>
> The regexes are really piling up, and the latest changes in blastn and
> tblastx will require adding a few more.  I also think that this would help
> remind everybody running the latest Bioperl that there are also newer
> versions of BLAST.  My current thought is to get it working for the latest
> text output from NCBI, check it against the last version of BLAST (v.
> 2.2.12, which, luckily, blastcl3 generates), and not worry too much about
> older ones.
>
> Any thoughts on this?
>

Chris

I could live with it. I think most of the world runs on NCBI or WUBLAST 
and it's easy to download/update either of those.
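
For what it's worth, the check itself could be small. Below is a rough sketch in plain Perl, not the SearchIO::blast code. It assumes the report's first line looks like "BLASTN 2.2.13 [Nov-27-2005]". The sub names and the sample header lines are made up for illustration.

```perl
use strict;
use warnings;

# Extract the dotted version from a report header line, or undef.
sub blast_version {
    my ($header_line) = @_;
    return $header_line =~ /^\s*T?BLAST[NPX]\s+(\d+(?:\.\d+)+)/i ? $1 : undef;
}

# Compare two dotted version strings numerically, field by field.
# Returns -1, 0 or 1, like <=>.
sub compare_versions {
    my ($left, $right) = @_;
    my @l = split /\./, $left;
    my @r = split /\./, $right;
    while (@l || @r) {
        my $l_field = @l ? shift @l : 0;
        my $r_field = @r ? shift @r : 0;
        my $cmp = $l_field <=> $r_field;
        return $cmp if $cmp;
    }
    return 0;
}

# Return a warning message, or undef when the version is acceptable.
sub check_blast_version {
    my ($header_line, $minimum) = @_;
    my $found = blast_version($header_line);
    return "could not find a BLAST version in the report header"
        unless defined $found;
    return "report is from BLAST $found; parsing is only tested from $minimum up"
        if compare_versions($found, $minimum) < 0;
    return;
}

for my $header ('BLASTN 2.2.13 [Nov-27-2005]', 'BLASTP 2.0.10 [Aug-26-1999]') {
    my $problem = check_blast_version($header, '2.2.12');
    print "$header: ", (defined $problem ? $problem : 'ok'), "\n";
}
```

The field-by-field numeric comparison matters because a plain string comparison would rank 2.2.9 above 2.2.12. Returning a message rather than dying leaves the choice between a warning and a throw to the caller.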

Thanks for the effort. I use SearchIO a lot.

Mike


> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From MEC at stowers-institute.org  Fri Feb 17 18:15:53 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 17 Feb 2006 12:15:53 -0600
Subject: [Bioperl-l] Count or weight matrix in bioperl?
Message-ID: 

http://forkhead.cgb.ki.se/TFBS/ provides the ability to generate a position
frequency matrix from a list of (presumably aligned) sequences, as follows:

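#!/usr/bin/env perl
use strict;
use warnings;
use TFBS::PatternGen::SimplePFM;

# Read one (presumably aligned) sequence per line from STDIN or file arguments.
my @sequences = <>;
chomp @sequences;

# Build a position frequency matrix and print its raw counts.
print TFBS::PatternGen::SimplePFM->new(-seq_list => \@sequences)->pattern->rawprint;
exit 0;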

The output when run on your example input shows that the order of the
nucleotides is not the one you expect (it is alphabetical: A, C, G, T):

1 0 0
1 1 2
0 0 0
1 2 1

Good luck,

TFBS installation requires significant dependencies, including bioperl
and PDL.
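
If those dependencies are too heavy for such a small job, the same counts can be produced in plain Perl. This is a minimal sketch that assumes the sequences are already aligned and of equal length. The sub name count_matrix is made up here and is not a BioPerl or TFBS function.

```perl
use strict;
use warnings;

# Count each nucleotide at each position of equal-length sequences.
# Returns a hash ref: base => array ref of counts, one per position.
sub count_matrix {
    my @sequences = @_;
    return {} unless @sequences;
    my $length = length $sequences[0];

    my %counts;
    $counts{$_} = [ (0) x $length ] for qw(A C G T);

    for my $seq (@sequences) {
        die "sequences must all be $length long: '$seq'\n"
            unless length($seq) == $length;
        my @bases = split //, uc $seq;
        for my $pos (0 .. $#bases) {
            my $base = $bases[$pos];
            next unless exists $counts{$base};    # ignore N, gaps, etc.
            $counts{$base}[$pos]++;
        }
    }
    return \%counts;
}

my $matrix = count_matrix(qw(ATC CCC TTT));
for my $base (qw(A C G T)) {
    print join(' ', $base, @{ $matrix->{$base} }), "\n";
}
```

The rows come out in the same alphabetical order as the TFBS output above. Changing the list in the final loop to qw(A C T G) gives the order used in the original question.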

Malcolm Cook 




From jason.stajich at duke.edu  Fri Feb 17 19:01:45 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Fri, 17 Feb 2006 14:01:45 -0500
Subject: [Bioperl-l] another searchIO bug? with blast report
In-Reply-To: <43F5A2EA0200009B00000F45@gwia.kvl.dk>
References: <43F45FE60200009B00000ED6@gwia.kvl.dk>
	<43F5A2EA0200009B00000F45@gwia.kvl.dk>
Message-ID: <69783647-DD43-4A20-84FA-88E5F8A2C535@duke.edu>

In case people on the list think that my speaking up about a  
question means they should ignore it...

Hopefully someone else can help debug this - I really don't have time  
I'm afraid.

-jason



--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Fri Feb 17 19:17:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 17 Feb 2006 13:17:32 -0600
Subject: [Bioperl-l] another searchIO bug? with blast report
In-Reply-To: <69783647-DD43-4A20-84FA-88E5F8A2C535@duke.edu>
Message-ID: <000001c633f6$cd391740$15327e82@pyrimidine>

No, haven't ignored it.  Just been busy going through SearchIO::blast again
(I've perltidy'd it) since BLASTN and TBLASTX output (v2.2.13) don't work;
looks like all others should.  Trying to fix one problem at a time.  I'll
look at this next.  Don't worry about it.  ;>

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From skirov at utk.edu  Fri Feb 17 18:09:00 2006
From: skirov at utk.edu (Stefan Kirov)
Date: Fri, 17 Feb 2006 13:09:00 -0500
Subject: [Bioperl-l] Count or weight matrix in bioperl?
In-Reply-To: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com>
References: <20060217174940.56976.qmail@web34313.mail.mud.yahoo.com>
Message-ID: <43F6113C.6070501@utk.edu>

If you have bioperl-live:
write a file:
 >seqgroup1
ATC
CCC
TTT

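use strict;
use warnings;
use Bio::Matrix::PSM::IO;

my $filename = shift @ARGV;    # the file written above
my $mio = Bio::Matrix::PSM::IO->new(-format => 'masta', -file => $filename);
# next_matrix returns a Bio::Matrix::PSM::SiteMatrix object
while (my $matrix = $mio->next_matrix) {
    # do something with the matrix...
    print $matrix->consensus, "\n";
}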

This is not going to give you the raw counts, but it will give you the 
frequency for each pos/letter. See the docs for Bio::Matrix::PSM::SiteMatrix.
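
To make the counts-versus-frequencies relationship concrete, here is a small plain-Perl sketch that divides each count by its column total. It does not use Bio::Matrix::PSM, and the sub name is invented for the example. The input is the count matrix for the ATC, CCC, TTT case from this thread.

```perl
use strict;
use warnings;

# Convert per-position counts (base => [counts]) into frequencies by
# dividing each count by the column total. Columns with a zero total
# are left as zeros.
sub counts_to_frequencies {
    my ($counts) = @_;
    my @bases = sort keys %$counts;
    return {} unless @bases;
    my $length = @{ $counts->{ $bases[0] } };

    my %freq;
    for my $pos (0 .. $length - 1) {
        my $total = 0;
        $total += $counts->{$_}[$pos] for @bases;
        for my $base (@bases) {
            $freq{$base}[$pos] = $total ? $counts->{$base}[$pos] / $total : 0;
        }
    }
    return \%freq;
}

# Counts for the ATC, CCC, TTT example.
my %counts = (
    A => [1, 0, 0],
    C => [1, 1, 2],
    G => [0, 0, 0],
    T => [1, 2, 1],
);

my $freq = counts_to_frequencies(\%counts);
for my $base (sort keys %$freq) {
    print join(' ', $base, map { sprintf '%.2f', $_ } @{ $freq->{$base} }), "\n";
}
```

Going the other way, from frequencies back to raw counts, only works if the number of sequences behind each column is known.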
Hope this helps
Stefan




From cjfields at uiuc.edu  Fri Feb 17 23:02:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 17 Feb 2006 17:02:02 -0600
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names
	or GeneIDs
In-Reply-To: <000101c6334a$bd80a900$15327e82@pyrimidine>
Message-ID: <000601c63416$2a14aa00$15327e82@pyrimidine>

Brian,

I added some sample code to the page.  See what you think.
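
For the list archive, the coordinate arithmetic behind "x bp upstream and downstream" can be sketched on its own. This is not the code on the page. The sub name, the feature records and the accessions are placeholders made up for the example.

```perl
use strict;
use warnings;

# Given a feature's start/end (1-based, inclusive) and a flank size,
# return the start and end of the surrounding window. The window is
# clipped at 1 and, when the sequence length is known, at that length.
sub flanking_window {
    my ($start, $end, $flank, $seq_length) = @_;
    ($start, $end) = ($end, $start) if $start > $end;

    my $win_start = $start - $flank;
    $win_start = 1 if $win_start < 1;

    my $win_end = $end + $flank;
    $win_end = $seq_length
        if defined $seq_length && $win_end > $seq_length;

    return ($win_start, $win_end);
}

# Hypothetical feature list; accessions and coordinates are placeholders.
my @features = (
    { acc => 'ACC_0001', start => 150,  end => 210,  strand =>  1 },
    { acc => 'ACC_0002', start => 5000, end => 4920, strand => -1 },
);

for my $feat (@features) {
    my ($from, $to) = flanking_window($feat->{start}, $feat->{end}, 500);
    print "$feat->{acc}: fetch $from..$to (strand $feat->{strand})\n";
    # The window would then go to whatever retrieval call is in use,
    # with a pause between remote requests to keep the load down.
}
```

Clipping at position 1 matters for features near the start of a record. The upper bound can only be clipped when the record length is known beforehand, so it is optional here.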

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From osborne1 at optonline.net  Sat Feb 18 04:01:14 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 17 Feb 2006 23:01:14 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names
 or GeneIDs
In-Reply-To: <000601c63416$2a14aa00$15327e82@pyrimidine>
Message-ID: 

Chris,

That's nice. Now what I'm puzzling over is how to get the genomic
coordinates given an id, like a Gene id. The raw query is something like:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=2&rettype=xml

This is _something_ like the queries used within Bio::DB::Query::GenBank,
but not exactly. Now taking a look at how the text returned is transformed
into objects...
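
Until the remote route is sorted out, the local gene2accession table mentioned earlier in this thread may already be enough, since it pairs GeneIDs with genomic accessions and positions. Below is a rough sketch of indexing it by GeneID. The column positions, the use of '-' for missing values and the sample rows are all assumptions for illustration. The README that ships with the file is the authority on column order and on whether positions are 0-based.

```perl
use strict;
use warnings;

# Column positions (0-based) assumed for a gene2accession-style table;
# verify against the README shipped with the file before relying on them.
my %COL = (gene_id => 1, genomic_acc => 7, start => 9, end => 10, orientation => 11);

# Return GeneID => list of { acc, start, end, orientation } records,
# keeping only rows that actually carry genomic coordinates.
sub index_genomic_coordinates {
    my ($lines) = @_;
    my %coords_for;
    for my $line (@$lines) {
        next if $line =~ /^#/;                  # skip header/comment lines
        chomp(my $copy = $line);
        my @f = split /\t/, $copy;
        next if @f <= $COL{orientation};        # too few columns
        my ($acc, $start, $end) = @f[ @COL{qw(genomic_acc start end)} ];
        next if grep { $_ eq '-' } $acc, $start, $end;    # assumed "no value"
        push @{ $coords_for{ $f[ $COL{gene_id} ] } }, {
            acc         => $acc,
            start       => $start,
            end         => $end,
            orientation => $f[ $COL{orientation} ],
        };
    }
    return \%coords_for;
}

# Placeholder rows; accessions, GIs and coordinates are not real data.
my @sample = (
    join("\t", 9606, 2, 'REVIEWED', 'NM_X.1', 111, 'NP_X.1', 222,
         'NC_X.1', 333, 1000, 5000, '-') . "\n",
    join("\t", 9606, 2, 'REVIEWED', 'NM_X.1', 111, 'NP_X.1', 222,
         '-', '-', '-', '-', '?') . "\n",
);

my $coords = index_genomic_coordinates(\@sample);
for my $hit (@{ $coords->{2} || [] }) {
    print join("\t", 2, @{$hit}{qw(acc start end orientation)}), "\n";
}
```

A GeneID can map to more than one genomic record, which is why each entry holds a list rather than a single hash.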

Brian O.


On 2/17/06 6:02 PM, "Chris Fields"  wrote:

> Brian,
> 
> I added some sample code to the page.  See what you think.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Thursday, February 16, 2006 4:46 PM
>> To: 'Brian Osborne'
>> Cc: 'Harry Mangalam'; 'bioperl-l'
>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
>> orGeneIDs
>> 
>> If I know the start, end, and strand info for a list of features (personal
>> preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
>> up), couldn't I try pulling out the surrounding region?  My thought is
>> this,
>> though I haven't coded it yet:
>> 
>> 1)  Draw up a list of Seqfeatures, with accession, start, stop coordinates
>> (array of hashes) based off what I get from RNAMotif objects.
>> 2)  Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream
>> and downstream, one at a time, using get_Seq_by_ID().  I could add a sleep
>> in there somewhere to not tick off the NCBI curators.
>> 
>> Reason I'm interested in this is b/c I want to know where the RNA motif is
>> in context to surrounding features. If it is very close to a coding
>> region,
>> then the motif likely indicates translational regulation.  Further away
>> may
>> indicate transcriptional termination or another mechanism.
>> 
>> The files returned should have the features included as long as they are
>> in
>> the full length GenBank record.  I tried it out using the web form but not
>> through Bio::DB::GenBank yet.  If I can get it to work I'll add it to the
>> page.
>> 
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>> 
>> 
>>> -----Original Message-----
>>> From: Brian Osborne [mailto:osborne1 at optonline.net]
>>> Sent: Thursday, February 16, 2006 4:19 PM
>>> To: Chris Fields
>>> Cc: Harry Mangalam; bioperl-l
>>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
>> or
>>> GeneIDs
>>> 
>>> Chris,
>>> 
>>> Yes. The question now is where to easily get the coordinates.
>>> 
>>> Brian O.
>>> 
>>> 
>>> On 2/16/06 7:52 AM, "Chris Fields"  wrote:
>>> 
>>>> I think a method was recently implemented in Bio::DB::GenBank to
>>>> retrieve a segment of DNA given start and end coordinates in GenBank
>>>> format; that should contain the features you need.  I requested it
>>>> ~Nov-Dec in the mailing list but didn't get a chance to test it.
>>>> Would that help?
>>>> 
>>>> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
>>>> 
>>>>> Harry,
>>>>> 
>>>>> It's not clear to me that NCBI's eutils offers this capability
>>>>> directly. You
>>>>> can probably download Entrez Gene entries and parse them for
>>>>> coordinates but
>>>>> I know of no way to remotely retrieve genomic sequences like this
>>>>> from NCBI
>>>>> (ENSEMBL API perhaps?). What I had in mind uses the local approach
>>>>> that some
>>>>> of us favor and to prove to myself that this is simple to do I wrote
>> a
>>>>> script that I just added to examples/tools, it's called
>>>>> extract_genes.pl and
>>>>> it's based on Bio::DB::Fasta. Download the sequence files for a given
>>>>> species to some dir, download Entrez Gene's gene2accession file,
>>>>> and run. It
>>>>> creates and stores a hash for lookups, it won't read gene2accession
>>>>> each
>>>>> time it runs.
>>>>> 
>>>>> Brian O.
>>>>> 
>>>>> 
>>>>> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
>>>>> 
>>>>>> Hi Brian,
>>>>>> 
>>>>>> Thanks very much for the pointers and the speed of your reply and
>>>>>> apologies
>>>>>> for the speed of mine.
>>>>>> 
>>>>>> This looks good, but what I was looking for was a bioP approach
>>>>>> for hooking to
>>>>>> an API at NCBI or EBI so I could get this info and seqs from
>>>>>> them.  In this
>>>>>> case, speed of retrieval is not critical and I'd rather not
>>>>>> download the
>>>>>> entirety of the sequences to a local disk to hack at them.
>>>>>> 
>>>>>> I've determined a screen-scraping approach to get them and could
>>>>>> script that,
>>>>>> but I thought that bioP had a method for using NCBI's external
>>>>>> API's, tho it
>>>>>> may be that my memory is faulty or the approach is no longer
>>>>>> supported due to
>>>>>> overload.
>>>>>> 
>>>>>> Does NCBI make such APIs available anymore?  I searched a bit for
>>>>>> docs on them
>>>>>> but couldn't find anything (unless it's buried in the NCBI toolkit,
>>>>>> which I
>>>>>> haven't started to excavate).
>>>>>> 
>>>>>> Failing that, would SEALS provide such a service? Any PerlPinipeds
>>>>>> listening?
>>>>>> 
>>>>>> Harry
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>>>>>> Harry,
>>>>>>> 
>>>>>>> Hope you're doing well. The approach could be based on
>>>>>>> Bio::DB::Fasta. So,
>>>>>>> from its documentation:
>>>>>>> 
>>>>>>>   use Bio::DB::Fasta;
>>>>>>> 
>>>>>>>   # create database from directory of fasta files
>>>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>>>> 
>>>>>>>   # simple access (for those without Bioperl)
>>>>>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>>>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>>>>>   my @ids     = $db->ids;
>>>>>>>   my $length   = $db->length('CHROMOSOME_I');
>>>>>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>>>>>   my $header   = $db->header('CHROMOSOME_I');
>>>>>>> 
>>>>>>>   # Bioperl-style access
>>>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>>>> 
>>>>>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>>>>>>   my $seq     = $obj->seq;
>>>>>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>>>>>> 
>>>>>>> Do you already have the offsets?
>>>>>>> 
>>>>>>> Brian O.
>>>>>>> 
>>>>>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>>>>>>> Hi All,
>>>>>>>> 
>>>>>>>> After perusing the tutorial and other docs for an evening, I
>>>>>>>> still
>>>>>>>> can't find the answer to this.  Forgive me if I've missed
>> something
>>>>>>>> obvious.
>>>>>>>> 
>>>>>>>> This should not be a novel request, but I've not found it
>>>>>>>> answered.  If
>>>>>>>> bioperl isn't the best way to do this, I'd be grateful to a
>>>>>>>> pointer to a
>>>>>>>> better way, especially if it includes an illuminating bit of code.
>>>>>>>> 
>>>>>>>> The problem is to retrieve genomic sequences plus & minus some
>>>>>>>> offset
>>>>>>>> from a locus determined by HUGO keyword or GeneID.  This would be
>> a
>>>>>>>> common followup chore for some extra analysis from a gene
>>>>>>>> expression
>>>>>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed
>>>>>>>> the
>>>>>>>> sequence type to specify...?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> TIA!
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>> 
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From osborne1 at optonline.net  Sat Feb 18 04:56:08 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Fri, 17 Feb 2006 23:56:08 -0500
Subject: [Bioperl-l] CONTIG sequence files from the NCBI
In-Reply-To: <8975119BCD0AC5419D61A9CF1A923E95030081AC@iahce2ksrv1.iah.bbsrc.ac.uk>
Message-ID: 

Michael,

Yes, BioPerl has done this for you. Essentially what it does is take all the
ids in the CONTIG section and query for each individually, then use the
sequences and the location data to create the single large sequence. This
sequence is appended to the annotation and feature section of the initial
Genbank entry. If you want to study this yourself take a look at
Bio::DB::NCBIHelper::postprocess_data.
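
To make the idea concrete, here is a minimal sketch of that kind of
assembly. It is not the code in Bio::DB::NCBIHelper; the helper name
assemble_contig, the accessions and the toy sequences are all made up for
illustration, and it assumes the CONTIG line has the usual
join(accession:start..end, gap(N), complement(...)) form and that the
pieces have already been fetched into a hash.

```perl
use strict;
use warnings;

# Hypothetical helper: build one sequence from a CONTIG-style join() string
# and a hash of already-fetched pieces (accession.version => sequence).
sub assemble_contig {
    my ($contig, $pieces) = @_;
    my ($inner) = $contig =~ /^\s*join\((.*)\)\s*$/s
        or die "not a join() expression: $contig";

    my $assembled = '';
    for my $part (split /\s*,\s*/, $inner) {
        if ($part =~ /^gap\((\d+)\)$/) {
            # a gap of known length becomes a run of Ns
            $assembled .= 'N' x $1;
        }
        elsif ($part =~ /^(complement\()?([\w.]+):(\d+)\.\.(\d+)\)?$/) {
            my ($rev, $acc, $start, $end) = ($1, $2, $3, $4);
            my $seq = $pieces->{$acc};
            die "no sequence for $acc" unless defined $seq;
            # coordinates are 1-based and inclusive
            my $sub = substr($seq, $start - 1, $end - $start + 1);
            if ($rev) {
                $sub = reverse $sub;
                $sub =~ tr/ACGTacgt/TGCAtgca/;
            }
            $assembled .= $sub;
        }
        else {
            die "unrecognised CONTIG element: $part";
        }
    }
    return $assembled;
}

# Toy data, not real accessions
my %pieces = ('A1.1' => 'AAACCC', 'B2.1' => 'GGTT');
print assemble_contig('join(A1.1:1..3,gap(2),complement(B2.1:1..4))', \%pieces), "\n";
```

Because the pieces are only concatenated in the order and orientation the
CONTIG line gives, the feature coordinates of the original entry still
apply to the assembled sequence.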

OK, to answer your first question with my assumption: what NCBI is doing is
simply providing a shorthand rather than an entire large sequence, therefore
no feature coordinates change, whether it's shorthand, CONTIG, or longhand,
ORIGIN. Second, my explanation tells you that all the sequences are the very
latest versions of each sequence, that's how eutils works by default.
However, I don't think I've answered your question because I'm not sure I
understand what you mean by "when I ask bioperl if these sequences have been
updated, I will be told no". All Bioperl does is read the file provided by
GenBank and use its stated version, nothing fancy.

Brian O.


On 2/16/06 5:31 AM, "michael watson (IAH-C)" 
wrote:

> Hi
> 
> I have two questions really.  I fetched bacterial genome sequences from
> the NCBI using Bio::DB::GenBank.
> 
> Some of these sequence entries are CONTIG sequences, ie they just point
> to other sequences that need to be joined together to form the entire
> genome.
> 
> Looking at my downloads, it looks as if bioperl has done all the
> necessary joining for me - or maybe it was the NCBI that did the
> joining?
> 
> OK, so firstly, did bioperl do the joining, and if so, are all the
> co-ordinates of the features updated to reflect their new location on
> the new, joined sequence?
> 
> And secondly, sequence versions... I'm thinking that possibly the
> sequence version of the CONTIG may be 1 (as it hasn't changed) yet the
> versions of the sequences it refers to might have changed, so when I ask
> bioperl if these sequences have been updated, I will be told no because
> the CONTIG sequence version is 1, but I should be told yes because the
> underlying sequences have...?
> 
> Make sense?
> 
> Thanks
> Mick
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From pedro.fabre at gmail.com  Fri Feb 17 18:36:37 2006
From: pedro.fabre at gmail.com (pedro fabre)
Date: Fri, 17 Feb 2006 18:36:37 +0000
Subject: [Bioperl-l] Count or weight matrix in bioperl?
Message-ID: 

>Torsten and all,
>
>  I don't think this will work for me for it only generates 
>statistics for a single sequence.  What I need is a count matrix for 
>each position for a number of DNA sequences.  In other words, if I 
>pass 3 sequences to this function then it returns the count
>for each position for each nucleotide.
>
>  For example if I pass an array of sequences say: ATC,CCC,TTT
>  then I should get a matrix back that will have the count for positions
>1,2,3 for each of A, C, T, or G like this:
>
>
>                1    2    3
>       A        1    0    0
>       C        1    1    2
>       T        1    2    1
>       G        0    0    0
>
>  Any idea if this is already built somewhere in bioperl?
>
>  Thank you.
>
>
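
For reference, the count matrix asked for above can be sketched in a few
lines of plain Perl. This is only an illustration, not an existing BioPerl
function; the name count_matrix is made up, and it assumes all sequences
are DNA strings of the same length.

```perl
use strict;
use warnings;

# Hypothetical helper: per-position counts of A, C, G and T for a list of
# equal-length DNA strings. Returns a hash ref: base => [count at pos 1, 2, ...].
sub count_matrix {
    my @seqs = @_;
    die "need at least one sequence" unless @seqs;
    my $len = length $seqs[0];

    my %counts;
    $counts{$_} = [ (0) x $len ] for qw(A C G T);

    for my $seq (@seqs) {
        die "sequences must all have length $len" unless length($seq) == $len;
        my @bases = split //, uc $seq;
        for my $pos (0 .. $#bases) {
            my $base = $bases[$pos];
            # anything other than A, C, G, T (gaps, Ns) is ignored
            $counts{$base}[$pos]++ if exists $counts{$base};
        }
    }
    return \%counts;
}

my $matrix = count_matrix(qw(ATC CCC TTT));
for my $base (qw(A C T G)) {
    print join("\t", $base, @{ $matrix->{$base} }), "\n";
}
```

With the three sequences from the example this prints the same rows as
the table in the quoted message.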


Sam,

What about this?

I worked on something like that some time ago for SNP calculation,

and it looks to me like you are heading the same way.

If you have sequences like

   A       C       G       T       C       C       A       -       T
   C       G       G       T       A       G       T       G       C
   C       C       C       C       C       G       T       G       C
   C       G       C       T       C       G       T       G       C

Convert the sequence to numbers (0 for the first value, 1 for the 
first modification (reading by columns), 2 for the second 
modification and so on)
Deletions can be considered as another base if you like

After that:


   0       0       0       0       0       0       0       0       0
   1       1       0       0       1       1       1       1       1
   1       0       1       1       0       1       1       1       1
   1       1       1       0       0       1       1       1       1

Once we have the haplotype converted to numbers we have to generate the
snp type information for the haplotype.


SNP code = SUM ( value * multiplicity ^ position )

     where:
       SUM is the sum of the values for the SNP
       value is the SNP number code (0 [generally for the major allele],
                                     1 [for the minor allele])
       multiplicity is the base of the code (2 in the example below)
       position is the row of the value within the block, counting from 0.

For this example the code is:

   0       0       0       0       0       0       0       0       0
   1       1       0       0       1       1       1       1       1
   1       0       1       1       0       1       1       1       1
   1       1       1       0       0       1       1       1       1
  ------------------------------------------------------------------
   14      10      12      4       2       14      14      14      14

   14 = 0*2^0 + 1*2^1 + 1*2^2 + 1*2^3
   10 = 0*2^0 + 1*2^1 + 0*2^2 + 1*2^3
   ....

Once the SNPs are classified into families (columns with the same code),
we keep just the distinct codes:

   14      10      12      4       2
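
The steps above can be sketched in plain Perl as follows. This is only an
illustration of the arithmetic, not the code in Bio::PopGen::HtSNP; the
helper names encode_columns and snp_codes are made up, and the
multiplicity is fixed at 2 as in the worked example.

```perl
use strict;
use warnings;

# Recode each column: the first value seen becomes 0, the next new value 1,
# and so on (a deletion '-' is treated as just another value).
sub encode_columns {
    my @seqs = map { [ split //, $_ ] } @_;
    my @rows;
    for my $j (0 .. $#{ $seqs[0] }) {
        my %code;
        for my $i (0 .. $#seqs) {
            my $base = $seqs[$i][$j];
            unless (exists $code{$base}) {
                my $next = keys %code;    # number of values seen so far
                $code{$base} = $next;
            }
            $rows[$i][$j] = $code{$base};
        }
    }
    return @rows;
}

# SNP code per column = SUM( value * 2 ** row ), rows counted from 0.
sub snp_codes {
    my @rows  = @_;
    my $ncol  = @{ $rows[0] };
    my @codes = (0) x $ncol;
    for my $i (0 .. $#rows) {
        for my $j (0 .. $ncol - 1) {
            $codes[$j] += $rows[$i][$j] * 2 ** $i;
        }
    }
    return @codes;
}

my @rows  = encode_columns(qw(ACGTCCA-T CGGTAGTGC CCCCCGTGC CGCTCGTGC));
my @codes = snp_codes(@rows);

# keep each code once, in order of first appearance
my %seen;
my @distinct = grep { !$seen{$_}++ } @codes;

print "@codes\n";       # one code per column
print "@distinct\n";    # one code per family
```

For the four sequences above the first line printed is the row of codes
under the dashed line, and the second is the reduced list 14 10 12 4 2.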

If you want to look into the code follow this link.


http://users.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/PopGen/HtSNP.pm?rev=1.4&content-type=text/vnd.viewcvs-markup

HTH
Pedro



>  Torsten Seemann  wrote:> 
>Say I have an array of nucleotide sequences of of length N. I want 
>to calculate the count matrix (weight matrix). That is for each 
>position 1..N, I want to know how many As, Cs ,Ts and Gs there are. 
>Is the code to do this already written in bioperl to build this 
>matrix if I pass it those strings?
>>    Please excuse my lack of knowledge as I am a new comer to bioinformatics.
>
>Use the Bio::Tools::SeqStats module. The PDoc documentation even has an
>example similar to what you want to do:
>
>http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqStats.html
>
>--Torsten Seemann
>
>
>
>
>Sincerely,
>Sam Al-Droubi, M.S.
>saldroubi at yahoo.com
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l



From cjfields at uiuc.edu  Sat Feb 18 23:35:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 18 Feb 2006 17:35:22 -0600
Subject: [Bioperl-l] Bio::SearchIO fix posted in Bugzilla
Message-ID: <97C946BE-8410-4B7F-9FA3-97A01641E20E@uiuc.edu>

Added a fix for the blastn and tblastx problems with Bio::SearchIO  
text parsing of BLAST 2.2.13 output:

http://bugzilla.open-bio.org/show_bug.cgi?id=1934

The extra lines "Features in this part of subject sequence" and the  
following descriptive lines are passed over using a loop.  See the  
bug report for specifics.
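
As a rough illustration of the idea (this is a standalone sketch, not the
patch attached to the bug report, and strip_feature_blocks is a made-up
name), the extra lines can be dropped by skipping from a "Features in/
flanking this part of subject sequence:" header up to the next "Score ="
line:

```perl
use strict;
use warnings;

# Hypothetical filter: remove the "Features ... this part of subject
# sequence:" header and its descriptive lines from a list of report lines.
sub strip_feature_blocks {
    my @out;
    my $skipping = 0;
    for my $line (@_) {
        if ($line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/) {
            $skipping = 1;
            next;
        }
        if ($skipping) {
            # keep skipping until the HSP statistics start
            next unless $line =~ /^\s*Score\s*=/;
            $skipping = 0;
        }
        push @out, $line;
    }
    return @out;
}

my @report = (
    "Length=27492551\n",
    "\n",
    " Features flanking this part of subject sequence:\n",
    "   3726 bp at 5' side: transposon protein\n",
    "   2655 bp at 3' side: hypothetical protein\n",
    "\n",
    " Score = 36.2 bits (18),  Expect = 0.22\n",
);
print strip_feature_blocks(@report);
```

A parser that expects the "Score =" line right after the hit header then
sees the report in the pre-2.2.13 layout.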

Cheers,

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From osborne1 at optonline.net  Sun Feb 19 05:47:44 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Sun, 19 Feb 2006 00:47:44 -0500
Subject: [Bioperl-l] Fetching genomic sequences based on HUGO names
 orGeneIDs
In-Reply-To: <000601c63416$2a14aa00$15327e82@pyrimidine>
Message-ID: 

Chris and Harry,

OK, I've put the missing link in place. This is Bio::DB::EntrezGene, so you
can get NCBI Genes as objects, perfectly analogous to Bio::DB::GenBank and
the related modules:

use Bio::DB::EntrezGene;
my $db  = Bio::DB::EntrezGene->new;
my $seq = $db->get_Seq_by_id(2);    # 2 is an Entrez Gene ID

So starting with just a Gene id, then using Bio::DB::GenBank as Chris
showed, you can get the sequence. What's a little odd is how Entrez Gene has
stored positional information and Sequence identifier, you may have thought
that they'd create a special set of fields for this but no, it's only
available as part of a URL as far as I can tell:

Bio::Annotation::DBLink=HASH()
'_root_verbose' => 0

'database' => 'Evidence Viewer'

'primary_id' => 4693

'url' =>
'http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606&contig=NT_079573.2&gene=NDP&lid=4693&from=6657835&to=6682559'
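
Until there are proper fields for this, the contig and coordinates can be
pulled out of such a URL with a few lines of plain Perl. This is only a
sketch; parse_evv_url is a made-up helper name and it assumes the query
string keeps the contig, from and to parameters shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: split the query string of an Evidence Viewer URL
# into a hash of parameter => value.
sub parse_evv_url {
    my ($url) = @_;
    my ($query) = $url =~ /\?(.*)$/
        or return;
    my %param;
    for my $pair (split /&/, $query) {
        my ($key, $value) = split /=/, $pair, 2;
        $param{$key} = defined $value ? $value : '';
    }
    return \%param;
}

my $url = 'http://www.ncbi.nlm.nih.gov/sutils/evv.cgi?taxid=9606'
        . '&contig=NT_079573.2&gene=NDP&lid=4693&from=6657835&to=6682559';
my $p = parse_evv_url($url);
print "$p->{contig}: $p->{from}..$p->{to}\n";
```

The three values are what a coordinate-based fetch of the genomic region
would need as sequence id, start and stop.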


Question: are NT_* sequences going to be a problem for Bio::DB::GenBank? I
see this in NCBIHelper:

# NT contigs can not be retrieved

$self->throw("NT_ contigs are whole chromosome files which are not part of regular".
             "database distributions. Go to ftp://ftp.ncbi.nih.gov/genomes/.")
      if $ids =~ /NT_/;


Perhaps we can modify this so there's no throw() when a seq_start and
seq_stop are specified.

Brian O.

On 2/17/06 6:02 PM, "Chris Fields"  wrote:

> Brian,
> 
> I added some sample code to the page.  See what you think.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Chris Fields
>> Sent: Thursday, February 16, 2006 4:46 PM
>> To: 'Brian Osborne'
>> Cc: 'Harry Mangalam'; 'bioperl-l'
>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
>> orGeneIDs
>> 
>> If I know the start, end, and strand info for a list of features (personal
>> preference, since I use Bio::SeqFeature::Generic with the RNAMotif I drew
>> up), couldn't I try pulling out the surrounding region?  My thought is
>> this,
>> though I haven't coded it yet:
>> 
>> 1)  Draw up a list of Seqfeatures, with accession, start, stop coordinates
>> (array of hashes) based off what I get from RNAMotif objects.
>> 2)  Pull the sequence from NCBI using Bio::DB::GenBank with x bp upstream
>> and downstream, one at a time, using get_Seq_by_ID().  I could add a sleep
>> in there somewhere to not tick off the NCBI curators.
>> 
>> Reason I'm interested in this is b/c I want to know where the RNA motif is
>> in context to surrounding features. If it is very close to a coding
>> region,
>> then the motif likely indicates translational regulation.  Further away
>> may
>> indicate transcriptional termination or another mechanism.
>> 
>> The files returned should have the features included as long as they are
>> in
>> the full length GenBank record.  I tried it out using the web form but not
>> through Bio::DB::GenBank yet.  If I can get it to work I'll add it to the
>> page.
>> 
>> Christopher Fields
>> Postdoctoral Researcher - Switzer Lab
>> Dept. of Biochemistry
>> University of Illinois Urbana-Champaign
>> 
>> 
>>> -----Original Message-----
>>> From: Brian Osborne [mailto:osborne1 at optonline.net]
>>> Sent: Thursday, February 16, 2006 4:19 PM
>>> To: Chris Fields
>>> Cc: Harry Mangalam; bioperl-l
>>> Subject: Re: [Bioperl-l] Fetching genomic sequences based on HUGO names
>> or
>>> GeneIDs
>>> 
>>> Chris,
>>> 
>>> Yes. The question now is where to easily get the coordinates.
>>> 
>>> Brian O.
>>> 
>>> 
>>> On 2/16/06 7:52 AM, "Chris Fields"  wrote:
>>> 
>>>> I think a method was recently implemented in Bio::DB::GenBank to
>>>> retrieve a segment of DNA given start and end coordinates in GenBank
>>>> format; that should contain the features you need.  I requested it
>>>> ~Nov-Dec in the mailing list but didn't get a chance to test it.
>>>> Would that help?
>>>> 
>>>> On Feb 15, 2006, at 11:16 PM, Brian Osborne wrote:
>>>> 
>>>>> Harry,
>>>>> 
>>>>> It's not clear to me that NCBI's eutils offers this capability
>>>>> directly. You
>>>>> can probably download Entrez Gene entries and parse them for
>>>>> coordinates but
>>>>> I know of no way to remotely retrieve genomic sequences like this
>>>>> from NCBI
>>>>> (ENSEMBL API perhaps?). What I had in mind uses the local approach
>>>>> that some
>>>>> of us favor and to prove to myself that this is simple to do I wrote
>> a
>>>>> script that I just added to examples/tools, it's called
>>>>> extract_genes.pl and
>>>>> it's based on Bio::DB::Fasta. Download the sequence files for a given
>>>>> species to some dir, download Entrez Gene's gene2accession file,
>>>>> and run. It
>>>>> creates and stores a hash for lookups, it won't read gene2accession
>>>>> each
>>>>> time it runs.
>>>>> 
>>>>> Brian O.
>>>>> 
>>>>> 
>>>>> On 2/14/06 12:15 PM, "Harry Mangalam"  wrote:
>>>>> 
>>>>>> Hi Brian,
>>>>>> 
>>>>>> Thanks very much for the pointers and the speed of your reply and
>>>>>> apologies
>>>>>> for the speed of mine.
>>>>>> 
>>>>>> This looks good, but what I was looking for was a bioP approach
>>>>>> for hooking to
>>>>>> an API at NCBI or EBI so I could get this info and seqs from
>>>>>> them.  In this
>>>>>> case, speed of retrieval is not critical and I'd rather not
>>>>>> download the
>>>>>> entirety of the sequences to a local disk to hack at them.
>>>>>> 
>>>>>> I've determined a screen-scraping approach to get them and could
>>>>>> script that,
>>>>>> but I thought that bioP had a method for using NCBI's external
>>>>>> API's, tho it
>>>>>> may be that my memory is faulty or the approach is no longer
>>>>>> supported due to
>>>>>> overload.
>>>>>> 
>>>>>> Does NCBI make such APIs available anymore?  I searched a bit for
>>>>>> docs on them
>>>>>> but couldn't find anything (unless it's buried in the NCBI toolkit,
>>>>>> which I
>>>>>> haven't started to excavate).
>>>>>> 
>>>>>> Failing that, would SEALS provide such a service? Any PerlPinipeds
>>>>>> listening?
>>>>>> 
>>>>>> Harry
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Sunday 12 February 2006 08:37, Brian Osborne wrote:
>>>>>>> Harry,
>>>>>>> 
>>>>>>> Hope you're doing well. The approach could be based on
>>>>>>> Bio::DB::Fasta. So,
>>>>>>> from its documentation:
>>>>>>> 
>>>>>>>   use Bio::DB::Fasta;
>>>>>>> 
>>>>>>>   # create database from directory of fasta files
>>>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>>>> 
>>>>>>>   # simple access (for those without Bioperl)
>>>>>>>   my $seq      = $db->seq('CHROMOSOME_I',4_000_000 => 4_100_000);
>>>>>>>   my $revseq   = $db->seq('CHROMOSOME_I',4_100_000 => 4_000_000);
>>>>>>>   my @ids     = $db->ids;
>>>>>>>   my $length   = $db->length('CHROMOSOME_I');
>>>>>>>   my $alphabet = $db->alphabet('CHROMOSOME_I');
>>>>>>>   my $header   = $db->header('CHROMOSOME_I');
>>>>>>> 
>>>>>>>   # Bioperl-style access
>>>>>>>   my $db      = Bio::DB::Fasta->new('/path/to/fasta/files');
>>>>>>> 
>>>>>>>   my $obj     = $db->get_Seq_by_id('CHROMOSOME_I');
>>>>>>>   my $seq     = $obj->seq;
>>>>>>>   my $subseq  = $obj->subseq(4_000_000 => 4_100_000);
>>>>>>> 
>>>>>>> Do you already have the offsets?
>>>>>>> 
>>>>>>> Brian O.
>>>>>>> 
>>>>>>> On 2/12/06 1:46 AM, "Harry Mangalam"  wrote:
>>>>>>>> Hi All,
>>>>>>>> 
>>>>>>>> After perusing the tutorial and other docs for an evening, I
>>>>>>>> still
>>>>>>>> can't find the answer to this.  Forgive me if I've missed
>> something
>>>>>>>> obvious.
>>>>>>>> 
>>>>>>>> This should not be a novel request, but I've not found it
>>>>>>>> answered.  If
>>>>>>>> bioperl isn't the best way to do this, I'd be grateful to a
>>>>>>>> pointer to a
>>>>>>>> better way, especially if it includes an illuminating bit of code.
>>>>>>>> 
>>>>>>>> The problem is to retrieve genomic sequences plus & minus some
>>>>>>>> offset
>>>>>>>> from a locus determined by HUGO keyword or GeneID.  This would be
>> a
>>>>>>>> common followup chore for some extra analysis from a gene
>>>>>>>> expression
>>>>>>>> expt.  Or maybe this is in the DBFetch routines, but I've missed
>>>>>>>> the
>>>>>>>> sequence type to specify...?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> TIA!
>>>>> 
>>>>> 
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>> 
>>>> Christopher Fields
>>>> Postdoctoral Researcher
>>>> Lab of Dr. Robert Switzer
>>>> Dept of Biochemistry
>>>> University of Illinois Urbana-Champaign
>>>> 
>>>> 
>>>> 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 




From maximilianh at gmail.com  Sun Feb 19 13:52:37 2006
From: maximilianh at gmail.com (Maximilian Haeussler)
Date: Sun, 19 Feb 2006 14:52:37 +0100
Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence
In-Reply-To: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
Message-ID: <76f031ae0602190552v5f2542dbv@mail.gmail.com>

Hi bio-mailinglists,

does anyone here know of a tool or a library to display two (or more)
sequences at the same time with coloured features? Possibly with lines,
connecting some features from one sequence to the other (synteny-plot) ?
Or to display two multiple alignments, one on top of each other, with
colored features added?

It's not that it would be difficult to write, but programming visualisation
usually takes a lot of time.
Bio::Graphics seems mainly concerned with one main sequence and features on
it. Well, I could copy together two of these gif-images, but then there
would be no connecting lines. Same applies for the graphics in Biojava or
the gff2ps tool or all the multiple alignment viewers that I know (Bioedit,
ClustalX). There is something called Toucan in Java, which displays at least
several lines of gff-style-features, but no visible sequences and more
importantly, no connecting lines. A recent software, Djinn lite, is using a
similar kind of visualization to compare different spliced genes from
various species, but it's mainly aimed at splicing and written in Visual
Basic.
I guess a good compromise might be the 3D viewer Sockeye, but I haven't seen
any synteny-lines in sockeye yet.

I guess I must have missed something here. I cannot be the first one that
would like to compare, say, two gff files, or two multiple alignments?

Thanks a lot for any idea,
Max



From lutfullah at upesh.edu  Sun Feb 19 17:01:05 2006
From: lutfullah at upesh.edu (Dr. Lutfullah)
Date: Sun, 19 Feb 2006 22:01:05 +0500
Subject: [Bioperl-l] bioperl in jail
Message-ID: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>

Hello,

I am trying to create a situation where users can ssh login to a chrooted
jailed account with limited functionality.
I created the chroot jail on my Fedora Core 4 installation using a script
available at:
http://www.fuschlberger.net/programs/ssh-scp-chroot-jail/
The script has a line:
======================
APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
/bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
/usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
/usr/libexec/openssh/sftp-server"
=======================
to which I added everything I could get with /bin/perl to make it:

APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
/bin/vi /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
/usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
/usr/libexec/openssh/sftp-server /usr/bin/perl /usr/bin/perl5
/usr/bin/perl5.8.6 /usr/bin/perldoc /usr/bin/perlbug /usr/bin/perlivp
/usr/bin/perlcc /usr/bin/foomatic-perl-data /usr/bin/find2perl"

perl becomes available inside the jail but I cannot use the line "use
Bio::Perl" inside the jail.

The script produces an error on including /usr/lib or /usr/lib/perl5:

Copying necessary library-files to jail (may take some time)
cp: omitting directory `/usr/lib'
ldd: /usr/lib: No such file or directory
Copying files from /etc/pam.d/ to jail
Copying PAM-Modules to jail

In the jailed account the little test program:

use Bio::Perl;
print 2+4;

generated this error:

Can't locate Bio/Perl.pm in @INC (@INC contains:
/usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
/usr/lib/perl5/site_perl/5.8.4/i386-linux-thread
............................................

Any help would be much appreciated. Thanks in advance.

LK



From boris.steipe at utoronto.ca  Sun Feb 19 22:34:52 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Sun, 19 Feb 2006 17:34:52 -0500
Subject: [Bioperl-l] bioperl in jail
In-Reply-To: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>
References: <477b582e0602190901g297728aamaf127f2645471ba9@mail.gmail.com>
Message-ID: 

The path that perl uses internally to search its modules (@INC) is  
not the same thing as the path your shell uses. You have to modify  
@INC either within running scripts, or by setting the PERL5LIB  
environment variable upon login.

e.g. see http://modperlbook.org/html/ch03_09.html
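
A quick way to check this from inside the jail is a small script that
looks for the module file in every @INC directory. This is just a sketch;
locate_module is a made-up helper name.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first path under the given directories
# where the module's .pm file exists, or nothing if it is not found.
sub locate_module {
    my ($module, @dirs) = @_;
    (my $rel = "$module.pm") =~ s{::}{/}g;    # Bio::Perl -> Bio/Perl.pm
    for my $dir (@dirs) {
        next if ref $dir;                     # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, $rel);
        return $path if -e $path;
    }
    return;
}

my $found = locate_module('Bio::Perl', @INC);
if (defined $found) {
    print "Bio::Perl found at $found\n";
}
else {
    print "Bio::Perl not found in:\n";
    print "  $_\n" for grep { !ref } @INC;
}
```

If it is not found, the directory holding the Bio/ tree has to exist
inside the jail and be added to @INC, either with a "use lib" line at the
top of the script or by setting PERL5LIB in the login environment. The
exact directory depends on where BioPerl was installed.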

HTH,
B.



On 19 Feb 2006, at 12:01, Dr. Lutfullah wrote:

> Hello,
>
> I am trying to create a situation where users can ssh login to a  
> chrooted
> jailed account with limited functionality.
> I created the chroot jail on my Fedora Core 4 installation using a  
> script
> available at:
> http://www.fuschlberger.net/programs/ssh-scp-chroot-jail/
> The script has a line:
> ======================
> APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
> /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
> /usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
> /usr/libexec/openssh/sftp-server"
> =======================
> to which I added everything I could get with /bin/perl to make it:
>
> APPS="/bin/bash /bin/cp /usr/bin/dircolors /bin/ls /bin/mkdir /bin/mv
> /bin/vi /bin/rm /bin/rmdir /bin/sh /bin/su /usr/bin/groups /usr/bin/id
> /usr/bin/rsync /usr/bin/ssh /usr/bin/scp /sbin/unix_chkpwd
> /usr/libexec/openssh/sftp-server /usr/bin/perl /usr/bin/perl5
> /usr/bin/perl5.8.6 /usr/bin/perldoc /usr/bin/perlbug /usr/bin/perlivp
> /usr/bin/perlcc /usr/bin/foomatic-perl-data /usr/bin/find2perl"
>
> perl becomes available inside the jail but I cannot use the line "use
> Bio::Perl" inside the jail.
>
> The script produces an error on including /usr/lib or /usr/lib/perl5:
>
> Copying necessary library-files to jail (may take some time)
> cp: omitting directory `/usr/lib'
> ldd: /usr/lib: No such file or directory
> Copying files from /etc/pam.d/ to jail
> Copying PAM-Modules to jail
>
> In the jailed account the little test program:
>
> use Bio::Perl;
> print 2+4;
>
> generated this error:
>
> Can't locate Bio/Perl.pm in @INC (@INC contains:
> /usr/lib/perl5/site_perl/5.8.6/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.5/i386-linux-thread-multi
> /usr/lib/perl5/site_perl/5.8.4/i386-linux-thread
> ............................................
>
> Any help would be much appreciated. Thanks in advance.
>
> LK
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From khoueiry at ibdm.univ-mrs.fr  Mon Feb 20 09:27:07 2006
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Mon, 20 Feb 2006 10:27:07 +0100
Subject: [Bioperl-l] [BiO BB] Tool to mutate DNA sequence
In-Reply-To: <76f031ae0602190552v5f2542dbv@mail.gmail.com>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<76f031ae0602190552v5f2542dbv@mail.gmail.com>
Message-ID: <1140427628.10569.10.camel@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 

From shameer at ncbs.res.in  Mon Feb 20 06:21:01 2006
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Mon, 20 Feb 2006 11:51:01 +0530 (IST)
Subject: [Bioperl-l] Matrix Average Code / Module ?
In-Reply-To: <76f031ae0602190552v5f2542dbv@mail.gmail.com>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<76f031ae0602190552v5f2542dbv@mail.gmail.com>
Message-ID: <59825.192.168.1.176.1140416461.squirrel@192.168.1.176>

Hi all,
Is there any program/module to calculate the average of a BLOSUM/PAM (or
any other) matrix?

I have a matrix and I need to see the average

for example

11 22 43 54 50
27 87 74 32 10
66 58 98 78 20
22 23 44 16 34

I have gone through Bio::Matrix::MatrixI and Bio::Matrix::GenericMatrix
and other perl modules like Math::Matrix
http://search.cpan.org/~ulpfr/Math-Matrix-0.4/Matrix.pm
and Math::Cephes::Matrix - but none of them has a provision to do a matrix
average calculation.
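
For what it's worth, a plain average over all entries can be sketched
without any matrix module. This assumes "average" means the mean of every
cell and that the matrix is held as an array of array references;
matrix_mean is a made-up name.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: mean of all entries of a matrix given as an array
# reference of row array references.
sub matrix_mean {
    my ($matrix) = @_;
    my @all = map { @$_ } @$matrix;    # flatten the rows
    return unless @all;
    return sum(@all) / @all;
}

my @matrix = (
    [ 11, 22, 43, 54, 50 ],
    [ 27, 87, 74, 32, 10 ],
    [ 66, 58, 98, 78, 20 ],
    [ 22, 23, 44, 16, 34 ],
);
print matrix_mean(\@matrix), "\n";    # 869 / 20 = 43.45
```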

Any help ???
thanks in advance,
Happy biocomputing !!!


-- 
Shameer Khadar
National Centre for Biological Sciences (TIFR)
UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
T - 91-080-23636420-32 EXT 4241
F - 91-080-23636662/23636675
W - http://www.ncbs.res.in
--------------------------------------------------
"Refrain from illusions, insist on work and not words,
 patiently seek divine and scientific truth."
MM




From cjfields at uiuc.edu  Mon Feb 20 17:01:26 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 20 Feb 2006 11:01:26 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
	version 1.28
In-Reply-To: <43F449E1.80605@esat.kuleuven.be>
Message-ID: <000e01c6363f$494bc5e0$15327e82@pyrimidine>

I have added a preliminary bugfix for the problems seen with nucleotide
blast parsing for BLAST 2.2.13 reports.  I passed SearchIO::blast through
perltidy to space out the blocks (really for my own purposes; it's a pretty
complex module).  The fix bypasses the extra lines output for blastn and
tblastx and now seems to parse the text output for those reports correctly.
I tested it using all NCBI BLAST flavors for the last two versions of BLAST
(2.2.12 and 2.2.13); the fix was simple so shouldn't break any other BLAST
report parsing, such as WU-BLAST, RPS-BLAST, or Paracel.  It has only been
tested on MacOSX at the moment, so I need people out there to test it out on
anything they can to make sure it works before committing.  I'll be trying
it on Windows today.  Report back to me and I'll post anything on bugzilla.

Here it is:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934


Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> Sent: Thursday, February 16, 2006 3:46 AM
> To: gyang at plantbio.uga.edu
> Cc: bioperl-l at lists.open-bio.org; Chris Fields
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
> version 1.28
> 
> Hi,
> 
> I have the same problem with the blast.pm-file.
> The people of NCBI added some extra info when giving the Blast-output.
> (see e.g. "Features flanking this part..." or "Features in this part
> ..."), example added.
> The blast.pm module starts looking for the hsp-alignment-information,
> but it dies when it hits this Feature-information.
> 
> Pieter
> 
> 
> >gi|77552765|gb|DP000011.1|
>  list_uids=77552765&dopt=GenBank> Oryza sativa (japonica cultivar-group)
> chromosome 12, complete
> 
> sequence
> Length=27492551
> 
>  Features flanking this part of subject sequence:
> 
> 3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
>  &from=19251479&to=19253693&view=gbwithparts>
> 
> 2655 bp at 3' side: hypothetical protein
>  &from=19260091&to=19260600&view=gbwithparts>
> 
>  Score = 36.2 bits (18),  Expect = 0.22
>  Identities = 18/18 (100%), Gaps = 0/18 (0%)
>  Strand=Plus/Minus
> 
> Query  4         GTACTACTCTACTCTACT  21
>                  ||||||||||||||||||
> Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> 
> 
>  Features flanking this part of subject sequence:
>    2991 bp at 5' side: hypothetical protein
>    1131 bp at 3' side: hypothetical protein
> 
>  Score = 36.2 bits (18),  Expect = 0.22
>  Identities = 18/18 (100%), Gaps = 0/18 (0%)
>  Strand=Plus/Minus
> 
> Query  2         ATGTACTACTCTACTCTA  19
>                  ||||||||||||||||||
> Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> 
> 
> 
>  Features in this part of subject sequence:
>    DHHC zinc finger domain, putative
> 
>  Score = 34.2 bits (17),  Expect = 0.87
>  Identities = 17/17 (100%), Gaps = 0/17 (0%)
>  Strand=Plus/Plus
> 
> Query  5         TACTACTCTACTCTACT  21
>                  |||||||||||||||||
> Sbjct  17616437  TACTACTCTACTCTACT  17616453
> 
> 
> 
>  Features flanking this part of subject sequence:
>    102 bp at 5' side: bZIP transcription factor, putative
>    3740 bp at 3' side: yeast dcp1, putative
> 
>  Score = 32.2 bits (16),  Expect = 3.4
>  Identities = 16/16 (100%), Gaps = 0/16 (0%)
>  Strand=Plus/Plus
> 
> Query  7        CTACTCTACTCTACTC  22
>                 ||||||||||||||||
> Sbjct  2775880  CTACTCTACTCTACTC  2775895
> 
> 
>  Features flanking this part of subject sequence:
>    21 bp at 5' side: peptide transporter T17F3.11, putative
>    10230 bp at 3' side: transposon protein, putative, unclassified
> 
>  Score = 32.2 bits (16),  Expect = 3.4
>  Identities = 16/16 (100%), Gaps = 0/16 (0%)
>  Strand=Plus/Minus
> 
> Query  7         CTACTCTACTCTACTC  22
>                  ||||||||||||||||
> Sbjct  27323153  CTACTCTACTCTACTC  27323138
> 
> 
> 
> 
> Guojun Yang wrote:
> 
> >Hi, Chris,
> >Finally the RemoteBlast test script works for the amino.fa query, but
> >when I try a nucleic acid sequence (see below), an error occurs:
> >"
> >waiting........
> >------------- EXCEPTION  -------------
> >MSG: no data for midline  Features flanking this part of subject sequence:
> >STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> >STACK toplevel remoteblast_test:40
> >"
> >The query sequence is:
> >CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> >GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> >AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> >AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> >
> >The script (basically the same as the remoteblast test; I only changed the
> >database to 'nr', the program to 'blastn', and the filename to 'ost3'):
> >#!/usr/bin/perl
> >
> >use Bio::SeqIO;
> >use Bio::Seq;
> >use Bio::Tools::Run::RemoteBlast;
> >use Bio::SearchIO;
> >use strict;
> >my $prog='blastn';
> >my $db='nr';
> >my $e_val=1e-10;
> >my @params=( -prog=>$prog,
> >	-data=>$db,
> >	-expect=>$e_val,
> >	-readmethod=>'SearchIO');
> >my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >
> >my $v = 1;
> >
> >my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> >
> >while (my $input = $str->next_seq()){
> >  #Blast a sequence against a database:
> >  #Alternatively, you could  pass in a file with many
> >  #sequences rather than loop through sequence one at a time
> >  #Remove the loop starting 'while (my $input = $str->next_seq())'
> >  #and swap the two lines below for an example of that.
> >  my $r = $factory->submit_blast($input);
> >  #my $r = $factory->submit_blast('amino.fa');
> >  print STDERR "waiting..." if( $v > 0 );
> >  while ( my @rids = $factory->each_rid ) {
> >    foreach my $rid ( @rids ) {
> >      my $rc = $factory->retrieve_blast($rid);
> >      if( !ref($rc) ) {
> >        if( $rc < 0 ) {
> >          $factory->remove_rid($rid);
> >        }
> >        print STDERR "." if ( $v > 0 );
> >        sleep 5;
> >      } else {
> >        my $result = $rc->next_result();
> >        #save the output
> >        my $filename = $result->query_name()."\.out";
> >        $factory->save_output($filename);
> >        $factory->remove_rid($rid);
> >        print "\nQuery Name: ", $result->query_name(), "\n";
> >        while ( my $hit = $result->next_hit ) {
> >          next unless ( $v > 0);
> >          print "\thit name is ", $hit->name, "\n";
> >          while( my $hsp = $hit->next_hsp ) {
> >            print "\t\tscore is ", $hsp->score, "\n";
> >          }
> >        }
> >      }
> >    }
> >  }
> >}
> >
> >
> >Do you think there might still be something in the NCBI output format?
> >
> >Thank you,
> >Guojun
> >
> >
> >
> >
> >Guojun Yang
> >Department of Plant Biology
> >University of Georgia
> >Tel: 706-542-1857
> >Fax: 706-542-1805
> >http://www.arches.uga.edu/~guojun
> >
> >
> >
> >----- Original Message -----
> >From: Chris Fields [mailto:cjfields at uiuc.edu]
> >To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> >Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >
> >
> >
> >
> >>Sorry, forgot to add that I didn't see the regex issue that you mentioned.
> >>It could be a Perl-related issue.  Try the fixes I mentioned and see what
> >>happens.
> >>
> >>
> >>>Christopher Fields
> >>>
> >>>
> >>Postdoctoral Researcher - Switzer Lab
> >>Dept. of Biochemistry
> >>University of Illinois Urbana-Champaign
> >>
> >>
> >>>>>-----Original Message-----
> >>>>>
> >>>>>
> >>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>Sent: Tuesday, February 14, 2006 12:36 PM
> >>>To: 'gyang at plantbio.uga.edu'
> >>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >>>
> >>>
> >>>It's a good habit to always add single quotes around bare words.  The Perl
> >>>interpreter may treat a single bare word as a subroutine or perlfunc
> >>>called with no args, so it will try to find a subroutine named blastp().
> >>>My debugger actually warns that the bare word blastp may conflict
> >>>with a future reserved word.  Like you said, 'use strict' will point that
> >>>out.
> >>>
> >>>
> >>>As for the regex, it should match all the blast programs at NCBI (blastp,
> >>>blastn, blastx, tblastn, tblastx) and is built in to make sure nothing
> >>>else passes through.
> >>>
> >>>So, if you are using the script below, there are several errors.  The
> >>>bare words for $prog and $db need quotes, and the flags in your @params
> >>>array don't have a dash before them.  I get this after adding quotes but
> >>>before adding the dashes to @params:
> >>>
> >>>>>C:\Perl\Scripts>test_blast.pl
> >>>>>------------- EXCEPTION: Bio::Root::Exception -------------
> >>>>>
> >>>>>
> >>>MSG:
> >>>STACK: Error::throw
> >>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-
> >>>live/Bio/Root/Root.pm:328
> >>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> >>>C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> >>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-
> >>>live/Bio/Tools/Run/RemoteBlast.pm:256
> >>>STACK: C:\Perl\Scripts\test_blast.pl:15
> >>>-----------------------------------------------------------
> >>>
> >>>
> >>>The last line indicates a problem with this line:
> >>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >>>Changing the @params to this:
> >>>my @params=( -prog=>$prog,
> >>>	-data=>$db,
> >>>	-expect=>$e_val,
> >>>	-readmethod=>'SearchIO');
> >>>
> >>>fixes it, and I get output as expected.
> >>>
> >>>Christopher Fields
> >>>>>
> >>>>>
> >>>Postdoctoral Researcher - Switzer Lab
> >>>Dept. of Biochemistry
> >>>University of Illinois Urbana-Champaign
> >>>
> >>>
> >>>>>>>>-----Original Message-----
> >>>>>>>>
> >>>>>>>>
> >>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> >>>>Sent: Tuesday, February 14, 2006 11:48 AM
> >>>>To: Chris Fields; bioperl-l at lists.open-bio.org
> >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> >>>>
> >>>>Hi, Chris,
> >>>>When I tried the perldoc script, it did not work either. First it
> >>>>says $prog cannot be a bare word if I "use strict". I added quotes around
> >>>>the words, and then it says the value for $prog does not match the
> >>>>expression t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.
> >>>>The script is shown below. Why is the expression "t?blast[pnx]"?
> >>>>
> >>>>#!/usr/bin/perl
> >>>>
> >>>>use Bio::SeqIO;
> >>>>use Bio::Seq;
> >>>>use Bio::Tools::Run::RemoteBlast;
> >>>>use Bio::SearchIO;
> >>>>
> >>>>
> >>>>my $prog=blastp;
> >>>>my $db=swissprot;
> >>>>my $e_val=1e-10;
> >>>>my @params=( prog=>$prog,
> >>>>	data=>$db,
> >>>>	expect=>$e_val,
> >>>>	readmethod=>'SearchIO');
> >>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> >>>>
> >>>>my $v = 1;
> >>>>
> >>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> >>>>
> >>>>while (my $input = $str->next_seq()){
> >>>>  #Blast a sequence against a database:
> >>>>  #Alternatively, you could  pass in a file with many
> >>>>  #sequences rather than loop through sequence one at a time
> >>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> >>>>  #and swap the two lines below for an example of that.
> >>>>  my $r = $factory->submit_blast($input);
> >>>>  #my $r = $factory->submit_blast('amino.fa');
> >>>>  print STDERR "waiting..." if( $v > 0 );
> >>>>  while ( my @rids = $factory->each_rid ) {
> >>>>    foreach my $rid ( @rids ) {
> >>>>      my $rc = $factory->retrieve_blast($rid);
> >>>>      if( !ref($rc) ) {
> >>>>        if( $rc < 0 ) {
> >>>>          $factory->remove_rid($rid);
> >>>>        }
> >>>>        print STDERR "." if ( $v > 0 );
> >>>>        sleep 5;
> >>>>      } else {
> >>>>        my $result = $rc->next_result();
> >>>>        #save the output
> >>>>        my $filename = $result->query_name()."\.out";
> >>>>        $factory->save_output($filename);
> >>>>        $factory->remove_rid($rid);
> >>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>        while ( my $hit = $result->next_hit ) {
> >>>>          next unless ( $v > 0);
> >>>>          print "\thit name is ", $hit->name, "\n";
> >>>>          while( my $hsp = $hit->next_hsp ) {
> >>>>            print "\t\tscore is ", $hsp->score, "\n";
> >>>>          }
> >>>>        }
> >>>>      }
> >>>>    }
> >>>>  }
> >>>>}
> >>>>
> >>>>Thank you for your help!
> >>>>
> >>>>
> >>>>Guojun
> >>>>Department of Plant Biology
> >>>>University of Georgia
> >>>>
> >>>>----- Original Message -----
> >>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>To: gyang at plantbio.uga.edu
> >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>>Try two things:
> >>>>>
> >>>>>1)  Use a much simpler script, like the one in 'perldoc
> >>>>>Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something wrong
> >>>>>with the logic in your subroutine:
> >>>>>
> >>>>>my $v = 1;
> >>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> >>>>>while (my $input = $str->next_seq()){
> >>>>>  #Blast a sequence against a database:
> >>>>>  #Alternatively, you could  pass in a file with many
> >>>>>  #sequences rather than loop through sequence one at a time
> >>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> >>>>>  #and swap the two lines below for an example of that.
> >>>>>  my $r = $factory->submit_blast($input);
> >>>>>  #my $r = $factory->submit_blast('amino.fa');
> >>>>>  print STDERR "waiting..." if( $v > 0 );
> >>>>>  while ( my @rids = $factory->each_rid ) {
> >>>>>    foreach my $rid ( @rids ) {
> >>>>>      my $rc = $factory->retrieve_blast($rid);
> >>>>>      if( !ref($rc) ) {
> >>>>>        if( $rc < 0 ) {
> >>>>>          $factory->remove_rid($rid);
> >>>>>        }
> >>>>>        print STDERR "." if ( $v > 0 );
> >>>>>        sleep 5;
> >>>>>      } else {
> >>>>>        my $result = $rc->next_result();
> >>>>>        #save the output
> >>>>>        my $filename = $result->query_name()."\.out";
> >>>>>        $factory->save_output($filename);
> >>>>>        $factory->remove_rid($rid);
> >>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>>        while ( my $hit = $result->next_hit ) {
> >>>>>          next unless ( $v > 0);
> >>>>>          print "\thit name is ", $hit->name, "\n";
> >>>>>          while( my $hsp = $hit->next_hsp ) {
> >>>>>            print "\t\tscore is ", $hsp->score, "\n";
> >>>>>          }
> >>>>>        }
> >>>>>      }
> >>>>>    }
> >>>>>  }
> >>>>>}
> >>>>>
> >>>>>
> >>>>>2) Try the RemoteBlast from Bugzilla and see if that works.  It really
> >>>>>shouldn't make that much of a difference, but I noticed that the CVS
> >>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> >>>>>released; the Bugzilla version is based off CVS.
> >>>>>
> >>>>>Christopher Fields
> >>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>Dept. of Biochemistry
> >>>>>University of Illinois Urbana-Champaign
> >>>>>
> >>>>>
> >>>>>>>-----Original Message-----
> >>>>>>>
> >>>>>>>
> >>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> >>>>>>Sent: Monday, February 13, 2006 3:00 PM
> >>>>>>To: bioperl-l at lists.open-bio.org
> >>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>>>
> >>>>>>
> >>>>>>Thanks, Chris,
> >>>>>>
> >>>>>>I installed version 1.5.1 and replaced the blast.pm file with the one from
> >>>>>>your bug report. The running version is 1.5 when I use the command you
> >>>>>>sent me. But when I tried the script, it doesn't change much. My
> >>>>>>remoteblast code (portion) is here:
> >>>>>>
> >>>>>>
> >>>>>>sub search {
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}='no';
> >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> >>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> >>>>>>			      -id=>"query",
> >>>>>>			      -desc=>"new seq");
> >>>>>>my $len=$query->length();
> >>>>>>@db=('nr','htgs','wgs');
> >>>>>>foreach my $db (@db) {
> >>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog' =>'blastn',
> >>>>>>						'-data' =>"$db",
> >>>>>>						'-expect'=>"$E_value");
> >>>>>>my $blast_report = $factory->submit_blast($query);
> >>>>>>my @rids = $factory->each_rid();
> >>>>>>foreach my $rid ( @rids ) {
> >>>>>>    print STDERR "$rid\n";
> >>>>>>}
> >>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> >>>>>>print STDERR "waiting...";
> >>>>>>sleep 60;
> >>>>>>
> >>>>>>foreach my $rid ( @rids ) {
> >>>>>>    my $rc = $factory->retrieve_blast($rid);
> >>>>>>    while (!ref($rc) ) {
> >>>>>>	if( $rc < 0 ) {
> >>>>>># retrieve_blast returns -1 on error
> >>>>>>	    $factory->remove_rid($rid);
> >>>>>>	    print "Error!\n";
> >>>>>>	    send_error($email,$function,$seqname,$queryname[$ST]);
> >>>>>>	    die "Can't retrieve $rid";
> >>>>>>	}
> >>>>>>	if ($rc==0) { # retrieve_blast returns 0 on 'job not finished'
> >>>>>>	    sleep 60;
> >>>>>>	    $rc = $factory->retrieve_blast($rid);
> >>>>>>	}
> >>>>>>    }
> >>>>>>    if (ref($rc)) {
> >>>>>>	print STDERR "Done.\n";
> >>>>>>	 while( my $result = $rc->next_result) {
> >>>>>>	    while( my $hit = $result->next_hit()) {
> >>>>>>	    	$hit_name=$hit->name;
> >>>>>>		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> >>>>>>		$name=$1;
> >>>>>>		@left_plus_start=();
> >>>>>>		@left_plus_end=();
> >>>>>>		@left_minus_start=();
> >>>>>>		@left_minus_end=();
> >>>>>>		@right_plus_start=();
> >>>>>>		@right_plus_end=();
> >>>>>>		@right_minus_start=();
> >>>>>>		@right_minus_end=();
> >>>>>>		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> >>>>>>		while( my $hsp = $hit->next_hsp()) {
> >>>>>>......
> >>>>>>
> >>>>>>
> >>>>>>It was working quite well until around October last year, but it has
> >>>>>>stopped since then. When a submission is sent via a webpage, the CGI
> >>>>>>starts to work and uses ~20 Mb of memory. Then it hangs there;
> >>>>>>finally the expected email is received, but without real results, although
> >>>>>>it does contain something from other parts of the script. Apparently the
> >>>>>>search sub did not return anything (I know there is something that should
> >>>>>>be returned). Is it also possible that the format of the NCBI output for
> >>>>>>each result has changed?
> >>>>>>Thank you,
> >>>>>>Guojun
> >>>>>>
> >>>>>>
> >>>>>>>>>>Department of Plant Biology
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>University of Georgia
> >>>>>>
> >>>>>>
> >>>>>>>>>>>>----- Original Message -----
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> >>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>>>
> >>>>>>
> >>>>>>>How do you know two versions are installed (i.e. how are you checking the
> >>>>>>>version)?  Do you have two complete bioperl distributions (in two
> >>>>>>>separate directories) or are you looking in modules?  Here's the way to
> >>>>>>>check the version (from the FAQ):
> >>>>>>>
> >>>>>>>perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> >>>>>>>
> >>>>>>>If you have two full bioperl distributions on your computer, normally only
> >>>>>>>one will be in use unless you have explicitly set the environment variable
> >>>>>>>PERL5LIB.  The PERL5LIB directories will be searched first, before your
> >>>>>>>normal perl directory list (@INC) is searched.  You MAY get some mixing
> >>>>>>>then, but only if perl can't find a particular module in the path designated
> >>>>>>>in PERL5LIB; then it will progress through the directories listed in @INC.
> >>>>>>>This may happen if a module is unique to a particular release, but shouldn't
> >>>>>>>happen for the majority of modules, including RemoteBlast.  You can check
> >>>>>>>what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will differ
> >>>>>>>depending on your OS, perl build, etc.
> >>>>>>>
> >>>>>>>Regardless, if you follow the directions for installing bioperl for your
> >>>>>>>system ('perl Makefile.PL', 'make', 'make test', 'make install', unless you
> >>>>>>>explicitly change the installation directory when using 'perl Makefile.PL'),
> >>>>>>>then 'uninstalling' Bioperl shouldn't be a problem as it will install the
> >>>>>>>Bioperl distribution you downloaded over the old version in @INC.  See this
> >>>>>>>page:
> >>>>>>>
> >>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> >>>>>>>
> >>>>>>>for more details.
> >>>>>>>
> >>>>>>>Christopher Fields
> >>>>>>>>
> >>>>>>>>
> >>>>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>>>Dept. of Biochemistry
> >>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>
> >>>>>>>
> >>>>>>>>>>-----Original Message-----
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> >>>>>>>>Sent: Monday, February 13, 2006 12:32 PM
> >>>>>>>>To: bioperl-l at lists.open-bio.org
> >>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>Hi, Chris,
> >>>>>>>>
> >>>>>>>>I do have different versions of bioperl on my Linux machine (1.4 and
> >>>>>>>>1.5.0); this may be the problem. Should I just install bioperl-1.5.1, or do
> >>>>>>>>I need to uninstall and remove the previous versions? I could not find any
> >>>>>>>>hint on uninstalling bioperl on Linux. Could you please give me some
> >>>>>>>>suggestions?
> >>>>>>>>Thanks,
> >>>>>>>>Guojun
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>Department of Plant Biology
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>University of Georgia
> >>>>>>>>      _____
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>  From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> >>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500
> >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>If you're using RemoteBlast 1.28, then you've likely updated from CVS,
> >>>>>>>>which isn't the latest fix.
> >>>>>>>>
> >>>>>>>>Make sure that you check the following:
> >>>>>>>>
> >>>>>>>>1) Always post to the mailing list:
> >>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> >>>>>>>>
> >>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live (CVS)
> >>>>>>>>installed first.  Perform a clean installation; do not upgrade only
> >>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we can't
> >>>>>>>>guarantee that mixing modules from old and new distributions (1.4 and
> >>>>>>>>1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> >>>>>>>>installation will allow text output from BLAST v.2.2.12 to be saved and
> >>>>>>>>parsed; it will not parse the newest BLAST text output from NCBI (v2.2.13)
> >>>>>>>>but it should still save it. I believe as long as next_result() isn't
> >>>>>>>>called, it will work.
> >>>>>>>>
> >>>>>>>>3) The bug fixes for the above issue with parsing BLAST 2.2.13 text output
> >>>>>>>>are NOT in CVS; they haven't been cleared and checked in by Roger Hall
> >>>>>>>>(who's now taking care of RemoteBlast) and the powers that be (Jason or
> >>>>>>>>whoever is in charge of Bio::SearchIO).  They can be found in Bugzilla:
> >>>>>>>>
> >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> >>>>>>>>
> >>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the option of
> >>>>>>>>saving XML output, so it isn't necessary if you don't plan on using this
> >>>>>>>>option.  And, remember, they haven't been committed to CVS yet, which
> >>>>>>>>means that the final version will change to reflect the new version.
> >>>>>>>>
> >>>>>>>>Christopher Fields
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>>>>Dept. of Biochemistry
> >>>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>    _____
> >>>>>>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>Sent: Monday, February 13, 2006 9:26 AM
> >>>>>>>>To: Chris Fields
> >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>Hi, Chris
> >>>>>>>>
> >>>>>>>>Thanks for your suggestion; however, it doesn't seem to work for my CGI
> >>>>>>>>even after I replace both blast.pm and RemoteBlast.pm. I didn't even get
> >>>>>>>>any RID. Is there any suggestion?
> >>>>>>>>
> >>>>>>>>Guojun
> >>>>>>>>
> >>>>>>>>Guojun Yang
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>Department of Plant Biology
> >>>>>>>>University of Georgia
> >>>>>>>>Tel: 706-542-1857
> >>>>>>>>Fax: 706-542-1805
> >>>>>>>>http://www.arches.uga.edu/~guojun
> >>>>>>>>    _____
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> >>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500
> >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>I would say give the new code a try, but realize that it hasn't been
> >>>>>>>>checked in (like I said below). I will try going over the modified
> >>>>>>>>Bio::SearchIO::blast again this weekend to see if there is anything I
> >>>>>>>>might have missed. The changed order in the header of BLAST text output has
> >>>>>>>>me a bit worried that it might not catch everything, but it at least doesn't
> >>>>>>>>hang in the while() loop I described in the bug report below (bug #1934) and
> >>>>>>>>seems to process everything fine.
> >>>>>>>>
> >>>>>>>>If you want more stability in the code, you might consider changing over
> >>>>>>>>to XML output and parsing with Bio::SearchIO::blastxml. There are some
> >>>>>>>>changes in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate saving
> >>>>>>>>XML output, but I believe it parses everything regardless. If you look back
> >>>>>>>>over the last month or so there has been a bit of discussion here about it.
> >>>>>>>>Jason describes a bit on how to set up RemoteBlast for XML:
> >>>>>>>>
> >>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using-remoteblast/
> >>>>>>>>
> >>>>>>>>Christopher Fields
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>>>>Dept. of Biochemistry
> >>>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>-----Original Message-----
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> >>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM
> >>>>>>>>>To: bioperl-l at bioperl.org
> >>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm version 1.28
> >>>>>>
> >>>>>>
> >>>>>>>>>Hi, Everybody,
> >>>>>>>>>I see this post and am wondering if this is the reason for the
> >>>>>>>>>malfunctioning of my webserver. We set up a webserver named MAK, for MITE
> >>>>>>>>>sequence analysis. It was working very well until around November 2005,
> >>>>>>>>>when it stopped returning any result (the site is fine and seems to be
> >>>>>>>>>doing something after submission). In the CGI script, I used remoteblast
> >>>>>>>>>(that work was done in 2003) to do searches. I currently do not have access
> >>>>>>>>>to the server because I moved. Quite a few people sent emails to us about
> >>>>>>>>>its malfunctioning. Is there any suggestion on fixing the problem? Should
> >>>>>>>>>I simply ask that RemoteBlast.pm be replaced with the new version?
> >>>>>>>>>Thanks a lot,
> >>>>>>>>>Guojun
> >>>>>>>>>
> >>>>>>>>>Department of Plant Biology
> >>>>>>>>>University of Georgia
> >>>>>>>>>Tel: 706-542-1857
> >>>>>>>>>Fax: 706-542-1805
> >>>>>>>>>http://www.arches.uga.edu/~guojun
> >>>>>>>>>_____
> >>>>>>>>>
> >>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> >>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang Jian'
> >>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> >>>>>>>>>[mailto:bioperl-l at bioperl.org]
> >>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500
> >>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> >>>>>>>>>
> >>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live CVS. It
> >>>>>>>>>will work for saving text output. However, it will not parse anything using
> >>>>>>>>>next_result (it will likely hang) and will not save XML format. See these
> >>>>>>>>>bugs:
> >>>>>>>>>
> >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> >>>>>>>>>
> >>>>>>>>>for explanations and possible fixes (changes to RemoteBlast and
> >>>>>>>>>Bio::SearchIO::blast). Note that these haven't been checked in yet so are
> >>>>>>>>>still not included in bioperl-live; they may be further modified before
> >>>>>>>>>committing to CVS. If you're not worried about XML, you could just try the
> >>>>>>>>>first fix, which is a change to SearchIO::blast.
> >>>>>>>>>
> >>>>>>>>>Nagesh, I remember you posting to the list a month ago using a script
> >>>>>>>>>which had problems; the script you used saves the output but doesn't
> >>>>>>>>>actually parse it (i.e. you don't use next_result() to go through the data).
> >>>>>>>>>Is the version of BLAST in your text output 2.2.12 or 2.2.13? Have you tried
> >>>>>>>>>parsing the output using "-readmethod => SearchIO" or "-readmethod =>
> >>>>>>>>>blast" using your version of RemoteBlast and method next_result()? Like
> >>>>>>>>>below (from perldoc):
> >>>>>>>>>
> >>>>>>>>>while ( my @rids = $factory->each_rid ) {
> >>>>>>>>>foreach my $rid ( @rids ) {
> >>>>>>>>>my $rc = $factory->retrieve_blast($rid);
> >>>>>>>>>if( !ref($rc) ) {
> >>>>>>>>>if( $rc < 0 ) {
> >>>>>>>>>$factory->remove_rid($rid);
> >>>>>>>>>}
> >>>>>>>>>print STDERR "." if ( $v > 0 );
> >>>>>>>>>sleep 5;
> >>>>>>>>>} else { # parsing
> >>>>>>>>>starts here
> >>>>>>>>>my $result = $rc->next_result(); # it should hang
> >>>>>>>>>here
> >>>>>>>>>#save the output
> >>>>>>>>>my $filename = $result->query_name()."\.out";
> >>>>>>>>>$factory->save_output($filename);
> >>>>>>>>>$factory->remove_rid($rid);
> >>>>>>>>>print "\nQuery Name: ", $result->query_name(), "\n";
> >>>>>>>>>while ( my $hit = $result->next_hit ) {
> >>>>>>>>>next unless ( $v > 0);
> >>>>>>>>>print "\thit name is ", $hit->name, "\n";
> >>>>>>>>>while( my $hsp = $hit->next_hsp ) {
> >>>>>>>>>print "\t\tscore is ", $hsp->score, "\n";
> >>>>>>>>>}
> >>>>>>>>>}
> >>>>>>>>>}
> >>>>>>>>>}
> >>>>>>>>>}
> >>>>>>>>>}
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>My script hanged if I used next_result() in any way prior to
> >>>>>>>>>
> >>>>>>>>>
> >>>the
> >>>
> >>>
> >>>>>>fixes.
> >>>>>>
> >>>>>>
> >>>>>>>>I
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>want to see how many others are having the same issues with
> >>>>>>>>>
> >>>>>>>>>
> >>>>parsing
> >>>>
> >>>>
> >>>>>>>>using
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>the CVS version of bioperl-live.
> >>>>>>>>>
> >>>>>>>>>Christopher Fields
> >>>>>>>>>Postdoctoral Researcher - Switzer Lab
> >>>>>>>>>Dept. of Biochemistry
> >>>>>>>>>University of Illinois Urbana-Champaign
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>-----Original Message-----
> >>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> >>>>>>>>>>
> >>>>>>>>>>
> >>>l-
> >>>
> >>>
> >>>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> >>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM
> >>>>>>>>>>To: Huang Jian; bioperl-l
> >>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> >>>>>>>>>>
> >>>>>>>>>>Hi Huang,
> >>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>works
> >>>>
> >>>>
> >>>>>>on
> >>>>>>
> >>>>>>
> >>>>>>>>the
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>logic of checking the temporary file size to determine
> >>>>>>>>>>
> >>>>>>>>>>
> >>>whether
> >>>
> >>>
> >>>>the
> >>>>
> >>>>
> >>>>>>>>Blast
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>results are ready. This condition is not getting satisfied
> >>>>>>>>>>
> >>>>>>>>>>
> >>>may
> >>>
> >>>
> >>>>be
> >>>>
> >>>>
> >>>>>>due
> >>>>>>
> >>>>>>
> >>>>>>>>to
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>some changes brought about by NCBI. I had this problem
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>recently
> >>>>
> >>>>
> >>>>>>and
> >>>>>>
> >>>>>>
> >>>>>>>>>>figured out that the solution was to use the latest version
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>which
> >>>>
> >>>>
> >>>>>>has
> >>>>>>
> >>>>>>
> >>>>>>>>>>this problem fixed (does not use file size logic any more)
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>which
> >>>>
> >>>>
> >>>>>>is
> >>>>>>
> >>>>>>
> >>>>>>>>not
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>yet included in the BioPerl package.
> >>>>>>>>>>Cheers
> >>>>>>>>>>Nagesh
> >>>>>>>>>>
> >>>>>>>>>>Huang Jian wrote:
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>>Dear Nagesh,
> >>>>>>>>>>>
> >>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>you
> >>>>
> >>>>
> >>>>>>send
> >>>>>>
> >>>>>>
> >>>>>>>>>>>me. Now it works perfectly!!!
> >>>>>>>>>>>
> >>>>>>>>>>>Thank you!!
> >>>>>>>>>>>
> >>>>>>>>>>>Huang
> >>>>>>>>>>>
> >>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka"
> >>>>>>>>>>>
> >>>>>>>>>>>To: "Huang Jian" ; "bioperl-l"
> >>>>>>>>>>>
> >>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM
> >>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>net,
> >>>
> >>>
> >>>>so
> >>>>
> >>>>
> >>>>>>still
> >>>>>>
> >>>>>>
> >>>>>>>>>>>via email
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>>Hi Huang,
> >>>>>>>>>>>>I see that you are submitting a sequence for a remote
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>blast
> >>>
> >>>
> >>>>>>search.
> >>>>>>
> >>>>>>
> >>>>>>>>>Can
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>>you check if the RemoteBlast.pm being used is v 1.28
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>(2005/12/09).
> >>>>>>
> >>>>>>
> >>>>>>>>If
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>>>>>not I have attached it with this email, try to replace it
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>with
> >>>>
> >>>>
> >>>>>>the
> >>>>>>
> >>>>>>
> >>>>>>>>>old
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>>>>>one which has a bug.
> >>>>>>>>>>>>Let me know if it works.
> >>>>>>>>>>>>Nagesh
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>_______________________________________________
> >>>>>>>>>>Bioperl-l mailing list
> >>>>>>>>>>Bioperl-l at lists.open-bio.org
> >>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>>>>
> >>>>>>>>>>
> 
> Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From valiente at lsi.upc.edu  Mon Feb 20 18:51:35 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Mon, 20 Feb 2006 19:51:35 +0100
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <43FA0FB7.6060904@lsi.upc.edu>

The local flat file implementation of Bio::DB::Taxonomy seems to be fine:

use Bio::DB::Taxonomy;
my $nodesfile = "nodes.dmp";
my $namesfile = "names.dmp";
my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
                               -nodesfile => $nodesfile,
                               -namesfile => $namesfile);
my $taxonid = $db->get_taxonid('Homo sapiens');

Here, $taxonid is 9606. However,

my $species = $db->get_Taxonomy_Node(-taxonid => $taxonid);

raises:

-------------------- WARNING ---------------------
MSG: can't create a species object for Homo sapiens (human) because it isn't a species but is a '' instead
---------------------------------------------------
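
One way to narrow this down (only a guess at a diagnostic, not a fix) might be to look at what rank nodes.dmp itself records for 9606, bypassing Bio::DB::Taxonomy altogether. The sketch below assumes the standard NCBI taxdump layout, with fields separated by tab-pipe-tab and tax_id, parent tax_id and rank as the first three columns; rank_for_taxid is a made-up helper name and the two sample lines are illustrative, not copied from a real dump.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the rank column that a
# nodes.dmp-style filehandle records for the given taxon id.
sub rank_for_taxid {
    my ($fh, $taxid) = @_;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;                        # drop the trailing field terminator
        my ($id, $parent, $rank) = split /\t\|\t/, $line;
        return $rank if defined $id && $id eq $taxid;
    }
    return;    # taxon id not found
}

# Demonstration on an in-memory sample; for the real file use
#   open my $fh, '<', 'nodes.dmp' or die "nodes.dmp: $!";
my $sample = "9605\t|\t207598\t|\tgenus\t|\t\t|\n"
           . "9606\t|\t9605\t|\tspecies\t|\tHS\t|\n";
open my $fh, '<', \$sample or die "cannot open sample: $!";
my $rank = rank_for_taxid($fh, 9606);
print "rank of 9606: ", (defined $rank ? $rank : 'not found'), "\n";
```

If the file does say 'species' for 9606, the empty rank in the warning is presumably being lost somewhere between the flat file and the node object rather than in the data itself.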

Thanks,

Gabriel



From boris.steipe at utoronto.ca  Mon Feb 20 18:40:19 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Mon, 20 Feb 2006 13:40:19 -0500
Subject: [Bioperl-l] Matrix Average Code / Module ?
In-Reply-To: <59825.192.168.1.176.1140416461.squirrel@192.168.1.176>
References: <003b01c62d33$d37d15d0$e6028a0a@GOLHARMOBILE1>
	<76f031ae0602190552v5f2542dbv@mail.gmail.com>
	<59825.192.168.1.176.1140416461.squirrel@192.168.1.176>
Message-ID: <92CF0104-0524-4BA3-B039-3CEECF68E20B@utoronto.ca>

Assuming you mean the arithmetic average of all elements in a matrix,  
you could do the following (using your numbers):


#!/usr/bin/perl -w
use strict;

my @matrix;

push(@matrix, [(11,22,43,54,50)]); # [(...)]: a list passed as an anonymous array
push(@matrix, [(27,87,74,32,10)]);
push(@matrix, [(66,58,98,78,20)]);
push(@matrix, [(22,23,44,16,34)]);

my $sum = 0;
my $number = 0;

foreach my $row (@matrix) {
     foreach my $element (@{$row}){
         $sum += $element;
         $number++;
     }
}

print "Average of $number elements = ", $sum/$number,"\n";
exit;
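
The same loop can be wrapped in a small subroutine so it can be reused on any matrix. This is only a sketch: the name matrix_average is made up for illustration and is not part of BioPerl or any CPAN module, and it returns undef for an empty matrix instead of dividing by zero.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): arithmetic mean of all
# elements in a matrix given as a reference to an array of array refs.
sub matrix_average {
    my ($matrix) = @_;
    my ($sum, $count) = (0, 0);
    foreach my $row (@{$matrix}) {
        foreach my $element (@{$row}) {
            $sum += $element;
            $count++;
        }
    }
    return $count ? $sum / $count : undef;    # undef for an empty matrix
}

my @matrix = (
    [11, 22, 43, 54, 50],
    [27, 87, 74, 32, 10],
    [66, 58, 98, 78, 20],
    [22, 23, 44, 16, 34],
);
my $avg = matrix_average(\@matrix);
print "Average = $avg\n";
```

Passing the matrix by reference keeps the rows intact; the rows do not have to be the same length, since every element is simply counted as it is summed.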


HTH,

B.




On 20 Feb 2006, at 01:21, Shameer Khadar wrote:

> Hi all,
> Is there any program/module to calculate the average of a blosum/ 
> pam any
> matrix ?
>
> I have a matrix and I need to see the average
>
> for example
>
> 11 22 43 54 50
> 27 87 74 32 10
> 66 58 98 78 20
> 22 23 44 16 34
>
> I have gone through Bio::Matrix::MatrixI and  
> Bio::Matrix::GenericMatrix
> and other perl modules like Math::Matrix
> http://search.cpan.org/~ulpfr/Math-Matrix-0.4/Matrix.pm
> and Math::Cephes::Matrix - but none of them have a provison to do  
> matrix
> average calculation.
>
> Any help ???
> thanks in advance,
> Happy biocomputing !!!
>
>
> -- 
> Shameer Khadar
> National Centre for Biological Sciences (TIFR)
> UAS - GKVK Campus - Bellary Road Bangalore - 65 - Karnataka - India
> T - 91-080-23636420-32 EXT 4241
> F - 91-080-23636662/23636675
> W - http://www.ncbs.res.in
> --------------------------------------------------
> "Refrain from illusions, insist on work and not words,
>  patiently seek divine and scientific truth."
> MM
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From cjfields at uiuc.edu  Mon Feb 20 22:01:15 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 20 Feb 2006 16:01:15 -0600
Subject: [Bioperl-l] OK for aa seq but not a na seq on
	RemoteBlast.pm version 1.28
In-Reply-To: <000e01c6363f$494bc5e0$15327e82@pyrimidine>
Message-ID: <000001c63669$2bf06a80$15327e82@pyrimidine>

Guojun Yang pointed out that his BLAST output was still not parsed
correctly, so I posted another change:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

The direct link for the module is:

http://bugzilla.bioperl.org/attachment.cgi?id=289&action=view

Note that all caveats (can't sue if computer blows up, this is a very
preliminary bugfix, etc.) apply.

Apparently, NCBI has changed blastn and tblastx output to show features in
the region for each HSP, starting with either one of the following
lines:

 Features in this part of subject sequence:
 Features flanking this part of subject sequence:

If you're using a Bio::SearchIO::blast from before Dec. 2005 on BLAST 2.2.13
output, most blastn or tblastx report parsing seems to choke on these lines,
unless you are pretty lucky.  This extra little feature was introduced a
while back for large contigs and chromosomes (~BLAST 2.2.10) but was not set
by default and hadn't started affecting web output until this last fall.  The
first fix I posted caught only the first version but not the second.

The fix included a loop with debugging output to bypass this for now.  If
you use SearchIO directly for parsing (not through RemoteBlast) you can see
the bypassed lines by setting the '-verbose' flag to 1.
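
The attached module isn't reproduced here, but the idea of bypassing those lines can be sketched outside BioPerl roughly like this. strip_feature_blocks is a made-up name, and the assumption that a feature block ends at the next "Score =" line comes from the sample report quoted in this thread, not from any NCBI documentation. It is not the actual patch to Bio::SearchIO::blast.

```perl
use strict;
use warnings;

# Hypothetical filter, not the actual patch: drop NCBI "Features ..." blocks
# from the lines of a BLAST text report. A block starts at the header line
# and runs up to (but not including) the next "Score =" line.
sub strip_feature_blocks {
    my (@lines) = @_;
    my @kept;
    my $in_features = 0;
    foreach my $line (@lines) {
        if ($line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/) {
            $in_features = 1;
            next;
        }
        # The HSP statistics line marks the end of the feature block.
        $in_features = 0 if $in_features && $line =~ /^\s*Score\s*=/;
        push @kept, $line unless $in_features;
    }
    return @kept;
}

my @report = (
    " Features flanking this part of subject sequence:\n",
    "   2991 bp at 5' side: hypothetical protein\n",
    "   1131 bp at 3' side: hypothetical protein\n",
    "\n",
    " Score = 36.2 bits (18),  Expect = 0.22\n",
    " Identities = 18/18 (100%), Gaps = 0/18 (0%)\n",
);
print strip_feature_blocks(@report);
```

Pre-filtering a saved report this way might be a stopgap for anyone who cannot swap in the patched module, though it has only been reasoned through against the sample above.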

Thanks to Guojun Yang for pointing this out.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Chris Fields
> Sent: Monday, February 20, 2006 11:01 AM
> To: 'Pieter Monsieurs'; gyang at plantbio.uga.edu
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on
> RemoteBlast.pmversion 1.28
> 
> I have added a preliminary bugfix for the problems seen with nucleotide
> blast parsing for BLAST 2.2.13 reports.  I passed SearchIO::blast through
> perltidy to space out the blocks (really for my own purposes; it's a
> pretty complex module).  The fix bypasses the extra lines output for
> blastn and tblastx and now seems to parse the text output for those
> reports correctly.  I tested it using all NCBI BLAST flavors for the last
> two versions of BLAST (2.2.12 and 2.2.13); the fix was simple so it
> shouldn't break any other BLAST report parsing, such as WU-BLAST,
> RPS-BLAST, or Paracel.  It has only been tested on MacOSX at the moment,
> so I need people out there to test it on anything they can to make sure
> it works before committing.  I'll be trying it on Windows today.  Report
> back to me and I'll post anything on bugzilla.
> 
> Here it is:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> 
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> > Sent: Thursday, February 16, 2006 3:46 AM
> > To: gyang at plantbio.uga.edu
> > Cc: bioperl-l at lists.open-bio.org; Chris Fields
> > Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm
> > version 1.28
> >
> > Hi,
> >
> > I have the same problem with the blast.pm file.
> > The people at NCBI added some extra info to the BLAST output
> > (see e.g. "Features flanking this part..." or "Features in this part
> > ..."); an example is added below.
> > The blast.pm module starts looking for the HSP alignment information,
> > but it dies when it hits this feature information.
> >
> > Pieter
> >
> >
> > >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> > chromosome 12, complete sequence
> > Length=27492551
> >
> >  Features flanking this part of subject sequence:
> >
> > 3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
> >
> >
> > 2655 bp at 3' side: hypothetical protein
> >
> >
> >  Score = 36.2 bits (18),  Expect = 0.22
> >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> >  Strand=Plus/Minus
> >
> > Query  4         GTACTACTCTACTCTACT  21
> >                  ||||||||||||||||||
> >
> > Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> >
> >
> >  Features flanking this part of subject sequence:
> >
> > 2991 bp at 5' side: hypothetical protein
> >
> >    1131 bp at 3' side: hypothetical protein
> >
> >
> >
> >  Score = 36.2 bits (18),  Expect = 0.22
> >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> >  Strand=Plus/Minus
> >
> > Query  2         ATGTACTACTCTACTCTA  19
> >                  ||||||||||||||||||
> > Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> >
> >
> >
> >  Features in this part of subject sequence:
> >    DHHC zinc finger domain, putative
> >
> >
> >
> >  Score = 34.2 bits (17),  Expect = 0.87
> >  Identities = 17/17 (100%), Gaps = 0/17 (0%)
> >  Strand=Plus/Plus
> >
> > Query  5         TACTACTCTACTCTACT  21
> >                  |||||||||||||||||
> > Sbjct  17616437  TACTACTCTACTCTACT  17616453
> >
> >
> >
> >  Features flanking this part of subject sequence:
> >    102 bp at 5' side: bZIP transcription factor, putative
> >
> >
> >    3740 bp at 3' side: yeast dcp1, putative
> >
> >
> >  Score = 32.2 bits (16),  Expect = 3.4
> >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> >  Strand=Plus/Plus
> >
> > Query  7        CTACTCTACTCTACTC  22
> >                 ||||||||||||||||
> > Sbjct  2775880  CTACTCTACTCTACTC  2775895
> >
> >
> >  Features flanking this part of subject sequence:
> >
> >    21 bp at 5' side: peptide transporter T17F3.11, putative
> >
> >
> > 10230 bp at 3' side: transposon protein, putative, unclassified
> >
> >
> >  Score = 32.2 bits (16),  Expect = 3.4
> >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> >  Strand=Plus/Minus
> >
> > Query  7         CTACTCTACTCTACTC  22
> >
> >                  ||||||||||||||||
> > Sbjct  27323153  CTACTCTACTCTACTC  27323138
> >
> >
> >
> >
> > Guojun Yang wrote:
> >
> > >Hi, Chris,
> > >Finally the remoteblast test script works for the amino.fa query, but
> > >when I try a nucleic acid sequence (see below), an error occurs:
> > >"
> > >waiting........
> > >------------- EXCEPTION  -------------
> > >MSG: no data for midline  Features flanking this part of subject sequence:
> > >STACK Bio::SearchIO::blast::next_result /usr/lib/perl5/site_perl/5.8.3/Bio/SearchIO/blast.pm:1172
> > >STACK toplevel remoteblast_test:40
> > >"
> > >The query sequence is:
> > >CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> > >GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> > >AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> > >AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> > >
> > >The script (basically same as the remoteblast test, I only changed
> > database to 'nr' and program to 'blastn' and filename to 'ost3'):
> > >#!/usr/bin/perl
> > >
> > >use Bio::SeqIO;
> > >use Bio::Seq;
> > >use Bio::Tools::Run::RemoteBlast;
> > >use Bio::SearchIO;
> > >use strict;
> > >my $prog='blastn';
> > >my $db='nr';
> > >my $e_val=1e-10;
> > >my @params=( -prog=>$prog,
> > >	-data=>$db,
> > >	-expect=>$e_val,
> > >	-readmethod=>'SearchIO');
> > >my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >
> > >my $v = 1;
> > >
> > >my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> > >
> > >while (my $input = $str->next_seq()){
> > >  #Blast a sequence against a database:
> > >  #Alternatively, you could  pass in a file with many
> > >  #sequences rather than loop through sequence one at a time
> > >  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >  #and swap the two lines below for an example of that.
> > >  my $r = $factory->submit_blast($input);
> > >  #my $r = $factory->submit_blast('amino.fa');
> > >  print STDERR "waiting..." if( $v > 0 );
> > >  while ( my @rids = $factory->each_rid ) {
> > >    foreach my $rid ( @rids ) {
> > >      my $rc = $factory->retrieve_blast($rid);
> > >      if( !ref($rc) ) {
> > >        if( $rc < 0 ) {
> > >          $factory->remove_rid($rid);
> > >        }
> > >        print STDERR "." if ( $v > 0 );
> > >        sleep 5;
> > >      } else {
> > >        my $result = $rc->next_result();
> > >        #save the output
> > >        my $filename = $result->query_name()."\.out";
> > >        $factory->save_output($filename);
> > >        $factory->remove_rid($rid);
> > >        print "\nQuery Name: ", $result->query_name(), "\n";
> > >        while ( my $hit = $result->next_hit ) {
> > >          next unless ( $v > 0);
> > >          print "\thit name is ", $hit->name, "\n";
> > >          while( my $hsp = $hit->next_hsp ) {
> > >            print "\t\tscore is ", $hsp->score, "\n";
> > >          }
> > >        }
> > >      }
> > >    }
> > >  }
> > >}
> > >
> > >
> > >Do you think there might still be something in the NCBI output format?
> > >
> > >Thank you,
> > >Guojun
> > >
> > >
> > >
> > >
> > >Guojun Yang
> > >Department of Plant Biology
> > >University of Georgia
> > >Tel: 706-542-1857
> > >Fax: 706-542-1805
> > >http://www.arches.uga.edu/~guojun
> > >
> > >
> > >
> > >----- Original Message -----
> > >From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >
> > >
> > >
> > >
> > >>Sorry, forgot to add that I didn't see the regex issue that you
> > mentioned.
> > >>It could be a perl-related issue.  Try the fixes I mentioned and see
> > what
> > >>happens.
> > >>
> > >>
> > >>>Christopher Fields
> > >>>
> > >>>
> > >>Postdoctoral Researcher - Switzer Lab
> > >>Dept. of Biochemistry
> > >>University of Illinois Urbana-Champaign
> > >>
> > >>
> > >>>>>-----Original Message-----
> > >>>>>
> > >>>>>
> > >>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>Sent: Tuesday, February 14, 2006 12:36 PM
> > >>>To: 'gyang at plantbio.uga.edu'
> > >>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>>
> > >>>
> > >>>>>It's a good habit to always add single quotes around words.  The
> perl
> > >>>>>
> > >>>>>
> > >>>interpreter may think a single bare word is a subroutine or perlfunc
> > >>>called with no args so will try to find a subroutine named blastp().
> > My
> > >>>debugger actually gives the error that the bare word blastp may
> > conflict
> > >>>with a future reserved word.  Like you said, 'use strict' will point
> > that
> > >>>out.
> > >>>
> > >>>
> > >>>>>As for the regex, it should match all the blast programs at NCBI
> > (blastp,
> > >>>>>
> > >>>>>
> > >>>blastn, blastx, tblastn, tblastx) and is built-in to make sure
> nothing
> > >>>else passes through.
> > >>>
> > >>>
> > >>>>>So, if you are using the script below, there are several errors.
> The
> > bare
> > >>>>>
> > >>>>>
> > >>>words for $prog and $db need quotes, and the flags for you @params
> > array
> > >>>don't have a dash before them.  I get this after adding quotes but
> > before
> > >>>adding the dashes to @params:
> > >>>
> > >>>
> > >>>>>C:\Perl\Scripts>test_blast.pl
> > >>>>>------------- EXCEPTION: Bio::Root::Exception -------------
> > >>>>>
> > >>>>>
> > >>>MSG:
> > >>>STACK: Error::throw
> > >>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-
> > >>>live/Bio/Root/Root.pm:328
> > >>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> > >>>C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > >>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-
> > >>>live/Bio/Tools/Run/RemoteBlast.pm:256
> > >>>STACK: C:\Perl\Scripts\test_blast.pl:15
> > >>>-----------------------------------------------------------
> > >>>
> > >>>
> > >>>>>The last line indicates a problem with this line:
> > >>>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>>>>Changing the @params to this:
> > >>>>>my @params=( -prog=>$prog,
> > >>>>>
> > >>>>>
> > >>>	-data=>$db,
> > >>>	-expect=>$e_val,
> > >>>	-readmethod=>'SearchIO');
> > >>>
> > >>>
> > >>>>>fixes it, and I get output as expected.
> > >>>>>Christopher Fields
> > >>>>>
> > >>>>>
> > >>>Postdoctoral Researcher - Switzer Lab
> > >>>Dept. of Biochemistry
> > >>>University of Illinois Urbana-Champaign
> > >>>
> > >>>
> > >>>>>>>>-----Original Message-----
> > >>>>>>>>
> > >>>>>>>>
> > >>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > >>>>Sent: Tuesday, February 14, 2006 11:48 AM
> > >>>>To: Chris Fields; bioperl-l at lists.open-bio.org
> > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > >>>>
> > >>>>Hi, Chris,
> > >>>>When I tried with the perldoc script, It did not work either. First
> it
> > >>>>says $prog can not be bare word if I "use strict". I added quotes on
> > the
> > >>>>words, then it says the value for $prog does not match expression
> > >>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The
> > >>>>
> > >>>>
> > >>>script
> > >>>
> > >>>
> > >>>>is shown below. Why is the expression "t?blast[pnx]"?
> > >>>>
> > >>>>#!/usr/bin/perl
> > >>>>
> > >>>>use Bio::SeqIO;
> > >>>>use Bio::Seq;
> > >>>>use Bio::Tools::Run::RemoteBlast;
> > >>>>use Bio::SearchIO;
> > >>>>
> > >>>>
> > >>>>my $prog=blastp;
> > >>>>my $db=swissprot;
> > >>>>my $e_val=1e-10;
> > >>>>my @params=( prog=>$prog,
> > >>>>	data=>$db,
> > >>>>	expect=>$e_val,
> > >>>>	readmethod=>'SearchIO');
> > >>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > >>>>
> > >>>>my $v = 1;
> > >>>>
> > >>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > >>>>
> > >>>>while (my $input = $str->next_seq()){
> > >>>>  #Blast a sequence against a database:
> > >>>>  #Alternatively, you could  pass in a file with many
> > >>>>  #sequences rather than loop through sequence one at a time
> > >>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>>>  #and swap the two lines below for an example of that.
> > >>>>  my $r = $factory->submit_blast($input);
> > >>>>  #my $r = $factory->submit_blast('amino.fa');
> > >>>>  print STDERR "waiting..." if( $v > 0 );
> > >>>>  while ( my @rids = $factory->each_rid ) {
> > >>>>    foreach my $rid ( @rids ) {
> > >>>>      my $rc = $factory->retrieve_blast($rid);
> > >>>>      if( !ref($rc) ) {
> > >>>>        if( $rc < 0 ) {
> > >>>>          $factory->remove_rid($rid);
> > >>>>        }
> > >>>>        print STDERR "." if ( $v > 0 );
> > >>>>        sleep 5;
> > >>>>      } else {
> > >>>>        my $result = $rc->next_result();
> > >>>>        #save the output
> > >>>>        my $filename = $result->query_name()."\.out";
> > >>>>        $factory->save_output($filename);
> > >>>>        $factory->remove_rid($rid);
> > >>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>        while ( my $hit = $result->next_hit ) {
> > >>>>          next unless ( $v > 0);
> > >>>>          print "\thit name is ", $hit->name, "\n";
> > >>>>          while( my $hsp = $hit->next_hsp ) {
> > >>>>            print "\t\tscore is ", $hsp->score, "\n";
> > >>>>          }
> > >>>>        }
> > >>>>      }
> > >>>>    }
> > >>>>  }
> > >>>>}
> > >>>>
> > >>>>Thank you for your help!
> > >>>>
> > >>>>
> > >>>>Guojun
> > >>>>Department of Plant Biology
> > >>>>University of Georgia
> > >>>>
> > >>>>----- Original Message -----
> > >>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>To: gyang at plantbio.uga.edu
> > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>
> > >>>>
> > >>>>
> > >>>>
> > >>>>>Try two things:
> > >>>>>
> > >>>>>
> > >>>>>>1)  Use a much simpler script, like the one in 'perldoc
> > >>>>>>
> > >>>>>>
> > >>>>>Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something
> > >>>>>
> > >>>>>
> > >>>>wrong
> > >>>>
> > >>>>
> > >>>>>with the logic in your subroutine:
> > >>>>>
> > >>>>>
> > >>>>>>my $v = 1;
> > >>>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta'
> );
> > >>>>>>while (my $input = $str->next_seq()){
> > >>>>>>
> > >>>>>>
> > >>>>>  #Blast a sequence against a database:
> > >>>>>  #Alternatively, you could  pass in a file with many
> > >>>>>  #sequences rather than loop through sequence one at a time
> > >>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > >>>>>  #and swap the two lines below for an example of that.
> > >>>>>  my $r = $factory->submit_blast($input);
> > >>>>>  #my $r = $factory->submit_blast('amino.fa');
> > >>>>>  print STDERR "waiting..." if( $v > 0 );
> > >>>>>  while ( my @rids = $factory->each_rid ) {
> > >>>>>    foreach my $rid ( @rids ) {
> > >>>>>      my $rc = $factory->retrieve_blast($rid);
> > >>>>>      if( !ref($rc) ) {
> > >>>>>        if( $rc < 0 ) {
> > >>>>>          $factory->remove_rid($rid);
> > >>>>>        }
> > >>>>>        print STDERR "." if ( $v > 0 );
> > >>>>>        sleep 5;
> > >>>>>      } else {
> > >>>>>        my $result = $rc->next_result();
> > >>>>>        #save the output
> > >>>>>        my $filename = $result->query_name()."\.out";
> > >>>>>        $factory->save_output($filename);
> > >>>>>        $factory->remove_rid($rid);
> > >>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>        while ( my $hit = $result->next_hit ) {
> > >>>>>          next unless ( $v > 0);
> > >>>>>          print "\thit name is ", $hit->name, "\n";
> > >>>>>          while( my $hsp = $hit->next_hsp ) {
> > >>>>>            print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>          }
> > >>>>>        }
> > >>>>>      }
> > >>>>>    }
> > >>>>>  }
> > >>>>>}
> > >>>>>
> > >>>>>
> > >>>>>>2) Try the RemoteBlast from Bugzilla and see if that works.  It
> > >>>>>>
> > >>>>>>
> > >>>really
> > >>>
> > >>>
> > >>>>>shouldn't make that much of a difference, but I noticed that the
> CVS
> > >>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> > >>>>>released; the Bugzilla version is based off CVS.
> > >>>>>
> > >>>>>
> > >>>>>>Christopher Fields
> > >>>>>>
> > >>>>>>
> > >>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>Dept. of Biochemistry
> > >>>>>University of Illinois Urbana-Champaign
> > >>>>>
> > >>>>>
> > >>>>>>>-----Original Message-----
> > >>>>>>>
> > >>>>>>>
> > >>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>Sent: Monday, February 13, 2006 3:00 PM
> > >>>>>>To: bioperl-l at lists.open-bio.org
> > >>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>
> > >>>>>>
> > >>>>>>>>Thanks, Chris,
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>I installed version 1.5.1 and replaced the blast.pm file with the
> > >>>>>>
> > >>>>>>
> > >>>one
> > >>>
> > >>>
> > >>>>from
> > >>>>
> > >>>>
> > >>>>>>your bug report. The running version is 1.5 when I use the command
> > >>>>>>
> > >>>>>>
> > >>>you
> > >>>
> > >>>
> > >>>>>>sent me. But when I tried the script, it doesn't change much. My
> > >>>>>>remoteblast code (portion) is here:
> > >>>>>>
> > >>>>>>
> > >>>>>>>>sub search {
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>local
> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > >>>>>>local
> > >>>>>>
> > >>>>>>
> > >>>>>>
> >
> >>>$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}=
> > >>>
> > >>>
> > >>>>>>'no';
> > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > >>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > >>>>>>			      -id=>"query",
> > >>>>>>			      -desc=>"new seq");
> > >>>>>>my $len=$query->length();
> > >>>>>>@db=('nr','htgs','wgs');
> > >>>>>>foreach my $db (@db) {
> > >>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog'
> =>'blastn',
> > >>>>>>						'-data' =>"$db",
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>'-expect'=>"$E_value");
> > >>
> > >>
> > >>>>>>>>>>my $blast_report = $factory->submit_blast($query);
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>my @rids = $factory->each_rid();
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>foreach my $rid ( @rids ) {
> > >>>>>>    print STDERR "$rid\n";
> > >>>>>>}
> > >>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > >>>>>>print STDERR "waiting...";
> > >>>>>>sleep 60;
> > >>>>>>
> > >>>>>>
> > >>>>>>>>foreach my $rid ( @rids ) {
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>    my $rc = $factory->retrieve_blast($rid);
> > >>>>>>    while (!ref($rc) ) {
> > >>>>>>	if( $rc < 0 ) {
> > >>>>>># retrieve_blast returns -1 on error
> > >>>>>>	    $factory->remove_rid($rid);
> > >>>>>>	    print "Error!\n";
> > >>>>>>	    send_error($email,$function,$seqname,$queryname[$ST]);
> > >>>>>>	    die "Can't retrieve $rid";
> > >>>>>>	} if ($rc==0) { # retrieve_blast returns 0 on 'job not
> > >>>>>>
> > >>>>>>
> > >>>finished'
> > >>>
> > >>>
> > >>>>>>	    sleep 60;
> > >>>>>>	    $rc = $factory->retrieve_blast($rid);
> > >>>>>>	}
> > >>>>>>    }
> > >>>>>>    if (ref($rc)) {
> > >>>>>>	print STDERR "Done.\n";
> > >>>>>>	 while( my $result = $rc->next_result) {
> > >>>>>>	    while( my $hit = $result->next_hit()) {
> > >>>>>>	    	$hit_name=$hit->name;
> > >>>>>>		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > >>>>>>		$name=$1;
> > >>>>>>		@left_plus_start=();
> > >>>>>>		@left_plus_end=();
> > >>>>>>		@left_minus_start=();
> > >>>>>>		@left_minus_end=();
> > >>>>>>		@right_plus_start=();
> > >>>>>>		@right_plus_end=();
> > >>>>>>		@right_minus_start=();
> > >>>>>>		@right_minus_end=();
> > >>>>>>
> > >>>>>>
> > >>>>>>>>		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>		while( my $hsp = $hit->next_hsp()) {
> > >>>>>>......
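A note on the polling loop quoted above: retrieve_blast is described there as returning -1 on error, 0 while the job is not finished, and an object once the report is ready. If the remote side never finishes, a loop like that waits forever, which may be part of what a hanging CGI looks like. Below is a minimal sketch of the same convention with an upper bound on the number of attempts. StubFactory and poll_blast are hypothetical names used only so the example runs without BioPerl or network access; they are not part of RemoteBlast.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a RemoteBlast factory: reports "not finished" (0)
# twice, then returns a reference, mimicking the return convention quoted above.
package StubFactory;
sub new { my ($class) = @_; return bless { calls => 0 }, $class }
sub retrieve_blast {
    my ($self, $rid) = @_;
    $self->{calls}++;
    return $self->{calls} < 3 ? 0 : { rid => $rid };
}

package main;

# Poll until a reference comes back, an error (< 0) is reported,
# or $max_tries attempts have been used. Returns the result, or undef.
sub poll_blast {
    my ($factory, $rid, $max_tries, $delay) = @_;
    for my $try (1 .. $max_tries) {
        my $rc = $factory->retrieve_blast($rid);
        return $rc if ref $rc;     # finished: hand back the report object
        return     if $rc < 0;     # error reported by the factory
        sleep $delay if $delay;    # not finished yet: wait, then retry
    }
    return;                        # gave up after $max_tries attempts
}

my $result = poll_blast(StubFactory->new, 'RID-EXAMPLE', 5, 0);
print defined $result ? "got result for $result->{rid}\n" : "no result\n";
```

With a real factory the delay would be something like the 60 seconds used in the script above, and the caller would report a timeout when undef comes back instead of waiting indefinitely.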
> > >>>>>>
> > >>>>>>
> > >>>>>>>>It was working quite well until around October last year, but
> > >>>>>>>>
> > >>>>>>>>
> > >>>>it has
> > >>>>
> > >>>>
> > >>>>>>stopped since then, When a submission is sent via a webpage, the
> cgi
> > >>>>>>starts to work and use a memory of ~20 Mb. Then it hangs there,
> > >>>>>>
> > >>>>>>
> > >>>>finally
> > >>>>
> > >>>>
> > >>>>>>the expected email is received but without real results although
> it
> > >>>>>>
> > >>>>>>
> > >>>>does
> > >>>>
> > >>>>
> > >>>>>>contain something from other parts of the script. Apparently the
> > >>>>>>
> > >>>>>>
> > >>>>search
> > >>>>
> > >>>>
> > >>>>>>sub did not return anything (I know there is something should be
> > >>>>>>returned.). Is it also possible the format of the NCBI output for
> > >>>>>>
> > >>>>>>
> > >>>each
> > >>>
> > >>>
> > >>>>>>result has changed?
> > >>>>>>Thank you,
> > >>>>>>Guojun
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>Department of Plant Biology
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>University of Georgia
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>>>----- Original Message -----
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>>How do you know two versions are installed (i.e. how are
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>you
> > >>>
> > >>>
> > >>>>checking
> > >>>>
> > >>>>
> > >>>>>>the
> > >>>>>>
> > >>>>>>
> > >>>>>>>version)?  Do you have two complete bioperl distributions (in
> > >>>>>>>
> > >>>>>>>
> > >>>>two
> > >>>>
> > >>>>
> > >>>>>>>separate directories) or are you looking in modules?  Here's the
> > >>>>>>>
> > >>>>>>>
> > >>>way
> > >>>
> > >>>
> > >>>>to
> > >>>>
> > >>>>
> > >>>>>>>check the version (from the FAQ):
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>>perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> > >>>>
> > >>>>
> > >>>>>>>>If you have two full bioperl distributions on your computer,
> > >>>>>>>>
> > >>>>>>>>
> > >>>>normally
> > >>>>
> > >>>>
> > >>>>>>only
> > >>>>>>
> > >>>>>>
> > >>>>>>>one will be in use unless you have explicitly set the environment
> > >>>>>>>
> > >>>>>>>
> > >>>>>>variable
> > >>>>>>
> > >>>>>>
> > >>>>>>>PERL5LIB.  The PERL5LIB  directories will be searched first
> before
> > >>>>>>>
> > >>>>>>>
> > >>>>your
> > >>>>
> > >>>>
> > >>>>>>>normal perl directory list (@INC) is searched.  You MAY get some
> > >>>>>>>
> > >>>>>>>
> > >>>>mixing
> > >>>>
> > >>>>
> > >>>>>>>then, but only if perl can't find a particular module in the path
> > >>>>>>>
> > >>>>>>>
> > >>>>>>designated
> > >>>>>>
> > >>>>>>
> > >>>>>>>in PERL5LIB; then it will progress through the directories listed
> > >>>>>>>
> > >>>>>>>
> > >>>in
> > >>>
> > >>>
> > >>>>>>@INC.
> > >>>>>>
> > >>>>>>
> > >>>>>>>This may happen if a module is unique to a particular release,
> but
> > >>>>>>>
> > >>>>>>>
> > >>>>>>shouldn't
> > >>>>>>
> > >>>>>>
> > >>>>>>>happen for the majority of modules, including RemoteBlast.  You
> > >>>>>>>
> > >>>>>>>
> > >>>can
> > >>>
> > >>>
> > >>>>>>check
> > >>>>>>
> > >>>>>>
> > >>>>>>>what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will
> > >>>>>>>
> > >>>>>>>
> > >>>>differ
> > >>>>
> > >>>>
> > >>>>>>>depending on your OS, perl build, etc.
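To make the @INC / PERL5LIB point above concrete: besides 'perl -V', Perl records the file each module was actually loaded from in %INC, so you can see which copy of a module wins when two distributions are installed. A minimal sketch follows; module_source is a hypothetical helper name (not part of BioPerl), and File::Spec is included only because it is a core module that should always be found.

```perl
use strict;
use warnings;

# Return the file a module was loaded from, or undef if it cannot be loaded.
# module_source() is a hypothetical helper name, not part of BioPerl.
sub module_source {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Bio::Root::Version -> Bio/Root/Version.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

# Directories are searched in this order; PERL5LIB entries come first.
print "Search path (\@INC):\n";
print "  $_\n" for @INC;

for my $module (qw(File::Spec Bio::Root::Version Bio::Tools::Run::RemoteBlast)) {
    my $path = module_source($module);
    printf "%-32s %s\n", $module, defined $path ? $path : '(not found)';
}
```

If the path printed for Bio::Tools::Run::RemoteBlast sits in a different tree than the one for Bio::Root::Version, modules from two installations are probably being mixed.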
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>>Regardless, if you follow the directions for installing bioperl
> > >>>>>>>>
> > >>>>>>>>
> > >>>>for
> > >>>>
> > >>>>
> > >>>>>>your
> > >>>>>>
> > >>>>>>
> > >>>>>>>system ('perl Makefile.PL', 'make', 'make test', 'make install',
> > >>>>>>>
> > >>>>>>>
> > >>>>unless
> > >>>>
> > >>>>
> > >>>>>>you
> > >>>>>>
> > >>>>>>
> > >>>>>>>explicitly change the installation directory when using 'perl
> > >>>>>>>
> > >>>>>>>
> > >>>>>>Makefile.PL'),
> > >>>>>>
> > >>>>>>
> > >>>>>>>then 'uninstalling' Bioperl shouldn't be a problem as it will
> > >>>>>>>
> > >>>>>>>
> > >>>>install
> > >>>>
> > >>>>
> > >>>>>>the
> > >>>>>>
> > >>>>>>
> > >>>>>>>Bioperl distribution you downloaded over the old version in @INC.
> > >>>>>>>
> > >>>>>>>
> > >>>>See
> > >>>>
> > >>>>
> > >>>>>>this
> > >>>>>>
> > >>>>>>
> > >>>>>>>page:
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > >>>>>>>>for more details.
> > >>>>>>>>Christopher Fields
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>>>Dept. of Biochemistry
> > >>>>>>>University of Illinois Urbana-Champaign
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>>>>-----Original Message-----
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>>>Sent: Monday, February 13, 2006 12:32 PM
> > >>>>>>>>To: bioperl-l at lists.open-bio.org
> > >>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>Hi, Chris,
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>I do have different versions of bioperl on my Linux machine
> > >>>>>>>>
> > >>>>>>>>
> > >>>(1.4.
> > >>>
> > >>>
> > >>>>and
> > >>>>
> > >>>>
> > >>>>>>>>1.5.0), this may be the problem. Should I just install bioperl-
> > >>>>>>>>
> > >>>>>>>>
> > >>>>1.5.1
> > >>>>
> > >>>>
> > >>>>>>or I
> > >>>>>>
> > >>>>>>
> > >>>>>>>>need to uninstall and remove the previous versions. I could not
> > >>>>>>>>
> > >>>>>>>>
> > >>>>find
> > >>>>
> > >>>>
> > >>>>>>any
> > >>>>>>
> > >>>>>>
> > >>>>>>>>hint on uninstalling bioperl on linux. Could you please give me
> > >>>>>>>>
> > >>>>>>>>
> > >>>>some
> > >>>>
> > >>>>
> > >>>>>>>>suggestion?
> > >>>>>>>>Thanks,
> > >>>>>>>>Guojun
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>Department of Plant Biology
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>University of Georgia
> > >>>>>>>>      _____
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>  From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > >>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>version
> > >>>>>>
> > >>>>>>
> > >>>>>>>>1.28
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>>>>>If you're using RemoteBlast 1.28, then you've likely
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>updated from CVS
> > >>>>>>
> > >>>>>>
> > >>>>>>>>which isn't the latest fix.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>Make sure that you check the following:
> > >>>>>>>>>>1) Always post to the mailing list:
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>(CVS)
> > >>>>
> > >>>>
> > >>>>>>>>installed first.  Perform a clean installation; do not upgrade
> > >>>>>>>>
> > >>>>>>>>
> > >>>>only
> > >>>>
> > >>>>
> > >>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we
> > >>>>>>>>
> > >>>>>>>>
> > >>>can't
> > >>>
> > >>>
> > >>>>>>>>guarantee that mixing modules from old and new distributions
> > >>>>>>>>
> > >>>>>>>>
> > >>>(1.4
> > >>>
> > >>>
> > >>>>and
> > >>>>
> > >>>>
> > >>>>>>>>1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > >>>>>>>>installation will allow text output from BLAST v.2.2.12 to be
> > >>>>>>>>
> > >>>>>>>>
> > >>>>saved
> > >>>>
> > >>>>
> > >>>>>>and
> > >>>>>>
> > >>>>>>
> > >>>>>>>>parsed; it will not parse the newest BLAST text output from NCBI
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>(v2.2.13)
> > >>>>>>
> > >>>>>>
> > >>>>>>>>but it should still save it. I believe as long as next_result()
> > >>>>>>>>
> > >>>>>>>>
> > >>>>isn't
> > >>>>
> > >>>>
> > >>>>>>>>called, it will work.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>3) The bug fixes for the above issue with parsing BLAST
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>2.2.13
> > >>>
> > >>>
> > >>>>>>text output
> > >>>>>>
> > >>>>>>
> > >>>>>>>>are NOT in CVS; they haven't been cleared and checked in by
> > >>>>>>>>
> > >>>>>>>>
> > >>>Roger
> > >>>
> > >>>
> > >>>>Hall
> > >>>>
> > >>>>
> > >>>>>>>>(who's now taking care of RemoteBlast) and the powers that be
> > >>>>>>>>
> > >>>>>>>>
> > >>>>(Jason
> > >>>>
> > >>>>
> > >>>>>>or
> > >>>>>>
> > >>>>>>
> > >>>>>>>>whomever is in charge of Bio::SearchIO).  They can be found in
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>Bugzilla:
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>option
> > >>>>
> > >>>>
> > >>>>>>of
> > >>>>>>
> > >>>>>>
> > >>>>>>>>saving XML output, so isn't necessary if you don't plan on using
> > >>>>>>>>
> > >>>>>>>>
> > >>>>this
> > >>>>
> > >>>>
> > >>>>>>>>option.  And, remember, they haven't been committed yet to CVS,
> > >>>>>>>>
> > >>>>>>>>
> > >>>>which
> > >>>>
> > >>>>
> > >>>>>>>>means that the final version will change to reflect the new
> > >>>>>>>>
> > >>>>>>>>
> > >>>version.
> > >>>
> > >>>
> > >>>>>>>>>>>>Christopher Fields
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>Dept. of Biochemistry
> > >>>>>>>>University of Illinois Urbana-Champaign
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>>>    _____
> > >>>>>>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>Sent: Monday, February 13, 2006 9:26 AM
> > >>>>>>>>To: Chris Fields
> > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>version
> > >>>>>>
> > >>>>>>
> > >>>>>>>>1.28
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>>>Hi, Chris
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>Thanks for your suggestion, however, it doesn't seem to work
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>for
> > >>>>
> > >>>>
> > >>>>>>my cgi
> > >>>>>>
> > >>>>>>
> > >>>>>>>>even after I replace both blast.pm and RemoteBlast.pm. I didn't
> > >>>>>>>>
> > >>>>>>>>
> > >>>>even
> > >>>>
> > >>>>
> > >>>>>>get
> > >>>>>>
> > >>>>>>
> > >>>>>>>>any RID. Is there any suggestion?
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>>>>>Guojun
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>Guojun Yang
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>Department of Plant Biology
> > >>>>>>>>University of Georgia
> > >>>>>>>>Tel: 706-542-1857
> > >>>>>>>>Fax: 706-542-1805
> > >>>>>>>>http://www.arches.uga.edu/~guojun
> > >>>>>>>>    _____
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > >>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>version
> > >>>>>>
> > >>>>>>
> > >>>>>>>>1.28
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>I would say give the new code a try, but realize that it
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>hasn't
> > >>>>
> > >>>>
> > >>>>>>been
> > >>>>>>
> > >>>>>>
> > >>>>>>>>checked
> > >>>>>>>>in (like I said below). I will try going over the modified
> > >>>>>>>>Bio::SearchIO::blast again this weekend to see if there is
> > >>>>>>>>
> > >>>>>>>>
> > >>>>anything I
> > >>>>
> > >>>>
> > >>>>>>>>might
> > >>>>>>>>have missed. The changed order in the header of BLAST text
> > >>>>>>>>
> > >>>>>>>>
> > >>>output
> > >>>
> > >>>
> > >>>>has
> > >>>>
> > >>>>
> > >>>>>>me a
> > >>>>>>
> > >>>>>>
> > >>>>>>>>bit worried that it might not catch everything, but it at least
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>doesn't
> > >>>>>>
> > >>>>>>
> > >>>>>>>>hang
> > >>>>>>>>in the while() loop I described in the bug report below (bug
> > >>>>>>>>
> > >>>>>>>>
> > >>>>#1934)
> > >>>>
> > >>>>
> > >>>>>>and
> > >>>>>>
> > >>>>>>
> > >>>>>>>>seems to process everything fine.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>If you want more stability in the code, you might consider
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>changing over
> > >>>>>>
> > >>>>>>
> > >>>>>>>>to
> > >>>>>>>>XML output and parsing with Bio::SearchIO::blastxml. There are
> > >>>>>>>>
> > >>>>>>>>
> > >>>>some
> > >>>>
> > >>>>
> > >>>>>>>>changes
> > >>>>>>>>in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
> > >>>>>>>>
> > >>>>>>>>
> > >>>>saving
> > >>>>
> > >>>>
> > >>>>>>XML
> > >>>>>>
> > >>>>>>
> > >>>>>>>>output, but I believe it parses everything regardless. If you
> > >>>>>>>>
> > >>>>>>>>
> > >>>look
> > >>>
> > >>>
> > >>>>>>back
> > >>>>>>
> > >>>>>>
> > >>>>>>>>the
> > >>>>>>>>last month or so there has been a bit of discussion here about
> > >>>>>>>>
> > >>>>>>>>
> > >>>it.
> > >>>
> > >>>
> > >>>>>>Jason
> > >>>>>>
> > >>>>>>
> > >>>>>>>>describes a bit on how to set up RemoteBlast for XML:
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using-
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>remoteblast/
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>Christopher Fields
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>Dept. of Biochemistry
> > >>>>>>>>University of Illinois Urbana-Champaign
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>>-----Original Message-----
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > >>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM
> > >>>>>>>>>To: bioperl-l at bioperl.org
> > >>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>version
> > >>>>
> > >>>>
> > >>>>>>1.28
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>Hi, Everybody,
> > >>>>>>>>>I see this post and am wondering if this is the reason for the
> > >>>>>>>>>malfunctioning of my webserver. We set up a webserver named
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>MAK,
> > >>>>
> > >>>>
> > >>>>>>for
> > >>>>>>
> > >>>>>>
> > >>>>>>>>MITE
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>sequence analysis. It was working very well until around
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>November
> > >>>>
> > >>>>
> > >>>>>>2005,
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>when it stopped returning any result (the site is fine and
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>seems
> > >>>
> > >>>
> > >>>>to
> > >>>>
> > >>>>
> > >>>>>>be
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>doing sth after submission). In the CGI script, I used
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>remoteblast
> > >>>>
> > >>>>
> > >>>>>>(that
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>work was done in 2003) to do searches. I currently do not have
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>access to
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>the server because I moved. Quite several people sent emails
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>to
> > >>>
> > >>>
> > >>>>us
> > >>>>
> > >>>>
> > >>>>>>about
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>its malfunctioning. Is there any suggestion on fixing the
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>problem?
> > >>>>
> > >>>>
> > >>>>>>>>Should
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>I simply ask that remoteblast.pm be replaced with the new
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>version?
> > >>>>
> > >>>>
> > >>>>>>>>>Thanks a lot,
> > >>>>>>>>>Guojun
> > >>>>>>>>>
> > >>>>>>>>>Department of Plant Biology
> > >>>>>>>>>University of Georgia
> > >>>>>>>>>Tel: 706-542-1857
> > >>>>>>>>>Fax: 706-542-1805
> > >>>>>>>>>http://www.arches.uga.edu/~guojun
> > >>>>>>>>>_____
> > >>>>>>>>>
> > >>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > >>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>Jian'
> > >>>>
> > >>>>
> > >>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>[mailto:bioperl-
> > >>>
> > >>>
> > >>>>>>>>>l at bioperl.org]
> > >>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > >>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >>>>>>>>>
> > >>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>CVS.
> > >>>>
> > >>>>
> > >>>>>>It
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>will
> > >>>>>>>>>work for saving text output. However, it will not parse
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>anything
> > >>>
> > >>>
> > >>>>>>using
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>next_result (it will likely hang) and will not save XML
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>format.
> > >>>
> > >>>
> > >>>>See
> > >>>>
> > >>>>
> > >>>>>>>>these
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>bugs:
> > >>>>>>>>>
> > >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > >>>>>>>>>
> > >>>>>>>>>for explanations and possible fixes (changes to RemoteBlast
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>and
> > >>>
> > >>>
> > >>>>>>>>>Bio::SearchIO::blast). Note that these haven't been checked in
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>yet
> > >>>>
> > >>>>
> > >>>>>>so
> > >>>>>>
> > >>>>>>
> > >>>>>>>>are
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>still not included in bioperl-live; they may be further
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>modified
> > >>>
> > >>>
> > >>>>>>before
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>committing to CVS. If you're not worried about XML, you could
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>just
> > >>>>
> > >>>>
> > >>>>>>try
> > >>>>>>
> > >>>>>>
> > >>>>>>>>the
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>first fix, which is a change to SearchIO::blast.
> > >>>>>>>>>
> > >>>>>>>>>Nagesh, I remember you posting to the list a month ago using a
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>script
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>which
> > >>>>>>>>>had problems; the script you used saves the output but doesn't
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>actually
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>parse it (i.e. you don't use next_result() to go through the
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>data).
> > >>>>
> > >>>>
> > >>>>>>Is
> > >>>>>>
> > >>>>>>
> > >>>>>>>>the
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>version of BLAST in your text output 2.2.12 or 2.2.13? Have
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>you
> > >>>
> > >>>
> > >>>>>>tried
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>parsing the output using "-readmethod => SearchIO" or "-
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>readmethod
> > >>>>
> > >>>>
> > >>>>>>=>
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>blast"
> > >>>>>>>>>using your version of RemoteBlast and method next_result()?
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>Like
> > >>>
> > >>>
> > >>>>>>below
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>(from
> > >>>>>>>>>perldoc):
> > >>>>>>>>>
> > >>>>>>>>>while ( my @rids = $factory->each_rid ) {
> > >>>>>>>>>    foreach my $rid ( @rids ) {
> > >>>>>>>>>        my $rc = $factory->retrieve_blast($rid);
> > >>>>>>>>>        if( !ref($rc) ) {
> > >>>>>>>>>            if( $rc < 0 ) {
> > >>>>>>>>>                $factory->remove_rid($rid);
> > >>>>>>>>>            }
> > >>>>>>>>>            print STDERR "." if ( $v > 0 );
> > >>>>>>>>>            sleep 5;
> > >>>>>>>>>        } else { # parsing starts here
> > >>>>>>>>>            my $result = $rc->next_result(); # it should hang here
> > >>>>>>>>>            # save the output
> > >>>>>>>>>            my $filename = $result->query_name()."\.out";
> > >>>>>>>>>            $factory->save_output($filename);
> > >>>>>>>>>            $factory->remove_rid($rid);
> > >>>>>>>>>            print "\nQuery Name: ", $result->query_name(), "\n";
> > >>>>>>>>>            while ( my $hit = $result->next_hit ) {
> > >>>>>>>>>                next unless ( $v > 0);
> > >>>>>>>>>                print "\thit name is ", $hit->name, "\n";
> > >>>>>>>>>                while( my $hsp = $hit->next_hsp ) {
> > >>>>>>>>>                    print "\t\tscore is ", $hsp->score, "\n";
> > >>>>>>>>>                }
> > >>>>>>>>>            }
> > >>>>>>>>>        }
> > >>>>>>>>>    }
> > >>>>>>>>>}
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>My script hung if I used next_result() in any way prior to
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>the
> > >>>
> > >>>
> > >>>>>>fixes.
> > >>>>>>
> > >>>>>>
> > >>>>>>>>I
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>want to see how many others are having the same issues with
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>parsing
> > >>>>
> > >>>>
> > >>>>>>>>using
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>the CVS version of bioperl-live.
> > >>>>>>>>>
> > >>>>>>>>>Christopher Fields
> > >>>>>>>>>Postdoctoral Researcher - Switzer Lab
> > >>>>>>>>>Dept. of Biochemistry
> > >>>>>>>>>University of Illinois Urbana-Champaign
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>>-----Original Message-----
> > >>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>l-
> > >>>
> > >>>
> > >>>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > >>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM
> > >>>>>>>>>>To: Huang Jian; bioperl-l
> > >>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > >>>>>>>>>>
> > >>>>>>>>>>Hi Huang,
> > >>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>works
> > >>>>
> > >>>>
> > >>>>>>on
> > >>>>>>
> > >>>>>>
> > >>>>>>>>the
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>logic of checking the temporary file size to determine
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>whether
> > >>>
> > >>>
> > >>>>the
> > >>>>
> > >>>>
> > >>>>>>>>Blast
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>results are ready. This condition is not getting satisfied
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>may
> > >>>
> > >>>
> > >>>>be
> > >>>>
> > >>>>
> > >>>>>>due
> > >>>>>>
> > >>>>>>
> > >>>>>>>>to
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>some changes brought about by NCBI. I had this problem
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>recently
> > >>>>
> > >>>>
> > >>>>>>and
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>figured out that the solution was to use the latest version
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>which
> > >>>>
> > >>>>
> > >>>>>>has
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>this problem fixed (does not use file size logic any more)
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>which
> > >>>>
> > >>>>
> > >>>>>>is
> > >>>>>>
> > >>>>>>
> > >>>>>>>>not
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>yet included in the BioPerl package.
> > >>>>>>>>>>Cheers
> > >>>>>>>>>>Nagesh
> > >>>>>>>>>>
> > >>>>>>>>>>Huang Jian wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>>Dear Nagesh,
> > >>>>>>>>>>>
> > >>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>you
> > >>>>
> > >>>>
> > >>>>>>send
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>>me. Now it works perfectly!!!
> > >>>>>>>>>>>
> > >>>>>>>>>>>Thank you!!
> > >>>>>>>>>>>
> > >>>>>>>>>>>Huang
> > >>>>>>>>>>>
> > >>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka"
> > >>>>>>>>>>>
> > >>>>>>>>>>>To: "Huang Jian" ; "bioperl-l"
> > >>>>>>>>>>>
> > >>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM
> > >>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>net,
> > >>>
> > >>>
> > >>>>so
> > >>>>
> > >>>>
> > >>>>>>still
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>>>via email
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>>Hi Huang,
> > >>>>>>>>>>>>I see that you are submitting a sequence for a remote
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>blast
> > >>>
> > >>>
> > >>>>>>search.
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>Can
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>>>>you check if the RemoteBlast.pm being used is v 1.28
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>(2005/12/09).
> > >>>>>>
> > >>>>>>
> > >>>>>>>>If
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>>>>>not I have attached it with this email, try to replace it
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>with
> > >>>>
> > >>>>
> > >>>>>>the
> > >>>>>>
> > >>>>>>
> > >>>>>>>>>old
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>>>>one which has a bug.
> > >>>>>>>>>>>>Let me know if it works.
> > >>>>>>>>>>>>Nagesh
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>_______________________________________________
> > >>>>>>>>>>Bioperl-l mailing list
> > >>>>>>>>>>Bioperl-l at lists.open-bio.org
> > >>>>>>>>>>http://lists.open-bio.org/mailman/listinfo/bioperl-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From gyang at plantbio.uga.edu  Mon Feb 20 22:22:28 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Mon, 20 Feb 2006 17:22:28 -0500
Subject: [Bioperl-l] Tested-OK
Message-ID: <20060220172228.f7d22947@dogwood.plantbio.uga.edu>

Chris, I tested the latest fix for blast.pm on my Linux machine with blastn. It worked very well. My CGI script is still not returning what I need, but I don't think that is related to the parsing of BLAST results. Thanks for your great efforts.

Guojun 

----- Original Message -----
From: Chris Fields [mailto:cjfields at uiuc.edu]
To: 'Chris Fields' [mailto:cjfields at uiuc.edu], 'Pieter Monsieurs' [mailto:Pieter.Monsieurs at esat.kuleuven.be], gyang at plantbio.uga.edu
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] OK for aa seq but not a na seq on RemoteBlast.pm version 1.28


> Guojun Yang pointed out that his BLAST output was still not parsed
> correctly, so I posted another change:
>
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
>
> The direct link for the module is:
>
> http://bugzilla.bioperl.org/attachment.cgi?id=289&action=view
>
> Note that all caveats (can't sue if computer blows up, this is a very
> preliminary bugfix, etc.) apply.
>
> Apparently, NCBI has changed blastn and tblastx output to show features in
> the region for each HSP, starting with either one of the following
> lines:
>
>  Features in this part of subject sequence:
>  Features flanking this part of subject sequence:
>
> If you're using Bio::SearchIO::blast previous to Dec. 2005, BLAST 2.2.13,
> most blastn or tblastx report parsing seems to choke on these lines, unless
> you are pretty lucky.  This extra little feature was introduced a while back
> for large contigs and chromosomes (~BLAST 2.2.10) but was not set by default
> and hadn't started affecting web output until this last fall.  The first
> fix I posted caught only the first version but not the second.
>
> The fix included a loop with debugging output to bypass this for now.  If
> you use SearchIO directly for parsing (not through RemoteBlast) you can see
> the bypassed lines by setting the '-verbose' flag to 1.
>
> Thanks to Guojun Yang for pointing this out.
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
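For anyone who wants to see what "bypassing" those lines amounts to, here is a minimal sketch of the idea: drop the "Features in/flanking this part of subject sequence:" header and the annotation lines under it, and resume at the " Score = ..." line that opens the HSP. This is not the code attached to bug #1934; strip_feature_blocks is a hypothetical helper, and the sample lines are taken from the report excerpt quoted further down in this thread.

```perl
use strict;
use warnings;

# Drop each "Features ... this part of subject sequence:" header and the
# annotation lines that follow it, up to the " Score = ..." line of the HSP.
# strip_feature_blocks() is a hypothetical helper, not the fix from bug #1934.
sub strip_feature_blocks {
    my (@lines) = @_;
    my (@kept, $skipping);
    for my $line (@lines) {
        if ($line =~ /^\s*Features (?:in|flanking) this part of subject sequence:/) {
            $skipping = 1;                            # start of a feature block
            next;
        }
        $skipping = 0 if $line =~ /^\s*Score\s*=/;    # HSP block starts: stop skipping
        push @kept, $line unless $skipping;
    }
    return @kept;
}

my @report = (
    "Length=27492551",
    "",
    " Features flanking this part of subject sequence:",
    "   3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class",
    "   2655 bp at 3' side: hypothetical protein",
    "",
    " Score = 36.2 bits (18),  Expect = 0.22",
    " Identities = 18/18 (100%), Gaps = 0/18 (0%)",
);

print "$_\n" for strip_feature_blocks(@report);
```

Pre-filtering a saved report like this is only a workaround sketch; the proper fix belongs in the parser itself, as described above.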
>
>
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
> > Sent: Monday, February 20, 2006 11:01 AM
> > To: 'Pieter Monsieurs'; gyang at plantbio.uga.edu
> > Cc: bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on
> > RemoteBlast.pm version 1.28
> >
> > I have added a preliminary bugfix for the problems seen with nucleotide
> > blast parsing for BLAST 2.2.13 reports.  I passed SearchIO::blast through
> > perltidy to space out the blocks (really for my own purposes; it's a
> > pretty complex module).  The fix bypasses the extra lines output for
> > blastn and tblastx and now seems to parse the text output for those
> > reports correctly.
> > I tested it using all NCBI BLAST flavors for the last two versions of BLAST
> > (2.2.12 and 2.2.13); the fix was simple so shouldn't break any other BLAST
> > report parsing, such as WU-BLAST, RPS-BLAST, or Paracel.  It has only been
> > tested on MacOSX at the moment, so I need people out there to test it out
> > on anything they can to make sure it works before committing.  I'll be
> > trying it on Windows today.  Report back to me and I'll post anything on
> > bugzilla.
> >
> > Here it is:
> >
> > http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> >
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> > > -----Original Message-----
> > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > bounces at lists.open-bio.org] On Behalf Of Pieter Monsieurs
> > > Sent: Thursday, February 16, 2006 3:46 AM
> > > To: gyang at plantbio.uga.edu
> > > Cc: bioperl-l at lists.open-bio.org; Chris Fields
> > > Subject: Re: [Bioperl-l] OK for aa seq but not a na seq on
> > RemoteBlast.pm
> > > version 1.28
> > >
> > > Hi,
> > >
> > > I have the same problem with the blast.pm-file.
> > > The people of NCBI added some extra info when giving the Blast-output.
> > > (see e.g. "Features flanking this part..." or "Features in this part
> > > ..."), example added.
> > > > The blast.pm module starts looking for the HSP alignment information,
> > > but it dies when it hits this Feature-information.
> > >
> > > Pieter
> > >
> > >
> > > > >gi|77552765|gb|DP000011.1| Oryza sativa (japonica cultivar-group)
> > > > chromosome 12, complete sequence
> > > > Length=27492551
> > >
> > >  Features flanking this part of subject sequence:
> > >
> > > > 3726 bp at 5' side: transposon protein, putative, CACTA, En/Spm sub-class
> > >
> > >
> > > 2655 bp at 3' side: hypothetical protein
> > >
> > >
> > >  Score = 36.2 bits (18),  Expect = 0.22
> > >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  4         GTACTACTCTACTCTACT  21
> > >                  ||||||||||||||||||
> > >
> > > Sbjct  19257436  GTACTACTCTACTCTACT  19257419
> > >
> > >
> > >  Features flanking this part of subject sequence:
> > >
> > > 2991 bp at 5' side: hypothetical protein
> > >
> > >    1131 bp at 3' side: hypothetical protein
> > >
> > >
> > >
> > >  Score = 36.2 bits (18),  Expect = 0.22
> > >  Identities = 18/18 (100%), Gaps = 0/18 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  2         ATGTACTACTCTACTCTA  19
> > >                  ||||||||||||||||||
> > > Sbjct  27006915  ATGTACTACTCTACTCTA  27006898
> > >
> > >
> > >
> > >  Features in this part of subject sequence:
> > >    DHHC zinc finger domain, putative
> > >
> > >
> > >
> > >  Score = 34.2 bits (17),  Expect = 0.87
> > >  Identities = 17/17 (100%), Gaps = 0/17 (0%)
> > >  Strand=Plus/Plus
> > >
> > > Query  5         TACTACTCTACTCTACT  21
> > >                  |||||||||||||||||
> > > Sbjct  17616437  TACTACTCTACTCTACT  17616453
> > >
> > >
> > >
> > >  Features flanking this part of subject sequence:
> > >    102 bp at 5' side: bZIP transcription factor, putative
> > >
> > >
> > >    3740 bp at 3' side: yeast dcp1, putative
> > >
> > >
> > > >  Score = 32.2 bits (16),  Expect = 3.4
> > >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > >  Strand=Plus/Plus
> > >
> > > Query  7        CTACTCTACTCTACTC  22
> > >                 ||||||||||||||||
> > > Sbjct  2775880  CTACTCTACTCTACTC  2775895
> > >
> > >
> > >  Features flanking this part of subject sequence:
> > >
> > >    21 bp at 5' side: peptide transporter T17F3.11, putative
> > >
> > >
> > > 10230 bp at 3' side: transposon protein, putative, unclassified
> > >
> > >
> > >  Score = 32.2 bits (16),  Expect = 3.4
> > >  Identities = 16/16 (100%), Gaps = 0/16 (0%)
> > >  Strand=Plus/Minus
> > >
> > > Query  7         CTACTCTACTCTACTC  22
> > >
> > >                  ||||||||||||||||
> > > Sbjct  27323153  CTACTCTACTCTACTC  27323138
> > >
> > >
> > >
> > >
> > > Guojun Yang wrote:
> > >
> > > >Hi, Chris,
> > > > >Finally the remoteblast test script works for the amino.fa query, but
> > > > when I try a nucleic acid sequence (see below), an error occurs:
> > > >"
> > > >waiting........
> > > >------------- EXCEPTION  -------------
> > > >MSG: no data for midline  Features flanking this part of subject
> > > sequence:
> > > >STACK Bio::SearchIO::blast::next_result
> > > /usr/lib/perl5/site_perl/5.8.3/Bio/Searc
> > > hIO/blast.pm:1172
> > > >STACK toplevel remoteblast_test:40
> > > >"
> > > >The query sequence is:
> > > >CTCCCTCCGTCTCAAAATATTTGACGCCGTTGACTTTTTACTAAAAATGTTTGACCGTTC
> > > >GTCTTATTTAAAAAATTTAAGTAATTATTAATTCTTTTCCTATCATTTGATTTATTGTTA
> > > >AATATATTTTTATGTATACATATAGTTTTACATATTTCACAAAAAATTTTGAATAAGACG
> > > >AACGGTCAAATATGTTTTAAAAAGTCAACGGTGTCAAACATTTAGAAACGGAGGGAG
> > > >
> > > >The script (basically same as the remoteblast test, I only changed
> > > database to 'nr' and program to 'blastn' and filename to 'ost3'):
> > > >#!/usr/bin/perl
> > > >
> > > >use Bio::SeqIO;
> > > >use Bio::Seq;
> > > >use Bio::Tools::Run::RemoteBlast;
> > > >use Bio::SearchIO;
> > > >use strict;
> > > >my $prog='blastn';
> > > >my $db='nr';
> > > >my $e_val=1e-10;
> > > >my @params=( -prog=>$prog,
> > > >	-data=>$db,
> > > >	-expect=>$e_val,
> > > >	-readmethod=>'SearchIO');
> > > >my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > >
> > > >my $v = 1;
> > > >
> > > >my $str = Bio::SeqIO->new(-file=>'ost3' , -format => 'fasta' );
> > > >
> > > >while (my $input = $str->next_seq()){
> > > >  #Blast a sequence against a database:
> > > >  #Alternatively, you could  pass in a file with many
> > > >  #sequences rather than loop through sequence one at a time
> > > >  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >  #and swap the two lines below for an example of that.
> > > >  my $r = $factory->submit_blast($input);
> > > >  #my $r = $factory->submit_blast('amino.fa');
> > > >  print STDERR "waiting..." if( $v > 0 );
> > > >  while ( my @rids = $factory->each_rid ) {
> > > >    foreach my $rid ( @rids ) {
> > > >      my $rc = $factory->retrieve_blast($rid);
> > > >      if( !ref($rc) ) {
> > > >        if( $rc < 0 ) {
> > > >          $factory->remove_rid($rid);
> > > >        }
> > > >        print STDERR "." if ( $v > 0 );
> > > >        sleep 5;
> > > >      } else {
> > > >        my $result = $rc->next_result();
> > > >        #save the output
> > > >        my $filename = $result->query_name()."\.out";
> > > >        $factory->save_output($filename);
> > > >        $factory->remove_rid($rid);
> > > >        print "\nQuery Name: ", $result->query_name(), "\n";
> > > >        while ( my $hit = $result->next_hit ) {
> > > >          next unless ( $v > 0);
> > > >          print "\thit name is ", $hit->name, "\n";
> > > >          while( my $hsp = $hit->next_hsp ) {
> > > >            print "\t\tscore is ", $hsp->score, "\n";
> > > >          }
> > > >        }
> > > >      }
> > > >    }
> > > >  }
> > > >}
> > > >
> > > >
> > > >Do you think there might still be something in the NCBI output format?
> > > >
> > > >Thank you,
> > > >Guojun
> > > >
> > > >
> > > >
> > > >
> > > >Guojun Yang
> > > >Department of Plant Biology
> > > >University of Georgia
> > > >Tel: 706-542-1857
> > > >Fax: 706-542-1805
> > > >http://www.arches.uga.edu/~guojun
> > > >
> > > >
> > > >
> > > >----- Original Message -----
> > > >From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > >Subject: FW: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > >
> > > >
> > > >
> > > >
> > > >>Sorry, forgot to add that I didn't see the regex issue that you
> > > mentioned.
> > > >>It could be a perl-related issue.  Try the fixes I mentioned and see
> > > what
> > > >>happens.
> > > >>
> > > >>
> > > >>>Christopher Fields
> > > >>>
> > > >>>
> > > >>Postdoctoral Researcher - Switzer Lab
> > > >>Dept. of Biochemistry
> > > >>University of Illinois Urbana-Champaign
> > > >>
> > > >>
> > > >>>>>-----Original Message-----
> > > >>>>>
> > > >>>>>
> > > >>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>Sent: Tuesday, February 14, 2006 12:36 PM
> > > >>>To: 'gyang at plantbio.uga.edu'
> > > >>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > >>>
> > > >>>
> > > >>>>>It's a good habit to always add single quotes around words.  The
> > perl
> > > >>>>>
> > > >>>>>
> > > >>>interpreter may think a single bare word is a subroutine or perlfunc
> > > >>>called with no args so will try to find a subroutine named blastp().
> > > My
> > > >>>debugger actually gives the error that the bare word blastp may
> > > conflict
> > > >>>with a future reserved word.  Like you said, 'use strict' will point
> > > that
> > > >>>out.
> > > >>>
> > > >>>
> > > >>>>>As for the regex, it should match all the blast programs at NCBI
> > > (blastp,
> > > >>>>>
> > > >>>>>
> > > >>>blastn, blastx, tblastn, tblastx) and is built-in to make sure
> > nothing
> > > >>>else passes through.
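[Editorial sketch] The pattern quoted in the error message, t?blast[pnx], can be tried against a few program names in isolation. The anchors (^ and $) and the list of candidate names below are assumptions for illustration; how RemoteBlast.pm itself applies the pattern is not shown in this thread.

```perl
use strict;
use warnings;

# Pattern taken from the error message; the anchors are an assumption.
my $pattern = qr/^t?blast[pnx]$/;

# Candidate names, including two that should be rejected.
my @candidates = qw(blastp blastn blastx tblastn tblastx megablast blast);

my %accepted = map { $_ => ( $_ =~ $pattern ? 1 : 0 ) } @candidates;

for my $name (@candidates) {
    printf "%-10s %s\n", $name, $accepted{$name} ? 'accepted' : 'rejected';
}
```

Note that the pattern would also accept the string 'tblastp', which is not an NCBI program, so it is a sanity check rather than a strict whitelist.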
> > > >>>
> > > >>>
> > > >>>>>So, if you are using the script below, there are several errors.
> > The
> > > bare
> > > >>>>>
> > > >>>>>
> > > > >>>words for $prog and $db need quotes, and the flags for your @params
> > > array
> > > >>>don't have a dash before them.  I get this after adding quotes but
> > > before
> > > >>>adding the dashes to @params:
> > > >>>
> > > >>>
> > > >>>>>C:\Perl\Scripts>test_blast.pl
> > > >>>>>------------- EXCEPTION: Bio::Root::Exception -------------
> > > >>>>>
> > > >>>>>
> > > >>>MSG:
> > > >>>STACK: Error::throw
> > > >>>STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\bioperl-
> > > >>>live/Bio/Root/Root.pm:328
> > > >>>STACK: Bio::Tools::Run::RemoteBlast::submit_parameter
> > > >>>C:\Perl\src\bioperl\bioperl-live/Bio/Tools/Run/RemoteBlast.pm:325
> > > >>>STACK: Bio::Tools::Run::RemoteBlast::new C:\Perl\src\bioperl\bioperl-
> > > >>>live/Bio/Tools/Run/RemoteBlast.pm:256
> > > >>>STACK: C:\Perl\Scripts\test_blast.pl:15
> > > >>>-----------------------------------------------------------
> > > >>>
> > > >>>
> > > >>>>>The last line indicates a problem with this line:
> > > >>>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > >>>>>Changing the @params to this:
> > > >>>>>my @params=( -prog=>$prog,
> > > >>>>>
> > > >>>>>
> > > >>>	-data=>$db,
> > > >>>	-expect=>$e_val,
> > > >>>	-readmethod=>'SearchIO');
> > > >>>
> > > >>>
> > > >>>>>fixes it, and I get output as expected.
> > > >>>>>Christopher Fields
> > > >>>>>
> > > >>>>>
> > > >>>Postdoctoral Researcher - Switzer Lab
> > > >>>Dept. of Biochemistry
> > > >>>University of Illinois Urbana-Champaign
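[Editorial sketch] The two problems Chris points out (bare words and missing dashes) can be reproduced with core Perl alone. This is a minimal sketch: the string eval is only there to capture the compile error, and nothing here calls BioPerl.

```perl
use strict;
use warnings;

# 1) A bare word is a compile-time error under 'use strict'.
my $ok  = eval q{ my $prog = blastp; 1 };
my $err = $@;
print "bare word rejected: ", ( $ok ? 'no' : 'yes' ), "\n";

# 2) With quotes and leading dashes the parameter list is an ordinary
#    list of key/value pairs whose keys start with '-'.
my $prog   = 'blastp';
my $db     = 'swissprot';
my @params = (
    -prog       => $prog,
    -data       => $db,
    -expect     => 1e-10,
    -readmethod => 'SearchIO',
);
my %args = @params;
print "$_ => $args{$_}\n" for sort keys %args;

# Without the dash the key is a different string, so code that looks
# for '-prog' will not find it.
my %undashed = ( prog => $prog );
print "undashed has -prog: ", ( exists $undashed{'-prog'} ? 'yes' : 'no' ), "\n";
```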
> > > >>>
> > > >>>
> > > >>>>>>>>-----Original Message-----
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > >>>>Sent: Tuesday, February 14, 2006 11:48 AM
> > > >>>>To: Chris Fields; bioperl-l at lists.open-bio.org
> > > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.2
> > > >>>>
> > > >>>>Hi, Chris,
> > > > >>>>When I tried with the perldoc script, it did not work either. First
> > it
> > > >>>>says $prog can not be bare word if I "use strict". I added quotes on
> > > the
> > > >>>>words, then it says the value for $prog does not match expression
> > > >>>>t?blast[pnx]. Rejecting. STACK ...RemoteBlast.pm 325 and 256.  The
> > > >>>>
> > > >>>>
> > > >>>script
> > > >>>
> > > >>>
> > > >>>>is shown below. Why is the expression "t?blast[pnx]"?
> > > >>>>
> > > >>>>#!/usr/bin/perl
> > > >>>>
> > > >>>>use Bio::SeqIO;
> > > >>>>use Bio::Seq;
> > > >>>>use Bio::Tools::Run::RemoteBlast;
> > > >>>>use Bio::SearchIO;
> > > >>>>
> > > >>>>
> > > >>>>my $prog=blastp;
> > > >>>>my $db=swissprot;
> > > >>>>my $e_val=1e-10;
> > > >>>>my @params=( prog=>$prog,
> > > >>>>	data=>$db,
> > > >>>>	expect=>$e_val,
> > > >>>>	readmethod=>'SearchIO');
> > > >>>>my $factory=Bio::Tools::Run::RemoteBlast->new(@params);
> > > >>>>
> > > >>>>my $v = 1;
> > > >>>>
> > > >>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta' );
> > > >>>>
> > > >>>>while (my $input = $str->next_seq()){
> > > >>>>  #Blast a sequence against a database:
> > > >>>>  #Alternatively, you could  pass in a file with many
> > > >>>>  #sequences rather than loop through sequence one at a time
> > > >>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >>>>  #and swap the two lines below for an example of that.
> > > >>>>  my $r = $factory->submit_blast($input);
> > > >>>>  #my $r = $factory->submit_blast('amino.fa');
> > > >>>>  print STDERR "waiting..." if( $v > 0 );
> > > >>>>  while ( my @rids = $factory->each_rid ) {
> > > >>>>    foreach my $rid ( @rids ) {
> > > >>>>      my $rc = $factory->retrieve_blast($rid);
> > > >>>>      if( !ref($rc) ) {
> > > >>>>        if( $rc < 0 ) {
> > > >>>>          $factory->remove_rid($rid);
> > > >>>>        }
> > > >>>>        print STDERR "." if ( $v > 0 );
> > > >>>>        sleep 5;
> > > >>>>      } else {
> > > >>>>        my $result = $rc->next_result();
> > > >>>>        #save the output
> > > >>>>        my $filename = $result->query_name()."\.out";
> > > >>>>        $factory->save_output($filename);
> > > >>>>        $factory->remove_rid($rid);
> > > >>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > > >>>>        while ( my $hit = $result->next_hit ) {
> > > >>>>          next unless ( $v > 0);
> > > >>>>          print "\thit name is ", $hit->name, "\n";
> > > >>>>          while( my $hsp = $hit->next_hsp ) {
> > > >>>>            print "\t\tscore is ", $hsp->score, "\n";
> > > >>>>          }
> > > >>>>        }
> > > >>>>      }
> > > >>>>    }
> > > >>>>  }
> > > >>>>}
> > > >>>>
> > > >>>>Thank you for your help!
> > > >>>>
> > > >>>>
> > > >>>>Guojun
> > > >>>>Department of Plant Biology
> > > >>>>University of Georgia
> > > >>>>
> > > >>>>----- Original Message -----
> > > >>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>To: gyang at plantbio.uga.edu
> > > >>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>>>Try two things:
> > > >>>>>
> > > >>>>>
> > > >>>>>>1)  Use a much simpler script, like the one in 'perldoc
> > > >>>>>>
> > > >>>>>>
> > > >>>>>Bio::Tools::Run::RemoteBlast'.  If this fixes it, there's something
> > > >>>>>
> > > >>>>>
> > > >>>>wrong
> > > >>>>
> > > >>>>
> > > >>>>>with the logic in your subroutine:
> > > >>>>>
> > > >>>>>
> > > >>>>>>my $v = 1;
> > > >>>>>>my $str = Bio::SeqIO->new(-file=>'amino.fa' , -format => 'fasta'
> > );
> > > >>>>>>while (my $input = $str->next_seq()){
> > > >>>>>>
> > > >>>>>>
> > > >>>>>  #Blast a sequence against a database:
> > > >>>>>  #Alternatively, you could  pass in a file with many
> > > >>>>>  #sequences rather than loop through sequence one at a time
> > > >>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
> > > >>>>>  #and swap the two lines below for an example of that.
> > > >>>>>  my $r = $factory->submit_blast($input);
> > > >>>>>  #my $r = $factory->submit_blast('amino.fa');
> > > >>>>>  print STDERR "waiting..." if( $v > 0 );
> > > >>>>>  while ( my @rids = $factory->each_rid ) {
> > > >>>>>    foreach my $rid ( @rids ) {
> > > >>>>>      my $rc = $factory->retrieve_blast($rid);
> > > >>>>>      if( !ref($rc) ) {
> > > >>>>>        if( $rc < 0 ) {
> > > >>>>>          $factory->remove_rid($rid);
> > > >>>>>        }
> > > >>>>>        print STDERR "." if ( $v > 0 );
> > > >>>>>        sleep 5;
> > > >>>>>      } else {
> > > >>>>>        my $result = $rc->next_result();
> > > >>>>>        #save the output
> > > >>>>>        my $filename = $result->query_name()."\.out";
> > > >>>>>        $factory->save_output($filename);
> > > >>>>>        $factory->remove_rid($rid);
> > > >>>>>        print "\nQuery Name: ", $result->query_name(), "\n";
> > > >>>>>        while ( my $hit = $result->next_hit ) {
> > > >>>>>          next unless ( $v > 0);
> > > >>>>>          print "\thit name is ", $hit->name, "\n";
> > > >>>>>          while( my $hsp = $hit->next_hsp ) {
> > > >>>>>            print "\t\tscore is ", $hsp->score, "\n";
> > > >>>>>          }
> > > >>>>>        }
> > > >>>>>      }
> > > >>>>>    }
> > > >>>>>  }
> > > >>>>>}
> > > >>>>>
> > > >>>>>
> > > >>>>>>2) Try the RemoteBlast from Bugzilla and see if that works.  It
> > > >>>>>>
> > > >>>>>>
> > > >>>really
> > > >>>
> > > >>>
> > > >>>>>shouldn't make that much of a difference, but I noticed that the
> > CVS
> > > >>>>>RemoteBlast (1.28) was changed in Dec 2005, after bioperl-1.5.1 was
> > > >>>>>released; the Bugzilla version is based off CVS.
> > > >>>>>
> > > >>>>>
> > > >>>>>>Christopher Fields
> > > >>>>>>
> > > >>>>>>
> > > >>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>Dept. of Biochemistry
> > > >>>>>University of Illinois Urbana-Champaign
> > > >>>>>
> > > >>>>>
> > > >>>>>>>-----Original Message-----
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > >>>>>>Sent: Monday, February 13, 2006 3:00 PM
> > > >>>>>>To: bioperl-l at lists.open-bio.org
> > > >>>>>>Subject: Re: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>Thanks, Chris,
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>I installed version 1.5.1 and replaced the blast.pm file with the
> > > >>>>>>
> > > >>>>>>
> > > >>>one
> > > >>>
> > > >>>
> > > >>>>from
> > > >>>>
> > > >>>>
> > > >>>>>>your bug report. The running version is 1.5 when I use the command
> > > >>>>>>
> > > >>>>>>
> > > >>>you
> > > >>>
> > > >>>
> > > >>>>>>sent me. But when I tried the script, it doesn't change much. My
> > > >>>>>>remoteblast code (portion) is here:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>sub search {
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>local
> > $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'}="$ORGN";
> > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'WORD_SIZE'}=7;
> > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'HITLIST_SIZE'}=5000;
> > > >>>>>>local
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > >
> > >>>$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'}=
> > > >>>
> > > >>>
> > > >>>>>>'no';
> > > >>>>>>local $Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'}='3 1';
> > > >>>>>>my $query = Bio::Seq -> new ( -seq=>"$_[0]",
> > > >>>>>>			      -id=>"query",
> > > >>>>>>			      -desc=>"new seq");
> > > >>>>>>my $len=$query->length();
> > > >>>>>>@db=('nr','htgs','wgs');
> > > >>>>>>foreach my $db (@db) {
> > > >>>>>>my $factory = Bio::Tools::Run::RemoteBlast->new('-prog'
> > =>'blastn',
> > > >>>>>>						'-data' =>"$db",
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>
> > > >>'-expect'=>"$E_value");
> > > >>
> > > >>
> > > >>>>>>>>>>my $blast_report = $factory->submit_blast($query);
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>my @rids = $factory->each_rid();
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>foreach my $rid ( @rids ) {
> > > >>>>>>    print STDERR "$rid\n";
> > > >>>>>>}
> > > >>>>>># RID = Remote Blast ID (e.g: 1017772174-16400-6638)
> > > >>>>>>print STDERR "waiting...";
> > > >>>>>>sleep 60;
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>foreach my $rid ( @rids ) {
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>    my $rc = $factory->retrieve_blast($rid);
> > > >>>>>>    while (!ref($rc) ) {
> > > >>>>>>	if( $rc < 0 ) {
> > > >>>>>># retrieve_blast returns -1 on error
> > > >>>>>>	    $factory->remove_rid($rid);
> > > >>>>>>	    print "Error!\n";
> > > >>>>>>	    send_error($email,$function,$seqname,$queryname[$ST]);
> > > >>>>>>	    die "Can't retrieve $rid";
> > > >>>>>>	} if ($rc==0) { # retrieve_blast returns 0 on 'job not
> > > >>>>>>
> > > >>>>>>
> > > >>>finished'
> > > >>>
> > > >>>
> > > >>>>>>	    sleep 60;
> > > >>>>>>	    $rc = $factory->retrieve_blast($rid);
> > > >>>>>>	}
> > > >>>>>>    }
> > > >>>>>>    if (ref($rc)) {
> > > >>>>>>	print STDERR "Done.\n";
> > > >>>>>>	 while( my $result = $rc->next_result) {
> > > >>>>>>	    while( my $hit = $result->next_hit()) {
> > > >>>>>>	    	$hit_name=$hit->name;
> > > >>>>>>		$hit_name =~ /\S+[|](\S+)[.]\d+[|].*/;
> > > >>>>>>		$name=$1;
> > > >>>>>>		@left_plus_start=();
> > > >>>>>>		@left_plus_end=();
> > > >>>>>>		@left_minus_start=();
> > > >>>>>>		@left_minus_end=();
> > > >>>>>>		@right_plus_start=();
> > > >>>>>>		@right_plus_end=();
> > > >>>>>>		@right_minus_start=();
> > > >>>>>>		@right_minus_end=();
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>		if (!($name =~ /^[a-zA-Z][a-zA-Z]\_\d{6}/i)) {
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>		while( my $hsp = $hit->next_hsp()) {
> > > >>>>>>......
> > > >>>>>>
> > > >>>>>>
> > > > >>>>>>>>It was working quite well until around October last year, but
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>it has
> > > >>>>
> > > >>>>
> > > >>>>>>stopped since then, When a submission is sent via a webpage, the
> > cgi
> > > >>>>>>starts to work and use a memory of ~20 Mb. Then it hangs there,
> > > >>>>>>
> > > >>>>>>
> > > >>>>finally
> > > >>>>
> > > >>>>
> > > >>>>>>the expected email is received but without real results although
> > it
> > > >>>>>>
> > > >>>>>>
> > > >>>>does
> > > >>>>
> > > >>>>
> > > >>>>>>contain something from other parts of the script. Apparently the
> > > >>>>>>
> > > >>>>>>
> > > >>>>search
> > > >>>>
> > > >>>>
> > > > >>>>>>sub did not return anything (I know something should be
> > > >>>>>>returned.). Is it also possible the format of the NCBI output for
> > > >>>>>>
> > > >>>>>>
> > > >>>each
> > > >>>
> > > >>>
> > > >>>>>>result has changed?
> > > >>>>>>Thank you,
> > > >>>>>>Guojun
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>>Department of Plant Biology
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>University of Georgia
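[Editorial sketch] The code comments above describe three outcomes of retrieve_blast: a negative value on error, 0 while the job is unfinished, and a reference once the report is ready. A polling loop with an upper bound on retries avoids waiting forever. The sketch below assumes exactly that return convention; MockFactory, poll_rid, max_tries and delay are hypothetical names, and the mock stands in for the real factory so the example runs without network access.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the RemoteBlast factory: it replays a fixed
# list of answers and reports "not finished" (0) once the list is empty.
package MockFactory;

sub new {
    my ( $class, @answers ) = @_;
    return bless { answers => [@answers] }, $class;
}

sub retrieve_blast {
    my ( $self, $rid ) = @_;
    my $next = shift @{ $self->{answers} };
    return defined $next ? $next : 0;
}

package main;

# Poll one RID at most max_tries times, sleeping delay seconds between tries.
sub poll_rid {
    my ( $factory, $rid, %opt ) = @_;
    my $max_tries = $opt{max_tries} || 10;
    my $delay     = defined $opt{delay} ? $opt{delay} : 60;

    for my $try ( 1 .. $max_tries ) {
        my $rc = $factory->retrieve_blast($rid);
        return ( 'ready', $rc ) if ref $rc;     # report object available
        return ( 'error', undef ) if $rc < 0;   # server reported a failure
        sleep $delay if $delay;                 # rc == 0: job not finished
    }
    return ( 'timeout', undef );
}

# Two "not finished" answers, then a stub report.
my $factory = MockFactory->new( 0, 0, { report => 'stub' } );
my ( $status, $report ) = poll_rid( $factory, 'RID-EXAMPLE', delay => 0 );
print "status: $status\n";
```

Returning a status string instead of calling die lets the calling CGI decide how to report a timeout or an error to the user.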
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>>>>----- Original Message -----
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > >>>>>>Subject: RE: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>>>How do you know two versions are installed (i.e. how are
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>you
> > > >>>
> > > >>>
> > > >>>>checking
> > > >>>>
> > > >>>>
> > > >>>>>>the
> > > >>>>>>
> > > >>>>>>
> > > > >>>>>>>version)?  Do you have two complete bioperl distributions (in
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>two
> > > >>>>
> > > >>>>
> > > >>>>>>>separate directories) or are you looking in modules?  Here's the
> > > >>>>>>>
> > > >>>>>>>
> > > >>>way
> > > >>>
> > > >>>
> > > >>>>to
> > > >>>>
> > > >>>>
> > > >>>>>>>check the version (from the FAQ):
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>perl -MBio::Root::Version -e 'print
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>$Bio::Root::Version::VERSION,"\n"'
> > > >>>>
> > > >>>>
> > > >>>>>>>>If you have two full bioperl distributions on your computer,
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>normally
> > > >>>>
> > > >>>>
> > > >>>>>>only
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>one will be in use unless you have explicitly set the environment
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>variable
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>PERL5LIB.  The PERL5LIB  directories will be searched first
> > before
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>your
> > > >>>>
> > > >>>>
> > > >>>>>>>normal perl directory list (@INC) is searched.  You MAY get some
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>mixing
> > > >>>>
> > > >>>>
> > > >>>>>>>then, but only if perl can't find a particular module in the path
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>designated
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>in PERL5LIB; then it will progress through the directories listed
> > > >>>>>>>
> > > >>>>>>>
> > > >>>in
> > > >>>
> > > >>>
> > > >>>>>>@INC.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>This may happen if a module is unique to a particular release,
> > but
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>shouldn't
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>happen for the majority of modules, including RemoteBlast.  You
> > > >>>>>>>
> > > >>>>>>>
> > > >>>can
> > > >>>
> > > >>>
> > > >>>>>>check
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>what @INC and PERL5LIB are set to by using 'perl -V'.  @INC will
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>differ
> > > >>>>
> > > >>>>
> > > >>>>>>>depending on your OS, perl build, etc.
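[Editorial sketch] Besides 'perl -V', a short script can show the search path and which file a module was actually loaded from. This is a minimal sketch using core modules only; module_path is a hypothetical helper, and File::Spec is used in the demonstration only because it ships with perl. On a machine with BioPerl installed, the same call with 'Bio::Tools::Run::RemoteBlast' would show which copy is picked up.

```perl
use strict;
use warnings;
use Config;

# Directories contributed by PERL5LIB (empty list if it is not set).
my @perl5lib = defined $ENV{PERL5LIB}
    ? split /\Q$Config{path_sep}\E/, $ENV{PERL5LIB}
    : ();
print "PERL5LIB entries: ", scalar(@perl5lib), "\n";
print "  $_\n" for @perl5lib;

# The full search path, in the order perl consults it.
print "\@INC:\n";
print "  $_\n" for @INC;

# Hypothetical helper: load a module and report the file it came from,
# or return undef if it cannot be found anywhere in @INC.
sub module_path {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    return $INC{$file};
}

my $path = module_path('File::Spec');
print "File::Spec loaded from: $path\n";
```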
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>Regardless, if you follow the directions for installing bioperl
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>for
> > > >>>>
> > > >>>>
> > > >>>>>>your
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>system ('perl Makefile.PL', 'make', 'make test', 'make install',
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>unless
> > > >>>>
> > > >>>>
> > > >>>>>>you
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>explicitly change the installation directory when using 'perl
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>Makefile.PL'),
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>then 'uninstalling' Bioperl shouldn't be a problem as it will
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>install
> > > >>>>
> > > >>>>
> > > >>>>>>the
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>Bioperl distribution you downloaded over the old version in @INC.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>See
> > > >>>>
> > > >>>>
> > > >>>>>>this
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>page:
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>http://bioperl.open-bio.org/SRC/bioperl-live/INSTALL
> > > >>>>>>>>for more details.
> > > >>>>>>>>Christopher Fields
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>>>Dept. of Biochemistry
> > > >>>>>>>University of Illinois Urbana-Champaign
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>>>>-----Original Message-----
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > >>>>>>>>Sent: Monday, February 13, 2006 12:32 PM
> > > >>>>>>>>To: bioperl-l at lists.open-bio.org
> > > >>>>>>>>Subject: [Bioperl-l] more on RemoteBlast.pm version 1.28
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>Hi, Chris,
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>I do have different versions of bioperl on my Linux machine
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>(1.4.
> > > >>>
> > > >>>
> > > >>>>and
> > > >>>>
> > > >>>>
> > > >>>>>>>>1.5.0), this may be the problem. Should I just install bioperl-
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>1.5.1
> > > >>>>
> > > >>>>
> > > >>>>>>or I
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>need to uninstall and remove the previous versions. I could not
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>find
> > > >>>>
> > > >>>>
> > > >>>>>>any
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>hint on uninstalling bioperl on linux. Could you please give me
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>some
> > > >>>>
> > > >>>>
> > > >>>>>>>>suggestion?
> > > >>>>>>>>Thanks,
> > > >>>>>>>>Guojun
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>Department of Plant Biology
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>University of Georgia
> > > >>>>>>>>      _____
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>  From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at lists.open-bio.org
> > > >>>>>>>>Sent: Mon, 13 Feb 2006 11:45:14 -0500
> > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>version
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>1.28
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>>>>>If you're using RemoteBlast 1.28, then you've likely
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>updated from CVS
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>which isn't the latest fix.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>Make sure that you check the following:
> > > >>>>>>>>>>1) Always post to the mailing list:
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance .
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>2) You must have the complete bioperl-1.5.1 or bioperl-live
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>(CVS)
> > > >>>>
> > > >>>>
> > > >>>>>>>>installed first.  Perform a clean installation; do not upgrade
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>only
> > > >>>>
> > > >>>>
> > > >>>>>>>>Bio::SearchIO::blast and Bio::Tools::Run::RemoteBlast, as we
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>can't
> > > >>>
> > > >>>
> > > >>>>>>>>guarantee that mixing modules from old and new distributions
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>(1.4
> > > >>>
> > > >>>
> > > >>>>and
> > > >>>>
> > > >>>>
> > > >>>>>>>>1.5.1, for instance) will work.  A bioperl-1.5.1 or bioperl-live
> > > >>>>>>>>installation will allow text output from BLAST v.2.2.12 to be
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>saved
> > > >>>>
> > > >>>>
> > > >>>>>>and
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>parsed; it will not parse the newest BLAST text output from NCBI
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>(v2.2.13)
> > > >>>>>>
> > > >>>>>>
> > > > >>>>>>>>but it should still save it. I believe as long as next_result()
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>isn't
> > > >>>>
> > > >>>>
> > > >>>>>>>>called, it will work.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>3) The bug fixes for the above issue with parsing BLAST
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>2.2.13
> > > >>>
> > > >>>
> > > >>>>>>text output
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>are NOT in CVS; they haven't been cleared and checked in by
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>Roger
> > > >>>
> > > >>>
> > > >>>>Hall
> > > >>>>
> > > >>>>
> > > >>>>>>>>(who's now taking care of RemoteBlast) and the powers that be
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>(Jason
> > > >>>>
> > > >>>>
> > > >>>>>>or
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>whomever is in charge of Bio::SearchIO).  They can be found in
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>Bugzilla:
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>The fix in RemoteBlast in Bugzilla (#1935) is to allow the
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>option
> > > >>>>
> > > >>>>
> > > >>>>>>of
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>saving XML output, so isn't necessary if you don't plan on using
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>this
> > > >>>>
> > > >>>>
> > > >>>>>>>>option.  And, remember, they haven't been committed yet to CVS,
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>which
> > > >>>>
> > > >>>>
> > > > >>>>>>>>means that the final version will change to reflect the new
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>version.
> > > >>>
> > > >>>
> > > >>>>>>>>>>>>Christopher Fields
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>>>>Dept. of Biochemistry
> > > >>>>>>>>University of Illinois Urbana-Champaign
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>>>    _____
> > > >>>>>>>>>>>>From: Guojun Yang [mailto:gyang at plantbio.uga.edu]
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>Sent: Monday, February 13, 2006 9:26 AM
> > > >>>>>>>>To: Chris Fields
> > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>version
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>1.28
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>>>Hi, Chris
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>Thanks for your suggestion, however, it doesn't seem to work
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>for
> > > >>>>
> > > >>>>
> > > >>>>>>my cgi
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>even after I replace both blast.pm and RemoteBlast.pm. I didn't
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>even
> > > >>>>
> > > >>>>
> > > >>>>>>get
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>any RID. Is there any suggestion?
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>>>>>Guojun
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>Guojun Yang
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>Department of Plant Biology
> > > >>>>>>>>University of Georgia
> > > >>>>>>>>Tel: 706-542-1857
> > > >>>>>>>>Fax: 706-542-1805
> > > >>>>>>>>http://www.arches.uga.edu/~guojun
> > > >>>>>>>>    _____
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>To: gyang at plantbio.uga.edu, bioperl-l at bioperl.org
> > > >>>>>>>>Sent: Fri, 03 Feb 2006 16:07:29 -0500
> > > >>>>>>>>Subject: RE: [Bioperl-l] more question regarding RemoteBlast.pm
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>version
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>1.28
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>I would say give the new code a try, but realize that it
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>hasn't
> > > >>>>
> > > >>>>
> > > >>>>>>been
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>checked
> > > >>>>>>>>in (like I said below). I will try going over the modified
> > > >>>>>>>>Bio::SearchIO::blast again this weekend to see if there is
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>anything I
> > > >>>>
> > > >>>>
> > > >>>>>>>>might
> > > >>>>>>>>have missed. The changed order in the header of BLAST text
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>output
> > > >>>
> > > >>>
> > > >>>>has
> > > >>>>
> > > >>>>
> > > >>>>>>me a
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>bit worried that it might not catch everything, but it at least
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>doesn't
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>hang
> > > >>>>>>>>in the while() loop I described in the bug report below (bug
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>#1934)
> > > >>>>
> > > >>>>
> > > >>>>>>and
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>seems to process everything fine.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>If you want more stability in the code, you might consider
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>changing over
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>to
> > > >>>>>>>>XML output and parsing with Bio::SearchIO::blastxml. There are
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>some
> > > >>>>
> > > >>>>
> > > >>>>>>>>changes
> > > >>>>>>>>in Bio::Tools::Run::RemoteBlast (bug #1935) that accommodate
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>saving
> > > >>>>
> > > >>>>
> > > >>>>>>XML
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>output, but I believe it parses everything regardless. If you
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>look
> > > >>>
> > > >>>
> > > >>>>>>back
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>the
> > > >>>>>>>>last month or so there has been a bit of discussion here about
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>it.
> > > >>>
> > > >>>
> > > >>>>>>Jason
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>describes a bit on how to set up RemoteBlast for XML:
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>http://bioperl.org/news/2005/11/06/getting-blastxml-using-
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>remoteblast/
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>>Christopher Fields
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>>>>Dept. of Biochemistry
> > > >>>>>>>>University of Illinois Urbana-Champaign
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>>-----Original Message-----
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> > > >>>>>>>>>Sent: Friday, February 03, 2006 1:45 PM
> > > >>>>>>>>>To: bioperl-l at bioperl.org
> > > >>>>>>>>>Subject: [Bioperl-l] more question regarding RemoteBlast.pm
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>version
> > > >>>>
> > > >>>>
> > > >>>>>>1.28
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>Hi, Everybody,
> > > >>>>>>>>>I see this post and am wondering if this is the reason for the
> > > >>>>>>>>>malfunctioning of my webserver. We set up a webserver named
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>MAK,
> > > >>>>
> > > >>>>
> > > >>>>>>for
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>MITE
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>sequence analysis. It was working very well until around
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>November
> > > >>>>
> > > >>>>
> > > >>>>>>2005,
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>when it stopped returning any result (the site is fine and
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>seems
> > > >>>
> > > >>>
> > > >>>>to
> > > >>>>
> > > >>>>
> > > >>>>>>be
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>doing sth after submission). In the CGI script, I used
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>remoteblast
> > > >>>>
> > > >>>>
> > > >>>>>>(that
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>work was done in 2003) to do searches. I currently do not have
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>access to
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>the server because I moved. Quite several people sent emails
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>to
> > > >>>
> > > >>>
> > > >>>>us
> > > >>>>
> > > >>>>
> > > >>>>>>about
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>its malfunctioning. Is there any suggestion on fixing the
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>problem?
> > > >>>>
> > > >>>>
> > > >>>>>>>>Should
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>I simply ask that remoteblast.pm be replaced with the new
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>version?
> > > >>>>
> > > >>>>
> > > >>>>>>>>>Thanks a lot,
> > > >>>>>>>>>Guojun
> > > >>>>>>>>>
> > > >>>>>>>>>Department of Plant Biology
> > > >>>>>>>>>University of Georgia
> > > >>>>>>>>>Tel: 706-542-1857
> > > >>>>>>>>>Fax: 706-542-1805
> > > >>>>>>>>>http://www.arches.uga.edu/~guojun
> > > >>>>>>>>>_____
> > > >>>>>>>>>
> > > >>>>>>>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
> > > >>>>>>>>>To: 'Nagesh Chakka' [mailto:nagesh.chakka at anu.edu.au], 'Huang
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>Jian'
> > > >>>>
> > > >>>>
> > > >>>>>>>>>[mailto:hjian at kuicr.kyoto-u.ac.jp], 'bioperl-l'
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>[mailto:bioperl-
> > > >>>
> > > >>>
> > > >>>>>>>>>l at bioperl.org]
> > > >>>>>>>>>Sent: Fri, 03 Feb 2006 10:45:23 -0500
> > > >>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > >>>>>>>>>
> > > >>>>>>>>>Like Nagesh says, try the latest RemoteBlast from bioperl-live
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>CVS.
> > > >>>>
> > > >>>>
> > > >>>>>>It
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>will
> > > >>>>>>>>>work for saving text output. However, it will not parse
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>anything
> > > >>>
> > > >>>
> > > >>>>>>using
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>next_result (it will likely hang) and will not save XML
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>format.
> > > >>>
> > > >>>
> > > >>>>See
> > > >>>>
> > > >>>>
> > > >>>>>>>>these
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>bugs:
> > > >>>>>>>>>
> > > >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> > > >>>>>>>>>http://bugzilla.bioperl.org/show_bug.cgi?id=1935
> > > >>>>>>>>>
> > > >>>>>>>>>for explanations and possible fixes (changes to RemoteBlast
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>and
> > > >>>
> > > >>>
> > > >>>>>>>>>Bio::SearchIO::blast). Note that these haven't been checked in
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>yet
> > > >>>>
> > > >>>>
> > > >>>>>>so
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>are
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>still not included in bioperl-live; they may be further
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>modified
> > > >>>
> > > >>>
> > > >>>>>>before
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>committing to CVS. If you're not worried about XML, you could
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>just
> > > >>>>
> > > >>>>
> > > >>>>>>try
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>the
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>first fix, which is a change to SearchIO::blast.
> > > >>>>>>>>>
> > > >>>>>>>>>Nagesh, I remember you posting to the list a month ago using a
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>script
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>which
> > > >>>>>>>>>had problems; the script you used saves the output but doesn't
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>actually
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>parse it (i.e. you don't use next_result() to go through the
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>data).
> > > >>>>
> > > >>>>
> > > >>>>>>Is
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>the
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>version of BLAST in your text output 2.2.12 or 2.2.13? Have
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>you
> > > >>>
> > > >>>
> > > >>>>>>tried
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>parsing the output using "-readmethod => SearchIO" or "-
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>readmethod
> > > >>>>
> > > >>>>
> > > >>>>>>=>
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>blast"
> > > >>>>>>>>>using your version of RemoteBlast and method next_result()?
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>Like
> > > >>>
> > > >>>
> > > >>>>>>below
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>(from
> > > >>>>>>>>>perldoc):
> > > >>>>>>>>>
> > > >>>>>>>>>while ( my @rids = $factory->each_rid ) {
> > > >>>>>>>>>    foreach my $rid ( @rids ) {
> > > >>>>>>>>>        my $rc = $factory->retrieve_blast($rid);
> > > >>>>>>>>>        if( !ref($rc) ) {
> > > >>>>>>>>>            if( $rc < 0 ) {
> > > >>>>>>>>>                $factory->remove_rid($rid);
> > > >>>>>>>>>            }
> > > >>>>>>>>>            print STDERR "." if ( $v > 0 );
> > > >>>>>>>>>            sleep 5;
> > > >>>>>>>>>        } else { # parsing starts here
> > > >>>>>>>>>            my $result = $rc->next_result(); # it should hang here
> > > >>>>>>>>>            #save the output
> > > >>>>>>>>>            my $filename = $result->query_name()."\.out";
> > > >>>>>>>>>            $factory->save_output($filename);
> > > >>>>>>>>>            $factory->remove_rid($rid);
> > > >>>>>>>>>            print "\nQuery Name: ", $result->query_name(), "\n";
> > > >>>>>>>>>            while ( my $hit = $result->next_hit ) {
> > > >>>>>>>>>                next unless ( $v > 0);
> > > >>>>>>>>>                print "\thit name is ", $hit->name, "\n";
> > > >>>>>>>>>                while( my $hsp = $hit->next_hsp ) {
> > > >>>>>>>>>                    print "\t\tscore is ", $hsp->score, "\n";
> > > >>>>>>>>>                }
> > > >>>>>>>>>            }
> > > >>>>>>>>>        }
> > > >>>>>>>>>    }
> > > >>>>>>>>>}
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>My script hung if I used next_result() in any way prior to
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>the
> > > >>>
> > > >>>
> > > >>>>>>fixes.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>I
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>want to see how many others are having the same issues with
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>parsing
> > > >>>>
> > > >>>>
> > > >>>>>>>>using
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>the CVS version of bioperl-live.
> > > >>>>>>>>>
> > > >>>>>>>>>Christopher Fields
> > > >>>>>>>>>Postdoctoral Researcher - Switzer Lab
> > > >>>>>>>>>Dept. of Biochemistry
> > > >>>>>>>>>University of Illinois Urbana-Champaign
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>>-----Original Message-----
> > > >>>>>>>>>>From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>l-
> > > >>>
> > > >>>
> > > >>>>>>>>>>bounces at lists.open-bio.org] On Behalf Of Nagesh Chakka
> > > >>>>>>>>>>Sent: Thursday, February 02, 2006 7:24 PM
> > > >>>>>>>>>>To: Huang Jian; bioperl-l
> > > >>>>>>>>>>Subject: Re: [Bioperl-l] RemoteBlast.pm version 1.28
> > > >>>>>>>>>>
> > > >>>>>>>>>>Hi Huang,
> > > >>>>>>>>>>Thanks for the message. The older version of RemoteBlast.pm
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>works
> > > >>>>
> > > >>>>
> > > >>>>>>on
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>the
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>logic of checking the temporary file size to determine
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>whether
> > > >>>
> > > >>>
> > > >>>>the
> > > >>>>
> > > >>>>
> > > >>>>>>>>Blast
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>results are ready. This condition is not getting satisfied
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>may
> > > >>>
> > > >>>
> > > >>>>be
> > > >>>>
> > > >>>>
> > > >>>>>>due
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>to
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>some changes brought about by NCBI. I had this problem
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>recently
> > > >>>>
> > > >>>>
> > > >>>>>>and
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>>figured out that the solution was to use the latest version
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>which
> > > >>>>
> > > >>>>
> > > >>>>>>has
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>>this problem fixed (does not use file size logic any more)
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>which
> > > >>>>
> > > >>>>
> > > >>>>>>is
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>not
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>yet included in the BioPerl package.
> > > >>>>>>>>>>Cheers
> > > >>>>>>>>>>Nagesh
> > > >>>>>>>>>>
> > > >>>>>>>>>>Huang Jian wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>>Dear Nagesh,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>I have replaced my old RemoteBlast.pm (v 1.17) with v 1.28
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>you
> > > >>>>
> > > >>>>
> > > >>>>>>send
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>>>me. Now it works perfectly!!!
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>Thank you!!
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>Huang
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>----- Original Message ----- From: "Nagesh Chakka"
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>To: "Huang Jian" ; "bioperl-l"
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>Sent: Friday, February 03, 2006 7:48 AM
> > > >>>>>>>>>>>Subject: Re: [Bioperl-l] Sorry, failure in post on the
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>net,
> > > >>>
> > > >>>
> > > >>>>so
> > > >>>>
> > > >>>>
> > > >>>>>>still
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>>>via email
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>>Hi Huang,
> > > >>>>>>>>>>>>I see that you are submitting a sequence for a remote
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>blast
> > > >>>
> > > >>>
> > > >>>>>>search.
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>Can
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>>>>you check if the RemoteBlast.pm being used is v 1.28
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>(2005/12/09).
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>If
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>>>>>not I have attached it with this email, try to replace it
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>with
> > > >>>>
> > > >>>>
> > > >>>>>>the
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>>>>old
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>>>>one which has a bug.
> > > >>>>>>>>>>>>Let me know if it works.
> > > >>>>>>>>>>>>Nagesh
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>



From cjm at fruitfly.org  Tue Feb 21 01:48:57 2006
From: cjm at fruitfly.org (chris mungall)
Date: Mon, 20 Feb 2006 17:48:57 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <3666b00b7322d2bfe4d82129b047e5ce@gmx.net>
References: <000001c62e9a$4f82eee0$c2987ca5@pc13>
	<3666b00b7322d2bfe4d82129b047e5ce@gmx.net>
Message-ID: <930b0083193357df7d43cc7a3111c938@fruitfly.org>


I like the idea of using an ontology to describe the ontology.

Note that the proposed structure:
OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI

will lead to cycles in the object graph when the metadata ontology 
describes itself.

Actually, I think the ontology module already has object reference
cycles: TermI->OntologyI->TermI.

When I brought this up originally people didn't seem to care much. So
long as you're only parsing GO it's not a big issue; people have
enough memory that they won't notice a big chunk of memory that refuses
to be garbage collected long after it's used. Of course, if you want to
use bioperl to cycle through all of OBO + SnoMed + UMLS then it's a
different story.

I think it's best if Sohel concentrates on getting obo.pm working; then
we can start thinking as a group about the best way to capture ontology
metadata. This includes metadata on the whole ontology, and metadata on
the terms (e.g. synonyms).

To what extent are the current modules already in use? I think the 
object cycle is a serious flaw, will it be possible to fix this without 
a major overhaul?
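
One common way to break this kind of cycle in Perl without restructuring
the classes is to weaken the back-reference with Scalar::Util::weaken.
The sketch below uses two made-up packages, My::Ontology and My::Term, as
stand-ins. It is not the actual Bio::Ontology code; it only illustrates
that the term's pointer to its ontology need not keep the ontology alive.

```perl
use strict;
use warnings;
use Scalar::Util ();

my %destroyed;   # records which objects were garbage collected

package My::Ontology;   # hypothetical stand-in for an OntologyI implementation
sub new      { my ($class, $name) = @_; return bless { name => $name, terms => [] }, $class }
sub add_term { my ($self, $term) = @_; push @{ $self->{terms} }, $term; $term->ontology($self) }
sub DESTROY  { $destroyed{ $_[0]{name} } = 1 }

package My::Term;       # hypothetical stand-in for a TermI implementation
sub new { my ($class, $name, %opt) = @_; return bless { name => $name, weak => $opt{weak} }, $class }
sub ontology {
    my ($self, $ont) = @_;
    if ($ont) {
        $self->{ontology} = $ont;
        # Weak back-reference: the term no longer keeps its ontology alive.
        Scalar::Util::weaken($self->{ontology}) if $self->{weak};
    }
    return $self->{ontology};
}
sub DESTROY { $destroyed{ $_[0]{name} } = 1 }

package main;

for my $weak (0, 1) {
    my $ont = My::Ontology->new("ont_weak$weak");
    $ont->add_term(My::Term->new("term_weak$weak", weak => $weak));
}   # $ont goes out of scope at the end of each iteration

print "strong cycle freed: ", ($destroyed{ont_weak0} ? "yes" : "no"), "\n";
print "weak cycle freed:   ", ($destroyed{ont_weak1} ? "yes" : "no"), "\n";
```

With the strong back-reference the ontology and its term keep each other
alive until the interpreter exits; with the weak one both are freed as
soon as the last outside reference goes away. The trade-off is that a
term must not outlive its ontology, otherwise its ontology() becomes
undef.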


On Feb 11, 2006, at 9:10 PM, Hilmar Lapp wrote:

> Sohel, please do keep the discussion on the list, in your own interest
> as there's a multitude of people who can respond to you.
>
> SimpleValue would probably be what I'd use too. As Heikki hinted you
> might even create an ontology for annotating ontologies, which would
> allow you to use Annotation::OntologyTerm for annotation, but then
> there's no qualifier value ...
>
> Bioperl 1.5.1 was released last year, please check the website.
>
> 	-hilmar
>
> On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote:
>
>> Hi Hilmar,
>>   I really like your suggestion of implementing the Bio::AnnotatableI
>> interface in the Bio::Ontology::Ontology class. I am going to implement
>> this and play around a little with it. I am planning to use
>> Bio::Annotation::SimpleValue for annotating the header, as it provides a
>> good way of specifying the tag/value pair. What are your thoughts on
>> using this?
>>
>>   Also, I was wondering if you have any idea about the scheduled date
>> for the Bioperl 1.5.1 release. I would like to contribute some stuff in
>> the next release.
>>
>> Thanks,
>> Sohel.
>>
>> -----Original Message-----
>> From: Hilmar Lapp [mailto:hlapp at gmx.net]
>> Sent: Friday, February 10, 2006 3:40 PM
>> To: Sohel Merchant
>> Cc: Bioperl
>> Subject: Re: Bio::Ontology::Ontology
>>
>> Sohel,
>>
>> please allow me to copy the list in my response. There's many good and
>> insightful people on the list who may have something to add or
>> different ideas.
>>
>> I've come across that problem myself, for instance with InterPro. What
>> I've done so far is simply to stick it unstructured into the definition
>> slot, which is not helpful if your purpose goes further than just
>> displaying it in an unstructured fashion.
>>
>> I'm not sure you would want to create another class for this (like
>> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
>> implementation, probably not the interface) annotatable (i.e.,
>> implement Bio::AnnotatableI), which supposedly would be simple to do
>> (AnnotationCollection is already implemented, you'd just return an
>> instance of it).
>>
>> Even though tag/value pairs sound like a quick and fast way to go, I'm
>> leaning against it; in essence we're moving away from that elsewhere
>> (SeqFeatureI) and hence I don't think we should restart it here.
>>
>> I'm not giving a definitive answer here, just my (initial) thoughts.
>> Hope that helps nonetheless. Can you fancy yourself trying the
>> Annotatable approach and let us know how it goes?
>>
>> 	-hilmar
>>
>>
>> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
>>
>>> Hi Hilmar,
>>>   How are you doing? I am Sohel Merchant, a programmer at dictyBase,
>>> Northwestern University. I am working on a parser for an ontology
>>> file. I really like the ontology object model which you have
>>> contributed to Bioperl. I think it's just awesome! One of the things
>>> which I thought would be great to capture is the ontology headers.
>>> Right now one can specify only the name and authority information. I
>>> was wondering if there is any way I could also capture other ontology
>>> file headers, like the version of the file and the date when that
>>> ontology file was made. I was thinking of making a header class, or
>>> alternatively it could go as a hash of values in the
>>> Bio::Ontology::Ontology class itself. I wanted to know what your
>>> thoughts are on this.
>>>
>>> Thanks,
>>> Sohel Merchant
>>> dictyBase
>>>
>> -- 
>> -------------------------------------------------------------
>> Hilmar Lapp                            email: lapp at gnf.org
>> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
>> -------------------------------------------------------------
>>
>>
>>
>>
> -- 
> -------------------------------------------------------------
> Hilmar Lapp                            email: lapp at gnf.org
> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> -------------------------------------------------------------
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From osborne1 at optonline.net  Tue Feb 21 04:42:18 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Mon, 20 Feb 2006 23:42:18 -0500
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
In-Reply-To: <43FA0FB7.6060904@lsi.upc.edu>
Message-ID: 

Gabriel,

You had a couple of little errors in your script but once fixed it worked
fine:

#!/usr/bin/perl -w

use strict;
use lib "/Users/bosborne/bioperl-live";
use Bio::DB::Taxonomy;

my $nodesfile = "nodes.dmp";
my $namefile = "names.dmp";
my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
                               -nodesfile => $nodesfile,
                               -namesfile => $namefile);

my $taxonid = $db->get_taxonid('Homo sapiens');

# Here, $taxonid is 9606. However,

my $species = $db->get_Taxonomy_Node(-taxonid => $taxonid);

print $species->common_name;


This is using bioperl-live on Mac OSX, Perl 5.8. Are you on Windows? If so
then do "-directory => C:/temp", see what happens.

Brian O.

On 2/20/06 1:51 PM, "Gabriel Valiente"  wrote:

> use Bio::DB::Taxonomy;
> my $nodesfile = "nodes.dmp";
> my $namesfile = "names.dmp";
> my $db = new Bio::DB::Taxonomy(-source => 'flatfile'
>                                -nodesfile => $nodesfile,
>                                -namesfile => $namefile);
> my $taxonid = $db->get_taxonid('Homo sapiens');
> 
> Here, $taxonid is 9606. However,
> 
> my $species = $db->get_Taxonomy_Node(-taxonid => $taxonid);




From valiente at lsi.upc.edu  Tue Feb 21 12:19:04 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Tue, 21 Feb 2006 13:19:04 +0100 (MET)
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <1125313334valiente@lsi.upc.es>

Thanks. There's still a problem with Bio::DB::Taxonomy:

use strict;
use Bio::DB::Taxonomy;

my $nodesfile = "nodes.dmp";
my $namesfile = "names.dmp";
my $db = new Bio::DB::Taxonomy(-source => 'flatfile'
                              -nodesfile => $nodesfile,
                              -namesfile => $namesfile);

my $taxonid = $db->get_taxonid('Homo sapiens');
my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid); 

So far so good. Now, access to the parent node via

my $parent = $node->get_Parent_Node;

is alright, but access to the children nodes via

my @childrenids = $db->get_Children_Taxids($taxonid);

raises:

------------- EXCEPTION  -------------
MSG: Abstract method "Bio::DB::Taxonomy::get_Children_Taxids" is not
implemented by package Bio::DB::Taxonomy::entrez.
This is not your fault - author of Bio::DB::Taxonomy::entrez should be
blamed!

STACK Bio::Root::RootI::throw_not_implemented
/home/valiente/bioperl-live/Bio/Root/RootI.pm:523
STACK Bio::DB::Taxonomy::get_Children_Taxids
/home/valiente/bioperl-live/Bio/DB/Taxonomy.pm:162
STACK toplevel fetch.pl:17

Perhaps there could be a $node->get_Children_Nodes() method in
Bio::DB::Taxonomy, instead of relying on Bio::DB::Taxonomy::entrez.
You know, efficient access to the children of a node is quite an
important method for almost any interesting use of the NCBI Taxonomy.

Gabriel




From dhoworth at mrc-lmb.cam.ac.uk  Tue Feb 21 10:47:41 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Tue, 21 Feb 2006 10:47:41 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
Message-ID: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>

I'm drawing a simple graphic and seeing something I didn't expect. I'm 
not sure whether I've misunderstood the docs or found a bug. If I run a 
program containing:

     my $name   = 'O68601';
     my $length = 44;
     my $panel  = Bio::Graphics::Panel->new(
                 -length    => $length,
                 -width     => 800,
                 -pad_left  => 10,
                 -pad_right => 10,
                 -key_style => 'between',
                 );

     my $feature = new Bio::SeqFeature::Generic(
                 -start  => 1,
                 -end    => $length,
                 -display_name => $name . " ($length)",
                 );

     $panel->add_track($feature,
                 -glyph   => 'arrow',
                 -tick    =>  1,
                 -fgcolor => 'black',
                 -double  => 1,
                 -label   => 1,
                 );

Then I see a tick strip labelled at its left end with '1' and at its 
right end with '45'. I expected to see '44'. Should I be looking for a 
bug in Bio::Graphics or fixing my program?

Thanks, Dave


From gbazykin at Princeton.EDU  Tue Feb 21 14:37:32 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Tue, 21 Feb 2006 09:37:32 -0500
Subject: [Bioperl-l] planning sequence mutating modules
Message-ID: <922343764.20060221093732@princeton.edu>

Heikki:

Let me explain what I need more clearly, and perhaps you guys can tell
me how this can be done best in Bioperl.

I'd like to marry the trees and the sequences, so that I could get a
sequence corresponding to each of the nodes (including internal nodes)
on the tree. The sequences of the nodes can be either generated by
some evolution process, or loaded; PAUP, for example, can reconstruct
the sequences of the internal nodes. I am dealing with coding
sequence, and for my purposes, I need to look at individual codons
rather than nucleotides. Then I answer questions such as this:

- for this codon (position), when (before which nodes of the tree) did
all (synonymous or non-synonymous) mutations occur?

- for this node and for this codon, when (before which node) did the
preceding (synonymous or non-synonymous) mutation occur? Preceding
means that it occurred in the line of direct ancestors, i.e. between
some two sequences on the path from this node to the root.

- infer position-specific "substitution matrix" from the tree, i.e. in
this position, what fraction of nucleotides A that were present at the
beginning of each branch turned into nucleotide "C" by the end of the
branch, possibly weighting with branch lengths.

Further, I need to simulate sequence evolution along the tree,
e.g., like this:

- mutate specified codon along the tree, perhaps with given
substitution matrix (and, possibly, with given
non-synonymous/synonymous substitutions rate). In the process, the
codons for all nodes will be generated.

I need to do all this for large trees (with hundreds of leaves) and
long sequences. So far, I have been using a huge hash to store all my
sequences for each of the nodes:

# $node is some tree node object
my $posit = 0;
$codons{$posit}->{$node} = 'AAA';

etc. But there should be a better way to do it? How can I integrate
all this into Bioperl? (I am new to object-oriented programming).
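
One possible layout, sketched here with plain hashes rather than Bioperl
classes, keys the codon table by a node identifier instead of by the node
object itself and keeps a separate child-to-parent map, so that walking
from any node towards the root is a simple loop. The node names, the
%parent and %codons structures and the last_change() helper are all made
up for illustration; with Bio::Tree::Node objects the same walk could
presumably follow each node's ancestor() instead.

```perl
use strict;
use warnings;

# Hypothetical toy tree: child => parent (the root has no entry).
my %parent = (
    leaf1 => 'int1',
    leaf2 => 'int1',
    int1  => 'root',
    leaf3 => 'root',
);

# Codons per position and node id, e.g. reconstructed ancestral states.
my %codons = (
    0 => { root => 'AAA', int1 => 'AAG', leaf1 => 'AAG', leaf2 => 'AGG', leaf3 => 'AAA' },
);

# Walk from $node towards the root and return the first branch
# (child, parent) on which the codon at position $pos differs,
# i.e. the most recent change in the line of direct ancestors.
sub last_change {
    my ($pos, $node) = @_;
    while (defined(my $anc = $parent{$node})) {
        return ($node, $anc) if $codons{$pos}{$node} ne $codons{$pos}{$anc};
        $node = $anc;
    }
    return;   # no change between this node and the root
}

for my $leaf (qw(leaf1 leaf2 leaf3)) {
    my ($child, $anc) = last_change(0, $leaf);
    if (defined $child) {
        print "$leaf: $codons{0}{$anc} -> $codons{0}{$child} on the branch leading to $child\n";
    } else {
        print "$leaf: no change on the path to the root\n";
    }
}
```

Classifying each change as synonymous or non-synonymous would need a
codon table on top of this; the sketch only locates the branch.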

I'll be thankful for any feedback.

Yegor



------------------------------
Tuesday, February 14, 2006, 11:09:27 AM, you wrote:

> Yegor,

> Like you said, there are examples how it is done.. It should be possible to
> evolve sequences based on a rooted tree. You just walk the tree and evolve
> each sequence from its parent.  If there is  an agreement how the branch
> lengths get translated to  mutations, even that could be done. Do you have
> any suggestions?

>         -Heikki



> On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
>> Hi,
>>
>> Just a thought: I really think that in perspective, it would be nice
>> to be able to evolve the sequence along a tree of given shape. I think
>> PAML's "evolver" has this functionality. I've already been doing this
>> in my scripts, but I am not sure how to couple the tree and the
>> sequence data properly.
>>
>> Yegor (George) Bazykin
>>
>>
>> ------------------------------
>>
>> Tuesday, February 14, 2006, 1:59:29 AM, you wrote:
>> > I've committed an interim solution to the sequence evolution problem:
>> >
>> >     $newseq = Bio::SeqUtils-> evolve
>> >         ($seq, $similarity, $transition_transversion_rate);
>> >
>> > I will go on to transform this code to fully OO, extensible solution.
>> >
>> >    -Heikki
>> >
>> > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
>> >> Ryan Golhar's mail got me thinking that we should have a simple
>> >> framework for mutating sequences to a desired level. The model can then
>> >> be extended to necessary complexity when needed by subclassing.
>> >>
>> >> To start with, I have been planning:
>> >>
>> >>
>> >> Bio::SeqEvolution::EvolutionI - interface file
>> >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
>> >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>> >>         (defaults to Bio::PrimarySeq)
>> >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
>> >> Bio::SeqEvolution::EvolutionI::each_seqs($count)
>> >>        - returns an array of $count seqs
>> >> Bio::SeqEvolution::EvolutionI::_generate_seq()
>> >> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
>> >>       converteed to probabilites of change internally
>> >>
>> >>   various methods to define the extent of divergence:
>> >>   only one to start with:
>> >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
>> >>    (= 100% - identity)
>> >>
>> >> Bio::SeqEvolution::Factory - core class to call,
>> >>          instantiates subclasses, Bio::SeqEvolution::DNASimple for
>> >> nucleotides Bio::SeqEvolution::EvolutionI::type() - evolution model,
>> >>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
>> >>
>> >>
>> >> Bio::SeqEvolution::DNASimple - default for nucleotides
>> >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>> >>         e.g. 5 => 5:1, defaults to 1:1
>> >>         simple alternative to a scoring matrix
>> >>
>> >>
>> >> I am soliciting usual comments and suggestions about naming and minimal
>> >> functionality.
>> >>
>> >>
>> >>    -Heikki
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l




From osborne1 at optonline.net  Tue Feb 21 14:46:56 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 21 Feb 2006 09:46:56 -0500
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
In-Reply-To: <1125313334valiente@lsi.upc.es>
Message-ID: 

Gabriel,

I don't think so, this works:

#!/usr/bin/perl -w

use strict;
use lib "/Users/bosborne/bioperl-live";
use Bio::DB::Taxonomy;

my $nodesfile = "nodes.dmp";
my $namefile = "names.dmp";
my $db = new Bio::DB::Taxonomy(-source => 'flatfile',
                               -nodesfile => $nodesfile,
                               -namesfile => $namefile);

my $taxonid = $db->get_taxonid('Homo sapiens');
my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid);

# Here, $taxonid is 9606. However,
my $parent = $node->get_Parent_Node;

# is alright, but access to the children nodes via
my @childrenids = $db->get_Children_Taxids($taxonid);

print "@childrenids";


What Bioperl version are you using?

Brian O.


On 2/21/06 7:19 AM, "Gabriel Valiente"  wrote:

> my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid); 




From gbazykin at Princeton.EDU  Mon Feb 20 23:21:03 2006
From: gbazykin at Princeton.EDU (Georgii A Bazykin)
Date: Mon, 20 Feb 2006 18:21:03 -0500
Subject: [Bioperl-l] planning sequence mutating modules
In-Reply-To: <200602141809.28057.heikki@sanbi.ac.za>
References: <200602100906.11885.heikki@sanbi.ac.za>
	<200602140859.30136.heikki@sanbi.ac.za>
	<214316262.20060214093454@princeton.edu>
	<200602141809.28057.heikki@sanbi.ac.za>
Message-ID: <158747055.20060220182103@princeton.edu>

Heikki:

Let me explain what I need more clearly, and perhaps you guys can tell
me how this can be done best in Bioperl.

I'd like to marry the trees and the sequences, so that I could get a
sequence corresponding to each of the nodes (including internal nodes)
on the tree. The sequences of the nodes can be either generated by
some evolution process, or loaded; PAUP, for example, can reconstruct
the sequences of the internal nodes. I am dealing with coding
sequence, and for my purposes, I need to look at individual codons
rather than nucleotides. Then I answer questions such as this:

- for this codon (position), when (before which nodes of the tree) did
all (synonymous or non-synonymous) mutations occur?

- for this node and for this codon, when (before which node) did the
preceding (synonymous or non-synonymous) mutation occur? Preceding
means that it occurred in the line of direct ancestors, i.e. between
some two sequences on the path from this node to the root.

- infer position-specific "substitution matrix" from the tree, i.e. in
this position, what fraction of nucleotides "A" that were present at the
beginning of each branch turned into nucleotide "C" by the end of the
branch, possibly weighting by branch lengths.

Further, I need to simulate sequence evolution along the tree,
e.g., like this:

- mutate specified codon along the tree, perhaps with given
substitution matrix (and, possibly, with given
non-synonymous/synonymous substitutions rate). In the process, the
codons for all nodes will be generated.

I need to do all this for large trees (with hundreds of leaves) and
long sequences. So far, I have been using a huge hash to store all my
sequences for each of the nodes:

my %codons;
my $node = $some_node;   # placeholder: some tree::node object
my $posit = 0;
$codons{$posit}->{$node} = 'AAA';

etc. But there should be a better way to do it? How can I integrate
all this into Bioperl? (I am new to object-oriented programming).
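
One way to avoid a single huge hash of hashes is to keep one array of codons per node, keyed by a node identifier, next to a child-to-parent map. The sketch below is plain Perl and not a BioPerl API; the node names, %parent, %codons and last_change() are all made up for illustration. It addresses the second question above: for a given node and codon position, on which branch did the preceding change occur? It does not distinguish synonymous from non-synonymous changes.

```perl
use strict;
use warnings;

# Hypothetical toy tree: child => parent (the root has no entry).
my %parent = (A => 'N1', B => 'N1', N1 => 'root', C => 'root');

# One array of codons per node, indexed by codon position.
my %codons = (
    root => [qw(AAA GGG)],
    N1   => [qw(AAG GGG)],
    A    => [qw(AAG GGA)],
    B    => [qw(AAG GGG)],
    C    => [qw(AAA GGG)],
);

# Walk from $node towards the root and return the node at the lower end
# of the first branch on which the codon at $pos differs from its parent.
# Returns nothing if the codon never changed on that path.
sub last_change {
    my ($node, $pos) = @_;
    while (defined(my $up = $parent{$node})) {
        return $node if $codons{$node}[$pos] ne $codons{$up}[$pos];
        $node = $up;
    }
    return;
}

for my $pos (0, 1) {
    my $where = last_change('A', $pos);
    print "position $pos: ",
        defined $where ? "changed on the branch leading to $where" : 'unchanged',
        "\n";
}
```

Keying by a plain identifier rather than by the node object itself also avoids relying on how the object stringifies when it is used as a hash key.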

I'll be thankful for any feedback.

Yegor



------------------------------
Tuesday, February 14, 2006, 11:09:27 AM, you wrote:

> Yegor,

> Like you said, there are examples how it is done.. It should be possible to
> evolve sequences based on a rooted tree. You just walk the tree and evolve
> each sequence from its parent.  If there is  an agreement how the branch
> lengths get translated to  mutations, even that could be done. Do you have
> any suggestions?

>         -Heikki



> On Tuesday 14 February 2006 16:34, Georgii A Bazykin wrote:
>> Hi,
>>
>> Just a thought: I really think that in perspective, it would be nice
>> to be able to evolve the sequence along a tree of given shape. I think
>> PAML's "evolver" has this functionality. I've already been doing this
>> in my scripts, but I am not sure how to couple the tree and the
>> sequence data properly.
>>
>> Yegor (George) Bazykin
>>
>>
>> ------------------------------
>>
>> Tuesday, February 14, 2006, 1:59:29 AM, you wrote:
>> > I've committed an interim solution to the sequence evolution problem:
>> >
>> >     $newseq = Bio::SeqUtils-> evolve
>> >         ($seq, $similarity, $transition_transversion_rate);
>> >
>> > I will go on to transform this code into a fully OO, extensible solution.
>> >
>> >    -Heikki
>> >
>> > On Friday 10 February 2006 09:06, Heikki Lehvaslaiho wrote:
>> >> Ryan Golhar's mail got me thinking that we should have a simple
>> >> framework for mutating sequences to a desired level. The model can then
>> >> be extended to necessary complexity when needed by subclassing.
>> >>
>> >> To start with, I have been planning:
>> >>
>> >>
>> >> Bio::SeqEvolution::EvolutionI - interface file
>> >> Bio::SeqEvolution::EvolutionI::seq() - seq to mutate
>> >> Bio::SeqEvolution::EvolutionI::seq_type() - returned seq class,
>> >>         (defaults to Bio::PrimarySeq)
>> >> Bio::SeqEvolution::EvolutionI::next_seq() - overridable by subclasses
>> >> Bio::SeqEvolution::EvolutionI::each_seqs($count)
>> >>        - returns an array of $count seqs
>> >> Bio::SeqEvolution::EvolutionI::_generate_seq()
>> >> Bio::SeqEvolution::EvolutionI::matrix  # Bio::Matrix::Scoring
>> >>       converted to probabilities of change internally
>> >>
>> >>   various methods to define the extent of divergence:
>> >>   only one to start with:
>> >> Bio::SeqEvolution::EvolutionI::pam() -percentage accepted mutation
>> >>    (= 100% - identity)
>> >>
>> >> Bio::SeqEvolution::Factory - core class to call,
>> >>          instantiates subclasses, Bio::SeqEvolution::DNASimple for nucleotides
>> >> Bio::SeqEvolution::EvolutionI::type() - evolution model,
>> >>       defaults to Bio::SeqEvolution::DNASimple for nucleotides
>> >>
>> >>
>> >> Bio::SeqEvolution::DNASimple - default for nucleotides
>> >> Bio::SeqEvolution::DNASimple::transversion_rate - positive integer,
>> >>         e.g. 5 => 5:1, defaults to 1:1
>> >>         simple alternative to a scoring matrix
>> >>
>> >>
>> >> I am soliciting usual comments and suggestions about naming and minimal
>> >> functionality.
>> >>
>> >>
>> >>    -Heikki
>>





From jason.stajich at duke.edu  Tue Feb 21 14:51:39 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 21 Feb 2006 09:51:39 -0500
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
In-Reply-To: <1125313334valiente@lsi.upc.es>
References: <1125313334valiente@lsi.upc.es>
Message-ID: <16B69355-A7EC-4FA6-B0F3-A473C705B921@duke.edu>

Of course it should, and it does support this. The children query
definitely exists for the flatfile implementation. I don't understand
why you are getting entrez errors when you are requesting the flatfile
handle.
I can't investigate, but it definitely worked for me to get children
nodes. Did you actually try running the script that already should
work - scripts/taxa/local_taxonomdb_query ?

You definitely can't request children nodes via the entrez
implementation, because NCBI doesn't provide children id access (or
didn't when this was written; I don't know about now), so it is pretty
useful for that. The eutils support may have expanded, I'm not sure.
If someone has the itch, please scratch it and work on this.

I think you need to pass in $parent instead of $taxonid to  
get_Children_Taxids -- although I guess I wrote the method to accept  
either.

-jason

On Feb 21, 2006, at 7:19 AM, Gabriel Valiente wrote:

> Thanks. There's still a problem with Bio::DB::Taxonomy:
>
> use strict;
> use Bio::DB::Taxonomy;
>
> my $nodesfile = "nodes.dmp";
> my $namesfile = "names.dmp";
> my $db = new Bio::DB::Taxonomy(-source => 'flatfile'
>                               -nodesfile => $nodesfile,
>                               -namesfile => $namesfile);
>
> my $taxonid = $db->get_taxonid('Homo sapiens');
> my $node = $db->get_Taxonomy_Node(-taxonid => $taxonid);
>
> So far so good. Now, access to the parent node via
>
> my $parent = $node->get_Parent_Node;
>
> is alright, but access to the children nodes via
>
> my @childrenids = $db->get_Children_Taxids($taxonid);
>
> raises:
>
> ------------- EXCEPTION  -------------
> MSG: Abstract method "Bio::DB::Taxonomy::get_Children_Taxids" is not
> implemented by package Bio::DB::Taxonomy::entrez.
> This is not your fault - author of Bio::DB::Taxonomy::entrez should be
> blamed!
>
> STACK Bio::Root::RootI::throw_not_implemented
> /home/valiente/bioperl-live/Bio/Root/RootI.pm:523
> STACK Bio::DB::Taxonomy::get_Children_Taxids
> /home/valiente/bioperl-live/Bio/DB/Taxonomy.pm:162
> STACK toplevel fetch.pl:17
>
> Perhaps there could be a $node->get_Children_Nodes() method in
> Bio::DB::Taxonomy, instead of relying on Bio::DB::Taxonomy::entrez.
> You know, efficient access to the children of a node is a quite
> important method for almost any interesting use of the NCBI Taxonomy.
>
> Gabriel
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From hlapp at gmx.net  Tue Feb 21 02:52:34 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 20 Feb 2006 18:52:34 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <930b0083193357df7d43cc7a3111c938@fruitfly.org>
References: <000001c62e9a$4f82eee0$c2987ca5@pc13>
	<3666b00b7322d2bfe4d82129b047e5ce@gmx.net>
	<930b0083193357df7d43cc7a3111c938@fruitfly.org>
Message-ID: 

On 2/20/06, chris mungall  wrote:
>
> I like the idea of using an ontology to describe the ontology.
>
> Note that the proposed structure:
> OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI
>
> will lead to cycles in the object graph when the metadata ontology
> describes itself.

Yes I know, that's why I didn't want to be too vocal about it ...

>
> actually, I think the ontology module already has object reference
> cycles. TermI->OntologyI->TermI
>
> When I brought this up originally people didn't seem to care much - so
> long as you're only parsing GO then it's not a big issue, people have
> enough memory they won't notice a big chunk of memory that refuses to
> be garbage collected way after it's used.

There is a method that destroys the cycle: $ontology->close()
(this is also an interface method)

Essentially, the cycle is not in OntologyI itself but in OntologyI
HAS-A OntologyEngineI; i.e., the latter holds (may hold) references to
terms which (may) hold a reference to an OntologyI which holds a
reference to the OntologyEngineI.

I say 'may' in parentheses because an implementation may use tricks
like late instantiation, stringified references (handles), and weak
references. It's possible to avoid the cycle altogether using such
tricks but it remains questionable how much this then affects
performance, and how ugly and incomprehensible the code would become.
Since there is the close() method I haven't bothered yet trying a
fully de-cycled implementation.
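
As an illustration of the weak-reference trick mentioned above, here is a minimal sketch in plain Perl. Demo::Ontology and its methods are hypothetical stand-ins, not the Bio::Ontology classes, and this is not a claim about how those classes are implemented. The only real API used is weaken() from the core Scalar::Util module. The back-reference from a term to its ontology is weakened, so the ontology can be freed as soon as the last strong reference to it goes away, without an explicit close().

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;    # counts ontology objects that were freed

{
    package Demo::Ontology;    # hypothetical stand-in class

    sub new {
        my ($class, $name) = @_;
        return bless { name => $name, terms => [] }, $class;
    }

    sub add_term {
        my ($self, $term) = @_;
        push @{ $self->{terms} }, $term;            # strong: ontology -> term
        $term->{ontology} = $self;                  # back-reference: term -> ontology
        Scalar::Util::weaken($term->{ontology});    # weak, so no cycle keeps both alive
        return $term;
    }

    sub DESTROY { $destroyed++ }
}

{
    my $onto = Demo::Ontology->new('demo');
    $onto->add_term({ name => 'some term' });
}    # $onto goes out of scope here and can be freed

print "ontologies freed: $destroyed\n";
```

The trade-off is that a term held on its own no longer keeps its ontology alive, so code that hands out terms has to keep the ontology around for as long as the terms are used.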

> Of course, if you want to use
> bioperl to cycle though all of OBO + SnoMed + UMLS then it's a
> different story.

Well if you want to keep all three in memory for some kind of
cross-reasoning then yes you are in trouble. But if you do one
ontology after another, you'd just have to make sure to call close() on
an ontology once you're done with it.

>
> I think it's best if Sohel concentrates on getting obo.pm working, then
> we can start thinking as a group about the best way to capture ontology
> metadata. This includes metadata on the whole ontology, and metadata on
> the terms (eg synonyms).
>
> To what extent are the current modules already in use?

I don't know about others but I use them often.

> I think the object cycle is a serious flaw, will it be possible to fix this without
> a major overhaul?

If I recall correctly the way go-perl circumvents this is by having
the ontology of a term as a flat attribute. This also means that when
having a term alone, you cannot ask for its connected terms. It's been
a while, so Chris, set me straight where this is not true.

It should be possible to come up with an implementation of OntologyI
that for all intents and purposes behaves like a flat scalar giving
the name until you call one of its graph traversal methods. At that
point it would instantiate the engine from persistent storage (file,
or a database connection), or retrieve one from a 'store'. The latter
is I believe what Allen started with the OntologyStore, but again I
would need to check the details.

    -hilmar

>
>
> On Feb 11, 2006, at 9:10 PM, Hilmar Lapp wrote:
>
> > Sohel, please do keep the discussion on the list, in your own interest
> > as there's a multitude of people who can respond to you.
> >
> > SimpleValue would probably be what I'd use too. As Heikki hinted you
> > might even create an ontology for annotating ontologies, which would
> > allow you to use Annotation::OntologyTerm for annotation, but then
> > there's no qualifier value ...
> >
> > Bioperl 1.5.1 was released last year, please check the website.
> >
> >       -hilmar
> >
> > On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote:
> >
> >> Hi Hilmar,
> >>   I really like your suggestion of implementing the Bio::AnnotatableI
> >> interface in the Bio::Ontology::Ontology class. I am going to implement
> >> this and play around a little with it. I am planning to use
> >> Bio::Annotation::SimpleValue for annotating the header, as it provides
> >> a good way of specifying the tag/value pair. What are your thoughts on
> >> using this?
> >>
> >>   Also, I was wondering if you have any idea about the scheduled date
> >> for the Bioperl 1.5.1 release. I would like to contribute some stuff in
> >> the next release.
> >>
> >> Thanks,
> >> Sohel.
> >>
> >> -----Original Message-----
> >> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >> Sent: Friday, February 10, 2006 3:40 PM
> >> To: Sohel Merchant
> >> Cc: Bioperl
> >> Subject: Re: Bio::Ontology::Ontology
> >>
> >> Sohel,
> >>
> >> please allow me to copy the list in my response. There's many good and
> >> insightful people on the list who may have something to add or
> >> different ideas.
> >>
> >> I've come across that problem myself, for instance with InterPro. What
> >> I've done so far simply is to stick it unstructured into the
> >> definition slot, which is not helpful if your purpose goes further
> >> than just displaying it in an unstructured fashion.
> >>
> >> I'm not sure you would want to create another class for this (like
> >> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
> >> implementation, probably not the interface) annotatable (i.e.,
> >> implement Bio::AnnotatableI), which supposedly would be simple to do
> >> (AnnotationCollection is already implemented, you'd just return an
> >> instance of it).
> >>
> >> Even though tag/value pairs sound like a quick and fast way to go, I'm
> >> leaning against it; in essence we're moving away from that elsewhere
> >> (SeqFeatureI) and hence I don't think we should restart it here.
> >>
> >> I'm not giving a definitive answer here, just my (initial) thoughts.
> >> Hope that helps nonetheless. Can you fancy yourself trying the
> >> Annotatable approach and let us know how it goes?
> >>
> >>      -hilmar
> >>
> >>
> >> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
> >>
> >>> Hi Hilmar,
> >>> How are you doing? I am Sohel Merchant, a programmer at dictyBase,
> >>> Northwestern University. I am working on a parser for an ontology
> >>> file. I really like the ontology object model which you have
> >>> contributed to Bioperl. I think it's just Awesome!! One of the things
> >>> which I thought would be great to capture is the ontology headers.
> >>> Right now one can specify only the name and authority information. I
> >>> was wondering if there is any way I could also capture other ontology
> >>> file headers, like the version of the file and the date when that
> >>> ontology file was made. I was thinking of making a header class, or
> >>> alternatively it could go as a hash of values in the
> >>> Bio::Ontology::Ontology class itself. I wanted to know what your
> >>> thoughts are on this.
> >>>
> >>> Thanks,
> >>> Sohel Merchant
> >>> dictyBase
> >>>
> >> --
> >> -------------------------------------------------------------
> >> Hilmar Lapp                            email: lapp at gnf.org
> >> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> >> -------------------------------------------------------------
> >>
> >>
> >>
> >>
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp                            email: lapp at gnf.org
> > GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > -------------------------------------------------------------
> >
> >
> >
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From valiente at lsi.upc.edu  Tue Feb 21 16:10:05 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Tue, 21 Feb 2006 17:10:05 +0100 (MET)
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <1783551242valiente@lsi.upc.es>

It works now, with the #!/usr/bin/perl -w switch. Sorry about that.
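
The thread does not say what -w revealed. One plausible explanation, not confirmed anywhere in the thread, is a missing comma: the script as quoted in the earlier post has no comma after 'flatfile' in the constructor call. Without it, Perl parses 'flatfile' -nodesfile as a subtraction of two strings, which evaluates to 0 and only produces a complaint when warnings are enabled. The sketch below reproduces just the argument list, with no BioPerl involved, and collects the warnings instead of printing them. If the constructor then sees a false -source value and falls back to a default backend, that would fit the entrez exception, but that last step is a guess.

```perl
use strict;

my @warnings;
BEGIN { $SIG{__WARN__} = sub { push @warnings, $_[0] } }    # collect warnings
use warnings;

my $nodesfile = 'nodes.dmp';
my $namesfile = 'names.dmp';

# Argument list with the comma after 'flatfile' left out.
my @args = (-source => 'flatfile'
            -nodesfile => $nodesfile,
            -namesfile => $namesfile);

print 'number of elements: ', scalar(@args), "\n";
print "value after -source: $args[1]\n";
print "warning: $_" for @warnings;
```

With the comma in place the list has six elements and -source is followed by the string 'flatfile'.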

I'd like to contribute a couple of additional methods to
Bio::DB::Taxonomy. The first one returns a reference to an array with
the full lineage of a given node.

sub lineage {
  my $node = shift;
  my @PATH;
  while ($node->node_name ne "root") {
    $node = $node->get_Parent_Node;
    unshift @PATH, $node;
  }
  return \@PATH;
}

The second one uses the lineage method to return the most recent common
ancestor of two given nodes.

sub LCA {
  my $node1 = shift;
  my $node2 = shift;
  my @PATH1 = @{lineage($node1)};
  my @PATH2 = @{lineage($node2)};
  my $root1 = shift @PATH1;
  my $root2 = shift @PATH2;
  while ($root1->node_name eq $root2->node_name) {
    $root1 = shift @PATH1;
    $root2 = shift @PATH2;
  }
  return $root1;
}
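
As posted, the loop in LCA stops at the first pair of nodes whose names differ and returns that node, and it runs off the end of the arrays when one lineage is a prefix of the other. A sketch of a variant that returns the last shared node instead is given below. It is self-contained: Demo::Node is a hypothetical stand-in offering only node_name and get_Parent_Node, the root is assumed to be the node whose parent is undefined, and, unlike the lineage sub above, the path here includes the node itself. Nodes are compared by name, as in the posted code.

```perl
use strict;
use warnings;

{
    package Demo::Node;    # hypothetical stand-in for a taxonomy node

    sub new {
        my ($class, $name, $parent) = @_;
        return bless { name => $name, parent => $parent }, $class;
    }
    sub node_name       { $_[0]{name} }
    sub get_Parent_Node { $_[0]{parent} }
}

# Path from the root down to and including $node.
sub lineage {
    my $node = shift;
    my @path;
    while (defined $node) {
        unshift @path, $node;
        $node = $node->get_Parent_Node;
    }
    return \@path;
}

# Last node shared by both root-to-node paths, or nothing if the two
# nodes do not share a root.
sub lca {
    my ($node1, $node2) = @_;
    my @path1 = @{ lineage($node1) };
    my @path2 = @{ lineage($node2) };
    my $lca;
    while (@path1 && @path2 && $path1[0]->node_name eq $path2[0]->node_name) {
        $lca = shift @path1;
        shift @path2;
    }
    return $lca;
}

my $root     = Demo::Node->new('root');
my $primates = Demo::Node->new('Primates', $root);
my $homo     = Demo::Node->new('Homo', $primates);
my $pan      = Demo::Node->new('Pan', $primates);

print lca($homo, $pan)->node_name,  "\n";
print lca($homo, $homo)->node_name, "\n";
```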

Jason, shall I include them myself in Bio::DB::Taxonomy or can you take
care of this? I think, the right place for these methods might be
Bio::Taxonomy or Bio::Taxonomy::Node rather than Bio::DB::Taxonomy.

Thanks,

Gabriel




From lstein at cshl.edu  Tue Feb 21 15:55:30 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 21 Feb 2006 10:55:30 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
Message-ID: <200602211055.31221.lstein@cshl.edu>

Hi,

When you are looking at the resolution of individual bases, a base pair at 
position one occupies the half-open interval from 1->2, meaning that it comes 
up to, but doesn't quite touch, the 2. For the purposes of display, 
Bio::Graphics draws the end of the half-open interval.

Lincoln

On Tuesday 21 February 2006 05:47, Dave Howorth wrote:
> I'm drawing a simple graphic and seeing something I didn't expect. I'm
> not sure whether I've misunderstood the docs or found a bug. If I run a
> program containing:
>
>      my $name   = 'O68601';
>      my $length = 44;
>      my $panel  = Bio::Graphics::Panel->new(
>                  -length    => $length,
>                  -width     => 800,
>                  -pad_left  => 10,
>                  -pad_right => 10,
>                  -key_style => 'between',
>                  );
>
>      my $feature = new Bio::SeqFeature::Generic(
>                  -start  => 1,
>                  -end    => $length,
>                  -display_name => $name . " ($length)",
>                  );
>
>      $panel->add_track($feature,
>                  -glyph   => 'arrow',
>                  -tick    =>  1,
>                  -fgcolor => 'black',
>                  -double  => 1,
>                  -label   => 1,
>                  );
>
> Then I see a tick strip labelled at its left end with '1' and at its
> right end with '45'. I expected to see '44'. Should I be looking for a
> bug in Bio::Graphics or fixing my program?
>
> Thanks, Dave

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From jason.stajich at duke.edu  Tue Feb 21 16:28:22 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 21 Feb 2006 11:28:22 -0500
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
In-Reply-To: <1783551242valiente@lsi.upc.es>
References: <1783551242valiente@lsi.upc.es>
Message-ID: <1C38DDCF-9312-42D3-923F-C0DD4CE7E9AA@duke.edu>

you'll have to do it - I don't have time. I thought there was
something like this already, but I guess not, so please put it in. I
must do this when we initialize the classification array when
building a node.


On Feb 21, 2006, at 11:10 AM, Gabriel Valiente wrote:

> It works now, with the #!/usr/bin/perl -w switch. Sorry about that.
>
> I'd like to contribute a couple of additional methods to
> Bio::DB::Taxonomy. The first one returns a reference to an array with
> the full lineage of a given node.
>
> sub lineage {
>   my $node = shift;
>   my @PATH;
>   while ($node->node_name ne "root") {
>     $node = $node->get_Parent_Node;
>     unshift @PATH, $node;
>   }
>   return \@PATH;
> }
>
> The second one uses the lineage method to return the most recent  
> common
> ancestor of two given nodes.
>
> sub LCA {
>   my $node1 = shift;
>   my $node2 = shift;
>   my @PATH1 = @{lineage($node1)};
>   my @PATH2 = @{lineage($node2)};
>   my $root1 = shift @PATH1;
>   my $root2 = shift @PATH2;
>   while ($root1->node_name eq $root2->node_name) {
>     $root1 = shift @PATH1;
>     $root2 = shift @PATH2;
>   }
>   return $root1;
> }
>
> Jason, shall I include them myself in Bio::DB::Taxonomy or can you  
> take
> care of this? I think, the right place for these methods might be
> Bio::Taxonomy or Bio::Taxonomy::Node rather than Bio::DB::Taxonomy.
>
> Thanks,
>
> Gabriel
>
>

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From dhoworth at mrc-lmb.cam.ac.uk  Tue Feb 21 16:50:37 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Tue, 21 Feb 2006 16:50:37 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <200602211055.31221.lstein@cshl.edu>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211055.31221.lstein@cshl.edu>
Message-ID: <43FB44DD.4090504@mrc-lmb.cam.ac.uk>

Lincoln Stein wrote:
> When you are looking at the resolution of individual bases, a base pair at 
> position one occupies the half-open interval from 1->2, meaning that it comes 
> up to, but doesn't quite touch, the 2. For the purposes of display, 
> Bio::Graphics draws the end of the half-open interval.

I think I understand the description of what it's doing but I don't 
understand why. What is the purpose of labelling the [44,45) interval 
45, when that interval is representing the 44th discrete mer?

I'm working with proteins and domains, so I'm always at the level of 
individual residues and people frequently care about the exact residue 
boundaries, especially when the regions are short. So I need to make 
pictures that match the data.

The displayed track seems more consistent with an interpretation that 
the residues are represented by the discrete integer points along the 
line but I don't know if I'm buying myself trouble later if I try to 
adopt that interpretation.

Alternatively, is there some way to get a track with 44 intervals, 
labelled 1 to 44?

Or will I need to patch my copy of bioperl to achieve that?

Thanks, Dave


From cjfields at uiuc.edu  Tue Feb 21 17:30:58 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 11:30:58 -0600
Subject: [Bioperl-l] another searchIO bug? with blast report
In-Reply-To: <43F5A2EA0200009B00000F45@gwia.kvl.dk>
Message-ID: <000301c6370c$93b07c70$15327e82@pyrimidine>

Anders,

I think you should look through the mail list archives for an answer,
specifically:

http://portal.open-bio.org/pipermail/bioperl-l/2004-November/017285.html

Look up the other methods in Bio::Search::HSP::BlastHSP as well. They may be
more helpful.  I can't help but think there is something wrong with the
logic in your subroutines since they don't call other methods built in to
HSP objects.  It may be an off-by-one error.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Anders Stegmann
> Sent: Friday, February 17, 2006 3:18 AM
> To: bioperl-l at bioperl.org
> Subject: Re: [Bioperl-l] another searchIO bug? with blast report
> 
> 
> 
> >>>Anders Stegmann  02/16/06 11:20 am >>>
> Hi!
> 
> I am blasting a protein seq (query) against an identical seq with a
> deletion of Aa nr 61 (subject).
> Then I print out the type of nomatch Aa and its position.
> The nomatch for the query seq is Aa G at position 61, which is correct.
> The nomatch for the subject seq is V at position 60, which is definitely
> not correct!?
> 
> Is this a bug?
> 
> testblast2.pl is the program to run
> 
> Q0045 is the query seq.
> 
> Q0045del61 is the subject seq (it has to be formatted: formatdb -i
> Q0045del61 -p T -o F).
> 
> Regards Anders.
> 




From staffa at niehs.nih.gov  Tue Feb 21 17:24:39 2006
From: staffa at niehs.nih.gov (staffa)
Date: Tue, 21 Feb 2006 12:24:39 -0500
Subject: [Bioperl-l] Pattern Density
Message-ID: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>

Good Friends,
I have an important client who wants a histogram display of the density 
of "ccgg" along any chromosome of the mouse genome in 1000 bp windows.

I'm thinking that maybe there is a bio-perl module that could help with 
this.
That'd probably beat having to write something from scratch.
Any help that you give would be greatly appreciated.
I am more concerned about the reading and analysis of the sequence than 
actual plotting of the histogram, but anything you can offer will be 
appreciated.

Thank you.

Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov )
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina

From lstein at cshl.edu  Tue Feb 21 18:25:59 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 21 Feb 2006 13:25:59 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FB44DD.4090504@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211055.31221.lstein@cshl.edu>
	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>
Message-ID: <200602211326.00021.lstein@cshl.edu>

Hi Dave,

Well, when you are using 1-based coordinates, a line that contains 44
intervals will have 45 ticks. If you move to 0-based coordinates, then the 
first tick will be labeled 0 and the last tick will be labeled 44. An 
alternative is to make each base dimensionless, but that becomes a problem 
when dealing with single base features, such as SNPs. These issues are why I 
have long advocated for interbase coordinates in which you number the 
positions between bases rather than the bases themselves.
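
The tick arithmetic can be spelled out in a few lines of plain Perl. tick_labels() is a made-up helper for illustration and not part of Bio::Graphics; it only shows that N unit intervals need N + 1 tick marks, whatever label the first tick carries.

```perl
use strict;
use warnings;

# Tick labels for $n unit intervals when the first tick is labelled $first.
sub tick_labels {
    my ($first, $n) = @_;
    return ($first .. $first + $n);
}

my @one_based  = tick_labels(1, 44);
my @zero_based = tick_labels(0, 44);

printf "1-based: %d ticks, %d..%d\n",
    scalar(@one_based), $one_based[0], $one_based[-1];
printf "0-based: %d ticks, %d..%d\n",
    scalar(@zero_based), $zero_based[0], $zero_based[-1];
```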

Draw me the picture of what you expect to see. I think of it this way:

        1 2 3 4 5 6
         A>G>C>T>A>

Lincoln

On Tuesday 21 February 2006 11:50, Dave Howorth wrote:
> Lincoln Stein wrote:
> > When you are looking at the resolution of individual bases, a base pair
> > at position one occupies the half-open interval from 1->2, meaning that
> > it comes up to, but doesn't quite touch, the 2. For the purposes of
> > display, Bio::Graphics draws the end of the half-open interval.
>
> I think I understand the description of what it's doing but I don't
> understand why. What is the purpose of labelling the [44,45) interval
> 45, when that interval is representing the 44th discrete mer?
>
> I'm working with proteins and domains, so I'm always at the level of
> individual residues and people frequently care about the exact residue
> boundaries, especially when the regions are short. So I need to make
> pictures that match the data.
>
> The displayed track seems more consistent with an interpretation that
> the residues are represented by the discrete integer points along the
> line but I don't know if I'm buying myself trouble later if I try to
> adopt that interpretation.
>
> Alternatively, is there some way to get a track with 44 intervals,
> labelled 1 to 44?
>
> Or will I need to patch my copy of bioperl to achieve that?
>
> Thanks, Dave

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From osborne1 at optonline.net  Tue Feb 21 18:25:35 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 21 Feb 2006 13:25:35 -0500
Subject: [Bioperl-l] Pattern Density
In-Reply-To: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
Message-ID: 

Nick,

Right, BioPerl really can't help you with the histogram itself but there are
probably multiple solutions to the problem of iterating over the sequence.
Here's one idea, untested, it assumes your sequence is in fasta format:

use strict;
use Bio::DB::Fasta;
use Bio::Seq;
use Bio::Tools::SeqWords;

my $db  = Bio::DB::Fasta->new('/path/to/fasta/files');
my $obj = $db->get_Seq_by_id('CHROMOSOME_I');
my $start = 1;                  # BioPerl coordinates are 1-based
my $windowsize = 1000;
my $str = 'ccgg';
my $len = $obj->length;
my $overlap = 250;              # step between window starts

while (1) {
    my $end = $start + $windowsize - 1;
    last if ( $end > $len);
    my $subseq  = $obj->subseq($start,$end);
    my $count = get_count($str,$subseq);
    print "$start\t$end\t$count\n";
    $start += $overlap;
}

sub get_count {
    my ($str,$subseq) = @_;
    # upper-case both sides so the hash lookup does not depend on case
    my $seqobj = Bio::Seq->new(-seq => uc $subseq);
    my $seq_word = Bio::Tools::SeqWords->new(-seq => $seqobj);
    my $ref = $seq_word->count_overlap_words(length($str));
    return $ref->{uc $str} || 0;
}

Note this skips the very last window, debugging needed.
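
For comparison, the counting step can also be sketched without any BioPerl modules, including the last, shorter window. This is only an illustration under stated assumptions: window_counts() is a made-up helper, the sequence is assumed to be already in memory as one string, windows do not overlap, and a hit is assigned to the window in which it starts.

```perl
use strict;
use warnings;

# Count occurrences of $motif per non-overlapping window of $window bases.
# A hit is assigned to the window in which it starts; the last window may
# be shorter than $window. Matching is case-insensitive.
sub window_counts {
    my ($seq, $motif, $window) = @_;
    my $len = length $seq;
    return [] unless $len;
    my @counts = (0) x int(($len + $window - 1) / $window);
    my $pat = quotemeta $motif;
    # The lookahead lets overlapping hits be counted as well.
    while ($seq =~ /(?=$pat)/gi) {
        $counts[ int($-[0] / $window) ]++;
    }
    return \@counts;
}

my $counts = window_counts('ccggaccggtt', 'ccgg', 5);
print join(' ', @$counts), "\n";
```

The returned array has one count per window, in order, which is the shape a histogram routine would need.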

Brian O.


On 2/21/06 12:24 PM, "staffa"  wrote:

> I am more concerned about the reading and analysis of the sequence than actual
> plotting of the histogram, but anything you can offer will be appreciated.





From gyang at plantbio.uga.edu  Tue Feb 21 18:45:50 2006
From: gyang at plantbio.uga.edu (Guojun Yang)
Date: Tue, 21 Feb 2006 13:45:50 -0500
Subject: [Bioperl-l] full chromosome accesscion number mess
In-Reply-To: <000001c63669$2bf06a80$15327e82@pyrimidine>
Message-ID: <20060221184550.6557851b@dogwood.plantbio.uga.edu>

Hi, everybody,  
In the process of repairing my CGI script after the NCBI BLAST output format
change, I noticed that the accession numbers for the rice pseudochromosomes
are very confusing and cause trouble for sequence retrieval. My script uses
RemoteBlast to search for similar sequences and then retrieves the hit
sequence with a bit of flanking region from GenBank. The rice
pseudochromosomes have accession numbers similar to those of the individual
clones, like AP00XXX. I do not want the sequence retrieval to involve these
accessions because it takes forever. Can anybody give some suggestions on how
to deal with it?
Thanks,  
 

Guojun Yang
Department of Plant Biology
University of Georgia


From valiente at lsi.upc.edu  Tue Feb 21 18:46:10 2006
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Tue, 21 Feb 2006 19:46:10 +0100 (MET)
Subject: [Bioperl-l] Local flat file implementation of Bio::DB::Taxonomy
Message-ID: <3193394449valiente@lsi.upc.es>

> you'll have to do it - I don't have time, I thought there was  
> something like this already, but I guess not, so please put it in.

Done. I've added methods get_Lineage_Nodes and get_LCA_Node to
Bio::Taxonomy::Node.

> Uhm, does that return the LCA or one of the first divergent ancestors?
> And what does it do if lineage($node1) is the same as lineage($node2)?

Thanks, I've already taken this into account.

Cheers

Gabriel




From s-merchant at northwestern.edu  Tue Feb 21 18:47:54 2006
From: s-merchant at northwestern.edu (Sohel Merchant)
Date: Tue, 21 Feb 2006 12:47:54 -0600
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: 
Message-ID: <000001c63717$5314ded0$c2987ca5@pc13>

Hi Hilmar and Chris,
  I have played around a bit using Bio::Annotation::Collection to
capture the headers of an ontology file. It behaves pretty well and
avoids the cycle issue which might arise by using an ontology to describe
the ontology. I have an initial version of a working parser for the OBO
flat file format.

Chris, I was able to model any kind of relationship by using some of the
functionality in Bio::Ontology::SimpleGoEngine, which I had
initially overlooked.

I would like to commit this code to the Bioperl CVS, but I believe I don't
have write access to it. Can I send the stuff to either of you
guys?

Hilmar, I would like your feedback on the code base and would be happy
to make any changes required before we commit it to the CVS.

Thanks,
Sohel Merchant.
dictyBase

-----Original Message-----
From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar
Lapp
Sent: Monday, February 20, 2006 8:53 PM
To: chris mungall
Cc: Bioperl; Sohel Merchant
Subject: Re: [Bioperl-l] Bio::Ontology::Ontology

On 2/20/06, chris mungall  wrote:
>
> I like the idea of using an ontology to describe the ontology.
>
> Note that the proposed structure:
> OntologyI HAS_A Annotation::OntologyTerm IS_A TermI HAS_A OntologyI
>
> will lead to cycles in the object graph when the metadata ontology
> describes itself.

Yes I know, that's why I didn't want to be too vocal about it ...

>
> actually, I think the ontology module already has object reference
> cycles. TermI->OntologyI->TermI
>
> When I brought this up originally people didn't seem to care much - so
> long as you're only parsing GO then it's not a big issue, people have
> enough memory they won't notice a big chunk of memory that refuses to
> be garbage collected way after it's used.

There is a method that destroys the cycle: $ontology->close()
(this is also an interface method)

Essentially, the cycle is not in OntologyI itself but in OntologyI
HAS-A OntologyEngineI; i.e., the latter holds (may hold) references to
terms which (may) hold a reference to an OntologyI which holds a
reference to the OntologyEngineI.

I say 'may' in parentheses because an implementation may use tricks
like late instantiation, stringified references (handles), and weak
references. It's possible to avoid the cycle altogether using such
tricks but it remains questionable how much this then affects
performance, and how ugly and incomprehensible the code would become.
Since there is the close() method I haven't bothered yet trying a
fully de-cycled implementation.
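
To make the trade-off concrete, here is a minimal sketch of such a cycle and
of the two ways out of it: an explicit teardown call and a weak
back-reference. The Demo::Ontology and Demo::Term packages are hypothetical
stand-ins, not the Bio::Ontology classes; only Scalar::Util::weaken is a real
library call.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Hypothetical stand-ins for an ontology and its terms; just enough
# structure to reproduce the reference cycle.
package Demo::Ontology;
our $destroyed = 0;
sub new      { my ($class, $name) = @_; return bless { name => $name, terms => [] }, $class }
sub add_term { my ($self, $term) = @_; push @{ $self->{terms} }, $term; return $term }
# Explicit teardown: dropping the terms breaks the cycle by hand.
sub close    { my ($self) = @_; $self->{terms} = []; return }
sub DESTROY  { $destroyed++ }

package Demo::Term;
sub new {
    my ($class, $name, $ontology, $weak) = @_;
    my $self = bless { name => $name, ontology => $ontology }, $class;
    # A weak back-reference does not keep the ontology alive.
    Scalar::Util::weaken( $self->{ontology} ) if $weak;
    return $self;
}

package main;

# Build an ontology with one term inside a scope and report how many
# ontology objects were freed by the time the scope was left.
sub destroyed_after_scope {
    my ($weak, $call_close) = @_;
    my $before = $Demo::Ontology::destroyed;
    {
        my $ont = Demo::Ontology->new('demo');
        $ont->add_term( Demo::Term->new('term1', $ont, $weak) );
        $ont->close if $call_close;
    }
    return $Demo::Ontology::destroyed - $before;
}

printf "strong ref, no close(): %d freed\n", destroyed_after_scope(0, 0);
printf "strong ref, close():    %d freed\n", destroyed_after_scope(0, 1);
printf "weak ref, no close():   %d freed\n", destroyed_after_scope(1, 0);
```

With a strong back-reference and no teardown call the object survives the
scope and is only reclaimed at program exit, which is the leak discussed
above.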

> Of course, if you want to use
> bioperl to cycle though all of OBO + SnoMed + UMLS then it's a
> different story.

Well if you want to keep all three in memory for some kind of
cross-reasoning then yes you are in trouble. But if you do one
ontology after another, you'd just have to make sure to call close() on
an ontology once you're done with it.

>
> I think it's best if Sohel concentrates on getting obo.pm working, then
> we can start thinking as a group about the best way to capture ontology
> metadata. This includes metadata on the whole ontology, and metadata on
> the terms (eg synonyms).
>
> To what extent are the current modules already in use?

I don't know about others but I use them often.

> I think the object cycle is a serious flaw, will it be possible to fix
> this without a major overhaul?

If I recall correctly the way go-perl circumvents this is by having
the ontology of a term as a flat attribute. This also means that when
having a term alone, you cannot ask for its connected terms. It's been
a while, so Chris, set me straight where this is not true.

It should be possible to come up with an implementation of OntologyI
that for all intents and purposes behaves like a flat scalar giving
the name until you call one of its graph traversal methods. At that
point it would instantiate the engine from persistent storage (file,
or a database connection), or retrieve one from a 'store'. The latter
is I believe what Allen started with the OntologyStore, but again I
would need to check the details.
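
A minimal sketch of that late-instantiation idea, assuming a handle that
answers name() cheaply and builds its engine only when a traversal method is
first called. Demo::LazyOntology, its loader callback, and the toy engine hash
are all hypothetical; none of this is existing BioPerl API.

```perl
use strict;
use warnings;

# Hypothetical lazy ontology handle; NOT a Bio::Ontology class.
package Demo::LazyOntology;

sub new {
    my ($class, %args) = @_;
    # 'loader' is a code ref that builds the engine on demand
    # (from a file, a database connection, or a shared store).
    return bless { name => $args{name}, loader => $args{loader} }, $class;
}

# Cheap accessor: behaves like a flat name and never loads the engine.
sub name { return $_[0]{name} }

sub is_loaded { return defined $_[0]{engine} ? 1 : 0 }

# The engine is created only when a traversal method first needs it.
sub _engine {
    my ($self) = @_;
    $self->{engine} = $self->{loader}->( $self->{name} )
        unless defined $self->{engine};
    return $self->{engine};
}

sub get_child_terms {
    my ($self, $term) = @_;
    return @{ $self->_engine->{children}{$term} || [] };
}

package main;

my $loads = 0;
my $ont = Demo::LazyOntology->new(
    name   => 'demo',
    loader => sub {
        $loads++;
        # toy "engine": a parent => children lookup table
        return { children => { root => [ 'a', 'b' ] } };
    },
);

print "name: ", $ont->name, ", loaded: ", $ont->is_loaded, "\n";
my @kids = $ont->get_child_terms('root');
print "children of root: @kids (engine built $loads time(s))\n";
```

A term holding such a handle holds no reference to an engine until someone
traverses the graph, so a term used only for its name never takes part in a
cycle.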

    -hilmar

>
>
> On Feb 11, 2006, at 9:10 PM, Hilmar Lapp wrote:
>
> > Sohel, please do keep the discussion on the list, in your own interest
> > as there's a multitude of people who can respond to you.
> >
> > SimpleValue would probably be what I'd use too. As Heikki hinted you
> > might even create an ontology for annotating ontologies, which would
> > allow you to use Annotation::OntologyTerm for annotation, but then
> > there's no qualifier value ...
> >
> > Bioperl 1.5.1 was released last year, please check the website.
> >
> >       -hilmar
> >
> > On Feb 10, 2006, at 3:32 PM, Sohel Merchant wrote:
> >
> >> Hi Hilmar,
> >>   I really like your suggestion of implementing the Bio::AnnotatableI
> >> interface in the Bio::Ontology::Ontology class. I am going to implement
> >> this and play around a little with it. I am planning to use
> >> Bio::Annotation::SimpleValue for annotating the header as it provides a
> >> good way of specifying the Tag/value pair. What are your thoughts on
> >> using this?
> >>
> >>   Also, I was wondering if you have any idea about the scheduled date
> >> for the Bioperl 1.51 release. I would like to contribute some stuff in
> >> the next release.
> >>
> >> Thanks,
> >> Sohel.
> >>
> >> -----Original Message-----
> >> From: Hilmar Lapp [mailto:hlapp at gmx.net]
> >> Sent: Friday, February 10, 2006 3:40 PM
> >> To: Sohel Merchant
> >> Cc: Bioperl
> >> Subject: Re: Bio::Ontology::Ontology
> >>
> >> Sohel,
> >>
> >> please allow me to copy the list in my response. There's many good and
> >> insightful people on the list who may have something to add or
> >> different ideas.
> >>
> >> I've come across that problem myself, for instance with InterPro. What
> >> I've done so far simply is to stick it unstructured into the definition
> >> slot, which is not helpful if your purpose goes further than just
> >> displaying it in an unstructured fashion.
> >>
> >> I'm not sure you would want to create another class for this (like
> >> AnnotatedOntology). One could make Bio::Ontology::Ontology (i.e., the
> >> implementation, probably not the interface) annotatable (i.e.,
> >> implement Bio::Annotatable), which supposedly would be simple to do
> >> (AnnotationCollection is already implemented, you'd just return an
> >> instance of it).
> >>
> >> Even though tag/value pairs sound like quick&fast way to go I'm leaning
> >> against it; in essence we're moving away from that elsewhere
> >> (SeqFeatureI) and hence I don't think we should restart it here.
> >>
> >> I'm not giving a definitive answer here, just my (initial) thoughts.
> >> Hope that helps nonetheless. Can you fancy yourself trying the
> >> Annotatable approach and let us know how it goes?
> >>
> >>      -hilmar
> >>
> >>
> >> On Feb 10, 2006, at 8:39 AM, Sohel Merchant wrote:
> >>
> >>> Hi Hilmar,
> >>> How are you doing? I am Sohel Merchant, a programmer at dictyBase,
> >>> Northwestern University. I am working on a parser for an ontology
> >>> file. I really like the ontology object model which you have
> >>> contributed to Bioperl. I think it's just Awesome!! One of the things
> >>> which I thought would be great to capture is the ontology headers.
> >>> Right now one can specify only the name, authority information. I was
> >>> wondering if there is any way I could also capture other ontology file
> >>> headers like version of the file, date when that ontology file was
> >>> made. I was thinking of making a header class or alternatively it
> >>> could go as Hash of values in the Bio::Ontology::Ontology class
> >>> itself. I wanted to know what your thoughts are on this.
> >>>
> >>> Thanks,
> >>> Sohel Merchant
> >>> dictyBase
> >>>
> >> --
> >> -------------------------------------------------------------
> >> Hilmar Lapp                            email: lapp at gnf.org
> >> GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> >> -------------------------------------------------------------
> >>
> >>
> >>
> >>
> > --
> > -------------------------------------------------------------
> > Hilmar Lapp                            email: lapp at gnf.org
> > GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
> > -------------------------------------------------------------
> >
> >
> >
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------




From cjfields at uiuc.edu  Tue Feb 21 19:25:02 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 13:25:02 -0600
Subject: [Bioperl-l] full chromosome accesscion number mess
In-Reply-To: <20060221184550.6557851b@dogwood.plantbio.uga.edu>
Message-ID: <000001c6371c$83bf92a0$15327e82@pyrimidine>

What is the accession you're having problems with?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Guojun Yang
> Sent: Tuesday, February 21, 2006 12:46 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] full chromosome accesscion number mess
> 
> Hi, everybody,
> In the process of repairing my CGI script after the NCBI BLAST output format
> change, I noticed that the accession numbers for the rice pseudochromosomes
> are very confusing and cause trouble for sequence retrieval. My script uses
> RemoteBlast to search for similar sequences and then retrieves the hit
> sequence with a bit of flanking region from GenBank. The rice
> pseudochromosomes have accession numbers similar to those of the individual
> clones, like AP00XXX. I do not want the sequence retrieval to involve
> these accessions because it takes forever. Can anybody give some
> suggestions on how to deal with it?
> Thanks,
> 
> 
> Guojun Yang
> Department of Plant Biology
> University of Georgia



From hlapp at gmx.net  Tue Feb 21 19:31:31 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 21 Feb 2006 11:31:31 -0800
Subject: [Bioperl-l] Bio::Ontology::Ontology
In-Reply-To: <000001c63717$5314ded0$c2987ca5@pc13>
References: 
	<000001c63717$5314ded0$c2987ca5@pc13>
Message-ID: 

Send it to me. I'll review and check it in if appropriate. You should
also write a test and include it in what you send to me (see t/*.t
for examples of how to write a test). Obviously, the test should
succeed.
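
For orientation, a minimal sketch of the shape such a test script usually
takes, using the core Test::More module. The parse_header helper is a
hypothetical stand-in for the code under test (it is not the actual obo
parser), and the header text is toy data loosely modelled on tag/value lines.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical helper standing in for the code under test: it collects
# "tag: value" header lines, up to the first blank line, into a hash of
# array refs so that repeated tags are preserved.
sub parse_header {
    my ($text) = @_;
    my %header;
    for my $line (split /\n/, $text) {
        last if $line =~ /^\s*$/;                     # header ends at a blank line
        my ($tag, $value) = $line =~ /^([^:\s]+):\s*(.*?)\s*$/ or next;
        push @{ $header{$tag} }, $value;
    }
    return \%header;
}

my $header = parse_header(
    "format-version: 1.2\n"
  . "date: 21:02:2006 12:00\n"
  . "subsetdef: a\n"
  . "subsetdef: b\n"
  . "\n"
  . "[Term]\n"
  . "id: X:1\n"
);

is( $header->{'format-version'}[0], '1.2', 'format-version captured' );
is( scalar @{ $header->{subsetdef} }, 2,   'repeated tags are kept' );
ok( !exists $header->{id},                 'parsing stops at the first blank line' );
```

A real test for the parser would load the module itself and read a small
sample file instead of an inline string.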

Chris, I suppose this is the time to object - I would conceptually
like the ontology-based annotation too but now we are up against a
(hopefully) working implementation which can only be beaten by another
working implementation, and frankly I don't have time to attempt one
now.

   -hilmar


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From MEC at stowers-institute.org  Tue Feb 21 20:38:55 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 21 Feb 2006 14:38:55 -0600
Subject: [Bioperl-l] Pattern Density
Message-ID: 

 
You might consider displaying ccgg content as a track in the mouse genome
browser at
http://gbrowse.informatics.jax.org/cgi-bin/gbrowse/mouse_build_34

For example, the following track causes it to display 3 proportionally
sized red boxes in the first 3K of mouse Chr1:

[MotifContent]
glyph = xyplot
graph_type = boxes
fgcolor = black
bgcolor = red
height=100
min_score=0
max_score=100
label=1
key="Motif Content"

reference=Chr1
MotifContent CCGG   1..1000    score=20
MotifContent CCGG   1001..2000    score=50
MotifContent CCGG   2001..3000    score=30


There are many ways for computing the score.  I myself would begin with:

#!/usr/bin/env perl
use strict;

use Bio::SeqIO;             # for reading sequence to scan
use TFBS::Word::Consensus;  # for the pattern matching; cf. http://forkhead.cgb.ki.se/TFBS/
use PDL::Basic;             # if you have it installed, for the histogram binning statistics
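
One way the score could be computed, sketched in plain Perl so that it runs
without TFBS or PDL: scan the sequence once, bin each match by its start
position, and emit one feature line per bin in the layout of the track above.
The function name motif_track_lines is made up for illustration, and the score
here is simply the raw count per bin, not a value scaled to the 0..100 range
of the example track.

```perl
use strict;
use warnings;

# Return one "MotifContent" feature line per non-overlapping bin of $bin bp.
# Each match (overlapping matches included) is credited to the bin that
# contains its first base.
sub motif_track_lines {
    my ($seq, $motif, $bin) = @_;
    my $len  = length $seq;
    my @bins = (0) x int( ($len + $bin - 1) / $bin );
    while ( $seq =~ /(?=\Q$motif\E)/gi ) {
        $bins[ int( $-[0] / $bin ) ]++;       # $-[0] is the 0-based match start
    }
    my @lines;
    for my $i (0 .. $#bins) {
        my $start = $i * $bin + 1;
        my $end   = ($i + 1) * $bin;
        $end = $len if $end > $len;           # last bin may be shorter
        push @lines, sprintf 'MotifContent %s   %d..%d    score=%d',
            uc $motif, $start, $end, $bins[$i];
    }
    return @lines;
}

print "reference=Chr1\n";
print "$_\n" for motif_track_lines('CCGGCCGGAATTCCGG', 'ccgg', 8);
```

For a real chromosome the sequence string would come from Bio::SeqIO and the
bin size would be 1000.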

 
 



________________________________

	From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of staffa
	Sent: Tuesday, February 21, 2006 11:25 AM
	To: bioperl-l at lists.open-bio.org
	Subject: [Bioperl-l] Pattern Density
	
	
	Good Friends, 
	I have an important client who wants a histogram display of the
	density of "ccgg" along any chromosome of the mouse genome in 1000 bp
	windows.

	I'm thinking that maybe there is a bio-perl module that could
	help with this.
	That'd probably beat having to write something from scratch.
	Any help that you give would be greatly appreciated.
	I am more concerned about the reading and analysis of the
	sequence than actual plotting of the histogram, but anything you can
	offer will be appreciated.

	Thank you. 

	Nick Staffa 
	Telephone: 919-316-4569 (NIEHS: 6-4569) 
	Scientific Computing Support Group 
	NIEHS Information Technology Support Services Contract 
	(Science Task Monitor: Jack L. Field( field1 at niehs.nih.gov ) 
	National Institute of Environmental Health Sciences 
	National Institutes of Health 
	Research Triangle Park, North Carolina 




From cjfields at uiuc.edu  Tue Feb 21 21:15:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 15:15:18 -0600
Subject: [Bioperl-l] bioperl maillist searches not updated
Message-ID: <000801c6372b$eae00870$15327e82@pyrimidine>

Seems that using Google to search through the mailing list will only get
mail up to the beginning of August 2005.  I went back to look up Hilmar's
email on bioperl-db recently and can't find it.  So I tried anything in
2006:

http://www.google.com/search?hl=en&lr=&safe=off&as_qdr=all&q=site%3Abioperl.org+inurl%3Apipermail+inurl%3Abioperl-l+2006&btnG=Search

And got nothin'!

The Open-Bio form has some mail from 2006, but only up to 1-24-2006.
Luckily, the mailing list archives seem to be fine:



Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From osborne1 at optonline.net  Tue Feb 21 21:13:44 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 21 Feb 2006 16:13:44 -0500
Subject: [Bioperl-l] Pattern Density
In-Reply-To: 
Message-ID: 

Nick,

I was mistaken previously when I hinted that you couldn't create histograms
using Bioperl:

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Graphics/Glyph/xyplot.html

This could do exactly what you want.

Brian O.






From cjfields at uiuc.edu  Tue Feb 21 21:58:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 15:58:07 -0600
Subject: [Bioperl-l] bioperl-db issues
Message-ID: <000d01c63731$e61be1f0$15327e82@pyrimidine>

Sorry about the huge delay in this response, got caught up with other
things.

> > Bad News:  There's a new problem now. I updated from CVS yesterday; I
> > walked
> > through the steps and ran 'nmake test', with everything passing fine.
> > However, load_seqdatabase.pl is extremely slow; it's loading a sequence
> > every 5 minutes or so.  I noticed (when using '-debug') that it is
> > hanging
> > up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a
> > database,
> > load the biosql schema, and load sequences w/o loading taxonomy, the
> > problem
> > goes away.
> >
> > Here's the debugging output (I cut it off at the point it hangs up):
> > [...]
> 
> > preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
> > taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
> > taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND
> > ncbi_taxon_id =
> > ?
> > SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> > SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)
> 
> I'm a bit surprised if this is the query where it hangs. Are the
> indexes all there? There should be a primary key index on
> taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on taxon_name
> over (taxon_id,name,name_class). Also, there should be separate indexes
> on taxon_name.taxon_id and taxon_name.name. Are they all there? If you
> reinstantiated the schema from the DDL then it seems unlikely that
> somehow the indexes have vanished except if you messed with the schema
> or the DDL.

So far everything looks like you mentioned (see below for the ANALYZE
stuff).  The only thing that I wasn't sure about was that taxon_name indexes
were all primary keys.  That's really it.

> Putting an index on taxon_name.name_class really can't make sense, so
> let's assume it can't be that.
> 
> So really I suspect this has something to do with the state of the
> database and the version of MySQL. In particular, from some 4.x version
> of MySQL under certain circumstances you have to analyze the statistics
> of the tables in order to get the optimizer pick up the indexes
> properly. Are you on MySQL 4.x and if so, have you done that?
> 
> There's the ANALYZE TABLE command:
> http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html
> 
> Note the comment: "This statement works with MyISAM, BDB, and (as of
> MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?
>
> Also, you can check the execution plan for the query using EXPLAIN.
> http://dev.mysql.com/doc/refman/4.1/en/explain.html
> 
> This should show you whether the index would be picked up for the query
> or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to
> the db using the mysql shell (mysql).
> 
> I believe something similarly strange was encountered by someone using
> DB::GFF (or Chado) under MySQL, and if I recall correctly the solution
> was to optimize (analyze) the tables. Maybe someone who was in that
> thread reads this and can comment?

I find it odd that it worked well back in December and doesn't work now.  I
updated bioperl and bioperl-db from CVS since then, so have there been any
changes that may have caused this?  I noticed a few changes here and there.

Here's what I have tried thus far:

1) I reinstalled MySQL.  I thought it might be that I had my database on a
partitioned drive, so I reinstalled on the main drive.

2) I rebuilt the database from scratch, loading taxonomy fresh, loaded the
schema, and got the same error when loading (hanging on SpeciesAdaptor).
Tried ANALYZE:
------------------------------------
mysql> ANALYZE TABLE taxon;
+----------------+---------+----------+----------+
| Table          | Op      | Msg_type | Msg_text |
+----------------+---------+----------+----------+
| bioseqdb.taxon | analyze | status   | OK       |
+----------------+---------+----------+----------+
1 row in set (0.42 sec)

mysql> ANALYZE TABLE taxon_name;
+---------------------+---------+----------+----------+
| Table               | Op      | Msg_type | Msg_text |
+---------------------+---------+----------+----------+
| bioseqdb.taxon_name | analyze | status   | OK       |
+---------------------+---------+----------+----------+
1 row in set (0.36 sec)

mysql>
------------------------------------
so that's fine.  

3) Using EXPLAIN table:
------------------------------------
mysql> EXPLAIN taxon;
+-------------------+---------------------+------+-----+---------+----------------+
| Field             | Type                | Null | Key | Default | Extra          |
+-------------------+---------------------+------+-----+---------+----------------+
| taxon_id          | int(10) unsigned    | NO   | PRI | NULL    | auto_increment |
| ncbi_taxon_id     | int(10)             | YES  | UNI | NULL    |                |
| parent_taxon_id   | int(10) unsigned    | YES  | MUL | NULL    |                |
| node_rank         | varchar(32)         | YES  |     | NULL    |                |
| genetic_code      | tinyint(3) unsigned | YES  |     | NULL    |                |
| mito_genetic_code | tinyint(3) unsigned | YES  |     | NULL    |                |
| left_value        | int(10) unsigned    | YES  | UNI | NULL    |                |
| right_value       | int(10) unsigned    | YES  | UNI | NULL    |                |
+-------------------+---------------------+------+-----+---------+----------------+
8 rows in set (0.02 sec)

mysql> EXPLAIN taxon_name;
+------------+------------------+------+-----+---------+-------+
| Field      | Type             | Null | Key | Default | Extra |
+------------+------------------+------+-----+---------+-------+
| taxon_id   | int(10) unsigned | NO   | PRI |         |       |
| name       | varchar(255)     | NO   | PRI |         |       |
| name_class | varchar(32)      | NO   | PRI |         |       |
+------------+------------------+------+-----+---------+-------+
3 rows in set (0.00 sec)

------------------------------------
Does taxon_name need three primary keys?

4) So I tried reloading the sequences:
------------------------------------
C:\Perl\src\bioperl\bioperl-db\scripts\biosql>load_seqdatabase.pl -format genbank -dbname bioseqdb -dbuser root -dbpass ********** -testonly -safe -debug NP_249092.gpt

And got this:

Loading NP_249092.gpt ...
attempting to load adaptor class for Bio::Seq::RichSeq
        attempting to load module Bio::DB::BioSQL::RichSeqAdaptor
......
SimpleValueAdaptor::add_assoc: binding column 4 to "1" (rank)
SimpleValueAdaptor::add_assoc: binding column 1 to "21" (FK to Bio::SeqFeature::Generic)
SimpleValueAdaptor::add_assoc: binding column 2 to "34" (FK to Bio::Annotation::SimpleValue)
SimpleValueAdaptor::add_assoc: binding column 3 to "11" (value)
SimpleValueAdaptor::add_assoc: binding column 4 to "1" (rank)
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
no adaptor found for class Bio::Annotation::TypeManager
BioNamespaceAdaptor: binding UK column 1 to "bioperl" (namespace)
SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
SpeciesAdaptor: binding UK column 2 to "208963" (ncbi_taxid)
------------------------------------
Which is where it hangs, as before, usually about 2 minutes for each
sequence.  It seems there's a timeout happening in there somewhere...  It
definitely has something to do with the lookup, but like I said it did run
much faster last Nov-Dec.

So I'm a bit lost now.  Any ideas?  

I may try re-optimizing tables to see if it helps any.

I'm also really thinking of giving postgresql a shot but I have used mysql
for a while now; I'd like to stay with it if I can.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From cjfields at uiuc.edu  Wed Feb 22 04:09:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 22:09:18 -0600
Subject: [Bioperl-l] bioperl-db issues
In-Reply-To: 
Message-ID: <000001c63765$c0472370$15327e82@pyrimidine>

I got it worked out.  The Windows installer had picked out lower memory
settings (key buffer 10M, for instance) when I reinstalled, which
drastically slowed everything down.  I reset the settings for a server
environment and it's fine now.  Well, as fine as it will likely get since
I'm running this on a 1.8 GHz P4 with 756 MB RAM, so I'm not expecting it to
actually fly.  It's loading at about two sequences/second.  I'll have to see
if I get a speed improvement when optimizing tables.  I'll add this to the
wiki for installing bioperl-db under Windows.  

Are there optimal settings for using bioperl-db, such as key buffer and sort
buffer size, buffer pool size, etc?  Or do you think I'm likely to run into
a processor speed limit?  Just trying to get a fix on how much memory I
could push towards getting a smaller sequence database loaded, nothing like
swissprot.  I saw something in the mail list about setting
max_allowed_packet and a few other settings but that was about four years
ago.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: drycafe at gmail.com [mailto:drycafe at gmail.com] On Behalf Of Hilmar
> Lapp
> Sent: Tuesday, February 21, 2006 6:44 PM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: bioperl-db issues
> 
> On 2/21/06, Chris Fields  wrote:
> > [...]
> > I find it odd that it worked well back in December and doesn't work now.
> I
> > updated bioperl and bioperl-db from CVS since then, so have there been
> any
> > changes that may have caused this?  I noticed a few changes here and
> there.
> 
> The changes were fixes to retrieve the rank on persistent annotation
> objects (it was only stored before, but never retrieved). Neither the
> SpeciesAdaptor nor any of the taxonomy queries was affected by this.
> 
> >
> > Here's what I have tried thus far:
> >
> > 1) I reinstalled MySQL.  I thought it might be that I had my database on
> a
> > partitioned drive, so I reinstalled on the main drive.
> >
> > 2) I rebuilt the database from scratch, loading taxonomy fresh, loaded
> the
> > schema, and got the same error when loading (hanging on SpeciesAdaptor).
> > Tried ANALYZE:
> > ------------------------------------
> > mysql> ANALYZE TABLE taxon;
> > +----------------+---------+----------+----------+
> > | Table          | Op      | Msg_type | Msg_text |
> > +----------------+---------+----------+----------+
> > | bioseqdb.taxon | analyze | status   | OK       |
> > +----------------+---------+----------+----------+
> > 1 row in set (0.42 sec)
> >
> > mysql> ANALYZE TABLE taxon_name;
> > +---------------------+---------+----------+----------+
> > | Table               | Op      | Msg_type | Msg_text |
> > +---------------------+---------+----------+----------+
> > | bioseqdb.taxon_name | analyze | status   | OK       |
> > +---------------------+---------+----------+----------+
> > 1 row in set (0.36 sec)
> 
> I'm not sure but you may have to analyze all tables.
> 
> >
> > mysql>
> > ------------------------------------
> > so that's fine.
> >
> > 3) Using EXPLAIN table:
> > ------------------------------------
> > mysql> EXPLAIN taxon;
> 
> Note that you wouldn't use EXPLAIN on a table but on a query instead.
> I.e., copy&paste the offending query into the mysql editor, prefix it
> with EXPLAIN and then see what the results are. It should show whether
> the indexes are being used properly.
> 
> Most likely it doesn't use one of the indexes that it should be using
> but does a full table scan instead. The explain plan should pinpoint
> that.
> 
> BTW you can also use this to reconfirm the command line observation
> about the query being slow - it should 'hang' in the mysql shell as
> well. If it doesn't then there is something else going on. (if the
> placeholders pose a problem replace them with the actual values as
> given in the log)
> 
> > [..]
> > SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> > SpeciesAdaptor: binding UK column 2 to "208963" (ncbi_taxid)
> > ------------------------------------
> > Which is where it hangs, as before, usually about 2 minutes for each
> > sequence.
> 
> Do you also see a SELECT CLASSIFICATION query succeeding the one above
> (e.g., if you wait)? I'm asking because I'm surprised that that isn't
> the one you're seeing as taking too long, because it has been reported
> earlier to cause such problems with mysql. Alex Zelensky posted what
> he found worked as a fix.
> 
>   -hilmar
> --
> ----------------------------------------------------------
> : Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
> ----------------------------------------------------------



From hlapp at gmx.net  Wed Feb 22 00:43:42 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 21 Feb 2006 16:43:42 -0800
Subject: [Bioperl-l] bioperl-db issues
In-Reply-To: <000d01c63731$e61be1f0$15327e82@pyrimidine>
References: <000d01c63731$e61be1f0$15327e82@pyrimidine>
Message-ID: 

On 2/21/06, Chris Fields  wrote:
> [...]
> I find it odd that it worked well back in December and doesn't work now.  I
> updated bioperl and bioperl-db from CVS since then, so have there been any
> changes that may have caused this?  I noticed a few changes here and there.

The changes were fixes to retrieve the rank on persistent annotation
objects (it was only stored before, but never retrieved). Neither the
SpeciesAdaptor nor any of the taxonomy queries was affected by this.

>
> Here's what I have tried thus far:
>
> 1) I reinstalled MySQL.  I thought it might be that I had my database on a
> partitioned drive, so I reinstalled on the main drive.
>
> 2) I rebuilt the database from scratch, loading taxonomy fresh, loaded the
> schema, and got the same error when loading (hanging on SpeciesAdaptor).
> Tried ANALYZE:
> ------------------------------------
> mysql> ANALYZE TABLE taxon;
> +----------------+---------+----------+----------+
> | Table          | Op      | Msg_type | Msg_text |
> +----------------+---------+----------+----------+
> | bioseqdb.taxon | analyze | status   | OK       |
> +----------------+---------+----------+----------+
> 1 row in set (0.42 sec)
>
> mysql> ANALYZE TABLE taxon_name;
> +---------------------+---------+----------+----------+
> | Table               | Op      | Msg_type | Msg_text |
> +---------------------+---------+----------+----------+
> | bioseqdb.taxon_name | analyze | status   | OK       |
> +---------------------+---------+----------+----------+
> 1 row in set (0.36 sec)

I'm not sure but you may have to analyze all tables.
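
A minimal sketch of analyzing every table in one pass, assuming DBI and
DBD::mysql are installed. The command-line arguments (database name, user,
password) and the helper name analyze_statements are illustrative
assumptions, not part of bioperl-db:

```perl
use strict;
use warnings;

# Build one ANALYZE TABLE statement per table name (backtick-quoted).
sub analyze_statements {
    my @tables = @_;
    return map { my $t = $_; $t =~ s/`/``/g; "ANALYZE TABLE `$t`" } @tables;
}

# Connect and run the statements only when a database name is given,
# e.g.: perl analyze_all.pl bioseqdb root secret
if (@ARGV) {
    require DBI;
    my ($dbname, $user, $pass) = @ARGV;
    my $dbh = DBI->connect("dbi:mysql:database=$dbname", $user, $pass,
                           { RaiseError => 1 });

    # SHOW TABLES returns a single column holding the table names.
    my $tables = $dbh->selectcol_arrayref('SHOW TABLES');

    for my $sql (analyze_statements(@$tables)) {
        # Each ANALYZE TABLE returns rows of Table, Op, Msg_type, Msg_text.
        my $rows = $dbh->selectall_arrayref($sql);
        printf "%-40s %s\n", $_->[0], $_->[3] for @$rows;
    }
    $dbh->disconnect;
}
```

Each table should report OK in the last column, as in the two runs quoted
above.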

>
> mysql>
> ------------------------------------
> so that's fine.
>
> 3) Using EXPLAIN table:
> ------------------------------------
> mysql> EXPLAIN taxon;

Note that you wouldn't use EXPLAIN on a table but on a query instead.
I.e., copy&paste the offending query into the mysql editor, prefix it
with EXPLAIN and then see what the results are. It should show whether
the indexes are being used properly.

Most likely it doesn't use one of the indexes that it should be using
but does a full table scan instead. The explain plan should pinpoint
that.

BTW you can also use this to reconfirm the command line observation
about the query being slow - it should 'hang' in the mysql shell as
well. If it doesn't then there is something else going on. (if the
placeholders pose a problem replace them with the actual values as
given in the log)
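
For illustration, here is a sketch of what EXPLAIN on a query (rather than
on a table, where MySQL treats it as a synonym for DESCRIBE) could look like
when run through DBI. The SELECT below is a hypothetical stand-in built from
the two bound values in the log; it is not the statement bioperl-db actually
issues, so substitute the real one if the debug output shows it. The
command-line arguments and the helper name explain_sql are also assumptions:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the slow lookup; NOT the statement bioperl-db
# really generates.
my $query = <<'SQL';
SELECT t.taxon_id
FROM taxon t, taxon_name tn
WHERE t.taxon_id = tn.taxon_id
  AND tn.name_class = ?
  AND t.ncbi_taxon_id = ?
SQL

# Prefix a statement with EXPLAIN so MySQL reports its plan instead of
# returning the query's rows.
sub explain_sql {
    my ($sql) = @_;
    $sql =~ s/^\s+//;
    return "EXPLAIN $sql";
}

# Connect only when a database name is given,
# e.g.: perl explain_lookup.pl bioseqdb root secret
if (@ARGV) {
    require DBI;
    my ($dbname, $user, $pass) = @ARGV;
    my $dbh = DBI->connect("dbi:mysql:database=$dbname", $user, $pass,
                           { RaiseError => 1 });

    # Bind the values shown in the log in place of the placeholders.
    my $plan = $dbh->selectall_arrayref(explain_sql($query), { Slice => {} },
                                        'scientific name', 208963);
    for my $row (@$plan) {
        # type "ALL" means a full table scan; key is the index chosen, if any.
        printf "table=%s type=%s key=%s rows=%s\n",
            map { defined $row->{$_} ? $row->{$_} : 'NULL' }
            qw(table type key rows);
    }
    $dbh->disconnect;
}
```

A row with type=ALL and key=NULL on a large table would point to the missing
index use described above.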

> [..]
> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> SpeciesAdaptor: binding UK column 2 to "208963" (ncbi_taxid)
> ------------------------------------
> Which is where it hangs, as before, usually about 2 minutes for each
> sequence.

Do you also see a SELECT CLASSIFICATION query succeeding the one above
(e.g., if you wait)? I'm asking because I'm surprised that that isn't
the one you're seeing as taking too long, because it has been reported
earlier to cause such problems with mysql. Alex Zelensky posted what
he found worked as a fix.

  -hilmar
--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From cjfields at uiuc.edu  Wed Feb 22 05:13:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 21 Feb 2006 23:13:18 -0600
Subject: [Bioperl-l] removing sequences from a database?
Message-ID: <000001c6376e$b113c170$15327e82@pyrimidine>

I think this has been posed once but I couldn't find a straight answer on
the mailing list; is there a way to remove sequences in a BioSQL database
using bioperl-db?  This is the last I heard about it:

http://portal.open-bio.org/pipermail/bioperl-l/2001-November/006570.html

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From hlapp at gmx.net  Wed Feb 22 05:20:05 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 21 Feb 2006 21:20:05 -0800
Subject: [Bioperl-l] removing sequences from a database?
In-Reply-To: <000001c6376e$b113c170$15327e82@pyrimidine>
References: <000001c6376e$b113c170$15327e82@pyrimidine>
Message-ID: 

This is a pretty old posting :-) Sure you can remove sequences. In
fact you can remove any persistent object by calling $pobj->remove().
I.e., for a persistent sequence (which is what you get from the
adaptors): $pseq->remove()

Do not forget to call commit() on the persistence adaptor or the
persistent object itself or otherwise the operation is rolled back
when you disconnect.
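
A minimal sketch of the remove-then-commit sequence, assuming the usual
bioperl-db lookup pattern (a template object passed to
find_by_unique_key). The connection values, the BIOSQL_PASS environment
variable, and the accession and namespace are placeholders chosen for
illustration:

```perl
use strict;
use warnings;
use Bio::DB::BioDB;
use Bio::Seq;

# Connection values are placeholders; substitute your own.
my $db = Bio::DB::BioDB->new(
    -database => 'biosql',
    -driver   => 'mysql',
    -dbname   => 'bioseqdb',
    -host     => 'localhost',
    -user     => 'root',
    -pass     => $ENV{BIOSQL_PASS},   # assumed to hold the password
);

# Template object carrying the unique key to look up.
my $template = Bio::Seq->new(
    -accession_number => 'NP_249092',
    -namespace        => 'bioperl',
);

my $adaptor = $db->get_object_adaptor($template);
my $pseq    = $adaptor->find_by_unique_key($template);

if ($pseq) {
    print "removing ", $pseq->accession_number, "\n";
    $pseq->remove();   # delete the persistent sequence
    $pseq->commit();   # without this the delete is rolled back on disconnect
} else {
    print "no matching sequence found\n";
}
```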

BTW there are examples for objects other than the sequence object
itself (say you want to remove only the features) in the
scripts/biosql directory; some of the --mergeobjs closure examples do
this.

    -hilmar

On 2/21/06, Chris Fields  wrote:
> I think this has been posed once but I couldn't find a straight answer on
> the mailing list; is there a way to remove sequences in a BioSQL database
> using bioperl-db?  This is the last I heard about it:
>
> http://portal.open-bio.org/pipermail/bioperl-l/2001-November/006570.html
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


--
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------



From dhoworth at mrc-lmb.cam.ac.uk  Wed Feb 22 10:20:10 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 22 Feb 2006 10:20:10 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <200602211326.00021.lstein@cshl.edu>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
Message-ID: <43FC3ADA.4090203@mrc-lmb.cam.ac.uk>

Lincoln Stein wrote:
> Hi Dave,
> 
> Well, when you are using 1-based coordinates, a line that contains 44
> intervals will have 45 ticks. If you move to 0-based coordinates, then the 
> first tick will be labeled 0 and the last tick will be labeled 44. An 
> alternative is to make each base dimensionless, but that becomes a problem 
> when dealing with single base features, such as SNPs.
>
> These issues are why I have long advocated for interbase coordinates
> in which you number the positions between bases rather than the bases
> themselves.

I see your point but I need to work with the coordinates that the users 
expect and are familiar with. (Things get much worse with PDB residue 
numbering :)

> Draw me the picture of what you expect to see. I think of it this way:
> 
> 	1    2  3  4   5   6
>          A>G>C>T>A>

I guess something went wrong with your ASCII art :(

OK, consider a 44-residue entry from SwissProt (P12239):

   TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR

The first T is numbered 1 and the last R is numbered 44.

So I expect to see a line with 44 positions indicated somehow (whether 
these are half-open intervals or points on the line), with the number 1 
at the left end and the number 44 at the right end.

An important point is that if I then place other tracks below this one 
that start at say 27 and go to say 42, representing VPTIFFLGAIAAMQFI, 
they should align properly (according to whatever convention is used to 
represent a residue).

For a short sequence like this it would be possible to use letters to 
represent the residue but I'd like to use the same convention for longer 
sequences as well and have everything be consistent.

I'm hoping Bio::Graphics will make this easy.

Thanks, Dave


From khoueiry at ibdm.univ-mrs.fr  Wed Feb 22 09:12:20 2006
From: khoueiry at ibdm.univ-mrs.fr (khoueiry)
Date: Wed, 22 Feb 2006 10:12:20 +0100
Subject: [Bioperl-l] [Fwd: Re:  Pattern Density]
Message-ID: <1140599541.19981.26.camel@localhost>

An embedded and charset-unspecified text was scrubbed...
Name: not available
URL: 
-------------- next part --------------
An embedded message was scrubbed...
From: khoueiry 
Subject: Re: [Bioperl-l] Pattern Density
Date: Tue, 21 Feb 2006 19:47:54 +0100
Size: 3812
URL: 

From dhoworth at mrc-lmb.cam.ac.uk  Wed Feb 22 15:13:10 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 22 Feb 2006 15:13:10 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <1140619014.3142.81.camel@localhost.localdomain>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>	
	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>	
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
	<1140619014.3142.81.camel@localhost.localdomain>
Message-ID: <43FC7F86.6060901@mrc-lmb.cam.ac.uk>

Scott Cain wrote:
> I don't know if this helps at all, but you could think of that 45 tick
> mark as the termination, since the space between the 44th and the 45th
> tick mark corresponds to your 44th residue.

Yes, that's the way I do think of it and that's the way I expect 
everybody else to think of it.

But the numbers need to match the residues in any case. ie. the numbers 
need to match the spaces not the tick marks, if the spaces match the 
residues.

> I suppose it is a matter of correctly training your users :-)

The important thing is to have a consistent model, then it's easy to 
explain to users.

Cheers, Dave


From lstein at cshl.edu  Wed Feb 22 16:22:02 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 22 Feb 2006 11:22:02 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
Message-ID: <200602221122.02707.lstein@cshl.edu>

The base starts at the tickmark and extends to (but doesn't touch) the next 
one. If you are down at the resolution at which you see residue letters, then 
lines drawn underneath the letters will line up like this:

 1  2  3  4  5  6  7  8  9 10    ticks
 T  S  N  T  P  N  Q  E  P       residues
    =========   ===========      domains

Right?

Lincoln

On Wednesday 22 February 2006 05:20, Dave Howorth wrote:
> Lincoln Stein wrote:
> > Hi Dave,
> >
> > Well, when you are using 1-based coordinates, a line that contains 44
> > intervals will have 45 ticks. If you move to 0-based coordinates, then
> > the first tick will be labeled 0 and the last tick will be labeled 44. An
> > alternative is to make each base dimensionless, but that becomes a
> > problem when dealing with single base features, such as SNPs.
> >
> > These issues are why I have long advocated for interbase coordinates
> > in which you number the positions between bases rather than the bases
> > themselves.
>
> I see your point but I need to work with the coordinates that the users
> expect and are familiar with. (Things get much worse with PDB residue
> numbering :)
>
> > Draw me the picture of what you expect to see. I think of it this way:
> >
> > 	1    2  3  4   5   6
> >          A>G>C>T>A>
>
> I guess something went wrong with your ASCII art :(
>
> OK, consider a 44-residue entry from SwissProt (P12239):
>
>    TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR
>
> The first T is numbered 1 and the last R is numbered 44.
>
> So I expect to see a line with 44 positions indicated somehow (whether
> these are half-open intervals or points on the line), with the number 1
> at the left end and the number 44 at the right end.
>
> An important point is that if I then place other tracks below this one
> that start at say 27 and go to say 42, representing VPTIFFLGAIAAMQFI,
> they should align properly (according to whatever convention is used to
> represent a residue).
>
> For a short sequence like this it would be possible to use letters to
> represent the residue but I'd like to use the same convention for longer
> sequences as well and have everything be consistent.
>
> I'm hoping Bio::Graphics will make this easy.
>
> Thanks, Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From dhoworth at mrc-lmb.cam.ac.uk  Wed Feb 22 16:34:08 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 22 Feb 2006 16:34:08 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <200602221122.02707.lstein@cshl.edu>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
	<200602221122.02707.lstein@cshl.edu>
Message-ID: <43FC9280.1020008@mrc-lmb.cam.ac.uk>

Lincoln Stein wrote:
> The base starts at the tickmark and extends to (but doesn't touch) the next 
> one. If you are down at the resolution at which you see residue letters, then 
> lines drawn underneath the letters will line up like this:
> 
>  1  2  3  4  5  6  7  8  9 10    ticks
>  T  S  N  T  P  N  Q  E  P       residues
>     =========   ===========      domains
> 
> Right?

Yes. What's your point?

Dave


From cain at cshl.edu  Wed Feb 22 16:29:21 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 22 Feb 2006 11:29:21 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC7F86.6060901@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
	<1140619014.3142.81.camel@localhost.localdomain>
	<43FC7F86.6060901@mrc-lmb.cam.ac.uk>
Message-ID: <1140625762.3142.107.camel@localhost.localdomain>

Hi Dave,

I took the example code you posted a few days ago and added a few
motifs, one that goes from 1 to 10, and one that goes from 35 to 44 (the
last residue), which results in the attached graphic.

As Lincoln pointed out, the features are drawn from the beginning (1 and
35), and through the last residue (up to but not touching 11 and 45).
So the space between 35 and 36 corresponds to residue 35.  That's the
way it works.
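
The convention can be written down as a small sketch. This is plain Perl
arithmetic, not Bio::Graphics code, and the function names tick_span and
interbase_span are made up for illustration:

```perl
use strict;
use warnings;

# A 1-based inclusive range [start, end] occupies the half-open tick
# interval [start, end + 1): it begins at tick `start` and stops just
# short of tick end + 1.
sub tick_span {
    my ($start, $end) = @_;
    return ($start, $end + 1);
}

# Interbase equivalent: positions count the gaps between residues, so the
# same feature runs from start - 1 to end.
sub interbase_span {
    my ($start, $end) = @_;
    return ($start - 1, $end);
}

for my $feature ([1, 10], [35, 44]) {
    my ($s, $e) = @$feature;
    my ($t0, $t1) = tick_span($s, $e);
    my ($i0, $i1) = interbase_span($s, $e);
    printf "residues %d-%d: ticks %d up to %d, interbase %d-%d, length %d\n",
        $s, $e, $t0, $t1, $i0, $i1, $i1 - $i0;
}
```

For the two motifs, both conventions give a length of 10 residues; only the
labels on the end points differ.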

Scott


On Wed, 2006-02-22 at 15:13 +0000, Dave Howorth wrote:
> Scott Cain wrote:
> > I don't know if this helps at all, but you could think of that 45 tick
> > mark as the termination, since the space between the 44th and the 45th
> > tick mark corresponds to your 44th residue.
> 
> Yes, that's the way I do think of it and that's the way I expect 
> everybody else to think of it.
> 
> But the numbers need to match the residues in any case. ie. the numbers 
> need to match the spaces not the tick marks, if the spaces match the 
> residues.
> 
> > I suppose it is a matter of correctly training your users :-)
> 
> The important thing is to have a consistent model, then it's easy to 
> explain to users.
> 
> Cheers, Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
-------------- next part --------------
A non-text attachment was scrubbed...
Name: motifs.png
Type: image/png
Size: 1879 bytes
Desc: not available
URL: 

From dhoworth at mrc-lmb.cam.ac.uk  Wed Feb 22 16:45:00 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Wed, 22 Feb 2006 16:45:00 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <1140625762.3142.107.camel@localhost.localdomain>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>	
	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>	
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>	
	<1140619014.3142.81.camel@localhost.localdomain>	
	<43FC7F86.6060901@mrc-lmb.cam.ac.uk>
	<1140625762.3142.107.camel@localhost.localdomain>
Message-ID: <43FC950C.7080007@mrc-lmb.cam.ac.uk>

Scott Cain wrote:
> Hi Dave,
> 
> I took the example code you posted a few days ago and added a few
> motifs, one that goes from 1 to 10, and one that goes from 35 to 44 (the
> last residue), which results in the attached graphic.

Yes, that's the same sort of graphic I'm getting.

> As Lincoln pointed out, the features are drawn from the beginning (1 and
> 35), and through the last residue (up to but not touching 11 and 45).
> So the space between 35 and 36 corresponds to residue 35.

But there is no residue 45!  So there should be no number 45 anywhere on 
the picture.

I think the problem is that the tick strip is displaying numbers for the 
ticks instead of the intervals. The intervals are what corresponds to 
users' models of physical reality and my graphics need to match that.

> That's the way it works.

I guess I'll have to experiment and patch until it does what I want 
then, if nobody knows how to do it.

Cheers, Dave


From iamvela at yahoo.com  Wed Feb 22 17:21:59 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 09:21:59 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
Message-ID: <20060222172159.73370.qmail@web34404.mail.mud.yahoo.com>

Hi All:

I am new to Perl/BioPerl world.

I am debugging a program that used to work fine
before.
Blast works fine and returns results, but I am unable
to get any hits from the results.

Here is the relevant code:

$blastObj = new Bio::SearchIO (-file=>$resultsFile,
-format=>'blast');
  while (my $result = $blastObj->next_result()) {
     while (my $bioPerlHit = $result->next_hit()) {
         .......


The first while condition returns true, but the second
while condition returns false. So looks like there is
some result, but it is unable to identify the hits in
the result. I printed the $result (pasted below).
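
One way to narrow this down is a standalone script that reports how many
hits the parser found for each result. This is a sketch only: it assumes the
report file is passed as the first argument, and it diagnoses rather than
fixes the problem:

```perl
use strict;
use warnings;
use Bio::SearchIO;

my $results_file = shift or die "usage: $0 blast_report.txt\n";

my $in = Bio::SearchIO->new(-file => $results_file, -format => 'blast');

while (my $result = $in->next_result) {
    # The query name may be empty, as in the report pasted below.
    my $query = $result->query_name;
    $query = '(none)' unless defined $query && length $query;

    printf "query '%s': %d hit(s) parsed\n", $query, $result->num_hits;

    while (my $hit = $result->next_hit) {
        printf "  %s  score=%s  E=%s\n",
            $hit->name, $hit->raw_score, $hit->significance;
    }
}
```

If this prints zero hits for a report that clearly lists alignments, the
parser is not recognizing the hit table in that report.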

Any ideas/comments to resolve this? Thanks in advance.

I am using Perl 5.8.7, BioPerl 1.2.3, Apache 1.3.34 on
Windows XP platform. 

Like I said before, this application was running fine
on a different Windows machine with a similar
environment, so it looks like there is some change in the
products/versions that is causing the problem.

thanks again,
Raghu




Blast result (I can send the complete result if you need it):

BLASTP 2.2.13 [Nov-27-2005]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman
(1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.

RID: 1140573059-19990-140117828872.BLASTQ1


Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples
           3,297,000 sequences; 1,129,354,045 total letters
Query=  
Length=360


                                                      
                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|XP_534770.2|  PREDICTED: similar to Mitogen-activated prot...   739    0.0
gb|AAX36107.1|  mitogen-activated protein kinase 1 [synthetic con   739    0.0
pdb|1WZY|A  Chain A, Crystal Structure Of Human Erk2 Complexed...   739    0.0
pdb|1TVO|A  Chain A, The Structure Of Erk2 In Complex With A S...   739    0.0
ref|NP_786987.1|  mitogen-activated protein kinase 1 [Bos taur...   739    0.0
emb|CAA77752.1|  41kD protein kinase [Homo sapiens] >prf||1813...   738    0.0
gb|AAQ02541.1|  mitogen-activated protein kinase 1 [synthetic con   736    0.0
gb|AAH99905.1|  Mitogen-activated protein kinase 1 [Homo sapiens]   735    0.0
emb|CAI29602.1|  hypothetical protein [Pongo pygmaeus]              734    0.0
gb|AAH58258.1|  Mitogen activated protein kinase 1 [Mus muscul...   731    0.0
pdb|4ERK|   The Complex Structure Of The Map Kinase Erk2OLOMOU...   731    0.0
pdb|1GOL|   Coordinates Of Rat Map Kinase Erk2 With An Arginin...   730    0.0
ref|XP_860750.1|  PREDICTED: similar to Mitogen-activated prot...   729    0.0
gb|AAK56503.1|  extracellular signal-regulated kinase 2 [Gallu...   726    0.0
ref|XP_860716.1|  PREDICTED: similar to Mitogen-activated prot...   726    0.0
pdb|2ERK|   Phosphorylated Map Kinase Erk2                          726    0.0
pdb|1PME|   Structure Of Penta Mutant Human Erk2 Map Kinase Co...   725    0.0
ref|XP_860682.1|  PREDICTED: similar to Mitogen-activated prot...   720    0.0
ref|XP_860651.1|  PREDICTED: similar to Mitogen-activated prot...   720    0.0
emb|CAA77753.1|  40kDa protein kinase [Homo sapiens] >prf||181...   717    0.0
ref|NP_001017127.1|  mitogen-activated protein kinase 1 [Xenopus    715    0.0
dbj|BAE28679.1|  unnamed protein product [Mus musculus]             713    0.0
emb|CAA42482.1|  MAP kinase [Xenopus laevis] >gb|AAH60748.1| M...   711    0.0
sp|P26696|MK01_XENLA  Mitogen-activated protein kinase 1 (Myel...   711    0.0
gb|AAH76730.1|  Xp42 protein [Xenopus laevis]                       706    0.0
gb|AAH65868.1|  Mitogen-activated protein kinase 1 [Danio rerio]    696    0.0
dbj|BAD23843.1|  extracellular signal regulated protein kinase...   694    0.0
ref|NP_878308.2|  mitogen-activated protein kinase 1 [Danio re...   694    0.0
emb|CAG07778.1|  unnamed protein product [Tetraodon nigroviridis]   692    0.0
dbj|BAB11813.1|  ERK2 [Danio rerio]                                 689    0.0
gb|AAY57805.1|  extracellular signal-regulated kinase 2 [Danio re   687    0.0
gb|AAH45505.1|  Mitogen-activated protein kinase 3 [Danio reri...   654    0.0
dbj|BAB11812.1|  ERK1 [Danio rerio]                                 654    0.0
ref|XP_609884.2|  PREDICTED: similar to mitogen activated prot...   653    0.0
dbj|BAD23842.1|  extracellular signal regulated protein kinase...   650    0.0
gb|AAH29712.1|  Mitogen activated protein kinase 3 [Mus muscul...   644    0.0
ref|XP_885698.1|  PREDICTED: similar to mitogen activated prot...   644    0.0
gb|AAA20009.1|  microtubule-associated protein-2 kinase             643    0.0
emb|CAA46318.1|  MAP kinase [Rattus norvegicus] >ref|NP_059043...   641    0.0
gb|AAH13992.1|  Mitogen-activated protein kinase 3 [Homo sapie...   641    0.0
gb|AAQ02422.1|  mitogen-activated protein kinase 3 [synthetic ...   641    0.0
gb|AAA41123.1|  extracellular signal-regulated kinase 1             640    0.0
ref|XP_854045.1|  PREDICTED: similar to mitogen activated prot...   640    0.0
gb|AAA63486.1|  extracellular-signal-regulated kinase 1 [Rattus n   640    0.0
emb|CAG02655.1|  unnamed protein product [Tetraodon nigroviridis]   640    0.0
emb|CAA42744.1|  protein serine/threonine kinase [Homo sapiens...   639    0.0
gb|AAA36142.1|  kinase 1                                            639    0.0
emb|CAA77754.1|  44kDa protein kinase [Homo sapiens] >prf||181...   639    0.0
ref|XP_885840.1|  PREDICTED: similar to mitogen activated prot...   632    5e-180
ref|XP_885818.1|  PREDICTED: similar to mitogen activated prot...   630    3e-179
ref|XP_860621.1|  PREDICTED: similar to Mitogen-activated prot...   627    2e-178
gb|AAF71666.1|  extracellular signal-regulated kinase 1b [Rattus    627    2e-178
ref|XP_393029.1|  PREDICTED: similar to MAP kinase [Apis mellifer   621    1e-176
gb|AAA83210.1|  MAP kinase                            
             619    4e-176
dbj|BAE46741.1|  Extracellular regulated MAP kinase
[Bombyx mori]   618    1e-175
gb|AAH13754.1|  Mapk3 protein [Mus musculus]          
             612    9e-174
dbj|BAE06412.1|  mitogen-activated protein kinase
[Ciona intestin   607    2e-172
dbj|BAE33167.1|  unnamed protein product [Mus
musculus]             600    3e-170
gb|AAN46679.1|  MAP kinase [Strongylocentrotus
purpuratus] >re...   598    1e-169
dbj|BAC02940.1|  mitogen-activated protein kinase
[Halocynthia ro   592    6e-168
gb|AAL48618.1|  RE08694p [Drosophila melanogaster]
>gb|EAA4631...   590    2e-167
emb|CAD97888.1|  hypothetical protein [Homo sapiens]  
             589    5e-167
emb|CAD60453.1|  extracellular signal-regulated
protein kinase...   589    5e-167
emb|CAD56894.1|  mitogen-activated protein kinase 1
[Meloidogyne    589    6e-167
ref|XP_536917.2|  PREDICTED: similar to mitogen
activated prot...   588    1e-166
gb|AAN40736.1|  mitogen-activated protein kinase
[Paralichthys ol   586    4e-166
emb|CAE73725.1|  Hypothetical protein CBG21247
[Caenorhabditis br   583    3e-165
emb|CAA87057.1|  Hypothetical protein F43C1.2a
[Caenorhabditis...   581    2e-164
gb|AAA18956.1|  Sur-1 MAP kinase                      
             581    2e-164
emb|CAB60996.1|  Hypothetical protein F43C1.2b
[Caenorhabditis...   581    2e-164
gb|AAK52329.1|  extracellular signal-related kinase 1b
[Homo sapi   580    4e-164
ref|XP_885794.1|  PREDICTED: similar to mitogen
activated prot...   553    4e-156
ref|XP_868146.1|  PREDICTED: similar to mitogen
activated prot...   548    2e-154
gb|AAK52330.1|  extracellular signal-related kinase 1c
[Homo sapi   546    4e-154
dbj|BAA22620.1|  ERK2 [Mus musculus]                  
             544    2e-153
ref|XP_510921.1|  PREDICTED: mitogen-activated protein
kinase 3 [   529    8e-149
gb|AAT02418.1|  MAP kinase [Schistosoma japonicum]    
             496    7e-139
emb|CAJ44437.1|  MAP kinase [Echinococcus
multilocularis]           491    1e-137
ref|XP_885774.1|  PREDICTED: similar to mitogen
activated prot...   444    3e-123
gb|EAA14714.3|  ENSANGP00000016639 [Anopheles gambiae
str. PES...   431    2e-119
gb|AAZ38881.1|  extracellular regulated kinase
[Littorina littore   431    2e-119
emb|CAD60723.1|  unnamed protein product [Podospora
anserina]       411    2e-113
gb|AAK25816.1|  MAP kinase [Neurospora crassa]
>ref|XP_959713....   411    2e-113
gb|EAL89122.1|  MAP kinase (FUS3/KSS1), putative
[Aspergillus ...   409    1e-112
gb|EAA74589.1|  hypothetical protein FG06385.1
[Gibberella zea...   409    1e-112
ref|XP_504312.1|  hypothetical protein [Yarrowia
lipolytica] >...   408    2e-112
gb|AAG01162.1|  mitogen-activated protein kinase
[Fusarium oxy...   408    2e-112
gb|AAS20192.1|  AMK1 [Alternaria brassicicola]
>gb|AAK52840.1|...   408    2e-112
dbj|BAE57584.1|  unnamed protein product [Aspergillus
oryzae]       408    2e-112
dbj|BAD42855.1|  mitogen-activated protein kinase
[Bipolaris oryz   407    3e-112
gb|AAD50496.1|  mitogen activated protein kinase
[Colletotrichum    407    3e-112
gb|AAF05913.1|  mitogen-activated protein kinase
[Cochliobolus he   407    3e-112
gb|AAM89501.1|  mitogen-activated protein kinase
[Leptosphaeria m   407    3e-112
dbj|BAB21569.1|  mitogen-activated protein kinase
[Glomerella cin   407    3e-112
gb|AAB72017.1|  mitogen-activated protein kinase
[Nectria haem...   407    3e-112
emb|CAC36428.1|  mitogen activated protein kinase
[Gibberella fuj   406    6e-112
ref|XP_364720.1|  hypothetical protein MG09565.4
[Magnaporthe gri   406    6e-112
gb|AAG23132.1|  MAP kinase [Botryotinia fuckeliana]   
             406    6e-112
gb|AAO63561.1|  mitogen activated protein kinase
[Verticillium fu   406    8e-112
dbj|BAE53432.1|  MAP kinase Pmk1 [Hypocrea lixii]     
             405    1e-111

ALIGNMENTS
>ref|XP_534770.2| PREDICTED: similar to Mitogen-activated protein kinase 1 (Extracellular
signal-regulated kinase 2) (ERK-2) (Mitogen-activated protein kinase 2) (MAP kinase 2) (MAPK 2) (p42-MAPK) (ERT1)
isoform 1 [Canis familiaris]
 ref|NP_620407.1| mitogen-activated protein kinase 1 [Homo sapiens]
 ref|NP_002736.3| mitogen-activated protein kinase 1 [Homo sapiens]
 gb|AAH17832.1| Mitogen-activated protein kinase 1 [Homo sapiens]
 sp|P28482|MK01_HUMAN Mitogen-activated protein kinase 1 (Extracellular signal-regulated
kinase 2) (ERK-2) (Mitogen-activated protein kinase 2) (MAP kinase 2) (MAPK 2) (p42-MAPK) (ERT1)
 gb|AAA58459.1| protein kinase 2
Length=360

 Score =  739 bits (1909),  Expect = 0.0
 Identities = 360/360 (100%), Positives = 360/360 (100%), Gaps = 0/360 (0%)

Query  1    MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  60
            MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
Sbjct  1    MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  60

Query  61   HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH  120
            HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
Sbjct  61   HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH  120

Query  121  LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH  180
            LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
Sbjct  121  LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH  180

Query  181  TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI  240
            TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
Sbjct  181  TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI  240

Query  241  LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK  300
            LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
Sbjct  241  LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK  300

Query  301  RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS  360
            RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
Sbjct  301  RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS  360


>gb|AAX36107.1| mitogen-activated protein kinase 1 [synthetic construct]
Length=361

 Score =  739 bits (1909),  Expect = 0.0
 Identities = 360/360 (100%), Positives = 360/360 (100%), Gaps = 0/360 (0%)

Query  1    MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  60
            MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
Sbjct  1    MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  60

Query  61   HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH  120
            HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH
Sbjct  61   HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH  120

Query  121  LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH  180
            LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH
Sbjct  121  LSNDHICYFLYQILRGLKYIHSANVLHRDLKPSNLLLNTTCDLKICDFGLARVADPDHDH  180

Query  181  TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI  240
            TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI
Sbjct  181  TGFLTEYVATRWYRAPEIMLNSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHI  240

Query  241  LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK  300
            LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK
Sbjct  241  LGILGSPSQEDLNCIINLKARNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHK  300

Query  301  RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS  360
            RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS
Sbjct  301  RIEVEQALAHPYLEQYYDPSDEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGYRS  360


>pdb|1WZY|A Chain A, Crystal Structure Of Human Erk2 Complexed With A Pyrazolopyridazine
Derivative
Length=368

 Score =  739 bits (1909),  Expect = 0.0
 Identities = 360/360 (100%), Positives = 360/360 (100%), Gaps = 0/360 (0%)

Query  1    MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  60
            MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE
Sbjct  9    MAAAAAAGAGPEMVRGQVFDVGPRYTNLSYIGEGAYGMVCSAYDNVNKVRVAIKKISPFE  68

Query  61   HQTYCQRTLREIKILLRFRHENIIGINDIIRAPTIEQMKDVYIVQDLMETDLYKLLKTQH  120







From lstein at cshl.edu  Wed Feb 22 18:23:09 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 22 Feb 2006 13:23:09 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC950C.7080007@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<1140625762.3142.107.camel@localhost.localdomain>
	<43FC950C.7080007@mrc-lmb.cam.ac.uk>
Message-ID: <200602221323.09872.lstein@cshl.edu>

Hi Dave,

If you want to adjust the way that the arrow.pm module draws the ticks, please 
make it a user-configurable option with the default being the current method. 
It should be easy enough to do this -- you just offset the position of the 
labels by 0.5 interval and inhibit drawing of the last one.
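
A minimal sketch of that arithmetic in plain Perl, independent of arrow.pm. The
helper name ruler_labels and its arguments are hypothetical, chosen only to
illustrate "offset by half an interval and drop the last label"; this is not
the glyph's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Graphics): compute [x, label] pairs for
# a ruler spanning residues $start..$end with one tick every $interval residues.
sub ruler_labels {
    my ($start, $end, $interval, $label_intervals) = @_;
    my @labels;
    # Ticks mark boundaries, so the last possible tick sits at $end + 1.
    for (my $tick = $start; $tick <= $end + 1; $tick += $interval) {
        if ($label_intervals) {
            next if $tick > $end;    # no residue lies to the right of this tick
            push @labels, [ $tick + $interval / 2, $tick ];    # shift by half an interval
        }
        else {
            push @labels, [ $tick, $tick ];    # current behaviour: label sits on the tick
        }
    }
    return @labels;
}

# A 44-residue sequence with one tick per residue, as in the example under discussion.
my @on_ticks     = ruler_labels(1, 44, 1, 0);
my @on_intervals = ruler_labels(1, 44, 1, 1);
printf "tick labels:     %d, last is %d at x=%s\n",
    scalar @on_ticks, $on_ticks[-1][1], $on_ticks[-1][0];
printf "interval labels: %d, last is %d at x=%s\n",
    scalar @on_intervals, $on_intervals[-1][1], $on_intervals[-1][0];
```

For the 44-residue case the default mode yields 45 labels ending in 45, while
the interval mode yields 44 labels, the last being 44 centred at x=44.5, so no
45 appears on the ruler.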

Lincoln

On Wednesday 22 February 2006 11:45, Dave Howorth wrote:
> Scott Cain wrote:
> > Hi Dave,
> >
> > I took the example code you posted a few days ago and added a few
> > motifs, one that goes from 1 to 10, and one that goes from 35 to 44 (the
> > last residue), which results in the attached graphic.
>
> Yes, that's the same sort of graphic I'm getting.
>
> > As Lincoln pointed out, the features are drawn from the beginning (1 and
> > 35), and through the last residue (up to but not touching 11 and 45).
> > So the space between 35 and 36 corresponds to residue 35.
>
> But there is no residue 45!  So there should be no number 45 anywhere on
> the picture.
>
> I think the problem is that the tick strip is displaying numbers for the
> ticks instead of the intervals. The intervals are what corresponds to
> users' models of physical reality and my graphics need to match that.
>
>  > That's the way it works.
>
> I guess I'll have to experiment and patch until it does what I want
> then, if nobody knows how to do it.
>
> Cheers, Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From lstein at cshl.edu  Wed Feb 22 18:40:27 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Wed, 22 Feb 2006 13:40:27 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC950C.7080007@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<1140625762.3142.107.camel@localhost.localdomain>
	<43FC950C.7080007@mrc-lmb.cam.ac.uk>
Message-ID: <200602221340.28573.lstein@cshl.edu>

I have just committed a version of the arrow.pm glyph that has a 
-label_intervals flag.

Lincoln

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From cjfields at uiuc.edu  Wed Feb 22 19:45:54 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 22 Feb 2006 13:45:54 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060222172159.73370.qmail@web34404.mail.mud.yahoo.com>
Message-ID: <000c01c637e8$980c6f90$15327e82@pyrimidine>

Upgrade bioperl from CVS using nmake. 

Installation instructions for using nmake:

http://www.bioperl.org/wiki/INSTALL.WIN#Beyond_the_Core

You can download a tarball using anonymous CVS (link at bottom):

http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/

or use CVS directly:

http://www.bioperl.org/wiki/Using_CVS

Then make sure to grab the latest SearchIO::blast bugfix, which is not in CVS
yet:

http://bugzilla.bioperl.org/show_bug.cgi?id=1934

Replace the blast.pm in \site\lib\Bio\SearchIO in your Perl directory.

Does that fix it?
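
If it does not, one thing worth ruling out is that Perl is still loading an
older blast.pm from somewhere else in @INC. A minimal sketch for checking this,
assuming nothing beyond core Perl (%INC and require); the helper name
module_path is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and return the file Perl actually
# used for it, or undef if the module cannot be loaded.
sub module_path {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::SearchIO::blast -> Bio/SearchIO/blast.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

for my $module (qw(Bio::SearchIO Bio::SearchIO::blast)) {
    my $path = module_path($module);
    if (defined $path) {
        # The byte size makes it easy to compare against the replacement file.
        printf "%s loaded from %s (%d bytes)\n", $module, $path, -s $path;
    }
    else {
        print "$module could not be loaded\n";
    }
}
```

If the path printed for Bio::SearchIO::blast is not the file you replaced, the
patched copy is not the one being used.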

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Raghunath Verabelli
> Sent: Wednesday, February 22, 2006 11:22 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Blast returns result, but does not return hits
> 
> Hi All:
> 
> I am new to Perl/BioPerl world.
> 
> I am debugging a program that used to work fine
> before.
> Blast works fine and returns results, but I am unable
> to get any hits from the results.
> 
> Here is the relevant code:
> 
> $blastObj = new Bio::SearchIO (-file=>$resultsFile,
> -format=>'blast');
>   while (my $result = $blastObj->next_result()) {
>      while (my $bioPerlHit = $result->next_hit()) {
>          .......
> 
> 
> The first while condition returns true, but the second
> while condition returns false. So looks like there is
> some result, but it is unable to identify the hits in
> the result. I printed the $result (pasted below).
> 
> Any ideas/comments to resolve this? Thanks in advance.
> 
> I am using Perl 5.8.7, BioPerl 1.2.3, Apache 1.3.34 on
> Windows XP platform.
> 
> Like I said before, this application was running fine
> on a different windows machine with similar
> environment,so looks like there is some change in the
> products/versions that is causing the problem.
> 
> thanks again,
> Raghu
> 
> 
> 
> 




From iamvela at yahoo.com  Wed Feb 22 21:06:54 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 13:06:54 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000c01c637e8$980c6f90$15327e82@pyrimidine>
Message-ID: <20060222210654.53827.qmail@web34404.mail.mud.yahoo.com>

Thanks Chris. I am getting the errors shown below with
nmake.

As suggested, I downloaded the nmake utility from
Microsoft website and the bioperl-live tarball.

After untarring, I replaced the blast.pm file (under
bioperl-live\Bio\SearchIO) with the blast.pm (86 KB
size) attached to the bug report 1934.

I then did the following to install packages using
nmake:

1) perl Makefile.pl was successful without any errors.


2) 'c:\nmake' results in the following errors:

        pl2bat.bat blib\script\bp_unflatten_seq.pl
        C:\mod_perl\Perl\bin\perl.exe -MExtUtils::Command -e cp ./scripts_temp/bp_taxid4species.pl blib\script\bp_taxid4species.pl
        pl2bat.bat blib\script\bp_taxid4species.pl
        C:\mod_perl\Perl\bin\perl.exe -MExtUtils::Command -e cp ./scripts_temp/bp_seqret.pl blib\script\bp_seqret.pl
        pl2bat.bat blib\script\bp_seqret.pl
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch" "-Iblib\lib" doc/makedoc.PL bioscripts.pod
Can't open bioscripts.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch" "-Iblib\lib" doc/makedoc.PL biodatabases.pod
Can't open biodatabases.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch" "-Iblib\lib" doc/makedoc.PL biodesign.pod
Can't open biodesign.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch" "-Iblib\lib" doc/makedoc.PL bioperl.pod
Can't open bioperl.pod: No such file or directory.


3) 'c:\nmake test' fails with the following errors:

NMAKE : fatal error U1095: expanded command line
'C:\mod_perl\Perl\bin\perl.exe
"-MExtUtils::Command::MM" "-e" "test_harness(0,
'blib\lib', 'blib\arch')" t\AACh
ange.t t\AAReverseMutate.t t\abi.t t\ace.t t\AlignIO.t
t\AlignStats.t t\AlignUti
l.t t\alignUtilities.t t\Allele.t t\Alphabet.t
t\Annotation.t t\AnnotationAdapto
r.t t\asciitree.t t\Assembly.t t\Biblio.t
t\Biblio_biofetch.t t\Biblio_eutils.t
t\BiblioReferences.t t\BioDBGFF.t t\BioFetch_DB.t
t\BioGraphics.t t\BlastIndex.t
 t\BPbl2seq.t t\BPlite.t t\BPpsilite.t t\bsml_sax.t
t\Chain.t t\chaosxml.t t\cig
arstring.t t\ClusterIO.t t\Coalescent.t t\CodonTable.t
t\Compatible.t t\consed.t
 t\CoordinateGraph.t t\CoordinateMapper.t
t\Correlate.t t\ctf.t t\CytoMap.t t\DB
.t t\DBCUTG.t t\DBFasta.t t\DNAMutation.t t\Domcut.t
t\ECnumber.t t\ELM.t t\embl
.t t\EMBL_DB.t t\EMBOSS_Tools.t t\EncodedSeq.t
t\entrezgene.t t\ePCR.t t\ESEfind
er.t t\est2genome.t t\Exception.t t\Exonerate.t
t\exp.t t\fasta.t t\FeatureIO.t
t\flat.t t\FootPrinter.t t\game.t t\GbrowseGFF.t
t\gcg.t t\GDB.t t\Gel.t t\genba
nk.t t\GeneCoordinateMapper.t t\Geneid.t t\Genewise.t
t\Genomewise.t t\Genpred.t
 t\GFF.t t\GOR4.t t\GOterm.t t\GraphAdaptor.t
t\GuessSeqFormat.t t\hmmer.t t\HNN
.t t\HtSNP.t t\Index.t t\InstanceSite.t t\interpro.t
t\InterProParser.t t\IUPAC.
t t\kegg.t t\largefasta.t t\LargeLocatableSeq.t
t\largepseq.t t\LinkageMap.t t\L
iveSeq.t t\LocatableSeq.t t\Location.t
t\LocationFactory.t t\LocusLink.t t\lucy.
t t\Map.t t\MapIO.t t\masta.t t\Matrix.t t\Measure.t
t\MeSH.t t\metafasta.t t\Me
taSeq.t t\MicrosatelliteMarker.t t\MiniMIMentry.t
t\MitoProt.t t\Molphy.t t\Mult
iFile.t t\multiple_fasta.t t\Mutation.t t\Mutator.t
t\NetPhos.t t\Node.t t\OddCo
des.t t\OMIMentry.t t\OMIMentryAllelicVariant.t
t\OMIMparser.t t\Ontology.t t\On
tologyEngine.t t\OntologyStore.t t\PAML.t t\Perl.t
t\phd.t t\Phenotype.t t\Phyli
pDist.t t\PhysicalMap.t t\pICalculator.t t\Pictogram.t
t\pir.t t\pln.t t\PopGen.
t t\PopGenSims.t t\primaryqual.t t\PrimarySeq.t
t\primedseq.t t\Primer.t t\prime
r3.t t\Promoterwise.t t\ProtDist.t t\protgraph.t
t\ProtMatrix.t t\ProtPsm.t t\Ps
eudowise.t t\psm.t t\QRNA.t t\qual.t
t\RandDistFunctions.t t\RandomTreeFactory.t
 t\Range.t t\RangeI.t t\raw.t t\RefSeq.t t\Registry.t
t\Relationship.t t\Relatio
nshipType.t t\RemoteBlast.t t\RepeatMasker.t
t\RestrictionAnalysis.t t\Restricti
onEnzyme.t t\RestrictionIO.t t\RNAChange.t
t\Root-Utilities.t t\RootI.t t\RootIO
.t t\RootStorable.t t\Scansite.t t\scf.t
t\SearchDist.t t\SearchIO.t t\Seq.t t\s
eq_quality.t t\SeqAnalysisParser.t t\SeqBuilder.t
t\SeqDiff.t t\SeqFeatCollectio
n.t t\SeqFeature.t t\seqfeaturePrimer.t
t\SeqHound_DB.t t\SeqIO.t t\SeqPattern.t
 t\seqread_fail.t t\SeqStats.t t\SequenceFamily.t
t\sequencetrace.t t\SeqUtils.t
 t\SeqVersion.t t\seqwithquality.t t\SeqWords.t
t\Sigcleave.t t\Sim4.t t\Similar
ityPair.t t\SimpleAlign.t t\simpleGOparser.t
t\singlet.t t\sirna.t t\SiteMatrix.
t t\SNP.t t\Sopma.t t\Species.t t\Spidey.t
t\splicedseq.t t\StandAloneBlast.t t\
StructIO.t t\Structure.t t\swiss.t t\Symbol.t t\tab.t
t\TagHaplotype.t t\Taxonom
y.t t\TaxonTree.t t\Tempfile.t t\Term.t t\tigrxml.t
t\tinyseq.t t\Tools.t t\Tree
.t t\TreeBuild.t t\TreeIO.t t\trim.t t\tRNAscanSE.t
t\tutorial.t t\UCSCParsers.t
 t\Unflattener.t t\Unflattener2.t t\UniGene.t
t\Variation_IO.t t\WABA.t t\XEMBL_
DB.t t\ztr.t' too long
Stop.

C:\bioperl-live\bioperl-live>



4) 'c:\nmake install' results in the following errors:

        C:\mod_perl\Perl\bin\perl.exe
-MExtUtils::Command -e cp ./scripts_temp/b
p_taxid4species.pl blib\script\bp_taxid4species.pl
        pl2bat.bat blib\script\bp_taxid4species.pl
        C:\mod_perl\Perl\bin\perl.exe
-MExtUtils::Command -e cp ./scripts_temp/b
p_seqret.pl blib\script\bp_seqret.pl
        pl2bat.bat blib\script\bp_seqret.pl
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
bioscripts.pod
Can't open bioscripts.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
biodatabases.pod
Can't open biodatabases.pod: No such file or
directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
biodesign.pod
Can't open biodesign.pod: No such file or directory.
        C:\mod_perl\Perl\bin\perl.exe "-Iblib\arch"
"-Iblib\lib" doc/makedoc.PL
bioperl.pod
Can't open bioperl.pod: No such file or directory.
Appending installation info to
C:\mod_perl\Perl\lib/perllocal.pod
NMAKE : fatal error U1095: expanded command line '@
C:\mod_perl\Perl\bin\perl.ex
e "-MExtUtils::Command::MM" -e perllocal_install 
"Module" "Bio"  "installed int
o" "C:\mod_perl\Perl\site\lib"  LINKTYPE "dynamic" 
VERSION "1.5"  EXE_FILES "./
scripts_temp/bp_biblio.pl
./scripts_temp/bp_genbank2gff.pl ./scripts_temp/bp_bul
k_load_gff.pl ./scripts_temp/bp_fast_load_gff.pl
./scripts_temp/bp_genbank2gff3.
pl ./scripts_temp/bp_generate_histogram.pl
./scripts_temp/bp_load_gff.pl ./scrip
ts_temp/bp_meta_gff.pl
./scripts_temp/bp_process_gadfly.pl
./scripts_temp/bp_pro
cess_sgd.pl ./scripts_temp/bp_process_wormbase.pl
./scripts_temp/bp_embl2picture
.pl ./scripts_temp/bp_glyphs1-demo.pl
./scripts_temp/bp_glyphs2-demo.pl ./script
s_temp/bp_biofetch_genbank_proxy.pl
./scripts_temp/bp_bioflat_index.pl ./scripts
_temp/bp_biogetseq.pl ./scripts_temp/bp_flanks.pl
./scripts_temp/bp_contig_draw.
pl ./scripts_temp/bp_feature_draw.pl
./scripts_temp/bp_frend.pl ./scripts_temp/b
p_search_overview.pl ./scripts_temp/bp_fetch.pl
./scripts_temp/bp_index.pl ./scr
ipts_temp/bp_seqret.pl
./scripts_temp/bp_composite_LD.pl
./scripts_temp/bp_heter
ogeneity_test.pl ./scripts_temp/bp_fastam9_to_table.pl
./scripts_temp/bp_filter_
search.pl ./scripts_temp/bp_hmmer_to_table.pl
./scripts_temp/bp_search2table.pl
./scripts_temp/bp_extract_feature_seq.pl
./scripts_temp/bp_make_mrna_protein.pl
./scripts_temp/bp_seqconvert.pl
./scripts_temp/bp_split_seq.pl ./scripts_temp/bp
_translate_seq.pl ./scripts_temp/bp_unflatten_seq.pl
./scripts_temp/bp_aacomp.pl
 ./scripts_temp/bp_chaos_plot.pl
./scripts_temp/bp_gccalc.pl ./scripts_temp/bp_o
ligo_count.pl
./scripts_temp/bp_classify_hits_kingdom.pl
./scripts_temp/bp_local
_taxonomydb_query.pl
./scripts_temp/bp_query_entrez_taxa.pl
./scripts_temp/bp_ta
xid4species.pl ./scripts_temp/bp_blast2tree.pl
./scripts_temp/bp_nexus2nh.pl ./s
cripts_temp/bp_tree2pag.pl
./scripts_temp/bp_mrtrans.pl ./scripts_temp/bp_nrdb.p
l ./scripts_temp/bp_sreformat.pl
./scripts_temp/bp_dbsplit.pl ./scripts_temp/bp_
mask_by_search.pl ./scripts_temp/bp_mutate.pl
./scripts_temp/bp_pairwise_kaks.pl
 ./scripts_temp/bp_remote_blast.pl
./scripts_temp/bp_search2alnblocks.pl ./scrip
ts_temp/bp_search2BSML.pl
./scripts_temp/bp_search2gff.pl ./scripts_temp/bp_sear
ch2tribe.pl ./scripts_temp/bp_seq_length.pl"  >>
C:\mod_perl\Perl\lib\perllocal.
pod' too long
Stop.

C:\bioperl-live\bioperl-live>

--- Chris Fields  wrote:

> Upgrade bioperl from CVS using nmake. 
> 
> Installation instructions for using nmake:
> 
>
http://www.bioperl.org/wiki/INSTALL.WIN#Beyond_the_Core
> 
> You can download a tarball using anonymous CVS (link
> at bottom):
> 
>
http://cvs.open-bio.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/
> 
> or use CVS directly:
> 
> http://www.bioperl.org/wiki/Using_CVS
> 
> Then make sure to grab the last SearchIO::last
> bugfix, which is not in CVS
> yet:
> 
> http://bugzilla.bioperl.org/show_bug.cgi?id=1934
> 
> Replace the blast.pm in \site\lib\Bio\SearchIO in
> your Perl directory.
> 
> Does that fix it?
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Raghunath
> Verabelli
> > Sent: Wednesday, February 22, 2006 11:22 AM
> > To: bioperl-l at lists.open-bio.org
> > Subject: [Bioperl-l] Blast returns result, but
> does not return hits
> > 
> > Hi All:
> > 
> > I am new to Perl/BioPerl world.
> > 
> > I am debugging a program that used to work fine
> > before.
> > Blast works fine and returns results, but I am unable
> > to get any hits from the results.
> > 
> > Here is the relevant code:
> > 
> > $blastObj = new Bio::SearchIO(-file=>$resultsFile,
> >                               -format=>'blast');
> >   while (my $result = $blastObj->next_result()) {
> >      while (my $bioPerlHit = $result->next_hit()) {
> >          .......
> > 
> > 
> > The first while condition returns true, but the
> second
> > while condition returns false. So looks like there
> is
> > some result, but it is unable to identify the hits
> in
> > the result. I printed the $result (pasted below).
> > 
> > Any ideas/comments to resolve this? Thanks in
> advance.
> > 
> > I am using Perl 5.8.7, BioPerl 1.2.3, Apache
> 1.3.34 on
> > Windows XP platform.
> > 
> > Like I said before, this application was running
> fine
> > on a different windows machine with similar
> > environment,so looks like there is some change in
> the
> > products/versions that is causing the problem.
> > 
> > thanks again,
> > Raghu
> > 
> > 
> > 
> > 
> > Blast result (i can send complete result if you
> need
> > it):
> > 
> > 

> > BLASTP 2.2.13 [Nov-27-2005]
> > Reference: Altschul, Stephen F., Thomas L. Madden,
> > Alejandro A. Schäffer,
> > Jinghui Zhang, Zheng Zhang, Webb Miller, and David
> J.
> > Lipman
> > (1997), "Gapped BLAST and PSI-BLAST: a new
> generation
> > of
> > protein database search programs", Nucleic Acids
> Res.
> > 25:3389-3402.
> > 
> > RID: 1140573059-19990-140117828872.BLASTQ1
> > 
> > 
> > Database: All non-redundant GenBank CDS
> > translations+PDB+SwissProt+PIR+PRF excluding
> > environmental samples
> >            3,297,000 sequences; 1,129,354,045
> total
> > letters
> > Query=
> > Length=360
> > 
> > 
> > 
> >             Score     E
> > Sequences producing significant alignments:
> >             (Bits)  Value
> > 
> > ref|XP_534770.2|  PREDICTED: similar to
> > Mitogen-activated prot...   739    0.0
> > gb|AAX36107.1|  mitogen-activated protein kinase 1
> > [synthetic con   739    0.0
> > pdb|1WZY|A  Chain A, Crystal Structure Of Human
> Erk2
> > Complexed...   739    0.0
> > pdb|1TVO|A  Chain A, The Structure Of Erk2 In
> Complex
> > With A S...   739    0.0
> > ref|NP_786987.1|  mitogen-activated protein kinase
> 1
> > [Bos taur...   739    0.0
> > emb|CAA77752.1|  41kD protein kinase [Homo
> sapiens]
> > >prf||1813...   738    0.0
> > gb|AAQ02541.1|  mitogen-activated protein kinase 1
> > [synthetic con   736    0.0
> > gb|AAH99905.1|  Mitogen-activated protein kinase 1
> > [Homo sapiens]   735    0.0
> > emb|CAI29602.1|  hypothetical protein [Pongo
> pygmaeus]
> >              734    0.0
> > gb|AAH58258.1|  Mitogen activated protein kinase 1
> > [Mus muscul...   731    0.0
> > pdb|4ERK|   The Complex Structure Of The Map
> Kinase
> > Erk2OLOMOU...   731    0.0
> > pdb|1GOL|   Coordinates Of Rat Map Kinase Erk2
> With An
> > Arginin...   730    0.0
> > ref|XP_860750.1|  PREDICTED: similar to
> > Mitogen-activated prot...   729    0.0
> > gb|AAK56503.1|  extracellular signal-regulated
> kinase
> > 2 [Gallu...   726    0.0
> > ref|XP_860716.1|  PREDICTED: similar to
> > Mitogen-activated prot...   726    0.0
> > pdb|2ERK|   Phosphorylated Map Kinase Erk2
> >              726    0.0
> > pdb|1PME|   Structure Of Penta Mutant Human Erk2
> Map
> > Kinase Co...   725    0.0
> > ref|XP_860682.1|  PREDICTED: similar to
> > Mitogen-activated prot...   720    0.0
> > ref|XP_860651.1|  PREDICTED: similar to
> > Mitogen-activated prot...   720    0.0
> > emb|CAA77753.1|  40kDa protein kinase [Homo
> sapiens]
> > >prf||181...   717    0.0
> > ref|NP_001017127.1|  mitogen-activated protein
> kinase
> > 1 [Xenopus    715    0.0
> > dbj|BAE28679.1|  unnamed protein product [Mus
> > musculus]             713    0.0
> > emb|CAA42482.1|  MAP kinase [Xenopus laevis]
> > >gb|AAH60748.1| M...   711    0.0
> > sp|P26696|MK01_XENLA  Mitogen-activated protein
> kinase
> > 1 (Myel...   711    0.0
> > gb|AAH76730.1|  Xp42 protein [Xenopus laevis]
> >              706    0.0
> > gb|AAH65868.1|  Mitogen-activated protein kinase 1
> > [Danio rerio]    696    0.0
> > dbj|BAD23843.1|  extracellular signal regulated
> > protein kinase...   694    0.0
> > ref|NP_878308.2|  mitogen-activated protein kinase
> 1
> > [Danio re...   694    0.0
> > emb|CAG07778.1|  unnamed protein product
> [Tetraodon
> 
=== message truncated ===




From cjfields at uiuc.edu  Wed Feb 22 21:55:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 22 Feb 2006 15:55:34 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060222210654.53827.qmail@web34404.mail.mud.yahoo.com>
Message-ID: <001701c637fa$b5110120$15327e82@pyrimidine>

You know, I assumed you were using ActivePerl b/c of the older version of
Bioperl (and since it's the most commonly used Perl for Windows build).  My
goof.  It looks like you're using Apache/mod_perl/perl, right?  The only
Perl/Apache/mod_perl combos for Windows I know of are listed here:

http://perl.apache.org/docs/2.0/os/win32/install.html

The only Perl for Windows we have actively supported is ActivePerl AFAIK,
but maybe we can walk through this.  Anything learned here can be added to
the installation instructions in case this comes up again.

To start, what mod_perl/Perl version are you using, and from what
distributor (IndigoStar, Apache, etc)?  Each distribution should have some
documentation for installing CPAN modules or prebuilt/pretested packages,
like ActiveState's PPM or IndigoStar's GPM.  I think Apache's Perl build is
from ActiveState's source code so should come with PPM.

Next: you obviously have installed Bioperl before (v1.2.3); did you use
'make' or 'nmake', or was it from a repository (like IndigoPerl's GPM)?
AFAIK, you would install it like you would any other perl module; there
should be no problem with 'make/nmake', though 'make/nmake test' will not
pass completely (it should pass most tests, though, otherwise something is
seriously wrong).
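
One way to sidestep the 'nmake test' failure reported earlier in this thread
(NMAKE error U1095, expanded command line too long) may be to call
Test::Harness directly, so the long list of test files never passes through
nmake. A minimal sketch, assuming it is saved as a script and run from the top
of the bioperl-live directory after 'nmake' has populated blib; the helper
name test_files is hypothetical and not part of BioPerl:

```perl
use strict;
use warnings;
# Assumption: the build output lives in blib/, the same directories that
# the Makefile hands to test_harness() in the log above.
use lib 'blib/lib', 'blib/arch';
use Test::Harness qw(runtests);

# Hypothetical helper: list the *.t scripts in a directory in sorted order.
sub test_files {
    my ($dir) = @_;
    my @files = glob("$dir/*.t");   # forward slash works on Windows and Unix
    return sort @files;
}

my @tests = test_files('t');
if (@tests) {
    # runtests() runs each script and dies if any of them fail.
    runtests(@tests);
} else {
    print "No test scripts found under t/\n";
}
```

Because the file list is built inside Perl, its length is not subject to the
limit nmake applies to an expanded command line. Some tests may still fail
for other reasons, as noted above.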

The other option, though not as nice, is setting the PERL5LIB variable to
include the bioperl-live directory; it works for me while I'm developing.  I
don't know how this may affect other mod_perl-related functions, though.
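
To check whether PERL5LIB is actually taking effect, it can help to list every
copy of a module that Perl could load, in @INC search order. A minimal sketch
using only core modules; find_module_paths is a hypothetical helper, not part
of BioPerl:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return every file in @INC that could satisfy
# "use $module", in the order Perl would search.
sub find_module_paths {
    my ($module) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    my @found;
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, @parts);
        push @found, $path if -f $path;
    }
    return @found;
}

my @copies = find_module_paths('Bio::SearchIO::blast');
if (@copies) {
    print "Perl will load: $copies[0]\n";
    print "Shadowed copy:  $_\n" for @copies[1 .. $#copies];
} else {
    print "Bio::SearchIO::blast not found in \@INC\n";
}
```

Directories named in PERL5LIB are searched before the site library, so the
bioperl-live copy of blast.pm should be listed first if the variable is set in
the same environment that runs the script.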

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 






From iamvela at yahoo.com  Wed Feb 22 22:32:08 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 14:32:08 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <001701c637fa$b5110120$15327e82@pyrimidine>
Message-ID: <20060222223208.45715.qmail@web34409.mail.mud.yahoo.com>

Chris,

Please see my response below.

--- Chris Fields  wrote:

> You know, I assumed you were using ActivePerl b/c of
> the older version of
> Bioperl (and since it's the most commonly used Perl
> for Windows build).  My
> goof.  It looks like you're using
> Apache/mod_perl/perl, right?  The only
> Perl/Apache/mod_perl combos for Windows I know of
> are listed here:


I am using ActivePerl 5.8.7 downloaded from
activeperl.com. I just happened to install it under the
c:\mod_perl\Perl directory (the application has hardcoded
dependencies on this directory). I am not using
apache/mod_perl/perl.

Please see below the version string returned by the perl
executable.

 
C:\bioperl-live\bioperl-live>perl -version

This is perl, v5.8.7 built for
MSWin32-x86-multi-thread
(with 14 registered patches, see perl -V for more
detail)

Copyright 1987-2005, Larry Wall

Binary build 815 [211909] provided by ActiveState
http://www.ActiveState.com
ActiveState is a division of Sophos.
Built Nov  2 2005 08:44:52


> 
>
http://perl.apache.org/docs/2.0/os/win32/install.html
> 
> The only Perl for Windows we have actively supported
> is ActivePerl AFAIK,
> but maybe we can walk through this.  Anything
> learned here can be added to
> the installation instructions in case this comes up
> again.
> 
> To start, what mod_perl/Perl version are you using,
> and from what
> distributor (IndigoStar, Apache, etc)?  Each
> distribution should have some
> documentation for installing CPAN modules or
> prebuilt/pretested packages,
> like ActiveState's PPM or IndigoStar's GPM.  I think
> Apache's Perl build is
> from ActiveState's source code so should come with
> PPM.
> 



I have always used 'ppm' to install packages (DBI, Oracle-DBD,
bioperl etc) before, so this is the first time I have tried
to install with the 'nmake' utility.

After downloading the latest bioperl tar ball and
replacing the blast.pm file, can I just do ppm install
bioperl instead of doing nmake?


> Next: you obviously have installed Bioperl before
> (v1.2.3); did you use
> 'make' or 'nmake', or was it from a repository (like
> IndigoPerl's GPM)?
> AFAIK, you would install it like you would any other
> perl module; there
> should be no problem with 'make/nmake', though
> 'make/nmake test' will not
> pass completely (it should pass most tests, though,
> otherwise something is
> seriously wrong).
> 
> The other option, though not as nice, is setting the
> PERL5LIB variable to
> include the bioperl-live directory; it works for me
> while I'm developing. 

I tried setting PERL5LIB, but it did not make any
difference. I am still getting the same errors.


I wanted to do a clean install, so I tried 'nmake clean',
but it looks like there is no 'rm' utility installed on
my machine.

thanks for all your help,
Raghu

> I
> don't know how this may affect other
> mod_perl-related functions, though.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> 
> > -----Original Message-----
> > From: Raghunath Verabelli
> [mailto:iamvela at yahoo.com]
> > Sent: Wednesday, February 22, 2006 3:07 PM
> > To: Chris Fields; bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Blast returns result, but
> does not return hits
> > 
> > Thanks Chris. I am getting below mentioned errors
> with
> > nmake.
> > 
> > As suggested, I downloaded the nmake utility from
> > Microsoft website and the bioperl-live tarball.
> > 
> > After untaring, I replaced the blast.pm file
> (under
> > bioperl-live\Bio\SearchIO) with the blast.pm (86
> KB
> > size) attached to the bug report 1934.
> > 
> > I then did the following to install packages using
> > nmake:
> > 
> > 1) perl Makefile.pl was successful without any
> errors.
> > 
> > 
> > 2) 'c:\nmake' results in following errors
> > 
> >         pl2bat.bat blib\script\bp_unflatten_seq.pl
> >         C:\mod_perl\Perl\bin\perl.exe
> > -MExtUtils::Command -e cp ./scripts_temp/b
> > p_taxid4species.pl blib\script\bp_taxid4species.pl
> >         pl2bat.bat blib\script\bp_taxid4species.pl
> >         C:\mod_perl\Perl\bin\perl.exe
> > -MExtUtils::Command -e cp ./scripts_temp/b
> > p_seqret.pl blib\script\bp_seqret.pl
> >         pl2bat.bat blib\script\bp_seqret.pl
> >         C:\mod_perl\Perl\bin\perl.exe
> "-Iblib\arch"
> > "-Iblib\lib" doc/makedoc.PL
> > bioscripts.pod
> > Can't open bioscripts.pod: No such file or
> directory.
> >         C:\mod_perl\Perl\bin\perl.exe
> "-Iblib\arch"
> > "-Iblib\lib" doc/makedoc.PL
> > biodatabases.pod
> > Can't open biodatabases.pod: No such file or
> > directory.
> >         C:\mod_perl\Perl\bin\perl.exe
> "-Iblib\arch"
> > "-Iblib\lib" doc/makedoc.PL
> > biodesign.pod
> > Can't open biodesign.pod: No such file or
> directory.
> >         C:\mod_perl\Perl\bin\perl.exe
> "-Iblib\arch"
> > "-Iblib\lib" doc/makedoc.PL
> > bioperl.pod
> > Can't open bioperl.pod: No such file or directory.
> > 
> > 
> > 3) 'c:\nmake test' fails with following errors:
> > 
> > NMAKE : fatal error U1095: expanded command line
> > 'C:\mod_perl\Perl\bin\perl.exe
> > "-MExtUtils::Command::MM" "-e" "test_harness(0,
> > 'blib\lib', 'blib\arch')" t\AACh
> > ange.t t\AAReverseMutate.t t\abi.t t\ace.t
> t\AlignIO.t
> > t\AlignStats.t t\AlignUti
> > l.t t\alignUtilities.t t\Allele.t t\Alphabet.t
> > t\Annotation.t t\AnnotationAdapto
> > r.t t\asciitree.t t\Assembly.t t\Biblio.t
> > t\Biblio_biofetch.t t\Biblio_eutils.t
> > t\BiblioReferences.t t\BioDBGFF.t t\BioFetch_DB.t
> > t\BioGraphics.t t\BlastIndex.t
> >  t\BPbl2seq.t t\BPlite.t t\BPpsilite.t
> t\bsml_sax.t
> > t\Chain.t t\chaosxml.t t\cig
> > arstring.t t\ClusterIO.t t\Coalescent.t
> t\CodonTable.t
> > t\Compatible.t t\consed.t
> >  t\CoordinateGraph.t t\CoordinateMapper.t
> > t\Correlate.t t\ctf.t t\CytoMap.t t\DB
> > .t t\DBCUTG.t t\DBFasta.t t\DNAMutation.t
> t\Domcut.t
> > t\ECnumber.t t\ELM.t t\embl
> > .t t\EMBL_DB.t t\EMBOSS_Tools.t t\EncodedSeq.t
> > t\entrezgene.t t\ePCR.t t\ESEfind
> > er.t t\est2genome.t t\Exception.t t\Exonerate.t
> > t\exp.t t\fasta.t t\FeatureIO.t
> > t\flat.t t\FootPrinter.t t\game.t t\GbrowseGFF.t
> > t\gcg.t t\GDB.t t\Gel.t t\genba
> > nk.t t\GeneCoordinateMapper.t t\Geneid.t
> t\Genewise.t
> > t\Genomewise.t t\Genpred.t
> >  t\GFF.t t\GOR4.t t\GOterm.t t\GraphAdaptor.t
> > t\GuessSeqFormat.t t\hmmer.t t\HNN
> > .t t\HtSNP.t t\Index.t t\InstanceSite.t
> t\interpro.t
> > t\InterProParser.t t\IUPAC.
> > t t\kegg.t t\largefasta.t t\LargeLocatableSeq.t
> > t\largepseq.t t\LinkageMap.t t\L
> > iveSeq.t t\LocatableSeq.t t\Location.t
> > t\LocationFactory.t t\LocusLink.t t\lucy.
> > t t\Map.t t\MapIO.t t\masta.t t\Matrix.t
> t\Measure.t
> > t\MeSH.t t\metafasta.t t\Me
> > taSeq.t t\MicrosatelliteMarker.t t\MiniMIMentry.t
> > t\MitoProt.t t\Molphy.t t\Mult
> > iFile.t t\multiple_fasta.t t\Mutation.t
> t\Mutator.t
> > t\NetPhos.t t\Node.t t\OddCo
> > des.t t\OMIMentry.t t\OMIMentryAllelicVariant.t
> > t\OMIMparser.t t\Ontology.t t\On
> > tologyEngine.t t\OntologyStore.t t\PAML.t t\Perl.t
> > t\phd.t t\Phenotype.t t\Phyli
> > pDist.t t\PhysicalMap.t t\pICalculator.t
> t\Pictogram.t
> > t\pir.t t\pln.t t\PopGen.
> > t t\PopGenSims.t t\primaryqual.t t\PrimarySeq.t
> > t\primedseq.t t\Primer.t t\prime
> > r3.t t\Promoterwise.t t\ProtDist.t t\protgraph.t
> > t\ProtMatrix.t t\ProtPsm.t t\Ps
> > eudowise.t t\psm.t t\QRNA.t t\qual.t
> > t\RandDistFunctions.t t\RandomTreeFactory.t
> >  t\Range.t t\RangeI.t t\raw.t t\RefSeq.t
> t\Registry.t
> > t\Relationship.t t\Relatio
> > nshipType.t t\RemoteBlast.t t\RepeatMasker.t
> > t\RestrictionAnalysis.t t\Restricti
> > onEnzyme.t t\RestrictionIO.t t\RNAChange.t
> > t\Root-Utilities.t t\RootI.t t\RootIO
> > .t t\RootStorable.t t\Scansite.t t\scf.t
> > t\SearchDist.t t\SearchIO.t t\Seq.t t\s
> > eq_quality.t t\SeqAnalysisParser.t t\SeqBuilder.t
> > t\SeqDiff.t t\SeqFeatCollectio
> > n.t t\SeqFeature.t t\seqfeaturePrimer.t
> > t\SeqHound_DB.t t\SeqIO.t t\SeqPattern.t
> >  t\seqread_fail.t t\SeqStats.t t\SequenceFamily.t
> > t\sequencetrace.t t\SeqUtils.t
> >  t\SeqVersion.t t\seqwithquality.t t\SeqWords.t
> > t\Sigcleave.t t\Sim4.t t\Similar
> > ityPair.t t\SimpleAlign.t t\simpleGOparser.t
> 
=== message truncated ===




From cjfields at uiuc.edu  Thu Feb 23 00:02:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 22 Feb 2006 18:02:38 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060222223208.45715.qmail@web34409.mail.mud.yahoo.com>
Message-ID: <002101c6380c$75910880$15327e82@pyrimidine>

> 
> I am using ActivePerl 5.8.7 downloaded from
> activeperl.com. I just happened to install it under
> c:\mod_perl\Perl directory (application has hardcoded
> dependencies for this directory). I am not using
> apache/mod_perl/perl.
> 
> Please see below version string returned by perl
> executable.
> 
> 
> C:\bioperl-live\bioperl-live>perl -version
> 
> This is perl, v5.8.7 built for
> MSWin32-x86-multi-thread
> (with 14 registered patches, see perl -V for more
> detail)
> 
> Copyright 1987-2005, Larry Wall
> 
> Binary build 815 [211909] provided by ActiveState
> http://www.ActiveState.com
> ActiveState is a division of Sophos.
> Built Nov  2 2005 08:44:52
 
What do you see when you type 'perl -V'?  (Make sure it is a capital 'V', not
lower case.)

> http://perl.apache.org/docs/2.0/os/win32/install.html
> >
> > The only Perl for Windows we have actively supported
> > is ActivePerl AFAIK,
> > but maybe we can walk through this.  Anything
> > learned here can be added to
> > the installation instructions in case this comes up
> > again.
> >
> I used 'ppm' to install packages (DBI, Oracle-DBD,
> bioperl etc) before, so this is the first time I tried
> to install it using 'nmake' utility.
>
> After downloading the latest bioperl tar ball and
> replacing the blast.pm file, can I just do ppm install
> bioperl instead of doing nmake?

Okay, so I know you're using PPM now.  No, you can't do that.  I'm adding a
section to this page:

http://bioperl.open-bio.org/wiki/Making_a_BioPerl_release

about building your own PPM; it will explain everything.  It isn't up yet
but should be up tonight or tomorrow.  BTW, you'll still need a working nmake
for this.  Again, make sure nmake is in your PATH environment variable, or
at least in the same directory where you plan to run 'nmake' and 'nmake
install'.  Although nmake is buggy, I haven't had a problem with it yet.
 
> > Next: you obviously have installed Bioperl before
> > (v1.2.3); did you use
> > 'make' or 'nmake', or was it from a repository (like
> > IndigoPerl's GPM)?
> > AFAIK, you would install it like you would any other
> > perl module; there
> > should be no problem with 'make/nmake', though
> > 'make/nmake test' will not
> > pass completely (it should pass most tests, though,
> > otherwise something is
> > seriously wrong).
> >
> > The other option, though not as nice, is setting the
> > PERL5LIB variable to
> > include the bioperl-live directory; it works for me
> > while I'm developing.
> 
> I tried setting PERL5LIB, but it did not make any
> difference. I am still getting the same errors.
 
Do you mean the errors from nmake or errors from your scripts?  If PERL5LIB
is set properly, Perl searches those directories for modules before it
checks the rest of @INC (i.e. you will not need to build and install them
using nmake).

I don't recommend this because unpacking the entire Bioperl distribution
into a folder and pointing PERL5LIB at it is not the best habit to get into,
but some are forced to do it this way, so it's there if you need it.  A
direct installation is recommended if possible.
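
If you want to see the search order for yourself, a small script along
these lines may help.  It is only a sketch using core Perl: the helper name
find_module is made up for illustration, and Bio::Root::Version is just the
default module to look for.  It lists @INC (PERL5LIB entries come first) and
reports the first directory that holds the module's file.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first directory in the search path that holds the module file.
# (find_module is a hypothetical helper, not part of Perl or Bioperl.)
sub find_module {
    my ($module, @dirs) = @_;
    (my $rel = "$module.pm") =~ s{::}{/}g;   # Bio::Root::Version -> Bio/Root/Version.pm
    for my $dir (@dirs) {
        next if ref $dir;                     # skip code-ref hooks in @INC
        return $dir if -e File::Spec->catfile($dir, split m{/}, $rel);
    }
    return;
}

# Module to look for: first command-line argument, or an example default.
my $module = @ARGV ? $ARGV[0] : 'Bio::Root::Version';

print "Search order (\@INC):\n";
print "  $_\n" for grep { !ref } @INC;

my $hit = find_module($module, @INC);
print defined $hit
    ? "$module would load from: $hit\n"
    : "$module not found in \@INC\n";
```

If the directory reported is not the one you expected, PERL5LIB is probably
pointing at the wrong level of the unpacked tree.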

The PERL5LIB I use below only contains modules I'm working on or
modifications of current modules (like SearchIO::blast, RemoteBlast, etc).
Bioperl from CVS is installed via PPM (custom-built PPM, BTW, using the
instructions I mentioned).  

The following shows what my PERL5LIB is set to.  Note that the output also
tells you what @INC is set to:

C:\Perl\src\bioperl\bioperl-live>perl -V
Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=MSWin32, osvers=5.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef useithreads=define 



  Compiled at Nov  2 2005 08:44:52
  %ENV:
    PERL5LIB="C:/Perl/src/bioperl/bioperl-live;
C:/Perl/src/bioperl/bioperl-db"
  @INC:
    C:/Perl/src/bioperl/bioperl-live
     C:/Perl/src/bioperl/bioperl-db
    C:/Perl/lib
    C:/Perl/site/lib
    .



From iamvela at yahoo.com  Thu Feb 23 02:25:02 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Wed, 22 Feb 2006 18:25:02 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <002101c6380c$75910880$15327e82@pyrimidine>
Message-ID: <20060223022502.31234.qmail@web34406.mail.mud.yahoo.com>


Thanks very much, Chris, for your time.
Please see below the output that you requested (the only
difference I saw between your output and mine is the @INC
value. I have only the 2 directories under c:\mod_perl\perl,
where I installed ActivePerl. I see two additional
directories in your @INC path).

>  
> When you type 'perl -V' what do you see (make sure
> it is a capital 'V', not
> lower case).

C:\Documents and Settings\Administrator>perl  -V
Summary of my perl5 (revision 5 version 8 subversion 7) configuration:
  Platform:
    osname=MSWin32, osvers=5.0, archname=MSWin32-x86-multi-thread
    uname=''
    config_args='undef'
    hint=recommended, useposix=true, d_sigaction=undef
    usethreads=define use5005threads=undef useithreads=define usemultiplicity=define
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cl', ccflags ='-nologo -Gf -W3 -MD -Zi -DNDEBUG -O1 -DWIN32 -D_CONSOLE -DNO_STRICT -DHAVE_DES_FCRYPT -DNO_HASH_SEED -DUSE_SITECUSTOMIZE -DPERL_IMPLICIT_CONTEXT -DPERL_IMPLICIT_SYS -DUSE_PERLIO -DPERL_MSVCRT_READFIX',
    optimize='-MD -Zi -DNDEBUG -O1',
    cppflags='-DWIN32'
    ccversion='12.00.8804', gccversion='', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=undef, longlongsize=8, d_longdbl=define, longdblsize=10
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='__int64', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='link', ldflags ='-nologo -nodefaultlib -debug -opt:ref,icf  -libpath:"C:\mod_perl\Perl\lib\CORE"  -machine:x86'
    libpth=\lib
    libs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib msvcrt.lib
    perllibs=  oldnames.lib kernel32.lib user32.lib gdi32.lib winspool.lib  comdlg32.lib advapi32.lib shell32.lib ole32.lib oleaut32.lib  netapi32.lib uuid.lib ws2_32.lib mpr.lib winmm.lib  version.lib odbc32.lib odbccp32.lib msvcrt.lib
    libc=msvcrt.lib, so=dll, useshrplib=yes, libperl=perl58.lib
    gnulibc_version='undef'
  Dynamic Linking:
    dlsrc=dl_win32.xs, dlext=dll, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-dll -nologo -nodefaultlib -debug -opt:ref,icf  -libpath:"C:\mod_perl\Perl\lib\CORE"  -machine:x86'


Characteristics of this binary (from libperl):
  Compile-time options: MULTIPLICITY USE_ITHREADS USE_LARGE_FILES
                        USE_SITECUSTOMIZE PERL_IMPLICIT_CONTEXT
                        PERL_IMPLICIT_SYS
  Locally applied patches:
        ActivePerl Build 815 [211909]
        Iin_load_module moved for compatibility with build 806
        PerlEx support in CGI::Carp
        Less verbose ExtUtils::Install and Pod::Find
        instmodsh upgraded from ExtUtils-MakeMaker-6.25
        Patch for CAN-2005-0448 from Debian with modifications
        Upgrade to Time-HiRes-1.76
        25774 Keys of %INC always use forward slashes
        25747 Accidental interpolation of $@ in Pod::Html
        25362 File::Path::mkpath resets errno
        25181 Incorrect (X)HTML generated by Pod::Html
        24999 Avoid redefinition warning for MinGW
        24699 ICMP_UNREACHABLE handling in Net::Ping
        21540 Fix backward-compatibility issues in if.pm
  Built under MSWin32
  Compiled at Nov  2 2005 08:44:52
  %ENV:
    PERL5LIB="c:\bioperl-live"
  @INC:
    c:\bioperl-live
    C:/mod_perl/Perl/lib
    C:/mod_perl/Perl/site/lib
    .





From michael.watson at bbsrc.ac.uk  Thu Feb 23 10:17:39 2006
From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C))
Date: Thu, 23 Feb 2006 10:17:39 -0000
Subject: [Bioperl-l] CONTIG sequence files from the NCBI
Message-ID: <8975119BCD0AC5419D61A9CF1A923E9503008306@iahce2ksrv1.iah.bbsrc.ac.uk>

What I mean is, you have accession1, which is a contig file referring to
n other sequence files.  Accession1 has a version number.  Is that
version number increased when one of the sequences that constitute it is
updated? 

-----Original Message-----
From: Brian Osborne [mailto:osborne1 at optonline.net] 
Sent: 18 February 2006 04:56
To: michael watson (IAH-C); bioperl-l
Subject: Re: [Bioperl-l] CONTIG sequence files from the NCBI

Michael,

Yes, BioPerl has done this for you. Essentially, what it does is take all
the ids in the CONTIG section and query for each individually, then use
the sequences and the location data to create the single large sequence.
This sequence is appended to the annotation and feature section of the
initial Genbank entry. If you want to study this yourself, take a look at
Bio::DB::NCBIHelper::postprocess_data.
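
To make the idea concrete, here is a rough sketch of that kind of joining
in plain Perl. It is not the code in postprocess_data; the function names,
the piece layout, the toy ids and sequences, and the choice to fill gaps
with N are all assumptions made for illustration. Each piece names a
sequence id, a 1-based range and a strand, and the pieces are concatenated
in order.

```perl
use strict;
use warnings;

# Reverse-complement a DNA string.
sub revcomp {
    my ($dna) = @_;
    (my $rc = reverse $dna) =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Build one sequence from an ordered list of CONTIG-style pieces.
# Each piece is either { gap => N } or { id, start, end, strand }.
# $fetch is a callback that returns the full sequence for an id.
sub assemble_contig {
    my ($pieces, $fetch) = @_;
    my $joined = '';
    for my $p (@$pieces) {
        if (exists $p->{gap}) {
            $joined .= 'N' x $p->{gap};       # assumption: gaps become runs of N
            next;
        }
        my $len = $p->{end} - $p->{start} + 1;
        my $sub = substr($fetch->($p->{id}), $p->{start} - 1, $len);
        $sub = revcomp($sub) if ($p->{strand} || 1) < 0;
        $joined .= $sub;
    }
    return $joined;
}

# Toy data standing in for the individually fetched sequences.
my %toy = (A1 => 'AAAACCCC', B2 => 'GGGGTTAC');
my @pieces = (
    { id => 'A1', start => 1, end => 4 },
    { gap => 3 },
    { id => 'B2', start => 5, end => 8, strand => -1 },
);

print assemble_contig(\@pieces, sub { $toy{ $_[0] } }), "\n";   # prints AAAANNNGTAA
```

Because the pieces are laid down in the order and orientation the CONTIG
line gives, positions on the joined sequence are the same ones the features
in the entry already refer to.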

OK, to answer your first question, with that assumption: what NCBI is doing
is simply providing a shorthand rather than an entire large sequence, so
no feature coordinates change, whether the entry uses the shorthand (CONTIG)
or the longhand (ORIGIN). Second, my explanation implies that all the
sequences are the very latest versions of each sequence; that's how
eutils works by default.
However, I don't think I've answered your question, because I'm not sure
I understand what you mean by "when I ask bioperl if these sequences
have been updated, I will be told no". All Bioperl does is read the file
provided by GenBank and use its stated version, nothing fancy.

Brian O.


On 2/16/06 5:31 AM, "michael watson (IAH-C)" wrote:

> Hi
> 
> I have two questions really.  I fetched bacterial genome sequences 
> from the NCBI using Bio::DB::GenBank.
> 
> Some of these sequence entries are CONTIG sequences, ie they just 
> point to other sequences that need to be joined together to form the 
> entire genome.
> 
> Looking at my downloads, it looks as if bioperl has done all the 
> necessary joining for me - or maybe it was the NCBI that did the 
> joining?
> 
> OK, so firstly, did bioperl do the joining, and if so, are all the 
> co-ordinates of the features updated to reflect their new location on 
> the new, joined sequence?
> 
> And secondly, sequence versions... I'm thinking that possibly the 
> sequence version of the CONTIG may be 1 (as it hasn't changed) yet the
> versions of the sequences it refers to might have changed, so when I 
> ask bioperl if these sequences have been updated, I will be told no 
> because the CONTIG sequence version is 1, but I should be told yes 
> because the underlying sequences have...?
> 
> Make sense?
> 
> Thanks
> Mick
> 





From neetisomaiya at gmail.com  Thu Feb 23 10:26:23 2006
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Thu, 23 Feb 2006 15:56:23 +0530
Subject: [Bioperl-l] urgent help required - syntax for using paramaters
	different from default in standalone blast
In-Reply-To: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>
References: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>
Message-ID: <764978cf0602230226vb907821x5407599bf9accf44@mail.gmail.com>

Hi,

I am running standalone blast and I want to use a particular e-value, gap open
and extension costs, and matrix. Is the following the correct syntax for
that?

    my $Seq_in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
    my $query = $Seq_in->next_seq();
    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        'program'   => 'blastn',
        'database'  => 'human.rna.fna',
        _READMETHOD => "Blast"
    );
    $factory->e(0.0001);
    $factory->G(-11);
    $factory->E(-1);
    $factory->M('BLOSUM80');

    my $blast_report = $factory->blastall($query);
    my $result = $blast_report->next_result;

--
-Neeti
Even my blood says, B positive



From cjfields at uiuc.edu  Thu Feb 23 14:39:40 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 08:39:40 -0600
Subject: [Bioperl-l] urgent help required - syntax for using
	paramatersdifferent from default in standalone blast
In-Reply-To: <764978cf0602230213w57e16513kc7ec4bd1d9d7512d@mail.gmail.com>
Message-ID: <000301c63886$fa95eb20$15327e82@pyrimidine>

Have you tried this to see if it works?  The blast report itself should tell
you if everything is set correctly.  Use 'perldoc
Bio::Tools::Run::StandAloneBlast', which explains everything.  I don't
know if the example script works, but the test script StandAloneBlast.t (in
/t) should; that will give you plenty of examples for setting parameters.
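
As a starting point, something like the sketch below may be useful.  Treat
it as an untested outline, not a verified recipe: the values are
placeholders, the query file and database name are taken from the command
line, and I am assuming that the single-letter accessors mirror blastall's
switches and that the parsed result offers available_parameters and
get_parameter.  As far as I know blastall expects gap costs as positive
integers, and -M (matrix) only matters for protein searches, so for blastn
the sketch sets the match reward and mismatch penalty instead.

```perl
use strict;
use warnings;

# Placeholder values, keyed by blastall's single-letter switches.
my %settings = (
    e => 1e-4,   # expectation value cutoff
    G => 5,      # cost to open a gap (given as a positive number)
    E => 2,      # cost to extend a gap (given as a positive number)
    q => -3,     # penalty for a nucleotide mismatch (blastn)
    r => 1,      # reward for a nucleotide match (blastn)
);

# Hypothetical command-line arguments: query FASTA file, database name.
my ($file, $db) = @ARGV;

if (!defined $db) {
    print "usage: $0 query.fasta database\n";
}
elsif (!eval { require Bio::SeqIO; require Bio::Tools::Run::StandAloneBlast; 1 }) {
    print "Bioperl (with StandAloneBlast) is not available in \@INC\n";
}
else {
    my $query   = Bio::SeqIO->new(-file => $file, -format => 'fasta')->next_seq;
    my $factory = Bio::Tools::Run::StandAloneBlast->new(
        program     => 'blastn',
        database    => $db,
        _READMETHOD => 'Blast',
    );

    # Apply each setting through the accessor named after its switch.
    for my $switch (sort keys %settings) {
        $factory->$switch($settings{$switch});
    }

    my $result = $factory->blastall($query)->next_result;

    # Echo what the report says was used, to compare against %settings.
    for my $name ($result->available_parameters) {
        print "$name = ", $result->get_parameter($name), "\n";
    }
}
```

Comparing the echoed parameters with what you asked for is the quickest way
to confirm that the settings reached blastall.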

And please, don't spam the bioperl-l list with repeated emails (four at last
count over 2 1/2 hours).
 
Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of neeti somaiya
> Sent: Thursday, February 23, 2006 4:13 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] urgent help required - syntax for using
> paramatersdifferent from default in standalone blast
> 
> Hi,
> 
> I am running standalone blast and I wanna use a particular e value, gap
> open
> and extension cost and matrix. Is the following the correct syntax for the
> same :
> 
>                                 my $Seq_in = Bio::SeqIO->new (-file =>
> $file, -format => 'fasta');
>                                 my $query = $Seq_in->next_seq();
>                                 my $factory =
> Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastn',
>                                                  'database' => '
> human.rna.fna',
>                                                  _READMETHOD => "Blast"
>                                                  );
>                                 $factory->e(0.0001);
>                                 $factory->G(-11);
>                                 $factory->E(-1);
>                                 $factory->M('BLOSUM80');
> 
>                                 my $blast_report =
> $factory->blastall($query);
>                                 my $result = $blast_report->next_result;
> 
> --
> -Neeti
> Even my blood says, B positive
> 



From cjfields at uiuc.edu  Thu Feb 23 15:23:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 09:23:53 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060223022502.31234.qmail@web34406.mail.mud.yahoo.com>
Message-ID: <000a01c6388d$281ed010$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Raghunath Verabelli
> Sent: Wednesday, February 22, 2006 8:25 PM
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> 
> 
> Thanks very much Chris for your time.
> Please see below output that you requested (the only
> difference i saw between your output and mine is @INC
> value. I have only 2 directories c:\mod_perl\perl
> where i installed activeperl. I see two additional
> directories in your @INC path).
> 
> >
> > When you type 'perl -V' what do you see (make sure
> > it is a capital 'V', not
> > lower case).
> 
> C:\Documents and Settings\Administrator>perl  -V
> Summary of my perl5 (revision 5 version 8 subversion
> 7) configuration:
>   Platform:
>     osname=MSWin32, osvers=5.0,
> archname=MSWin32-x86-multi-thread

[....]

> if.pm
>   Built under MSWin32
>   Compiled at Nov  2 2005 08:44:52
>   %ENV:
>     PERL5LIB="c:\bioperl-live"
>   @INC:
>     c:\bioperl-live
>     C:/mod_perl/Perl/lib
>     C:/mod_perl/Perl/site/lib
>     .

Personally I wouldn't place the bioperl-live folder in the root
directory; this shouldn't make a difference, but you can try moving it to
a separate folder under the perl directory to see if that helps.  Can't see
why it would make a difference, but it is Windows... Main reason I'll be
switching over to Mac OS X!

Regardless, make sure that the Bio directory is directly inside the
directory that PERL5LIB points to (e.g. if PERL5LIB is set to
C:\mod_perl\Perl\bioperl\bioperl-live, then there should be a directory
C:\mod_perl\Perl\bioperl\bioperl-live\Bio).  Otherwise it won't work.

What do you get with this?

perl -MBio::Root::Version -e "print $Bio::Root::Version::VERSION"

If everything is working (PERL5LIB, etc) then it should be 1.5 for CVS
bioperl; otherwise it will either find the old version (1.2.3) or fail
completely.
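
If the one-liner is awkward to quote in the Windows shell, a short script
can report the same thing plus the file each module was actually loaded
from.  This is just a sketch with core Perl; module_report is a made-up
helper name, and the two module names are examples.

```perl
use strict;
use warnings;

# Try to load a module; report its version and the file it came from.
# (module_report is a hypothetical helper, not part of Bioperl.)
sub module_report {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return "$module: not found in \@INC" unless eval { require $file; 1 };
    my $version = $module->VERSION;           # undef if the package sets no $VERSION
    $version = 'unknown' unless defined $version;
    return "$module $version loaded from $INC{$file}";
}

print module_report($_), "\n" for qw(Bio::Root::Version Bio::SearchIO::blast);
```

The path printed for Bio::SearchIO::blast shows whether the patched
blast.pm or an older installed copy is the one being picked up.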

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 




From iamvela at yahoo.com  Thu Feb 23 16:23:56 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Thu, 23 Feb 2006 08:23:56 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000a01c6388d$281ed010$15327e82@pyrimidine>
Message-ID: <20060223162356.97964.qmail@web34405.mail.mud.yahoo.com>

Thanks Chris for all your help.

The patch for blast.pm worked. I was able to parse the
hits from the raw file. I uninstalled the previous
versions of bioperl using ppm, then installed
bioperl 1.4.x using nmake and applied your fix. I am
now getting the hits the way I wanted.

However, I noticed that the p-value for each hit
doesn't seem to be parsed correctly: it is set to 0
for all hits. I am not sure if this is a known issue.
If you have any suggestions or comments, please let me know.

Thanks,
Raghu

--- Chris Fields  wrote:

> > -----Original Message-----
> > From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-
> > bounces at lists.open-bio.org] On Behalf Of Raghunath
> Verabelli
> > Sent: Wednesday, February 22, 2006 8:25 PM
> > To: Chris Fields; bioperl-l at lists.open-bio.org
> > Subject: Re: [Bioperl-l] Blast returns result, but
> does not return hits
> > 
> > 
> > Thanks very much Chris for your time.
> > Please see below output that you requested (the
> only
> > difference i saw between your output and mine is
> @INC
> > value. I have only 2 directories c:\mod_perl\perl
> > where i installed activeperl. I see two additional
> > directories in your @INC path).
> > 
> > >
> > > When you type 'perl -V' what do you see (make
> sure
> > > it is a capital 'V', not
> > > lower case).
> > 
> > C:\Documents and Settings\Administrator>perl  -V
> > Summary of my perl5 (revision 5 version 8
> subversion
> > 7) configuration:
> >   Platform:
> >     osname=MSWin32, osvers=5.0,
> > archname=MSWin32-x86-multi-thread
> 
> [....]
> 
> > if.pm
> >   Built under MSWin32
> >   Compiled at Nov  2 2005 08:44:52
> >   %ENV:
> >     PERL5LIB="c:\bioperl-live"
> >   @INC:
> >     c:\bioperl-live
> >     C:/mod_perl/Perl/lib
> >     C:/mod_perl/Perl/site/lib
> >     .
> 
> Personally I wouldn't place the bioperl-live folder in the root
> directory; this shouldn't make a difference, but you can try moving it
> to the perl directory in a separate folder to see if that helps.  Can't
> see why it would make a difference, but it is Windows...  Main reason
> I'm switching over to Mac OS X!
> 
> Make sure that the Bio directory is in the bioperl-live directory,
> regardless (i.e. if PERL5LIB is set to
> C:\mod_perl\Perl\bioperl\bioperl-live, then there should be a directory
> like C:\mod_perl\Perl\bioperl\bioperl-live\Bio).  Otherwise it won't work.
> 
> What do you get with this?
> 
> perl -MBio::Root::Version -e "print $Bio::Root::Version::VERSION"
> 
> If everything is working (PERL5LIB, etc) then it should be 1.5 for CVS
> bioperl; otherwise it will either find the old version (1.2.3) or fail
> completely.
> 
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


__________________________________________________
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 


From cjfields at uiuc.edu  Thu Feb 23 17:41:07 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 11:41:07 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060223162356.97964.qmail@web34405.mail.mud.yahoo.com>
Message-ID: <000301c638a0$53eb9a30$15327e82@pyrimidine>

Yes that's a potential issue.  I'll try to replicate that here; please send
a code example so I can see how you're calling for the p-value.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Raghunath Verabelli
> Sent: Thursday, February 23, 2006 10:24 AM
> To: Chris Fields; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Blast returns result, but does not return hits
> 
> Thanks Chris for all your help.
> 
> The patch for blast.pm worked. I was able to parse the
> hits from the raw file. I uninstalled previous
> versions of bioperl using ppm and then I installed
> bioperl 1.4.x using nmake, and applied your fix. I am
> getting hits the way I wanted.
> 
> However, I noticed that the p-value for each hit
> doesn't seem to be parsed
> correctly. It sets it to 0 for all hits. Not sure if
> this is a known issue. Any suggestions/comments,
> please let me know.
> 
> Thanks,
> Raghu
> 



From cjfields at uiuc.edu  Thu Feb 23 18:06:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 12:06:37 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000301c638a0$53eb9a30$15327e82@pyrimidine>
Message-ID: <000401c638a3$e37fb520$15327e82@pyrimidine>

Hold up a second.  Do you mean e-value, or p-value?  A run-of-the-mill NCBI
blast report these days gives e-values (expectation values), NOT p-values.  I
think they changed over to using only e-values with BLAST v2.  Make sure you
didn't mix these up; look at the text output to check whether P values are
present.  If they aren't, that would explain why you're getting 0, since they
don't exist.

From the BLAST tutorial:

The BLAST programs report E-value rather than P-values because it is easier
to understand the difference between, for example, E-value of 5 and 10 than
P-values of 0.993 and 0.99995. However, when E < 0.01, P-values and E-value
are nearly identical.
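
For reference, the usual relation between the two statistics is
P = 1 - exp(-E), i.e. the chance of seeing at least one such hit when E are
expected by chance. Below is a minimal Perl sketch of that conversion; the
helper name evalue_to_pvalue is made up for illustration and is not part of
bioperl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convert a BLAST E-value to the corresponding P-value using
# P = 1 - exp(-E).  Hypothetical helper, not a bioperl function.
sub evalue_to_pvalue {
    my ($evalue) = @_;
    return 1 - exp( -$evalue );
}

# Large E-values squeeze P close to 1; small ones give P roughly equal to E.
for my $evalue ( 10, 5, 0.01, 0.001 ) {
    printf "E = %-6g  P = %.6g\n", $evalue, evalue_to_pvalue($evalue);
}
```

With E of 5 and 10 this gives roughly 0.9933 and 0.99995, the figures in the
tutorial text, while for E = 0.01 the P-value is about 0.00995, nearly the
same as E.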

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 





From jason.stajich at duke.edu  Thu Feb 23 18:29:57 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Thu, 23 Feb 2006 13:29:57 -0500
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000401c638a3$e37fb520$15327e82@pyrimidine>
References: <000401c638a3$e37fb520$15327e82@pyrimidine>
Message-ID: <698C1C9B-BB22-44C5-A277-4CE9930C1BCD@duke.edu>

p-values do show up in WU-BLAST reports, so that is why we have a
p-value function.



--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Thu Feb 23 18:34:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 12:34:19 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <698C1C9B-BB22-44C5-A277-4CE9930C1BCD@duke.edu>
Message-ID: <000501c638a7$c2802630$15327e82@pyrimidine>

I think Raghu's running NCBI BLAST, though.  Am I right? 

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 






From iamvela at yahoo.com  Thu Feb 23 19:33:50 2006
From: iamvela at yahoo.com (Raghunath Verabelli)
Date: Thu, 23 Feb 2006 11:33:50 -0800 (PST)
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <000501c638a7$c2802630$15327e82@pyrimidine>
Message-ID: <20060223193350.1917.qmail@web34410.mail.mud.yahoo.com>

Chris, you are right. I am using NCBI BLAST.

Here is my http query:

my $urltext =
"http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=$seq&DATABASE=nr&PROGRAM=blastp";
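
A side note on the query string: if $seq can contain characters such as
newlines, spaces or a FASTA '>' header, it probably needs to be
percent-encoded before it goes into the URL. Here is a rough sketch using
only core Perl; the helper names uri_escape_simple and build_put_url and the
example sequence are made up for illustration, and the parameters are just
the ones from the URL above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
# Assumes a plain ASCII byte string (e.g. sequence data).
sub uri_escape_simple {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $text;
}

# Build the same CMD=Put request as above, escaping every value.
sub build_put_url {
    my ( $seq, $database, $program ) = @_;
    my @params = (
        CMD      => 'Put',
        QUERY    => $seq,
        DATABASE => $database,
        PROGRAM  => $program,
    );
    my @pairs;
    while ( my ( $key, $value ) = splice( @params, 0, 2 ) ) {
        push @pairs, $key . '=' . uri_escape_simple($value);
    }
    return 'http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?' . join( '&', @pairs );
}

# Made-up example query with a FASTA header line.
print build_put_url( ">query1\nMKTAYIAKQR", 'nr', 'blastp' ), "\n";
```

If the URI distribution is installed, URI::Escape's uri_escape does the same
job and is the more usual choice.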

This is my code for populating p-value:

my $pValue = $bioPerlHit->significance;


I looked at the text output and could not find any P
value column; the only 'value' column in the output is
'E value'. I will try that.

Thanks,
Raghu
 


From cjfields at uiuc.edu  Thu Feb 23 21:11:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 23 Feb 2006 15:11:38 -0600
Subject: [Bioperl-l] Blast returns result, but does not return hits
In-Reply-To: <20060223193350.1917.qmail@web34410.mail.mud.yahoo.com>
Message-ID: <000301c638bd$bc9eb590$15327e82@pyrimidine>

I think you want $hit->expect (for hits) or $hsp->evalue (for HSPs).
$hit->significance (for NCBI blast) gives the values from the descriptions
(the score and expect) for each hit.
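
As a rough sketch of reading the value at both levels, assuming the report
has been saved as a plain-text BLAST file whose name is given on the command
line (collect_hit_stats is a made-up helper name, and only the first HSP of
each hit is looked at):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Walk a Bio::SearchIO-style stream and collect, for every hit, its name,
# the hit-level significance and the E-value of its first HSP.
sub collect_hit_stats {
    my ($searchio) = @_;
    my @rows;
    while ( my $result = $searchio->next_result ) {
        while ( my $hit = $result->next_hit ) {
            my $hsp = $hit->next_hsp;    # first HSP, if there is one
            push @rows,
              [ $hit->name, $hit->significance,
                defined $hsp ? $hsp->evalue : undef ];
        }
    }
    return @rows;
}

# Only touch bioperl when a report file is actually supplied.
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new( -format => 'blast', -file => $ARGV[0] );
    for my $row ( collect_hit_stats($in) ) {
        printf "%s\t%s\t%s\n", $row->[0], $row->[1] // 'n/a',
          $row->[2] // 'n/a';
    }
}
```

Run it as, for example, "perl hitstats.pl report.bls" (both names are
placeholders).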

If you want to see what methods are available for any given object (in this
case Bio::Search::Hit::BlastHit or Bio::Search::HSP::BlastHSP), use the
script below from the bioperl FAQ (use PPM to install Class::Inspector
first) and pass the object module name on the command line.  Be careful, as
many of these are get/sets (so don't pass any args).
----------------------------------
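#!perl
use strict;
use warnings;
use Class::Inspector;

# Name of the class to inspect, e.g. Bio::Search::Hit::BlastHit
my $class = shift || die "Usage: methods perl_class_name\n";
# Load the class so that its methods are visible
eval "require $class" or die "Could not load $class: $@";
# 'full' gives fully qualified names; 'public' skips _private methods
print join( "\n", sort @{ Class::Inspector->methods( $class, 'full', 'public' ) } ), "\n";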
----------------------------------
You should get something like this:

C:\Perl\Scripts>methods.pl Bio::Search::Hit::BlastHit
Bio::Root::Root::DESTROY
Bio::Root::Root::confess
Bio::Root::Root::debug
Bio::Root::Root::throw
Bio::Root::Root::verbose
Bio::Root::RootI::carp
Bio::Root::RootI::deprecated
Bio::Root::RootI::stack_trace
Bio::Root::RootI::stack_trace_dump
Bio::Root::RootI::throw_not_implemented
Bio::Root::RootI::warn
Bio::Root::RootI::warn_not_implemented
Bio::Search::Hit::BlastHit::expect
Bio::Search::Hit::BlastHit::found_again
Bio::Search::Hit::BlastHit::iteration
Bio::Search::Hit::BlastHit::new
Bio::Search::Hit::GenericHit::accession
Bio::Search::Hit::GenericHit::add_hsp
Bio::Search::Hit::GenericHit::algorithm
Bio::Search::Hit::GenericHit::ambiguous_aln
Bio::Search::Hit::GenericHit::bits
Bio::Search::Hit::GenericHit::description
Bio::Search::Hit::GenericHit::each_accession_number
Bio::Search::Hit::GenericHit::end
Bio::Search::Hit::GenericHit::frac_aligned_hit
Bio::Search::Hit::GenericHit::frac_aligned_query
Bio::Search::Hit::GenericHit::frac_conserved
Bio::Search::Hit::GenericHit::frac_identical
Bio::Search::Hit::GenericHit::frame
Bio::Search::Hit::GenericHit::gaps
Bio::Search::Hit::GenericHit::hsp
Bio::Search::Hit::GenericHit::hsps
Bio::Search::Hit::GenericHit::length
Bio::Search::Hit::GenericHit::length_aln
Bio::Search::Hit::GenericHit::locus
Bio::Search::Hit::GenericHit::logical_length
Bio::Search::Hit::GenericHit::matches
Bio::Search::Hit::GenericHit::n
Bio::Search::Hit::GenericHit::name
Bio::Search::Hit::GenericHit::next_hsp
Bio::Search::Hit::GenericHit::num_hsps
Bio::Search::Hit::GenericHit::num_unaligned_hit
Bio::Search::Hit::GenericHit::num_unaligned_query
Bio::Search::Hit::GenericHit::num_unaligned_sbjct
Bio::Search::Hit::GenericHit::overlap
Bio::Search::Hit::GenericHit::p
Bio::Search::Hit::GenericHit::query_length
Bio::Search::Hit::GenericHit::range
Bio::Search::Hit::GenericHit::rank
Bio::Search::Hit::GenericHit::raw_score
Bio::Search::Hit::GenericHit::rewind
Bio::Search::Hit::GenericHit::score
Bio::Search::Hit::GenericHit::seq_inds
Bio::Search::Hit::GenericHit::significance
Bio::Search::Hit::GenericHit::start
Bio::Search::Hit::GenericHit::strand
Bio::Search::Hit::GenericHit::tiled_hsps
Bio::Search::Hit::HitI::hit_description
Bio::Search::Hit::HitI::hit_length

Nice, huh?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 

> -----Original Message-----
> From: Raghunath Verabelli [mailto:iamvela at yahoo.com]
> Sent: Thursday, February 23, 2006 1:34 PM
> To: Chris Fields; 'Jason Stajich'
> Cc: bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Blast returns result, but does not return hits
> 
> Chris, you are right. I am using NCBI BLAST.
> 
> Here is my http query:
> 
> my $urltext =
> "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=$seq&DATABASE=n
> r&PROGRAM=blastp";
> 
> This is my code for populating p-value:
> 
> my $pValue = $bioPerlHit->significance;
> 
> 
> I looked at the text output, could not find any p
> value column, the only 'value' column in the output is
> 'E value'. I will try that.
> 
> Thanks,
> Raghu
> 
> --- Chris Fields  wrote:
> 
> > I think Raghu's running NCBI BLAST, though.  Am I
> > right?
> >
> > Christopher Fields
> > Postdoctoral Researcher - Switzer Lab
> > Dept. of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> > > -----Original Message-----
> > > From: Jason Stajich
> > [mailto:jason.stajich at duke.edu]
> > > Sent: Thursday, February 23, 2006 12:30 PM
> > > To: Chris Fields
> > > Cc: 'Raghunath Verabelli';
> > bioperl-l at lists.open-bio.org
> > > Subject: Re: [Bioperl-l] Blast returns result, but
> > does not return hits
> > >
> > > p-values do show up in WU-BLAST reports so that is
> > why we have a p-
> > > value function.
> > >
> > >
> > > On Feb 23, 2006, at 1:06 PM, Chris Fields wrote:
> > >
> > > > Hold up a second.  Do you mean e-value, or
> > p-value?  A run-of-the-
> > > > mill NCBI
> > > > blast report these days gives e-values
> > (expectation value), NOT p-
> > > > values.  I
> > > > think they changed over to using only e-values
> > with BLAST v2.  Make
> > > > sure you
> > > > didn't mix these up; look out the text output to
> > make sure that P
> > > > values are
> > > > present.  That would explain why you're getting
> > 0, since they don't
> > > > exist.
> > > >
> > > >> From the BLAST tutorial:
> > > >
> > > > The BLAST programs report E-value rather than
> > P-values because it
> > > > is easier
> > > > to understand the difference between, for
> > example, E-value of 5 and
> > > > 10 than
> > > > P-values of 0.993 and 0.99995. However, when E <
> > 0.01, P-values and
> > > > E-value
> > > > are nearly identical.
> > > >
> > > > Christopher Fields
> > > > Postdoctoral Researcher - Switzer Lab
> > > > Dept. of Biochemistry
> > > > University of Illinois Urbana-Champaign
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-
> > > >> bounces at lists.open-bio.org] On Behalf Of Chris
> > Fields
> > > >> Sent: Thursday, February 23, 2006 11:41 AM
> > > >> To: 'Raghunath Verabelli';
> > bioperl-l at lists.open-bio.org
> > > >> Subject: Re: [Bioperl-l] Blast returns result,
> > but does not return
> > > >> hits
> > > >>
> > > >> Yes that's a potential issue.  I'll try to
> > replicate that here;
> > > >> please
> > > >> send
> > > >> a code example so I can see how you're calling
> > for the p-value.
> > > >>
> > > >> Christopher Fields
> > > >> Postdoctoral Researcher - Switzer Lab
> > > >> Dept. of Biochemistry
> > > >> University of Illinois Urbana-Champaign
> > > >>
> > > >>> -----Original Message-----
> > > >>> From: bioperl-l-bounces at lists.open-bio.org
> > [mailto:bioperl-l-
> > > >>> bounces at lists.open-bio.org] On Behalf Of
> > Raghunath Verabelli
> > > >>> Sent: Thursday, February 23, 2006 10:24 AM
> > > >>> To: Chris Fields; bioperl-l at lists.open-bio.org
> > > >>> Subject: Re: [Bioperl-l] Blast returns result,
> > but does not
> > > >>> return hits
> > > >>>
> > > >>> Thanks Chris for all your help.
> > > >>>
> > > >>> The patch for blast.pm worked. I was able to
> > parse the
> > > >>> hits from the raw file. I uninstalled previous
> > > >>> versions of bioperl using ppm and then I
> > installed
> > > >>> bioperl 1.4.x using nmake, and applied your
> > fix. I am
> > > >>> getting hits the way I wanted.
> > > >>>
> > > >>> However, I noticed that the p-value for each
> > hit
> > > >>> doesn't seem to be parsed
> > > >>> correctly. It sets it to 0 for all hits. Not
> > sure if
> > > >>> this is a known issue. Any
> > suggestions/comments,
> > > >>> please let me know.
> > > >>>
> > > >>> Thanks,
> > > >>> Raghu
> > > >>>
> > > >>> --- Chris Fields  wrote:
> > > >>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: bioperl-l-bounces at lists.open-bio.org
> > > >>>> [mailto:bioperl-l-
> > > >>>>> bounces at lists.open-bio.org] On Behalf Of
> > Raghunath
> > > >>>> Verabelli
> > > >>>>> Sent: Wednesday, February 22, 2006 8:25 PM
> > > >>>>> To: Chris Fields;
> > bioperl-l at lists.open-bio.org
> > > >>>>> Subject: Re: [Bioperl-l] Blast returns
> > result, but
> > > >>>> does not return hits
> > > >>>>>
> > > >>>>>
> > > >>>>> Thanks very much Chris for your time.
> > > >>>>> Please see below output that you requested
> > (the
> > > >>>> only
> > > >>>>> difference i saw between your output and
> > mine is
> > > >>>> @INC
> > > >>>>> value. I have only 2 directories
> > c:\mod_perl\perl
> > > >>>>> where i installed activeperl. I see two
> > additional
> > > >>>>> directories in your @INC path).
> > > >>>>>
> > > >>>>>>
> > > >>>>>> When you type 'perl -V' what do you see
> > (make
> > > >>>> sure
> > > >>>>>> it is a capital 'V', not
> > > >>>>>> lower case).
> > > >>>>>
> > > >>>>> C:\Documents and Settings\Administrator>perl
> >  -V
> > > >>>>> Summary of my perl5 (revision 5 version 8
> > > >>>> subversion
> > > >>>>> 7) configuration:
> > > >>>>>   Platform:
> > > >>>>>     osname=MSWin32, osvers=5.0,
> > > >>>>> archname=MSWin32-x86-multi-thread
> > > >>>>
> > > >>>> [....]
> > > >>>>
> > > >>>>> if.pm
> > > >>>>>   Built under MSWin32
> > > >>>>>   Compiled at Nov  2 2005 08:44:52
> > > >>>>>   %ENV:
> > > >>>>>     PERL5LIB="c:\bioperl-live"
> > > >>>>>   @INC:
> > > >>>>>     c:\bioperl-live
> > > >>>>>     C:/mod_perl/Perl/lib
> > > >>>>>     C:/mod_perl/Perl/site/lib
> > > >>>>>     .
> > > >>>>
> > > >>>> Personally I wouldn't place the the
> > bioperl-live
> > > >>>> folder in the root
> > > >>>> directory; this shouldn't make a difference,
> > but you
> > > >>>> can try moving it to
> > > >>>> the perl directory in a separate folder to
> > see if
> > > >>>> that helps.  Can't see why
> > > >>>> it would make a difference, but it is
> > Windows...
> > > >>>> Main reason I'll switching
> > > >>>> over to Mac OS X!
> > > >>>>
> > > >>>> Make sure that the Bio directory is in the
> > > >>>> bioperl-live directory,
> > > >>>> regardless (i.e. if PERL5LIB is set to
> > > >>>> C:\mod_perl\Perl\bioperl\bioperl-live, then
> > there
> > > >>>> should be a directory like
> > > >>>> C:\Perl\bioperl\bioperl-live\Bio).  Otherwise
> > it
> > > >>>> won't work.
> > > >>>>
> >
> === message truncated ===
> 
> 



From cain at cshl.edu  Wed Feb 22 14:36:54 2006
From: cain at cshl.edu (Scott Cain)
Date: Wed, 22 Feb 2006 09:36:54 -0500
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>
	<200602211055.31221.lstein@cshl.edu>	<43FB44DD.4090504@mrc-lmb.cam.ac.uk>
	<200602211326.00021.lstein@cshl.edu>
	<43FC3ADA.4090203@mrc-lmb.cam.ac.uk>
Message-ID: <1140619014.3142.81.camel@localhost.localdomain>

Hi Dave,

I don't know if this helps at all, but you could think of that 45th tick
mark as the termination, since the space between the 44th and the 45th
tick marks corresponds to your 44th residue.  I suppose it is a matter of
correctly training your users :-)
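
To make the two conventions concrete, here is a small plain-Perl sketch (no
Bio::Graphics involved; the helper names to_interbase and subseq are invented
for this example). It takes the 44-residue sequence quoted below and converts
1-based inclusive residue numbers into interbase positions, that is, the
numbered boundaries between residues that the tick marks represent:

```perl
use strict;
use warnings;

# The 44-residue SwissProt entry (P12239) from the quoted message
my $seq = "TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR";

# Convert a 1-based inclusive residue range into interbase positions
# (boundaries numbered from 0; residue i lies between boundaries i-1 and i).
sub to_interbase {
    my ($start, $end) = @_;
    return ($start - 1, $end);
}

# Extract residues $start..$end (1-based, inclusive) from $seq
sub subseq {
    my ($seq, $start, $end) = @_;
    my ($left, $right) = to_interbase($start, $end);
    return substr($seq, $left, $right - $left);
}

# N residues always need N + 1 boundaries (ticks) to delimit them
printf "%d residues need %d boundaries\n", length($seq), length($seq) + 1;

# The 27..42 track from the quoted message
print subseq($seq, 27, 42), "\n";
```

With boundaries numbered from 0, the first tick is labelled 0 and the last 44;
with ticks numbered from 1, the same 45 ticks run from 1 to 45, which is the
picture described above.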

Scott


On Wed, 2006-02-22 at 10:20 +0000, Dave Howorth wrote:
> Lincoln Stein wrote:
> > Hi Dave,
> > 
> > Well, when you are using 1-based coordinates, a line that contains 44 
> > intervals will have 45 ticks. If you move to 0-based coordinates, then the 
> > first tick will be labeled 0 and the last tick will be labeled 44. An 
> > alternative is to make each base dimensionless, but that becomes a problem 
> > when dealing with single base features, such as SNPs.
>  >
> > These issues are why I have long advocated for interbase coordinates
> > in which you number the positions between bases rather than the bases
> > themselves.
> 
> I see your point but I need to work with the coordinates that the users 
> expect and are familiar with. (Things get much worse with PDB residue 
> numbering :)
> 
> > Draw me the picture of what you expect to see. I think of it this way:
> > 
> > 	1    2  3  4   5   6
> >          A>G>C>T>A>
> 
> I guess something went wrong with your ASCII art :(
> 
> OK, consider a 44-residue entry from SwissProt (P12239):
> 
>    TSNTPNQEPVSYPIFTVRWVAVHTLAVPTIFFLGAIAAMQFIQR
> 
> The first T is numbered 1 and the last R is numbered 44.
> 
> So I expect to see a line with 44 positions indicated somehow (whether 
> these are half-open intervals or points on the line), with the number 1 
> at the left end and the number 44 at the right end.
> 
> An important point is that if I then place other tracks below this one 
> that start at say 27 and go to say 42, representing VPTIFFLGAIAAMQFI, 
> they should align properly (according to whatever convention is used to 
> represent a residue).
> 
> For a short sequence like this it would be possible to use letters to 
> represent the residue but I'd like to use the same convention for longer 
> sequences as well and have everything be consistent.
> 
> I'm hoping Bio::Graphics will make this easy.
> 
> Thanks, Dave
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



From hlapp at gnf.org  Fri Feb 24 02:10:13 2006
From: hlapp at gnf.org (Hilmar Lapp)
Date: Thu, 23 Feb 2006 18:10:13 -0800
Subject: [Bioperl-l] [BioSQL-l] Load seqfeature from biosql database
	with perl
In-Reply-To: <1140744561.2888.19.camel@alien>
Message-ID: 

Yes, kudos to you for figuring this out yourself, and you actually figured
out the more difficult way. I apologize for my delay in responding, I was
tied up this morning and last night.

You got the first key step right, namely obtaining the right persistence
adaptor. This step determines which object you get back.

Your query will work, and in fact will be equally fast as the simple
solution (which is simple only because it is simpler to code, not because
the internally executed query is simpler). The simple solution is that every
Bio::DB::PersistenceAdaptorI implementing object (i.e., any object you get
back from $db->get_object_adaptor(..)) has a method
$adp->find_by_primary_key(). So, using that method:

    $feature = $adaptor->find_by_primary_key($seqfeature_id);

You can also control the type of object to be created (so long as it is a
Bio::SeqFeatureI) by passing in an object factory in addition.

BTW as an aside, using the finder method also means that the object cache
is consulted first, if the cache is enabled. This doesn't matter for seq
features because, due to the potentially large number of objects, the cache is
not enabled by default for this adaptor.

    -hilmar  

On 2/23/06 5:29 PM, "Michael Cipriano"  wrote:

> Ah, I think I figured it out.
> 
> my $seqfeature_id = '401138';
> my $adaptor = $seqdb_obj->get_object_adaptor("Bio::SeqFeatureI");
> 
> my $query = Bio::DB::Query::BioQuery->new(
> 
> -datacollections=>["Bio::SeqFeatureI t1"],
>                                         -where => ["t1.Bio::SeqFeatureI
> = ?"]);
> 
> my $qres = $adaptor->find_by_query($query,      -name=>'FIND FEATURE BY
> SEQ',
> 
> -values=>[$seqfeature_id]);
> 
> while(my $loc = $qres->next_object())
> {
>         my $obj = $loc;
> 
>         print $obj->primary_key() . "\n";
>         print 'location:' . $obj->location->to_FTstring() . "\n";
>         $obj->add_tag_value("test", "moretest");
>         foreach my $tag ($obj->get_all_tags())
>         {
>                 print " Values for tag $tag: ";
>                 print join(' ',$obj->get_tag_values($tag));
>                 print "\n";
>         }
>         print "------------------\n";
> 
> }
> 
> 
> 
> This seems to work
> On Wed, 2006-02-22 at 16:58 -0800, Michael Cipriano wrote:
>> Hello BioSQLers,
>> 
>> I have a simple question (I hope), Can I easily load a seqfeature from a
>> biosql database into a perl Bio::SeqFeatureI object?  I have the
>> database value for the  seqfeature.seqfeature_id and would like to load
>> it using this alone.
>> 
>> I do not want to have to load the whole bioentry object then search for
>> the feature, I just want the feature object since the bioentry is a
>> whole genome and loading that will take more time then necessary.
>> 
>> I have searched the documentation and have even tried looking through
>> the code for the modules, but could not find an easy fast method.
>> 
>> Please reply directly to me as well as the list as I am not a list
>> member.
>> 
>> Thanks for your help,
>> 
>> 
>> Michael Cipriano
>> 
>> _______________________________________________
>> BioSQL-l mailing list
>> BioSQL-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biosql-l
> 

-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------




From praveecbt at yahoo.co.in  Fri Feb 24 05:57:22 2006
From: praveecbt at yahoo.co.in (Praveen Raj)
Date: Fri, 24 Feb 2006 05:57:22 +0000 (GMT)
Subject: [Bioperl-l] Problem in BioPerl. Help!
Message-ID: <20060224055722.13351.qmail@web8701.mail.in.yahoo.com>

Dear sir,

I have a problem using the BioPerl module 'Clustalw.pm'.
Clustalw creates a SimpleAlign object as output, doesn't it?
I can successfully convert the object into 'clustal' and 'phylip' format using a
file handle.
I want to produce Newick format (for a phylogenetic tree) from the object itself.
I know that standalone ClustalW creates a Newick file (.dnd extension) as output along with
the .aln file.
When I created 'clustal' format output and printed it to a web page, it looked like this:

  CLUSTAL W (1.83) Multiple Sequence Alignments
Sequence format is Pearson
Sequence 1: >gi|dengue2|           13 aa
Sequence 2: >gi|yellowfever|       13 aa
Start of Pairwise alignments
Aligning...
Sequences (1:2) Aligned. Score:  15
Guide tree        file created:   [\tXGgJDIuZZ\jmIerlkHz7.dnd]
Start of Multiple Alignment
There are 1 groups
...............
   
I don't know where the .dnd file (which is in Newick format) is created.
It's not in the current directory.
Is there any method to specify the path for the .dnd file?
I have gone through all the documentation provided with BioPerl and ClustalW.

How can I create 'newick' output (a .dnd file) from a SimpleAlign object created by Clustalw.pm?

It would be a great help if you could provide a solution for this.
I can't move forward without one.
So, please reply...
   
                                    Thanking you,
                                                   Praveen Raj(student).
                                                   National Institute of Virology,   
                                                   Pune. India

				


From roy at colibase.bham.ac.uk  Fri Feb 24 15:51:46 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Fri, 24 Feb 2006 15:51:46 +0000
Subject: [Bioperl-l] Problem in BioPerl. Help!
In-Reply-To: <20060224055722.13351.qmail@web8701.mail.in.yahoo.com>
References: <20060224055722.13351.qmail@web8701.mail.in.yahoo.com>
Message-ID: <43FF2B92.9090801@colibase.bham.ac.uk>

Praveen Raj wrote:
> Sir, I want to make a newick format( for phylogenetic tree ) from the
> object itself. But I know that Standalone Clustalw creates a newick
> file(.dnd extension) as an output along with the .aln file.

Be careful with this. The .dnd files produced by ClustalW contain a 
Newick-format guide tree, produced from pairwise-aligned sequences to 
guide the multiple alignment process. This should not be confused with a 
phylogenetic analysis, and the .dnd file is usually best ignored.

ClustalW can be used to produce a true phylogenetic tree from the 
alignment using the Neighbor-joining method (see the menus and 
documentation for details). This method produces files with a .ph or 
.phb extension (.phb if the tree is bootstrapped). I'm not sure if this 
process can be done using BioPerl, but it is possible to do using 
ClustalW's command line flags, so if you need to automate the process 
you could use Perl's system command. If you want to use BioPerl you can 
use the Phylip program neighbor to generate your tree directly from a 
SimpleAlign object, using the module 
Bio::Tools::Run::Phylo::Phylip::Neighbor.

Cheers.
Roy.
--
Dr. Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.

http://xbase.bham.ac.uk




From perlmails at gmail.com  Sun Feb 26 11:51:37 2006
From: perlmails at gmail.com (perlmails at gmail.com)
Date: Sun, 26 Feb 2006 17:21:37 +0530
Subject: [Bioperl-l] extract ncDNA
Message-ID: <235f7dbe0602260351j7d195d73r2a8801b29e105098@mail.gmail.com>

Dear Bioperl group,

I have been working on extracting non-coding DNA (ncDNA) sequences
from an organism.

I tried extracting the intergenic sequences from the sense strand
after filtering the features (CDS, gene, mRNA, tRNA, rRNA etc.) from
the EMBL feature table entries using BioPerl and the additional
script (shown below).

Now I realise that there is a problem extracting the ncDNA sequences
from the negative strand. Any ideas?

To extract the ncDNAs from the negative strand, I thought of converting
the negative-strand co-ordinates to sense-strand co-ordinates and
adding these to the sense-strand co-ordinates, then filtering all the features
(selecting the ncDNAs after discarding the features from the EMBL FT) to get
all the ncDNAs.

Is there anything in the BioPerl toolkit that I am missing?

##<<>
use strict;

my $EMBL_cord_file = "Organism.feature.cords";  # feature co-ordinates: start \t end
my $RAW_file = "Organism.raw";
my $ncDNA_file = "Organism.ncDNA";

open(EMBLCORD, $EMBL_cord_file) or die "Cannot open EMBL_cord_file";
open(RAW, $RAW_file) or die "Cannot open RAW_file";
open(OUT, ">$ncDNA_file") or die;

my @dna=<RAW>;
my $dna = join('',@dna);

while($dna){
	$dna=~s/\s//g;
	while(<EMBLCORD>){
		my @cords = split /\t/;
		my	$start = $cords[0];
		my	$end = $cords[1];
		my $replaceString = "\n>$start..$end";
		substr($dna, $start-1, $end-$start+1, $replaceString);
}
	print OUT $dna,"\n";
	exit;
}
##<<>

Another thing is, since I am reading the whole file in a scalar the
script does not complete the extraction of all ncDNAs from the
sense-strand. Obviously, the features are parsed first before the
flattening of the 266,000 nt sequence into a single string.

Any help would be appreciated.

-PO



From cjfields at uiuc.edu  Sun Feb 26 14:12:57 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 26 Feb 2006 08:12:57 -0600
Subject: [Bioperl-l] extract ncDNA
In-Reply-To: <235f7dbe0602260351j7d195d73r2a8801b29e105098@mail.gmail.com>
References: <235f7dbe0602260351j7d195d73r2a8801b29e105098@mail.gmail.com>
Message-ID: 

You're not using bioperl.  See:

http://www.bioperl.org/wiki/HOWTO:Beginners

then go to:

http://www.bioperl.org/wiki/HOWTO:Feature-Annotation
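
On the strand question itself, a rough plain-Perl sketch of the interval logic
may help. It rests on two points: EMBL complement(a..b) locations are already
numbered along the forward strand, and the start and end methods of a BioPerl
feature likewise give forward-strand numbers with start <= end, so features
from both strands can be pooled without any co-ordinate conversion. The names
uncovered_regions and revcomp, the toy sequence and the feature list are all
invented for this example:

```perl
use strict;
use warnings;

# Return the regions of a sequence of length $len not covered by any feature.
# Each feature is [start, end]: 1-based, inclusive, forward-strand numbering.
# Overlapping features are handled by tracking the first uncovered position.
sub uncovered_regions {
    my ($len, @features) = @_;
    my @gaps;
    my $next = 1;    # first position not yet covered by a feature
    for my $f (sort { $a->[0] <=> $b->[0] } @features) {
        my ($start, $end) = @$f;
        push @gaps, [$next, $start - 1] if $start > $next;
        $next = $end + 1 if $end + 1 > $next;
    }
    push @gaps, [$next, $len] if $next <= $len;
    return @gaps;
}

# Reverse complement, for reading a gap from the negative strand
sub revcomp {
    my ($s) = @_;
    (my $rc = reverse $s) =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $dna      = "AAAACCCCGGGGTTTTAAAACCCC";      # toy 24 nt sequence
my @features = ([5, 8], [7, 12], [17, 20]);     # hypothetical, both strands pooled

for my $g (uncovered_regions(length($dna), @features)) {
    my ($s, $e) = @$g;
    my $gap = substr($dna, $s - 1, $e - $s + 1);
    print ">$s..$e\n$gap\n";
    print ">complement($s..$e)\n", revcomp($gap), "\n";
}
```

Because the gaps are computed first and the sequence is only read, never
edited in place, the co-ordinates of later features do not shift as earlier
ones are processed.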

Chris


On Feb 26, 2006, at 5:51 AM, perlmails at gmail.com wrote:

> Dear Bioperl group,
>
> I have been working on extracting non-coding DNA (ncDNA) sequences
> from an organimsm.
>
> I tried extracting the intergenic sequences from the sense-strand
> after filtering the features (CDS, gene, mRNA, tRNA, rRNA etc) from
> the EMBL feature table entries using the Bioperl and the additional
> script (mentioned below).
>
> Now, I realised that there is a problem to extract the ncDNA sequences
> from the negative-strand, Any ideas?
>
> To extract the ncDNAs from negative-strand, I thought of converting
> the negative-strand co-ordinates to sense-strand co-ordinates and
> adding these to the sense-strand cords. Then filter all the features
> (select the ncDNAs after discarding the features from EMBL FT) to get
> all the ncDNAs.
>
> Is there anything I am missing for using from the bioperl kit?
>
> ##<<>
> use strict;
>
> my $EMBL_cord_file = "Organism.feature.cords";  # feature co-ordinates: start \t end
> my $RAW_file = "Organism.raw";
> my $ncDNA_file = "Organism.ncDNA";
>
> open(EMBLCORD, $EMBL_cord_file) or die "Cannot open EMBL_cord_file";
> open(RAW, $RAW_file) or die "Cannot open RAW_file";
> open(OUT, ">$ncDNA_file") or die;
>
> my @dna=<RAW>;
> my $dna = join('',@dna);
>
> while($dna){
> 	$dna=~s/\s//g;
> 	while(<EMBLCORD>){
> 		my @cords = split /\t/;
> 		my	$start = $cords[0];
> 		my	$end = $cords[1];
> 		my $replaceString = "\n>$start..$end";
> 		substr($dna, $start-1, $end-$start+1, $replaceString);
> }
> 	print OUT $dna,"\n";
> 	exit;
> }
> ##<<>
>
> Another thing is, since I am reading the whole file in a scalar the
> script does not complete the extraction of all ncDNAs from the
> sense-strand. Obviously, the features are parsed first before the
> flattening of the 266,000 nt sequence into a single string.
>
> Any help would be appreciated.
>
> -PO
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign





From saldroubi at yahoo.com  Sun Feb 26 20:15:14 2006
From: saldroubi at yahoo.com (Sam Al-Droubi)
Date: Sun, 26 Feb 2006 12:15:14 -0800 (PST)
Subject: [Bioperl-l] Is it worth it?
Message-ID: <20060226201514.14549.qmail@web34312.mail.mud.yahoo.com>

Hello everyone,
   
  Please forgive me for posting my questions on this list, since they are not directly related to bioperl, but since most of you are doing bioinformatics, I thought I could ask for some advice.  Also, please point me to other lists or websites if more appropriate. 
   
  Basically, I am wondering whether it is worth getting a Master's or PhD degree in bioinformatics with funding.  I already have an MS degree in Software Engineering, I've taken a few bioinformatics courses and I like the field.  Additionally, I am almost 40 years old and have a stable job.  If I get a PhD in 3 to 4 years, what job opportunities will be out there for me?  Is it better to work in academia or the private sector?  What is the average salary like?
   
  Thank you very much, and please respond to me directly instead of the list, since my questions are off topic.
   
   


Sincerely, 
Sam Al-Droubi, M.S.
saldroubi at yahoo.com


From joel at macresearcher.com  Mon Feb 27 03:12:12 2006
From: joel at macresearcher.com (Joel Dudley)
Date: Sun, 26 Feb 2006 20:12:12 -0700
Subject: [Bioperl-l] Is it worth it?
In-Reply-To: <20060226201514.14549.qmail@web34312.mail.mud.yahoo.com>
References: <20060226201514.14549.qmail@web34312.mail.mud.yahoo.com>
Message-ID: 

It seems to me that your mind is already made up. By asking such a  
question I think it's safe to say a PhD program in Bioinformatics  
would not be your cup of tea. This is not to be negative. If you like  
bioinformatics, do bioinformatics. Join an open-source project, or  
start one of your own. If you live in a town with a University, find  
a lab that needs bioinformatics work and volunteer your time. If you  
really have a passion for bioinformatics, just do bioinformatics and  
your path will become clear, opportunities will arise, your salary  
will be what you need. Just my two shekels of course.

- Joel

On Feb 26, 2006, at 1:15 PM, Sam Al-Droubi wrote:

> Hello everyone,
>
>   Please forgive me for posting my questions on this list since  
> they are not directly related to bioperl but since most of you are  
> doing bioinformatics, I thought I could ask for some advise.  Also,  
> please point me to other lists or websites if more appropriate.
>
>   Basically I am wondering if it is worth it getting a Master or  
> PhD degree in bioinformatics with funding?  I already have an MS  
> degree in Software Engineering and I've take a few bioinformatics  
> courses and I like the field.  Additionally, I am almost 40 years  
> old and have a stable job.  If I am to get PhD in 3 to 4 years,  
> what job opportunities will be out there for me?  And is it better  
> to work in academia or the private sector?  What the average salary  
> like?
>
>   Thank you very much and please respond to me directly instead of  
> of the list since my questions are off topic.
>
>
>
>
> Sincerely,
> Sam Al-Droubi, M.S.
> saldroubi at yahoo.com



From sdavis2 at mail.nih.gov  Mon Feb 27 11:39:27 2006
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 27 Feb 2006 06:39:27 -0500
Subject: [Bioperl-l] Is it worth it?
In-Reply-To: 
Message-ID: 




On 2/26/06 10:12 PM, "Joel Dudley"  wrote:

> It seems to me that your mind is already made up. By asking such a
> question I think it's safe to say a PhD program in Bioinformatics
> would not be your cup of tea. This is not to be negative. If you like
> bioinformatics, do bioinformatics. Join an open-source project, or
> start one of your own. If you live in a town with a University, find
> a lab that needs bioinformatics work and volunteer your time. If you
> really have a passion for bioinformatics, just do bioinformatics and
> your path will become clear, opportunities will arise, your salary
> will be what you need. Just my two shekels of course.

I would second this sentiment.  Most of the folks that I know that are doing
bioinformatics are doing it WITHOUT a degree in it.  The trick is to have
both computational skills AND domain-specific knowledge.  Just find a
project that will require you to gain some domain-specific knowledge (which
can actually happen pretty quickly) and go for it.  As Joel said, there are
dozens of open source projects that would love a helping hand.  If you need
more face-time, do as Joel suggests and work with a local university (or
even high school) to design some web-based tools or something like that to
do things that would be either educational or novel.

Sean




From dhoworth at mrc-lmb.cam.ac.uk  Mon Feb 27 10:40:19 2006
From: dhoworth at mrc-lmb.cam.ac.uk (Dave Howorth)
Date: Mon, 27 Feb 2006 10:40:19 +0000
Subject: [Bioperl-l] Bio::Graphics off by one?
In-Reply-To: <200602221340.28573.lstein@cshl.edu>
References: <43FAEFCD.70709@mrc-lmb.cam.ac.uk>	<1140625762.3142.107.camel@localhost.localdomain>	<43FC950C.7080007@mrc-lmb.cam.ac.uk>
	<200602221340.28573.lstein@cshl.edu>
Message-ID: <4402D713.2050007@mrc-lmb.cam.ac.uk>

Lincoln Stein wrote:
> I have just committed a version of the arrow.pm glyph that has a 
> -label_intervals flag.

Thanks Lincoln,

I've edited your new version so it displays the tick labels pretty much 
as I need. My changes were to the first and last label and to move the 
position of the others a little. I hope that it behaves exactly like 
your version unless label_intervals is set. I've attached my edited version.

There's still an oddity with the number of minor ticks at the start and 
end of the line (I've seen 7, 8, and 9 minor intervals at the start of 
the line as well as 10) but I'll probably ignore that for now.

Thanks, Dave
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arrow.pm
Type: application/x-perl
Size: 16357 bytes
Desc: not available
URL: 

From boris.steipe at utoronto.ca  Mon Feb 27 15:42:54 2006
From: boris.steipe at utoronto.ca (Boris Steipe)
Date: Mon, 27 Feb 2006 10:42:54 -0500
Subject: [Bioperl-l] Is it worth it?
In-Reply-To: 
References: 
Message-ID: <56C842D6-18AD-40B0-AE9A-47A29AE83F1D@utoronto.ca>

I'd put a slightly different emphasis on this: obviously most of  
those in the field can't have a degree in bioinformatics because such  
degree programs haven't been around for all that long. One shouldn't  
conclude that graduate programs are therefore somehow less relevant.  
To successfully apply for a paid job, you need credentials for your  
ability to be productive.

Credentials can come from open source projects IF you can document  
the scope and quality of your contributions.

Credentials can come from a graduate degree IF your thesis appears  
relevant, original and well executed.

Credentials can come from peer-reviewed publications.

Credentials can come from personal references of collaborators.



Regards,
B.

On 27 Feb 2006, at 06:39, Sean Davis wrote:

>
>
>
> On 2/26/06 10:12 PM, "Joel Dudley"  wrote:
>
>> It seems to me that your mind is already made up. By asking such a
>> question I think it's safe to say a PhD program in Bioinformatics
>> would not be your cup of tea. This is not to be negative. If you like
>> bioinformatics, do bioinformatics. Join an open-source project, or
>> start one of your own. If you live in a town with a University, find
>> a lab that needs bioinformatics work and volunteer your time. If you
>> really have a passion for bioinformatics, just do bioinformatics and
>> your path will become clear, opportunities will arise, your salary
>> will be what you need. Just my two shekels of course.
>
> I would second this sentiment.  Most of the folks that I know that are doing
> bioinformatics are doing it WITHOUT a degree in it.  The trick is to have
> both computational skills AND domain-specific knowledge.  Just find a
> project that will require you to gain some domain-specific knowledge (which
> can actually happen pretty quickly) and go for it.  As Joel said, there are
> dozens of open source projects that would love a helping hand.  If you need
> more face-time, do as Joel suggests and work with a local university (or
> even high school) to design some web-based tools or something like that to
> do things that would be either educational or novel.
>
> Sean
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l



From slenk at emich.edu  Mon Feb 27 21:07:38 2006
From: slenk at emich.edu (Stephen Gordon Lenk)
Date: Mon, 27 Feb 2006 16:07:38 -0500
Subject: [Bioperl-l] Is it worth it?
Message-ID: <556d070556f727.556f727556d070@emich.edu>

Gee golly ollie, this is good advice. I face the same issues, but am much older (53). I am taking a Sloan MS in 
Bioinformatics while working full time at the car parts company. I bring what I have newly learned at school to 
work (Perl especially, in which I build and share tools even as far away as exotic India (smile)). I take what I have 
from work (discipline, experience, work ethic) and apply it to open source and shared school projects. The 
world has given me a lot; I enjoy giving back. Why not take an MS in Biology/Bioinformatics at your pace and 
see where it leads. I have no idea if I will EVER have a JOB in Bioinformatics, so I just live it day by day. Plug 
follows - see MCPrimers at CPAN for PCR primer design for molecular cloning with site-directed mutagenesis. I 
did this as an outgrowth of a Rectech class I took. 



----- Original Message -----
From: Sean Davis 
Date: Monday, February 27, 2006 6:39 am
Subject: Re: [Bioperl-l] Is it worth it?

> 
> 
> 
> On 2/26/06 10:12 PM, "Joel Dudley"  wrote:
> 
> > It seems to me that your mind is already made up. By asking such a
> > question I think it's safe to say a PhD program in Bioinformatics
> > would not be your cup of tea. This is not to be negative. If you like
> > bioinformatics, do bioinformatics. Join an open-source project, or
> > start one of your own. If you live in a town with a University, find
> > a lab that needs bioinformatics work and volunteer your time. If you
> > really have a passion for bioinformatics, just do bioinformatics and
> > your path will become clear, opportunities will arise, your salary
> > will be what you need. Just my two shekels of course.
> 
> I would second this sentiment.  Most of the folks that I know that are doing
> bioinformatics are doing it WITHOUT a degree in it.  The trick is to have
> both computational skills AND domain-specific knowledge.  Just find a
> project that will require you to gain some domain-specific knowledge (which
> can actually happen pretty quickly) and go for it.  As Joel said, there are
> dozens of open source projects that would love a helping hand.  If you need
> more face-time, do as Joel suggests and work with a local university (or
> even high school) to design some web-based tools or something like that to
> do things that would be either educational or novel.
> 
> Sean
> 
> 
> 


From joel at macresearcher.com  Tue Feb 28 01:56:13 2006
From: joel at macresearcher.com (Joel Dudley)
Date: Mon, 27 Feb 2006 18:56:13 -0700
Subject: [Bioperl-l] BioPerlers Represent!
Message-ID: 

Hey list,
	The contest to fill the script repository at MacResearch.org is  
ending very soon. Thus far we've only received a paltry three  
submissions with PERL scripts. The contest take home prize is a black  
iPod nano (2GB) so if you've got anything lying around that you'd  
like to share I'd suggest zipping it up and adding it to the script  
repository. Full contest details can be viewed here:

http://www.macresearch.org/ipod_contest

Now before you get ready to smack me with your anti-spam cudgel, or shake  
your fist in my general direction, please note that MacResearch.org  
is completely non-profit, existing only to aid and foster community  
for scientists using OS X. I gain nothing personally by attracting  
BioPerl scripts to the repository but I'd love to see Perl well  
represented. Thanks for understanding.

- Joel


From jforment at ibmcp.upv.es  Tue Feb 28 12:17:59 2006
From: jforment at ibmcp.upv.es (Javier Forment)
Date: Tue, 28 Feb 2006 13:17:59 +0100
Subject: [Bioperl-l] Are frac_identical and frac_conserved methods for hit
 or for hsp objects?
Message-ID: <44043F77.1010901@ibmcp.upv.es>

Hi bioperlers... I have some questions when parsing BLAST results.

As far as I know, bioperl documentation for Bio::SearchIO states that 
frac_identical and frac_conserved are methods for hsp objects (e.g., 
$hsp->frac_identical). I have found that it is also possible to use 
these methods for hit objects (e.g., $hit->frac_identical), since it 
does not give an error, but in this case they don't work properly (I 
think that they work fine with blastn, but not with blastx). So my 
questions are:

1.- is it right to use $hit->frac_identical and $hit->frac_conserved?
2.- if so, how do they compute frac_identical for a hit that has more 
than one HSP (maybe by averaging the values of all the hsps)?
3.- if so, why do they sometimes fail, for example, with blastx?
4.- if not, is there any method to get the fraction of identical or 
conserved residues for a hit, other than averaging the corresponding 
values for all the hsps of this hit?

Thanks a lot in advance,

Javier.

-- 
Javier Forment Millet
Unidad de Bioinformatica del Laboratorio de Genomica
Instituto de Biologia Molecular y Celular de Plantas
Universidad Politecnica de Valencia
Avenida de los Naranjos, s/n
46022 Valencia (Spain)
Tlf.(1): +34-963877885
Tlf.(2): 685142553
FAX: +34-963877859
e-mail: jforment at ibmcp.upv.es


From jason.stajich at duke.edu  Tue Feb 28 13:31:00 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 28 Feb 2006 08:31:00 -0500
Subject: [Bioperl-l] Are frac_identical and frac_conserved methods for
	hit or for hsp objects?
In-Reply-To: <44043F77.1010901@ibmcp.upv.es>
References: <44043F77.1010901@ibmcp.upv.es>
Message-ID: 

Personally, I only use these values from HSPs - the Hit methods  
require HSPs to be tiled to summarize the bases and I'm not convinced  
the method works for all situations.

If you want it summarized to a single value per query/hit pair, I  
would use FASTA or, if you must use BLAST, use WU-BLAST, get the  
links path out, and summarize over the set of HSPs in that path.
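
For a rough per-hit figure without the Hit-level methods, the HSP values can be pooled by hand. The sketch below is only an illustration under stated assumptions, not how the BioPerl Hit methods work: pooled_frac_identical is a hypothetical helper that takes plain [identical, aligned length] pairs (with BioPerl these might come from $hsp->num_identical and $hsp->length('total')). It does no tiling, so overlapping HSPs are counted twice, which is exactly the case where a simple sum can mislead.

```perl
use strict;
use warnings;

# Pool several HSPs into one identity fraction for a hit.
# Each HSP is given as [ number_of_identical_positions, alignment_length ].
# Hypothetical helper: overlapping HSPs are NOT tiled, so they are double-counted.
sub pooled_frac_identical {
    my @hsps = @_;
    my ( $identical, $length ) = ( 0, 0 );
    for my $hsp (@hsps) {
        $identical += $hsp->[0];
        $length    += $hsp->[1];
    }
    # No HSPs (or zero total length) means there is nothing to report.
    return $length ? $identical / $length : undef;
}

# Two HSPs: 90 of 100 and 30 of 50 positions identical, pooled as 120 of 150.
my $frac = pooled_frac_identical( [ 90, 100 ], [ 30, 50 ] );
printf "pooled identity: %.2f\n", $frac;
```

Weighting by alignment length keeps a short, poor HSP from dragging the figure down as much as a plain average of per-HSP fractions would.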

-jason
On Feb 28, 2006, at 7:17 AM, Javier Forment wrote:

> Hi bioperlers... I have some questions when parsing BLAST results.
>
> As far as I know, bioperl documentation for Bio::SearchIO states that
> frac_identical and frac_conserved are methods for hsp objects (e.g.,
> $hsp->frac_identical). I have found that it is also possible to use
> these methods for hit objects (e.g., $hit->frac_identical), since it
> does not give an error, but in this case they don't work properly (I
> think that they work fine with blastn, but not with blastx). So my
> questions are:
>
> 1.- is it right to use $hit->frac_identical and $hit->frac_conserved?
> 2.- if so, how do they compute frac_identical for a hit that has more
> than one HSP (maybe by averaging the values of all the hsps)?
> 3.- if so, why do they sometimes fail, for example, with blastx?
> 4.- if not, is there any method to get the fraction of identical or
> conserved residues for a hit, other than averaging the corresponding
> values for all the hsps of this hit?
>
> Thanks a lot in advance,
>
> Javier.
>
> -- 
> Javier Forment Millet
> Unidad de Bioinformatica del Laboratorio de Genomica
> Instituto de Biologia Molecular y Celular de Plantas
> Universidad Politecnica de Valencia
> Avenida de los Naranjos, s/n
> 46022 Valencia (Spain)
> Tlf.(1): +34-963877885
> Tlf.(2): 685142553
> FAX: +34-963877859
> e-mail: jforment at ibmcp.upv.es

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From julioallen at hotmail.com  Tue Feb 28 13:22:14 2006
From: julioallen at hotmail.com (James Allen)
Date: Tue, 28 Feb 2006 13:22:14 +0000
Subject: [Bioperl-l] GFF feature start and end points on reverse strand
Message-ID: 

Hello,
I'm retrieving data using the 'features' method of Bio::DB::GFF, and when 
the feature is on the reverse strand (ie = -1) the start and end points are 
flipped, so that 'feature->end' is the smaller number (ie what I consider 
the start point) and 'feature->start' is the larger number.
Is there any way to prevent this behaviour, so that the start value of my 
feature is the same as the start value in my database, regardless of the 
strand?

Thanks,
Julio




From ewijaya at singnet.com.sg  Tue Feb 28 10:01:23 2006
From: ewijaya at singnet.com.sg (Edward WIJAYA)
Date: Tue, 28 Feb 2006 18:01:23 +0800
Subject: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence file (Fasta)
	into Array
Message-ID: 

Hi,

Does Bio::SeqIO have a method specially designed for
reading all the sequences from a fasta file into an array?

What I have currently is this subroutine, it seems to me
__very inefficient__. I was wondering
is there a better way to achieve it.


sub get_sequence_from_fasta {
     my $file = shift;
     my @seqs= ();

     open INFILE, "<$file" or die "$0:  Can't open file $file: $!";
     my $in = Bio::SeqIO->new(-format => 'fasta',
                              -noclose => 1 ,
                              -fh => \*INFILE);

     while ( my $seq = $in->next_seq() ) {
        push @seqs, $seq->seq();
     }
     return @seqs;
}


BTW, I also have tried to do this. I thought
this might be a better way to do the above job.
but it doesn't work.

sub get_sequence_from_fasta_that_doesnot_work {
     my $file = shift;
      open my fh, "<$file" or die "$0:  Can't open file $file: $!";
     my $in = Bio::SeqIO->newFh( -format => 'fasta', -fh => $fh );
     return <$in>;
}

Hope to hear from you again.

--
Regards,
Edward WIJAYA
SINGAPORE


From lstein at cshl.edu  Tue Feb 28 15:08:27 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 28 Feb 2006 10:08:27 -0500
Subject: [Bioperl-l] GFF feature start and end points on reverse strand
In-Reply-To: 
References: 
Message-ID: <200602281008.28373.lstein@cshl.edu>

Call the absolute(1) method, which turns off relative addressing.
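
If switching the addressing mode is not convenient, the low and high coordinates can also be recovered after the fact. A minimal sketch, assuming plain numeric start and end values; low_high is a hypothetical helper and not part of Bio::DB::GFF:

```perl
use strict;
use warnings;

# Return (low, high) regardless of the order in which start/end arrive.
# Hypothetical helper; with Bio::DB::GFF one would pass $feature->start
# and $feature->end.
sub low_high {
    my ( $start, $end ) = @_;
    return $start <= $end ? ( $start, $end ) : ( $end, $start );
}

# A reverse-strand feature reported with start > end.
my ( $low, $high ) = low_high( 5000, 4200 );
print "low=$low high=$high\n";
```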

Lincoln

On Tuesday 28 February 2006 08:22, James Allen wrote:
> Hello,
> I'm retrieving data using the 'features' method of Bio::DB::GFF, and when
> the feature is on the reverse strand (ie = -1) the start and end points are
> flipped, so that 'feature->end' is the smaller number (ie what I consider
> the start point) and 'feature->start' is the larger number.
> Is there any way to prevent this behaviour, so that the start value of my
> feature is the same as the start value in my database, regardless of the
> strand?
>
> Thanks,
> Julio
>
>

-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
FOR URGENT MESSAGES & SCHEDULING, 
PLEASE CONTACT MY ASSISTANT, 
SANDRA MICHELSEN, AT michelse at cshl.edu


From jason.stajich at duke.edu  Tue Feb 28 17:36:34 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Tue, 28 Feb 2006 12:36:34 -0500
Subject: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence file
	(Fasta) into Array
In-Reply-To: 
References: 
Message-ID: <1207BA95-F0E8-4049-8AAE-4B84964D147B@duke.edu>


On Feb 28, 2006, at 5:01 AM, Edward WIJAYA wrote:

> Hi,
>
> Does Bio::SeqIO have a method specially designed for
> reading all the sequences from a fasta file into an array?
>
no but feel free to contribute one.
> What I have currently is this subroutine, it seems to me
> __very inefficient__. I was wondering
> is there a better way to achieve it.
>
Do you have a reason to think this is the slow part of your algorithm  
or are you just going on a gut reaction?  There is certainly overhead  
in calling a method but I am pretty sure that it isn't that  
significant, depends on how many sequences you are reading in I guess.

Just write a next_seq_array method and have it put the seqs onto an  
array within the method and do a benchmark test to show that it is  
faster.
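
Something along these lines would do as a starting point. It is a sketch under assumptions, not an existing Bio::SeqIO method: next_seq_array is a hypothetical helper, and MockStream is a stand-in that returns plain strings so the example runs without BioPerl. With a real Bio::SeqIO stream you would push $seq->seq instead, and the timings from the mock say nothing about real file parsing.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Stand-in for a Bio::SeqIO stream so the sketch runs without BioPerl:
# next_seq() hands back one sequence string at a time, then undef.
{
    package MockStream;
    sub new      { my ( $class, @seqs ) = @_; return bless { seqs => [@seqs] }, $class }
    sub next_seq { my $self = shift; return shift @{ $self->{seqs} } }
}

# Hypothetical next_seq_array: drain the stream into a list in one call.
sub next_seq_array {
    my ($in) = @_;
    my @seqs;
    while ( defined( my $seq = $in->next_seq ) ) {
        push @seqs, $seq;
    }
    return @seqs;
}

my @data = ('ACGT') x 1000;

# Compare a loop written at the call site with the helper sub.
timethese( 200, {
    caller_loop => sub {
        my $in = MockStream->new(@data);
        my @seqs;
        while ( defined( my $seq = $in->next_seq ) ) { push @seqs, $seq }
    },
    helper_sub => sub {
        my $in   = MockStream->new(@data);
        my @seqs = next_seq_array($in);
    },
} );
```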

-jason
>
> sub get_sequence_from_fasta {
>      my $file = shift;
>      my @seqs= ();
>
>      open INFILE, "<$file" or die "$0:  Can't open file $file: $!";
>      my $in = Bio::SeqIO->new(-format => 'fasta',
>                               -noclose => 1 ,
>                               -fh => \*INFILE);
>
>      while ( my $seq = $in->next_seq() ) {
>         push @seqs, $seq->seq();
>      }
>      return @seqs;
> }
>
>
> BTW, I also have tried to do this. I thought
> this might be a better way to do the above job.
> but it doesn't work.
>
> sub get_sequence_from_fasta_that_doesnot_work {
>      my $file = shift;
>       open my fh, "<$file" or die "$0:  Can't open file $file: $!";
>      my $in = Bio::SeqIO->newFh( -format => 'fasta', -fh => $fh );
>      return <$in>;
> }
>
> Hope to hear from you again.
>
> --
> Regards,
> Edward WIJAYA
> SINGAPORE

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12




From cjfields at uiuc.edu  Tue Feb 28 18:50:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 28 Feb 2006 12:50:50 -0600
Subject: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence file(Fasta)
	into Array
In-Reply-To: <1207BA95-F0E8-4049-8AAE-4B84964D147B@duke.edu>
Message-ID: <002001c63c97$e57f20c0$15327e82@pyrimidine>

Is there any particular reason why you aren't opening the file directly with
Bio::SeqIO?  

 sub get_sequence_from_fasta {
      my $file = shift;
      my @seqs= ();
      my $in = Bio::SeqIO->new(-format => 'fasta',
                               -file => "<$file");
      while ( my $seq = $in->next_seq() ) {
         push @seqs, $seq->seq();
      }
      return @seqs;
 }

I'm not completely sure of your intent here, but I think if you want to use
a globbed filehandle this way you need to open the file before entering the
sub then pass the filehandle to the sub.  I'm not sure why you pass the file
name, open the file, attach the file handle, parse the seqs, then return an
array?  Or am I missing something here?

Also, read:

http://www.bioperl.org/wiki/HOWTO:SeqIO#Working_Examples

which explains that loading arrays can be memory-intensive if the seqs are
big.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign 


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Jason Stajich
> Sent: Tuesday, February 28, 2006 11:37 AM
> To: Edward WIJAYA
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::SeqIO -- Reading Formated sequence
> file(Fasta) into Array
> 
> 
> On Feb 28, 2006, at 5:01 AM, Edward WIJAYA wrote:
> 
> > Hi,
> >
> > Does Bio::SeqIO have a method specially designed for
> > reading all the sequences from a fasta file into an array?
> >
> no but feel free to contribute one.
> > What I have currently is this subroutine, it seems to me
> > __very inefficient__. I was wondering
> > is there a better way to achieve it.
> >
> Do you have a reason to think this is the slow part of your algorithm
> or are you just going on a gut reaction?  There is certainly overhead
> in calling a method but I am pretty sure that it isn't that
> significant, depends on how many sequences you are reading in I guess.
> 
> Just write a next_seq_array method and have it put the seqs onto an
> array within the method and do a benchmark test to show that it is
> faster.
> 
> -jason
> >
> > sub get_sequence_from_fasta {
> >      my $file = shift;
> >      my @seqs= ();
> >
> >      open INFILE, "<$file" or die "$0:  Can't open file $file: $!";
> >      my $in = Bio::SeqIO->new(-format => 'fasta',
> >                               -noclose => 1 ,
> >                               -fh => \*INFILE);
> >
> >      while ( my $seq = $in->next_seq() ) {
> >         push @seqs, $seq->seq();
> >      }
> >      return @seqs;
> > }
> >
> >
> > BTW, I also have tried to do this. I thought
> > this might be a better way to do the above job.
> > but it doesn't work.
> >
> > sub get_sequence_from_fasta_that_doesnot_work {
> >      my $file = shift;
> >       open my fh, "<$file" or die "$0:  Can't open file $file: $!";
> >      my $in = Bio::SeqIO->newFh( -format => 'fasta', -fh => $fh );
> >      return <$in>;
> > }
> >
> > Hope to hear from you again.
> >
> > --
> > Regards,
> > Edward WIJAYA
> > SINGAPORE
> 
> --
> Jason Stajich
> Duke University
> http://www.duke.edu/~jes12
> 
> 



From pterry2 at unlnotes.unl.edu  Tue Feb 28 18:53:11 2006
From: pterry2 at unlnotes.unl.edu (Philip M Terry)
Date: Tue, 28 Feb 2006 12:53:11 -0600
Subject: [Bioperl-l] Bioperl use question
Message-ID: 


Hello,

Is this an appropriate mailing list for this question?

I am trying Test 4 from the Tisdale book, p-299, "Mastering Perl for
Bioinformatics".

Comparing screen output from p-303 of the Tisdale book for bp1.pl with
mine:

philip-terrys-power-mac-g5:~/perl_prac/pgms/source mterry$ ./bp1.pl
Sequence name is AI129902
Sequence acc  is AI129902
First 5 bases is CTCCG

-------------------- WARNING ---------------------
MSG: acc (gb|3598416) does not exist
---------------------------------------------------
Submitted Blast for [ROA1_HUMAN]
philip-terrys-power-mac-g5:~/perl_prac/pgms/source mterry$

Two questions:
i. why the warning message in my screen output?
ii. my Blast fails, that is,
--I don't see "dots" on the output line on screen following "Submitted
Blast for [ROA1_HUMAN]"?
--my output file, blast.out has 0 KB in it?

My computer system:
Power Mac G5, OS X 10.4.5, installed "core" bioperl, that is,
sudo perl -MCPAN -e shell;
cpan> force install B/BI/BIRNEY/bioperl-1.4.tar.gz

Can you comment?

Thanks,
Philip M. Terry, Ph.D.
University of Nebraska-Lincoln



From staffa at niehs.nih.gov  Tue Feb 28 20:01:42 2006
From: staffa at niehs.nih.gov (staffa)
Date: Tue, 28 Feb 2006 15:01:42 -0500
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
References: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
Message-ID: 

Hello,
Does anyone know if Bio::Tools::SeqWords
count_words
or
count_overlap_words
will do DNA pattern searches and honor ambiguity symbols
like those in some restriction enzyme pattern definitions,
e.g. GGnnCC?


Thank you.

Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field (field1 at niehs.nih.gov ))
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina

From torsten.seemann at infotech.monash.edu.au  Tue Feb 28 21:45:16 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 01 Mar 2006 08:45:16 +1100
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: 
References: <4f36aca98f5d0646586f644951dac300@niehs.nih.gov>
	
Message-ID: <4404C46C.4010005@infotech.monash.edu.au>

Nick

> Does anyone know if Bio::Tools::SeqWords
> count_words
> or
> count_overlap_words
> will do DNA pattern searches and honor ambiguity symbols
> like those in some restriction enzyme pattern definitions,
> e.g. GGnnCC?

Examination of the code

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqWords.html#CODE4

suggests that all it does is count N-mers of any set of letters,
and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
the same N-mer.

So no it does not handle ambiguity symbols in any special manner.

What would you like it to do?
If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
it could be?
If it has 2 "N"s in it, does it count toward all 16 possible 
non-ambiguous N-mers?
And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From torsten.seemann at infotech.monash.edu.au  Tue Feb 28 22:01:38 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 01 Mar 2006 09:01:38 +1100
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D08507@NIHCESMLBX6.nih.gov>
References: <7930EE6CD7CA354D93B444D0433C061101D08507@NIHCESMLBX6.nih.gov>
Message-ID: <4404C842.2050608@infotech.monash.edu.au>

Staffa, Nick (NIH/NIEHS) [C] wrote:
> Yes 
> N matches any of the four bases.

It's still not clear to me what you want.

For simplicity, let's say we are counting words of length 1,
(which means overlapping and non-overlapping are the same)
and our sequence is "AGTN" (ie. 4 letters long)

The module would return the following
{ A=>1, G=>1, T=>1, N=>1 }    # sum of counts = 4

But you want it to return this?
{ A=>2, G=>2, T=>2, C=>1 }    # sum of counts = 7
ie. the N contributes 1 A, 1 G, 1 T and 1 C (and 0 N)

And correspondingly for all the possible ambiguity codes?

And if the word length was 2, then if we encountered a "NN"
it would add 16 to the total count ie. 1 AA, 1 AT, 1 AC etc?
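
To make that interpretation concrete, here is a sketch of what such counting could look like. It is not what Bio::Tools::SeqWords does; expand_word and count_words_expanded are hypothetical helpers, and the table is the standard IUPAC nucleotide code set.

```perl
use strict;
use warnings;

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',
    K => 'GT',  M => 'AC',  B => 'CGT', D => 'AGT',
    H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Expand one word into every unambiguous word it could represent.
# Characters not in the table are kept as they are.
sub expand_word {
    my ($word) = @_;
    my @words = ('');
    for my $code ( split //, uc $word ) {
        my @bases = split //, ( $IUPAC{$code} // $code );
        @words = map { my $prefix = $_; map { $prefix . $_ } @bases } @words;
    }
    return @words;
}

# Count overlapping words of a given length, crediting each expansion once.
sub count_words_expanded {
    my ( $seq, $len ) = @_;
    my %count;
    for my $i ( 0 .. length($seq) - $len ) {
        $count{$_}++ for expand_word( substr( $seq, $i, $len ) );
    }
    return \%count;
}

# The "AGTN" example with word length 1.
my $counts = count_words_expanded( 'AGTN', 1 );
print join( ', ', map { "$_=>$counts->{$_}" } sort keys %$counts ), "\n";
```

For "AGTN" this gives A, G and T a count of 2 and C a count of 1, so the counts sum to 7 rather than 4.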

>>Does anyone know if Bio::Tools::SeqWords
>>count_words
>>or
>>count_overlap_words
>>will do DNA pattern searches and honor ambiguity symbols
>>like those in some restriction enzyme pattern definitions,
>>e.g. GGnnCC?

> suggests that all it does is count N-mers of any set of letters,
> and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
> the same N-mer.
> So no it does not handle ambiguity symbols in any special manner.
> What would you like it to do?
> If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
> it could be?
> If it has 2 "N"s in it, does it count toward all 16 possible 
> non-ambiguous N-mers?
> And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From staffa at niehs.nih.gov  Tue Feb 28 21:46:30 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Tue, 28 Feb 2006 16:46:30 -0500
Subject: [Bioperl-l] seq_word and pattern counts
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D08507@NIHCESMLBX6.nih.gov>

Yes 
N matches any of the four bases.

Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field (field1 at niehs.nih.gov ))
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina




-----Original Message-----
From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
Sent: Tuesday, February 28, 2006 4:45 PM
To: Staffa, Nick (NIH/NIEHS) [C]
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] seq_word and pattern counts


Nick

> Does anyone know if Bio::Tools::SeqWords
> count_words
> or
> count_overlap_words
> will do DNA pattern searches and honor ambiguity symbols
> like those in some restriction enzyme pattern definitions,
> e.g. GGnnCC?

Examination of the code

http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/Bio/Tools/SeqWords.html#CODE4

suggests that all it does is count N-mers of any set of letters,
and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
the same N-mer.

So no it does not handle ambiguity symbols in any special manner.

What would you like it to do?
If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
it could be?
If it has 2 "N"s in it, does it count toward all 16 possible 
non-ambiguous N-mers?
And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From staffa at niehs.nih.gov  Tue Feb 28 22:08:40 2006
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS) [C])
Date: Tue, 28 Feb 2006 17:08:40 -0500
Subject: [Bioperl-l] seq_word and pattern counts
Message-ID: <7930EE6CD7CA354D93B444D0433C061101D08508@NIHCESMLBX6.nih.gov>

The real problem is this:
We want to count sites in a long sequence where a restriction enzyme would cut.
This restriction enzyme, in the example I gave will recognize GGnnCC,
that is two G separated by two of any bases followed by two C.

The GCG program findpatterns will do this, but bioperl makes certain statistics easy.
I'm sure there is some module somewhere for this purpose. 





Nick Staffa
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: Jack L. Field (field1 at niehs.nih.gov ))
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina




-----Original Message-----
From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au]
Sent: Tuesday, February 28, 2006 5:02 PM
To: Staffa, Nick (NIH/NIEHS) [C]
Cc: bioperl-l
Subject: Re: [Bioperl-l] seq_word and pattern counts


Staffa, Nick (NIH/NIEHS) [C] wrote:
> Yes 
> N matches any of the four bases.

It's still not clear to me what you want.

For simplicity, let's say we are counting words of length 1,
(which means overlapping and non-overlapping are the same)
and our sequence is "AGTN" (ie. 4 letters long)

The module would return the following
{ A=>1, G=>1, T=>1, N=>1 }    # sum of counts = 4

But you want it to return this?
{ A=>2, G=>2, T=>2, C=>1 }    # sum of counts = 7
ie. the N contributes 1 A, 1 G, 1 T and 1 C (and 0 N)

And correspondingly for all the possible ambiguity codes?

And if the word length was 2, then if we encountered a "NN"
it would add 16 to the total count ie. 1 AA, 1 AT, 1 AC etc?

>>Does anyone know if Bio::Tools::SeqWords
>>count_words
>>or
>>count_overlap_words
>>will do DNA pattern searches and honor ambiguity symbols
>>like those in some restriction enzyme pattern definitions,
>>e.g. GGnnCC?

> suggests that all it does is count N-mers of any set of letters,
> and does so in a case-insensitive way ie. CAT, Cat, cat are counted as 
> the same N-mer.
> So no it does not handle ambiguity symbols in any special manner.
> What would you like it to do?
> If a N-mer has 1 "N" in it, does it count towards the 4 possible N-mers 
> it could be?
> If it has 2 "N"s in it, does it count toward all 16 possible 
> non-ambiguous N-mers?
> And so on?

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010


From torsten.seemann at infotech.monash.edu.au  Tue Feb 28 22:47:01 2006
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Wed, 01 Mar 2006 09:47:01 +1100
Subject: [Bioperl-l] seq_word and pattern counts
In-Reply-To: <7930EE6CD7CA354D93B444D0433C061101D08508@NIHCESMLBX6.nih.gov>
References: <7930EE6CD7CA354D93B444D0433C061101D08508@NIHCESMLBX6.nih.gov>
Message-ID: <4404D2E5.4090405@infotech.monash.edu.au>

Staffa, Nick (NIH/NIEHS) [C] wrote:
> The real problem is this:
> We want to count sites in a long sequence where a restriction enzyme would cut.
> This restriction enzyme, in the example I gave will recognize GGnnCC,
> that is two G separated by two of any bases followed by two C.
> The GCG program findpatterns will do this, but bioperl makes certain statistics easy.
> I'm sure there is some module somewhere for this purpose. 

(Nick - please respond to me AND the bioperl-l at bioperl.org mailing list 
ie. "Reply All", so others can benefit from the Q&A - I've re-sent your 
past responses already).

Perhaps this module?

http://doc.bioperl.org/bioperl-live/Bio/Tools/RestrictionEnzyme.html

With this code?

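use Bio::Tools::RestrictionEnzyme;

# $seqobj is assumed to be a sequence object you have already loaded.
my $enz = "GGNNCC";
my $re = new Bio::Tools::RestrictionEnzyme(-NAME =>"NicksResEnz--$enz",
	  			  	 -MAKE =>'custom');
my @fragments = $re->cut_seq($seqobj);
# Note: on a linear sequence, the number of cuts is one less than the
# number of fragments returned.
print "$enz cuts ", $seqobj->display_id, " ", scalar(@fragments), " times.\n";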
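
If all you need is the number of recognition sites, a plain regex may be enough. This is a dependency-free sketch rather than a BioPerl API: site_regex and count_sites are hypothetical helpers, and the test sequence is made up. It scans one strand only, which is fine for a palindromic site such as GGNNCC but would miss reverse-strand hits for a non-palindromic one.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to regex fragments.
my %IUPAC_CLASS = (
    A => 'A',     C => 'C',     G => 'G',     T => 'T',
    R => '[AG]',  Y => '[CT]',  S => '[CG]',  W => '[AT]',
    K => '[GT]',  M => '[AC]',  B => '[CGT]', D => '[AGT]',
    H => '[ACT]', V => '[ACG]', N => '[ACGT]',
);

# Turn a recognition pattern such as GGNNCC into a compiled regex.
sub site_regex {
    my ($pattern) = @_;
    my $re = '';
    for my $code ( split //, uc $pattern ) {
        die "unknown nucleotide code '$code'\n" unless exists $IUPAC_CLASS{$code};
        $re .= $IUPAC_CLASS{$code};
    }
    return qr/$re/i;
}

# Count every position where the site starts, including overlapping sites.
# The lookahead matches without consuming, so the scan advances one base at a time.
sub count_sites {
    my ( $seq, $pattern ) = @_;
    my $re    = site_regex($pattern);
    my $count = 0;
    $count++ while $seq =~ /(?=$re)/g;
    return $count;
}

# Made-up sequence containing GGATCC and GGCGCC.
my $seq = 'AAGGATCCTTGGCGCCAA';
print count_sites( $seq, 'GGnnCC' ), " site(s) found\n";
```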

-- 
Torsten Seemann
Victorian Bioinformatics Consortium, Monash University, Australia
http://www.vicbioinformatics.com/
Phone: +61 3 9905 9010