From jay at jays.net  Thu Jun  1 00:58:29 2006
From: jay at jays.net (Jay Hannah)
Date: Wed, 31 May 2006 23:58:29 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000001c68528$d1b6ec10$15327e82@pyrimidine>
References: <000001c68528$d1b6ec10$15327e82@pyrimidine>
Message-ID: <447E73F5.40403@jays.net>

Chris Fields wrote:
>> Is the doc/ tree being abandoned?
>
> Most docs have been moved over to the wiki, which generates nicely formatted
> docs for printing.

Oh. Well, if we've already jumped off that cliff I say we just go for it.
Move everything to the wiki, nuke the empty CVS dirs, and call it good.

I hereby volunteer to strip the code out of bptutorial.pl and put it
wherever. Where should I put it when I'm done? (examples/tutorial.pl?)

>> (What's the conceptual difference between a HOWTO and a tutorial?)
>
> I believe the reasoning is along these lines: HOWTO's are focused in on
> specific areas (graphics, trees, BLAST report parsing, etc) and thus usually
> has greater detail. The tutorials are more broadly based (sort of a general
> bioperl HOWTO). The only exception is the Beginner's HOWTO, but even that
> has additional information over the tutorial (at least it did the last time
> I looked at the tutorial, which has been a while).

Huh. Sounds like a subtle line. I might suggest picking one name or the
other and shuffling everything into one list on the wiki.

>> It's hard for me to dive into a wiki lifestyle for the huge documentation
>> pillars since it can't ever get back into the distro... (can it?) Small,
>> throw away stuff is great for the wiki, but huge, established, thoughtful,
>> long documents should be left in the distro? Present (and searchable) on
>> the wiki but static?
>
> Hence the problem we face now. It is something we need to really look into
> before adding too much more to the wiki. IMHO, I think we should have very
> little information directly in the distribution itself since it's already
> quite large.
> It's almost as easy to have a bare-bones INSTALL file, which
> would point to the wiki for additional information. But I may be very much
> alone in that train of thought ;>

If the doc/ tree has already moved then I guess I just joined the all-wiki
camp. I assume it stores full revision history and we have backups in case
somebody blows something up. Any system is better than multiple systems
breeding inconsistencies. Keep the spammers/clueless out and/or quickly
remove their nonsense and I'm pro-wiki. Revisions email reviewers?

>> Sick of my endless questions yet? -grin-
>
> Not really. Give it a few more posts. It'll come. :)

j

Current toy: http://openlab.jays.net/

From ULNJUJERYDIX at spammotel.com  Thu Jun  1 02:53:46 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Thu, 1 Jun 2006 14:53:46 +0800
Subject: [Bioperl-l] **Fwd: Re: SOLVED ver2 Bio::Graphics::Panel make ruler have neg values
Message-ID: <5b6410e0605312353l1fbf8256hc8a2b85d0f0ac199@mail.gmail.com>

Thanks Lincoln! Your code worked in ver 1.4 as well.
think the prob i had was due to me just adapting from the blast output
tutorial so i had something like

my $feature = Bio::SeqFeature::Generic->new(-display_name=>$name,-score=>$score,-start=>$start,-end=>$end, -source=>$source);

and maybe also because I didn't have the + sign for the numbers

on a side note, I think that the ability to offset the ruler might prove
useful for some applications. Will spend more time to understand the
$relative_coords_offset option in the arrow.pm when i can afford to, and
perhaps help contribute an offset option to arrow.pm

cheers
kevin

>
> Hi Kevin,
>
> Since you are modifying the Panel.pm source code, why don't you just go ahead
> and use the current Bio::Graphics development tree? Since 1.5.1 it supports
> negative coordinates.
> Here's an illustration:
>
> #!/usr/bin/perl
>
> use strict;
>
> use Bio::Graphics;
> use Bio::Graphics::Feature;
>
> my $whole   = Bio::Graphics::Feature->new(-start=>-200,-end=>+200);
> my $feature = Bio::Graphics::Feature->new(-start=>-100,-end=>+100,-strand=>+1);
> my $panel   = Bio::Graphics::Panel->new(-start     => -200,
>                                         -end       => +200,
>                                         -width     => 800,
>                                         -pad_left  => 10,
>                                         -pad_right => 10);
> $panel->add_track($whole,
>                   -glyph  => 'arrow',
>                   -double => 1,
>                   -tick   => 2);
> $panel->add_track($feature,
>                   -glyph    => 'box',
>                   -stranded => 1);
> print $panel->png;
>
> exit 0;
>
> The resulting image is attached.
>
> Lincoln
>
> On Tuesday 30 May 2006 23:45, Kevin Lam Koiyau wrote:
> > I am so sorry for the truncated email accidentally hit reply.
> > if anyone is interested i have opted to change
> >
> > change line 161 of arrow.pm in Perl/site/lib/Bio/Graphics/Glyph/arrow.pm
> > in linux its
> > /usr/lib/perl5/site_perl/5.8.5/Bio/Graphics/Glyph/arrow.pm
> >
> >
> > $gd->string($font,$middle,$center+$a2-1,$label,$font_color)
> >
> > to
> >
> > $gd->string($font,$middle,$center+$a2-1,$label-1000,$font_color)
> >
> > just for this one-off use.
> >
> >
> >
> > strangely I found at line 112 for ver 1.51 bioperl in arrow.pm a hidden
> > option for coords offset?
> >   my $relative_coords_offset = $self->option('relative_coords_offset');
> >   $relative_coords_offset = 1 unless defined $relative_coords_offset;
> > but entering the option -relative_coords_offset=>1000 in the arrow glyphs
> > didn't do anything...
> >
> >
> >
> > Hi!
> >
> > > oh it was in a slightly different header asking about the create image
> > > map feature.
> > > I am using the stable version 1.4 of bioperl now. In any case I have not
> > > added the sequence as a feature annotated seq. as I already have the bp
> > > where the TF binds (in 1-1050 numberings) so what I did was to just add
> > > graded segments based on the position.
> > > I saw that there is a scale function for the arrow glyph however, it is a
> > > multiply function, can it be hacked to take in a offset value (ie minus the
> > > scale by 1000?)
> > >
> > > cheers
> > > kevin
> > >
> > >
> > > Hi,
> > >
> > > > For some reason I didn't see the first posting on this. In current bioperl
> > > > live, the ruler can have negative numberings - I use this routinely.
> > > > You need to create a feature that starts in negative coordinates. What is
> > > > happening to you when you try this?
> > > >
> > > > Lincoln
> > > >
> > > > On Wednesday 24 May 2006 21:59, Kevin Lam Koiyau wrote:
> > > > > Hi
> > > > > thanks for the help offered thus far!
> > > > > sigh I am trying to annotate TFBS on a -1000 to +50 bp promoter seq using
> > > > > bioperl. therefore i was asked to make the numberings as such (-1000) is
> > > > > there any way at all to do this in bioperl without changing the .pm file?
> > > > >
> > > > > thanks guys..
> > > > > kevin
> > > > >
> > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >
> > > > --
> > > > Lincoln D.
Stein
> > > > Cold Spring Harbor Laboratory
> > > > 1 Bungtown Road
> > > > Cold Spring Harbor, NY 11724
> > > > (516) 367-8380 (voice)
> > > > (516) 367-8389 (fax)
> > > > FOR URGENT MESSAGES & SCHEDULING,
> > > > PLEASE CONTACT MY ASSISTANT,
> > > > SANDRA MICHELSEN, AT michelse at cshl.edu
>
> --
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu
>
>

From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 03:59:21 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 08:59:21 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <001801c684e3$16e33730$15327e82@pyrimidine>
References: <001801c684e3$16e33730$15327e82@pyrimidine>
Message-ID: <447E9E59.6090709@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
>
> Sendu Bala wrote:
>> Just looking for all return undef;s isn't enough. It's entirely possible
>> to do something like:
>>
>> my $return_value;
>> {
>>     # do something that assigns to return_value on success
>>     # on failure, just do nothing
>> }
>> return $return_value;
>
> Agreed, though looking for these is obviously much harder.
>
> The way to get around those is:
>
> return $return_value if $return_value;
> return;
>
> which I've seen used in a number of get/set methods.

Though if anyone is using that cookie-cutter/macro style, that's much
worse because now you can't return 0.

return $return_value if defined($return_value);
return;

In any case, it burns the eyes. I share Lincoln's POV. I also fully
understand your point about not being able to trust the docs
(Bio::Map::Marker...). But the solution is to change the code so they
match the docs when the docs make sense, not change the code so that it
no longer matches the docs[*]. In a massive OO project like bioperl the
users need to be able to rely on the docs. You can't turn around and say
"you've used this method for years, but now I'm changing how it works
because you might have used the method incorrectly". Ideally any code
changes add functionality or improve its working without affecting code
that uses the method correctly according to its old docs.


* though if there isn't time/interest in changing the code, and the
method never worked as per the docs, then by all means change the docs
to avoid confusion - just don't change the docs on a method that worked
according to the docs, because then you can assume people use the method
and will be affected by the change

From lstein at cshl.edu  Thu Jun  1 11:40:38 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Thu, 1 Jun 2006 11:40:38 -0400
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
In-Reply-To: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>
References: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com>
Message-ID: <200606011140.38726.lstein@cshl.edu>

Hi,

The border is coming from the HTML 0 in the img() call.

Lincoln

On Tuesday 30 May 2006 00:58, Jelena Obradovic wrote:
> Hello everybody,
>
> does anybody know how to remove the background color of the Panel.
> Currently, I am not adding anything to it, so I can troubleshoot the
> problem, and I have tried setting up
> all color attributes I could find to the panel, but no luck. Whatever I do,
> I get the BLUE border of the panel.
>
> Has anybody faced the same problem?
>
> Thanks in advance,
>
> Jelena
>
> And here is the code I am currently using:
>
> ----------------------------------------------------------------------
> my $panel = Bio::Graphics::Panel->new(-length    => $prim_seq->length() + 200,
>                                       -width     => 800,
>                                       -pad_left  => 10,
>                                       -pad_right => 10,
>                                       -key_color => 'white',
>                                       -bgcolor   => 'white',
>                                       -gridcolor => 'black',
>                                       -fgcolor   => 'black',
>                                       -grid      => 0,
>                                      );
> my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url' ,
>                                                  -url => '/tmpimages');
> #make clickable image
> print $cgi->img({-src=>$url,-usemap=>"#$mapname"});
> print $map;
>
> ----------------------------------------------------------------------

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From arareko at campus.iztacala.unam.mx  Thu Jun  1 12:13:05 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Jun 2006 11:13:05 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To:
References:
Message-ID: <447F1211.2010705@campus.iztacala.unam.mx>

You're right Brian. I also think that the text/POD part is more
important than the script. Since we're more into moving everything to
the Wiki, I believe this would be the right approach.

Moving the script part of the tutorial into the examples/ directory is
also a nice idea.

Mauricio.

Brian Osborne wrote:
> Mauricio,
>
> Bernd didn't say he want the _script_ in the package, he said he wanted
> bptutorial.pl in the package, not indicating whether it was the
> documentation or the script that was important. It's my suspicion that the
> documentation is more important than the script, and this is what my last
> letter was asking, in part: is the script important? Or can we focus on the
> text/POD part?
>
> Brian O.
>
>
> On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra"
> wrote:
>
>> I agree with what Bernd Web said in another reply. For some people will
>> be nice to still be able to run the script from the codebase and
>> interact with it.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu  Thu Jun  1 12:20:34 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 11:20:34 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447F1211.2010705@campus.iztacala.unam.mx>
Message-ID: <000b01c68597$5026bdf0$15327e82@pyrimidine>

Sounds good to me. I guess the tutorial (post-stripping) would be moved to
/scripts or /examples then?

Also, what do we do about similar situation with other docs moved to the
wiki (INSTALL, INSTALL.WIN, etc)? Should we have a placeholder file in the
distribution pointing out the wiki docs instead?

Chris

> -----Original Message-----
> From: Mauricio Herrera Cuadra [mailto:arareko at campus.iztacala.unam.mx]
> Sent: Thursday, June 01, 2006 11:13 AM
> To: Brian Osborne
> Cc: Chris Fields; bioperl-l at lists.open-bio.org; 'Jay Hannah'
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
>
> You're right Brian.
> I also think that the text/POD part is more
> important than the script. Since we're more into moving everything to
> the Wiki, I believe this would be the right approach.
>
> Moving the script part of the tutorial into the examples/ directory is
> also a nice idea.
>
> Mauricio.
>
> Brian Osborne wrote:
> > Mauricio,
> >
> > Bernd didn't say he want the _script_ in the package, he said he wanted
> > bptutorial.pl in the package, not indicating whether it was the
> > documentation or the script that was important. It's my suspicion that the
> > documentation is more important than the script, and this is what my last
> > letter was asking, in part: is the script important? Or can we focus on the
> > text/POD part?
> >
> > Brian O.
> >
> >
> > On 5/31/06 6:49 PM, "Mauricio Herrera Cuadra"
> > wrote:
> >
> >> I agree with what Bernd Web said in another reply. For some people will
> >> be nice to still be able to run the script from the codebase and
> >> interact with it.
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM

From cjfields at uiuc.edu  Thu Jun  1 12:28:38 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 11:28:38 -0500
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <447E9E59.6090709@mrc-dunn.cam.ac.uk>
Message-ID: <000c01c68598$704b15d0$15327e82@pyrimidine>

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, June 01, 2006 2:59 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
>
> Chris
Fields wrote:
> >
> > Sendu Bala wrote:
> >> Just looking for all return undef;s isn't enough. It's entirely possible
> >> to do something like:
> >>
> >> my $return_value;
> >> {
> >>     # do something that assigns to return_value on success
> >>     # on failure, just do nothing
> >> }
> >> return $return_value;
> >
> > Agreed, though looking for these is obviously much harder.
> >
> > The way to get around those is:
> >
> > return $return_value if $return_value;
> > return;
> >
> > which I've seen used in a number of get/set methods.
>
> Though if anyone is using that cookie-cutter/macro style, that's much
> worse because now you can't return 0.
>
> return $return_value if defined($return_value);
> return;

Makes sense. Really, this all comes down to semantics and the context of how
the method is called and what is expected as a return value. I suppose it
also depends on what one considers 'best practice,' which can be subjective.
I don't want us getting into a situation in which we come across as
critiquing someone else's code w/o some valid points, i.e. Lincoln's point
about complaining. I think that's why this thread is pretty important, in
that we're getting a broad range of opinions on the issue.

> In any case, it burns the eyes.

Yep, I agree.

> I share Lincoln's POV. I also fully
> understand your point about not being able to trust the docs
> (Bio::Map::Marker...). But the solution is to change the code so they
> match the docs when the docs make sense, not change the code so that it
> no longer matches the docs[*]. In a massive OO project like bioperl the

So you know, Lincoln and I both support the idea of an audit. He also notes
(and I agree) that people will likely complain.

Anyway, changing the code to match the docs makes sense theoretically, but in
practice that doesn't always work. Any situation where code does not behave
as expected (i.e. as described in the docs) are bugs and can be reported as
such.
The problem arises when the docs are completely wrong, as
Bio::Restriction::IO was before I made changes to it. In many cases simple
small code changes won't work, such as when methods inherit from an
interface but don't implement all methods (so essentially are incomplete).
Hilmar made the point that we should change the docs to reflect
inconsistencies in particular plugin modules for IO classes (AlignIO has a
few modules with unimplemented write methods, and so on). When the code
radically varies, such as in the Restriction::IO case (where none of the
write methods worked), the docs should be changed in the IO class to reflect
this. Of course, you should also add a bit to the TO DO section of POD and
add a bit to the Project Priority List on the wiki to point this out, both
of which I did. It comes down to 'truth in advertising', does it do what's
expected.

> users need to be able to rely on the docs. You can't turn around and say
> "you've used this method for years, but now I'm changing how it works
> because you might have used the method incorrectly". Ideally any code

Not what I did, BTW. The API is intact; you can still use the write methods
if you want (they throw errors just fine). In fact, I didn't change any
methods except in one module (Restriction::IO::bairoch), where I added a
warning to the read method b/c it didn't work as expected, and I filed a bug
report. Essentially, the only thing I changed was the docs to reflect what
the code currently can accomplish (at least until you read the TO DO). We
already had one person email the group asking why code in the synopsis
didn't work. Adding read and write methods to most of these modules (making
the code do what the docs reflect, in your words) is a lot of work, esp. for
someone like me unfamiliar with the class architecture and methods for those
modules.
IMHO, contributions to bioperl should accomplish what is reflected in
their docs once added to the core; if a write method hasn't been written,
then add it to the docs in a TO DO section or add a warning to the synopsis.
Don't put in the docs what you intend the code to accomplish down the road
but what it does currently. Is that unreasonable?

Anyway, when something doesn't perform as expected (produces invalid output
or contains errors), it's considered a bug. That includes misrepresenting
what a module does in the docs. When we try to fix bugs we have to decipher
what the intent of the original author was from the docs and code, then try
to get it to work by modifying the code. In extreme cases (such as
unimplemented methods) that may mean writing up entire methods from scratch.
The read and write methods for IO modules are normally the longest methods
in a class. That's a heck of a lot of effort for something that a large
majority of us aren't interested in taking up, esp. when the submitting
author should have had everything up to spec (i.e. what's in the docs) when
adding it to the core.

> changes add functionality or improve its working without affecting code
> that uses the method correctly according to its old docs.
>
>
> * though if there isn't time/interest in changing the code, and the
> method never worked as per the docs, then by all means change the docs
> to avoid confusion - just don't change the docs on a method that worked
> according to the docs, because then you can assume people use the method
> and will be affected by the change

Again, didn't do that. The methods in the docs either didn't exist (not
implemented) or didn't work (contained bugs). The docs were changed b/c they
were misleading.

-chris

From arareko at campus.iztacala.unam.mx  Thu Jun  1 12:36:07 2006
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 01 Jun 2006 11:36:07 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447E48B9.4080503@jays.net>
References: <447E48B9.4080503@jays.net>
Message-ID: <447F1777.3070906@campus.iztacala.unam.mx>

Jay Hannah wrote:
> Mauricio Herrera Cuadra wrote:
>> I've added a link in the left menu of the wiki. If you think it should
>> point to the Tutorials page instead of the Bptutorial.pl page please let
>> me know.
>
> Instead of all these competing links on the left, maybe we should have a
> master "documentation" page linked on the left cascading like so?
>
> Documentation (linked on the left menu)
> - Quick start
> - FAQ
> - HOWTOs
> - Tutorials

Nice idea, I'll check with Jason if it's possible (in mediawiki) to
create a new Documentation sidebar to hold these 4 sections.

> (What's the conceptual difference between a HOWTO and a tutorial?)

My concept is that Tutorials cover a wider aspect of BioPerl, contrary
to the HOWTO's which focus on a certain topic.

> Why isn't the short "Current events" just listed on the top of the
> "News" page?

I don't know, maybe because it was important when Jason started the Wiki
a couple of months ago. Do you think it should be erased from the sidebar?

> Sick of my endless questions yet? -grin-
>
> j
>

Of course not! :)

Mauricio.

--
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM

From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 12:46:03 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 17:46:03 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfall with "return undef"
In-Reply-To: <000c01c68598$704b15d0$15327e82@pyrimidine>
References: <000c01c68598$704b15d0$15327e82@pyrimidine>
Message-ID: <447F19CB.4090607@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
>
> Sendu Bala wrote:
[snip]
>> users need to be able to rely on the docs. You can't turn around and say
>> "you've used this method for years, but now I'm changing how it works
>> because you might have used the method incorrectly". Ideally any code
>
> Not what I did, BTW.
[snip]
>> * though if there isn't time/interest in changing the code, and the
>> method never worked as per the docs, then by all means change the docs
>> to avoid confusion - just don't change the docs on a method that worked
>> according to the docs, because then you can assume people use the method
>> and will be affected by the change
>
> Again, didn't do that.

I'm very sorry that I allowed the ambiguity, but my comments were
certainly not directed at your recent changes to Bio::Restriction::IO.
In fact, I put in the above * comment to exclude your changes from my
discussion; you changed the docs because the code never did what they
said they did (the docs were bad). That's fine (good!). My comments were
a general point, slightly directed at the idea of changing all the
return undef;s - changing the code so that it no longer matches the docs
of a previously working method. That's what I think is bad. Though in
this particular case it shouldn't make any difference at all.
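
For readers following the "return undef" thread, here is a minimal sketch of
the two pitfalls being discussed. The accessor names (get_undef, get_bare,
get_if_true, get_if_def) and the %score hash are hypothetical and exist only
for illustration; they are not BioPerl methods. The first pair shows why
"return undef" misbehaves in list context, the second pair shows why testing
truth instead of definedness loses a stored 0.

```perl
use strict;
use warnings;

# Hypothetical data and accessors, for illustration only (not BioPerl code).
my %score = (hit => 0);    # 0 is a legitimate stored value

sub get_undef   { my $k = shift; return undef unless exists $score{$k}; return $score{$k}; }
sub get_bare    { my $k = shift; return       unless exists $score{$k}; return $score{$k}; }
sub get_if_true { my $k = shift; my $v = $score{$k}; return $v if $v;          return; }
sub get_if_def  { my $k = shift; my $v = $score{$k}; return $v if defined($v); return; }

# In list context "return undef" yields a one-element list (undef),
# so the receiving array is non-empty and tests as true.
my @from_undef = get_undef('miss');
my @from_bare  = get_bare('miss');
printf "return undef: %d element(s)\n", scalar @from_undef;
printf "bare return:  %d element(s)\n", scalar @from_bare;

# The truth test drops the stored 0; the defined() test keeps it.
my $lost = get_if_true('hit');
my $kept = get_if_def('hit');
printf "return \$v if \$v:          %s\n", defined $lost ? $lost : 'undef';
printf "return \$v if defined \$v:  %s\n", defined $kept ? $kept : 'undef';
```

Run as written, the first pair reports 1 element and 0 elements, and the
second pair reports undef and 0. Whether either difference matters for a
given method depends on how its callers use the return value, which is the
point both posters make above.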

From osborne1 at optonline.net  Thu Jun  1 12:46:02 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 01 Jun 2006 12:46:02 -0400
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000b01c68597$5026bdf0$15327e82@pyrimidine>
Message-ID:

Chris,

I think the INSTALL* files should be in the package, this is the de facto
convention for 99% of the packages I've ever seen. Then any Wiki page just
links to the file in CVS.

Personally I don't like the idea of maintaining a Wiki page and a file that
both say essentially the same thing (this is what has happened with the
INSTALL and INSTALL.WIN files). I've spent plenty of time merging redundant
text and removing files that contained these redundancies so it's
unfortunate to see them appear anew, sooner or later they'll get out of sync
despite best intentions. The most likely cause will be someone other than
the person who created the initial duplication (and promised to maintain
both) making a change in one of the two files.

Brian O.


On 6/1/06 12:20 PM, "Chris Fields" wrote:

> Also, what do we do about similar situation with other docs moved to the
> wiki (INSTALL, INSTALL.WIN, etc)? Should we have a placeholder file in the
> distribution pointing out the wiki docs instead?

From sb at mrc-dunn.cam.ac.uk  Thu Jun  1 12:57:27 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 01 Jun 2006 17:57:27 +0100
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <000b01c68597$5026bdf0$15327e82@pyrimidine>
References: <000b01c68597$5026bdf0$15327e82@pyrimidine>
Message-ID: <447F1C77.5040403@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
> Sounds good to me. I guess the tutorial (post-stripping)would be moved to
> /scripts or /examples then?
>
> Also, what do we do about similar situation with other docs moved to the
> wiki (INSTALL, INSTALL.WIN, etc)? Should we have a placeholder file in the
> distribution pointing out the wiki docs instead?

Imho, something like an installation document should be there in full so
once you've downloaded you can install without reference to anything
else. Also, an installation document could be considered specific to the
release version. Which is to say, it never goes out of date even if new
versions of bioperl are released with new installation instructions - it
applies to the installation directory it is found in.

The wiki can have the latest installation instructions, and you don't
have to worry about keeping things synced.

From cjfields at uiuc.edu  Thu Jun  1 13:13:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 12:13:30 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To: <447F1C77.5040403@mrc-dunn.cam.ac.uk>
Message-ID: <000d01c6859e$b47cc2c0$15327e82@pyrimidine>

So basically have a minimal set of installation instructions in CVS and a
more detailed installation instructions on the wiki. Sounds reasonable
enough but bioperl is a pretty complex distribution (lots of additional
modules required, platform-specific issues, so on). Maybe we can come up
with a pared-down INSTALL file which combines the basic elements for
installing on UNIX/Windows/Mac/FreeBSD and points out dependencies.

I still like the idea of just having a simple conversion from wiki->txt
direct from the web page (i.e. best of both worlds).

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Thursday, June 01, 2006 11:57 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
>
> Chris Fields wrote:
> > Sounds good to me. I guess the tutorial (post-stripping)would be moved to
> > /scripts or /examples then?
> >
> > Also, what do we do about similar situation with other docs moved to the
> > wiki (INSTALL, INSTALL.WIN, etc)?
> > Should we have a placeholder file in the
> > distribution pointing out the wiki docs instead?
>
> Imho, something like an installation document should be there in full so
> once you've downloaded you can install without reference to anything
> else. Also, an installation document could be considered specific to the
> release version. Which is to say, it never goes out of date even if new
> versions of bioperl are released with new installation instructions - it
> applies to the installation directory it is found in.
>
> The wiki can have the latest installation instructions, and you don't
> have to worry about keeping things synced.

From s-merchant at northwestern.edu  Thu Jun  1 13:17:32 2006
From: s-merchant at northwestern.edu (Sohel Merchant)
Date: Thu, 1 Jun 2006 12:17:32 -0500
Subject: [Bioperl-l] Bio::OntologyIO
Message-ID: <000001c6859f$446f7fd0$c2987ca5@pc13>

Hi Everyone,

I would like to announce the availability of an obo format parser which can
parse GO, PO, PATO and other ontology files in obo format. The parser can be
used through the Bio::OntologyIO module.

Thanks to Hilmar Lapp and Chris Mungall for their invaluable contributions.

Thanks,
Sohel Merchant.

Sohel Merchant
dictyBase
Bioinformatics Software Engineer
Center for Genetic Medicine
Northwestern University
676 St. Clair Street, Suite 1206
Chicago IL 60611

From cjfields at uiuc.edu  Thu Jun  1 13:46:35 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 12:46:35 -0500
Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
In-Reply-To:
Message-ID: <001101c685a3$53f4bf70$15327e82@pyrimidine>

I understand your point, though I think the wiki gives us an opportunity to
add helpful links and use markup to help clarify things a bit more.
I have seen several distributions which don't have INSTALL files, just
simple README with very basic instructions (Bio::ASN1::EntrezGene is one).

I've been reluctant to mess around with the wiki Install pages too much more
b/c of syncing problems, just as you mentioned. I will look into things a
bit more to see if there's an easier way to go about converting wiki->text.

Chris

> -----Original Message-----
> From: Brian Osborne [mailto:osborne1 at optonline.net]
> Sent: Thursday, June 01, 2006 11:46 AM
> To: Chris Fields; 'Mauricio Herrera Cuadra'
> Cc: bioperl-l at lists.open-bio.org; 'Jay Hannah'
> Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl
>
> Chris,
>
> I think the INSTALL* files should be in the package, this is the de facto
> convention for 99% of the packages I've ever seen. Then any Wiki page just
> links to the file in CVS.
>
> Personally I don't like the idea of maintaining a Wiki page and a file that
> both say essentially the same thing (this is what has happened with the
> INSTALL and INSTALL.WIN files). I've spent plenty of time merging redundant
> text and removing files that contained these redundancies so it's
> unfortunate to see them appear anew, sooner or later they'll get out of sync
> despite best intentions. The most likely cause will be someone other than
> the person who created the initial duplication (and promised to maintain
> both) making a change in one of the two files.
>
> Brian O.
>
>
> On 6/1/06 12:20 PM, "Chris Fields" wrote:
>
> > Also, what do we do about similar situation with other docs moved to the
> > wiki (INSTALL, INSTALL.WIN, etc)? Should we have a placeholder file in the
> > distribution pointing out the wiki docs instead?
From cjfields at uiuc.edu Thu Jun 1 13:46:45 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Jun 2006 12:46:45 -0500 Subject: [Bioperl-l] For CVS developers -potentialpitfallwith"returnundef" In-Reply-To: <447F19CB.4090607@mrc-dunn.cam.ac.uk> Message-ID: <001201c685a3$59d78da0$15327e82@pyrimidine> .... > > Again, didn't do that. > > I'm very sorry that I allowed the ambiguity, but my comments were > certainly not directed at your recent changes to Bio::Restriction::IO. > In fact, I put in the above * comment to exclude your changes from my > discussion; you changed the docs because the code never did what they > said they did (the docs were bad). That's fine (good!). My comments were > a general point, slightly directed at the idea of changing all the > return undef;s - changing the code so that it no longer matches the docs > of a previously working method. That's what I think is bad. Though in > this particular case it shouldn't make any difference at all. Agreed. In any case, if tests have been properly set up then they should catch problems. This is, of course, if they are properly set up. Chris > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From gad14 at cornell.edu Thu Jun 1 15:10:31 2006 From: gad14 at cornell.edu (Genevieve DeClerck) Date: Thu, 01 Jun 2006 15:10:31 -0400 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <447D5668.7070500@mrc-dunn.cam.ac.uk> References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu> <447D5668.7070500@mrc-dunn.cam.ac.uk> Message-ID: <447F3BA7.9030500@cornell.edu> Problem solved, albeit, in a slightly hacky way. I tried to make seek() work for a good long while with the SearchIO blast results object, but I just couldn't get it to work. (Probably b/c seek wants to see a genuine file handle-- not a SearchIO filehandle.) 
I used SearchIO's fh() to get the handle and could while(<$fh>) through the data, but when I used seek($fh,0,0) to reset the cursor position in the handle in prep for another loop, I got an error complaining about my use of seek() by indicating that "SEEK" could not be found in Seekable.pm.

I concluded that it was not going to be possible and instead made an array of SeqFeature objects which contain all the relevant blast output data (i.e. the m8/hit table stuff).

It still seems unfortunate that one can't reuse the SearchIO object for cases when the SearchIO blast report needs to be accessed multiple times.

Thanks for your help,
Genevieve

Sendu Bala wrote:
> Genevieve DeClerck wrote:
>
>>Thanks for your comment Sendu, it was very helpful. I think this must be
>>what's going on.. I am using $blast_report->next_result in both
>>subroutines. It appears that analyzing the blast results first w/ my
>>sort subroutine empties (?) the $blast_result object so that when I try
>>to print, there is nothing left to print. (and vice versa when I print
>>first then try to sort).
>>So, from the looks of things, using next_result has the effect of
>>popping the Bio::Search::Result::ResultI objects off of the SearchIO
>>blast report object??
>
>
> Not quite. It's more or less exactly like opening a file and then trying
> to read it all twice like this:
> open(FILE, "file");
> while (<FILE>) {
>    print; # prints each line in the file
> }
> while (<FILE>) {
>    print; # never happens, we never enter this while loop
> }
>
> To get the second while loop to print anything we need to say seek(FILE,
> 0, 0) before it. Or in the first while loop store each line in an array,
> and then make the second loop a foreach through that array.
>
>
>
>>It seems I could get around this by making a copy of the blast report by
>>setting it to another new variable...(not the most elegant solution) but
>>I'm having trouble with this...
>>
>>If I do:
>>
>>	my $blast_report_copy = $blast_report;
>>
>>I'm just copying the reference to the SearchIO blast result, so it
>>doesn't help me. How can I make another physical copy of this blast
>>result object? Seems like a simple thing but how to do it is escaping me.
>
>
> Not really a good idea, and it may not work anyway if the object
> contains a filehandle. But for a simple object you might recursively
> loop through the data structure and copy each element out into a similar
> data structure.
>
>
>
>>But better yet, the way to go is to 'reset the counter,' or to find a
>>way to look at/print/sort the results without removing data from the
>>blast result object. How is this done though??
>
>
> It would be rather nice if this worked:
> my $blast_report = $factory->blastall($ref_seq_objs);
> my $blast_fh = $blast_report->fh();
> while (<$blast_fh>) {
>    # $_ is a ResultI object, use as normal
> }
> seek($blast_fh, 0, 0); # this would be great, but does it work?
> while (<$blast_fh>) {
>    # go through the results again in your second subroutine
> }
>
> An alternative hacky way of doing it, which may also not work, would be
> to go through your $blast_report as normal, but then before going
> through it a second time, say
> my $fh = $blast_report->_fh;
> seek($fh, 0, 0);
>
> Finally, the most sensible way (assuming bioperl provides no methods of
> its own for this) of solving the problem is, the first time you go
> through each next_result, next_hit and next_hsp, just store the returned
> objects in an array of arrays of arrays. Then the second time get the
> objects from your array structure instead of with the method calls.
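
The plain-filehandle behaviour described in the quoted reply above can be reproduced with core Perl alone. This is only a minimal sketch: the temporary file and its two lines are made up for illustration, and it says nothing about whether seek() can be made to work on a SearchIO handle, which is the point the thread leaves open.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small throwaway file to read back (contents are illustrative only).
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "line 1\nline 2\n";
close $out or die "Cannot close $path: $!";

open(my $fh, '<', $path) or die "Cannot open $path: $!";

my @first = <$fh>;     # first pass reads every line
my @again = <$fh>;     # handle is already at end of file, so nothing is read

seek($fh, 0, 0) or die "Cannot seek in $path: $!";   # rewind to the start
my @second = <$fh>;    # the lines are available again

close $fh;

printf "first pass: %d lines, without seek: %d lines, after seek: %d lines\n",
    scalar @first, scalar @again, scalar @second;
```

Reading the lines into an array on the first pass, as @first does here, is the same idea as the "store it the first time" suggestion: once the data is in an array it can be walked as often as needed without touching the handle.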
From jelenaob at gmail.com Thu Jun 1 11:45:49 2006
From: jelenaob at gmail.com (Jelena Obradovic)
Date: Thu, 1 Jun 2006 08:45:49 -0700
Subject: [Bioperl-l] Bio::Graphic::Panel backgroud color
In-Reply-To: <200606011140.38726.lstein@cshl.edu>
References: <5042a62b0605292158g187f4855hd93f76e0086ac27d@mail.gmail.com> <200606011140.38726.lstein@cshl.edu>
Message-ID: <5042a62b0606010845u79a5d5b3h131c4ed54f90fee3@mail.gmail.com>

Thanks Lincoln.

I figured out the solution just after I posted the question, Murphy's law ... but my post was left hanging in my email ... :(

The problem is in the CGI->img method. Instead of

  print $cgi->img({-src=>$url,-usemap=>"#$mapname"});

I should have used:

  print $cgi->img({-src=>$url,-usemap=>"#$mapname", -border=>undef});

Thanks anyways for your help.

Cheers,

Jelena

On 6/1/06, Lincoln Stein wrote:
>
> Hi,
>
> The border is coming from the HTML 0
> in
> the img() call.
>
> Lincoln
>
>
>
> On Tuesday 30 May 2006 00:58, Jelena Obradovic wrote:
> > Hello everybody,
> >
> > does anybody know how to remove the background color of the Panel.
> > Currently, I am not adding anything to it, so I can troubleshot the
> > problem, and I have tried setting up
> > all color attributes I could find to the panel, but no luck. Whatever I
> do,
> > I get the BLUE border of the panel.
> >
> > Has anybody faced the same problem?
> > > > Thanks in advance, > > > > Jelena > > > > And here is the code I am currently using: > > > > > --------------------------------------------------------------------------- > >-------------------------------- my $panel = > > Bio::Graphics::Panel->new(-length => $prim_seq->length() + 200, > > -width => 800, > > -pad_left => 10, > > -pad_right => 10, > > -key_color => 'white', > > -bgcolor => 'white', > > -gridcolor=>'black', > > -fgcolor => 'black', > > -grid => 0, > > ); > > my ($url,$map,$mapname) = $panel->image_and_map( #-root=>'$root_url' > , > > -url => '/tmpimages'); > > #make clickable image > > print $cgi->img({-src=>$url,-usemap=>"#$mapname"}); > > print $map; > > > > > --------------------------------------------------------------------------- > >-------------------------------- > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Lincoln D. Stein > Cold Spring Harbor Laboratory > 1 Bungtown Road > Cold Spring Harbor, NY 11724 > (516) 367-8380 (voice) > (516) 367-8389 (fax) > FOR URGENT MESSAGES & SCHEDULING, > PLEASE CONTACT MY ASSISTANT, > SANDRA MICHELSEN, AT michelse at cshl.edu > From osborne1 at optonline.net Thu Jun 1 15:36:27 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 01 Jun 2006 15:36:27 -0400 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <000d01c6859e$b47cc2c0$15327e82@pyrimidine> Message-ID: Chris, Right - how would this be done? Brian O. On 6/1/06 1:13 PM, "Chris Fields" wrote: > I still like the idea of just having a simple conversion from wiki->txt > direct from the web page (i.e. best of both worlds). 
From osborne1 at optonline.net Thu Jun 1 15:44:13 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 01 Jun 2006 15:44:13 -0400 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <447E48B9.4080503@jays.net> Message-ID: Jay, You asked about the doc/ directory. The only directory I see in my bioperl-live/doc directory is examples/, the reason this remains is that it contains scripts and images related to the Graphics HOWTO, in theory these could be moved to the Wiki and the examples/ directory deleted. One explanation for why you see doc/html and all those other dirs is that you aren't using the 'cvs -d' option (there are other explanations) when you update. If examples/ is removed then presumably the README can be removed and makedoc.pl moved elsewhere. Brian O. On 5/31/06 9:54 PM, "Jay Hannah" wrote: > Brian Osborne wrote: >> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we >> don't want to have to maintain two bptutorials. > > We certainly wouldn't want to try to maintain two copies, one POD one in wiki. > That would be the worst of all options. One option that hasn't been mentioned > yet is to keep maintenance of that in POD in the distro (leaving the cool > runability alone), and then flag that document as unchangeable in the wiki > with a note on top "Maintenance of this document is done in POD in the distro. > Submit POD patches to bioperl-l and we'll re-post an updated copy to this > wiki." > > Just a thought. > >> - What do we do with the script part of bptutorial.pl? It certainly could be >> excised and put into the examples/ directory, for example, but this would >> break a few of the paths that are being used. > > /README says this: > > scripts/ - Useful production-quality scripts with POD documentation > examples/ - Scripts demonstrating the many uses of Bioperl > > I'm personally not clear on the difference. 
Little stuff should start in > examples/ and graduate to scripts/ once they've matured? > > Is the doc/ tree being abandoned? > > doc/faq (empty?) > doc/howto > doc/howto/examples > doc/howto/figs (empty?) > doc/howto/html (empty?) > doc/howto/pdf (empty?) > doc/howto/sgml (empty?) > doc/howto/txt (empty?) > doc/howto/xml (empty?) > > Does all that stuff officially live in and is being changed in the wiki, never > to return to the distro? > > Any reason those empty dirs aren't nuked out of CVS? > > Chris Fields wrote: >> Jay, looks like there are still some weird formatting issues with the >> bptutorial wiki page, something which I ran into before when getting the >> Install docs up for Windows and UNIX (the mediawiki setup thinks 2 or more >> spaces preceding a line denotes code for some reason). Not much you can do >> in these cases except remove the extra spaces in those spots. Looking good >> though! > > Sorry, I spent zero time on the whole conversion. I'm not sure what parts > didn't convert well. I've never done that conversion before, and know nothing > about mediawiki. I just blindly let Pod::Simple::Wiki do its thing then ran > off to work. :) > > Mauricio Herrera Cuadra wrote: >> I've added a link in the left menu of the wiki. If you think it should >> point to the Tutorials page instead of the Bptutorial.pl page please let >> me know. > > Instead of all these competing links on the left, maybe we should have a > master "documentation" page linked on the left cascading like so? > > Documentation (linked on the left menu) > - Quick start > - FAQ > - HOWTOs > - Tutorials > > (What's the conceptual difference between a HOWTO and a tutorial?) > > It's hard for me to dive into a wiki lifestyle for the huge documentation > pillars since it can't ever get back into the distro... (can it?) Small, > throw away stuff is great for the wiki, but huge, established, thoughtful, > long documents should be left in the distro? 
Present (and searchable) on the > wiki but static? > > Why isn't the short "Current events" just listed on the top of the "News" > page? > > Sick of my endless questions yet? -grin- > > j > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Thu Jun 1 15:47:40 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Jun 2006 14:47:40 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: Message-ID: <001301c685b4$3dbfb820$15327e82@pyrimidine> > -----Original Message----- > From: Brian Osborne [mailto:osborne1 at optonline.net] > Sent: Thursday, June 01, 2006 2:36 PM > To: Chris Fields; 'Sendu Bala'; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl > > Chris, > > Right - how would this be done? I'll look into a few of the wiki converters, there are a few things that claim to convert wiki to other formats (and vice versa). It may not be direct, though. I'll post anything if I figure something out. Chris > Brian O. > > > On 6/1/06 1:13 PM, "Chris Fields" wrote: > > > I still like the idea of just having a simple conversion from wiki->txt > > direct from the web page (i.e. best of both worlds). From osborne1 at optonline.net Thu Jun 1 15:45:39 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Thu, 01 Jun 2006 15:45:39 -0400 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <447E73F5.40403@jays.net> Message-ID: Jay, Yes, good idea, thank you for volunteering. Brian O. On 6/1/06 12:58 AM, "Jay Hannah" wrote: > I hereby volunteer to strip the code out of bptutorial.pl and put it wherever. > Where should I put it when I'm done? (examples/tutorial.pl?) 
From hubert.prielinger at gmx.at Thu Jun 1 16:33:45 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Thu, 01 Jun 2006 14:33:45 -0600
Subject: [Bioperl-l] remoteblast xml problem
Message-ID: <447F4F29.9070600@gmx.at>

hi,
I have the following program and it worked quite well for retrieving remoteblast results in a text file. Now I have altered it to XML, and it doesn't work anymore.....
It takes all the parameters at the command line and submits the query, but I don't retrieve any results file anymore.....

It seems that it hangs in an endless loop......
The only output I get is: $rc is not a ref! over and over..... it doesn't enter the else branch anymore....

Any help is appreciated,
thanks in advance

#!/usr/bin/perl -w

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use IO::String;
use Bio::SearchIO;
#use lib qw(/usr/local/bioperl/bioperl-1.5.1);

print "Please insert database:\t";
my $db_STD = <STDIN>;
chomp $db_STD;
print "Please insert matrix:\t";
my $matrix_STD = <STDIN>;
chomp $matrix_STD;
print "Please insert count:\t";
my $count_STD = <STDIN>;
chomp $count_STD;
print "Please insert gapcosts:\t";
my $gapcosts_STD = <STDIN>;
chomp $gapcosts_STD;

my $prog = 'blastp';
my $db = $db_STD;
my $e_val = '20000';
my $matrix = $matrix_STD;
my $wordSize = '2';
my @data;
my $line_dataArray;
my $rid;
my $count = $count_STD;

my @params = (
    '-prog' => $prog,
    '-data' => $db,
    '-expect' => $e_val,
    '-MATRIX_NAME' => $matrix,
    '-readmethod' => 'xml',
    '-WORD_SIZE' => $wordSize,
);

my $seqio_obj = Bio::SeqIO->new(
    -file => "aloneblosum62.txt",
    -format => "raw",
);

print "entering blast....";
my $xmlFactory = Bio::Tools::Run::RemoteBlast->new(@params);
$Bio::Tools::Run::RemoteBlast::HEADER{'COMPOSITION_BASED_STATISTICS'} = '1';
$Bio::Tools::Run::RemoteBlast::HEADER{'GAPCOSTS'} = $gapcosts_STD;
$Bio::Tools::Run::RemoteBlast::HEADER{'DESCRIPTIONS'} = '1000';
$Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'ALIGNMENTS'} = '1000';
$Bio::Tools::Run::RemoteBlast::RETRIEVALHEADER{'FORMAT_TYPE'} = 'XML';
print "Blast entered successfully \n";

while ( my $query = $seqio_obj->next_seq ) {
    print "submit Sequence...just do it....\n";
    my $r = $xmlFactory->submit_blast($query);
    print $query->seq;
    print "\n";
    # sleep 30;

    # Wait for the reply and save the output file
    print "entering while loop for saving Output.... \n";
    while ( my @rids = $xmlFactory->each_rid ) {
        foreach my $rid (@rids) {
            my $rc = $xmlFactory->retrieve_blast($rid);
            if ( !ref($rc) ) {
                print '$rc is not a ref!', "\n";
                if ( $rc < 0 ) {
                    print "Remove rid ...\n";
                    $xmlFactory->remove_rid($rid);
                }
                # sleep 5;
            }
            else {
                print "retrieved Results successfully \n";
                print $rid;
                print "\n";
                my $filename = "comp80swiss$count.xml";
                $xmlFactory->save_output($filename);
                print "File saved successfully \n";
                my $checkinput = $xmlFactory->file;
                open(my $fh,"<$checkinput") or die $!;
                while(<$fh>){
                    print;
                }
                close $fh;
                $count++;
                $xmlFactory->remove_rid($rid);
            }
        }
        print "\n";
        print "\n";
    }
}

From emmanuel.quevillon at versailles.inra.fr Thu Jun 1 17:15:42 2006
From: emmanuel.quevillon at versailles.inra.fr (Emmanuel Quevillon)
Date: Thu, 01 Jun 2006 23:15:42 +0200
Subject: [Bioperl-l] How to submit new module?
Message-ID: <447F58FE.7020603@versailles.inra.fr>

Hi,

I just created some new parsers for TargetP, TandemRepeatFinder and RepeatMasker. These parsers inherit from Bio::Tools. I'd just like to know the different steps of the procedure to submit them to BioPerl and have them integrated in the next release (I hope)?
Is there any documentation about it?

Thanks

--
Emmanuel

---------------------------------------------------------------------
Emmanuel Quevillon

INRA-URGI / Bayer CropScience
523 Place des Terrasses         http://www.infobiogen.fr
91000 EVRY                      http://urgi.infobiogen.fr
Tel : 01 60 87 37 42            http://www.bayercropscience.com

PGP public key server : http://pgp.mit.edu/
Key ID : 0x0B84357F
---------------------------------------------------------------------

From cjfields at uiuc.edu Thu Jun 1 17:36:05 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 1 Jun 2006 16:36:05 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <447F3BA7.9030500@cornell.edu>
Message-ID: <001b01c685c3$63840070$15327e82@pyrimidine>

Genevieve,

seek() won't work here; all the file IO is handled through Bio::Root::IO methods. The SearchIO system is set up like an XML SAX parser, so if you want to save objects as they come you'll have to store the object refs in an array, like so:

my @hsps;
while (my $result = $parser->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            push @hsps, $hsp;
        }
    }
}

Or similarly with hits:

my @hits;
while (my $result = $parser->next_result) {
    while (my $hit = $result->next_hit) {
        push @hits, $hit;
    }
}

Or you could use more complex data structures (array of arrays) as Sendu suggested.
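
As a rough sketch of that nested layout (an illustration only, not code from this thread): plain hashes stand in for the result, hit and HSP objects so that it runs without BioPerl installed, and the names @report, @stored and count_hsps are made up for the example. With a real parser the three loops would call next_result, next_hit and next_hsp instead of walking @report.

```perl
use strict;
use warnings;

# Hypothetical stand-in data: each result holds hits, each hit holds HSPs.
# With BioPerl these would come from next_result / next_hit / next_hsp.
my @report = (
    { name => 'query1', hits => [
        { accession => 'P12345', hsps => [ { bits => 50 }, { bits => 42 } ] },
        { accession => 'Q67890', hsps => [ { bits => 61 } ] },
    ] },
);

# Walk the data once, storing it as an array of arrays of arrays:
# $stored[$r][$h] is an array ref holding the HSPs of hit $h in result $r.
my @stored;
for my $result (@report) {
    my @hits_for_result;
    for my $hit (@{ $result->{hits} }) {
        push @hits_for_result, [ @{ $hit->{hsps} } ];
    }
    push @stored, \@hits_for_result;
}

# The stored structure can be traversed as many times as needed.
sub count_hsps {
    my ($data) = @_;
    my $n = 0;
    for my $hits (@$data) {
        $n += scalar @$_ for @$hits;
    }
    return $n;
}

print "first pass:  ", count_hsps(\@stored), " HSPs\n";
print "second pass: ", count_hsps(\@stored), " HSPs\n";
```

The trade-off is memory: everything the parser returns stays alive in @stored, which is fine for a single report but worth keeping in mind for very large ones.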

You should be able to sort like anything else by calling methods within the sort:

# total number of hsps
my @sorted = sort {$a->num_hsps <=> $b->num_hsps} @hits;

# if you really like your accessions in alphabetical order
my @sorted = sort {$a->accession cmp $b->accession} @hits;

Then if you wanted to print later you could sort based on something else, like the score:

my @sort_score = sort {$a->score <=> $b->score} @hits;

So you would end up with something like the following subroutines:

my @hits;    # filled by sort_results, reused by print_blast_results

sub sort_results {
    my $report = shift;
    while (my $result = $report->next_result()) {
        while (my $hit = $result->next_hit()) {
            push @hits, $hit;
        }
    }
    my @sorted = sort {$a->num_hsps <=> $b->num_hsps} @hits;
    print $_->accession,"\t",$_->num_hsps,"\n" for @sorted;
}

sub print_blast_results {
    my $report = shift;
    my @sort_score = sort {$a->score <=> $b->score} @hits;
    for my $h (@sort_score) {
        while (my $hsp = $h->next_hsp) {
            # might use something else here like hit->name or accession,
            # not sure what you want
            my $q_name = $hsp->seq_id;
            print join(", ",$q_name,$h->name,$hsp->bits)."\n";
        }
    }
}

Just so you know, I couldn't get display_id or display_name to work when using the Bio::Search::HSP::GenericHSP object. Your results may vary.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Genevieve DeClerck
> Sent: Thursday, June 01, 2006 2:11 PM
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] results problem with StandAloneBlast
>
> Problem solved, albeit, in a slightly hacky way.
>
> I tried to make seek() work for a good long while with the SearchIO
> blast results object, but I just couldn't get it to work. (Probably b/c
> seek wants to see a genuine file handle-- not a SearchIO filehandle.)
I > used SearchIO's fh() to get the handle and could while(<$fh>) through > the data but when I used seek($fh,0,0) to reset the cursor position in > the handle in prep for another loop, i got an error complaining about my > use of seek() by indicating that "SEEK" could not be found in Seekable.pm. > > I concluded that it was not going to be possible and instead made an > array if SeqFeature objects which contain all the relevant blast output > data (i.e. the m8/hit table stuff). > > It still seems unfortunate that one can't reuse the SearchIO object for > cases when the SearchIO blast report needs to be accessed mltiple times. > > Thanks for your help, > Genevieve > > > > Sendu Bala wrote: > > > Genevieve DeClerck wrote: > > > >>Thanks for your comment Sendu, it was very helpful. I think this must be > >>what's going on.. I am using $blast_report->next_result in both > >>subroutines. It appears that analyzing the blast results first w/ my > >>sort subroutine empties (?) the $blast_result object so that when I try > >>to print, there is nothing left to print. (and visa-versa when I print > >>first then try to sort). > >>So, from the looks of things, using next_result has the effect of > >>popping the Bio::Search::Result::ResultI objects off of the SearchIO > >>blast report object?? > > > > > > Not quite. It's more or less exactly like opening a file and then trying > > to read it all twice like this: > > open(FILE, "file"); > > while () { > > print # prints each line in the file > > } > > while () { > > print # never happens, we never enter this while loop > > } > > > > To get the second while loop to print anything we need to say seek(FILE, > > 0, 0) before it. Or in the first while loop store each line in an array, > > and then make the second loop a foreach through that array. 
> > > > > > > >>It seems I could get around this by making a copy of the blast report by > >>setting it to another new variable...(not the most elegant solution) but > >>I'm having trouble with this... > >> > >>If I do: > >> > >> my $blast_report_copy = $blast_report; > >> > >>I'm just copying the reference to the SearchIO blast result, so it > >>doesn't help me. How can I make another physical copy of this blast > >>result object? Seems like a simple thing but how to do it is escaping > me. > > > > > > Not really a good idea, and it may not work anyway if the object > > contains a filehandle. But for a simple object you might recursively > > loop through the data structure and copy each element out into a similar > > data structure. > > > > > > > >>But better yet, the way to go is to 'reset the counter,' or to find a > >>way to look at/print/sort the results without removing data from the > >>blast result object. How is this done though?? > > > > > > It would be rather nice if this worked: > > my $blast_report = $factory->blastall($ref_seq_objs); > > my $blast_fh = $blast_report->fh(); > > while (<$blast_fh>) { > > # $_ is a ResultI object, use as normal > > } > > seek($blast_fh, 0, 0); # this would be great, but does it work? > > while <$blast_fh>) { > > # go through the results again in your second subroutine > > } > > > > An alternative hacky way of doing it, which may also not work, would be > > to go through your $blast_report as normal, but then before going > > through it a second time, say > > my $fh = $blast_report->_fh; > > seek($fh, 0, 0); > > > > Finally, the most sensible way (assuming bioperl provides no methods of > > its own for this) of solving the problem is, the first time you go > > through each next_result, next_hit and next_hsp, just store the returned > > objects in an array of arrays of arrays. Then the second time get the > > objects from your array structure instead of with the method calls. 
> > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From arareko at campus.iztacala.unam.mx Thu Jun 1 17:49:30 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Thu, 01 Jun 2006 16:49:30 -0500 Subject: [Bioperl-l] How to submit new module? In-Reply-To: <447F58FE.7020603@versailles.inra.fr> References: <447F58FE.7020603@versailles.inra.fr> Message-ID: <447F60EA.1050608@campus.iztacala.unam.mx> Hi Emmanuel, Take a look into the BioPerl FAQ: http://bioperl.org/wiki/FAQ It contains some info that will guide you through the appropriate steps depending on your situation. Regards, Mauricio. Emmanuel Quevillon wrote: > Hi, > > I just created some new parsers for TargetP, TandemRepeatFinder and > RepeatMasker. These parsers are inheriting from Bio::Tools. I'd just > like to know the differents steps procedure to submit them to BioPerl > and to be integrated in the next release (I hope)? > Is there any documentation about it? > > Thanks > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Gen?tica Unidad de Morfofisiolog?a y Funci?n Facultad de Estudios Superiores Iztacala, UNAM From cjfields at uiuc.edu Thu Jun 1 17:47:11 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 1 Jun 2006 16:47:11 -0500 Subject: [Bioperl-l] How to submit new module? 
In-Reply-To: <447F58FE.7020603@versailles.inra.fr> Message-ID: <001c01c685c4$f01e7550$15327e82@pyrimidine> The Bioperl FAQ on the wiki answers this: http://www.bioperl.org/wiki/FAQ#I.27ve_got_an_idea_for_a_module_how_do_I_con tribute_it.3F Basically, you've already done the first step, but you might want to resubmit the email in a different form, with something about "New parsers for TargetP, TandemRepeatFinder and RepeatMasker" in the Subject line to get more input about those from the users-at-large. BTW, there is already a Bio::Tools::RepeatMasker, so you should check it out to make sure there isn't any redundancy between your version and the bioperl-live version. The developers may be reluctant to replace the bioperl-live version with yours to prevent API problems with end users, unless you provide some serious justification (like the current one is broken, not complete, etc). Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Emmanuel Quevillon > Sent: Thursday, June 01, 2006 4:16 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] How to submit new module? > > Hi, > > I just created some new parsers for TargetP, TandemRepeatFinder and > RepeatMasker. These parsers are inheriting from Bio::Tools. I'd just > like to know the differents steps procedure to submit them to BioPerl > and to be integrated in the next release (I hope)? > Is there any documentation about it? 
> > Thanks > > -- > Emmanuel > > --------------------------------------------------------------------- > Emmanuel Quevillon > > INRA-URGI / Bayer CropScience > 523 Place des Terrasses http://www.infobiogen.fr > 91000 EVRY http://urgi.infobiogen.fr > Tel : 01 60 87 37 42 http://www.bayercropscience.com > > PGP public key server : http://pgp.mit.edu/ > Key ID : 0x0B84357F > --------------------------------------------------------------------- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From heikki at sanbi.ac.za Fri Jun 2 03:52:07 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 2 Jun 2006 09:52:07 +0200 Subject: [Bioperl-l] For CVS developers -potentialpitfallwith"returnundef" In-Reply-To: <001201c685a3$59d78da0$15327e82@pyrimidine> References: <001201c685a3$59d78da0$15327e82@pyrimidine> Message-ID: <200606020952.08034.heikki@sanbi.ac.za> I've started going through the files that have 'return undef' lines. I'll report back later. Initial impression is that there are a few cases where the context indicates list to be returned but failure returns an explicit undef. I'll fix those. Most of the cases are much more ambiguous. Even when documentation says the failure returns undef, it is clearly meant to mean false. In most cases documentation does not comment on return value at all. Luckily the context is almost always scalar and therefore it does not matter too much. I seem to be changing 'return undef' to plain 'return' a bit overzealously, so do not take it personally. -Heikki On Thursday 01 June 2006 19:46, Chris Fields wrote: > .... > > > > Again, didn't do that. > > > > I'm very sorry that I allowed the ambiguity, but my comments were > > certainly not directed at your recent changes to Bio::Restriction::IO. 
> > In fact, I put in the above * comment to exclude your changes from my > > discussion; you changed the docs because the code never did what they > > said they did (the docs were bad). That's fine (good!). My comments were > > a general point, slightly directed at the idea of changing all the > > return undef;s - changing the code so that it no longer matches the docs > > of a previously working method. That's what I think is bad. Though in > > this particular case it shouldn't make any difference at all. > > Agreed. In any case, if tests have been properly set up then they should > catch problems. This is, of course, if they are properly set up. > > Chris > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From sb at mrc-dunn.cam.ac.uk Fri Jun 2 05:04:18 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Fri, 02 Jun 2006 10:04:18 +0100 Subject: [Bioperl-l] remoteblast xml problem In-Reply-To: <447F4F29.9070600@gmx.at> References: <447F4F29.9070600@gmx.at> Message-ID: <447FFF12.506@mrc-dunn.cam.ac.uk> Hubert Prielinger wrote: > hi, > I have the following program and it worked quite well, for retrieving > remoteblast results in a textfile, > now I have altered it to to xml, and it didn't work anymore..... 
> it takes all the parameter at the commandline, submits the query, but I
> don't retrieve any results file anymore.....
>
> it seems that it hangs in a endless loop......
> the only output I get is: $rc is not a ref! over and over..... it
> doesn't enter the else term anymore....

There is no problem with your code. The problem is with the NCBI server
and should be reported to them. You can visit the site and do a blast,
requesting xml format, and you will typically get one normal 'waiting'
message and the promise that it will be updated in x seconds, but
subsequent attempts to get progress information result in an xml error
page because the NCBI server doesn't actually send any data.

Unfortunately the way that the bioperl code is written, it treats no
data as 'waiting' instead of an error. I've offered a patch to fix this
at this bug page:
http://bugzilla.bioperl.org/show_bug.cgi?id=2015

From cjfields at uiuc.edu Fri Jun 2 10:30:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 09:30:18 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <447FFF12.506@mrc-dunn.cam.ac.uk>
Message-ID: <001a01c68651$12925250$15327e82@pyrimidine>

Sendu, Hubert,

Hubert, your code looks fine, so Sendu's patch should fix the problem (break
out of that infinite loop). I applied Sendu's patch to RemoteBlast in CVS;
it passed all tests in RemoteBlast.t. Try updating from CVS to see if it
works.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
> Sent: Friday, June 02, 2006 4:04 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] remoteblast xml problem
>
> There is no problem with your code. The problem is with the NCBI server
> and should be reported to them. You can visit the site and do a blast,
> requesting xml format, and you will typically get one normal 'waiting'
> message and the promise that it will be updated in x seconds, but
> subsequent attempts to get progress information result in an xml error
> page because the NCBI server doesn't actually send any data.
>
> Unfortunately the way that the bioperl code is written, it treats no
> data as 'waiting' instead of an error. I've offered a patch to fix this
> at this bug page:
> http://bugzilla.bioperl.org/show_bug.cgi?id=2015

From cjfields at uiuc.edu Fri Jun 2 15:13:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 14:13:31 -0500
Subject: [Bioperl-l] Bio::AlignIO::metafasta tests
Message-ID: <000301c68678$a3cdaa40$15327e82@pyrimidine>

Heikki,

I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up
when running AlignIO.t (I was fixing bug 2000):

http://bugzilla.open-bio.org/show_bug.cgi?id=2016

Not sure what's going on there, but using read_aln and write_aln seems to
work normally. It may have something to do with Bio::SimpleAlign but I'm not
absolutely sure. Any ideas what may be going on here?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept.
of Biochemistry
University of Illinois Urbana-Champaign

From hubert.prielinger at gmx.at Fri Jun 2 17:11:41 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 15:11:41 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <001a01c68651$12925250$15327e82@pyrimidine>
References: <001a01c68651$12925250$15327e82@pyrimidine>
Message-ID: <4480A98D.6010501@gmx.at>

hi,
sorry, but I have updated the remoteblast module and I have run several
attempts with the same results as before. It didn't work.
I didn't get any results.

regards
Hubert

Chris Fields wrote:
> Sendu, Hubert,
>
> Hubert, your code looks fine, so Sendu's patch should fix the problem (break
> out of that infinite loop). I applied Sendu's patch to RemoteBlast in CVS;
> it passed all tests in RemoteBlast.t. Try updating from CVS to see if it
> works.
>
> Chris

From cjfields at uiuc.edu Fri Jun 2 17:54:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 16:54:20 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480A98D.6010501@gmx.at>
Message-ID: <000001c6868f$1b68dbe0$15327e82@pyrimidine>

Hubert,

Could you post this on bugzilla with your script and test data so I can try
to replicate your error? I may not get to it until Monday.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
> Sent: Friday, June 02, 2006 4:12 PM
> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields; 'Sendu Bala'
> Subject: Re: [Bioperl-l] remoteblast xml problem
>
> hi,
> sorry, but I have updated the remoteblast module and I have run several
> attempts with the same results as before. It didn't work.
> I didn't get any results.
>
> regards
> Hubert

From hubert.prielinger at gmx.at Fri Jun 2 19:19:40 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 17:19:40 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <000001c68691$8c4eeb40$15327e82@pyrimidine>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine>
Message-ID: <4480C78C.1000701@gmx.at>

hi,
I have submitted the bug -> Bug 2017
with the script and input file, just start it from the command line

thank you very much
greetings

Hubert

Chris Fields wrote:
> Hubert,
>
> I have a script that's using blastxml and XML output which seems to work.
> I'll try looking at it to get a better idea this weekend.
>
> Chris

From cjfields at uiuc.edu Fri Jun 2 20:33:48 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 19:33:48 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480C78C.1000701@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine> <4480C78C.1000701@gmx.at>
Message-ID:

You need to add the input conditions as well (you have several
lines which may play a role; I would like to know what you normally enter
for those).

How long did you let the script run? I ran a quick check on your
sequences; you have almost 1600, so you have to expect that you'll run
into some problems here! Most here (including me) would suggest you
try installing a local blast setup for something like this.

Chris

On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote:

> hi,
> I have submitted the bug -> Bug 2017
> with the script and input file, just start it from the command line
>
> thank you very much
> greetings
>
> Hubert

Christopher Fields
Postdoctoral Researcher
Lab of Dr.
Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From hubert.prielinger at gmx.at Fri Jun 2 20:49:15 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 18:49:15 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To:
References: <000001c68691$8c4eeb40$15327e82@pyrimidine> <4480C78C.1000701@gmx.at>
Message-ID: <4480DC8B.7070005@gmx.at>

hi,
input database: swissprot
matrix: pam30
count: 1
gapcosts: 9 1

I know that there are a lot of sequences, but that doesn't matter: you can
delete all of them except one. The number of sequences is not the problem;
the script reads one line and submits it, then the second line, and so on.
I have also tried it with only one sequence and got the same result. That
time the script ran for more than 20 minutes, and that should be enough
time to retrieve the results for ONE sequence, I guess.

regards
Hubert

Chris Fields wrote:
> You need to add the input conditions as well (you have several
> lines which may play a role; I would like to know what you normally
> enter for those).
>
> How long did you let the script run? I ran a quick check on your
> sequences; you have almost 1600, so you have to expect that you'll run
> into some problems here! Most here (including me) would suggest you
> try installing a local blast setup for something like this.
>
> Chris

From cjfields at uiuc.edu Fri Jun 2 20:57:37 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 2 Jun 2006 19:57:37 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <4480DC8B.7070005@gmx.at>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine> <4480C78C.1000701@gmx.at> <4480DC8B.7070005@gmx.at>
Message-ID: <573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>

Yes, I see the same error you do. But I have a similar script
(blastp, XML blast report, XML parsing, similar loop structure) that
works fine. I'm trying to dissect the problem but I think it may be
something logically wrong here (something not so obvious) and not a
bug...

What I'm trying to say is, when you send sequences using remoteblast
like this, you are essentially spamming the NCBI BLAST server with
~1600 requests. This script wasn't set up with that intent in mind;
you should really try to set up your own local blast database if
possible. If you can't, try running this script in off-hours
(10pm-6am EST or something like that).

Chris

On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote:

> hi,
> input database: swissprot
> matrix: pam30
> count: 1
> gapcosts: 9 1
>
> I know that there are a lot of sequences, but that doesn't matter: you can
> delete all of them except one. The number of sequences is not the problem;
> the script reads one line and submits it, then the second line, and so on.
> I have also tried it with only one sequence and got the same result. That
> time the script ran for more than 20 minutes, and that should be enough
> time to retrieve the results for ONE sequence, I guess.
>
> regards
> Hubert

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From hubert.prielinger at gmx.at Fri Jun 2 21:36:42 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Fri, 02 Jun 2006 19:36:42 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine> <4480C78C.1000701@gmx.at> <4480DC8B.7070005@gmx.at> <573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu>
Message-ID: <4480E7AA.3020603@gmx.at>

hi chris,
thanks, but I never intended to run the remoteblast with so many, only a
few of them. Actually my goal is to run phiblast with a regular
expression, so that I just don't need that file anymore.

another question, about parsing the xml output: is there an xml parser
available for blast xml output, or how do I start?
I have looked at the wiki and at Bio::SearchIO::blastxml on CPAN, but I'm
not sure how to start... sorry, I guess I'm too stupid. Is there maybe
another introduction or an example?

thanks
Hubert

Chris Fields wrote:
> Yes, I see the same error you do. But I have a similar script
> (blastp, XML blast report, XML parsing, similar loop structure) that
> works fine. I'm trying to dissect the problem but I think it may be
> something logically wrong here (something not so obvious) and not a
> bug...
>
> What I'm trying to say is, when you send sequences using remoteblast
> like this, you are essentially spamming the NCBI BLAST server with
> ~1600 requests. This script wasn't set up with that intent in mind;
> you should really try to set up your own local blast database if
> possible. If you can't, try running this script in off-hours
> (10pm-6am EST or something like that).
>
> Chris
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr.
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From cjfields at uiuc.edu Sat Jun 3 00:35:21 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 2 Jun 2006 23:35:21 -0500 Subject: [Bioperl-l] remoteblast xml problem In-Reply-To: <4480E7AA.3020603@gmx.at> References: <000001c68691$8c4eeb40$15327e82@pyrimidine> <4480C78C.1000701@gmx.at> <4480DC8B.7070005@gmx.at> <573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu> <4480E7AA.3020603@gmx.at> Message-ID: <720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu> On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote: > hi chris, > thanks but I never intended to run the remoteblast with so much, > only a few of them, acutally I goal is to run the phiblast with > regular expression, so that i just don't need that > file anymore Not a problem. Just to let you know, I did manage to get the script working, so I'm marking the bug INVALID. I think the problem isn't that there is an infinite loop so much as setting composition-based statistics causes the search to take much much longer; try removing that line to see what I mean. Just so you know, using $result->query_name doesn't get you what you would expect (it gives you a part of the RID, which you don't want; this is something in the XML output that is beyond our control). You might want to change it to something else or you'll get filenames with numerical names. > another question for parsing the xml output....is there a xml > parser available for blast xml output or how to start..... > I have looked up at the wikiperl and cpan Bio::SearchIO::blastxml, > but I'm not sure how to start....sorry, I guess I'm too stupid.... > is their maybe another introduction or an example. Bio::SearchIO objects are used to parse BLAST XML output if you have it saved to a file. 
For instance:

    use Bio::SearchIO;

    my $factory = Bio::SearchIO->new(-file   => $file,
                                     -format => 'blastxml');
    while (my $result = $factory->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                # do stuff here
            }
        }
    }

The only thing that changes between parsing a text BLAST report and an XML BLAST report is the -format line (similar to the -readmethod parameter in RemoteBlast). You shouldn't need to look up any more documentation other than these on the wiki:

http://www.bioperl.org/wiki/HOWTO:SearchIO
http://www.bioperl.org/wiki/Module:Bio::SearchIO
http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml

Pay attention to the fact you'll need to install XML::SAX (CPAN) and that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding up parsing.

Chris

> thanks > Hubert > > > Chris Fields wrote: >> Yes, I see the same error you do. But I have a similar script >> (blastp, XML blast report, XML parsing, similar loop structure) >> that works fine. I'm trying to dissect the problem but I think >> it may be something logically wrong here (something not so >> obvious) and not a bug... >> >> What I'm trying to say is, when you send sequences using >> remoteblast like, this you are essentially spamming the NCBI >> BLAST server with ~1600 requests. This script wasn't set up with >> that intent in mind; you should really try to set up your own >> local blast database if possible. If you can't, try running this >> script in off-hours (10pm-6am EST or something like that). >> >> >> Chris >> >> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote: >> >> >>> hi, >>> input database: swissprot >>> matrix: pam30 >>> count: 1 >>> gapcosts: 9 1 >>> >>> I know that there are a lot of sequences, but that doesn't >>> matter, you can delete all of them except one, the amount of the >>> sequences is not the problem, the script reads one line and >>> submits it.....then the second line and so on.....I have tried >>> it with only one sequence either and I got the same result....
>>> the script run at that time for more than 20 >>> minutes!!!!!! .....and that should be enough time to retrieve >>> the results for ONE sequence, I guess >>> >>> regards >>> Hubert >>> >>> >>> >>> Chris Fields wrote: >>> >>>> You need to add the input conditions as well (you have several >>>> lines which may play a role; I would like to know what >>>> you normally enter for those). >>>> >>>> How long did you let the script run? I ran a quick check on >>>> your sequences; you have almost 1600, so you have to expect >>>> that you'll run into some problems here! Most here (including >>>> me) would suggest you try installing a local blast setup for >>>> something like this. >>>> >>>> Chris >>>> >>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote: >>>> >>>> >>>>> hi, >>>>> I have submitted the bug -> Bug 2017 >>>>> with the script and input file, just start it from command line >>>>> >>>>> thank you very much >>>>> greetings >>>>> >>>>> Hubert >>>>> >>>>> Chris Fields wrote: >>>>> >>>>>> Hubert, >>>>>> >>>>>> I have a script that's using blastxml and XML output which >>>>>> seems to work. >>>>>> I'll try looking at it to get a better idea this weekend. >>>>>> >>>>>> Chris >>>>>> >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >>>>>>> Sent: Friday, June 02, 2006 4:12 PM >>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields; >>>>>>> 'Sendu Bala' >>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem >>>>>>> >>>>>>> hi, >>>>>>> sorry, but I have updated the remoteblast module and I have >>>>>>> run several >>>>>>> attempts with the same results as before. It didn't work. >>>>>>> I didn't get any results. 
>>>>>>> >>>>>>> regards >>>>>>> Hubert >>>>>>> >>>>>>> >>>>>>> Chris Fields wrote: >>>>>>> >>>>>>> >>>>>>>> Sendu, Hubert, >>>>>>>> >>>>>>>> >>>>>>>> Hubert, your code looks fine so Sendu's patch should fix >>>>>>>> the problem >>>>>>>> >>>>>>>> >>>>>>> (break >>>>>>> >>>>>>> >>>>>>>> out of that infinite loop). I applied Sendu's patch to >>>>>>>> RemoteBlast in >>>>>>>> >>>>>>>> >>>>>>> CVS; >>>>>>> >>>>>>> >>>>>>>> it passed all tests in RemoteBlast.t. Try updating from >>>>>>>> CVS to see if >>>>>>>> >>>>>>>> >>>>>>> it >>>>>>> >>>>>>> >>>>>>>> works. >>>>>>>> >>>>>>>> Chris >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala >>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM >>>>>>>>> To: bioperl-l at lists.open-bio.org >>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem >>>>>>>>> >>>>>>>>> Hubert Prielinger wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> hi, >>>>>>>>>> I have the following program and it worked quite well, >>>>>>>>>> for retrieving >>>>>>>>>> remoteblast results in a textfile, >>>>>>>>>> now I have altered it to to xml, and it didn't work >>>>>>>>>> anymore..... >>>>>>>>>> it takes all the parameter at the commandline, submits >>>>>>>>>> the query, but >>>>>>>>>> >>>>>>>>>> >>>>>>> I >>>>>>> >>>>>>> >>>>>>>>>> don't retrieve any results file anymore..... >>>>>>>>>> >>>>>>>>>> it seems that it hangs in a endless loop...... >>>>>>>>>> the only output I get is: $rc is not a ref! over and >>>>>>>>>> over..... it >>>>>>>>>> doesn't enter the else term anymore.... >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> There is no problem with your code. The problem is with >>>>>>>>> the NCBI server >>>>>>>>> and should be reported to them. 
You can visit the site and >>>>>>>>> do a blast, >>>>>>>>> requesting xml format, and you will typically get one >>>>>>>>> normal 'waiting' >>>>>>>>> message and the promise that it will be updated in x >>>>>>>>> seconds, but >>>>>>>>> subsequent attempts to get progress information result in >>>>>>>>> an xml error >>>>>>>>> page because the NCBI server doesn't actually send any data. >>>>>>>>> >>>>>>>>> Unfortunately the way that the bioperl code is written, it >>>>>>>>> treats no >>>>>>>>> data as 'waiting' instead of an error. I've offered a >>>>>>>>> patch to fix this >>>>>>>>> at this bug page: >>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015 >>>>>>>>> _______________________________________________ >>>>>>>>> Bioperl-l mailing list >>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> Christopher Fields >>>> Postdoctoral Researcher >>>> Lab of Dr. Robert Switzer >>>> Dept of Biochemistry >>>> University of Illinois Urbana-Champaign >>>> >>>> >>>> >>>> >>>> >> >> Christopher Fields >> Postdoctoral Researcher >> Lab of Dr. 
Robert Switzer >> Dept of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> >

Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign

From jason.stajich at duke.edu Sat Jun 3 11:10:51 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sat, 3 Jun 2006 11:10:51 -0400 Subject: [Bioperl-l] Bio::Tree::IO "Collapse" function In-Reply-To: <1149084373.447da2d5c5339@128.91.55.38> References: <1149019912.447ca7085124e@128.91.55.38> <6B175FC0-F9D4-4658-AF9D-23D7F1C1B241@duke.edu> <1149084373.447da2d5c5339@128.91.55.38> Message-ID: <9206E0B2-15DC-4AB2-B71B-5EA9D1D11AEC@duke.edu>

The bootstrap is stored as the node ID because that is a limitation of the newick format: there isn't a formal way to distinguish internal IDs from bootstraps. There are several different ways that programs encode the internal ID and a bootstrap value in that one slot. We try to parse it out if the bootstrap is stored in brackets, like INTERNALID[BOOTSTRAP]. Formats like nhx explicitly solve this problem, but most programs only use the simple newick. If you know your data, it is a simple procedure to move the internal ID data into the bootstrap slot.

In terms of ignoreoverwrite, you just need to send in a second parameter which is true:

    $node->add_Descendent($childnode, 1);

-jason

On May 31, 2006, at 10:06 AM, Lucia Peixoto wrote: > Hi > Thanks > a couple more questions > why is the bootstrap value stored as the node id? Is that right? > > also, in the add_descendant method, how do you set the > $ignoreoverwrite > parameter to true? > > Lucia > > Quoting Jason Stajich : > >> you need to special case the root - it won't have an ancestor.
just >> protect the my $parent = $node->ancestor with an if statement as I >> did below >> >> On May 30, 2006, at 4:11 PM, Lucia Peixoto wrote: >> >>> Hi >>> OK that was silly, but what I have in my code is what you just wrote >>> But the problem is that if I write >>> >>> $parent->add_Descendent($child) >>> >>> it tells me that I am calling the method "ass_Descendent" on an >>> undefined value >>> (but I did define $parent before??) >>> >>> So here it goes the code so far: >>> >>> use Bio::TreeIO; >>> my $in = new Bio::TreeIO(-file => 'Test2.tre', >>> -format => 'newick'); >>> my $out = new Bio::TreeIO(-file => '>mytree.out', >>> -format => 'newick'); >>> while( my $tree = $in->next_tree ) { >>> foreach my $node ( grep { ! $_->is_Leaf() } $tree->get_nodes >>> () ) { >>> my $bootstrap=$node->_creation_id; >>> >>> if ($bootstrap < 70 ){ >>>>>> if( my $parent = $node->ancestor ) { >>> my @children=$node->get_all_Descendents; >>> foreach my $child (@children){ >>> $parent->add_Descendent($child); >>> } >> } >>> >>> ........ 
>>> >>> eventually I'll add (once I assigned the children to the parent >>> succesfully): >>> $tree->remove_Node($node); >>> >>> } >>> } >>> $out->write_tree($tree); >>> } >>> >>> Quoting aaron.j.mackey at gsk.com: >>> >>>>> foreach $child (@children){ >>>>> $parent=add_Descendent->$child; >>>>> } >>>> >>>> I think what you want is $parent->add_Descendent($child) >>>> >>>> -Aaron >>>> >>> >>> >>> Lucia Peixoto >>> Department of Biology,SAS >>> University of Pennsylvania >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> > > > Lucia Peixoto > Department of Biology,SAS > University of Pennsylvania

-- Jason Stajich Duke University http://www.duke.edu/~jes12

From jason.stajich at duke.edu Sat Jun 3 11:29:31 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sat, 3 Jun 2006 11:29:31 -0400 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <447C7985.9000404@cornell.edu> References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu> Message-ID: <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>

You can get all the hits or HSPs with the following methods:

    my @hits = $result->hits;
    my @hsps = $hit->hsps;

You can also reset the counter, since these implementations are in-memory and already parsed (and not a stream processor per se). next_XX just iterates through the list stored in the parent object.

    $result->rewind;

and

    $hit->rewind;

For example, rewind needs to be called if you want to use a ResultWriter object and filter some of the values for the final writing after first inspecting them.

-jason

On May 30, 2006, at 12:57 PM, Genevieve DeClerck wrote: > Thanks for your comment Sendu, it was very helpful. I think this > must be > what's going on..
I am using $blast_report->next_result in both > subroutines. It appears that analyzing the blast results first w/ my > sort subroutine empties (?) the $blast_result object so that when I > try > to print, there is nothing left to print. (and visa-versa when I print > first then try to sort). > So, from the looks of things, using next_result has the effect of > popping the Bio::Search::Result::ResultI objects off of the SearchIO > blast report object?? > > It seems I could get around this by making a copy of the blast > report by > setting it to another new variable...(not the most elegant > solution) but > I'm having trouble with this... > > If I do: > > my $blast_report_copy = $blast_report; > > I'm just copying the reference to the SearchIO blast result, so it > doesn't help me. How can I make another physical copy of this blast > result object? Seems like a simple thing but how to do it is > escaping me. > > But better yet, the way to go is to 'reset the counter,' or to find a > way to look at/print/sort the results without removing data from the > blast result object. How is this done though?? > > Sendu and Brian, I didn't post the sort_results subroutine because > it is > sprawling, as is a lot of my code. The code I provided was more > like an > aid for my explanation of the problem.. it doesn't actually run - > sorry > for the confusion, I should have more clear on that. The important > thing to know perhaps is that both sort_results and > print_blast_results > contain a foreach loop where I am using the 'next_results' method to > view blast results. (And to clarify for Torsten, the blastall() is > working just fine - the analysis/viewing of the results object is > where > I am encountering the problem.) > > > Any other ideas would be greatly appreciated... 
> > Thank you, > Genevieve > > > > > Sendu Bala wrote: > >> Genevieve DeClerck wrote: >> >>> Hi, >> >> [snip] >> >>> If I've sorted the results the sorted-results will print to screen, >>> however when I try to print the Hit Table results nothing is >>> returned, >>> as if the blast results have evaporated.... and visa versa, if i >>> comment out the part where i point my sorting subroutine to the >>> blast >>> results reference, my hit table results suddenly prints to screen. >> >> [snip] >> >>> Here's an abbreviated version of my code: >> >> [snip] >> >>> ####### >>> ### the following 2 actions seem to be mutually exclusive. >>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of >>> # SeqFeature objs stored in arrays. arrays are then printed >>> # to stdout >>> &sort_results($blast_report); >>> >>> # 2) print blast results >>> &print_blast_results($blast_report); >> >> >>> sub print_blast_results{ >>> my $report = shift; >>> while(my $result = $report->next_result()){ >> >> [snip] >> >> You didn't give us your sort_results subroutine, but is it as >> simple as >> they both use $report->next_result (and/or $result->next_hit), but >> you >> don't reset the internal counter back to the start, so the second >> subroutine tries to get the next_result and finds the first >> subroutine >> has already looked at the last result and so next_result returns >> false? >> >> From a quick look it wasn't obvious how to reset the counter. >> Hopefully >> this can be done and someone else knows how. 
>> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Sat Jun 3 15:13:22 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 3 Jun 2006 14:13:22 -0500 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu> References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu> <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu> Message-ID: Nice! Didn't know I could do that. Maybe we should add some of this to the HOWTO (or is it already in there?). Chris On Jun 3, 2006, at 10:29 AM, Jason Stajich wrote: > you can get all the Hits or hsps with the following method: > my @hits = $result->hits; > my @hsps = $hit->hsps; > > > You can also reset the counter since these implementations are in- > memory and already parsed (and not a stream processor per se). > next_XX just iterates through the list stored in the parent object. > > $result->rewind; > > and > > $hit->rewind; > > > For example, the rewind needs to be called if you want to use a > ResultWriter object and filter some of the values for the final > writing after first inspecting them. > > -jason > > > On May 30, 2006, at 12:57 PM, Genevieve DeClerck wrote: > >> Thanks for your comment Sendu, it was very helpful. I think this >> must be >> what's going on.. I am using $blast_report->next_result in both >> subroutines. It appears that analyzing the blast results first w/ my >> sort subroutine empties (?) the $blast_result object so that when I >> try >> to print, there is nothing left to print. (and visa-versa when I >> print >> first then try to sort). 
>> So, from the looks of things, using next_result has the effect of >> popping the Bio::Search::Result::ResultI objects off of the SearchIO >> blast report object?? >> >> It seems I could get around this by making a copy of the blast >> report by >> setting it to another new variable...(not the most elegant >> solution) but >> I'm having trouble with this... >> >> If I do: >> >> my $blast_report_copy = $blast_report; >> >> I'm just copying the reference to the SearchIO blast result, so it >> doesn't help me. How can I make another physical copy of this blast >> result object? Seems like a simple thing but how to do it is >> escaping me. >> >> But better yet, the way to go is to 'reset the counter,' or to find a >> way to look at/print/sort the results without removing data from the >> blast result object. How is this done though?? >> >> Sendu and Brian, I didn't post the sort_results subroutine because >> it is >> sprawling, as is a lot of my code. The code I provided was more >> like an >> aid for my explanation of the problem.. it doesn't actually run - >> sorry >> for the confusion, I should have more clear on that. The important >> thing to know perhaps is that both sort_results and >> print_blast_results >> contain a foreach loop where I am using the 'next_results' method to >> view blast results. (And to clarify for Torsten, the blastall() is >> working just fine - the analysis/viewing of the results object is >> where >> I am encountering the problem.) >> >> >> Any other ideas would be greatly appreciated... >> >> Thank you, >> Genevieve >> >> >> >> >> Sendu Bala wrote: >> >>> Genevieve DeClerck wrote: >>> >>>> Hi, >>> >>> [snip] >>> >>>> If I've sorted the results the sorted-results will print to screen, >>>> however when I try to print the Hit Table results nothing is >>>> returned, >>>> as if the blast results have evaporated.... 
and visa versa, if i >>>> comment out the part where i point my sorting subroutine to the >>>> blast >>>> results reference, my hit table results suddenly prints to screen. >>> >>> [snip] >>> >>>> Here's an abbreviated version of my code: >>> >>> [snip] >>> >>>> ####### >>>> ### the following 2 actions seem to be mutually exclusive. >>>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of >>>> # SeqFeature objs stored in arrays. arrays are then printed >>>> # to stdout >>>> &sort_results($blast_report); >>>> >>>> # 2) print blast results >>>> &print_blast_results($blast_report); >>> >>> >>>> sub print_blast_results{ >>>> my $report = shift; >>>> while(my $result = $report->next_result()){ >>> >>> [snip] >>> >>> You didn't give us your sort_results subroutine, but is it as >>> simple as >>> they both use $report->next_result (and/or $result->next_hit), but >>> you >>> don't reset the internal counter back to the start, so the second >>> subroutine tries to get the next_result and finds the first >>> subroutine >>> has already looked at the last result and so next_result returns >>> false? >>> >>> From a quick look it wasn't obvious how to reset the counter. >>> Hopefully >>> this can be done and someone else knows how. >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From jason.stajich at duke.edu Sat Jun 3 15:31:59 2006 From: jason.stajich at duke.edu (Jason Stajich) Date: Sat, 3 Jun 2006 15:31:59 -0400 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu> <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu> Message-ID: In the HOWTO hits() and hsps() were there, I just added rewind in the table of methods. If someone wanted to write a little section in the HOWTO about resetting the iterator that would be great. -jason On Jun 3, 2006, at 3:13 PM, Chris Fields wrote: > Nice! Didn't know I could do that. Maybe we should add some of this > to the HOWTO (or is it already in there?). > > Chris > > On Jun 3, 2006, at 10:29 AM, Jason Stajich wrote: > >> you can get all the Hits or hsps with the following method: >> my @hits = $result->hits; >> my @hsps = $hit->hsps; >> >> >> You can also reset the counter since these implementations are in- >> memory and already parsed (and not a stream processor per se). >> next_XX just iterates through the list stored in the parent object. >> >> $result->rewind; >> >> and >> >> $hit->rewind; >> >> >> For example, the rewind needs to be called if you want to use a >> ResultWriter object and filter some of the values for the final >> writing after first inspecting them. >> >> -jason >> >> >> On May 30, 2006, at 12:57 PM, Genevieve DeClerck wrote: >> >>> Thanks for your comment Sendu, it was very helpful. I think this >>> must be >>> what's going on.. I am using $blast_report->next_result in both >>> subroutines. It appears that analyzing the blast results first w/ my >>> sort subroutine empties (?) the $blast_result object so that when I >>> try >>> to print, there is nothing left to print. (and visa-versa when I >>> print >>> first then try to sort). 
>>> So, from the looks of things, using next_result has the effect of >>> popping the Bio::Search::Result::ResultI objects off of the SearchIO >>> blast report object?? >>> >>> It seems I could get around this by making a copy of the blast >>> report by >>> setting it to another new variable...(not the most elegant >>> solution) but >>> I'm having trouble with this... >>> >>> If I do: >>> >>> my $blast_report_copy = $blast_report; >>> >>> I'm just copying the reference to the SearchIO blast result, so it >>> doesn't help me. How can I make another physical copy of this blast >>> result object? Seems like a simple thing but how to do it is >>> escaping me. >>> >>> But better yet, the way to go is to 'reset the counter,' or to >>> find a >>> way to look at/print/sort the results without removing data from the >>> blast result object. How is this done though?? >>> >>> Sendu and Brian, I didn't post the sort_results subroutine because >>> it is >>> sprawling, as is a lot of my code. The code I provided was more >>> like an >>> aid for my explanation of the problem.. it doesn't actually run - >>> sorry >>> for the confusion, I should have more clear on that. The important >>> thing to know perhaps is that both sort_results and >>> print_blast_results >>> contain a foreach loop where I am using the 'next_results' method to >>> view blast results. (And to clarify for Torsten, the blastall() is >>> working just fine - the analysis/viewing of the results object is >>> where >>> I am encountering the problem.) >>> >>> >>> Any other ideas would be greatly appreciated... >>> >>> Thank you, >>> Genevieve >>> >>> >>> >>> >>> Sendu Bala wrote: >>> >>>> Genevieve DeClerck wrote: >>>> >>>>> Hi, >>>> >>>> [snip] >>>> >>>>> If I've sorted the results the sorted-results will print to >>>>> screen, >>>>> however when I try to print the Hit Table results nothing is >>>>> returned, >>>>> as if the blast results have evaporated.... 
and visa versa, if i >>>>> comment out the part where i point my sorting subroutine to the >>>>> blast >>>>> results reference, my hit table results suddenly prints to >>>>> screen. >>>> >>>> [snip] >>>> >>>>> Here's an abbreviated version of my code: >>>> >>>> [snip] >>>> >>>>> ####### >>>>> ### the following 2 actions seem to be mutually exclusive. >>>>> # 1) sort results into 1-hitter, 2-hitter, etc. groups of >>>>> # SeqFeature objs stored in arrays. arrays are then printed >>>>> # to stdout >>>>> &sort_results($blast_report); >>>>> >>>>> # 2) print blast results >>>>> &print_blast_results($blast_report); >>>> >>>> >>>>> sub print_blast_results{ >>>>> my $report = shift; >>>>> while(my $result = $report->next_result()){ >>>> >>>> [snip] >>>> >>>> You didn't give us your sort_results subroutine, but is it as >>>> simple as >>>> they both use $report->next_result (and/or $result->next_hit), but >>>> you >>>> don't reset the internal counter back to the start, so the second >>>> subroutine tries to get the next_result and finds the first >>>> subroutine >>>> has already looked at the last result and so next_result returns >>>> false? >>>> >>>> From a quick look it wasn't obvious how to reset the counter. >>>> Hopefully >>>> this can be done and someone else knows how. >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From torsten.seemann at infotech.monash.edu.au Sat Jun 3 19:54:20 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Sun, 04 Jun 2006 09:54:20 +1000 Subject: [Bioperl-l] remoteblast xml problem In-Reply-To: <4480E7AA.3020603@gmx.at> References: <000001c68691$8c4eeb40$15327e82@pyrimidine> <4480C78C.1000701@gmx.at> <4480DC8B.7070005@gmx.at> <573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu> <4480E7AA.3020603@gmx.at> Message-ID: <4482212C.3000908@infotech.monash.edu.au> Hubert, > another question for parsing the xml output....is there a xml parser > available for blast xml output or how to start..... > I have looked up at the wikiperl and cpan Bio::SearchIO::blastxml, but > I'm not sure how to start....sorry, I guess I'm too stupid.... > is their maybe another introduction or an example. 

I think we already answered this question for you on 20 May 2006:

http://bioperl.org/pipermail/bioperl-l/2006-May/021574.html
http://www.bioperl.org/wiki/ListSummary:May_10-31%2C2006#How_to_parse_BLAST_XML_output
http://www.bioperl.org/wiki/HOWTO:SearchIO (search for "blastxml")

--Torsten Seemann

From cjfields at uiuc.edu Sun Jun 4 01:17:46 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Jun 2006 00:17:46 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To:
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu> <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu>
Message-ID: <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>

There's an interesting addition to this I found while checking this out; it looks like if you use:

my @hits = $result->hits;

to get all the hits, you don't need to use '$result->rewind'. The rewind method resets the iterator for the hit list back to the beginning, but using the hits method to grab all the hits doesn't use the iterator at all. This works either pre- or post-iteration through the Hit::BlastHit objects.

Another thing: Genevieve was passing the SearchIO report object (i.e. the parser object which was returned from StandAloneBlast, $blast_report) to the methods, not the Bio::Search::Result::BlastResult object. It looks like there was some confusion between the two object types, since she refers to the report as the result object when it's actually the SearchIO parser object. So, once the parser was passed into the first method, a result object was generated, then destroyed. When entering the second method, the parser had already parsed the report and generated the objects, so it ended with no output.

Though passing the BlastResult object is better, since one should only have to parse the report once and use the objects, for curiosity's sake: is there a method to rewind the parser itself (in other words, read through the report again)?
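
The difference between list access and iterator access described above can be sketched without BioPerl installed. This is only an illustration: ToyResult is a hypothetical stand-in written for this note, not the real Bio::Search::Result::BlastResult class, and it mimics just the three methods named in the message (hits, next_hit, rewind).

```perl
use strict;
use warnings;

# ToyResult is a hypothetical stand-in for a parsed result object.
# It is NOT a BioPerl class; it only mimics hits(), next_hit() and rewind().
package ToyResult;

sub new {
    my ($class, @hits) = @_;
    return bless { hits => [@hits], pos => 0 }, $class;
}

# Return the whole list; the iterator position is not touched.
sub hits {
    my $self = shift;
    return @{ $self->{hits} };
}

# Return the next hit and advance, or nothing once the list is exhausted.
sub next_hit {
    my $self = shift;
    return if $self->{pos} >= @{ $self->{hits} };
    return $self->{hits}[ $self->{pos}++ ];
}

# Put the iterator back at the first hit.
sub rewind {
    my $self = shift;
    $self->{pos} = 0;
    return;
}

package main;

my $result = ToyResult->new(qw(hitA hitB hitC));

# A first pass with the iterator consumes it.
my @first_pass;
while ( defined( my $hit = $result->next_hit ) ) {
    push @first_pass, $hit;
}

# A second iterator call finds nothing left ...
my $after_exhaustion = $result->next_hit;

# ... but the list accessor still returns everything,
my @all_hits = $result->hits;

# and rewind makes the iterator usable again.
$result->rewind;
my $after_rewind = $result->next_hit;

print "first pass:   @first_pass\n";
print "list access:  @all_hits\n";
print "after rewind: $after_rewind\n";
```

The same pattern would apply one level down (HSPs within a hit), assuming the real objects behave as the message describes.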

Chris

On Jun 3, 2006, at 2:31 PM, Jason Stajich wrote:

> In the HOWTO hits() and hsps() were there, I just added rewind in the
> table of methods.
> If someone wanted to write a little section in the HOWTO about
> resetting the iterator that would be great.
>
> -jason
> On Jun 3, 2006, at 3:13 PM, Chris Fields wrote:
>
>> Nice! Didn't know I could do that. Maybe we should add some of this
>> to the HOWTO (or is it already in there?).
>>
>> Chris
>>
>> On Jun 3, 2006, at 10:29 AM, Jason Stajich wrote:
>>
>>> you can get all the Hits or hsps with the following method:
>>> my @hits = $result->hits;
>>> my @hsps = $hit->hsps;
>>>
>>> You can also reset the counter since these implementations are
>>> in-memory and already parsed (and not a stream processor per se).
>>> next_XX just iterates through the list stored in the parent object.
>>>
>>> $result->rewind;
>>>
>>> and
>>>
>>> $hit->rewind;
>>>
>>> For example, the rewind needs to be called if you want to use a
>>> ResultWriter object and filter some of the values for the final
>>> writing after first inspecting them.
>>>
>>> -jason
>>>
>>> ...

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From jason.stajich at duke.edu Sun Jun 4 10:08:29 2006
From: jason.stajich at duke.edu (Jason Stajich)
Date: Sun, 4 Jun 2006 10:08:29 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu> <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu> <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
Message-ID:

right - you don't need rewind if you aren't going to use the iterator (next_XXX) -- we provide two different ways to get access to the data.

you can do

for my $hit ( $result->hits ) {
}

or

while( my $hit = $result->next_hit ) {
}

If you want to rewind the parser then (assuming you are using a filestream and not a data stream from the web or zcat or something) just reset the filehandle

seek($searchio->_fh, 0, 0);

but then you'll have to re-parse everything and pay that cost twice - it makes more sense to me to just save the results and put them in a list if you are going to deliberately make two passes over all the results. You either pay the cost of memory (keeping all the objects) or time (reparse the results).
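
The "pay memory instead of time" option above might look roughly like the sketch below. It is an illustration only: ToyParser is a hypothetical one-shot parser invented for this note (its next_result consumes its input, as a stream parser would), and count_results and join_results are made-up consumers standing in for the two subroutines from the original question.

```perl
use strict;
use warnings;

# ToyParser is a hypothetical one-shot parser, not a BioPerl class:
# each call to next_result() consumes input that cannot be read again.
package ToyParser;

sub new {
    my ($class, @records) = @_;
    return bless { queue => [@records] }, $class;
}

sub next_result {
    my $self = shift;
    return shift @{ $self->{queue} };    # undef once the queue is empty
}

package main;

# Both consumers take the saved list, not the parser.
sub count_results { my ($results) = @_; return scalar @$results; }
sub join_results  { my ($results) = @_; return join ',', @$results; }

my $parser = ToyParser->new(qw(result1 result2 result3));

# Single parsing pass: pay the memory cost once ...
my @results;
while ( defined( my $r = $parser->next_result ) ) {
    push @results, $r;
}

# ... then make as many passes over the saved list as needed.
my $count  = count_results( \@results );
my $joined = join_results( \@results );

# The parser itself is now exhausted, as in the original problem.
my $leftover = $parser->next_result;

print "count:  $count\n";
print "joined: $joined\n";
print "parser exhausted\n" unless defined $leftover;
```

Handing both subroutines the saved list rather than the parser avoids any dependence on whether the underlying stream can be rewound.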

-jason

On Jun 4, 2006, at 1:17 AM, Chris Fields wrote:

> ...

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12/

From cjfields at uiuc.edu Sun Jun 4 11:51:53 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 4 Jun 2006 10:51:53 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To:
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu> <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu> <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu>
Message-ID: <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>

On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:

> right - you don't need rewind if you aren't going to use the
> iterator (next_XXX) -- we provide two different ways to get access
> to the data.
> you can do
> for my $hit ( $result->hits ) {
>
> }
> or
> while( my $hit = $result->next_hit ) {
> }
>
>
> If you want to rewind the parser then (assuming you are using a
> filestream and not a data stream from the web or zcat or something)
> just reset the filehandle
> seek($searchio->_fh, 0, 0);
>
> but then you'll have to re-parse everything and pay that cost twice
> - it makes more sense to me to just save the results and put them
> in list if you are going to deliberately make two passes over all
> the results.
> You either pay the cost of memory (keeping all the objects) or time
> (reparse the results).

I agree there isn't any really good reason to rewind the parser; I was mainly just curious how this was accomplished. Your point about a memory or time hit might be one we want to make in the HOWTO. I already added some example code about rewinding the iterator and hits, so I'll add a bit about this.

I think a good deal of confusion here comes from not knowing how SearchIO works (i.e. that parsing a report can return several results, which in turn can return hits, in turn returning HSPs). Of course that doesn't include iterations in the case of PSI-BLAST. The HOWTO, I think, explains this all well so it may be a matter of just RTM (I left the 'F' out to be a bit more polite).

Chris

> -jason
> On Jun 4, 2006, at 1:17 AM, Chris Fields wrote:
> ...

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From ewijaya at i2r.a-star.edu.sg Mon Jun 5 04:16:59 2006
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 05 Jun 2006 16:16:59 +0800
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
Message-ID: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>

Dear Lincoln and experts

Currently I have a CGI application that does this:

1. read an uploaded file
2. check the content of the file whether fasta or not
3. print out the content of the file.

Now the problem I'm facing is in step three. The content of the filehandle is altered; namely, the very first line does not get printed.

So for example if "test1.fasta" looks like this:

>Seq0
ATCGGGGG
>Seq1
GGGGGGG
>Seq2
ATCCCCCC

When it was printed it gives only:

ATCGGGGG
>Seq1
GGGGGGG
>Seq2
ATCCCCCC

Why is this happening?

Below is the complete cgi script that does the task I mentioned earlier.

Did I miss anything in my code?

__BEGIN__
#!/usr/bin/perl -w

use CGI qw/:standard :html3/;
use CGI::Carp qw( fatalsToBrowser );
use Data::Dumper;

BEGIN {
    if ( $ENV{PERL5LIB} and $ENV{PERL5LIB} =~ /^(.*)$/ ) {

        # Blindly untaint. Taintchecking is to protect
        # from Web data;
        # the environment is under our control.
        eval "use lib '$_';" foreach (
            reverse
            split( /:/, $1 )
        );
    }
}

use Bio::Tools::GuessSeqFormat;

print header,
    start_html('file upload'),
    h1('file upload!');
print_form() unless param;
print_results() if param;
print end_html;

sub print_form {
    print start_multipart_form(),
        filefield(-name=>'upload',-size=>60),br,
        submit(-label=>'Upload File'),
        end_form;
}

sub print_results {
    my $length;
    my $file = param('upload');
    my $fh_upload = upload('upload');

    my $guesser_upload = new Bio::Tools::GuessSeqFormat( -fh => $fh_upload );
    my $format_upload = $guesser_upload->guess;

    if ( !$file ) {
        print "No file uploaded.";
        return;
    }
    print h2('File name'), $file;
    print h2('Format'), $format_upload;
    print h2('The content is'),br;

    while (<$fh_upload>) {

        # The very first line of the file is not get printed here
        # Why?

        print;
        print br;
        $length += length($_);
    }
    print h2('File length'), $length;
}

__END__

Hope to hear from you again.

Regards,
Edward WIJAYA
SINGAPORE

------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------

From sb at mrc-dunn.cam.ac.uk Mon Jun 5 05:02:48 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 10:02:48 +0100
Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ?
In-Reply-To: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg>
Message-ID: <4483F338.7090909@mrc-dunn.cam.ac.uk>

Wijaya Edward wrote:
> Dear Lincoln and experts
>
> Currently I have a CGI application that does this:
>
> 1. read an uploaded file
> 2. check the content of the file whether fasta or not
> 3. print out the content of the file.
>
>
> Now the problem I'm facing is in step three. The content of the
> filehandle is altered; namely, the very first line does not get printed.

The problem is almost certainly that the guessing is done by reading the first line of the filehandle, so that your subsequent while loop on that same filehandle starts at the second line. Just seek the filehandle back to the start before trying to print the contents out.

...
my $guesser_upload = new Bio::Tools::GuessSeqFormat(-fh => $fh_upload );
my $format_upload = $guesser_upload->guess;
seek($fh_upload, 0, 0);
...
while (<$fh_upload>) {
    ...
}

An alternative might be to pass GuessSeqFormat the filename, in which case it would make its own filehandle and close it, leaving your own filehandle untouched.
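
The effect described above, and the seek fix, can be reproduced with core Perl alone. This is a minimal sketch under stated assumptions: the format guesser is replaced by a hypothetical one-line peek (it is not the Bio::Tools::GuessSeqFormat logic), and the upload is replaced by an in-memory filehandle, which is seekable. A handle that cannot seek, such as a pipe or socket, would make the seek call fail, which is why its return value is checked.

```perl
use strict;
use warnings;

# The FASTA text from the question, held in memory so the sketch
# needs no real upload; in-memory filehandles are seekable.
my $fasta = ">Seq0\nATCGGGGG\n>Seq1\nGGGGGGG\n>Seq2\nATCCCCCC\n";
open( my $fh, '<', \$fasta ) or die "cannot open in-memory handle: $!";

# Hypothetical stand-in for the format guesser: it peeks at the first
# line, which moves the read position past that line.
my $first_line       = <$fh>;
my $looks_like_fasta = ( $first_line =~ /^>/ ) ? 1 : 0;

# Go back to the start; seek returns false on handles that cannot
# seek, so check it rather than assume.
seek( $fh, 0, 0 ) or die "handle is not seekable: $!";

# Now the loop sees every line, including the first.
my @lines;
while ( my $line = <$fh> ) {
    chomp $line;
    push @lines, $line;
}
close $fh;

print "fasta? $looks_like_fasta\n";
print "$_\n" for @lines;
```

Without the seek call, the loop would start at the second line, which matches the missing ">Seq0" header in the question.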

From sb at mrc-dunn.cam.ac.uk Mon Jun 5 05:57:52 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 10:57:52 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu> <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu> <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu> <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu>
Message-ID: <44840020.4020604@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
>
> On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:
>
>> If you want to rewind the parser then (assuming you are using a
>> filestream and not a data stream from the web or zcat or something)
>> just reset the filehandle
>> seek($searchio->_fh, 0, 0);
>>
>> but then you'll have to re-parse everything and pay that cost twice -
>> it makes more sense to me to just save the results and put them in
>> list if you are going to deliberately make two passes over all the
>> results. You either pay the cost of memory (keeping all the
>> objects) or time (reparse the results).
>
> I agree there isn't any really good reason to rewind the parser; I was
> mainly just curious how this was accomplished.

Didn't you already explain why seeking a SearchIO wouldn't work? And indeed, didn't Genevieve already try to do this after I suggested it and found that it didn't work?

Confused...

From sb at mrc-dunn.cam.ac.uk Mon Jun 5 09:19:12 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 14:19:12 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To:
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu> <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu> <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu> <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu> <44840020.4020604@mrc-dunn.cam.ac.uk>
Message-ID: <44842F50.7090408@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
>
> On Jun 5, 2006, at 5:57 AM, Sendu Bala wrote:
>
>> Didn't you already explain why seeking a SearchIO wouldn't work? And
>> indeed, didn't Genevieve already try to do this after I suggested it and
>> found that it didn't work?
>>
>> Confused...
>>
> There is an internal _rewind if you are using the next_XX methods that
> resets the internal iterator (all the data has already been parsed).
>
> You >>can<< reseek the internal filehandle (accessible by calling
> $object->_fh ), but you can't call seek on the searchio object itself.

... poor choice of words on my part. Or maybe I'm not understanding you...

I already suggested to Genevieve that she try:

# in the following, $blast_report is a SearchIO
> my $blast_report = $factory->blastall($ref_seq_objs);
> my $blast_fh = $blast_report->fh();
> while (<$blast_fh>) {
>     # $_ is a ResultI object, use as normal
> }
> seek($blast_fh, 0, 0); # this would be great, but does it work?
> while (<$blast_fh>) {
>     # go through the results again in your second subroutine
> }
>
> An alternative hacky way of doing it, which may also not work, would be
> to go through your $blast_report as normal, but then before going
> through it a second time, say
> my $fh = $blast_report->_fh;
> seek($fh, 0, 0);

She reported that neither way of doing it worked. You seem to be saying that at least the second way should have. Is that right?
rewind() would of course be preferable; I just wanted to know if my assumption about seek working was correct or not.

From jason at bioperl.org Mon Jun 5 09:45:40 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Jun 2006 09:45:40 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44842F50.7090408@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu> <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu> <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu> <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu> <44840020.4020604@mrc-dunn.cam.ac.uk> <44842F50.7090408@mrc-dunn.cam.ac.uk>
Message-ID:

It depends on how you have run StandAloneBlast -- if the stream you are dealing with is not a file, but a datastream as in the STDOUT from BLAST, then the seek won't work (as it wouldn't work for a zcat on gzipped file). I think the default StandAloneBlast behavior is to operate on a STDOUT stream so seeking won't work no matter what.

On Jun 5, 2006, at 9:19 AM, Sendu Bala wrote:

> ...

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From sb at mrc-dunn.cam.ac.uk Mon Jun 5 10:13:03 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 15:13:03 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To:
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu> <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu> <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu> <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu> <44840020.4020604@mrc-dunn.cam.ac.uk> <44842F50.7090408@mrc-dunn.cam.ac.uk>
Message-ID: <44843BEF.6080609@mrc-dunn.cam.ac.uk>

Jason Stajich wrote:
> It depends on how you have run StandAloneBlast -- if the stream you are
> dealing with is not a file, but a datastream as in the STDOUT from
> BLAST, then the seek won't work (as it wouldn't work for a zcat on
> gzipped file).
I think the default StandAloneBlast behavior is to > operate on a STDOUT stream so seeking won't work no matter what. As far as I can see, when you say blastall() on a StandAloneBlast, it eventually does: if ($self->_READMETHOD =~ /^(Blast|SearchIO)/i ) { $blast_obj = Bio::SearchIO->new(-file=>$outfile, -format => 'blast' ); } So seeking should work? Tools like StandAloneBlast creating temp files for their results prior to parsing is actually one of things I don't like about the bioperl tool system. From lstein at cshl.edu Mon Jun 5 10:51:52 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Mon, 5 Jun 2006 10:51:52 -0400 Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ? In-Reply-To: <63f5e7a446b.448458fb@i2r.a-star.edu.sg> References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg> Message-ID: <200606051051.52648.lstein@cshl.edu> Hi, From the Synopsis for GuessSeqFormat: # To guess the format from an already open filehandle: my $guesser = new Bio::Tools::GuessSeqFormat( -fh => $filehandle ); my $format = $guesser->guess; # If the filehandle is seekable (STDIN isn't), it will be # returned to its original position. The filehandle returned by CGI.pm is not seekable. Lincoln On Monday 05 June 2006 04:16, Wijaya Edward wrote: > Dear Lincoln and experts > > Curently I have a CGI application that does this: > > 1. read and uploaded file > 2. check the content of the file whether fasta or not > 3. print out the content of the file. > > > Now the problem I'm facing is that > on step three. The content of the file handled is altered > namely the very first line does not get printed. > > So for example if "test1.fasta" looks like this: > >Seq0 > > ATCGGGGG > > >Seq1 > > GGGGGGG > > >Seq2 > > ATCCCCCC > > When it was printed it gives only: > > ATCGGGGG > > >Seq1 > > GGGGGGG > > >Seq2 > > ATCCCCCC > > Why is this happening? > > Below is the complete cgi script that > does the task I mentioned earlier. > > Did I missed out anything in my code? 
>
> __BEGIN__
> #!/usr/bin/perl -w
>
> use CGI qw/:standard :html3/;
> use CGI::Carp qw( fatalsToBrowser );
> use Data::Dumper;
>
> BEGIN {
>     if ( $ENV{PERL5LIB} and $ENV{PERL5LIB} =~ /^(.*)$/ ) {
>
>         # Blindly untaint. Taintchecking is to protect
>         # from Web data;
>         # the environment is under our control.
>         eval "use lib '$_';" foreach (
>             reverse
>             split( /:/, $1 )
>         );
>     }
> }
>
> use Bio::Tools::GuessSeqFormat;
>
> print header,
>     start_html('file upload'),
>     h1('file upload!');
> print_form() unless param;
> print_results() if param;
> print end_html;
>
> sub print_form {
>     print start_multipart_form(),
>         filefield(-name=>'upload',-size=>60),br,
>         submit(-label=>'Upload File'),
>         end_form;
> }
>
> sub print_results {
>     my $length;
>     my $file = param('upload');
>     my $fh_upload = upload('upload');
>
>     my $guesser_upload = new Bio::Tools::GuessSeqFormat( -fh => $fh_upload );
>     my $format_upload = $guesser_upload->guess;
>
>     if ( !$file ) {
>         print "No file uploaded.";
>         return;
>     }
>     print h2('File name'), $file;
>     print h2('Format'), $format_upload;
>     print h2('The content is'),br;
>
>     while (<$fh_upload>) {
>
>         # The very first line of the file does not get printed here
>         # Why?
>
>         print;
>         print br;
>         $length += length($_);
>     }
>     print h2('File length'), $length;
> }
>
> __END__
>
> Hope to hear from you again.
>
> Regards,
> Edward WIJAYA
> SINGAPORE

--
Lincoln Stein
lstein at cshl.edu
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)

From cjfields at uiuc.edu Mon Jun 5 12:30:41 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 11:30:41 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44843BEF.6080609@mrc-dunn.cam.ac.uk>
Message-ID: <006001c688bd$62d48850$15327e82@pyrimidine>

If you want flexibility or added functionality then you can always
contribute a patch, such as adding an option for filehandles, IO::String,
pipes/forks, or whatever you wish. Or you could suggest such to the module
maintainer, Torsten, and then it's his choice whether he wants to make it a
priority to implement it. Simply stating this is 'one of the things I don't
like about the bioperl tool system' isn't productive here. It hasn't been a
top priority to implement something along those lines since the module works
for them as is, so if you want these options you'll have to add them, and
add the appropriate tests.

As for the seek issue, the file handle you get by using '$blast_report->fh()'
isn't the raw input file stream but is a tied filehandle of a stream of
ResultI objects:

==================================
Jason's version:

# seek called on the >>internal<< filehandle (from Bio::Root::IO)
# this is the raw data input stream from a file, so should work
seek($searchio->_fh, 0, 0);
==================================
Your version:

# seek called on SearchIO object filehandle
my $blast_report = $factory->blastall($ref_seq_objs);
# this is a tied filehandle for an output stream of objects from SearchIO,
# NOT the raw input stream
my $blast_fh = $blast_report->fh();
while (<$blast_fh>) {
    # a stream of Bio::Search::Result::BlastResult objects
}
# can't use seek on a tied filehandle, won't work unless
# SEEK class method is implemented (and it's not)
seek($blast_fh, 0, 0);
==================================

There's a good deal in Programming Perl about tied filehandles. You'll
notice that Bio::SearchIO implements TIEHANDLE, READLINE, DESTROY, and PRINT
methods, but not SEEK since we've never needed it. You can always add one if
you want but I really don't see the point based on reasons Jason and I
outlined before.

Seems there is not much overall documentation on newFh or $blast_report->fh,
but I believe it's analogous to the SeqIO version which is covered a bit in
the bptutorial file, now on the wiki:

http://www.bioperl.org/wiki/Bptutorial.pl#III.2.1_Transforming_sequence_files_.28SeqIO.29

$in = Bio::SeqIO->newFh(-file => "inputfilename" , -format => 'fasta');
$out = Bio::SeqIO->newFh(-format => 'embl');
print $out $_ while <$in>;

Wouldn't hurt if someone wants to add a bit more about these to the SearchIO
HOWTO.
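
To make the tied-filehandle point concrete, here is a minimal sketch in plain Perl. ObjStream is a hypothetical stand-in class (it is not part of BioPerl) that, like the description of Bio::SearchIO above, implements TIEHANDLE and READLINE but no SEEK; the item names are made up. It only illustrates the general tie mechanism, not SearchIO's actual code.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Symbol qw(gensym);

# Hypothetical stand-in for a parser that hands out a tied filehandle.
# This is NOT Bio::SearchIO, only the same tie pattern.
package ObjStream;

sub TIEHANDLE {
    my ($class, @items) = @_;
    return bless { items => [@items], pos => 0 }, $class;
}

# Called for <$fh>; returns the next item, or undef when exhausted.
sub READLINE {
    my $self = shift;
    return undef if $self->{pos} >= @{ $self->{items} };
    return $self->{items}[ $self->{pos}++ ];
}

package main;

my $fh = gensym();
tie *$fh, 'ObjStream', qw(result1 result2);

# First pass: read everything the tied handle offers.
my @first_pass;
while (defined(my $item = <$fh>)) {
    push @first_pass, $item;
}

# seek() on a tied handle is dispatched to a SEEK method of the tie class.
# ObjStream has none, so the call dies; eval traps the error.
my $seek_ok    = eval { seek($fh, 0, 0); 1 };
my $seek_error = $@;

# Alternative suggested in the thread: keep the objects from the first
# pass and walk the saved list instead of rewinding.
my @second_pass = @first_pass;

print "seek failed: $seek_error" unless $seek_ok;
print "second pass: @second_pass\n";
```

Keeping the objects from the first pass, as @second_pass does here, trades memory for not having to parse twice.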

Chris

From jason at bioperl.org Mon Jun 5 09:02:02 2006
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 5 Jun 2006 09:02:02 -0400
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44840020.4020604@mrc-dunn.cam.ac.uk>
References: <44775ED9.4020208@cornell.edu> <447BFB20.40501@mrc-dunn.cam.ac.uk> <447C7985.9000404@cornell.edu> <84DD0184-1472-4335-A0A8-E78BBA22E03D@duke.edu> <8B4A4CB9-767D-4560-BE87-CA37E1512A3B@uiuc.edu> <8B61C39F-6B6C-405F-BF9B-4B258E73419E@uiuc.edu> <44840020.4020604@mrc-dunn.cam.ac.uk>
Message-ID:

On Jun 5, 2006, at 5:57 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> On Jun 4, 2006, at 9:08 AM, Jason Stajich wrote:
>>
>>> If you want to rewind the parser then (assuming you are using a
>>> filestream and not a data stream from the web or zcat or something)
>>> just reset the filehandle
>>> seek($searchio->_fh, 0, 0);
>>>
>>> but then you'll have to re-parse everything and pay that cost twice -
>>> it makes more sense to me to just save the results and put them in
>>> a list if you are going to deliberately make two passes over all the
>>> results. You either pay the cost of memory (keeping all the
>>> objects) or time (reparse the results).
>>
>> I agree there isn't any really good reason to rewind the parser; I was
>> mainly just curious how this was accomplished.
>
> Didn't you already explain why seeking a SearchIO wouldn't work? And
> indeed, didn't Genevieve already try to do this after I suggested it and
> found that it didn't work?
>
> Confused...
>

There is an internal _rewind if you are using the next_XX methods that
resets the internal iterator (all the data has already been parsed).
You >>can<< reseek the internal filehandle (accessible by calling
$object->_fh), but you can't call seek on the searchio object itself.
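
The "filestream versus data stream" distinction quoted above can be seen with plain Perl handles. A rough sketch, assuming a Unix-like system where the running perl ($^X) can be started as a child process; the temp file and the child one-liner are only illustrative and have nothing to do with BLAST itself:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A regular file: seek works, so the same data can be read twice.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "line1\nline2\n";
close $tmp_fh or die "close failed: $!";

open(my $file_fh, '<', $tmp_name) or die "open failed: $!";
my @pass1        = <$file_fh>;
my $file_seek_ok = seek($file_fh, 0, 0);   # rewind to the start
my @pass2        = <$file_fh>;
close $file_fh;

# A pipe from another process (comparable to reading a program's STDOUT):
# there is nothing to rewind, so seek returns false.
open(my $pipe_fh, '-|', $^X, '-e', 'print "line1\nline2\n"')
    or die "cannot start child: $!";
my $pipe_seek_ok = seek($pipe_fh, 0, 0);
my @from_pipe    = <$pipe_fh>;             # drain the pipe before closing
close $pipe_fh;

printf "file: seek %s, read %d lines, then %d lines again\n",
    ($file_seek_ok ? 'ok' : 'failed'), scalar @pass1, scalar @pass2;
printf "pipe: seek %s\n", ($pipe_seek_ok ? 'ok' : 'failed');
```

If a second pass over piped output is needed, the data has to be kept somewhere first, either in memory or in a temp file.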
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From sb at mrc-dunn.cam.ac.uk Mon Jun 5 13:23:36 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Mon, 05 Jun 2006 18:23:36 +0100 Subject: [Bioperl-l] results problem with StandAloneBlast In-Reply-To: <006001c688bd$62d48850$15327e82@pyrimidine> References: <006001c688bd$62d48850$15327e82@pyrimidine> Message-ID: <44846898.8020001@mrc-dunn.cam.ac.uk> Chris Fields wrote: > If you want flexibility or added functionality then you can always > contribute a patch, such as adding an option for filehandles, IO::String, > pipes/forks, or whatever you wish. Well, it wouldn't be a new feature per se, but just changing the way the modules work under the hood. > Or you could suggest such to the module > maintainer, Torsten, and then it's his choice whether he wants to make it a > priority to implement it. Simply stating this is 'one of things I don't > like about the bioperl tool system' isn't productive here. Yes, I apologise for that. I had thought too much would need to be changed and backward compatibility wouldn't be possible, but just changing StandAloneBlast should be possible. I use IPC::Open3 for blasts and have never run into problems, but it pretty much falls into the 'apt to cause deadlock' camp. It may pass tests on one machine but fail on others... is there any point in working up a patch (would something of questionable reliability ever be committed into bioperl)? 

> As for the seek issue, the file handle you get by using '$blast_report->fh()'
> isn't the raw input file stream but is a tied filehandle of a stream of
> ResultI objects:
> ==================================
> Jason's version:
> # seek called on the >>internal<< filehandle (from Bio::Root::IO)
> # this is the raw data input stream from a file, so should work
> seek($searchio->_fh, 0, 0);
> ==================================
> Your version:
> # seek called on SearchIO object filehandle
> my $blast_report = $factory->blastall($ref_seq_objs);
> # this is a tied filehandle for an output stream of objects from SearchIO,
> # NOT the raw input stream
> my $blast_fh = $blast_report->fh();

For academic interest, how do I get the 'raw input stream'? Wasn't that
what my second version did?

> my $fh = $blast_report->_fh;
> seek($fh, 0, 0);

From hubert.prielinger at gmx.at Mon Jun 5 14:17:53 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 05 Jun 2006 12:17:53 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu>
References: <000001c68691$8c4eeb40$15327e82@pyrimidine> <4480C78C.1000701@gmx.at> <4480DC8B.7070005@gmx.at> <573BA2B0-92CB-433D-AD8E-F3FFD6184CCB@uiuc.edu> <4480E7AA.3020603@gmx.at> <720C7194-B18C-4502-B4D7-93E41E0E33B5@uiuc.edu>
Message-ID: <44847551.7040705@gmx.at>

hi,
you were right, removing the composition-based statistics solved the
problem. Now I get the result shown on STDOUT, but it doesn't save the
output in the file.
I have tried reopening the file and writing it to another file
again, but it doesn't work.....
The strange thing is that if I retrieve text instead of xml output it
works without any problem.
Don't know why

Hubert

Chris Fields wrote:
> On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote:
>
>> hi chris,
>> thanks but I never intended to run the remoteblast with so much,
>> only a few of them, actually my goal is to run the phiblast with
>> regular expression, so that i just don't need that
>> file anymore
>
> Not a problem. Just to let you know, I did manage to get the script
> working, so I'm marking the bug INVALID. I think the problem isn't
> that there is an infinite loop so much as setting composition-based
> statistics causes the search to take much much longer; try removing
> that line to see what I mean.
>
> Just so you know, using $result->query_name doesn't get you what you
> would expect (it gives you a part of the RID, which you don't want;
> this is something in the XML output that is beyond our control). You
> might want to change it to something else or you'll get filenames
> with numerical names.
>
>> another question for parsing the xml output....is there a xml
>> parser available for blast xml output or how to start.....
>> I have looked up at the wikiperl and cpan Bio::SearchIO::blastxml,
>> but I'm not sure how to start....sorry, I guess I'm too stupid....
>> is there maybe another introduction or an example.
>
> Bio::SearchIO objects are used to parse BLAST XML output if you have
> it saved to a file. For instance:
>
> my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');
>
> while (my $result = $factory->next_result) {
>     while (my $hit = $result->next_hit) {
>         while (my $hsp = $hit->next_hsp) {
>             #do stuff here
>         }
>     }
> }
>
> The only thing that changes in parsing a text BLAST report from an
> XML BLAST report is the -format line (similar to the -readmethod
> parameter in RemoteBlast).
You shouldn't need to look up any more > documentation other than these on the wiki: > > http://www.bioperl.org/wiki/HOWTO:SearchIO > > http://www.bioperl.org/wiki/Module:Bio::SearchIO > > http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml > > Pay attention to the fact you'll need to install XML::SAX (CPAN) and > that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding > up parsing. > > Chris > > >> thanks >> Hubert >> >> >> Chris Fields wrote: >> >>> Yes, I see the same error you do. But I have a similar script >>> (blastp, XML blast report, XML parsing, similar loop structure) >>> that works fine. I'm trying to dissect the problem but I think >>> it may be something logically wrong here (something not so >>> obvious) and not a bug... >>> >>> What I'm trying to say is, when you send sequences using >>> remoteblast like, this you are essentially spamming the NCBI >>> BLAST server with ~1600 requests. This script wasn't set up with >>> that intent in mind; you should really try to set up your own >>> local blast database if possible. If you can't, try running this >>> script in off-hours (10pm-6am EST or something like that). >>> >>> >>> Chris >>> >>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote: >>> >>> >>> >>>> hi, >>>> input database: swissprot >>>> matrix: pam30 >>>> count: 1 >>>> gapcosts: 9 1 >>>> >>>> I know that there are a lot of sequences, but that doesn't >>>> matter, you can delete all of them except one, the amount of the >>>> sequences is not the problem, the script reads one line and >>>> submits it.....then the second line and so on.....I have tried >>>> it with only one sequence either and I got the same result.... >>>> the script run at that time for more than 20 >>>> minutes!!!!!! 
.....and that should be enough time to retrieve >>>> the results for ONE sequence, I guess >>>> >>>> regards >>>> Hubert >>>> >>>> >>>> >>>> Chris Fields wrote: >>>> >>>> >>>>> You need to add the input conditions as well (you have several >>>>> lines which may play a role; I would like to know what >>>>> you normally enter for those). >>>>> >>>>> How long did you let the script run? I ran a quick check on >>>>> your sequences; you have almost 1600, so you have to expect >>>>> that you'll run into some problems here! Most here (including >>>>> me) would suggest you try installing a local blast setup for >>>>> something like this. >>>>> >>>>> Chris >>>>> >>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote: >>>>> >>>>> >>>>> >>>>>> hi, >>>>>> I have submitted the bug -> Bug 2017 >>>>>> with the script and input file, just start it from command line >>>>>> >>>>>> thank you very much >>>>>> greetings >>>>>> >>>>>> Hubert >>>>>> >>>>>> Chris Fields wrote: >>>>>> >>>>>> >>>>>>> Hubert, >>>>>>> >>>>>>> I have a script that's using blastxml and XML output which >>>>>>> seems to work. >>>>>>> I'll try looking at it to get a better idea this weekend. >>>>>>> >>>>>>> Chris >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> -----Original Message----- >>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >>>>>>>> Sent: Friday, June 02, 2006 4:12 PM >>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields; >>>>>>>> 'Sendu Bala' >>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem >>>>>>>> >>>>>>>> hi, >>>>>>>> sorry, but I have updated the remoteblast module and I have >>>>>>>> run several >>>>>>>> attempts with the same results as before. It didn't work. >>>>>>>> I didn't get any results. 
>>>>>>>> >>>>>>>> regards >>>>>>>> Hubert >>>>>>>> >>>>>>>> >>>>>>>> Chris Fields wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Sendu, Hubert, >>>>>>>>> >>>>>>>>> >>>>>>>>> Hubert, your code looks fine so Sendu's patch should fix >>>>>>>>> the problem >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> (break >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> out of that infinite loop). I applied Sendu's patch to >>>>>>>>> RemoteBlast in >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> CVS; >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> it passed all tests in RemoteBlast.t. Try updating from >>>>>>>>> CVS to see if >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> it >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> works. >>>>>>>>> >>>>>>>>> Chris >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> -----Original Message----- >>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala >>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM >>>>>>>>>> To: bioperl-l at lists.open-bio.org >>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem >>>>>>>>>> >>>>>>>>>> Hubert Prielinger wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> hi, >>>>>>>>>>> I have the following program and it worked quite well, >>>>>>>>>>> for retrieving >>>>>>>>>>> remoteblast results in a textfile, >>>>>>>>>>> now I have altered it to to xml, and it didn't work >>>>>>>>>>> anymore..... >>>>>>>>>>> it takes all the parameter at the commandline, submits >>>>>>>>>>> the query, but >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>> I >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>>>> don't retrieve any results file anymore..... >>>>>>>>>>> >>>>>>>>>>> it seems that it hangs in a endless loop...... >>>>>>>>>>> the only output I get is: $rc is not a ref! over and >>>>>>>>>>> over..... it >>>>>>>>>>> doesn't enter the else term anymore.... >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> There is no problem with your code. 
The problem is with >>>>>>>>>> the NCBI server >>>>>>>>>> and should be reported to them. You can visit the site and >>>>>>>>>> do a blast, >>>>>>>>>> requesting xml format, and you will typically get one >>>>>>>>>> normal 'waiting' >>>>>>>>>> message and the promise that it will be updated in x >>>>>>>>>> seconds, but >>>>>>>>>> subsequent attempts to get progress information result in >>>>>>>>>> an xml error >>>>>>>>>> page because the NCBI server doesn't actually send any data. >>>>>>>>>> >>>>>>>>>> Unfortunately the way that the bioperl code is written, it >>>>>>>>>> treats no >>>>>>>>>> data as 'waiting' instead of an error. I've offered a >>>>>>>>>> patch to fix this >>>>>>>>>> at this bug page: >>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015 >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list >>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Bioperl-l mailing list >>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>> >>>>>> >>>>> Christopher Fields >>>>> Postdoctoral Researcher >>>>> Lab of Dr. Robert Switzer >>>>> Dept of Biochemistry >>>>> University of Illinois Urbana-Champaign >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> >>> Christopher Fields >>> Postdoctoral Researcher >>> Lab of Dr. 
Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Mon Jun 5 14:32:47 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 13:32:47 -0500
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <44847551.7040705@gmx.at>
Message-ID: <006101c688ce$7185c330$15327e82@pyrimidine>

Hubert,

Make sure you have the latest Bio::Tools::Run::RemoteBlast from CVS. The
option to save XML was committed relatively recently (last month or so).

Chris

From cjfields at uiuc.edu Mon Jun 5 14:56:18 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 13:56:18 -0500
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <44846898.8020001@mrc-dunn.cam.ac.uk>
Message-ID: <006201c688d1$bad2aff0$15327e82@pyrimidine>

> Chris Fields wrote:
> > If you want flexibility or added functionality then you can always
> > contribute a patch, such as adding an option for filehandles, IO::String,
> > pipes/forks, or whatever you wish.
>
> Well, it wouldn't be a new feature per se, but just changing the way the
> modules work under the hood.

...

> I use IPC::Open3 for blasts and have never run into problems, but it
> pretty much falls into the 'apt to cause deadlock' camp. It may pass
> tests on one machine but fail on others...
is there any point in working
> up a patch (would something of questionable reliability ever be
> committed into bioperl)?

The main thing you should avoid is major API changes or issues which break
this module on other OS's. I'm not sure that StandAloneBlast is 'broken' by
using a tempfile as the location of the BLAST report.

Any way you go about it, you'll have to capture the BLAST output as a stream
and get it to persist in a SearchIO object somehow. It can be a pretty decent
memory hit to keep that report hanging around, esp. if it is large.

...

> For academic interest, how do I get the 'raw input stream'? Wasn't that
> what my second version did?
>
> > my $fh = $blast_report->_fh;
> > seek($fh, 0, 0);

That should work, yes. Didn't see that one in your previous response. I can
get it to work w/o problems with SearchIO directly but I haven't tried it with
StandAloneBlast. Below is my script. If you comment out the seek line, the
file pointer isn't moved back to the start, so the second round of parsing
won't happen.

    use Bio::SearchIO;

    my $parser = Bio::SearchIO->new( -file   => shift,
                                     -format => 'blast');

    # First round: dump the raw report through the underlying filehandle.
    my $fh = $parser->_fh;
    while (<$fh>) {
        print;
    }

    # Rewind so the parser can read the report again from the top.
    seek($fh, 0, 0);

    # Second round: the tied handle from fh() returns one result per read.
    $fh = $parser->fh;
    print "Second round:\n";
    while (<$fh>) {
        while (my $hit = $_->next_hit) {
            print $hit->accession, "\n";
        }
    }

Chris

From hubert.prielinger at gmx.at Mon Jun 5 15:12:37 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 05 Jun 2006 13:12:37 -0600
Subject: [Bioperl-l] remoteblast xml problem
In-Reply-To: <006101c688ce$7185c330$15327e82@pyrimidine>
References: <006101c688ce$7185c330$15327e82@pyrimidine>
Message-ID: <44848225.8080003@gmx.at>

hi chris,
sorry, I have tried it with the latest CVS version:

# $Id: RemoteBlast.pm,v 1.33 2006/06/03 06:26:41 cjfields Exp $

but it still doesn't work.

Hubert

Chris Fields wrote:
> Hubert,
>
> Make sure you have the latest Bio::Tools::Run::RemoteBlast from CVS. The
> option to save XML was committed relatively recently (last month or so).
> > Chris > > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger >> Sent: Monday, June 05, 2006 1:18 PM >> To: Chris Fields; bioperl-l at bioperl.org >> Subject: Re: [Bioperl-l] remoteblast xml problem >> >> hi, >> you were right, removing the composition-based statistics solved the >> problem. Now I get the result viewed on STDIN, but it doesn't save the >> output in the file. >> I haved tried it by reopening the file and writing it to an other file >> again, but it doesn't work..... >> The strange thing is that if I retrieve text instead of xml output it >> works without any problem. Don't know why >> >> Hubert >> >> >> >> Chris Fields wrote: >> >>> On Jun 2, 2006, at 8:36 PM, Hubert Prielinger wrote: >>> >>> >>> >>>> hi chris, >>>> thanks but I never intended to run the remoteblast with so much, >>>> only a few of them, acutally I goal is to run the phiblast with >>>> regular expression, so that i just don't need that >>>> file anymore >>>> >>>> >>> Not a problem. Just to let you know, I did manage to get the script >>> working, so I'm marking the bug INVALID. I think the problem isn't >>> that there is an infinite loop so much as setting composition-based >>> statistics causes the search to take much much longer; try removing >>> that line to see what I mean. >>> >>> Just so you know, using $result->query_name doesn't get you what you >>> would expect (it gives you a part of the RID, which you don't want; >>> this is something in the XML output that is beyond our control). You >>> might want to change it to something else or you'll get filenames >>> with numerical names. >>> >>> >>> >>>> another question for parsing the xml output....is there a xml >>>> parser available for blast xml output or how to start..... 
>>>> I have looked up at the wikiperl and cpan Bio::SearchIO::blastxml,
>>>> but I'm not sure how to start....sorry, I guess I'm too stupid....
>>>> is there maybe another introduction or an example.
>>>>
>>> Bio::SearchIO objects are used to parse BLAST XML output if you have
>>> it saved to a file. For instance:
>>>
>>> my $factory = Bio::SearchIO->new(-file => $file, -format => 'blastxml');
>>>
>>> while (my $result = $factory->next_result) {
>>>     while (my $hit = $result->next_hit) {
>>>         while (my $hsp = $hit->next_hsp) {
>>>             #do stuff here
>>>         }
>>>     }
>>> }
>>>
>>> The only thing that changes in parsing a text BLAST report from an
>>> XML BLAST report is the -format line (similar to the -readmethod
>>> parameter in RemoteBlast). You shouldn't need to look up any more
>>> documentation other than these on the wiki:
>>>
>>> http://www.bioperl.org/wiki/HOWTO:SearchIO
>>>
>>> http://www.bioperl.org/wiki/Module:Bio::SearchIO
>>>
>>> http://www.bioperl.org/wiki/Module:Bio::SearchIO::blastxml
>>>
>>> Pay attention to the fact you'll need to install XML::SAX (CPAN) and
>>> that XML::SAX::ExpatXS (and Expat) is highly recommended for speeding
>>> up parsing.
>>>
>>> Chris
>>>
>>>> thanks
>>>> Hubert
>>>>
>>>> Chris Fields wrote:
>>>>
>>>>> Yes, I see the same error you do. But I have a similar script
>>>>> (blastp, XML blast report, XML parsing, similar loop structure)
>>>>> that works fine. I'm trying to dissect the problem but I think
>>>>> it may be something logically wrong here (something not so
>>>>> obvious) and not a bug...
>>>>>
>>>>> What I'm trying to say is, when you send sequences using
>>>>> remoteblast like this, you are essentially spamming the NCBI
>>>>> BLAST server with ~1600 requests. This script wasn't set up with
>>>>> that intent in mind; you should really try to set up your own
>>>>> local blast database if possible. If you can't, try running this
>>>>> script in off-hours (10pm-6am EST or something like that).
>>>>> >>>>> >>>>> Chris >>>>> >>>>> On Jun 2, 2006, at 7:49 PM, Hubert Prielinger wrote: >>>>> >>>>> >>>>> >>>>> >>>>>> hi, >>>>>> input database: swissprot >>>>>> matrix: pam30 >>>>>> count: 1 >>>>>> gapcosts: 9 1 >>>>>> >>>>>> I know that there are a lot of sequences, but that doesn't >>>>>> matter, you can delete all of them except one, the amount of the >>>>>> sequences is not the problem, the script reads one line and >>>>>> submits it.....then the second line and so on.....I have tried >>>>>> it with only one sequence either and I got the same result.... >>>>>> the script run at that time for more than 20 >>>>>> minutes!!!!!! .....and that should be enough time to retrieve >>>>>> the results for ONE sequence, I guess >>>>>> >>>>>> regards >>>>>> Hubert >>>>>> >>>>>> >>>>>> >>>>>> Chris Fields wrote: >>>>>> >>>>>> >>>>>> >>>>>>> You need to add the input conditions as well (you have several >>>>>>> lines which may play a role; I would like to know what >>>>>>> you normally enter for those). >>>>>>> >>>>>>> How long did you let the script run? I ran a quick check on >>>>>>> your sequences; you have almost 1600, so you have to expect >>>>>>> that you'll run into some problems here! Most here (including >>>>>>> me) would suggest you try installing a local blast setup for >>>>>>> something like this. >>>>>>> >>>>>>> Chris >>>>>>> >>>>>>> On Jun 2, 2006, at 6:19 PM, Hubert Prielinger wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> hi, >>>>>>>> I have submitted the bug -> Bug 2017 >>>>>>>> with the script and input file, just start it from command line >>>>>>>> >>>>>>>> thank you very much >>>>>>>> greetings >>>>>>>> >>>>>>>> Hubert >>>>>>>> >>>>>>>> Chris Fields wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Hubert, >>>>>>>>> >>>>>>>>> I have a script that's using blastxml and XML output which >>>>>>>>> seems to work. >>>>>>>>> I'll try looking at it to get a better idea this weekend. 
>>>>>>>>>
>>>>>>>>> Chris
>>>>>>>>>
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Hubert Prielinger
>>>>>>>>>> Sent: Friday, June 02, 2006 4:12 PM
>>>>>>>>>> To: Chris Fields; bioperl-l at bioperl.org; Chris Fields;
>>>>>>>>>> 'Sendu Bala'
>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>
>>>>>>>>>> hi,
>>>>>>>>>> sorry, but I have updated the remoteblast module and I have
>>>>>>>>>> run several attempts with the same results as before. It
>>>>>>>>>> didn't work. I didn't get any results.
>>>>>>>>>>
>>>>>>>>>> regards
>>>>>>>>>> Hubert
>>>>>>>>>>
>>>>>>>>>> Chris Fields wrote:
>>>>>>>>>>
>>>>>>>>>>> Sendu, Hubert,
>>>>>>>>>>>
>>>>>>>>>>> Hubert, your code looks fine so Sendu's patch should fix
>>>>>>>>>>> the problem (break out of that infinite loop). I applied
>>>>>>>>>>> Sendu's patch to RemoteBlast in CVS; it passed all tests in
>>>>>>>>>>> RemoteBlast.t. Try updating from CVS to see if it works.
>>>>>>>>>>>
>>>>>>>>>>> Chris
>>>>>>>>>>>
>>>>>>>>>>>> -----Original Message-----
>>>>>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Sendu Bala
>>>>>>>>>>>> Sent: Friday, June 02, 2006 4:04 AM
>>>>>>>>>>>> To: bioperl-l at lists.open-bio.org
>>>>>>>>>>>> Subject: Re: [Bioperl-l] remoteblast xml problem
>>>>>>>>>>>>
>>>>>>>>>>>> Hubert Prielinger wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> hi,
>>>>>>>>>>>>> I have the following program and it worked quite well,
>>>>>>>>>>>>> for retrieving remoteblast results in a textfile,
>>>>>>>>>>>>> now I have altered it to to xml, and it didn't work
>>>>>>>>>>>>> anymore.....
>>>>>>>>>>>>> it takes all the parameter at the commandline, submits
>>>>>>>>>>>>> the query, but I don't retrieve any results file
>>>>>>>>>>>>> anymore.....
>>>>>>>>>>>>>
>>>>>>>>>>>>> it seems that it hangs in a endless loop......
>>>>>>>>>>>>> the only output I get is: $rc is not a ref! over and
>>>>>>>>>>>>> over..... it doesn't enter the else term anymore....
>>>>>>>>>>>>>
>>>>>>>>>>>> There is no problem with your code. The problem is with
>>>>>>>>>>>> the NCBI server and should be reported to them. You can
>>>>>>>>>>>> visit the site and do a blast, requesting xml format, and
>>>>>>>>>>>> you will typically get one normal 'waiting' message and the
>>>>>>>>>>>> promise that it will be updated in x seconds, but
>>>>>>>>>>>> subsequent attempts to get progress information result in
>>>>>>>>>>>> an xml error page because the NCBI server doesn't actually
>>>>>>>>>>>> send any data.
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately the way that the bioperl code is written, it
>>>>>>>>>>>> treats no data as 'waiting' instead of an error. I've
>>>>>>>>>>>> offered a patch to fix this at this bug page:
>>>>>>>>>>>> http://bugzilla.bioperl.org/show_bug.cgi?id=2015
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> Bioperl-l mailing list
>>>>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>
>>>>>>> Christopher Fields
>>>>>>> Postdoctoral Researcher
>>>>>>> Lab of Dr. Robert Switzer
>>>>>>> Dept of Biochemistry
>>>>>>> University of Illinois Urbana-Champaign

From sb at mrc-dunn.cam.ac.uk Mon Jun 5 15:14:08 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 05 Jun 2006 20:14:08 +0100
Subject: [Bioperl-l] results problem with StandAloneBlast
In-Reply-To: <006201c688d1$bad2aff0$15327e82@pyrimidine>
References: <006201c688d1$bad2aff0$15327e82@pyrimidine>
Message-ID: <44848280.1080703@mrc-dunn.cam.ac.uk>

Chris Fields wrote:
>> Chris Fields wrote:
>>> If you want flexibility or added functionality then you can
>>> always contribute a patch, such as adding an option for
>>> filehandles, IO::String, pipes/forks, or whatever you wish.
>>
>> Well, it wouldn't be a new feature per se, but just changing the
>> way the modules work under the hood.
>
> ...
>
>> I use IPC::Open3 for blasts and have never run into problems, but
>> it pretty much falls into the 'apt to cause deadlock' camp. It may
>> pass tests on one machine but fail on others...
is there any point
>> in working up a patch (would something of questionable reliability
>> ever be committed into bioperl)?
>
> The main thing you should avoid is major API changes or issues which
> break this module on other OS's. I'm not sure that StandAloneBlast
> is 'broken' by using a tempfile as the location of the BLAST report.
>
> Any way you go about it, you'll have to capture the BLAST output as a
> stream and get it to persist in a SearchIO object somehow. It can
> be a pretty decent memory hit to keep that report hanging around,
> esp. if it is large.

Well, at the moment StandAloneBlast runs the blast program and stores its
output to a temp file, then gives the temp file name as an arg to
SearchIO. I (not using bioperl) would use IPC::Open3 to send the output
of the blast program directly to my parser. The question is, why wasn't
this done in StandAloneBlast?

I would get the blast program output handle and pass it directly to
SearchIO with the -fh option of new().
The only difference here is it's faster and more efficient with the
direct pipe, but you can't subsequently seek the SearchIO's internal
filehandle (as we are discussing in this thread). There are no (additional)
issues with memory.

If it isn't done using IPC::Open3 (or similar) because the original
author already knew it wouldn't be reliable enough, or for some other
reason(s), fine. Does anyone know the reasons?

From cjfields at uiuc.edu Mon Jun 5 15:43:50 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 5 Jun 2006 14:43:50 -0500
Subject: [Bioperl-l] StandAloneBlast
In-Reply-To: <44848280.1080703@mrc-dunn.cam.ac.uk>
Message-ID: <006301c688d8$5e4ce910$15327e82@pyrimidine>

> Well at the moment StandAloneBlast runs the blast program and stores its
> output to a temp file, then gives the temp file name as an arg to
> SearchIO. I (not using bioperl) would use IPC::Open3 to send the output
> of the blast program directly to my parser.
The question is, why wasn't > this done in StandAloneBlast? Probably for the reasons you outlined before: 'I use IPC::Open3 for blasts and have never run into problems, but it pretty much falls into the 'apt to cause deadlock' camp. It may pass tests on one machine but fail on others... ' Why would we take a chance on using something that works on one OS/machine and fails to work on another? > I would get the blast program output handle and pass it directly to > SearchIO with the -fh option of new(). > The only difference here is it's faster and more efficient with the > direct pipe, but you can't subsequently seek the SearchIO's internal > filehandle (as we discussing in this thread). There are no (additional) > issues with memory. Like I said before, you can make changes and submit a patch. The code here is over five years old, and many many things have changed since then, so you might find something works now which wasn't available or didn't work then. It hasn't really been a priority (it certainly hasn't been mine). Most people don't care b/c it just works and a vast majority don't worry/care about the internals. The issue at hand is whether any code changes will work on all OS's, not just yours. BioPerl is used the world over on just about every OS, so ANY code changes need to take that into consideration. I can guarantee that if you made changes that break or reduce performance on 50% of the OS's, it'll likely get rolled back. You need the best cross-platform compatibility possible. We've now veered WAY off topic here. If we intend on continuing this, we need to switch the thread topic. Chris > If it isn't done using IPC::Open3 (or similar) because the original > author already knew it wouldn't be reliable enough, or for some other > reason(s), fine. Does anyone know the reasons? From cjfields at uiuc.edu Mon Jun 5 16:30:01 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Jun 2006 15:30:01 -0500 Subject: [Bioperl-l] ListSummaries for May 10-31. 
Message-ID: <006401c688de$d38035b0$15327e82@pyrimidine> I have posted the ListSummaries for May 10-31 on the wiki. I haven't finished yet (BioSQL and Bioperl-guts isn't done yet) and there are probably some mangld worsd in there so have mercy on me! It's been a busy month. http://www.bioperl.org/wiki/ListSummary:May_10-31%2C2006#May_10-31.2C_2006 Fling your mud and abuses by responding to this thread per usual Christopher Fields Postdoctoral Researcher - Switzer Lab Dept. of Biochemistry University of Illinois Urbana-Champaign From cjfields at uiuc.edu Mon Jun 5 23:42:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 5 Jun 2006 22:42:18 -0500 Subject: [Bioperl-l] remoteblast xml problem In-Reply-To: <44848225.8080003@gmx.at> References: <006101c688ce$7185c330$15327e82@pyrimidine> <44848225.8080003@gmx.at> Message-ID: Hubert, I had no trouble getting this to work; the script scans through each sequence and save the XML output to a file on both Windows and Mac OS X, both using bioperl-live. The older RemoteBlast would only save text; otherwise it saved an empty file. Using your script I get several XML BLAST output files (1.xml, 2.xml, etc) based on a counter, each about 1 MB. All were parseable by SearchIO. I did notice that if certain parameters weren't entered in correctly then you will get no data (such as setting the database to 'swiss' instead of 'swissprot'). A warning pops up stating that no data was returned when this occurs (it doesn't tell you what was wrong, just that no data came back from NCBI). If you see this then that is likely the problem. Besides that, I don't know what else it can be. Chris On Jun 5, 2006, at 2:12 PM, Hubert Prielinger wrote: > hi chris, > sorry, I have tried it with the latest CVS version: > > # $Id: RemoteBlast.pm,v 1.33 2006/06/03 06:26:41 cjfields Exp $ > > but it still doesn't work. > > Hubert > > Chris Fields wrote: >> Hubert, >> >> Make sure you have the latest Bio::Tools::Run::RemoteBlast from >> CVS. 
The >> option to save XML was committed relatively recently (last month >> or so). >> >> Chris >> ... Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From heikki at sanbi.ac.za Tue Jun 6 03:40:06 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Tue, 6 Jun 2006 09:40:06 +0200 Subject: [Bioperl-l] Bio::AlignIO::metafasta tests In-Reply-To: <000301c68678$a3cdaa40$15327e82@pyrimidine> References: <000301c68678$a3cdaa40$15327e82@pyrimidine> Message-ID: <200606060940.07285.heikki@sanbi.ac.za> Chris, I am mystified. I'll try to get the massive 'return undef' change done first and the have an other look. -Heikki On Friday 02 June 2006 21:13, Chris Fields wrote: > Heikki, > > I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up > when running AlignIO.t (I was fixing bug 2000): > > http://bugzilla.open-bio.org/show_bug.cgi?id=2016 > > Not sure what's going on there but using read_aln and write_aln seem to > work normally. It may have something to do with Bio::SimpleAlign but I'm > not absolutely sure. > > Any ideas what may be going on here? > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry
> University of Illinois Urbana-Champaign
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
______ _/ _/_____________________________________________________
_/ _/
_/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za
_/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho
_/ _/ _/ SANBI, South African National Bioinformatics Institute
_/ _/ _/ University of Western Cape, South Africa
_/ Phone: +27 21 959 2096 FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From heikki at sanbi.ac.za Tue Jun 6 04:04:00 2006
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Tue, 6 Jun 2006 10:04:00 +0200
Subject: [Bioperl-l] For CVS developers -potentialpitfallwith"returnundef"
In-Reply-To: <200606020952.08034.heikki@sanbi.ac.za>
References: <001201c685a3$59d78da0$15327e82@pyrimidine> <200606020952.08034.heikki@sanbi.ac.za>
Message-ID: <200606061004.01193.heikki@sanbi.ac.za>

OK. I've gone through all cases where return and undef are on the same
lines. I've done changes in 185 files. My aims have been the following:

1. Remove undef from return undef when not necessary.

   This will make it easier to spot cases where undef matters in the future.
   Most of the changes fall into this category. The context is clearly scalar.

2. Returning undef when the user expects an empty list is bad

   ./Bio/Tools/Est2Genome.pm             fixed
   ./Bio/SearchIO/blast.pm:2330:         should return (undef, undef) for
                                         clarity? not fixed
   ./Bio/Matrix/PSM/SiteMatrix.pm        fixed
   ./Bio/Matrix/PSM/Psm                  fixed
   ./Bio/DB/Taxonomy::entrez.pm          fixed

3. If docs say method returns nothing, explicit undef is not the right
   thing to return

4. Do not return an explicit undef if the method is supposed to return
   false on failure

Before I do the commit, I'd like a number of people to run 'make test' on
bioperl-live, and to report back if they see changes after the commit.
There are quite a few tests that fail currently.

I'll do the commit tomorrow, Wednesday, at 9 o'clock GMT.

-Heikki

On Friday 02 June 2006 09:52, Heikki Lehvaslaiho wrote:
> I've started going through the files that have 'return undef' lines.
> I'll report back later.
>
> Initial impression is that there are a few cases where the context
> indicates list to be returned but failure returns an explicit undef. I'll
> fix those.
>
> Most of the cases are much more ambiguous. Even when documentation says the
> failure returns undef, it is clearly meant to mean false. In most cases
> documentation does not comment on return value at all. Luckily the context
> is almost always scalar and therefore it does not matter too much.
>
> I seem to be changing 'return undef' to plain 'return' a bit overzealously,
> so do not take it personally.
>
> -Heikki
>
> On Thursday 01 June 2006 19:46, Chris Fields wrote:
> > ....
> >
> > > > Again, didn't do that.
> > >
> > > I'm very sorry that I allowed the ambiguity, but my comments were
> > > certainly not directed at your recent changes to Bio::Restriction::IO.
> > > In fact, I put in the above * comment to exclude your changes from my
> > > discussion; you changed the docs because the code never did what they
> > > said they did (the docs were bad). That's fine (good!). My comments
> > > were a general point, slightly directed at the idea of changing all the
> > > return undef;s - changing the code so that it no longer matches the
> > > docs of a previously working method. That's what I think is bad. Though
> > > in this particular case it shouldn't make any difference at all.
> >
> > Agreed. In any case, if tests have been properly set up then they should
> > catch problems. This is, of course, if they are properly set up.
> > > > Chris > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From sb at mrc-dunn.cam.ac.uk Tue Jun 6 05:17:48 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Tue, 06 Jun 2006 10:17:48 +0100 Subject: [Bioperl-l] Bio::AlignIO::metafasta tests In-Reply-To: <000301c68678$a3cdaa40$15327e82@pyrimidine> References: <000301c68678$a3cdaa40$15327e82@pyrimidine> Message-ID: <4485483C.4080505@mrc-dunn.cam.ac.uk> Chris Fields wrote: > Heikki, > > I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped up > when running AlignIO.t (I was fixing bug 2000): > > http://bugzilla.open-bio.org/show_bug.cgi?id=2016 > > Not sure what's going on there but using read_aln and write_aln seem to work > normally. It may have something to do with Bio::SimpleAlign but I'm not > absolutely sure. > > Any ideas what may be going on here? Yes, see my replies on the bug page. But so more people see the question, I'll ask here: can anyone offer examples of metafasta files, especially multiple alignments? 
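
[Editorial aside on the 'return undef' cleanup described in Heikki's message above.] The pitfall is a plain Perl one and can be shown without BioPerl. The two subs below are hypothetical stand-ins, not methods from the distribution; this is only a minimal sketch of why an explicit undef misleads a caller who assigns the result to a list:

```perl
use strict;
use warnings;

# Hypothetical helpers (not BioPerl code): both mean "nothing found".
sub find_explicit_undef { return undef; }  # always hands back the scalar undef
sub find_bare_return    { return; }        # undef in scalar context, () in list context

# Scalar context: both results are undefined, so both test false.
my $s1 = find_explicit_undef();
my $s2 = find_bare_return();

# List context: 'return undef' produces a one-element list holding undef,
# while a bare 'return' produces an empty list.
my @l1 = find_explicit_undef();
my @l2 = find_bare_return();
print "list sizes: ", scalar(@l1), " / ", scalar(@l2), "\n";

# A caller testing the list for truth is fooled by the explicit undef.
if (my @hits = find_explicit_undef()) {
    print "explicit undef: non-empty list, looks like success\n";
}
unless (my @hits = find_bare_return()) {
    print "bare return: empty list, correctly treated as failure\n";
}
```

In scalar context the two forms behave the same, which matches the observation in the thread that most of the changed call sites do not matter; only methods called in list context are affected.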
From cjfields at uiuc.edu Tue Jun 6 10:30:17 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 6 Jun 2006 09:30:17 -0500 Subject: [Bioperl-l] Bio::AlignIO::metafasta tests In-Reply-To: <4485483C.4080505@mrc-dunn.cam.ac.uk> Message-ID: <000901c68975$bb9968d0$15327e82@pyrimidine> Sendu, This is Heikki's original submission for the specs for meta format: http://article.gmane.org/gmane.comp.lang.perl.bio.general/1370/match=meta+fa sta So it's really a specialized FASTA format used to store meta information about sequences. Seems mainly useful for amino acid sequences, but is extended to include properties of nucleotides like DNA content, RNA sec. structure, and so on. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Tuesday, June 06, 2006 4:18 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::AlignIO::metafasta tests > > Chris Fields wrote: > > Heikki, > > > > I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped > up > > when running AlignIO.t (I was fixing bug 2000): > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=2016 > > > > Not sure what's going on there but using read_aln and write_aln seem to > work > > normally. It may have something to do with Bio::SimpleAlign but I'm not > > absolutely sure. > > > > Any ideas what may be going on here? > > Yes, see my replies on the bug page. But so more people see the > question, I'll ask here: can anyone offer examples of metafasta files, > especially multiple alignments? 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Tue Jun 6 10:36:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 6 Jun 2006 09:36:16 -0500 Subject: [Bioperl-l] Bio::AlignIO::metafasta tests In-Reply-To: <200606060940.07285.heikki@sanbi.ac.za> Message-ID: <000a01c68976$9479e300$15327e82@pyrimidine> Heikki, I agree it's all a bit weird. Not too concerning at the moment though since it works at the moment but it might take some tinkering with SimpleAlign to get it to behave. This alignment format has some of the same characteristics as Stockholm alignment format but looks easier to work with. I work with RNA, specifically one with a conserved secondary structure so this format appeals to me quite a bit. If I get time (probably not for a while) I may tinker with Bio::AlignIO::stockholm to get a write_aln() method up-and-running and see if I can convert back-and-forth from the two. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho > Sent: Tuesday, June 06, 2006 2:40 AM > To: bioperl-l at lists.open-bio.org > Cc: Chris Fields > Subject: Re: [Bioperl-l] Bio::AlignIO::metafasta tests > > Chris, > > I am mystified. I'll try to get the massive 'return undef' change done > first > and the have an other look. > > -Heikki > > On Friday 02 June 2006 21:13, Chris Fields wrote: > > Heikki, > > > > I uncovered a weird bug concerning Bio::AlignIO::metafasta which popped > up > > when running AlignIO.t (I was fixing bug 2000): > > > > http://bugzilla.open-bio.org/show_bug.cgi?id=2016 > > > > Not sure what's going on there but using read_aln and write_aln seem to > > work normally. It may have something to do with Bio::SimpleAlign but > I'm > > not absolutely sure. > > > > Any ideas what may be going on here? 
> > > > Christopher Fields > > Postdoctoral Researcher - Switzer Lab > > Dept. of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sb at mrc-dunn.cam.ac.uk Tue Jun 6 11:40:05 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Tue, 06 Jun 2006 16:40:05 +0100 Subject: [Bioperl-l] Bio::AlignIO::metafasta tests In-Reply-To: <000901c68975$bb9968d0$15327e82@pyrimidine> References: <000901c68975$bb9968d0$15327e82@pyrimidine> Message-ID: <4485A1D5.5090805@mrc-dunn.cam.ac.uk> Chris Fields wrote: > Sendu, > > This is Heikki's original submission for the specs for meta format: > > http://article.gmane.org/gmane.comp.lang.perl.bio.general/1370/match=meta+fa > sta > > So it's really a specialized FASTA format used to store meta information > about sequences. Seems mainly useful for amino acid sequences, but is > extended to include properties of nucleotides like DNA content, RNA sec. > structure, and so on. Thanks. It's not really clear to me if the meta data needs to be considered in the context of an alignment. That is, if you have two meta sequences with the same primary sequence, will all their meta data necessarily be the same? Or could they be different? 
If the same, then the test data and test need to be fixed so my patched version of Bio::AlignIO::metafasta passes the tests. If different, how should the meta data be handled? Like the test implies with its expected value for the consensus (just treat the primary sequence and all meta data as one long string)? Is it really the intent to include characters from the meta data names when considering what symbols we've seen with symbol_chars() method? Do we include the meta data name symbols when numbering? Thoughts anyone? From cjfields at uiuc.edu Tue Jun 6 17:07:39 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 6 Jun 2006 16:07:39 -0500 Subject: [Bioperl-l] ListSummaries for May 10-31. In-Reply-To: <006401c688de$d38035b0$15327e82@pyrimidine> Message-ID: <000601c689ad$3e6aec20$15327e82@pyrimidine> I hate talking to myself... I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l (appropriately enough, on 6-6-06). I am trying out a new script which helps with all the developer list noise; hope everybody likes it. Cheers, Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Chris Fields > Sent: Monday, June 05, 2006 3:30 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] ListSummaries for May 10-31. > > I have posted the ListSummaries for May 10-31 on the wiki. I haven't > finished yet (BioSQL and Bioperl-guts isn't done yet) and there are > probably > some mangld worsd in there so have mercy on me! It's been a busy month. > > http://www.bioperl.org/wiki/ListSummary:May_10-31%2C2006#May_10-31.2C_2006 > > Fling your mud and abuses by responding to this thread per usual > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Tue Jun 6 20:41:08 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 6 Jun 2006 19:41:08 -0500 Subject: [Bioperl-l] ListSummaries for May 10-31. In-Reply-To: <44861D47.7090205@infotech.monash.edu.au> Message-ID: <000601c689cb$11f568a0$15327e82@pyrimidine> I could do something like that. Right now I have a script that just grabs the text from the web page: http://bioperl.org/pipermail/bioperl-guts-l/2006-May/date.html and uses regexes and hashes to sort everything and make some sense of the noise. The resolution for a bug isn't on that page but in the linked message so I would need to grab the link from HTML, go to that page, then get the resolution if there is one, so at the moment I just check each one (thanks for the bug hunt Jason!). I usually have to do a little touching up afterwards, such as fix links and such, but the script really saves on time. As you can tell, it's been a busy month! I'm (very slowly) updating the script to go through the mail list threads recursively but haven't really gotten anywhere with that yet. Benchwork has intervened yet again! Chris > -----Original Message----- > From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au] > Sent: Tuesday, June 06, 2006 7:27 PM > To: Chris Fields > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] ListSummaries for May 10-31. > > > I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l > > (appropriately enough, on 6-6-06). I am trying out a new script which > helps > > with all the developer list noise; hope everybody likes it. > > I like the CVS summaries. > > For the bug summaries, would it make sense to categorise/sort by > category/status eg. RESOLVED, WORKSFORME etc? 
> > -- > Dr Torsten Seemann http://www.vicbioinformatics.com > Victorian Bioinformatics Consortium, Monash University, Australia From torsten.seemann at infotech.monash.edu.au Tue Jun 6 20:26:47 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Wed, 07 Jun 2006 10:26:47 +1000 Subject: [Bioperl-l] ListSummaries for May 10-31. In-Reply-To: <000601c689ad$3e6aec20$15327e82@pyrimidine> References: <000601c689ad$3e6aec20$15327e82@pyrimidine> Message-ID: <44861D47.7090205@infotech.monash.edu.au> > I have updated the ListSummaries to include BioSQL-l and Bioperl-guts-l > (appropriately enough, on 6-6-06). I am trying out a new script which helps > with all the developer list noise; hope everybody likes it. I like the CVS summaries. For the bug summaries, would it make sense to categorise/sort by category/status eg. RESOLVED, WORKSFORME etc? -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From jason at bioperl.org Wed Jun 7 00:04:02 2006 From: jason at bioperl.org (Jason Stajich) Date: Wed, 7 Jun 2006 00:04:02 -0400 Subject: [Bioperl-l] ListSummaries for May 10-31. In-Reply-To: <000601c689cb$11f568a0$15327e82@pyrimidine> References: <000601c689cb$11f568a0$15327e82@pyrimidine> Message-ID: <8D9B514C-ADB4-409F-A55F-DC0C3DA9354A@bioperl.org> It is possible some of this can be extracted from the bugzilla as a query (all the changes from X to Y) and generate RSS or text that can be processed. -jason On Jun 6, 2006, at 8:41 PM, Chris Fields wrote: > I could do something like that. Right now I have a script that > just grabs > the text from the web page: > > http://bioperl.org/pipermail/bioperl-guts-l/2006-May/date.html > > and uses regexes and hashes to sort everything and make some sense > of the > noise. 
The resolution for a bug isn't on that page but in the linked > message so I would need to grab the link from HTML, go to that > page, then > get the resolution if there is one, so at the moment I just check > each one > (thanks for the bug hunt Jason!). I usually have to do a little > touching up > afterwards, such as fix links and such, but the script really saves > on time. > As you can tell, it's been a busy month! > > I'm (very slowly) updating the script to go through the mail list > threads > recursively but haven't really gotten anywhere with that yet. > Benchwork has > intervened yet again! > > Chris > >> -----Original Message----- >> From: Torsten Seemann [mailto:torsten.seemann at infotech.monash.edu.au] >> Sent: Tuesday, June 06, 2006 7:27 PM >> To: Chris Fields >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] ListSummaries for May 10-31. >> >>> I have updated the ListSummaries to include BioSQL-l and Bioperl- >>> guts-l >>> (appropriately enough, on 6-6-06). I am trying out a new script >>> which >> helps >>> with all the developer list noise; hope everybody likes it. >> >> I like the CVS summaries. >> >> For the bug summaries, would it make sense to categorise/sort by >> category/status eg. RESOLVED, WORKSFORME etc? 
>> >> -- >> Dr Torsten Seemann http://www.vicbioinformatics.com >> Victorian Bioinformatics Consortium, Monash University, Australia > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From heikki at sanbi.ac.za Wed Jun 7 05:57:47 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Wed, 7 Jun 2006 11:57:47 +0200 Subject: [Bioperl-l] For CVS developers -potentialpitfallwith"returnundef" In-Reply-To: <200606061004.01193.heikki@sanbi.ac.za> References: <001201c685a3$59d78da0$15327e82@pyrimidine> <200606020952.08034.heikki@sanbi.ac.za> <200606061004.01193.heikki@sanbi.ac.za> Message-ID: <200606071157.47736.heikki@sanbi.ac.za> Committed. Please report any surprising changes in functionality to the list. -Heikki On Tuesday 06 June 2006 10:04, Heikki Lehvaslaiho wrote: > OK. I've gone through all cases where return and undef are on the same > lines. I've done changes in 185 files. > > My aims have been the following: > > 1. Remove undef from return undef when not necessary. > This will make it easier to spot cases where undef matters in the future > Most of the changes fall into this category. The context is clearly > scalar. > > 2. Returning undef when user expects an empty list is bad > > ./Bio/Tools/Est2Genome.pm fixed > ./Bio/SearchIO/blast.pm:2330: should return (undef, undef) for clarity? > not fixed > ./Bio/Matrix/PSM/SiteMatrix.pm fixed > ./Bio/Matrix/PSM/Psm fixed > ./Bio/DB/Taxonomy::entrez.pm fixed > > 3. If docs say method returns nothing, explicit undef is not the right > thing to return > > 4. do not return an explicit undef if the method is supposed to return > false on failure > > > Before I do the commit, I'd like to see a number of people do 'make test' on > bioperl-live and report back after the commit if they see changes. There are > quite a few tests that fail currently.
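The list-context pitfall behind aim 2 above can be sketched in a few lines of Perl. This is only an illustration under stated assumptions: the two subroutine names are hypothetical and are not taken from any BioPerl module.

```perl
use strict;
use warnings;

# Hypothetical subs for illustration only; not taken from any BioPerl module.
sub hits_explicit_undef { return undef; }   # explicit undef signals failure
sub hits_bare_return    { return; }         # bare return signals failure

# List context: "return undef" produces a one-element list containing undef,
# so the array is non-empty; a bare return produces an empty list.
my @explicit = hits_explicit_undef();
my @bare     = hits_bare_return();

printf "explicit undef: %d element(s)\n", scalar @explicit;
printf "bare return:    %d element(s)\n", scalar @bare;

# Scalar context: both forms evaluate to undef.
my $explicit = hits_explicit_undef();
my $bare     = hits_bare_return();
printf "scalar results defined? %s\n",
    (defined $explicit || defined $bare) ? 'yes' : 'no';
```

A caller that writes `if (@explicit) { ... }` would treat the failure as success, whereas in scalar context both forms behave the same, which matches the observation that most of the changes do not alter behavior.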
> > I'll do the commit tomorrow Wednesday at 9 o'clock GMT. > > -Heikki > > On Friday 02 June 2006 09:52, Heikki Lehvaslaiho wrote: > > I've started going through the files that have 'return undef' lines. > > I'll report back later. > > > > Initial impression is that there are a few cases where the context > > indicates list to be returned but failure returns an explicit undef. I'll > > fix those. > > > > Most of the cases are much more ambiguous. Even when documentation says > > the failure returns undef, it is clearly meant to mean false. In most > > cases documentation does not comment on return value at all. Luckily the > > context is almost always scalar and therefore it does not matter too > > much. > > > > I seem to be changing 'return undef' to plain 'return' a bit > > overzealously, so do not take it personally. > > > > -Heikki > > > > On Thursday 01 June 2006 19:46, Chris Fields wrote: > > > .... > > > > > > > > Again, didn't do that. > > > > > > > > I'm very sorry that I allowed the ambiguity, but my comments were > > > > certainly not directed at your recent changes to > > > > Bio::Restriction::IO. In fact, I put in the above * comment to > > > > exclude your changes from my discussion; you changed the docs because > > > > the code never did what they said they did (the docs were bad). > > > > That's fine (good!). My comments were a general point, slightly > > > > directed at the idea of changing all the return undef;s - changing > > > > the code so that it no longer matches the docs of a previously > > > > working method. That's what I think is bad. Though in this particular > > > > case it shouldn't make any difference at all. > > > > > > Agreed. In any case, if tests have been properly set up then they > > > should catch problems. This is, of course, if they are properly set > > > up.
> > > > > > Chris > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From Michael.Muratet at operon.com Tue Jun 6 14:34:38 2006 From: Michael.Muratet at operon.com (Michael Muratet US-Huntsville) Date: Tue, 6 Jun 2006 13:34:38 -0500 Subject: [Bioperl-l] bioperl-db failing tests Message-ID: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local> Greetings I am trying to install bioperl-db in preparation for installing a biosql database. I'm running on a Dell PowerEdge with quad dual-core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl 1.5.1. I have installed mysql v5.0.21 from source with --with-innodb set for the configuration. I installed bioperl-db from cvs. I have the latest DBI and DBD:mysql installed a few weeks ago from CPAN. The installation has been working well with perl otherwise, for example, the Ensembl core API works OK. SHOW ENGINES indicates that innodb is enabled. I have attached a snippet from the top of the output below. I searched the web and the bioperl-db list and haven't found anything that appears to be relevant. I've done several of these installs and they've pretty much completed without a single glitch. Does anyone have any ideas how to isolate the problem? 
Thanks

Mike

[mmuratet at HSV-PROBE bioperl-db]$ make test
PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
t/01dbadaptor.....ok 14/19
------------- EXCEPTION -------------
MSG: failed to open connection: Transactions not supported by database
STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:255
STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:215
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:1477
STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BaseDriver.pm:518
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927
STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key /usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855
STACK toplevel t/01dbadaptor.t:62

From hlapp at gmx.net Wed Jun 7 08:52:22 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 7 Jun 2006 08:52:22 -0400
Subject: [Bioperl-l] bioperl-db failing tests
In-Reply-To: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local>
References: <2000234931344D4BA434A0C235D1956B0C4852@hsv-exmail02.operonads.local>
Message-ID: <4F23D2EA-2218-4023-A3F6-3284912952BE@gmx.net>

Hi Michael,

Bioperl-db will open all connections with AutoCommit => 0 in the DBI parameter hash. The test you're stumbling over is actually there to test that the database does support transactions, but apparently in 5.x versions MySQL no longer silently ignores the AutoCommit parameter if it doesn't support transactions (effectively preempting the test ...).

Now you say that innodb shows as enabled - i.e., you can confirm that you changed the Mysql configuration parameter that designates the directory for innodb to store its files?

You can confirm that transactions are supported by simple tests on the sql level. Open a mysql shell and do the following:

-- BTW 'start transaction;' will (should) work too
mysql> set autocommit = 0;
mysql> insert into biodatabase (name) values ('__dummy__');
mysql> select name from biodatabase where name = '__dummy__';
mysql> rollback;
mysql> select name from biodatabase where name = '__dummy__';

The first SELECT query should return one and the last query should return zero rows if transactions are supported, and there shouldn't be any error.

If the above succeeds (which I don't expect it to) then it looks like the DBD::mysql driver thinks the database doesn't support transactions when in reality it does. Let me know the result.

-hilmar

On Jun 6, 2006, at 2:34 PM, Michael Muratet US-Huntsville wrote: > Greetings > > I am trying to install bioperl-db in preparation for installing a > biosql database. I'm running on a Dell PowerEdge with quad dual- > core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl > 1.5.1. I have installed mysql v5.0.21 from source with --with- > innodb set for the configuration. I installed bioperl-db from cvs. > I have the latest DBI and DBD:mysql installed a few weeks ago from > CPAN. The installation has been working well with perl otherwise, > for example, the Ensembl core API works OK. SHOW ENGINES indicates > that innodb is enabled. I have attached a snippet from the top of > the output below. I searched the web and the bioperl-db list and > haven't found anything that appears to be relevant.
I've done > several of these installs and they've pretty much completed without > a single glitch. Does anyone have any ideas how to isolate the > problem? > > Thanks > > Mike > > [mmuratet at HSV-PROBE bioperl-db]$ make test > PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" > "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t > t/01dbadaptor.....ok 14/19 > ------------- EXCEPTION ------------- > MSG: failed to open connection: Transactions not supported by database > STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/ > 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:255 > STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/ > 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:215 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/ > site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/ > BioSQL/BasePersistenceAdaptor.pm:1477 > STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/ > perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/ > DB/BioSQL/BaseDriver.pm:518 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / > usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ > lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ > lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > STACK toplevel t/01dbadaptor.t:62 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From nlhepler at umd.edu Wed Jun 7 09:46:32 2006 From: nlhepler at umd.edu (Nicolaus Hepler) Date: Wed, 7 Jun 2006 
09:46:32 -0400 Subject: [Bioperl-l] GenBank Feature: variation Message-ID: <676D7B39-B042-442B-992D-92BBB20D6B31@umd.edu> Hello, I am having some difficulty here. I have a list of accessions, which are the parameters for a get_Stream_by_acc() function on a Bio::DB::GenBank object. None of the returned GenBank information for any of my accessions seems to contain variation data, no matter how I try to coax it out with unflattener and typemapper. This data is, however, available via the web interface of NCBI Nucleotide, as an optional feature (SNP). I was wondering if there was some option I'm missing in the initialization of the Bio::DB::GenBank object (no options currently) that will coax the database into giving me this data? Or something else that I'm missing altogether. The organism of interest is human, taxon:9606. Nicolaus Lance Hepler nlhepler at mail dot umd dot edu From cjfields at uiuc.edu Wed Jun 7 09:56:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 7 Jun 2006 08:56:16 -0500 Subject: [Bioperl-l] For CVS developers-potentialpitfallwith"returnundef" In-Reply-To: <200606071157.47736.heikki@sanbi.ac.za> Message-ID: <000601c68a3a$265552a0$15327e82@pyrimidine> Yikes! I'll download a tarball from anon CVS and run a comparison (vs my pre-updated bioperl-live) on WinXP and Mac OS X 10.4 (Intel) and report back success/fail; may be a bit. Chris From osborne1 at
optonline.net Wed Jun 7 11:42:32 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Wed, 07 Jun 2006 11:42:32 -0400 Subject: [Bioperl-l] GenBank Feature: variation In-Reply-To: <676D7B39-B042-442B-992D-92BBB20D6B31@umd.edu> Message-ID: Nicolaus, The short answer is no, there's no option that will omit or add a particular feature or annotation to the Sequence object returned by Bio::DB::GenBank. Can you give some example accessions? Brian O. On 6/7/06 9:46 AM, "Nicolaus Hepler" wrote: > Hello, > > I am having some difficulty here. I have a list of accessions, which > are the parameters for a get_Stream_by_acc() function on a > Bio::DB::GenBank object. None of the returned GenBank information > for any of my accessions seems to contain variation data, no matter > how I try to coax it out with unflattener and typemapper. This data > is, however, available via the web interface of NCBI Nucleotide, as > an optional feature (SNP). I was wondering if there was some option > I'm missing in the initialization of the Bio::DB::GenBank object (no > options currently) that will coax the database into giving me this > data? Or something else that I'm missing altogether. The organism > of interest is human, taxon:9606. > > Nicolaus Lance Hepler > nlhepler at mail dot umd dot edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From nlhepler at umd.edu Wed Jun 7 12:26:06 2006 From: nlhepler at umd.edu (Nicolaus Hepler) Date: Wed, 7 Jun 2006 12:26:06 -0400 Subject: [Bioperl-l] GenBank Feature: variation In-Reply-To: References: Message-ID: <94C2D1F8-29B3-4C59-88BC-12EC7D0C7885@umd.edu> Brian, A sample accession is BC000007. I figured a way around it though. Rather than automate the whole process, I just downloaded from Batch Entrez a flat .gb file of all my accessions. 
It's not flexible, and will be inconvenient when we expand the dataset, but it will provide me with data to work with for now. Nicolaus > Nicolaus, > > The short answer is no, there's no option that will omit or add a > particular > feature or annotation to the Sequence object returned by > Bio::DB::GenBank. > Can you give some example accessions? > > Brian O. > > > On 6/7/06 9:46 AM, "Nicolaus Hepler" wrote: > >> Hello, >> >> I am having some difficulty here. I have a list of accessions, which >> are the parameters for a get_Stream_by_acc() function on a >> Bio::DB::GenBank object. None of the returned GenBank information >> for any of my accessions seems to contain variation data, no matter >> how I try to coax it out with unflattener and typemapper. This data >> is, however, available via the web interface of NCBI Nucleotide, as >> an optional feature (SNP). I was wondering if there was some option >> I'm missing in the initialization of the Bio::DB::GenBank object (no >> options currently) that will coax the database into giving me this >> data? Or something else that I'm missing altogether. The organism >> of interest is human, taxon:9606. >> >> Nicolaus Lance Hepler >> nlhepler at mail dot umd dot edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From lstein at cshl.edu Wed Jun 7 12:50:24 2006 From: lstein at cshl.edu (Lincoln Stein) Date: Wed, 7 Jun 2006 12:50:24 -0400 Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ? In-Reply-To: <4483F338.7090909@mrc-dunn.cam.ac.uk> References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg> <4483F338.7090909@mrc-dunn.cam.ac.uk> Message-ID: <200606071250.25026.lstein@cshl.edu> I'm afraid this strategy won't work with the filehandle retrieved from CGI.pm, because the CGI upload filehandle is not seekable (for good reasons that I won't inflict on you)! 
You'll have to write to a temporary file, or else read the whole sequence into memory. Sorry about this. Lincoln On Monday 05 June 2006 05:02, Sendu Bala wrote: > Wijaya Edward wrote: > > Dear Lincoln and experts > > > > Curently I have a CGI application that does this: > > > > 1. read and uploaded file > > 2. check the content of the file whether fasta or not > > 3. print out the content of the file. > > > > > > Now the problem I'm facing is that > > on step three. The content of the file handled is altered > > namely the very first line does not get printed. > > The problem is almost certainly that the guessing is done by reading the > first line of the filehandle, so that your subsequent while loop on that > same filehandle starts at the second line. > Just seek the filehandle back to the start before trying to print the > contents out. > > .. > my $guesser_upload = new Bio::Tools::GuessSeqFormat(-fh => $fh_upload ); > my $format_upload = $guesser_upload->guess; > seek($fh_upload, 0, 0); > .. > while (<$fh_upload>) { > ... > } > > An alternative might be to pass GuessSeqFormat the filename in which > case it would make its own filehandle and close it, leaving your own > filehandle untouched. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Lincoln D. 
Stein Cold Spring Harbor Laboratory 1 Bungtown Road Cold Spring Harbor, NY 11724 (516) 367-8380 (voice) (516) 367-8389 (fax) FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From paul.boutros at utoronto.ca Wed Jun 7 13:03:01 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Wed, 7 Jun 2006 13:03:01 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return undef"
Message-ID: <1149699781.448706c5e803d@webmail.utoronto.ca>

Hi,

Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7 and I had a few failures:

Failed Test         Stat Wstat Total Fail  List of Failed
-------------------------------------------------------------------------------
t/Annotation.t                    89    2  79 88
t/Biblio.t                        24    1  2
t/LocusLink.t                     23    1  23
t/PhysicalMap.t                   14    2  11-12
t/RepeatMasker.t                   6    3  1-2 6
t/StandAloneBlast.t               18    4  19-22
t/TaxonTree.t                     17   30  11 18-42
t/alignUtilities.t                 9    1  9
t/psm.t              255 65280    48   35  29 32-48
t/tutorial.t                      21   15  7-21

Not sure if any of these are related to the "return undef" changes, or are known.

I also had some warnings running BioGraphics.t

t/BioGraphics................Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureFile.pm line 547, line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 548, line 61.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, line 61.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, line 61.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, line 61.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 548, line 62.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, line 62.
Use of uninitialized value in numeric le (<=) at Bio/Graphics/FeatureFile.pm line 568, line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureFile.pm line 569, line 62.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, line 62.
Use of uninitialized value in numeric lt (<) at Bio/Graphics/FeatureBase.pm line 170, line 62.
Use of uninitialized value in numeric gt (>) at Bio/Graphics/FeatureBase.pm line 171, line 62.
t/BioGraphics................ok

I also ran the tests manually and below I've attached what came out (doesn't always agree with the results of make test, and in a few cases (e.g. tutorial.t or StandAloneBlast.t) there were no errors running the tests manually.

Paul

Annotation.t
============
not ok 8
# Test 8 got: '' (t/Annotation.t at line 59)
# Expected: '0'
not ok 71
# Test 71 got: 'dumpster|test case|Ann:00001' (t/Annotation.t at line 187)
# Expected: 'dumpster|test case|'
not ok 79
# Failed test 79 in t/Annotation.t at line 217
ok 85
Use of uninitialized value in concatenation (.) or string at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm line 236.
------------- EXCEPTION -------------
MSG: Bio::AnnotationI implementation Bio::Annotation:: failed to load:
------------- EXCEPTION -------------
MSG: Failed to load module Bio::Annotation::. Can't locate Bio/Annotation/.pm in @INC (@INC contains: t /db2blast/Paul/perl5.8.7/lib/5.8.7/aix /db2blast/Paul/perl5.8.7/lib/5.8.7 /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/aix /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7 /db2blast/Paul/perl5.8.7/lib/site_perl .) at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Root/Root.pm line 396.
STACK Bio::Root::Root::_load_module /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Root/Root.pm:398
STACK (eval) /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm:149
STACK Bio::Annotation::AnnotationFactory::create_object /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm:148
STACK toplevel t/Annotation.t:237
--------------------------------------
STACK Bio::Annotation::AnnotationFactory::create_object /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Annotation/AnnotationFactory.pm:152
STACK toplevel t/Annotation.t:237
--------------------------------------

PhysicalMap.t
=============
not ok 11
# Test 11 got: (t/PhysicalMap.t at line 55)
# Expected: '0' (code holds and returns a string, definition requires a boolean)
not ok 12
# Test 12 got: '3' (t/PhysicalMap.t at line 56)
# Expected: '1' (code holds and returns a string, definition requires a boolean)

TaxonTree.t
===========
ok 10
Use of uninitialized value in string eq at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Taxonomy/Taxon.pm line 559.
not ok 11
# Test 11 got: (t/TaxonTree.t at line 35)
# Expected: 'species'
ok 12 # foo is not a rank, class variable @RANK not initialised
ok 13
ok 14
ok 15
ok 16
ok 17
ok 18
Can't use string ("this could be anything") as a HASH ref while "strict refs" in use at /db2blast/Paul/perl5.8.7/lib/site_perl/5.8.7/Bio/Taxonomy/Taxon.pm line 452.

alignUtilities.t
================
ok 6
-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: Replacing one sequence [n1/1-36]
---------------------------------------------------
ok 7
ok 8
not ok 9
# Test 9 got: '1' (t/alignUtilities.t at line 53)
# Expected: '3'

RepeatMasker.t
==============
t/RepeatMasker...............FAILED tests 1-2, 6
        Failed 3/6 tests, 50.00% okay

StandAloneBlast.t
=================
t/StandAloneBlast............FAILED tests 19-22
        Failed 4/18 tests, 77.78% okay

psm.t
=====
t/Pseudowise.................ok
t/psm........................NOK 29Illegal division by zero at t/psm.t line 147, line 36.
t/psm........................dubious
        Test returned status 255 (wstat 65280, 0xff00)
DIED. FAILED tests 29, 32-48
        Failed 18/48 tests, 62.50% okay
t/QRNA.......................ok

tutorial.t
==========
t/tutorial...................ok 5/21
The following numeric arguments can be passed to run the corresponding demo-script.
1 => sequence_manipulations 2 => seqstats_and_seqwords 3 => restriction_and_sigcleave 4 => other_seq_utilities 5 => run_perl 6 => searchio_parsing 8 => hmmer_parsing 9 => simplealign 10 => gene_prediction_parsing 11 => access_remote_db 12 => index_local_db 13 => fetch_local_db (NOTE: needs to be run with demo 12) 14 => sequence_annotation 15 => largeseqs 16 => liveseqs 17 => run_struct 18 => demo_variations 19 => demo_xml 20 => run_tree 21 => run_map 22 => run_remoteblast 23 => run_standaloneblast 24 => run_clustalw_tcoffee 25 => run_psw_bl2seq In addition the argument "100" followed by the name of a single bioperl object will display a list of all the public methods available from that object and from what object they are inherited. Using the parameter "0" will run all the tests that do not require external programs (i.e. tests 1 to 22). Using any other argument (or no argument) will run this display. So typical command lines might be: To run all core demo scripts: > perl -w bptutorial.pl 0 or to just run the local indexing demos: > perl -w bptutorial.pl 12 13 or to list all the methods available for object Bio::Tools::SeqStats - > perl -w bptutorial.pl 100 Bio::Tools::SeqStats t/tutorial...................FAILED tests 7-21 Failed 15/21 tests, 28.57% okay > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho > Sent: Wednesday, June 07, 2006 4:58 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] For CVS developers- > potentialpitfallwith"returnundef" > > Committed. > > Please report any surprising changes in functionality to the list. > > -Heikki > From sb at mrc-dunn.cam.ac.uk Wed Jun 7 12:54:31 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Wed, 07 Jun 2006 17:54:31 +0100 Subject: [Bioperl-l] Conflicting CGI.pm with Bio::Tools::GuessSeqFormat ? 
In-Reply-To: <200606071250.25026.lstein@cshl.edu>
References: <63f5e7a446b.448458fb@i2r.a-star.edu.sg> <4483F338.7090909@mrc-dunn.cam.ac.uk> <200606071250.25026.lstein@cshl.edu>
Message-ID: <448704C7.6080201@mrc-dunn.cam.ac.uk>

Lincoln Stein wrote:
> I'm afraid this strategy won't work with the filehandle retrieved from CGI.pm,
> because the CGI upload filehandle is not seekable (for good reasons that I
> won't inflict on you)! You'll have to write to a temporary file, or else read
> the whole sequence into memory. Sorry about this.

The OP already had success with my alternative solution.

>> An alternative might be to pass GuessSeqFormat the filename in which
>> case it would make its own filehandle and close it, leaving your own
>> filehandle untouched.

From hlapp at gmx.net Wed Jun 7 13:25:25 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 7 Jun 2006 13:25:25 -0400
Subject: [Bioperl-l] bioperl-db failing tests
In-Reply-To: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>
References: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>
Message-ID: <76434774-51A4-46E7-97AA-1E9227CB7771@gmx.net>

Hi Michael,

yes, it looks like a problem in DBD if DBD::mysql fails to recognize
that the mysql instance to which it is connected does support
transactions. You can verify this by writing a simple script that
tries to open a connection with { AutoCommit => 0 } as the parameter
hash:

    use DBI;

    # Fill in the database name and host after "database=" and "host=".
    my $dbh = DBI->connect("dbi:mysql:database=;host=",
                           "username", "password",
                           { AutoCommit => 0, RaiseError => 0 });
    # With RaiseError off, connect returns undef on failure and the
    # reason is left in $DBI::errstr.
    die $DBI::errstr unless $dbh;
    $dbh->disconnect;

If this succeeds, then something in BioSQL may be related to the
problem; otherwise it is not.

-hilmar

On Jun 7, 2006, at 12:01 PM, Michael Muratet US-Huntsville wrote:

> Hilmar
>
> Pardon the top post.
>
> I tried the test below and it failed.
So, I went back and redid the > Innodb configuration (deleted all the index files--they were empty > anyway, reinstalled biosql (which was empty,too) and restarted the > server. Now, the test below works. I went into the DBD-3.0003 and > did a distclean and reinstalled the package, but it fails the one > transaction test, too. So, it looks like the problem is in DBD, yes? > > We had a RAID 5 drive glitch the day before yesterday and rebuilt > it. That's the only thing that's changed that I know of that could > have caused the problem with ibxxx files. > > I have received a reply on the DBD list. Can you think of anything > else I should try from the biosql end? > > Thanks a million. > > Mike > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Wednesday, June 07, 2006 7:52 AM > To: Michael Muratet US-Huntsville > Cc: Bioperl; BioSQL > Subject: Re: [Bioperl-l] bioperl-db failing tests > > > Hi Michael, > > Bioperl-db will open all connections with AutoCommit => 0 in the DBI > parameter hash. The test you're stumbling over is actually there to > test that the database does support transactions, but apparently in > 5.x versions MySQL no longer silently ignores the AutoCommit > parameter if it doesn't support transactions (effectively preempting > the test ...). > > Now you say that innodb shows as enabled - i.e., you can confirm that > you changed the Mysql configuration parameter that designates the > directory for innodb to store its files? > > You can confirm that transactions are supported by simple tests on > the sql level. 
Open a mysql shell and do the following: > > -- BTW 'start transaction;' will (should) work too > mysql> set autocommit = 0; > mysql> insert into biodatabase (name) values ('__dummy__'); > mysql> select name from biodatabase where name = '__dummy__'; > mysql> rollback; > mysql> select name from biodatabase where name = '__dummy__'; > > The first SELECT query should return one and the last query should > return zero rows if transactions are supported, and there shouldn't > be any error. > > If the above succeeds (which I don't expect it to) then it looks like > the DBD::mysql driver thinks the database doesn't support > transactions when in reality it does. Let me know the result. > > -hilmar > > On Jun 6, 2006, at 2:34 PM, Michael Muratet US-Huntsville wrote: > >> Greetings >> >> I am trying to install bioperl-db in preparation for installing a >> biosql database. I'm running on a Dell PowerEdge with quad dual- >> core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl >> 1.5.1. I have installed mysql v5.0.21 from source with --with- >> innodb set for the configuration. I installed bioperl-db from cvs. >> I have the latest DBI and DBD:mysql installed a few weeks ago from >> CPAN. The installation has been working well with perl otherwise, >> for example, the Ensembl core API works OK. SHOW ENGINES indicates >> that innodb is enabled. I have attached a snippet from the top of >> the output below. I searched the web and the bioperl-db list and >> haven't found anything that appears to be relevant. I've done >> several of these installs and they've pretty much completed without >> a single glitch. Does anyone have any ideas how to isolate the >> problem? 
>> >> Thanks >> >> Mike >> >> [mmuratet at HSV-PROBE bioperl-db]$ make test >> PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" >> "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t >> t/01dbadaptor.....ok 14/19 >> ------------- EXCEPTION ------------- >> MSG: failed to open connection: Transactions not supported by >> database >> STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/ >> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm: >> 255 >> STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/ >> 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm: >> 215 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/ >> site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/ >> BioSQL/BasePersistenceAdaptor.pm:1477 >> STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/ >> perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/ >> DB/BioSQL/BaseDriver.pm:518 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / >> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ >> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927 >> STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / >> usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ >> lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 >> STACK toplevel t/01dbadaptor.t:62 >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu 
Wed Jun 7 14:08:19 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 13:08:19 -0500
Subject: [Bioperl-l] GenBank Feature: variation
In-Reply-To: <94C2D1F8-29B3-4C59-88BC-12EC7D0C7885@umd.edu>
Message-ID: <001501c68a5d$5db655a0$15327e82@pyrimidine>

Nicolaus,

Bio::DB::GenBank mainly uses NCBI's efetch; I implemented epost, but it's a
hack at best and only works in certain circumstances. So you could get the
sequence data directly, but the links aren't included and are only given
through NCBI's elink. There is no way I know of to get this information via
bioperl, as there isn't an interface to NCBI's elink AFAIK (Brian?). I'm
working on a rewrite for a general NCBI eutils interface for each tool
(efetch, epost, elink, etc), but it isn't working yet and probably won't be
ready to go until the end of summer or the beginning of fall.

Just so you know how complex the situation is when using accessions: you
can't use a sequence accession directly when querying elink (and most
eutils); it has to be the GI number. I believe efetch is the only one that
accepts accessions. So you would have to run esearch first using the
accessions as a query, grab the GI from the XML, run elink with the GI, grab
the SNP cluster ID, efetch the SNP data, and parse the data to get it into
Bio::ClusterIO. Fun, huh? You would think NCBI would try making this a
little easier...

There used to be a way to parse dbSNP data using Bio::ClusterIO, but the XML
schema changed, so the parser is likely broken (the tests work, but the test
file is from the old schema). I think Allen Day was in charge of it.
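The esearch, elink, efetch chain described above can be sketched as three query URLs. This is only a rough illustration, not a BioPerl interface: the helper name eutil_url is made up, the base URL is an assumption taken from NCBI's public E-utilities service, and the GI (33875090) and SNP ID (4631) are simply the values that appear in this thread for accession BC000007. Nothing here contacts NCBI, and parameter values are not URL-escaped.

```perl
use strict;
use warnings;

# Assumed base URL of the NCBI E-utilities; check NCBI's current docs.
my $EUTILS = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Hypothetical helper: build the query URL for one E-utility.
# Keys are sorted so that the result is deterministic.
sub eutil_url {
    my ($tool, %param) = @_;
    my $query = join '&', map { "$_=$param{$_}" } sort keys %param;
    return "$EUTILS/$tool.fcgi?$query";
}

# Step 1: esearch turns the accession into a GI number (returned as XML).
my $esearch = eutil_url('esearch', db => 'nucleotide', term => 'BC000007');

# Step 2: elink maps that GI from the nucleotide database to dbSNP.
my $elink = eutil_url('elink', dbfrom => 'nucleotide', db => 'snp',
                      id => 33875090);

# Step 3: efetch retrieves the SNP record itself as XML.
my $efetch = eutil_url('efetch', db => 'snp', id => 4631, retmode => 'xml');

print "$_\n" for $esearch, $elink, $efetch;
```

Each URL would still have to be fetched and its XML parsed before the next step can run, which is the tedious part the message complains about.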
I used the eutils test interface () to grab the SNP cluster accessions for your sequence using elink (note that the format is XML, which one would have to parse out to grab the cluster ID's): nucleotide 33875090 snp nucleotide_snp 4631 snp nucleotide_snp_genegenotype 28362589 4635949 28362591 11545838 4246814 28670911 4073746 9313754 11545840 17077806 28362590 4076327 9834 4073745 6879874 Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Nicolaus Hepler > Sent: Wednesday, June 07, 2006 11:26 AM > To: Brian Osborne; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] GenBank Feature: variation > > Brian, > > A sample accession is BC000007. I figured a way around it though. > Rather than automate the whole process, I just downloaded from Batch > Entrez a flat .gb file of all my accessions. It's not flexible, and > will be inconvenient when we expand the dataset, but it will provide > me with data to work with for now. > > Nicolaus > > > Nicolaus, > > > > The short answer is no, there's no option that will omit or add a > > particular > > feature or annotation to the Sequence object returned by > > Bio::DB::GenBank. > > Can you give some example accessions? > > > > Brian O. > > > > > > On 6/7/06 9:46 AM, "Nicolaus Hepler" wrote: > > > >> Hello, > >> > >> I am having some difficulty here. I have a list of accessions, which > >> are the parameters for a get_Stream_by_acc() function on a > >> Bio::DB::GenBank object. None of the returned GenBank information > >> for any of my accessions seems to contain variation data, no matter > >> how I try to coax it out with unflattener and typemapper. This data > >> is, however, available via the web interface of NCBI Nucleotide, as > >> an optional feature (SNP). 
I was wondering if there was some option
> >> I'm missing in the initialization of the Bio::DB::GenBank object (no
> >> options currently) that will coax the database into giving me this
> >> data? Or something else that I'm missing altogether. The organism
> >> of interest is human, taxon:9606.
> >>
> >> Nicolaus Lance Hepler
> >> nlhepler at mail dot umd dot edu

From Michael.Muratet at operon.com Wed Jun 7 12:01:29 2006
From: Michael.Muratet at operon.com (Michael Muratet US-Huntsville)
Date: Wed, 7 Jun 2006 11:01:29 -0500
Subject: [Bioperl-l] bioperl-db failing tests
Message-ID: <2000234931344D4BA434A0C235D1956B0C485C@hsv-exmail02.operonads.local>

Hilmar

Pardon the top post.

I tried the test below and it failed. So, I went back and redid the
Innodb configuration (deleted all the index files, which were empty
anyway), reinstalled biosql (which was empty, too), and restarted the
server. Now, the test below works. I went into the DBD-3.0003 and did a
distclean and reinstalled the package, but it fails the one transaction
test, too. So, it looks like the problem is in DBD, yes?

We had a RAID 5 drive glitch the day before yesterday and rebuilt it.
That's the only thing that's changed that I know of that could have
caused the problem with ibxxx files.

I have received a reply on the DBD list. Can you think of anything else
I should try from the biosql end?

Thanks a million.
Mike -----Original Message----- From: Hilmar Lapp [mailto:hlapp at gmx.net] Sent: Wednesday, June 07, 2006 7:52 AM To: Michael Muratet US-Huntsville Cc: Bioperl; BioSQL Subject: Re: [Bioperl-l] bioperl-db failing tests Hi Michael, Bioperl-db will open all connections with AutoCommit => 0 in the DBI parameter hash. The test you're stumbling over is actually there to test that the database does support transactions, but apparently in 5.x versions MySQL no longer silently ignores the AutoCommit parameter if it doesn't support transactions (effectively preempting the test ...). Now you say that innodb shows as enabled - i.e., you can confirm that you changed the Mysql configuration parameter that designates the directory for innodb to store its files? You can confirm that transactions are supported by simple tests on the sql level. Open a mysql shell and do the following: -- BTW 'start transaction;' will (should) work too mysql> set autocommit = 0; mysql> insert into biodatabase (name) values ('__dummy__'); mysql> select name from biodatabase where name = '__dummy__'; mysql> rollback; mysql> select name from biodatabase where name = '__dummy__'; The first SELECT query should return one and the last query should return zero rows if transactions are supported, and there shouldn't be any error. If the above succeeds (which I don't expect it to) then it looks like the DBD::mysql driver thinks the database doesn't support transactions when in reality it does. Let me know the result. -hilmar On Jun 6, 2006, at 2:34 PM, Michael Muratet US-Huntsville wrote: > Greetings > > I am trying to install bioperl-db in preparation for installing a > biosql database. I'm running on a Dell PowerEdge with quad dual- > core Xeons and RedHat Enterprise v4 and perl 5.8.5 and bioperl > 1.5.1. I have installed mysql v5.0.21 from source with --with- > innodb set for the configuration. I installed bioperl-db from cvs. > I have the latest DBI and DBD:mysql installed a few weeks ago from > CPAN. 
The installation has been working well with perl otherwise, > for example, the Ensembl core API works OK. SHOW ENGINES indicates > that innodb is enabled. I have attached a snippet from the top of > the output below. I searched the web and the bioperl-db list and > haven't found anything that appears to be relevant. I've done > several of these installs and they've pretty much completed without > a single glitch. Does anyone have any ideas how to isolate the > problem? > > Thanks > > Mike > > [mmuratet at HSV-PROBE bioperl-db]$ make test > PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" > "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t > t/01dbadaptor.....ok 14/19 > ------------- EXCEPTION ------------- > MSG: failed to open connection: Transactions not supported by database > STACK Bio::DB::DBI::base::new_connection /usr/lib/perl5/site_perl/ > 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:255 > STACK Bio::DB::DBI::base::get_connection /usr/lib/perl5/site_perl/ > 5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/DBI/base.pm:215 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::dbh /usr/lib/perl5/ > site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/DB/ > BioSQL/BasePersistenceAdaptor.pm:1477 > STACK Bio::DB::BioSQL::BaseDriver::prepare_findbyuk_sth /usr/lib/ > perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/lib/Bio/ > DB/BioSQL/BaseDriver.pm:518 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::_find_by_unique_key / > usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ > lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:927 > STACK Bio::DB::BioSQL::BasePersistenceAdaptor::find_by_unique_key / > usr/lib/perl5/site_perl/5.8.5/Bio/biosql-schema/sql/bioperl-db/blib/ > lib/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:855 > STACK toplevel t/01dbadaptor.t:62 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > 
http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Wed Jun 7 15:38:08 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 14:38:08 -0500
Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live
Message-ID: <001901c68a69$e7ece8e0$15327e82@pyrimidine>

All,

I don't know how many people use the Bio::ClusterIO module, but it looks
like Bio::ClusterIO::dbsnp is broken unless you are using older XML versions
of the dbSNP database; the ASN.1 and XML schemas for SNP data have changed.
See 'Announcements' at:

http://www.ncbi.nlm.nih.gov/projects/SNP/

I actually tried parsing both the dbsnp test file and a file in the newer
XML schema to confirm this; the new version doesn't work (the object
returned from next_cluster is undef). I'm filing a bug as a reminder.

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From paul.boutros at utoronto.ca Wed Jun 7 18:35:46 2006
From: paul.boutros at utoronto.ca (Paul Boutros)
Date: Wed, 7 Jun 2006 18:35:46 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return undef"
In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca> <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
Message-ID: <1149719746.448754c2ef4e0@webmail.utoronto.ca>

> Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG
> that he didn't. Those only pop up when I run the optional remote-
> server tests, however. Perhaps Paul didn't run those and that
> accounts for the discrepancy?

Yup yup, you're right. I should have mentioned in my original message that I
didn't run any remote-server tests, and unfortunately can't do so on this
box.
Paul Quoting David Messina : > To look for problems related to Heikki's "return undef" sweep, I ran > 'make test' on both today's version of bioperl-live and on an older > version I had checked out on May 12. This was done on OS X 10.4.6 and > perl 5.8.6. > > > Here are the results: > > Failures in today's version of bioperl-live but NOT in 5/12 version > =================================================================== > - psm.t - > The psm.t error appears to be new, so the changes made to Bio/Matrix/ > PSM/SiteMatrix.pm, particularly the one in _uncompress_string, may > need to be examined. > > Here's the error message: > Illegal division by zero at t/psm.t line 147, line 36. > t/psm........................dubious > Test returned status 255 (wstat 65280, 0xff00) > DIED. FAILED tests 29, 32-48 > Failed 18/48 tests, 62.50% okay > > > Failures in 5/12 version of bioperl-live but NOT in today's version > =================================================================== > - OntologyStore.t - > Neither OntologyStore.t or Bio/Ontology/OntologyStore.pm have been > touched between 5/12 and today. > > The error looks like a transient network problem to me, but I'm not > sure: > -------------------- WARNING --------------------- > MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/ > *checkout*/song/ontology/so.definition?rev=HEAD, but server threw > 500. retrying... > --------------------------------------------------- > [REPEATED 5 times -Dave] > > t/OntologyStore..............FAILED tests 3-6 > Failed 4/6 tests, 33.33% okay > > > - RepeatMasker.t - > Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm > between 5/12 and today, so this appears to be not 'return undef'- > related. > > - SeqVersion.t - > The SeqVersion error was due to a failure to find and load > Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between > 5/12 and today, so this is not 'return undef'-related. 
> > > > All the other test failures appear in both versions of bioperl-live, > so presumably they are not affected by the 'return undef' changes. > > Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG > that he didn't. Those only pop up when I run the optional remote- > server tests, however. Perhaps Paul didn't run those and that > accounts for the discrepancy? > > Also, he saw errors in Biblio.t, Repeatmasker.t, and > StandAloneBlast.t that I did not. > > Dave > > > Today's bioperl-live test results: > > Failed Test Stat Wstat Total Fail Failed List of Failed > ------------------------------------------------------------------------ > ------- > t/Annotation.t 89 2 2.25% 79 88 > t/DBCUTG.t 29 5 17.24% 26 30-32 > t/LocusLink.t 23 1 4.35% 23 > t/PhysicalMap.t 14 2 14.29% 11-12 > t/TaxonTree.t 17 30 176.47% 11 18-42 > t/alignUtilities.t 9 1 11.11% 9 > t/psm.t 255 65280 48 35 72.92% 29 32-48 > t/tutorial.t 21 15 71.43% 7-21 > 114 subtests skipped. > Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed, > 99.84% okay. > > Note that this is including tests requiring a remote server. > > And here's the output from a May 12 checkout of bioperl-live: > > Failed Test Stat Wstat Total Fail Failed List of Failed > ------------------------------------------------------------------------ > ------- > t/Annotation.t 89 2 2.25% 79 88 > t/DBCUTG.t 29 5 17.24% 26 30-32 > t/LocusLink.t 23 1 4.35% 23 > t/OntologyStore.t 6 4 66.67% 3-6 > t/PhysicalMap.t 14 2 14.29% 11-12 > t/RepeatMasker.t 6 3 50.00% 1-2 6 > t/SeqVersion.t 255 65280 6 10 166.67% 2-6 > t/TaxonTree.t 17 30 176.47% 11 18-42 > t/alignUtilities.t 9 1 11.11% 9 > t/tutorial.t 21 15 71.43% 7-21 > 114 subtests skipped. > Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed, > 99.89% okay. 
> > > > > On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote: > > > Hi, > > > > Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7 > > and I had a few > > failures: > > > > Failed Test Stat Wstat Total Fail List of Failed > > ---------------------------------------------------------------------- > > --------- > > t/Annotation.t 89 2 79 88 > > t/Biblio.t 24 1 2 > > t/LocusLink.t 23 1 23 > > t/PhysicalMap.t 14 2 11-12 > > t/RepeatMasker.t 6 3 1-2 6 > > t/StandAloneBlast.t 18 4 19-22 > > t/TaxonTree.t 17 30 11 18-42 > > t/alignUtilities.t 9 1 9 > > t/psm.t 255 65280 48 35 29 32-48 > > t/tutorial.t 21 15 7-21 > > From dmessina at wustl.edu Wed Jun 7 18:26:25 2006 From: dmessina at wustl.edu (David Messina) Date: Wed, 7 Jun 2006 17:26:25 -0500 Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return undef" In-Reply-To: <1149699781.448706c5e803d@webmail.utoronto.ca> References: <1149699781.448706c5e803d@webmail.utoronto.ca> Message-ID: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu> To look for problems related to Heikki's "return undef" sweep, I ran 'make test' on both today's version of bioperl-live and on an older version I had checked out on May 12. This was done on OS X 10.4.6 and perl 5.8.6. Here are the results: Failures in today's version of bioperl-live but NOT in 5/12 version =================================================================== - psm.t - The psm.t error appears to be new, so the changes made to Bio/Matrix/ PSM/SiteMatrix.pm, particularly the one in _uncompress_string, may need to be examined. Here's the error message: Illegal division by zero at t/psm.t line 147, line 36. t/psm........................dubious Test returned status 255 (wstat 65280, 0xff00) DIED. 
FAILED tests 29, 32-48
Failed 18/48 tests, 62.50% okay


Failures in 5/12 version of bioperl-live but NOT in today's version
===================================================================
- OntologyStore.t -
Neither OntologyStore.t nor Bio/Ontology/OntologyStore.pm has been
touched between 5/12 and today.

The error looks like a transient network problem to me, but I'm not
sure:
-------------------- WARNING ---------------------
MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/*checkout*/song/ontology/so.definition?rev=HEAD, but server threw
500. retrying...
---------------------------------------------------
[REPEATED 5 times -Dave]

t/OntologyStore..............FAILED tests 3-6
Failed 4/6 tests, 33.33% okay


- RepeatMasker.t -
Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm
between 5/12 and today, so this appears not to be 'return undef'-related.

- SeqVersion.t -
The SeqVersion error was due to a failure to find and load
Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between
5/12 and today, so this is not 'return undef'-related.



All the other test failures appear in both versions of bioperl-live,
so presumably they are not affected by the 'return undef' changes.

Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG
that he didn't. Those only pop up when I run the optional
remote-server tests, however. Perhaps Paul didn't run those and that
accounts for the discrepancy?

Also, he saw errors in Biblio.t, RepeatMasker.t, and
StandAloneBlast.t that I did not.
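For readers who have not followed the whole thread, here is a minimal sketch of why swapping "return undef" for a bare "return" can change behaviour at all. The two subs are hypothetical and are not taken from any BioPerl module; they only show the general Perl semantics, which is presumably what the sweep relies on.

```perl
use strict;
use warnings;

# Hypothetical accessors, for illustration only (not BioPerl code).
sub with_undef  { return undef }  # always returns the single value undef
sub bare_return { return }        # empty list in list context, undef in scalar

# In list context the two behave differently.
my @a = with_undef();   # one-element list: (undef)
my @b = bare_return();  # empty list

my $n_a = scalar @a;
my $n_b = scalar @b;
print "with_undef: $n_a element(s), bare_return: $n_b element(s)\n";

# The flip side: inside a longer list, a bare return contributes nothing,
# so every value after it shifts one position to the left.
my @pairs_undef = (name => with_undef(),  rank => 'species');
my @pairs_bare  = (name => bare_return(), rank => 'species');
print scalar(@pairs_undef), " vs ", scalar(@pairs_bare), " elements\n";
```

So a test like "if (@a)" is true after "return undef" but false after a bare "return", while code that builds key/value lists from method calls can end up with mismatched pairs after the change. Either effect could plausibly show up as a newly failing test.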
Dave Today's bioperl-live test results: Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------ ------- t/Annotation.t 89 2 2.25% 79 88 t/DBCUTG.t 29 5 17.24% 26 30-32 t/LocusLink.t 23 1 4.35% 23 t/PhysicalMap.t 14 2 14.29% 11-12 t/TaxonTree.t 17 30 176.47% 11 18-42 t/alignUtilities.t 9 1 11.11% 9 t/psm.t 255 65280 48 35 72.92% 29 32-48 t/tutorial.t 21 15 71.43% 7-21 114 subtests skipped. Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed, 99.84% okay. Note that this is including tests requiring a remote server. And here's the output from a May 12 checkout of bioperl-live: Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------ ------- t/Annotation.t 89 2 2.25% 79 88 t/DBCUTG.t 29 5 17.24% 26 30-32 t/LocusLink.t 23 1 4.35% 23 t/OntologyStore.t 6 4 66.67% 3-6 t/PhysicalMap.t 14 2 14.29% 11-12 t/RepeatMasker.t 6 3 50.00% 1-2 6 t/SeqVersion.t 255 65280 6 10 166.67% 2-6 t/TaxonTree.t 17 30 176.47% 11 18-42 t/alignUtilities.t 9 1 11.11% 9 t/tutorial.t 21 15 71.43% 7-21 114 subtests skipped. Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed, 99.89% okay. 
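The comparison between the two summaries above can also be done mechanically. This is just a small sketch: the two lists are the failed test scripts copied by hand from the tables, and the helper name only_in is made up for the example.

```perl
use strict;
use warnings;

# Failed test scripts, copied by hand from the two summaries above.
my @today = qw(Annotation DBCUTG LocusLink PhysicalMap TaxonTree
               alignUtilities psm tutorial);
my @may12 = qw(Annotation DBCUTG LocusLink OntologyStore PhysicalMap
               RepeatMasker SeqVersion TaxonTree alignUtilities tutorial);

# Hypothetical helper: sorted names found in @$x but not in @$y.
sub only_in {
    my ($x, $y) = @_;
    my %seen;
    $seen{$_} = 1 for @$y;
    my @only = sort grep { !$seen{$_} } @$x;
    return @only;
}

my @new_failures      = only_in(\@today, \@may12);
my @no_longer_failing = only_in(\@may12, \@today);

print "Failing today but not on 5/12: @new_failures\n";
print "Failing on 5/12 but not today: @no_longer_failing\n";
```

This reproduces the split given earlier in the message: psm.t is the only new failure, while OntologyStore.t, RepeatMasker.t and SeqVersion.t fail only in the May 12 checkout.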
On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote:

> Hi,
>
> Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7
> and I had a few
> failures:
>
> Failed Test          Stat Wstat Total Fail  List of Failed
> -------------------------------------------------------------------------------
> t/Annotation.t                     89    2  79 88
> t/Biblio.t                         24    1  2
> t/LocusLink.t                      23    1  23
> t/PhysicalMap.t                    14    2  11-12
> t/RepeatMasker.t                    6    3  1-2 6
> t/StandAloneBlast.t                18    4  19-22
> t/TaxonTree.t                      17   30  11 18-42
> t/alignUtilities.t                  9    1  9
> t/psm.t               255 65280    48   35  29 32-48
> t/tutorial.t                       21   15  7-21

From cjfields at uiuc.edu Wed Jun 7 19:38:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 7 Jun 2006 18:38:10 -0500
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return undef"
In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
References: <1149699781.448706c5e803d@webmail.utoronto.ca> <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu>
Message-ID: <2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu>

I saw a ton of activity from Jason on bioperl-guts for test files and
modules; you may want to check your tests vs. his changes in case
they were fixed. I'll be running similar tests on WinXP and Mac OS X;
it would be nice to see how my results compare to Dave's.

Chris

On Jun 7, 2006, at 5:26 PM, David Messina wrote:

> To look for problems related to Heikki's "return undef" sweep, I ran
> 'make test' on both today's version of bioperl-live and on an older
> version I had checked out on May 12. This was done on OS X 10.4.6 and
> perl 5.8.6.
>
>
> Here are the results:
>
> Failures in today's version of bioperl-live but NOT in 5/12 version
> ===================================================================
> - psm.t -
> The psm.t error appears to be new, so the changes made to Bio/Matrix/
> PSM/SiteMatrix.pm, particularly the one in _uncompress_string, may
> need to be examined.
>
> Here's the error message:
> Illegal division by zero at t/psm.t line 147, line 36.
> t/psm........................dubious > Test returned status 255 (wstat 65280, 0xff00) > DIED. FAILED tests 29, 32-48 > Failed 18/48 tests, 62.50% okay > > > Failures in 5/12 version of bioperl-live but NOT in today's version > =================================================================== > - OntologyStore.t - > Neither OntologyStore.t or Bio/Ontology/OntologyStore.pm have been > touched between 5/12 and today. > > The error looks like a transient network problem to me, but I'm not > sure: > -------------------- WARNING --------------------- > MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/ > *checkout*/song/ontology/so.definition?rev=HEAD, but server threw > 500. retrying... > --------------------------------------------------- > [REPEATED 5 times -Dave] > > t/OntologyStore..............FAILED tests 3-6 > Failed 4/6 tests, 33.33% okay > > > - RepeatMasker.t - > Jason made commits to RepeatMasker.t and Bio/Tools/RepeatMasker.pm > between 5/12 and today, so this appears to be not 'return undef'- > related. > > - SeqVersion.t - > The SeqVersion error was due to a failure to find and load > Bio::DB::SeqVersion::gi, which Brian O. noticed and corrected between > 5/12 and today, so this is not 'return undef'-related. > > > > All the other test failures appear in both versions of bioperl-live, > so presumably they are not affected by the 'return undef' changes. > > Comparing with Paul Boutros's earlier report, I saw errors in DBCUTG > that he didn't. Those only pop up when I run the optional remote- > server tests, however. Perhaps Paul didn't run those and that > accounts for the discrepancy? > > Also, he saw errors in Biblio.t, Repeatmasker.t, and > StandAloneBlast.t that I did not. 
> > Dave > > > Today's bioperl-live test results: > > Failed Test Stat Wstat Total Fail Failed List of Failed > ---------------------------------------------------------------------- > -- > ------- > t/Annotation.t 89 2 2.25% 79 88 > t/DBCUTG.t 29 5 17.24% 26 30-32 > t/LocusLink.t 23 1 4.35% 23 > t/PhysicalMap.t 14 2 14.29% 11-12 > t/TaxonTree.t 17 30 176.47% 11 18-42 > t/alignUtilities.t 9 1 11.11% 9 > t/psm.t 255 65280 48 35 72.92% 29 32-48 > t/tutorial.t 21 15 71.43% 7-21 > 114 subtests skipped. > Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed, > 99.84% okay. > > Note that this is including tests requiring a remote server. > > And here's the output from a May 12 checkout of bioperl-live: > > Failed Test Stat Wstat Total Fail Failed List of Failed > ---------------------------------------------------------------------- > -- > ------- > t/Annotation.t 89 2 2.25% 79 88 > t/DBCUTG.t 29 5 17.24% 26 30-32 > t/LocusLink.t 23 1 4.35% 23 > t/OntologyStore.t 6 4 66.67% 3-6 > t/PhysicalMap.t 14 2 14.29% 11-12 > t/RepeatMasker.t 6 3 50.00% 1-2 6 > t/SeqVersion.t 255 65280 6 10 166.67% 2-6 > t/TaxonTree.t 17 30 176.47% 11 18-42 > t/alignUtilities.t 9 1 11.11% 9 > t/tutorial.t 21 15 71.43% 7-21 > 114 subtests skipped. > Failed 10/232 test scripts, 95.69% okay. 12/11049 subtests failed, > 99.89% okay. 
> > > > > On Jun 7, 2006, at 12:03 PM, Paul Boutros wrote: > >> Hi, >> >> Just did a make test on bioperl-live on AIX 5.2.0.0 and perl 5.8.7 >> and I had a few >> failures: >> >> Failed Test Stat Wstat Total Fail List of Failed >> --------------------------------------------------------------------- >> - >> --------- >> t/Annotation.t 89 2 79 88 >> t/Biblio.t 24 1 2 >> t/LocusLink.t 23 1 23 >> t/PhysicalMap.t 14 2 11-12 >> t/RepeatMasker.t 6 3 1-2 6 >> t/StandAloneBlast.t 18 4 19-22 >> t/TaxonTree.t 17 30 11 18-42 >> t/alignUtilities.t 9 1 9 >> t/psm.t 255 65280 48 35 29 32-48 >> t/tutorial.t 21 15 7-21 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From dmessina at wustl.edu Wed Jun 7 20:50:48 2006 From: dmessina at wustl.edu (David Messina) Date: Wed, 7 Jun 2006 19:50:48 -0500 Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return undef" In-Reply-To: <2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu> References: <1149699781.448706c5e803d@webmail.utoronto.ca> <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu> <2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu> Message-ID: <4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu> Thanks for letting me know, Chris. Here's a new round of results on bioperl-live checked out moments ago: [OS X 10.4.6, perl 5.8.6] Failed Test Stat Wstat Total Fail Failed List of Failed ------------------------------------------------------------------------ ------- t/DBCUTG.t 29 5 17.24% 26 30-32 t/LocusLink.t 23 1 4.35% 23 t/PopGen.t 89 1 1.12% 85 t/psm.t 255 65280 48 35 72.92% 29 32-48 t/tutorial.t 21 15 71.43% 7-21 121 subtests skipped. Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed, 99.69% okay. 
Fixed since earlier today ========================= Annotation.t PhysicalMap.t TaxonTree.t alignUtilities.t New since earlier today ======================= PopGen.t t/PopGen.....................FAILED test 85 Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86 okay, 96.63%) Unchanged ========= DBCUTG.t LocusLink.t psm.t tutorial.t Remote-server tests were run like before. I forgot to mention last time that I skipped the local DB tests and I don't have bioperl-ext installed, so several staden-related tests were also skipped. Dave My results from earlier today for reference: > Failed Test Stat Wstat Total Fail Failed List of Failed > ---------------------------------------------------------------------- > -- > ------- > t/Annotation.t 89 2 2.25% 79 88 > t/DBCUTG.t 29 5 17.24% 26 30-32 > t/LocusLink.t 23 1 4.35% 23 > t/PhysicalMap.t 14 2 14.29% 11-12 > t/TaxonTree.t 17 30 176.47% 11 18-42 > t/alignUtilities.t 9 1 11.11% 9 > t/psm.t 255 65280 48 35 72.92% 29 32-48 > t/tutorial.t 21 15 71.43% 7-21 > 114 subtests skipped. > Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed, > 99.84% okay. From heikki at sanbi.ac.za Thu Jun 8 04:49:38 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 8 Jun 2006 10:49:38 +0200 Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return undef" In-Reply-To: <4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu> References: <1149699781.448706c5e803d@webmail.utoronto.ca> <2C9D69A9-E247-4C25-AB15-AB40769C84B4@uiuc.edu> <4A4E101C-3687-415E-9E79-0EA9CA7C74C5@wustl.edu> Message-ID: <200606081049.40232.heikki@sanbi.ac.za> Looks like we survived the sweeping change - and fixed a number of existing bugs in the process. Thanks for everyone who helped! -Heikki On Thursday 08 June 2006 02:50, David Messina wrote: > Thanks for letting me know, Chris. 
> > Here's a new round of results on bioperl-live checked out moments ago: > [OS X 10.4.6, perl 5.8.6] > > Failed Test Stat Wstat Total Fail Failed List of Failed > ------------------------------------------------------------------------ > ------- > t/DBCUTG.t 29 5 17.24% 26 30-32 > t/LocusLink.t 23 1 4.35% 23 > t/PopGen.t 89 1 1.12% 85 > t/psm.t 255 65280 48 35 72.92% 29 32-48 > t/tutorial.t 21 15 71.43% 7-21 > 121 subtests skipped. > Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed, > 99.69% okay. > > Fixed since earlier today > ========================= > Annotation.t > PhysicalMap.t > TaxonTree.t > alignUtilities.t > > New since earlier today > ======================= > PopGen.t > > t/PopGen.....................FAILED test 85 > Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86 > okay, 96.63%) > > Unchanged > ========= > DBCUTG.t > LocusLink.t > psm.t > tutorial.t > > Remote-server tests were run like before. I forgot to mention last > time that I skipped the local DB tests and I don't have bioperl-ext > installed, so several staden-related tests were also skipped. > > Dave > > My results from earlier today for reference: > > Failed Test Stat Wstat Total Fail Failed List of Failed > > ---------------------------------------------------------------------- > > -- > > ------- > > t/Annotation.t 89 2 2.25% 79 88 > > t/DBCUTG.t 29 5 17.24% 26 30-32 > > t/LocusLink.t 23 1 4.35% 23 > > t/PhysicalMap.t 14 2 14.29% 11-12 > > t/TaxonTree.t 17 30 176.47% 11 18-42 > > t/alignUtilities.t 9 1 11.11% 9 > > t/psm.t 255 65280 48 35 72.92% 29 32-48 > > t/tutorial.t 21 15 71.43% 7-21 > > 114 subtests skipped. > > Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed, > > 99.84% okay. 
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From heikki at sanbi.ac.za Thu Jun 8 04:52:27 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Thu, 8 Jun 2006 10:52:27 +0200 Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live In-Reply-To: <001901c68a69$e7ece8e0$15327e82@pyrimidine> References: <001901c68a69$e7ece8e0$15327e82@pyrimidine> Message-ID: <200606081052.27446.heikki@sanbi.ac.za> I sort of fixed this. At least the tests pass (I commented out two) when using the new sample XML. To be really usefull, the code need much more work, so I left the bug open. http://bugzilla.open-bio.org/show_bug.cgi?id=2018 -Heikki On Wednesday 07 June 2006 21:38, Chris Fields wrote: > All, > > Don't know how many people use Bio::ClusterIO this module, but it looks > like Bio::ClusterIO::dbsnp is broken unless you are using older XML > versions of the dbSNP database; the schema for ASN.1 and XML format for SNP > has changed: > > http://www.ncbi.nlm.nih.gov/projects/SNP/ > > under 'Announcements'. > > I actually tried parsing the dbsnp test file and a newer schema XML file to > confirm this; the new version doesn't work (returned object from > next_cluster is undef). I'm filing a bug as a reminder. > > Christopher Fields > Postdoctoral Researcher - Switzer Lab > Dept. 
of Biochemistry > University of Illinois Urbana-Champaign > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________
From torsten.seemann at infotech.monash.edu.au Thu Jun 8 01:55:09 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Thu, 08 Jun 2006 15:55:09 +1000 Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file? Message-ID: <4487BBBD.6060702@infotech.monash.edu.au> Hi all, I've just been further auditing the Bioperl code and noticed that Bio::SeqIO::lasergene didn't even compile (now fixed) but I still can't locate an example/sample sequence file in "Lasergene" format. From the code it looks similar to 'raw' format but has "^^" as a separator character. Can anyone provide a real-life example so I can augment the t/lasergene.t tests? Thanks, -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From jrm62 at cam.ac.uk Thu Jun 8 07:38:40 2006 From: jrm62 at cam.ac.uk (John Mifsud) Date: 08 Jun 2006 12:38:40 +0100 Subject: [Bioperl-l] NCBI BLAST results parsing Message-ID: Dear all, Firstly I hope this is the right email list to write to! Secondly, I have a little program that parses the BLAST results i have got running remotely to the NCBI server and takes out all the hit sequences and converts them to FASTA format. Now when using BROAD BLAST and getting results this works fine (tblastn ver 2.2.9).
However, NCBI have just updated their BLAST server (to 2.2.14) and the output is different and the parsing no longer works. I was wondering if anyone knew of a new SearchIO module / script that is designed to blast the updated NCBI BLAST output? Thanks for your time, John From cjfields at uiuc.edu Thu Jun 8 08:56:27 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Jun 2006 07:56:27 -0500 Subject: [Bioperl-l] Bio::ClusterIO::dbsnp broken in bioperl-live In-Reply-To: <200606081052.27446.heikki@sanbi.ac.za> References: <001901c68a69$e7ece8e0$15327e82@pyrimidine> <200606081052.27446.heikki@sanbi.ac.za> Message-ID: Sounds good to me. If someone wants to use this down the line, they might be desperate enough to provide patches; there are a lot of commented out tags. Chris On Jun 8, 2006, at 3:52 AM, Heikki Lehvaslaiho wrote: > I sort of fixed this. > > At least the tests pass (I commented out two) when using the new > sample XML. > To be really usefull, the code need much more work, so I left the > bug open. > > http://bugzilla.open-bio.org/show_bug.cgi?id=2018 > > > -Heikki > > > On Wednesday 07 June 2006 21:38, Chris Fields wrote: >> All, >> >> Don't know how many people use Bio::ClusterIO this module, but it >> looks >> like Bio::ClusterIO::dbsnp is broken unless you are using older XML >> versions of the dbSNP database; the schema for ASN.1 and XML >> format for SNP >> has changed: >> >> http://www.ncbi.nlm.nih.gov/projects/SNP/ >> >> under 'Announcements'. >> >> I actually tried parsing the dbsnp test file and a newer schema >> XML file to >> confirm this; the new version doesn't work (returned object from >> next_cluster is undef). I'm filing a bug as a reminder. >> >> Christopher Fields >> Postdoctoral Researcher - Switzer Lab >> Dept. 
of Biochemistry >> University of Illinois Urbana-Champaign >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From sb at mrc-dunn.cam.ac.uk Thu Jun 8 09:03:05 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Thu, 08 Jun 2006 14:03:05 +0100 Subject: [Bioperl-l] NCBI BLAST results parsing In-Reply-To: References: Message-ID: <44882009.1040906@mrc-dunn.cam.ac.uk> John Mifsud wrote: > Dear all, > > Firstly I hope this is the right email list to write to! > > Secondly, I have a little program that parses the BLAST results i have got > running remotely to the NCBI server and takes out all the hit sequences and > converts them to FASTA format. > > Now when using BROAD BLAST and getting results this works fine (tblastn ver > 2.2.9). However, NCBI have just updated their BLAST server (to 2.2.14) and > the output is different and the parsing no longer works. I was wondering if > anyone knew of a new SearchIO module / script that is designed to blast the > updated NCBI BLAST output? You'll probably need to get the latest SearchIO blast module from bioperl-live. 
http://bioperl.org/wiki/Getting_BioPerl If you're having difficulties with your setup, John, I can just send you the relevant file(s). Mail me (or Alan) privately for that. From cjfields at uiuc.edu Thu Jun 8 09:12:23 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Jun 2006 08:12:23 -0500 Subject: [Bioperl-l] NCBI BLAST results parsing In-Reply-To: References: Message-ID: <17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu> I would say, based on previous responses, update to the latest CVS (bioperl-live). You could also try updating Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you don't want to update the entire toolkit. Running these with BLAST 2.2.14 output seems to work fine. Though this is the likely fix, if you have additional problems next time please make sure to include more information. We have no idea what OS, bioperl version, perl version you are running. And a code snippet and bug description would be nice (i.e. "it doesn't work" - not a good description; "the script freezes" is a little more informative). Chris On Jun 8, 2006, at 6:38 AM, John Mifsud wrote: > Dear all, > > Firstly I hope this is the right email list to write to! > > Secondly, I have a little program that parses the BLAST results i > have got > running remotely to the NCBI server and takes out all the hit > sequences and > converts them to FASTA format. > > Now when using BROAD BLAST and getting results this works fine > (tblastn ver > 2.2.9). However, NCBI have just updated their BLAST server (to > 2.2.14) and > the output is different and the parsing no longer works. I was > wondering if > anyone knew of a new SearchIO module / script that is designed to > blast the > updated NCBI BLAST output? > > Thanks for your time, > > > John > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign

From sb at mrc-dunn.cam.ac.uk Thu Jun 8 12:03:21 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 08 Jun 2006 17:03:21 +0100
Subject: [Bioperl-l] For CVS developers - potential pitfallwith "returnundef"
In-Reply-To: <447DAEB1.4040509@mrc-dunn.cam.ac.uk>
References: <001801c68431$a586b2d0$15327e82@pyrimidine> <200605311255.19166.heikki@sanbi.ac.za> <447DAEB1.4040509@mrc-dunn.cam.ac.uk>
Message-ID: <44884A49.6060805@mrc-dunn.cam.ac.uk>

Sendu Bala wrote:
> Heikki Lehvaslaiho wrote:
>> In my opinion the sooner the bugs get exposed the better. It is much more
>> likely that there is a well hidden bug caused by assigning accidentally undef
>> into an one element array that someone intentionally writing code that
>> expects that behaviour!
>>
>> I removed (but did not commit yet) all undefs from my old Bio::Variation code
>> and could not see any differences in the test output.
>>
>> Let's remove them!
>
> Just looking for all return undef;s isn't enough. It's entirely possible
> to do something like:
>
> my $return_value;
> {
>     # do something that assigns to return_value on success
>     # on failure, just do nothing
> }
> return $return_value;

Looks like Heikki's work went well. If there is any further interest in
getting rid of all the remaining undef returns, this also needs to be fixed:

sub x {
    # return (...) on success
    # do nothing on failure
}

Needs to be changed to:

sub x {
    # return (...) on success
    return;
}

From roy at colibase.bham.ac.uk Thu Jun 8 12:31:10 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Thu, 08 Jun 2006 17:31:10 +0100
Subject: [Bioperl-l] Truncate sequence with features
Message-ID: <448850CE.1040105@colibase.bham.ac.uk>

Hi all.
I've been playing around with a subroutine to truncate a sequence and adjust the coordinates of any features that overlap the specified region- something that according to the comments in Bio::Location::Simple has been abortively worked on in the past. I've submitted the subroutine as an enhancement in Bugzilla. It's a bit hacky but works for what I needed it for. However I'm a bit unsure on the best way to deal with split locations where one of the sublocations is entirely outside the truncated region. My current method results in locations like: join(1..500, >1000..>1000) which is quite ugly and possibly invalid, but kind of makes sense. Does anyone know what would be the correct behaviour for this situation? Roy. -- Dr. Roy Chaudhuri Bioinformatics Research Fellow Division of Immunity and Infection University of Birmingham, U.K. http://xbase.bham.ac.uk From cjfields at uiuc.edu Thu Jun 8 14:47:19 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Jun 2006 13:47:19 -0500 Subject: [Bioperl-l] NCBI BLAST results parsing In-Reply-To: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu> Message-ID: <000701c68b2b$f8cc21e0$15327e82@pyrimidine> Thomas; That error isn't related to BioPerl. This is the standard HTML response NCBI gives as a web page; the error imbedded in the HTML you received as a warning has: ERROR: Cannot accept request, error code: 1Number of unfinished requests (151) from your IP address reached the HARD limit 150. So you may have too many requests in the BLAST queue. Chris > -----Original Message----- > From: Thomas J Keller [mailto:kellert at ohsu.edu] > Sent: Thursday, June 08, 2006 1:39 PM > To: Chris Fields > Cc: John Mifsud; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] NCBI BLAST results parsing > > I'm having the same problem bp_remote_blast.pl worked yesterday, > today it's busted. 
Incidently, I got the following email from NCBI
> this morning:
> The new version of the NCBI SOAP E-Utilities, which includes recent
> changes to the NCBI sequence databases schema, was released today.
>
> Thank you.
> NCBI E-Utilities Team
>
> I wouldn't have thought that that would affect
> Bio::Tools::RemoteBlast but something has changed.
>
> Here's a snippet of the output after $ bp_remote_blast.pl -p blastn -d nr -e 1e-3 -i nm_008540.fasta
>
> -------------------- WARNING ---------------------
> MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi
> User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.4
> Content-Length: 267
> Content-Type: application/x-www-form-urlencoded
>
> DATABASE=nr&COMPOSITION_BASED_STATISTICS=off&QUERY=%3ENM_008540_2927+%25GC+55.0+Score+5+Mus+musculus+MAD+homolog+4+(Drosophila)+(Smad4)%2C+mRNA.%0Acactctgcctgctgcttcactgt&EXPECT=1e-3&SERVICE=plain&FORMAT_OBJECT=Alignment&CMD=Put&FILTER=L&CDD_SEARCH=off&PROGRAM=blastn
>
>
> ---------------------------------------------------
>
> -------------------- WARNING ---------------------
> MSG: NCBI Blast
> ERROR: Cannot accept request, error code: 1 Number of unfinished requests (151) from your IP address reached the HARD limit 150.
> --------------------------------------------------- > > On Jun 8, 2006, at 6:12 AM, Chris Fields wrote: > > > I would say, based on previous responses, update to the latest CVS > > (bioperl-live). You could also try updating > > Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you > > don't want to update the entire toolkit. Running these with BLAST > > 2.2.14 output seems to work fine. > > > > Though this is the likely fix, if you have additional problems next > > time please make sure to include more information. We have no idea > > what OS, bioperl version, perl version you are running. And a code > > snippet and bug description would be nice (i.e. "it doesn't work" - > > not a good description; "the script freezes" is a little more > > informative). > > > > Chris > > > > On Jun 8, 2006, at 6:38 AM, John Mifsud wrote: > > > >> Dear all, > >> > >> Firstly I hope this is the right email list to write to! > >> > >> Secondly, I have a little program that parses the BLAST results i > >> have got > >> running remotely to the NCBI server and takes out all the hit > >> sequences and > >> converts them to FASTA format. > >> > >> Now when using BROAD BLAST and getting results this works fine > >> (tblastn ver > >> 2.2.9). However, NCBI have just updated their BLAST server (to > >> 2.2.14) and > >> the output is different and the parsing no longer works. I was > >> wondering if > >> anyone knew of a new SearchIO module / script that is designed to > >> blast the > >> updated NCBI BLAST output? > >> > >> Thanks for your time, > >> > >> > >> John > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > Christopher Fields > > Postdoctoral Researcher > > Lab of Dr. 
Robert Switzer > > Dept of Biochemistry > > University of Illinois Urbana-Champaign > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From kellert at ohsu.edu Thu Jun 8 14:39:04 2006 From: kellert at ohsu.edu (Thomas J Keller) Date: Thu, 8 Jun 2006 11:39:04 -0700 Subject: [Bioperl-l] NCBI BLAST results parsing In-Reply-To: <17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu> References: <17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu> Message-ID: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu> I'm having the same problem bp_remote_blast.pl worked yesterday, today it's busted. Incidently, I got the following email from NCBI this morning: The new version of the NCBI SOAP E-Utilities, which includes recent changes to the NCBI sequence databases schema, was released today. Thank you. NCBI E-Utilities Team I wouldn't have thought that that would affect Bio::Tools::RemoteBlast but something has changed. Here's a snippet of the output after $ bp_remote_blast.pl -p blastn - d nr -e 1e-3 -i nm_008540.fasta -------------------- WARNING --------------------- MSG: req was POST http://www.ncbi.nlm.nih.gov/blast/Blast.cgi User-Agent: bioperl-Bio_Tools_Run_RemoteBlast/1.4 Content-Length: 267 Content-Type: application/x-www-form-urlencoded DATABASE=nr&COMPOSITION_BASED_STATISTICS=off&QUERY=%3ENM_008540_2927+% 25GC+55.0+Score+5+Mus+musculus+MAD+homolog+4+(Drosophila)+(Smad4)%2C +mRNA.% 0Acactctgcctgctgcttcactgt&EXPECT=1e-3&SERVICE=plain&FORMAT_OBJECT=Alignm ent&CMD=Put&FILTER=L&CDD_SEARCH=off&PROGRAM=blastn --------------------------------------------------- -------------------- WARNING --------------------- MSG: NCBI Blast
ERROR: Cannot accept request, error code: 1 Number of unfinished requests (151) from your IP address reached the HARD limit 150.
--------------------------------------------------- On Jun 8, 2006, at 6:12 AM, Chris Fields wrote: > I would say, based on previous responses, update to the latest CVS > (bioperl-live). You could also try updating > Bio::Tools::Run::RemoteBlast.pm and Bio::SearchIO::blast.pm if you > don't want to update the entire toolkit. Running these with BLAST > 2.2.14 output seems to work fine. > > Though this is the likely fix, if you have additional problems next > time please make sure to include more information. We have no idea > what OS, bioperl version, perl version you are running. And a code > snippet and bug description would be nice (i.e. "it doesn't work" - > not a good description; "the script freezes" is a little more > informative). > > Chris > > On Jun 8, 2006, at 6:38 AM, John Mifsud wrote: > >> Dear all, >> >> Firstly I hope this is the right email list to write to! >> >> Secondly, I have a little program that parses the BLAST results i >> have got >> running remotely to the NCBI server and takes out all the hit >> sequences and >> converts them to FASTA format. >> >> Now when using BROAD BLAST and getting results this works fine >> (tblastn ver >> 2.2.9). However, NCBI have just updated their BLAST server (to >> 2.2.14) and >> the output is different and the parsing no longer works. I was >> wondering if >> anyone knew of a new SearchIO module / script that is designed to >> blast the >> updated NCBI BLAST output? >> >> Thanks for your time, >> >> >> John >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > Christopher Fields > Postdoctoral Researcher > Lab of Dr. 
Robert Switzer > Dept of Biochemistry > University of Illinois Urbana-Champaign > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Thu Jun 8 15:28:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Jun 2006 14:28:18 -0500 Subject: [Bioperl-l] For CVS developers-potentialpitfall with "returnundef" In-Reply-To: <200606081049.40232.heikki@sanbi.ac.za> Message-ID: <000001c68b31$b5320390$15327e82@pyrimidine> Here are tests run from WinXP, ActivePerl 5.8.817; almost everything passes. Not sure what's going on with StandAloneBlast or the protgraph tests, so I'll check into it. The psm.t tests that failed are the same as the ones mentioned previously on other systems. As an aside, I hate that using '-w' flag with ActivePerl gives a thousand useless 'subroutines redefined' warnings; only way I found to turn it off is to not use the flag. Anyway, I pulled out the relevant chunks of code here; I'll submit the Mac results separately to not confuse the two. ... t/StandAloneBlast............FAILED tests 19-22 Failed 4/18 tests, 77.78% okay ... t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33, 36-37, 45, 48-56, 59-60, 65-66 Failed 22/66 tests, 66.67% okay ... t/psm........................Illegal division by zero at t/psm.t line 147, line 36. dubious Test returned status 9 (wstat 2304, 0x900) DIED. FAILED tests 29, 32-48 Failed 18/48 tests, 62.50% okay ... Failed Test Stat Wstat Total Fail Failed List of Failed ---------------------------------------------------------------------------- --- t/StandAloneBlast.t 18 4 22.22% 19-22 t/protgraph.t 66 22 33.33% 11 13 20-21 26 33 36-37 45 48-56 59-60 65-66 t/psm.t 9 2304 48 35 72.92% 29 32-48 39 subtests skipped. Failed 3/233 test scripts, 98.71% okay. 36/11100 subtests failed, 99.68% okay. NMAKE : U1077: Stop. 
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho > Sent: Thursday, June 08, 2006 3:50 AM > To: bioperl-l at lists.open-bio.org > Cc: Paul.Boutros at utoronto.ca; BioPerl Mailing List; Chris Fields > Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall with > "returnundef" > > Looks like we survived the sweeping change - and fixed a number of > existing > bugs in the process. Thanks for everyone who helped! > > -Heikki > > On Thursday 08 June 2006 02:50, David Messina wrote: > > Thanks for letting me know, Chris. > > > > Here's a new round of results on bioperl-live checked out moments ago: > > [OS X 10.4.6, perl 5.8.6] > > > > Failed Test Stat Wstat Total Fail Failed List of Failed > > ------------------------------------------------------------------------ > > ------- > > t/DBCUTG.t 29 5 17.24% 26 30-32 > > t/LocusLink.t 23 1 4.35% 23 > > t/PopGen.t 89 1 1.12% 85 > > t/psm.t 255 65280 48 35 72.92% 29 32-48 > > t/tutorial.t 21 15 71.43% 7-21 > > 121 subtests skipped. > > Failed 5/232 test scripts, 97.84% okay. 34/11099 subtests failed, > > 99.69% okay. > > > > Fixed since earlier today > > ========================= > > Annotation.t > > PhysicalMap.t > > TaxonTree.t > > alignUtilities.t > > > > New since earlier today > > ======================= > > PopGen.t > > > > t/PopGen.....................FAILED test 85 > > Failed 1/89 tests, 98.88% okay (less 2 skipped tests: 86 > > okay, 96.63%) > > > > Unchanged > > ========= > > DBCUTG.t > > LocusLink.t > > psm.t > > tutorial.t > > > > Remote-server tests were run like before. I forgot to mention last > > time that I skipped the local DB tests and I don't have bioperl-ext > > installed, so several staden-related tests were also skipped. 
> > > > Dave > > > > My results from earlier today for reference: > > > Failed Test Stat Wstat Total Fail Failed List of Failed > > > ---------------------------------------------------------------------- > > > -- > > > ------- > > > t/Annotation.t 89 2 2.25% 79 88 > > > t/DBCUTG.t 29 5 17.24% 26 30-32 > > > t/LocusLink.t 23 1 4.35% 23 > > > t/PhysicalMap.t 14 2 14.29% 11-12 > > > t/TaxonTree.t 17 30 176.47% 11 18-42 > > > t/alignUtilities.t 9 1 11.11% 9 > > > t/psm.t 255 65280 48 35 72.92% 29 32-48 > > > t/tutorial.t 21 15 71.43% 7-21 > > > 114 subtests skipped. > > > Failed 8/232 test scripts, 96.55% okay. 18/11057 subtests failed, > > > 99.84% okay. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From fernan at iib.unsam.edu.ar Thu Jun 8 13:02:27 2006 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Thu, 8 Jun 2006 14:02:27 -0300 Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file? 
In-Reply-To: <4487BBBD.6060702@infotech.monash.edu.au> References: <4487BBBD.6060702@infotech.monash.edu.au> Message-ID: <20060608170227.GF3334@iib.unsam.edu.ar> +----[ Torsten Seemann (08.Jun.2006 13:47): | | Hi all, | | I've just been further auditing the Bioperl code and noticed that | Bio::SeqIO::lasergene didn't even compile (now fixed) but I still | can't locate an example/sample sequence file in "Lasergene" format. | | From the code it looks similar to 'raw' format but has "^^" as | a separator character. | | Can anyone provide a real-life example so I can augment the | t/lasergene.t tests? | +----] See the attached file. The format seems to be plain text, beginning with a free text description that goes from the beginning of the file until the "^^" delimiter, and after that the sequence. Fernan -------------- next part -------------- Created: Jueves, 08 de Junio de 2006 01:56 p.m. This is a test sequence created with EditSeq (Lasergene's DNAStar) ^^ ATCGATCGATCG From freimuth at pathology.wustl.edu Thu Jun 8 13:12:36 2006 From: freimuth at pathology.wustl.edu (Freimuth, Robert) Date: Thu, 8 Jun 2006 12:12:36 -0500 Subject: [Bioperl-l] undef query_len error with Bio::Search::Hit::GenericHit::num_unaligned_query Message-ID: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu> Hi, I'm trying to use the Bio::Search::Hit::GenericHit class to tile a set of hits from blast, then get some information about the tiled result. I thought I'd use the num_unaligned_query and num_unaligned_hit methods to get the number of unaligned bases in the tiled result, then subtract that from the length of the query/subject sequence to get the number of aligned bases in the region spanned by the hit(s). My code is below, followed by the error message. 
while( my $result_obj = $blast_obj->next_result() )
{
  while( my $hit_obj = $result_obj->next_hit() )
  {
    my $generic_hit_obj = Bio::Search::Hit::GenericHit->new( -name => $hit_obj->name() );
    $generic_hit_obj->overlap( 0 );  # tile any hsps that overlap > this number of bp

    while( my $hsp_obj = $hit_obj->next_hsp() )
    {
      # add all HSPs to a GenericHit object so they can be tiled together
      $generic_hit_obj->add_hsp( $hsp_obj );
    }

    my $num_unaligned_query = $generic_hit_obj->num_unaligned_query();
    my $num_unaligned_hit = $generic_hit_obj->num_unaligned_hit();

------------- EXCEPTION -------------
MSG: Must have defined query_len
STACK Bio::Search::Hit::GenericHit::logical_length /usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:698
STACK Bio::Search::Hit::GenericHit::num_unaligned_query /usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:1264
STACK main::process_blast_hit blast_needle_timetrials_1.pl:245
STACK toplevel blast_needle_timetrials_1.pl:94
--------------------------------------

I looked through the docs to try to find an explanation or some mention of how to set query_len, but I didn't find anything. Could someone please point out what I'm doing wrong? Additionally, if I'm making this harder than it needs to be, please give me a gentle whack with the clue stick.

Thanks,
Bob

From osborne1 at optonline.net Thu Jun 8 15:42:07 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Thu, 08 Jun 2006 15:42:07 -0400
Subject: [Bioperl-l] For CVS developers-potentialpitfall with "returnundef"
In-Reply-To: <000001c68b31$b5320390$15327e82@pyrimidine>
Message-ID:

Chris,

Odd. protgraph.t passes all of its tests on my computer. Do you have the Clone module installed?

Brian O.

On 6/8/06 3:28 PM, "Chris Fields" wrote: > t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33, > 36-37, 45, 48-56, 59-60, 65-66 > Failed 22/66 tests, 66.67% okay From jason at bioperl.org Thu Jun 8 16:15:47 2006 From: jason at bioperl.org (Jason Stajich) Date: Thu, 8 Jun 2006 16:15:47 -0400 Subject: [Bioperl-l] undef query_len error with Bio::Search::Hit::GenericHit::num_unaligned_query In-Reply-To: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu> References: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu> Message-ID: <84AC010A-25E6-48C7-A723-CE4688ECA926@bioperl.org> why are you trying to create new Hit objects? $hit_obj is-A GenericHit object... -jason On Jun 8, 2006, at 1:12 PM, Freimuth, Robert wrote: > Hi, > > I'm trying to use the Bio::Search::Hit::GenericHit class to tile a set > of hits from blast, then get some information about the tiled > result. I > thought I'd use the num_unaligned_query and num_unaligned_hit > methods to > get the number of unaligned bases in the tiled result, then subtract > that from the length of the query/subject sequence to get the > number of > aligned bases in the region spanned by the hit(s). My code is below, > followed by the error message.
> > > while( my $result_obj = $blast_obj->next_result() ) > { > while( my $hit_obj = $result_obj->next_hit() ) > { > my $generic_hit_obj = Bio::Search::Hit::GenericHit->new( -name > => $hit_obj->name() ); > $generic_hit_obj->overlap( 0 ); # tile any hsps that overlap > > this number of bp > > while( my $hsp_obj = $hit_obj->next_hsp() ) > { > # add all HSPs to a GenericHit object so they can be tiled > together > $generic_hit_obj->add_hsp( $hsp_obj ); > } > > my $num_unaligned_query = > $generic_hit_obj->num_unaligned_query(); > my $num_unaligned_hit = $generic_hit_obj->num_unaligned_hit(); > > > > ------------- EXCEPTION ------------- > MSG: Must have defined query_len > STACK Bio::Search::Hit::GenericHit::logical_length > /usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:698 > STACK Bio::Search::Hit::GenericHit::num_unaligned_query > /usr/lib/perl5/site_perl/5.8.0/Bio/Search/Hit/GenericHit.pm:1264 > STACK main::process_blast_hit blast_needle_timetrials_1.pl:245 > STACK toplevel blast_needle_timetrials_1.pl:94 > > -------------------------------------- > > > I looked through the docs to try to find an explanation or some > mention > of how to set query_len, but I didn't find anything. Could someone > please point out what I'm doing wrong? Additionally, if I'm making > this > harder than it needs to be, please give me a gentle whack with the > clue > stick. > > Thanks, > Bob > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From torsten.seemann at infotech.monash.edu.au Thu Jun 8 18:36:00 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Fri, 09 Jun 2006 08:36:00 +1000 Subject: [Bioperl-l] Anyone have a "Lasergene" example sequence file? 
In-Reply-To: <20060608170227.GF3334@iib.unsam.edu.ar> References: <4487BBBD.6060702@infotech.monash.edu.au> <20060608170227.GF3334@iib.unsam.edu.ar> Message-ID: <4488A650.2050803@infotech.monash.edu.au> > I've just been further auditing the Bioperl code and noticed that > Bio::SeqIO::lasergene didn't even compile (now fixed) but I still > can't locate an example/sample sequence file in "Lasergene" format. Thanks to Fernan, Todd and Senthil who sent me example Lasergene files. Those will be enough examples to write some tests. --Torsten From kellert at ohsu.edu Thu Jun 8 20:29:10 2006 From: kellert at ohsu.edu (Thomas J Keller) Date: Thu, 8 Jun 2006 17:29:10 -0700 Subject: [Bioperl-l] fink and updating bioperl In-Reply-To: <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu> References: <17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu> <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu> Message-ID: <1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu> Greetings, Is fink still a reasonable way to install and maintain bioperl? (There have been some emails about instability.) How 'bout upgrades: the way I have fink installed, its path comes first when perl reads @INC. So if I put a newer Bio::something in /usr/local/wherever it won't be seen if an older module is in the fink path. Can I upgrade in the fink "space" without messing up fink's database? Other options? Thanks, Tom K Tom Keller, Ph.D.
kellert at ohsu.edu 503-494-2442 6339b Basic Science Bldg http://www.ohsu.edu/research/core From hlapp at gmx.net Thu Jun 8 21:19:28 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 8 Jun 2006 21:19:28 -0400 Subject: [Bioperl-l] fink and updating bioperl In-Reply-To: <1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu> References: <17EBEDED-DDBD-49C9-BC56-D127D8187153@uiuc.edu> <8FAD18C6-73EE-4F9A-8AB7-C4392DB44A17@ohsu.edu> <1B4EB7C6-1412-466D-9ACA-FAAD52530F41@ohsu.edu> Message-ID: <060FC8CE-FD89-436E-B79C-135BB4F324CD@gmx.net> Why don't you remove the fink bioperl package if you want to install a newer version locally? BTW unless you use a custom-compiled perl your packages will end up in /Library/Perl/5.8.6/ (or /System/Library/Perl/5.8.6/), not /usr/ local, when you issue 'make install'. -hilmar On Jun 8, 2006, at 8:29 PM, Thomas J Keller wrote: > Greetings, > Is fink still a reasonable way to install and maintain bioperl? > (There's been some emails about instability.) How 'bout upgrades: the > way I have fink installed it's path is first when perl reads @INC. So > if I put a newer Bio::something in /usr/local/whereever it won't be > seen if an older module is in the fink path. Can I upgrade in the > fink "space" without messing up fink's database? Other options? > > Thanks, > Tom K > > > Tom Keller, Ph.D. 
> kellert at ohsu.edu > 503-494-2442 > 6339b Basic Science Bldg > http://www.ohsu.edu/research/core > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Thu Jun 8 22:30:20 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 8 Jun 2006 21:30:20 -0500 Subject: [Bioperl-l] For CVS developers-potentialpitfall with"returnundef" In-Reply-To: Message-ID: <000c01c68b6c$a8184710$15327e82@pyrimidine> Yes; using ActiveState's PPM: ppm> query CLone Querying target 1 (ActivePerl 5.8.7.815) 1. Clone [0.20] recursively copy Perl datatypes ppm> v. 0.20 is the latest in CPAN. I can try some additional tests with the relevant modules to see what the problem is. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Brian Osborne > Sent: Thursday, June 08, 2006 2:42 PM > To: Chris Fields; 'Heikki Lehvaslaiho'; bioperl-l at lists.open-bio.org > Cc: Paul.Boutros at utoronto.ca; bioperl-l > Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall > with"returnundef" > > Chris, > > Odd. protgraph.t passes all of its tests on my computer. Do you have the > Clone module installed? > > Brian O. 
> > > On 6/8/06 3:28 PM, "Chris Fields" wrote: > > > t/protgraph..........................FAILED tests 11, 13, 20-21, 26, 33, > > 36-37, 45, 48-56, 59-60, 65-66 > > Failed 22/66 tests, 66.67% okay > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From heikki at sanbi.ac.za Fri Jun 9 03:35:12 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 9 Jun 2006 09:35:12 +0200 Subject: [Bioperl-l] Truncate sequence with features In-Reply-To: <448850CE.1040105@colibase.bham.ac.uk> References: <448850CE.1040105@colibase.bham.ac.uk> Message-ID: <200606090935.12758.heikki@sanbi.ac.za> Roy, The definitive document describing the locations is the feature table definition: http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html#3.5 but you probably know that already. Two questions come to mind: 1. Can you parse your joint location using bioperl without errors? 2. Is there a practical advantage in including a location which has no relevance to the sequence in hand? I notice that the /partial qualifier is deprecated and the docs suggest using signs to indicate that the sequence is partial, so I guess what you are doing is correct. -Heikki On Thursday 08 June 2006 18:31, Roy Chaudhuri wrote: > Hi all. > > I've been playing around with a subroutine to truncate a sequence and > adjust the coordinates of any features that overlap the specified > region- something that according to the comments in > Bio::Location::Simple has been abortively worked on in the past. > > I've submitted the subroutine as an enhancement in Bugzilla. It's a bit > hacky but works for what I needed it for. However I'm a bit unsure on > the best way to deal with split locations where one of the sublocations > is entirely outside the truncated region. 
My current method results in > locations like: > join(1..500, >1000..>1000) > > which is quite ugly and possibly invalid, but kind of makes sense. Does > anyone know what would be the correct behaviour for this situation? > > Roy. > -- > Dr. Roy Chaudhuri > Bioinformatics Research Fellow > Division of Immunity and Infection > University of Birmingham, U.K. > > http://xbase.bham.ac.uk > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From heikki at sanbi.ac.za Fri Jun 9 04:06:30 2006 From: heikki at sanbi.ac.za (Heikki Lehvaslaiho) Date: Fri, 9 Jun 2006 10:06:30 +0200 Subject: [Bioperl-l] For CVS developers-potentialpitfall with"returnundef" In-Reply-To: <000c01c68b6c$a8184710$15327e82@pyrimidine> References: <000c01c68b6c$a8184710$15327e82@pyrimidine> Message-ID: <200606091006.30893.heikki@sanbi.ac.za> I am using: This is perl, v5.8.7 built for i486-linux-gnu-thread-multi and I have Clone installed, but more than half the tests fail. Something is badly wrong. 
-Heikki bala ~/src/bioperl/core> perl -w t/protgraph.t 1..66 ok 1 ok 2 ok 3 ok 4 ok 5 ok 6 ok 7 ok 8 ok 9 not ok 10 # Failed test 10 in t/protgraph.t at line 85 not ok 11 # Test 11 got: '5' (t/protgraph.t at line 86) # Expected: '13' not ok 12 # Failed test 12 in t/protgraph.t at line 94 not ok 13 # Test 13 got: '5' (t/protgraph.t at line 95) # Expected: '13' ok 14 ok 15 ok 16 ok 17 ok 18 ok 19 not ok 20 # Test 20 got: '0.013' (t/protgraph.t at line 113) # Expected: '0.027' .not ok 21 # Test 21 got: '1' (t/protgraph.t at line 114) # Expected: '' ..ok 22 .ok 23 ok 24 ..ok 25 .not ok 26 # Test 26 got: '1' (t/protgraph.t at line 122) # Expected: '5' ok 27 ok 28 ok 29 ok 30 ok 31 ok 32 not ok 33 # Test 33 got: '139' (t/protgraph.t at line 150) # Expected: '71' ok 34 ok 35 not ok 36 # Test 36 got: '126' (t/protgraph.t at line 158) # Expected: '58' .not ok 37 # Test 37 got: '1' (t/protgraph.t at line 163) # Expected: '15' ok 38 ok 39 ok 40 ok 41 ok 42 ok 43 ok 44 not ok 45 # Failed test 45 in t/protgraph.t at line 187 ok 46 ok 47 not ok 48 # Test 48 got: '75' (t/protgraph.t at line 212) # Expected: '72' not ok 49 # Test 49 got: '343' (t/protgraph.t at line 228) # Expected: '72' not ok 50 # Test 50 got: '368' (t/protgraph.t at line 229) # Expected: '74' not ok 51 # Test 51 got: '344' (t/protgraph.t at line 233) # Expected: '73' not ok 52 # Test 52 got: '368' (t/protgraph.t at line 234) # Expected: '74' not ok 53 # Test 53 got: '432' (t/protgraph.t at line 248) # Expected: '72' not ok 54 # Test 54 got: '461' (t/protgraph.t at line 249) # Expected: '74' not ok 55 # Test 55 got: '434' (t/protgraph.t at line 253) # Expected: '74' not ok 56 # Test 56 got: '463' (t/protgraph.t at line 254) # Expected: '76' ok 57 ok 58 not ok 59 # Test 59 got: '437' (t/protgraph.t at line 263) # Expected: '3' not ok 60 # Test 60 got: '467' (t/protgraph.t at line 264) # Expected: '4' ok 61 ok 62 ok 63 ok 64 not ok 65 # Test 65 got: '440' (t/protgraph.t at line 275) # Expected: '3' not ok 66 # 
Test 66 got: '472' (t/protgraph.t at line 276) # Expected: '5' On Friday 09 June 2006 04:30, Chris Fields wrote: > Yes; using ActiveState's PPM: > > ppm> query CLone > Querying target 1 (ActivePerl 5.8.7.815) > 1. Clone [0.20] recursively copy Perl datatypes > ppm> > > v. 0.20 is the latest in CPAN. > > I can try some additional tests with the relevant modules to see what the > problem is. > > Chris > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Brian Osborne > > Sent: Thursday, June 08, 2006 2:42 PM > > To: Chris Fields; 'Heikki Lehvaslaiho'; bioperl-l at lists.open-bio.org > > Cc: Paul.Boutros at utoronto.ca; bioperl-l > > Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall > > with"returnundef" > > > > Chris, > > > > Odd. protgraph.t passes all of its tests on my computer. Do you have the > > Clone module installed? > > > > Brian O. > > > > On 6/8/06 3:28 PM, "Chris Fields" wrote: > > > t/protgraph..........................FAILED tests 11, 13, 20-21, 26, > > > 33, 36-37, 45, 48-56, 59-60, 65-66 > > > Failed 22/66 tests, 66.67% okay > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ______ _/ _/_____________________________________________________ _/ _/ _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho _/ _/ _/ SANBI, South African National Bioinformatics Institute _/ _/ _/ University of Western Cape, South Africa _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 ___ _/_/_/_/_/________________________________________________________ From sb at mrc-dunn.cam.ac.uk Fri Jun 9 04:08:18 2006 From: sb at 
mrc-dunn.cam.ac.uk (Sendu Bala) Date: Fri, 09 Jun 2006 09:08:18 +0100 Subject: [Bioperl-l] undef query_len error with Bio::Search::Hit::GenericHit::num_unaligned_query In-Reply-To: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu> References: <71AE766382153B47AAB638DC83ED7F49010957D9@pathexch1.wusm-path.wustl.edu> Message-ID: <44892C72.2040605@mrc-dunn.cam.ac.uk> Freimuth, Robert wrote: > Hi, > > I'm trying to use the Bio::Search::Hit::GenericHit [snip] > while( my $result_obj = $blast_obj->next_result() ) > { > while( my $hit_obj = $result_obj->next_hit() ) > { > my $generic_hit_obj = Bio::Search::Hit::GenericHit->new( -name > => $hit_obj->name() ); > $generic_hit_obj->overlap( 0 ); # tile any hsps that overlap > > this number of bp > > while( my $hsp_obj = $hit_obj->next_hsp() ) > { > # add all HSPs to a GenericHit object so they can be tiled > together > $generic_hit_obj->add_hsp( $hsp_obj ); > } > > my $num_unaligned_query = > $generic_hit_obj->num_unaligned_query(); > my $num_unaligned_hit = $generic_hit_obj->num_unaligned_hit(); > > ------------- EXCEPTION ------------- > MSG: Must have defined query_len > STACK Bio::Search::Hit::GenericHit::logical_length [snip] > I looked through the docs to try to find an explanation or some mention > of how to set query_len, but I didn't find anything. As Jason asked, why are you essentially recreating the hit object? The problem you are seeing is that the query length is normally set via SearchIO stream via ResultI when it internally creates a new hit object. When you created your own hit object you didn't supply -query_len as an option to new(), nor did you later use the query_length() method to set it. If you really do need your $generic_hit_obj (instead of just using $hit_obj), do $generic_hit_obj->query_length($hit_obj->query_length); (Or if you know the length of your query sequence, supply that directly.) 
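
[Editorial sketch, not part of the original post.] The reason the query length matters can be shown without BioPerl: the number of unaligned query bases is the query length minus the positions covered once the HSP ranges are tiled. The helpers `covered_length` and `unaligned_length` below are hypothetical names, and the coordinates are made up; this only illustrates the arithmetic under those assumptions and is not how GenericHit implements its tiling.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): count the positions covered by
# a set of inclusive [start, end] ranges, merging any that overlap or touch.
sub covered_length {
    my @ranges = sort { $a->[0] <=> $b->[0] } @_;
    my ($covered, $cur_start, $cur_end) = (0);
    for my $r (@ranges) {
        my ($start, $end) = @$r;
        if (defined $cur_end && $start <= $cur_end + 1) {
            # Overlapping or adjacent: extend the current tile.
            $cur_end = $end if $end > $cur_end;
        }
        else {
            # Gap: close the previous tile (if any) and start a new one.
            $covered += $cur_end - $cur_start + 1 if defined $cur_end;
            ($cur_start, $cur_end) = ($start, $end);
        }
    }
    $covered += $cur_end - $cur_start + 1 if defined $cur_end;
    return $covered;
}

# Hypothetical helper: unaligned = query length - tiled coverage.
# Without a query length there is nothing to subtract from, which is
# the situation the exception in this thread complains about.
sub unaligned_length {
    my ($query_len, @ranges) = @_;
    die "Must have defined query_len\n" unless defined $query_len;
    return $query_len - covered_length(@ranges);
}

# Made-up HSP query coordinates for a 100-base query.
my @hsps = ([1, 40], [30, 60], [81, 90]);
printf "aligned=%d unaligned=%d\n",
    covered_length(@hsps), unaligned_length(100, @hsps);
```

With real BLAST results, using $hit_obj directly (as suggested above) should make a helper like this unnecessary, since the parser supplies the query length itself.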
From zhangchnxp at gmail.com Fri Jun 9 05:05:36 2006 From: zhangchnxp at gmail.com (Zhang chnxp) Date: Fri, 9 Jun 2006 17:05:36 +0800 Subject: [Bioperl-l] Are there any modules handling the HLA Typing (Sequence Based Typing) ? Message-ID: <4d1768a60606090205m6e360413paf172fa4e731ef2e@mail.gmail.com> Hi there, I have some .abi trace files from an ABI3100 Genetic Analyzer. Are there any packages handling the typing work of HLA-A, -B, -C, -DRB1, etc.? Or are there any free softwares solving the ambiguity through the SBT? From cain at cshl.edu Wed Jun 7 19:02:43 2006 From: cain at cshl.edu (Scott Cain) Date: Wed, 07 Jun 2006 19:02:43 -0400 Subject: [Bioperl-l] For CVS developers-potentialpitfall with "return undef" In-Reply-To: <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu> References: <1149699781.448706c5e803d@webmail.utoronto.ca> <3E934635-8C60-41C8-B92A-E5B976250C94@wustl.edu> Message-ID: <1149721363.12513.96.camel@localhost.localdomain> On Wed, 2006-06-07 at 17:26 -0500, David Messina wrote: > > > Failures in 5/12 version of bioperl-live but NOT in today's version > =================================================================== > - OntologyStore.t - > Neither OntologyStore.t or Bio/Ontology/OntologyStore.pm have been > touched between 5/12 and today. > > The error looks like a transient network problem to me, but I'm not > sure: > -------------------- WARNING --------------------- > MSG: [1/5] tried to fetch http://cvs.sourceforge.net/viewcvs.py/ > *checkout*/song/ontology/so.definition?rev=HEAD, but server threw > 500. retrying... > --------------------------------------------------- > [REPEATED 5 times -Dave] > > t/OntologyStore..............FAILED tests 3-6 > Failed 4/6 tests, 33.33% okay > That is a problem with the cvs server at SourceForge (where the Sequence Ontology is hosted). I changed the module that tries to get that file (I don't remember off hand what it was). 
Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060607/eca6cf35/attachment.bin From oldham at ucla.edu Thu Jun 8 22:07:34 2006 From: oldham at ucla.edu (Michael Oldham) Date: Thu, 8 Jun 2006 19:07:34 -0700 Subject: [Bioperl-l] Output a subset of FASTA data from a single large file Message-ID: Dear all, I am a total Bioperl newbie struggling to accomplish a conceptually simple task. I have a single large fasta file containing about 200,000 probe sequences (from an Affymetrix microarray), each of which looks like this: >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense; TGGCTCCTGCTGAGGTCCCCTTTCC What I would like to do is extract from this file a subset of ~130,800 probes (both the header and the sequence) and output this subset into a new fasta file. These 130,800 probes correspond to 8,175 probe set IDs ("1138_at" is the probe set ID in the header listed above); I have these 8,175 IDs listed in a separate file. I *think* that I managed to create an index of all 200,000 probes in the original fasta file using the following script: #!/usr/bin/perl -w # script 1: create the index use Bio::Index::Fasta; use strict; my $Index_File_Name = shift; my $inx = Bio::Index::Fasta->new( -filename => $Index_File_Name, -write_flag => 1); $inx->make_index(@ARGV); I'm not sure if this is the most sensible approach, and even if it is, I'm not sure what to do next. Any help would be greatly appreciated! Many thanks, Mike O. -- No virus found in this outgoing message. Checked by AVG Free Edition. 
Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date: 6/8/2006 From sb at mrc-dunn.cam.ac.uk Fri Jun 9 10:52:59 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Fri, 09 Jun 2006 15:52:59 +0100 Subject: [Bioperl-l] Output a subset of FASTA data from a single large file In-Reply-To: References: Message-ID: <44898B4B.8080901@mrc-dunn.cam.ac.uk> Michael Oldham wrote: > Dear all, > > I am a total Bioperl newbie struggling to accomplish a conceptually simple > task. I have a single large fasta file containing about 200,000 probe > sequences (from an Affymetrix microarray), each of which looks like this: > >> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense; > TGGCTCCTGCTGAGGTCCCCTTTCC > > What I would like to do is extract from this file a subset of ~130,800 > probes [snip] > #!/usr/bin/perl -w > > # script 1: create the index > > use Bio::Index::Fasta; [snip] > I'm not sure if this is the most sensible approach, and even if it is, I'm > not sure what to do next. Any help would be greatly appreciated! I'd say you're on the right lines. Next, you should continue reading the rest of the synopsis and description in the docs for Bio::Index::Fasta. Perhaps it's not clear, but you don't need to say $inx->make_index(@ARGV); if you've already provided -file to new() and are only dealing with one file. You also can't supply -file to new() if you want to change the id_parser (which you do, since you need to tell it how to detect your probe set ID). Having indexed your file you can then output the desired sequences, just like the foreach loop suggested in the synopsis. (You could have that in the same script.) One thing I'm not clear on is why it needs -write_flag => 1. Why can't it index a read-only database? Even when you set -write_flag allowing it to work, it doesn't write anything... 
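
[Editorial sketch, not part of the original post.] The id_parser mentioned above is just a callback that receives a header line and returns the key to index under. A minimal version for the probe headers in this thread might look like the following; `probe_set_id` is a hypothetical name, and the regular expression assumes the probe set ID is always the third colon-separated field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical callback: pull the probe set ID (third colon-separated field)
# out of a header such as
#   >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
# Returns undef when the line is not a probe header.
sub probe_set_id {
    my ($header) = @_;
    return $header =~ /^>probe:[^:]+:([^:]+):/ ? $1 : undef;
}

# With BioPerl installed, the callback would be registered before indexing,
# following the pattern in the Bio::Index::Fasta synopsis:
#   $inx->id_parser(\&probe_set_id);
#   $inx->make_index($fasta_file);

my $header = '>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;';
print probe_set_id($header), "\n";
```

One caution, offered as an assumption rather than a tested fact: several probes share each probe set ID, so an index keyed on that ID alone may retain only one record per key. Including the remaining fields in the key (for example 1138_at:395:301) would keep the keys unique.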
From simon.andrews at bbsrc.ac.uk Fri Jun 9 11:01:05 2006 From: simon.andrews at bbsrc.ac.uk (simon andrews (BI)) Date: Fri, 9 Jun 2006 16:01:05 +0100 Subject: [Bioperl-l] Output a subset of FASTA data from a single large file Message-ID: > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Michael Oldham > Sent: 09 June 2006 03:08 > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Output a subset of FASTA data from a > single large file > > Dear all, > > I am a total Bioperl newbie struggling to accomplish a > conceptually simple task. I have a single large fasta file > containing about 200,000 probe sequences (from an Affymetrix > microarray), each of which looks like this: > > >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; > >Antisense; > TGGCTCCTGCTGAGGTCCCCTTTCC Unfortunately that's not Fasta format (which only has a single header line starting with a '>'). I'd imagine that most fasta-aware programs reading that entry would see it as two sequences, the first of which is empty. > What I would like to do is extract from this file a subset of > ~130,800 probes (both the header and the sequence) and output > this subset into a new fasta file. These 130,800 probes > correspond to 8,175 probe set IDs ("1138_at" is the probe set > ID in the header listed above) If you're only having to do this once then it should be fairly quick to knock up a one off script to do this.
Since you've only got 8000ish probeset ids then you can probably just read those into a hash to start with then parse through your big sequence file with something like:

#!perl
use warnings;
use strict;

my %probe_ids;

# Add real code here to populate your hash
$probe_ids{'1138_at'} = 1;
##########################################

open (IN,'your_affy_file.txt') or die "Can't read affy file: $!";
open (OUT,'>','probe_list.txt') or die "Can't write output: $!";

while (<IN>) {
  if (/^>probe/) {
    # This assumes there are always 3 lines per probe entry
    if (exists $probe_ids{(split(/:/))[2]}) {
      print OUT;
      print OUT scalar <IN>;
      print OUT scalar <IN>;
    }
  }
}

From MEC at stowers-institute.org Fri Jun 9 10:58:22 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 9 Jun 2006 09:58:22 -0500 Subject: [Bioperl-l] Output a subset of FASTA data from a single large file Message-ID:

I wouldn't bioperl for this, or create an index. Perl would do fine and probably be faster.

Assuming your ids are one per line in a file named id.dat looking like this

1138_at
1134_at
etc..

this should work:

perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID = <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;} $inmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat mybigfile.fa

good luck

--Malcolm Cook

>-----Original Message----- >From: bioperl-l-bounces at lists.open-bio.org >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >Michael Oldham >Sent: Thursday, June 08, 2006 9:08 PM >To: bioperl-l at lists.open-bio.org >Subject: [Bioperl-l] Output a subset of FASTA data from a >single large file > >Dear all, > >I am a total Bioperl newbie struggling to accomplish a >conceptually simple >task.
I have a single large fasta file containing about 200,000 probe >sequences (from an Affymetrix microarray), each of which looks >like this: > >>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; >Antisense; >TGGCTCCTGCTGAGGTCCCCTTTCC > >What I would like to do is extract from this file a subset of ~130,800 >probes (both the header and the sequence) and output this >subset into a new >fasta file. These 130,800 probes correspond to 8,175 probe set IDs >("1138_at" is the probe set ID in the header listed above); I >have these >8,175 IDs listed in a separate file. I *think* that I managed >to create an >index of all 200,000 probes in the original fasta file using >the following >script: > >#!/usr/bin/perl -w > > # script 1: create the index > > use Bio::Index::Fasta; > use strict; > my $Index_File_Name = shift; > my $inx = Bio::Index::Fasta->new( > -filename => $Index_File_Name, > -write_flag => 1); > $inx->make_index(@ARGV); > >I'm not sure if this is the most sensible approach, and even >if it is, I'm >not sure what to do next. Any help would be greatly appreciated! > >Many thanks, >Mike O. > > > > >-- >No virus found in this outgoing message. >Checked by AVG Free Edition. 
>Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date: 6/8/2006 > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From senthil at cdfd.org.in Fri Jun 9 18:21:11 2006 From: senthil at cdfd.org.in (M Senthil Kumar) Date: Fri, 9 Jun 2006 15:21:11 -0700 Subject: [Bioperl-l] Output a subset of FASTA data from a single large file In-Reply-To: Message-ID: On Fri, 9 Jun 2006, simon andrews (BI) wrote: | | |> -----Original Message----- |> From: bioperl-l-bounces at lists.open-bio.org |> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of |> Michael Oldham |> Sent: 09 June 2006 03:08 |> To: bioperl-l at lists.open-bio.org |> Subject: [Bioperl-l] Output a subset of FASTA data from a |> single large file |> |> Dear all, |> |> I am a total Bioperl newbie struggling to accomplish a |> conceptually simple task. I have a single large fasta file |> containing about 200,000 probe sequences (from an Affymetrix |> microarray), each of which looks like this: |> |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; |> >Antisense; |> TGGCTCCTGCTGAGGTCCCCTTTCC | |Unfortunately that's not Fasta format (which only has a single header |line starting with a '>'. I'd imagine that most programs which deal |with fasta which read that entry would see it as two sequences, the |first of which is empty. | [snipped] hi, I think the file is in fasta format and probably you might have seen it differently because of your mail transport agent. Senthil From cjfields at uiuc.edu Fri Jun 9 13:59:18 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 9 Jun 2006 12:59:18 -0500 Subject: [Bioperl-l] Output a subset of FASTA data from a single large file In-Reply-To: Message-ID: <002b01c68bee$6e3237e0$15327e82@pyrimidine> No; I saw the same thing here. 
It's not FASTA in the traditional sense: http://www.bioperl.org/wiki/FASTA_sequence_format though he did get it to build a database successfully. Well, 'success' in the sense that no errors were thrown. I've learned the absence of error messages does not necessarily mean that everything went as planned; it depends on how much error handling has been added to the module by the submitting author. It's possible that the second annotation line was ignored completely. I suppose it's also possible that two sequences are entered into the database, an empty sequence for the first '>' line and the full sequence for the second. It's all dependent on how the parser handles this. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of M Senthil Kumar > Sent: Friday, June 09, 2006 5:21 PM > To: simon andrews (BI) > Cc: bioperl-l at lists.open-bio.org; Michael Oldham > Subject: Re: [Bioperl-l] Output a subset of FASTA data from a single large > file > > > > On Fri, 9 Jun 2006, simon andrews (BI) wrote: > | > | > |> -----Original Message----- > |> From: bioperl-l-bounces at lists.open-bio.org > |> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > |> Michael Oldham > |> Sent: 09 June 2006 03:08 > |> To: bioperl-l at lists.open-bio.org > |> Subject: [Bioperl-l] Output a subset of FASTA data from a > |> single large file > |> > |> Dear all, > |> > |> I am a total Bioperl newbie struggling to accomplish a > |> conceptually simple task. I have a single large fasta file > |> containing about 200,000 probe sequences (from an Affymetrix > |> microarray), each of which looks like this: > |> > |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; > |> >Antisense; > |> TGGCTCCTGCTGAGGTCCCCTTTCC > | > |Unfortunately that's not Fasta format (which only has a single header > |line starting with a '>'. 
I'd imagine that most programs which deal > |with fasta which read that entry would see it as two sequences, the > |first of which is empty. > | > > [snipped] > > hi, > > I think the file is in fasta format and probably you might have seen it > differently because of your mail transport agent. > > Senthil > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Jun 9 13:59:31 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 9 Jun 2006 12:59:31 -0500 Subject: [Bioperl-l] For CVS developers-potentialpitfallwith"returnundef" In-Reply-To: <200606091006.30893.heikki@sanbi.ac.za> Message-ID: <002c01c68bee$76219ef0$15327e82@pyrimidine> I ran tests this morning on protgraph.t using bioperl-live, Mac OS X (Intel) running perl 5.8.6 and all tests passed, but I haven't updated from CVS since June 7th. The test results are almost exactly alike; most failed tests are from unexpected results (with exactly the same results for both OS's). A few look more serious: test 45 failed on both and tests 10 and 12 failed on linux (the only noticeable difference between the two) ... ok 10 not ok 11 # Test 11 got: '5' (t\protgraph.t at line 85) # Expected: '13' ok 12 not ok 13 # Test 13 got: '5' (t\protgraph.t at line 94) # Expected: '13' ... The line numbers seem to also be off by one (linux tests seem to have one extra line); not sure if that means anything. 
Here's the full WinXP protgraph.t results: 1..66 ok 1 ok 2 ok 3 ok 4 ok 5 ok 6 ok 7 ok 8 ok 9 ok 10 not ok 11 # Test 11 got: '5' (t\protgraph.t at line 85) # Expected: '13' ok 12 not ok 13 # Test 13 got: '5' (t\protgraph.t at line 94) # Expected: '13' ok 14 ok 15 ok 16 ok 17 ok 18 ok 19 not ok 20 # Test 20 got: '0.013' (t\protgraph.t at line 112) # Expected: '0.027' .not ok 21 # Test 21 got: '1' (t\protgraph.t at line 113) # Expected: '' ..ok 22 .ok 23 ok 24 ..ok 25 .not ok 26 # Test 26 got: '1' (t\protgraph.t at line 121) # Expected: '5' ok 27 ok 28 ok 29 ok 30 ok 31 ok 32 not ok 33 # Test 33 got: '139' (t\protgraph.t at line 149) # Expected: '71' ok 34 ok 35 not ok 36 # Test 36 got: '126' (t\protgraph.t at line 157) # Expected: '58' .not ok 37 # Test 37 got: '1' (t\protgraph.t at line 162) # Expected: '15' ok 38 ok 39 ok 40 ok 41 ok 42 ok 43 ok 44 not ok 45 # Failed test 45 in t\protgraph.t at line 186 ok 46 ok 47 not ok 48 # Test 48 got: '75' (t\protgraph.t at line 211) # Expected: '72' not ok 49 # Test 49 got: '343' (t\protgraph.t at line 227) # Expected: '72' not ok 50 # Test 50 got: '368' (t\protgraph.t at line 228) # Expected: '74' not ok 51 # Test 51 got: '344' (t\protgraph.t at line 232) # Expected: '73' not ok 52 # Test 52 got: '368' (t\protgraph.t at line 233) # Expected: '74' not ok 53 # Test 53 got: '432' (t\protgraph.t at line 247) # Expected: '72' not ok 54 # Test 54 got: '461' (t\protgraph.t at line 248) # Expected: '74' not ok 55 # Test 55 got: '434' (t\protgraph.t at line 252) # Expected: '74' not ok 56 # Test 56 got: '463' (t\protgraph.t at line 253) # Expected: '76' ok 57 ok 58 not ok 59 # Test 59 got: '437' (t\protgraph.t at line 262) # Expected: '3' not ok 60 # Test 60 got: '467' (t\protgraph.t at line 263) # Expected: '4' ok 61 ok 62 ok 63 ok 64 not ok 65 # Test 65 got: '440' (t\protgraph.t at line 274) # Expected: '3' not ok 66 # Test 66 got: '472' (t\protgraph.t at line 275) # Expected: '5' Chris > -----Original Message----- > From: 
bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Heikki Lehvaslaiho > Sent: Friday, June 09, 2006 3:07 AM > To: bioperl-l at lists.open-bio.org > Cc: Chris Fields; 'Brian Osborne' > Subject: Re: [Bioperl-l] For CVS developers- > potentialpitfallwith"returnundef" > > I am using: > This is perl, v5.8.7 built for i486-linux-gnu-thread-multi > and I have Clone installed, but more than half the tests fail. > > Something is badly wrong. > > > -Heikki > bala ~/src/bioperl/core> perl -w t/protgraph.t > 1..66 > ok 1 > ok 2 > ok 3 > ok 4 > ok 5 > ok 6 > ok 7 > ok 8 > ok 9 > not ok 10 > # Failed test 10 in t/protgraph.t at line 85 > not ok 11 > # Test 11 got: '5' (t/protgraph.t at line 86) > # Expected: '13' > not ok 12 > # Failed test 12 in t/protgraph.t at line 94 > not ok 13 > # Test 13 got: '5' (t/protgraph.t at line 95) > # Expected: '13' > ok 14 > ok 15 > ok 16 > ok 17 > ok 18 > ok 19 > not ok 20 > # Test 20 got: '0.013' (t/protgraph.t at line 113) > # Expected: '0.027' > .not ok 21 > # Test 21 got: '1' (t/protgraph.t at line 114) > # Expected: '' > ..ok 22 > .ok 23 > ok 24 > ..ok 25 > .not ok 26 > # Test 26 got: '1' (t/protgraph.t at line 122) > # Expected: '5' > ok 27 > ok 28 > ok 29 > ok 30 > ok 31 > ok 32 > not ok 33 > # Test 33 got: '139' (t/protgraph.t at line 150) > # Expected: '71' > ok 34 > ok 35 > not ok 36 > # Test 36 got: '126' (t/protgraph.t at line 158) > # Expected: '58' > .not ok 37 > # Test 37 got: '1' (t/protgraph.t at line 163) > # Expected: '15' > ok 38 > ok 39 > ok 40 > ok 41 > ok 42 > ok 43 > ok 44 > not ok 45 > # Failed test 45 in t/protgraph.t at line 187 > ok 46 > ok 47 > not ok 48 > # Test 48 got: '75' (t/protgraph.t at line 212) > # Expected: '72' > not ok 49 > # Test 49 got: '343' (t/protgraph.t at line 228) > # Expected: '72' > not ok 50 > # Test 50 got: '368' (t/protgraph.t at line 229) > # Expected: '74' > not ok 51 > # Test 51 got: '344' (t/protgraph.t at line 233) > # Expected: '73' 
> not ok 52 > # Test 52 got: '368' (t/protgraph.t at line 234) > # Expected: '74' > not ok 53 > # Test 53 got: '432' (t/protgraph.t at line 248) > # Expected: '72' > not ok 54 > # Test 54 got: '461' (t/protgraph.t at line 249) > # Expected: '74' > not ok 55 > # Test 55 got: '434' (t/protgraph.t at line 253) > # Expected: '74' > not ok 56 > # Test 56 got: '463' (t/protgraph.t at line 254) > # Expected: '76' > ok 57 > ok 58 > not ok 59 > # Test 59 got: '437' (t/protgraph.t at line 263) > # Expected: '3' > not ok 60 > # Test 60 got: '467' (t/protgraph.t at line 264) > # Expected: '4' > ok 61 > ok 62 > ok 63 > ok 64 > not ok 65 > # Test 65 got: '440' (t/protgraph.t at line 275) > # Expected: '3' > not ok 66 > # Test 66 got: '472' (t/protgraph.t at line 276) > # Expected: '5' > > > On Friday 09 June 2006 04:30, Chris Fields wrote: > > Yes; using ActiveState's PPM: > > > > ppm> query CLone > > Querying target 1 (ActivePerl 5.8.7.815) > > 1. Clone [0.20] recursively copy Perl datatypes > > ppm> > > > > v. 0.20 is the latest in CPAN. > > > > I can try some additional tests with the relevant modules to see what > the > > problem is. > > > > Chris > > > > > -----Original Message----- > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > bounces at lists.open-bio.org] On Behalf Of Brian Osborne > > > Sent: Thursday, June 08, 2006 2:42 PM > > > To: Chris Fields; 'Heikki Lehvaslaiho'; bioperl-l at lists.open-bio.org > > > Cc: Paul.Boutros at utoronto.ca; bioperl-l > > > Subject: Re: [Bioperl-l] For CVS developers-potentialpitfall > > > with"returnundef" > > > > > > Chris, > > > > > > Odd. protgraph.t passes all of its tests on my computer. Do you have > the > > > Clone module installed? > > > > > > Brian O. 
> > > > > > On 6/8/06 3:28 PM, "Chris Fields" wrote: > > > > t/protgraph..........................FAILED tests 11, 13, 20-21, 26, > > > > 33, 36-37, 45, 48-56, 59-60, 65-66 > > > > Failed 22/66 tests, 66.67% okay > > > > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > ______ _/ _/_____________________________________________________ > _/ _/ > _/ _/ _/ Heikki Lehvaslaiho heikki at_sanbi _ac _za > _/_/_/_/_/ Associate Professor skype: heikki_lehvaslaiho > _/ _/ _/ SANBI, South African National Bioinformatics Institute > _/ _/ _/ University of Western Cape, South Africa > _/ Phone: +27 21 959 2096 FAX: +27 21 959 2512 > ___ _/_/_/_/_/________________________________________________________ > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sdavis2 at mail.nih.gov Fri Jun 9 14:29:53 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Fri, 09 Jun 2006 14:29:53 -0400 Subject: [Bioperl-l] Output a subset of FASTA data from a single large file In-Reply-To: <002b01c68bee$6e3237e0$15327e82@pyrimidine> Message-ID: On 6/9/06 1:59 PM, "Chris Fields" wrote: > No; I saw the same thing here. It's not FASTA in the traditional sense: > > http://www.bioperl.org/wiki/FASTA_sequence_format > > though he did get it to build a database successfully. Well, 'success' in > the sense that no errors were thrown. I've learned the absence of error > messages does not necessarily mean that everything went as planned; it > depends on how much error handling has been added to the module by the > submitting author. 
> > It's possible that the second annotation line was ignored completely. I > suppose it's also possible that two sequences are entered into the database, > an empty sequence for the first '>' line and the full sequence for the > second. It's all dependent on how the parser handles this.

I think that Senthil was pointing out that even though >Antisense looks to be on its own line, it isn't, but is simply a continuation of the FASTA header. Judging from the context, that is the only interpretation that makes sense.

Sean

>> |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; >> |> >Antisense; >> |> TGGCTCCTGCTGAGGTCCCCTTTCC >> | >> |Unfortunately that's not Fasta format (which only has a single header >> |line starting with a '>'. I'd imagine that most programs which deal >> |with fasta which read that entry would see it as two sequences, the >> |first of which is empty. >> | >> >> [snipped] >> >> hi, >> >> I think the file is in fasta format and probably you might have seen it >> differently because of your mail transport agent.

From cjfields at uiuc.edu Fri Jun 9 15:05:44 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 9 Jun 2006 14:05:44 -0500 Subject: [Bioperl-l] Output a subset of FASTA data from a single large file In-Reply-To: Message-ID: <002e01c68bf7$b594d210$15327e82@pyrimidine>

There's information in the HOWTOs:

http://www.bioperl.org/wiki/HOWTO:Flat_databases
http://www.bioperl.org/wiki/HOWTO:OBDA

Just so you know, I tried, out of curiosity, passing this through Bio::SeqIO ('fasta' format I/O) and this is what I got as output:

>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631;

i.e. an empty sequence, which is what I guessed might happen, though I thought it might pick up the second '>' and the full sequence there. Since the sequence is tossed you'll have to prescreen your sequence input stream by either concatenating the two '>' lines together or screening for the relevant information you want to retain.
You can try maybe getting this info into Bio::Seq objects and writing to a Bio::SeqIO stream (to file or file handle). Once you have that set up, the HOWTO tells you how to set up custom or secondary namespaces, so you can use a regex to parse out the information for a primary or secondary keys: http://www.bioperl.org/wiki/HOWTO:Flat_databases#Secondary_or_custom_namespa ces then you could select specific sequences this way (per the HOWTO): $db->secondary_namespaces("GI"); my $acc_seq = $db->get_Seq_by_id("P84139"); my $gi_seq = $db->get_Seq_by_secondary("GI",443893); or for multiple sequences (judging from the POD): my $acc_seqio = $db->get_Stream_by_id(@ids); Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Michael Oldham > Sent: Thursday, June 08, 2006 9:08 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Output a subset of FASTA data from a single large > file > > Dear all, > > I am a total Bioperl newbie struggling to accomplish a conceptually simple > task. I have a single large fasta file containing about 200,000 probe > sequences (from an Affymetrix microarray), each of which looks like this: > > >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; > >Antisense; > TGGCTCCTGCTGAGGTCCCCTTTCC > > What I would like to do is extract from this file a subset of ~130,800 > probes (both the header and the sequence) and output this subset into a > new > fasta file. These 130,800 probes correspond to 8,175 probe set IDs > ("1138_at" is the probe set ID in the header listed above); I have these > 8,175 IDs listed in a separate file. 
I *think* that I managed to create > an > index of all 200,000 probes in the original fasta file using the following > script: > > #!/usr/bin/perl -w > > # script 1: create the index > > use Bio::Index::Fasta; > use strict; > my $Index_File_Name = shift; > my $inx = Bio::Index::Fasta->new( > -filename => $Index_File_Name, > -write_flag => 1); > $inx->make_index(@ARGV); > > I'm not sure if this is the most sensible approach, and even if it is, I'm > not sure what to do next. Any help would be greatly appreciated! > > Many thanks, > Mike O. > > > > > -- > No virus found in this outgoing message. > Checked by AVG Free Edition. > Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date: 6/8/2006 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Fri Jun 9 15:49:51 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 9 Jun 2006 14:49:51 -0500 Subject: [Bioperl-l] Output a subset of FASTA data from a single large file In-Reply-To: Message-ID: <002f01c68bfd$e1111e20$15327e82@pyrimidine> > On 6/9/06 1:59 PM, "Chris Fields" wrote: > > > No; I saw the same thing here. It's not FASTA in the traditional sense: > > > > http://www.bioperl.org/wiki/FASTA_sequence_format > > > > though he did get it to build a database successfully. Well, 'success' > in > > the sense that no errors were thrown. I've learned the absence of error > > messages does not necessarily mean that everything went as planned; it > > depends on how much error handling has been added to the module by the > > submitting author. > > > > It's possible that the second annotation line was ignored completely. I > > suppose it's also possible that two sequences are entered into the > database, > > an empty sequence for the first '>' line and the full sequence for the > > second. It's all dependent on how the parser handles this. 
> > I think that Senthil was pointing out that even though >Antisense looks to > be on its own line, it isn't, but is simply a continuation of the FASTA > header. Judging from the context, that is the only interpretation that > makes sense. > > Sean

Sorry. Just checked through another mail client and you're right. That's what I get for trusting Mr. Gates (stupid Outlook). I have seen a few funky FASTA derivations, so I thought that's what was going on here. My bad!

My point, though erroneous, was that the fasta format parser may not parse this data correctly if he did have two description lines, but may not indicate there are problems by throwing an exception. I demonstrated that using Bio::SeqIO as an example (you get empty sequences). Bio::Index::Fasta parses the file itself using this loop to index:

    # Main indexing loop
    while (<FASTA>) {
        if (/^>/) {
            # $begin is the position of the first character after the '>'
            my $begin = tell(FASTA) - length( $_ ) + 1;

            foreach my $id (&$id_parser($_)) {
                $self->add_record($id, $i, $begin);
            }
        }
    }

Which simply looks for '>'. That's fine for a vast majority of sequences. I thought it would be nice to have something that's a little more rigorous in verifying the format rather than trusting it implicitly, maybe by using an eval{} block to make sure the format is FASTA-like and looks like DNA/RNA/protein.

Chris

> >> |> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; > >> |> >Antisense; > >> |> TGGCTCCTGCTGAGGTCCCCTTTCC > >> | > >> |Unfortunately that's not Fasta format (which only has a single header > >> |line starting with a '>'. I'd imagine that most programs which deal > >> |with fasta which read that entry would see it as two sequences, the > >> |first of which is empty. > >> | > >> > >> [snipped] > >> > >> hi, > >> > >> I think the file is in fasta format and probably you might have seen it > >> differently because of your mail transport agent.
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bernd.web at gmail.com Fri Jun 9 09:23:21 2006 From: bernd.web at gmail.com (Bernd Web) Date: Fri, 9 Jun 2006 15:23:21 +0200 Subject: [Bioperl-l] SimpleAlign Message-ID: <716af09c0606090623v37c72bc5r1ddbcb2b8355a4a0@mail.gmail.com>

Hi,

Two queries with respect to SimpleAlign. I am using the following code based on the POD.

my $in = Bio::AlignIO->newFh(-file => $file, -format => 'fasta');
my $out = Bio::AlignIO->newFh('-format' => 'clustalw');
print $out $_ while <$in>;

1) Is it possible to set set_displayname_flat() globally without doing $_->set_displayname_flat() per alignment?

2) My input files have an ID and description line for each seq in the alignment. When the file is converted I lose the description line. I know I can get the description of the sequences (e.g. $aln->get_seq_by_pos(2)->description()). How could I export the complete fasta defline including the description (I realize that general clustal format has a limit on the number of characters, but still)?

Regards,
Bernd

From oldham at ucla.edu Fri Jun 9 21:39:45 2006 From: oldham at ucla.edu (Michael Oldham) Date: Fri, 9 Jun 2006 18:39:45 -0700 Subject: [Bioperl-l] Output a subset of FASTA data from a single large file In-Reply-To: Message-ID:

Thanks to everyone for their helpful advice. I think I am getting closer, but no cigar quite yet. The script below runs quickly with no errors--but the output file is empty. It seems that the problem must lie somewhere in the 'while' loop, and I'm sure it's quite obvious to a more experienced eye--but not to mine! Any suggestions? Thanks again for your help.

--Mike O.
#!/usr/bin/perl -w

use strict;

my $IDs = 'ID.dat.txt';

unless (open(IDFILE, $IDs)) {
    print "Could not open file $IDs!\n";
}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
    print "Could not open file $probes!\n";
}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
chomp @ID;
my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs and all values=1.

while (<PROBES>) {
    my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
    if ($idmatch){
        print OUT;
    }
}
exit;

-----Original Message----- From: Cook, Malcolm [mailto:MEC at stowers-institute.org] Sent: Friday, June 09, 2006 7:58 AM To: Michael Oldham; bioperl-l at lists.open-bio.org Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single large file

I wouldn't bioperl for this, or create an index. Perl would do fine and probably be faster.

Assuming your ids are one per line in a file named id.dat looking like this

1138_at
1134_at
etc..

this should work:

perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID = <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;} $inmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat mybigfile.fa

good luck

--Malcolm Cook

>-----Original Message----- >From: bioperl-l-bounces at lists.open-bio.org >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >Michael Oldham >Sent: Thursday, June 08, 2006 9:08 PM >To: bioperl-l at lists.open-bio.org >Subject: [Bioperl-l] Output a subset of FASTA data from a >single large file > >Dear all, > >I am a total Bioperl newbie struggling to accomplish a >conceptually simple >task.
I have a single large fasta file containing about 200,000 probe >sequences (from an Affymetrix microarray), each of which looks >like this: > >>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; >Antisense; >TGGCTCCTGCTGAGGTCCCCTTTCC > >What I would like to do is extract from this file a subset of ~130,800 >probes (both the header and the sequence) and output this >subset into a new >fasta file. These 130,800 probes correspond to 8,175 probe set IDs >("1138_at" is the probe set ID in the header listed above); I >have these >8,175 IDs listed in a separate file. I *think* that I managed >to create an >index of all 200,000 probes in the original fasta file using >the following >script: > >#!/usr/bin/perl -w > > # script 1: create the index > > use Bio::Index::Fasta; > use strict; > my $Index_File_Name = shift; > my $inx = Bio::Index::Fasta->new( > -filename => $Index_File_Name, > -write_flag => 1); > $inx->make_index(@ARGV); > >I'm not sure if this is the most sensible approach, and even >if it is, I'm >not sure what to do next. Any help would be greatly appreciated! > >Many thanks, >Mike O. > > > > >-- >No virus found in this outgoing message. >Checked by AVG Free Edition. >Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date: 6/8/2006 > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- No virus found in this incoming message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date: 6/8/2006 -- No virus found in this outgoing message. Checked by AVG Free Edition. 
Version: 7.1.394 / Virus Database: 268.8.3/360 - Release Date: 6/9/2006

From cjfields at uiuc.edu Sun Jun 11 00:32:04 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 10 Jun 2006 23:32:04 -0500 Subject: [Bioperl-l] Output a subset of FASTA data from a single large file In-Reply-To: References: Message-ID:

What happens if you just print $idmatch or $1 (i.e. check to see if the regex matches anything)? If nothing is printed then either the regex isn't working as expected or there is something logically wrong.

The problem may be that the captured string must match the id exactly, the id being the key to the %ID hash; if the regex picks up any extra characters beyond your id key, you will not get anything. Looking at Malcolm's regex it should work just fine, but we only had one example sequence to try here.

If your while loop is set up like this, won't it print only the matched description lines to the outfile (no sequence) even if there is a match? Or is this what you wanted? If you want the sequence you should add 'print OUT scalar <PROBES>;' after the 'print OUT;' line.

Chris

On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote: > Thanks to everyone for their helpful advice. I think I am getting > closer, > but no cigar quite yet. The script below runs quickly with no > errors--but > the output file is empty. It seems that the problem must lie > somewhere in > the 'while' loop, and I'm sure it's quite obvious to a more > experienced > eye--but not to mine! Any suggestions? Thanks again for your help. > > --Mike O.
>
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $IDs = 'ID.dat.txt';
>
> unless (open(IDFILE, $IDs)) {
>     print "Could not open file $IDs!\n";
> }
>
> my $probes = 'HG_U95Av2_probe_fasta.txt';
>
> unless (open(PROBES, $probes)) {
>     print "Could not open file $probes!\n";
> }
>
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
> my @ID = <IDFILE>;
> chomp @ID;
> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with
> keys=PSIDs and
> all values=1.
>
> while (<PROBES>) {
>     my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>     if ($idmatch){
>         print OUT;
>     }
> }
> exit;
>
>
> -----Original Message-----
> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
> Sent: Friday, June 09, 2006 7:58 AM
> To: Michael Oldham; bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
> single large
> file
>
>
>
> I wouldn't bioperl for this, or create an index. Perl would do
> fine and
> probably be faster.
>
> Assuming your ids are one per line in a file named id.dat looking like
> this
>
> 1138_at
> 1134_at
> etc..
>
> this should work:
>
> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID =
> <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;} $inmatch =
> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat
> mybigfile.fa
>
> good luck
>
> --Malcolm Cook
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>> Michael Oldham
>> Sent: Thursday, June 08, 2006 9:08 PM
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>> single large file
>>
>> Dear all,
>>
>> I am a total Bioperl newbie struggling to accomplish a
>> conceptually simple
>> task.
I have a single large fasta file containing about 200,000 >> probe >> sequences (from an Affymetrix microarray), each of which looks >> like this: >> >>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; >> Antisense; >> TGGCTCCTGCTGAGGTCCCCTTTCC >> >> What I would like to do is extract from this file a subset of >> ~130,800 >> probes (both the header and the sequence) and output this >> subset into a new >> fasta file. These 130,800 probes correspond to 8,175 probe set IDs >> ("1138_at" is the probe set ID in the header listed above); I >> have these >> 8,175 IDs listed in a separate file. I *think* that I managed >> to create an >> index of all 200,000 probes in the original fasta file using >> the following >> script: >> >> #!/usr/bin/perl -w >> >> # script 1: create the index >> >> use Bio::Index::Fasta; >> use strict; >> my $Index_File_Name = shift; >> my $inx = Bio::Index::Fasta->new( >> -filename => $Index_File_Name, >> -write_flag => 1); >> $inx->make_index(@ARGV); >> >> I'm not sure if this is the most sensible approach, and even >> if it is, I'm >> not sure what to do next. Any help would be greatly appreciated! >> >> Many thanks, >> Mike O. >> >> >> >> >> -- >> No virus found in this outgoing message. >> Checked by AVG Free Edition. >> Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date: >> 6/8/2006 >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > -- > No virus found in this incoming message. > Checked by AVG Free Edition. > Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date: > 6/8/2006 > > -- > No virus found in this outgoing message. > Checked by AVG Free Edition. 
> Version: 7.1.394 / Virus Database: 268.8.3/360 - Release Date: > 6/9/2006 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From sb at mrc-dunn.cam.ac.uk Mon Jun 12 04:21:31 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Mon, 12 Jun 2006 09:21:31 +0100 Subject: [Bioperl-l] Output a subset of FASTA data from a single large file In-Reply-To: <002e01c68bf7$b594d210$15327e82@pyrimidine> References: <002e01c68bf7$b594d210$15327e82@pyrimidine> Message-ID: <448D240B.6040508@mrc-dunn.cam.ac.uk> Chris Fields wrote: > There's information in the HOWTOs: > > http://www.bioperl.org/wiki/HOWTO:Flat_databases > > http://www.bioperl.org/wiki/HOWTO:OBDA > > Just so you know, I tried, out of curiosity, passing this through Bio::SeqIO > ('fasta' format I/O) and this is what I got as output: > >> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; > > > i.e. an empty sequence, which is what I guessed might happen [snip] As you later discovered, that was an Outlook problem. Just to make this thread relevant to bioperl, the bioperl solution is: use Bio::SeqIO; use Bio::Index::Fasta; my $inx = Bio::Index::Fasta->new(-write_flag => 1); $inx->id_parser(\&get_id); $inx->make_index(shift); my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT); my $wanted_ids_file = shift; open(IDS, $wanted_ids_file); while () { chomp; my $seq = $inx->fetch($_); $out->write_seq($seq); } sub get_id { my $line = shift; $line =~ /^>probe:\S+?:(\S+?):/; $1; } It works for me on the sample sequence given by the OP. 
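The id_parser callback decides which key each record is indexed under, so it can be worth checking the pattern against a sample header before indexing some 200,000 records. A minimal sketch in plain Perl (no BioPerl needed), assuming headers follow the layout of the sample posted at the start of the thread. probe_set_id is a hypothetical helper name, not part of Bio::Index::Fasta; unlike get_id it returns undef on a failed match rather than whatever $1 last held.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the probe set ID (third colon-separated field)
# out of a header such as ">probe:HG_U95Av2:1138_at:395:301; ...".
# Returns undef when the line does not look like a probe header.
sub probe_set_id {
    my ($line) = @_;
    return $line =~ /^>probe:\S+?:(\S+?):/ ? $1 : undef;
}

# Sample header taken from the original post.
my $header = '>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;';
my $id = probe_set_id($header);
print defined $id ? "parsed ID: $id\n" : "no ID found\n";
```

Per the original post, roughly 130,800 probes share 8,175 probe set IDs, so many records map to the same key; how the index treats repeated keys is worth verifying before relying on fetch to return every probe in a set.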
From sb at mrc-dunn.cam.ac.uk Mon Jun 12 04:49:49 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Mon, 12 Jun 2006 09:49:49 +0100
Subject: [Bioperl-l] Output a subset of FASTA data from a single large file
In-Reply-To:
References:
Message-ID: <448D2AAD.3030601@mrc-dunn.cam.ac.uk>

Michael Oldham wrote:
> Thanks to everyone for their helpful advice. I think I am getting closer,
> but no cigar quite yet. The script below runs quickly with no errors--but
> the output file is empty. It seems that the problem must lie somewhere in
> the 'while' loop, and I'm sure it's quite obvious to a more experienced
> eye--but not to mine! Any suggestions? Thanks again for your help.
>
> --Mike O.
>
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $IDs = 'ID.dat.txt';
>
> unless (open(IDFILE, $IDs)) {
>     print "Could not open file $IDs!\n";
> }
>
> my $probes = 'HG_U95Av2_probe_fasta.txt';
>
> unless (open(PROBES, $probes)) {
>     print "Could not open file $probes!\n";
> }
>
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
> my @ID = <IDFILE>;
> chomp @ID;
> my %ID = map {($_, 1)} @ID;  #Note: This creates a hash with keys=PSIDs and
> all values=1.
>
> while (<PROBES>) {
>     my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>     if ($idmatch){
>         print OUT;
>     }
> }
> exit;

Not sure why it would print nothing (are the ids in IDFILE the same case
as the ids in the fasta file, do they only contain word characters?),
but even if it did you would only be printing out the fasta headers and
not the sequences.

Doing it the bioperl way gives you more flexibility in the future; you
may want to do something with the sequences after printing them out, in
which case do it in bioperl using Seq objects and skip the intermediate
step of printing them.
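Both concerns raised here (IDs that fail to match, and headers printed without their sequences) can be handled in plain Perl by keeping the match flag outside the loop, so that it persists from a header line to the sequence lines that follow it. What follows is a minimal sketch, not the script from the thread. filter_fasta and read_ids are hypothetical helpers, the header layout is assumed to match the sample in the original post, and carriage returns are stripped from the IDs on the assumption that the ID file may have been saved with Windows line endings.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every FASTA record whose probe set ID is a key
# of %$wanted from $in to $out, header and sequence lines alike.
sub filter_fasta {
    my ($in, $out, $wanted) = @_;
    my $keep = 0;    # declared outside the loop so it persists across lines
    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            # A header starts a new record: decide once, here, whether to keep it.
            # [^:]+ is used rather than \w+ so IDs with non-word characters match.
            $keep = ($line =~ /^>probe:[^:]+:([^:]+):/ && exists $wanted->{$1}) ? 1 : 0;
        }
        print {$out} $line if $keep;
    }
    return;
}

# Hypothetical helper: read one ID per line, tolerating CRLF line endings.
sub read_ids {
    my ($fh) = @_;
    my %ids;
    while (my $id = <$fh>) {
        $id =~ s/\r?\n\z//;
        $ids{$id} = 1 if length $id;
    }
    return \%ids;
}

# Demo on in-memory data (shortened, made-up records); real use would open
# the ID file and the probe file instead.
my $id_text    = "1138_at\r\n1500_at\n";
my $fasta_text = ">probe:HG_U95Av2:1138_at:395:301; Antisense;\nTGGCTCC\n"
               . ">probe:HG_U95Av2:9999_at:1:2; Antisense;\nAAAA\n";
open my $ids_fh, '<', \$id_text    or die "Can't read IDs: $!";
open my $in_fh,  '<', \$fasta_text or die "Can't read FASTA: $!";
filter_fasta($in_fh, \*STDOUT, read_ids($ids_fh));
```

Because the flag only changes on header lines, multi-line sequences are carried along with their header, and nothing depends on reading "the next line" from inside the loop.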
From MEC at stowers-institute.org Mon Jun 12 11:28:41 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:28:41 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large file
Message-ID:

Michael,

I don't think you can call perl's `print` on just a filehandle as you
are doing. This is probably your problem.

If you call `select OUT` after opening it, print will print $_ to it.
And, every line in the fasta record whose header matches on of the IDS
will get printed, not just the fasta header lines. Read the code again
nothing that $idmatch is only getting reset when a correctly formatted
fasta header line is matched.

--Malcolm


>-----Original Message-----
>From: Chris Fields [mailto:cjfields at uiuc.edu]
>Sent: Saturday, June 10, 2006 11:32 PM
>To: Michael Oldham
>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>single large file
>
>What happens if you just print $idmatch or $1 (i.e. check to see if
>the regex matches anything)? If there is nothing printed then either
>the regex isn't working as expected or there is something logically
>wrong. The problem may be that the captured string must match the id
>exactly, the id being the key to the %ID hash; any extra characters
>picked up by the regex outside of your id key and you will not get
>anything. Looking at Malcolm's regex it should work just fine, but
>we only had one example sequence to try here.
>
>If your while loop is set up like this won't it only print only the
>matched description lines to the outfile (no sequence) even if there
>is a match? Or is this what you wanted? If you want the sequence
>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>
>Chris

From MEC at stowers-institute.org Mon Jun 12 11:47:09 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:47:09 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single largefile
Message-ID:

ooops, in my message

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>Cook, Malcolm
>Sent: Monday, June 12, 2006 10:29 AM
>To: Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>single largefile
>
>Michael,
>
>I don't think you can call perl's `print` on just a filehandle as you
>are doing. This is probably your problem.
>
>If you call `select OUT` after opening it, print will print $_ to it.
>And, every line in the fasta record whose header matches on of the IDS
>will get printed, not just the fasta header lines. Read the code again
>nothing that $idmatch is only getting reset when a correctly formatted
>fasta header line is matched.
>
>--Malcolm

From MEC at stowers-institute.org Mon Jun 12 11:48:02 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Mon, 12 Jun 2006 10:48:02 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single largefile
Message-ID:

oops,

s/matches on of/matches one of/
s/nothing that/noting that/

--Malcolm


>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>Cook, Malcolm
>Sent: Monday, June 12, 2006 10:29 AM
>To: Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>single largefile
>
>Michael,
>
>I don't think you can call perl's `print` on just a filehandle as you
>are doing. This is probably your problem.
>
>If you call `select OUT` after opening it, print will print $_ to it.
>And, every line in the fasta record whose header matches on of the IDS
>will get printed, not just the fasta header lines. Read the code again
>nothing that $idmatch is only getting reset when a correctly formatted
>fasta header line is matched.
>
>--Malcolm

From hubert.prielinger at gmx.at Mon Jun 12 14:29:19 2006
From: hubert.prielinger at gmx.at (Hubert Prielinger)
Date: Mon, 12 Jun 2006 12:29:19 -0600
Subject: [Bioperl-l] How to use gi2taxonid
Message-ID: <448DB27F.6090107@gmx.at>

hi,
I have downloaded the gi2taxonid file to get the taxonid for a GI number
taken from a report as recommended here, but I don't know how to use the
gi2taxonid file.
Jason wrote in a previous post that you have to make a DB_File out of
it, but I don't know how....and finally tie it to a hash....
Can anybody give me a hint how to use it..... my final goal is to get
the taxonomy.

thanks
Hubert

From cjfields at uiuc.edu Mon Jun 12 15:13:30 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Jun 2006 14:13:30 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single largefile
In-Reply-To:
Message-ID: <000f01c68e54$4d155ac0$15327e82@pyrimidine>

Michael, Malcolm et al,

I ran Michael's code (not Malcolm's one-liner), with and w/o adding the file
handle line that I suggested. My suggestion works b/c I'm calling the file
handle in scalar context, which reads the next line, just like
'$foo = <PROBES>' or 'while(<PROBES>) {}' advances to the next line (with
$/ = "\n") each time the file handle is called. You could use:

$_ = <PROBES>;
print OUT;

I just chopped it down to one line.
Without the extra line I suggested I get only the description line (I used
this as a test file based on the original sequence and Michael's description
of the ID):

>probe:HG_U95Av2:1138_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:1267_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:1500_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:2037_at:395:301;Interrogation_Position=2631;Antisense;
>probe:HG_U95Av2:3289390128_at:395:301;Interrogation_Position=2631;Antisense;

Which I don't think Michael wants (he mentioned sequence and description, I
think). Modifying the loop in Michael's code to:

...
while (<PROBES>) {
    my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
    if ($idmatch){
        print OUT;
        print OUT <PROBES>;  # grabs next line and prints
    }
}

Gets:

>probe:HG_U95Av2:1138_at:395:301;Interrogation_Position=2631;Antisense;
AGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:1267_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:1500_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTCCTGCTGAGGTCCCCTATCC
>probe:HG_U95Av2:2037_at:395:301;Interrogation_Position=2631;Antisense;
TGGATCCTGCTGAGGTCCCCTTTCC
>probe:HG_U95Av2:3289390128_at:395:301;Interrogation_Position=2631;Antisense;
TGGCTACTGCTGAGGTCCCCTTTCC

Which matches the ID's in the ID file (there are 10 sequences in the probes
file).

I did notice one odd thing; I tried the above code on Mac OS X and it worked
fine (i.e. printed only the descriptions and sequences for the ID's in the ID
hash). If I used Windows, I needed to use this version:

while (<PROBES>) {
    my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
    if ($idmatch){
        print OUT;
        print OUT scalar(<PROBES>);
    }
}

Or 'print <PROBES>;' prints all sequences (I guess it assumes list context
instead of scalar context when printing, so this forces it to be scalar).

Like I said, I haven't tried Malcolm's one-liner. It's possible that it works
just as well as what I suggested.
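The difference between the two loop variants comes down to context: print supplies list context to its arguments, so a bare filehandle read there returns every remaining line, while scalar() forces a single-line read. A small sketch on an in-memory filehandle, using only core Perl; lines_consumed is a hypothetical helper whose only purpose is to make the effect visible.

```perl
use strict;
use warnings;

my $data = "line1\nline2\nline3\n";

# Hypothetical helper: open a fresh in-memory handle on $data, call the
# given reader on it in list context, and return how many lines came back.
sub lines_consumed {
    my ($reader) = @_;
    open my $fh, '<', \$data or die "Can't open in-memory file: $!";
    my @got = $reader->($fh);
    return scalar @got;
}

# <$fh> evaluated in list context reads everything that is left.
my $list_count = lines_consumed(sub { my $fh = shift; return <$fh> });

# scalar(<$fh>) reads exactly one line, whatever the surrounding context.
my $scalar_count = lines_consumed(sub { my $fh = shift; return scalar(<$fh>) });

print "list context read $list_count lines, scalar context read $scalar_count\n";
```

On the three-line sample this reports 3 lines for the list-context read and 1 for the scalar read, which is why the scalar() form is the safer spelling when exactly one sequence line should follow each header.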
I'm just responding to Michael's code request.

Chris

> -----Original Message-----
> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
> Sent: Monday, June 12, 2006 10:48 AM
> To: Cook, Malcolm; Chris Fields; Michael Oldham
> Cc: bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
> largefile
>
> oops,
>
> s/matches on of/matches one of/
> s/nothing that/noting that/
>
> --Malcolm

From hlapp at gmx.net Mon Jun 12 16:06:23 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 12 Jun 2006 16:06:23 -0400
Subject: [Bioperl-l] How to use gi2taxonid
In-Reply-To: <448DB27F.6090107@gmx.at>
References: <448DB27F.6090107@gmx.at>
Message-ID: <878FB829-AD31-457D-957E-210448D7F6F5@gmx.net>

Thought about typing

$ perldoc DB_File

at the command line?

Hubert, are you trying to outsource what should be your own work to
the bioperl list, or what motivates you to waste everybody's time?

If you google 'how to ask good questions' this (indeed frequently
cited, also on the bioperl list if you had paid attention) comes up
as the first link:

http://www.catb.org/~esr/faqs/smart-questions.html

There's nothing I can add, except to read it in full before your next
posting or you may reach the point fast at which nobody will bother
to respond to you and do your homework for you.

On Jun 12, 2006, at 2:29 PM, Hubert Prielinger wrote:

> hi,
> I have downloaded the gi2taxonid file to get the taxonid for a GI
> number
> taken from a report as recommended here, but I don't know how to
> use the
> gi2taxonid file.
> Jason wrote in a previous post that you have to make a DB_File out of
> it, but I don't know how....and finally tie it to a hash....
> Can anybody give me a hint how to use it..... my final goal is to get
> the taxonomy.
>
> thanks
> Hubert

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Mon Jun 12 16:35:10 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 12 Jun 2006 15:35:10 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single large file
In-Reply-To: <448D240B.6040508@mrc-dunn.cam.ac.uk>
Message-ID: <001201c68e5f$b34ec8c0$15327e82@pyrimidine>

...
> Chris Fields wrote:
> > There's information in the HOWTOs:
> >
> > http://www.bioperl.org/wiki/HOWTO:Flat_databases
> >
> > http://www.bioperl.org/wiki/HOWTO:OBDA
> >
...
> As you later discovered, that was an Outlook problem. Just to make this
> thread relevant to bioperl, the bioperl solution is:

Agreed (stupid Outlook). It might be much faster to use non-Bioperl-ish ways, but it is easier to further manipulate sequences (convert format, analyze sequences, etc) using Bioperl directly. I haven't used flat databases much but it should move very quickly, even in an OO environment.

The one problem with the proposed non-bioperl method is that, if you wanted 100,000 sequences (based on ID's) from a FASTA database file containing 200,000 sequences, all ID's would need to be stored (1) in an array (which gulps the data from the ID file) and then mapped to (2) a hash; that may be a pretty big memory footprint depending on your system.

Sendu's BioPerl version indexes the FASTA file based on the ID, then (1) reads the ID's in one at a time from the file, (2) retrieves the data, then (3) prints it out.
The advantage of this approach is that the built index can be used in other bioperl scripts as well w/o having to rebuild it again, so if you wanted a different set of ID's later on you can access the database using the prebuilt index. More can be found in the Bio::Index::Fasta POD.

You can also use the ideas and code in the HOWTO (Flat Databases) I mentioned, which focuses on the Bio::DB::Flat system and OBDA. The advantage of these is that you can use Sleepycat's Berkeley Database through the Perl BerkeleyDB module (more functionality than DB_File) which is faster than a standard flat database. In the HOWTO, specifically look under 'Secondary or custom namespaces' for ideas on how to use your ID as a primary or secondary key.

Chris

> use Bio::SeqIO;
> use Bio::Index::Fasta;
> my $inx = Bio::Index::Fasta->new(-write_flag => 1);
> $inx->id_parser(\&get_id);
> $inx->make_index(shift);
>
> my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);
> my $wanted_ids_file = shift;
> open(IDS, $wanted_ids_file);
> while (<IDS>) {
>     chomp;
>     my $seq = $inx->fetch($_);
>     $out->write_seq($seq);
> }
>
> sub get_id {
>     my $line = shift;
>     $line =~ /^>probe:\S+?:(\S+?):/;
>     $1;
> }
>
> It works for me on the sample sequence given by the OP.

From golharam at umdnj.edu Mon Jun 12 16:23:45 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Mon, 12 Jun 2006 16:23:45 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
Message-ID: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1>

I'm trying to install the bioperl-run package and am getting errors from make test regarding PAML:

t/PAML....................ok 2/18Can't call method "get_MLmatrix" on an undefined value at t/PAML.t line 85, line 85.
t/PAML....................dubious Test returned status 2 (wstat 512, 0x200) after all the subtests completed successfully Is this a legitimate error or am I missing something? Ryan From MEC at stowers-institute.org Mon Jun 12 17:15:35 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Mon, 12 Jun 2006 16:15:35 -0500 Subject: [Bioperl-l] Output a subset of FASTA data from a single largefile Message-ID: Yeah, good points... ... my recommendation of the one-liner was motivated based on a small number of IDs and no other applications needing to index the entire fasta database. --Malcolm [At which point he bowed out of this fray] >-----Original Message----- >From: bioperl-l-bounces at lists.open-bio.org >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields >Sent: Monday, June 12, 2006 3:35 PM >To: 'Sendu Bala'; bioperl-l at lists.open-bio.org >Subject: Re: [Bioperl-l] Output a subset of FASTA data from a >single largefile > >... >> Chris Fields wrote: >> > There's information in the HOWTOs: >> > >> > http://www.bioperl.org/wiki/HOWTO:Flat_databases >> > >> > http://www.bioperl.org/wiki/HOWTO:OBDA >> > >... >> As you later discovered, that was an Outlook problem. Just >to make this >> thread relevant to bioperl, the bioperl solution is: > >Agreed (stupid Outlook). It might be much faster to use >non-Bioperl-ish >ways, but it is easier to further manipulate sequences (convert format, >analyze sequences, etc) using Bioperl directly. I haven't used flat >databases much but it should move very quickly, even in an OO >environment. > >The one problem with the proposed non-bioperl method is, if you wanted >100,000 sequences (based on ID's) in a FASTA database file containing >200,000 sequences, all ID's would need to be stored (1) in an >array (which >gulped the data from the ID file) and then map the ID's to (2) a hash; >that's may be a pretty big memory footprint depending on your system. 
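On the memory point, the ID list does not have to pass through an array first. A minimal sketch, assuming one ID per line and headers shaped like the OP's sample: the hash is filled while the ID file is read, and the FASTA file is then streamed line by line. The sub names and the second demo record are made up for illustration; only core Perl is used.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read wanted IDs one per line straight into a hash (no intermediate array).
sub read_ids {
    my ($fh) = @_;
    my %want;
    while (my $id = <$fh>) {
        chomp $id;
        next unless length $id;
        $want{$id} = 1;
    }
    return \%want;
}

# Copy header and sequence lines of wanted records from $in to $out.
sub filter_fasta {
    my ($in, $out, $want) = @_;
    my $keep = 0;
    while (my $line = <$in>) {
        if ($line =~ /^>/) {
            # Re-decide at every header; a header that does not
            # parse switches printing off.
            $keep = ($line =~ /^>probe:\w+:(\w+):/ && $want->{$1}) ? 1 : 0;
        }
        print {$out} $line if $keep;
    }
}

# Demo on in-memory data (the second record is hypothetical).
my $ids   = "1138_at\n";
my $fasta = ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n"
          . "TGGCTCCTGCTGAGGTCCCCTTTCC\n"
          . ">probe:HG_U95Av2:9999_at:1:1; Interrogation_Position=1; Antisense;\n"
          . "ACGTACGTACGTACGTACGTACGTA\n";

open my $id_fh, '<', \$ids   or die "Cannot open ID data: $!";
open my $in_fh, '<', \$fasta or die "Cannot open FASTA data: $!";
my $result = '';
open my $out_fh, '>', \$result or die "Cannot open output buffer: $!";

filter_fasta($in_fh, $out_fh, read_ids($id_fh));
close $out_fh;
print $result;
```

With real files, the three in-memory handles would simply be replaced by handles opened on the ID file, the probe FASTA file and the output file. The keep flag stays set across the sequence lines that follow a matching header, so both header and sequence are written.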
>
>Sendu's BioPerl version indexes the FASTA file based on the ID, then (1)
>reads the ID's in one at a time from the file, (2) retrieves the data, then
>(3) prints it out. The advantage of this approach is that the built index
>can be used in other bioperl scripts as well w/o having to rebuild it again,
>so if you wanted a different set of ID's later on you can access the
>database using the prebuilt index. More can be found in the
>Bio::Index::Fasta POD.
>
>You can also use the ideas and code in the HOWTO (Flat Databases) I
>mentioned, which focuses on the Bio::DB::Flat system and OBDA. The
>advantage of these is that you can use Sleepycat's Berkeley Database through
>the Perl BerkeleyDB module (more functionality than DB_File) which is faster
>than a standard flat database. In the HOWTO, specifically look under
>'Secondary or custom namespaces' for ideas on how to use your ID as a
>primary or secondary key.
>
>Chris
>
>> use Bio::SeqIO;
>> use Bio::Index::Fasta;
>> my $inx = Bio::Index::Fasta->new(-write_flag => 1);
>> $inx->id_parser(\&get_id);
>> $inx->make_index(shift);
>>
>> my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);
>> my $wanted_ids_file = shift;
>> open(IDS, $wanted_ids_file);
>> while (<IDS>) {
>>     chomp;
>>     my $seq = $inx->fetch($_);
>>     $out->write_seq($seq);
>> }
>>
>> sub get_id {
>>     my $line = shift;
>>     $line =~ /^>probe:\S+?:(\S+?):/;
>>     $1;
>> }
>>
>> It works for me on the sample sequence given by the OP.
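The id_parser callback can be tried on its own against the OP's sample header before building any index. A small sketch, not Sendu's exact code: this variant of get_id returns undef explicitly when the header does not match, rather than whatever $1 held from an earlier match.

```perl
use strict;
use warnings;

# Same pattern as get_id in the quoted script, but with an explicit
# undef when the header does not have the expected probe layout.
sub get_id {
    my ($line) = @_;
    return $line =~ /^>probe:\S+?:(\S+?):/ ? $1 : undef;
}

# Header copied from the sample record posted at the start of the thread.
my $header = '>probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;';
my $id = get_id($header);
print defined $id ? "parsed id: $id\n" : "no id found\n";
```

If the parsed id is not exactly the string used as the lookup key (stray carriage returns or trailing whitespace in the ID file, for instance), the lookup will come back empty, so printing the id between delimiters is a cheap first check.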
>> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at uiuc.edu Mon Jun 12 17:20:55 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 12 Jun 2006 16:20:55 -0500 Subject: [Bioperl-l] Output a subset of FASTA data from a single largefile In-Reply-To: Message-ID: <001601c68e66$17b760a0$15327e82@pyrimidine> Sorry Malcolm. I didn't want to imply that your way or the bioperl way was best, just point out advantages/disadvantages. Oops, didn't point out the possible Bioperl disadvantage (too many objects generated = slow slow slow). Chris > -----Original Message----- > From: Cook, Malcolm [mailto:MEC at stowers-institute.org] > Sent: Monday, June 12, 2006 4:16 PM > To: Chris Fields; Sendu Bala; bioperl-l at lists.open-bio.org > Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single > largefile > > Yeah, good points... > > ... my recommendation of the one-liner was motivated based on a small > number of IDs and no other applications needing to index the entire > fasta database. > > > --Malcolm [At which point he bowed out of this fray] > > >-----Original Message----- > >From: bioperl-l-bounces at lists.open-bio.org > >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields > >Sent: Monday, June 12, 2006 3:35 PM > >To: 'Sendu Bala'; bioperl-l at lists.open-bio.org > >Subject: Re: [Bioperl-l] Output a subset of FASTA data from a > >single largefile > > > >... > >> Chris Fields wrote: > >> > There's information in the HOWTOs: > >> > > >> > http://www.bioperl.org/wiki/HOWTO:Flat_databases > >> > > >> > http://www.bioperl.org/wiki/HOWTO:OBDA > >> > > >... > >> As you later discovered, that was an Outlook problem. 
Just > >to make this > >> thread relevant to bioperl, the bioperl solution is: > > > >Agreed (stupid Outlook). It might be much faster to use > >non-Bioperl-ish > >ways, but it is easier to further manipulate sequences (convert format, > >analyze sequences, etc) using Bioperl directly. I haven't used flat > >databases much but it should move very quickly, even in an OO > >environment. > > > >The one problem with the proposed non-bioperl method is, if you wanted > >100,000 sequences (based on ID's) in a FASTA database file containing > >200,000 sequences, all ID's would need to be stored (1) in an > >array (which > >gulped the data from the ID file) and then map the ID's to (2) a hash; > >that's may be a pretty big memory footprint depending on your system. > > > >Sendu's BioPerl version indexes the FASTA file based on the > >ID, then (1) > >reads the ID's in one at a time from the file, (2) retrieves > >the data, then > >(3) prints it out. The advantage of this approach is that > >the built index > >can be used in other bioperl scripts as well w/o having to > >rebuild it again, > >so if you wanted a different set of ID's later on you can access the > >database using the prebuilt index. More can be found in the > >Bio::Index::Fasta POD. > > > >You can also use the ideas and code in the HOWTO (Flat Databases) I > >mentioned, which focuses on the Bio::DB::Flat system and ODBA. The > >advantage of these is that you can use Sleepycat's Berkeley > >Database through > >the Perl BerkeleyDB module (more functionality than DB_File) > >which is faster > >than a standard flat database. In the HOWTO, specifically look under > >'Secondary or custom namespaces' for ideas on how to use your ID as a > >primary or secondary key. 
> >
> >Chris
> >
> >> use Bio::SeqIO;
> >> use Bio::Index::Fasta;
> >> my $inx = Bio::Index::Fasta->new(-write_flag => 1);
> >> $inx->id_parser(\&get_id);
> >> $inx->make_index(shift);
> >>
> >> my $out = Bio::SeqIO->new(-format => 'Fasta', -fh => \*STDOUT);
> >> my $wanted_ids_file = shift;
> >> open(IDS, $wanted_ids_file);
> >> while (<IDS>) {
> >>     chomp;
> >>     my $seq = $inx->fetch($_);
> >>     $out->write_seq($seq);
> >> }
> >>
> >> sub get_id {
> >>     my $line = shift;
> >>     $line =~ /^>probe:\S+?:(\S+?):/;
> >>     $1;
> >> }
> >>
> >> It works for me on the sample sequence given by the OP.

From roy at colibase.bham.ac.uk Mon Jun 12 11:46:49 2006
From: roy at colibase.bham.ac.uk (Roy Chaudhuri)
Date: Mon, 12 Jun 2006 16:46:49 +0100
Subject: [Bioperl-l] Truncate sequence with features
In-Reply-To: <200606090935.12758.heikki@sanbi.ac.za>
References: <448850CE.1040105@colibase.bham.ac.uk> <200606090935.12758.heikki@sanbi.ac.za>
Message-ID: <448D8C69.4030005@colibase.bham.ac.uk>

Hi Heikki.

> Two questions come to mind:
>
> 1. Can you parse your joint location using bioperl without errors?

Seems to work fine as far as I can tell (no errors, and to_FTstring reproduces the location as expected).

> 2. Is there a practical advantage in including a location which has no
> relevance to the sequence in hand?

I think it would be misleading to imply that a location was complete when it is only a part of the originally annotated feature. From the FT definition the other possibility would be to include the missing parts of the feature as remote locations, I guess that may be more satisfactory.

Roy.

--
Dr.
Roy Chaudhuri
Bioinformatics Research Fellow
Division of Immunity and Infection
University of Birmingham, U.K.
http://xbase.bham.ac.uk

From colin.erdman at du.edu Mon Jun 12 15:52:45 2006
From: colin.erdman at du.edu (Colin Erdman)
Date: Mon, 12 Jun 2006 13:52:45 -0600
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
Message-ID: <1150141965.2992.17.camel@localhost.localdomain>

Hello all,

I am doing a project relating to some forensic analysis of mitochondrial DNA.

I would like to write a script that will take a reference sequence, in this case the Anderson sequence, which is the standard mitochondrial sequence that sample sequences are compared to, and compare it to an unknown sequence.

I have been using this script:

use Bio::SearchIO;
use strict;
my $fh;
my @nomatches;
open($fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p blastn |") || die $!;

my $parser = Bio::SearchIO->new(-format => 'blast',fh => $fh);

if( my $result = $parser->next_result ) {
    if( my $hit = $result->next_hit ) {
        if( my $hsp = $hit->next_hsp ) {
            my ( @qmismatches) = $hsp->seq_inds('query', 'nomatch');
            my ( @hitbases) = $hsp->hit_string;
            my ( @querybases) = $hsp->query_string;
            my $seq_string = join("", @querybases);
            my $seq_string1 = join("", @hitbases);
            for my $base ( @qmismatches ) {
                print "base $base of the hit sequence is a mismatch: ";
                print substr $seq_string, $base-1, 1;
                print "->";
                print substr $seq_string1, $base-1, 1;
                print "\n";
            }
        }
    }
}

The problem is that some mitochondrial sequences from individuals have insertions, deletions, etc. that cause them to be offset from the reference sequence, which in turn offsets the numbering system.

To provide an example:

>Anderson Reference Sequence|HV2
ATTTGGT...
1234567

>Sample|HV2....
ATTTG|C|GT
12345,5.1,67

The |C| denotes an insertion, and traditionally in the forensics community this would be called position 5.1G, but the program reads it as position 6.
So basically I need to figure out how to modify a perl script in order to recognize that 5.1G is an insertion, and that it is not position 6, position 6 is actually the G to the right of it, followed by position 7-T. Any ideas and suggestions would be greatly helpful, I know this could be very tricky, or very easy - I just have come to the point where the idea flow has stopped and would love to gather some outside input. Thanks Colin Erdman colin.erdman at du.edu Undergraduate Research Associate Institute For Forensic Genetic University of Denver From jason at bioperl.org Tue Jun 13 10:19:04 2006 From: jason at bioperl.org (Jason Stajich) Date: Tue, 13 Jun 2006 10:19:04 -0400 Subject: [Bioperl-l] Test errors in bioperl-run In-Reply-To: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1> References: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1> Message-ID: The latest version of YN00 (3.15) doesn't work with the current code as the output has changed substantially as Yang is now provided several different method's simple Ka and Ks calculations. Downgrade to PAML 3.14 or roll up your sleeves and figure out what is breaking -- which is the regexp in about line 363 that detects when to start parsing for the Pairwise data as well as the function parse_YN_Pairwise.... I just don't have very much time anymore to follow changes to the software packages so I am hopeful that other developers that use our software as do molecular evolutionary studies will get involved to help this effort. I may have to run a few batches of analyses myself later in the week using PAML so I will try and fix this if I can make the time. -jason On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote: > I'm trying to install the bioperl-run package and an getting errors > from > make test regarding PAML: > > t/PAML....................ok 2/18Can't call method "get_MLmatrix" > on an > undefined value at t/PAML.t line 85, line 85. 
> t/PAML....................dubious > Test returned status 2 (wstat 512, 0x200) > after all the subtests completed successfully > > Is this a legitimate error or am I missing something? > > Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From jason at bioperl.org Tue Jun 13 11:45:27 2006 From: jason at bioperl.org (Jason Stajich) Date: Tue, 13 Jun 2006 11:45:27 -0400 Subject: [Bioperl-l] Test errors in bioperl-run In-Reply-To: References: <050701c68e5e$1afa0220$e6028a0a@GOLHARMOBILE1> Message-ID: And just to say - codeml 3.15 parsing does work - yn00 parsing just hasn't been updated. I agree that it is bad the test is failing but it is dependent on the version that is installed and we should put some sort of detect version-skip test code in there so it doesn't cause the tests to fail. Just need more hands on deck tracking these sort of things.... -jason On Jun 13, 2006, at 10:19 AM, Jason Stajich wrote: > The latest version of YN00 (3.15) doesn't work with the current code > as the output has changed substantially as Yang is now provided > several different method's simple Ka and Ks calculations. Downgrade > to PAML 3.14 or roll up your sleeves and figure out what is breaking > -- which is the regexp in about line 363 that detects when to start > parsing for the Pairwise data as well as the function > parse_YN_Pairwise.... > > I just don't have very much time anymore to follow changes to the > software packages so I am hopeful that other developers that use our > software as do molecular evolutionary studies will get involved to > help this effort. > > I may have to run a few batches of analyses myself later in the week > using PAML so I will try and fix this if I can make the time. 
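A rough sketch of the kind of version check mentioned above, using Test::More's SKIP block. How the installed yn00 version gets detected is left open here: $installed_version is hard-coded and hypothetical, the sub name is made up, and the 3.14 cutoff is only taken from the downgrade suggestion in this thread.

```perl
use strict;
use warnings;
use Test::More tests => 2;

# Highest yn00 release whose output the parser is assumed to handle
# (3.14, per the downgrade advice); newer output is skipped, not failed.
my $MAX_PARSED_YN00 = 3.14;

sub yn00_parse_supported {
    my ($version) = @_;
    return defined $version && $version <= $MAX_PARSED_YN00;
}

# Hypothetical: a real test would ask the installed program for this.
my $installed_version = 3.15;

SKIP: {
    skip "yn00 $installed_version output is not parsed yet", 2
        unless yn00_parse_supported($installed_version);
    ok(1, 'yn00 ran');            # placeholders for the real yn00 tests
    ok(1, 'got an ML matrix');
}
```

Skipped tests are reported as passing with a "# skip" note, so the suite stays green on a newer PAML while still saying why the yn00 checks did not run.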
> > -jason > On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote: > >> I'm trying to install the bioperl-run package and an getting errors >> from >> make test regarding PAML: >> >> t/PAML....................ok 2/18Can't call method "get_MLmatrix" >> on an >> undefined value at t/PAML.t line 85, line 85. >> t/PAML....................dubious >> Test returned status 2 (wstat 512, 0x200) >> after all the subtests completed successfully >> >> Is this a legitimate error or am I missing something? >> >> Ryan >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From golharam at umdnj.edu Tue Jun 13 12:04:46 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Tue, 13 Jun 2006 12:04:46 -0400 Subject: [Bioperl-l] Test errors in bioperl-run In-Reply-To: Message-ID: <001001c68f03$17429070$e6028a0a@GOLHARMOBILE1> I'll take a look at it and see what I can do. While I'm at it, bioperl-run tests a module called Coil, but I don't have that installed. The documentation doesn't specify where I can get this application. Does anyone know where Coil comes from? -----Original Message----- From: Jason Stajich [mailto:jason at bioperl.org] Sent: Tuesday, June 13, 2006 10:19 AM To: golharam at umdnj.edu Cc: bioperl-l at bioperl.org Subject: Re: [Bioperl-l] Test errors in bioperl-run The latest version of YN00 (3.15) doesn't work with the current code as the output has changed substantially as Yang is now provided several different method's simple Ka and Ks calculations. 
Downgrade to PAML 3.14 or roll up your sleeves and figure out what is breaking -- which is the regexp in about line 363 that detects when to start parsing for the Pairwise data as well as the function parse_YN_Pairwise.... I just don't have very much time anymore to follow changes to the software packages so I am hopeful that other developers that use our software as do molecular evolutionary studies will get involved to help this effort. I may have to run a few batches of analyses myself later in the week using PAML so I will try and fix this if I can make the time. -jason On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote: > I'm trying to install the bioperl-run package and an getting errors > from > make test regarding PAML: > > t/PAML....................ok 2/18Can't call method "get_MLmatrix" > on an > undefined value at t/PAML.t line 85, line 85. > t/PAML....................dubious > Test returned status 2 (wstat 512, 0x200) > after all the subtests completed successfully > > Is this a legitimate error or am I missing something? > > Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From Kevin.M.Brown at asu.edu Tue Jun 13 13:42:40 2006 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Tue, 13 Jun 2006 10:42:40 -0700 Subject: [Bioperl-l] Blast or blat against custom db? Message-ID: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu> I've been tasked to write a "small" application for the lab I work at that basically starts with an NCBI file for an organism and goes through a number of steps to distill out the unique protein coding sequences and then designs oligos for the building of the genes. 
One of the steps is comparing the overlap region of the oligos to all the others designed to try and prevent mismatches in the build that might truncate a gene or splice in another gene into it during the build step. I tried to do this within perl with just a looped string comparison regex, but by my calculations comparing each half of an oligo with all the other oligos for this organism results in well over 8 BILLION comparisons needed. The system was still crunching at it 3 days later with no sign of nearing completion. So, my thought was to utilize something like blastall from within the script to find other oligos of similar match, but it means that I need to dump out the oligos designed, create the db with formatdb. Then do the blast and finally analyze the result file to see what needs to be changed in the oligos to prevent a mismatch redesigning any matches. I'm just trying to figure out how to do it all without leaving the script, but as yet haven't noticed a way to create a db from within perl using bioperl? Any thoughts on directions I should look? From aaron.j.mackey at gsk.com Tue Jun 13 08:19:11 2006 From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com) Date: Tue, 13 Jun 2006 08:19:11 -0400 Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA In-Reply-To: <1150141965.2992.17.camel@localhost.localdomain> Message-ID: See Bio::LocatableSeq -Aaron bioperl-l-bounces at lists.open-bio.org wrote on 06/12/2006 03:52:45 PM: > Hello all, > > I am doing a project relating to some forensic analysis of mitochondrial > DNA. > > I would like to write a script that will take a reference sequence, in > this case the Anderson sequence which is the standard mitochondrial > sequence which sample sequences are compared to, and compare it to an > unknown sequence. 
>
> I have been using this script:
>
> use Bio::SearchIO;
> use strict;
> my $fh;
> my @nomatches;
> open($fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p blastn |") || die $!;
>
> my $parser = Bio::SearchIO->new(-format => 'blast',fh => $fh);
>
> if( my $result = $parser->next_result ) {
>     if( my $hit = $result->next_hit ) {
>         if( my $hsp = $hit->next_hsp ) {
>             my ( @qmismatches) = $hsp->seq_inds('query', 'nomatch');
>             my ( @hitbases) = $hsp->hit_string;
>             my ( @querybases) = $hsp->query_string;
>             my $seq_string = join("", @querybases);
>             my $seq_string1 = join("", @hitbases);
>             for my $base ( @qmismatches ) {
>                 print "base $base of the hit sequence is a mismatch: ";
>                 print substr $seq_string, $base-1, 1;
>                 print "->";
>                 print substr $seq_string1, $base-1, 1;
>                 print "\n";
>             }
>         }
>     }
> }
>
> The problem is, that some mitochondrial sequences from individuals have
> insertions, deletion etc, that cause them to be offset from the
> reference sequence, this then offsets the numbering system.
>
> To provide an example:
>
> >Anderson Reference Sequence|HV2
> ATTTGGT...
> 1234567
>
> >Sample|HV2....
> ATTTG|C|GT
> 12345,5.1,67
>
> The |C| denote an insertion, and traditionally in the forensics community
> this would be called position 5.1G, but the program reads it as position 6.
>
> So basically I need to figure out how to modify a perl script in order to
> recognize that 5.1G is an insertion, and that it is not position 6, position 6
> is actually the G to the right of it, followed by position 7-T.
>
> Any ideas and suggestions would be greatly helpful, I know this could be very
> tricky, or very easy - I just have come to the point where the idea flow has
> stopped and would love to gather some outside input.
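One way to get the reference-based numbering asked about, sketched in plain Perl. It assumes two gapped alignment strings of equal length with '-' for gaps (for example what $hsp->query_string and $hsp->hit_string give when the reference is the query); the sub name and the output labels such as "6del" are made up for illustration.

```perl
use strict;
use warnings;

# Walk two gapped alignment strings column by column and report
# differences in reference coordinates. $offset is the reference
# position of the first aligned base (1 by default).
sub ref_numbered_diffs {
    my ($ref_aln, $sample_aln, $offset) = @_;
    $offset = 1 unless defined $offset;
    die "alignment strings differ in length\n"
        unless length($ref_aln) == length($sample_aln);

    my @diffs;
    my $pos = $offset - 1;   # last reference position consumed
    my $ins = 0;             # consecutive inserted bases after $pos
    for my $i (0 .. length($ref_aln) - 1) {
        my $r = uc substr($ref_aln, $i, 1);
        my $s = uc substr($sample_aln, $i, 1);
        if ($r eq '-') {
            next if $s eq '-';
            # Insertion: the reference position does not advance.
            $ins++;
            push @diffs, "$pos.$ins$s";
            next;
        }
        $pos++;
        $ins = 0;
        if    ($s eq '-') { push @diffs, "${pos}del" }
        elsif ($s ne $r)  { push @diffs, "$r$pos$s" }
    }
    return @diffs;
}

# The example from the message: a C inserted after reference position 5.
print join(', ', ref_numbered_diffs('ATTTG-GT', 'ATTTGCGT')), "\n";
```

For the example this reports the inserted C as 5.1C, and the G that follows keeps reference position 6. Because the position counter only advances on non-gap reference columns, later mismatches are numbered against the reference no matter how many bases were inserted before them.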
> > Thanks > Colin Erdman > colin.erdman at du.edu > Undergraduate Research Associate > Institute For Forensic Genetic > University of Denver > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From colin.erdman at du.edu Tue Jun 13 11:12:45 2006 From: colin.erdman at du.edu (Colin Erdman) Date: Tue, 13 Jun 2006 09:12:45 -0600 Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA In-Reply-To: References: Message-ID: <1150211566.7034.1.camel@localhost.localdomain> I could see how this will help... but I am not sure how to implement it in my situation, I am not very familiar with the Bio::Range or Bio::Location modules... Thanks very much, Colin E. On Tue, 2006-06-13 at 08:19 -0400, aaron.j.mackey at gsk.com wrote: > See Bio::LocatableSeq > > -Aaron > > bioperl-l-bounces at lists.open-bio.org wrote on 06/12/2006 03:52:45 PM: > > > Hello all, > > > > I am doing a project relating to some forensic analysis of mitochondrial > > DNA. > > > > I would like to write a script that will take a reference sequence, in > > this case the Anderson sequence which is the standard mitochondrial > > sequence which sample sequences are compared to, and compare it to an > > unknown sequence. 
> >
> > I have been using this script:
> >
> > use Bio::SearchIO;
> > use strict;
> > my $fh;
> > my @nomatches;
> > open($fh, "bl2seq -i refseqs/andhv2.fa -j refseqs/testhv2.fa -p blastn |") || die $!;
> >
> > my $parser = Bio::SearchIO->new(-format => 'blast',fh => $fh);
> >
> > if( my $result = $parser->next_result ) {
> >     if( my $hit = $result->next_hit ) {
> >         if( my $hsp = $hit->next_hsp ) {
> >             my ( @qmismatches) = $hsp->seq_inds('query', 'nomatch');
> >             my ( @hitbases) = $hsp->hit_string;
> >             my ( @querybases) = $hsp->query_string;
> >             my $seq_string = join("", @querybases);
> >             my $seq_string1 = join("", @hitbases);
> >             for my $base ( @qmismatches ) {
> >                 print "base $base of the hit sequence is a mismatch: ";
> >                 print substr $seq_string, $base-1, 1;
> >                 print "->";
> >                 print substr $seq_string1, $base-1, 1;
> >                 print "\n";
> >             }
> >         }
> >     }
> > }
> >
> > The problem is, that some mitochondrial sequences from individuals have
> > insertions, deletion etc, that cause them to be offset from the
> > reference sequence, this then offsets the numbering system.
> >
> > To provide an example:
> >
> > >Anderson Reference Sequence|HV2
> > ATTTGGT...
> > 1234567
> >
> > >Sample|HV2....
> > ATTTG|C|GT
> > 12345,5.1,67
> >
> > The |C| denote an insertion, and traditionally in the forensics community
> > this would be called position 5.1G, but the program reads it as position 6.
> >
> > So basically I need to figure out how to modify a perl script in order to
> > recognize that 5.1G is an insertion, and that it is not position 6, position 6
> > is actually the G to the right of it, followed by position 7-T.
> >
> > Any ideas and suggestions would be greatly helpful, I know this could be very
> > tricky, or very easy - I just have come to the point where the idea flow has
> > stopped and would love to gather some outside input.
> >
> > Thanks
> > Colin Erdman
> > colin.erdman at du.edu
> > Undergraduate Research Associate
> > Institute For Forensic Genetic
> > University of Denver
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >

From colin.erdman at du.edu Tue Jun 13 12:05:30 2006
From: colin.erdman at du.edu (Colin Erdman)
Date: Tue, 13 Jun 2006 10:05:30 -0600
Subject: [Bioperl-l] Tricky pairwise sequence alignment for mtDNA
In-Reply-To:
References:
Message-ID: <1150214730.12044.2.camel@localhost.localdomain>

I actually have found EMBOSS DiffSeq to work quite well for detecting
the insertions and SNPs in the "sample sequence" as compared to the
"reference sequence". If I get this all figured out and integrated I
will post a method; I imagine this would prove useful to others as well.

Thanks all,
Colin

On Tue, 2006-06-13 at 08:19 -0400, aaron.j.mackey at gsk.com wrote:
> See Bio::LocatableSeq
>
> -Aaron

From golharam at umdnj.edu Tue Jun 13 14:59:59 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 14:59:59 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
Message-ID: <002301c68f1b$917b8c80$e6028a0a@GOLHARMOBILE1>

Nevermind - don't check it in yet. There are still some other problems
not being picked up by the test suite. I'll work on that and add to the
test suite. Jason, I'll send you everything once I have it complete.

-----Original Message-----
From: Ryan Golhar [mailto:golharam at umdnj.edu]
Sent: Tuesday, June 13, 2006 2:34 PM
To: 'Jason Stajich'
Cc: 'bioperl-l at bioperl.org'
Subject: RE: [Bioperl-l] Test errors in bioperl-run

It looks like the output contains two new sections at the bottom and the
comment sections have been changed slightly. I've modified
bioperl-live/Bio/Tools/PAML.pm to read the new and old format from YN00.
I've attached it to this message. It passes all the PAML tests from
bioperl-live and bioperl-run using both 3.14 and 3.15 of YN00. Can you
(or someone) check it into CVS?

Ryan

-----Original Message-----
From: Jason Stajich [mailto:jason at bioperl.org]
Sent: Tuesday, June 13, 2006 10:19 AM
To: golharam at umdnj.edu
Cc: bioperl-l at bioperl.org
Subject: Re: [Bioperl-l] Test errors in bioperl-run

The latest version of YN00 (3.15) doesn't work with the current code
as the output has changed substantially, as Yang now provides simple
Ka and Ks calculations from several different methods.
Downgrade to PAML 3.14 or roll up your sleeves and figure out what is
breaking -- which is the regexp in about line 363 that detects when to
start parsing for the Pairwise data as well as the function
parse_YN_Pairwise....

I just don't have very much time anymore to follow changes to the
software packages so I am hopeful that other developers that use our
software to do molecular evolutionary studies will get involved to
help this effort.

I may have to run a few batches of analyses myself later in the week
using PAML so I will try and fix this if I can make the time.

-jason

On Jun 12, 2006, at 4:23 PM, Ryan Golhar wrote:

> I'm trying to install the bioperl-run package and am getting errors
> from make test regarding PAML:
>
> t/PAML....................ok 2/18Can't call method "get_MLmatrix" on
> an undefined value at t/PAML.t line 85, line 85.
> t/PAML....................dubious
> Test returned status 2 (wstat 512, 0x200)
> after all the subtests completed successfully
>
> Is this a legitimate error or am I missing something?
>
> Ryan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From Jonathan_Epstein at nih.gov Tue Jun 13 14:21:00 2006
From: Jonathan_Epstein at nih.gov (Jonathan_Epstein at nih.gov)
Date: Tue, 13 Jun 2006 14:21:00 -0400
Subject: [Bioperl-l] Blast or blat against custom db?
Message-ID: <0J0T001LE9O5M6@lswsmta04.nmcc.sprintspectrum.com>

Sounds like a job for MUMMER (from Steven Salzberg's group).

Jonathan Epstein

-----------
Sent from my Treo

-----Original Message-----
From: "Kevin Brown"
Subj: [Bioperl-l] Blast or blat against custom db?
Date: Tue Jun 13, 2006 2:17 pm
Size: 1K
To:

I've been tasked to write a "small" application for the lab I work at
that basically starts with an NCBI file for an organism and goes through
a number of steps to distill out the unique protein coding sequences and
then designs oligos for the building of the genes.

One of the steps is comparing the overlap region of the oligos to all the
others designed to try and prevent mismatches in the build that might
truncate a gene or splice in another gene into it during the build step.

I tried to do this within perl with just a looped string comparison
regex, but by my calculations comparing each half of an oligo with all
the other oligos for this organism results in well over 8 BILLION
comparisons needed. The system was still crunching at it 3 days later
with no sign of nearing completion.

So, my thought was to utilize something like blastall from within the
script to find other oligos of similar match, but it means that I need
to dump out the oligos designed, create the db with formatdb. Then do
the blast and finally analyze the result file to see what needs to be
changed in the oligos to prevent a mismatch redesigning any matches.

I'm just trying to figure out how to do it all without leaving the
script, but as yet haven't noticed a way to create a db from within perl
using bioperl?

Any thoughts on directions I should look?
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l

--- message truncated ---

From golharam at umdnj.edu Tue Jun 13 14:34:00 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 14:34:00 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To:
Message-ID: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>

It looks like the output contains two new sections at the bottom and the
comment sections have been changed slightly.
I've modified bioperl-live/Bio/Tools/PAML.pm to read the new and old
format from YN00. I've attached it to this message. It passes all the
PAML tests from bioperl-live and bioperl-run using both 3.14 and 3.15 of
YN00. Can you (or someone) check it into CVS?

Ryan

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PAML.pm
Type: application/octet-stream
Size: 43262 bytes
Desc: not available
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060613/566881b4/attachment-0001.obj

From cjfields at uiuc.edu Tue Jun 13 21:41:45 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Jun 2006 20:41:45 -0500
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID: <000601c68f53$b1e4b090$15327e82@pyrimidine>

I committed it. Passes PAML.t for bioperl-live.

Chris

From cjfields at uiuc.edu Tue Jun 13 21:42:25 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 13 Jun 2006 20:42:25 -0500
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID: <000701c68f53$c9addcb0$15327e82@pyrimidine>

Sorry, Brian beat me to it.

Chris

From osborne1 at optonline.net Tue Jun 13 21:38:09 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 13 Jun 2006 21:38:09 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <001c01c68f17$f0881df0$e6028a0a@GOLHARMOBILE1>
Message-ID:

Checked in.

From golharam at umdnj.edu Tue Jun 13 21:55:49 2006
From: golharam at umdnj.edu (Ryan Golhar)
Date: Tue, 13 Jun 2006 21:55:49 -0400
Subject: [Bioperl-l] Test errors in bioperl-run
In-Reply-To: <000601c68f53$b1e4b090$15327e82@pyrimidine>
Message-ID: <000101c68f55$a9fa8ec0$2f01a8c0@GOLHARMOBILE1>

Okay, that's fine. It does pass the bioperl-live tests. But when I ran
the bp_pairwise_kaks script, it didn't work; the script doesn't work
with 3.15.
It looks like the current test suite is not exhaustive. When I looked
into the code further, I saw that codeml 3.15 generates some files
slightly differently from 3.14, which needs to be accounted for. I'll
work on that and post it here...shouldn't be too long.

From ULNJUJERYDIX at spammotel.com Tue Jun 13 21:10:04 2006
From: ULNJUJERYDIX at spammotel.com (Kevin Lam Koiyau)
Date: Wed, 14 Jun 2006 09:10:04 +0800
Subject: [Bioperl-l] SimpleAlign /Bio::AlignIO; POD code doesn't work for me
Message-ID: <5b6410e0606131810k495d8f55mc6dc73f0cd5a6df5@mail.gmail.com>

>
> Hi,
>
> Two queries with respect to SimpleAlign. I am using the following code
> based on the POD.
>
> my $in = Bio::AlignIO->newFh(-file => $file, -format => 'fasta');
> my $out = Bio::AlignIO->newFh('-format' => 'clustalw');
> print $out $_ while <$in>;
>
> 1) Is it possible to set set_displayname_flat() globally without doing
> $_->set_displayname_flat() per alignment?
>
> 2) My input files have an ID and description line for each seq in the
> alignment. When the file is converted I lose the description line. I
> know I can get the description of the sequences (e.g.
> $aln->get_seq_by_pos(2)->description()).
> How could I export the complete fasta defline including the
> description (I realize that general clustal format has a limit on the
> number of characters, but still).
>
> Regards,
> Bernd
> _______________________________________________
>

I might be totally wrong here, but my understanding of the FASTA format
is that the first word (i.e. no spaces) is the only true name of the
seq, so anything after the first word is discarded. Putting underscores
in works for me.

On a side note, does your 3rd line work? It doesn't on my 1.5rc1. I had
to add the bold line, which was missing in the POD doc. I don't think it
was the use strict pragma.

open MYIN,"<$file" or die "Can't open input alignment";
open MYOUT, ">$file2" or die "can't write to output";
my $in = Bio::AlignIO->newFh(-fh => \*MYIN, -format => 'fasta');
my $out = Bio::AlignIO->newFh(-fh => \*MYOUT, -format => 'clustalw');
print $out $_ while <$in>;

Cheers
kevin

From sb at mrc-dunn.cam.ac.uk Wed Jun 14 03:49:10 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Wed, 14 Jun 2006 08:49:10 +0100
Subject: [Bioperl-l] Blast or blat against custom db?
In-Reply-To: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu>
References: <1A4207F8295607498283FE9E93B775B4018130E4@EX02.asurite.ad.asu.edu>
Message-ID: <448FBF76.1090505@mrc-dunn.cam.ac.uk>

Kevin Brown wrote:
[snip]
> So, my thought was to utilize something like blastall from within the
> script to find other oligos of similar match, but it means that I need
> to dump out the oligos designed, create the db with formatdb.
[snip]
> I'm just trying to figure out how to do it all without leaving the
> script, but as yet haven't noticed a way to create a db from within perl
> using bioperl?
>
> Any thoughts on directions I should look?

AFAIK there's no bioperl interface onto formatdb, but the way to do it
is to make a fasta file (perhaps using bioperl) with all the oligos
(what you want to become the db), then use a perl system call (or
similar) to run formatdb. Still in the same script you'd then run and
analyse the blast with bioperl calls (presumably starting with
StandAloneBlast - http://bioperl.org/wiki/HOWTO:Beginners#BLAST if you
need it).

Just be sure to carefully craft your blast parameters so they're
suitable for oligo-sized matches, and check that the 3' base of each
hit is identical.

From MEC at stowers-institute.org Wed Jun 14 09:47:59 2006
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Wed, 14 Jun 2006 08:47:59 -0500
Subject: [Bioperl-l] Output a subset of FASTA data from a single largefile
Message-ID:

Did you try my one-liner?

Anyway, try this

1) predeclare $idmatch before the while loop
2) use `select OUT` and print with no args to get $_ into it

like this:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

unless (open(IDFILE, $IDs)) {
    print "Could not open file $IDs!\n";
}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
    print "Could not open file $probes!\n";
}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
select OUT;

my @ID = <IDFILE>;
chomp @ID;
my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs and all values=1.

my $idmatch;
while (<PROBES>) {
    $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
    if ($idmatch){
        print;
    }
}
exit;

>-----Original Message-----
>From: Michael Oldham [mailto:oldham at ucla.edu]
>Sent: Tuesday, June 13, 2006 9:03 PM
>To: Cook, Malcolm; Chris Fields
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>single largefile
>
>Dear Malcolm, Chris, et al,
>
>Thanks to everyone for your helpful suggestions.
>When I run the code
>below using an ID list ('ID_dat.txt') containing all 8175 IDs, the
>output file is still blank. If I replace this list with a single ID
>("542_at"), it works:
>
>>probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
>GCGCAGCAGCGAGAATTTCGACGAG
>>probe:HG_U95Av2:542_at:610:383; Interrogation_Position=128; Antisense;
>GAATTTCGACGAGCTGCTGAAGGCA
>>probe:HG_U95Av2:542_at:289:599; Interrogation_Position=134; Antisense;
>CGACGAGCTGCTGAAGGCACTGGGT
>........etc.
>
>If I try a list of two IDs ("542_at" and "31799_at"), only the last one
>is present in the output:
>
>>probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; Antisense;
>GTTCATCACAAATCTATTGTGCTTG
>>probe:HG_U95Av2:31799_at:534:511; Interrogation_Position=1126; Antisense;
>GTCCACTAAATGTAGTAACGAAATG
>>probe:HG_U95Av2:31799_at:226:183; Interrogation_Position=1127; Antisense;
>TCCACTAAATGTAGTAACGAAATGT
>........etc.
>
>The same thing seems to happen if I go to 3 IDs, or 4 IDs (only the last
>ID is present in the output file). At this point I have no idea why
>this is happening, and I am not sure how to interpret Malcolm's comment:
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>Any ideas? Thanks again................!
>
>Mike O.
>
>
>#!/usr/bin/perl -w
>
>use strict;
>
>my $IDs = 'ID_dat.txt';
>
>unless (open(IDFILE, $IDs)) {
>    print "Could not open file $IDs!\n";
>    }
>
>my $probes = 'HG_U95Av2_probe_fasta.txt';
>
>unless (open(PROBES, $probes)) {
>    print "Could not open file $probes!\n";
>    }
>
>open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
>my @ID = <IDFILE>;
>chomp @ID;
>my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs and all values=1.
>
>    while (<PROBES>) {
>        my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>        if ($idmatch){
>            print OUT;
>            print OUT scalar(<PROBES>);
>        }
>    }
>exit;
>
>
>-----Original Message-----
>From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>Sent: Monday, June 12, 2006 8:48 AM
>To: Cook, Malcolm; Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single
>largefile
>
>
>oops,
>
>s/matches on of/matches one of/
>s/nothing that/noting that/
>
>--Malcolm
>
>
>>-----Original Message-----
>>From: bioperl-l-bounces at lists.open-bio.org
>>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>Cook, Malcolm
>>Sent: Monday, June 12, 2006 10:29 AM
>>To: Chris Fields; Michael Oldham
>>Cc: bioperl-l at lists.open-bio.org
>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>single largefile
>>
>>Michael,
>>
>>I don't think you can call perl's `print` on just a filehandle as you
>>are doing. This is probably your problem.
>>
>>If you call `select OUT` after opening it, print will print $_ to it.
>>And, every line in the fasta record whose header matches on of the IDS
>>will get printed, not just the fasta header lines. Read the code again
>>nothing that $idmatch is only getting reset when a correctly formatted
>>fasta header line is matched.
>>
>>--Malcolm
>>
>>
>>>-----Original Message-----
>>>From: Chris Fields [mailto:cjfields at uiuc.edu]
>>>Sent: Saturday, June 10, 2006 11:32 PM
>>>To: Michael Oldham
>>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org
>>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>>>single large file
>>>
>>>What happens if you just print $idmatch or $1 (i.e. check to see if
>>>the regex matches anything)? If there is nothing printed then either
>>>the regex isn't working as expected or there is something logically
>>>wrong.
>>>The problem may be that the captured string must match the id
>>>exactly, the id being the key to the %ID hash; any extra characters
>>>picked up by the regex outside of your id key and you will not get
>>>anything. Looking at Malcolm's regex it should work just fine, but
>>>we only had one example sequence to try here.
>>>
>>>If your while loop is set up like this, won't it print only the
>>>matched description lines to the outfile (no sequence) even if there
>>>is a match? Or is this what you wanted? If you want the sequence
>>>you should add 'print OUT <PROBES>;' after the 'print OUT;' line.
>>>
>>>Chris
>>>
>>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote:
>>>
>>>> Thanks to everyone for their helpful advice. I think I am getting
>>>> closer, but no cigar quite yet. The script below runs quickly with no
>>>> errors--but the output file is empty. It seems that the problem must
>>>> lie somewhere in the 'while' loop, and I'm sure it's quite obvious to
>>>> a more experienced eye--but not to mine! Any suggestions? Thanks
>>>> again for your help.
>>>>
>>>> --Mike O.
>>>>
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> use strict;
>>>>
>>>> my $IDs = 'ID.dat.txt';
>>>>
>>>> unless (open(IDFILE, $IDs)) {
>>>>     print "Could not open file $IDs!\n";
>>>>     }
>>>>
>>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>>
>>>> unless (open(PROBES, $probes)) {
>>>>     print "Could not open file $probes!\n";
>>>>     }
>>>>
>>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>>
>>>> my @ID = <IDFILE>;
>>>> chomp @ID;
>>>> my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs and all values=1.
>>>> >>>> while () { >>>> my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/; >>>> if ($idmatch){ >>>> print OUT; >>>> } >>>> } >>>> exit; >>>> >>>> >>>> -----Original Message----- >>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org] >>>> Sent: Friday, June 09, 2006 7:58 AM >>>> To: Michael Oldham; bioperl-l at lists.open-bio.org >>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a >>>> single large >>>> file >>>> >>>> >>>> >>>> I wouldn't bioperl for this, or create an index. Perl would do >>>> fine and >>>> probably be faster. >>>> >>>> Assuming your ids are one per line in a file named id.dat >>>looking like >>>> this >>>> >>>> 1138_at >>>> 1134_at >>>> etc.. >>>> >>>> this should work: >>>> >>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID = >>>> ; chomp @ID; %ID = map {($_, 1)} @ID;} $inmatch = >>>> exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat >>>> mybigfile.fa >>>> >>>> good luck >>>> >>>> --Malcolm Cook >>>> >>>>> -----Original Message----- >>>>> From: bioperl-l-bounces at lists.open-bio.org >>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>>>> Michael Oldham >>>>> Sent: Thursday, June 08, 2006 9:08 PM >>>>> To: bioperl-l at lists.open-bio.org >>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a >>>>> single large file >>>>> >>>>> Dear all, >>>>> >>>>> I am a total Bioperl newbie struggling to accomplish a >>>>> conceptually simple >>>>> task. I have a single large fasta file containing about 200,000 >>>>> probe >>>>> sequences (from an Affymetrix microarray), each of which looks >>>>> like this: >>>>> >>>>>> probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; >>>>> Antisense; >>>>> TGGCTCCTGCTGAGGTCCCCTTTCC >>>>> >>>>> What I would like to do is extract from this file a subset of >>>>> ~130,800 >>>>> probes (both the header and the sequence) and output this >>>>> subset into a new >>>>> fasta file. 
These 130,800 probes correspond to 8,175 >probe set IDs >>>>> ("1138_at" is the probe set ID in the header listed above); I >>>>> have these >>>>> 8,175 IDs listed in a separate file. I *think* that I managed >>>>> to create an >>>>> index of all 200,000 probes in the original fasta file using >>>>> the following >>>>> script: >>>>> >>>>> #!/usr/bin/perl -w >>>>> >>>>> # script 1: create the index >>>>> >>>>> use Bio::Index::Fasta; >>>>> use strict; >>>>> my $Index_File_Name = shift; >>>>> my $inx = Bio::Index::Fasta->new( >>>>> -filename => $Index_File_Name, >>>>> -write_flag => 1); >>>>> $inx->make_index(@ARGV); >>>>> >>>>> I'm not sure if this is the most sensible approach, and even >>>>> if it is, I'm >>>>> not sure what to do next. Any help would be greatly appreciated! >>>>> >>>>> Many thanks, >>>>> Mike O. >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> No virus found in this outgoing message. >>>>> Checked by AVG Free Edition. >>>>> Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date: >>>>> 6/8/2006 >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> -- >>>> No virus found in this incoming message. >>>> Checked by AVG Free Edition. >>>> Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date: >>>> 6/8/2006 >>>> >>>> -- >>>> No virus found in this outgoing message. >>>> Checked by AVG Free Edition. >>>> Version: 7.1.394 / Virus Database: 268.8.3/360 - Release Date: >>>> 6/9/2006 >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>>Christopher Fields >>>Postdoctoral Researcher >>>Lab of Dr. 
Robert Switzer >>>Dept of Biochemistry >>>University of Illinois Urbana-Champaign >>> >>> >>> >>> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >-- >No virus found in this incoming message. >Checked by AVG Free Edition. >Version: 7.1.394 / Virus Database: 268.8.3/361 - Release Date: >6/11/2006 > >-- >No virus found in this outgoing message. >Checked by AVG Free Edition. >Version: 7.1.394 / Virus Database: 268.8.4/363 - Release Date: >6/13/2006 > > From oldham at ucla.edu Tue Jun 13 22:03:04 2006 From: oldham at ucla.edu (Michael Oldham) Date: Tue, 13 Jun 2006 19:03:04 -0700 Subject: [Bioperl-l] Output a subset of FASTA data from a single largefile In-Reply-To: Message-ID: Dear Malcolm, Chris, et al, Thanks to everyone for your helpful suggestions. When I run the code below using an ID list ('ID_dat.txt') containing all 8175 IDs, the output file is still blank. If I replace this list with a single ID ("542_at"), it works: >probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense; GCGCAGCAGCGAGAATTTCGACGAG >probe:HG_U95Av2:542_at:610:383; Interrogation_Position=128; Antisense; GAATTTCGACGAGCTGCTGAAGGCA >probe:HG_U95Av2:542_at:289:599; Interrogation_Position=134; Antisense; CGACGAGCTGCTGAAGGCACTGGGT ........etc. If I try a list of two IDs ("542_at" and "31799_at"), only the last one is present in the output: >probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; Antisense; GTTCATCACAAATCTATTGTGCTTG >probe:HG_U95Av2:31799_at:534:511; Interrogation_Position=1126; Antisense; GTCCACTAAATGTAGTAACGAAATG >probe:HG_U95Av2:31799_at:226:183; Interrogation_Position=1127; Antisense; TCCACTAAATGTAGTAACGAAATGT ........etc. The same thing seems to happen if I go to 3 IDs, or 4 IDs (only the last ID is present in the output file). 
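One pattern that would produce the symptom described above (only the last ID in the list ever matches) is an ID file saved with Windows-style CRLF line endings. This is offered as an assumption, not a confirmed diagnosis. chomp removes only the trailing newline, so every ID except a final unterminated one would keep an invisible carriage return and never equal the probe set ID captured from the header. The following is a minimal sketch in plain Perl using made-up in-memory data; load_ids and filter_fasta are hypothetical helper names, not part of the script in this thread. It strips trailing whitespace from each ID and keeps every line of a matching record.

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup hash from raw ID lines,
# removing any trailing whitespace, including a stray "\r".
sub load_ids {
    my @lines = @_;
    my %ids;
    for my $line (@lines) {
        (my $id = $line) =~ s/\s+\z//;
        $ids{$id} = 1 if length $id;
    }
    return \%ids;
}

# Hypothetical helper: return every line (header and sequence) of each
# FASTA record whose probe set ID is present in the lookup hash.
sub filter_fasta {
    my ($ids, @fasta_lines) = @_;
    my $keep = 0;
    my @out;
    for my $line (@fasta_lines) {
        # The flag changes only when a header line is seen, so the
        # sequence lines that follow inherit the header's decision.
        if ($line =~ /^>probe:\w+:(\w+):/) {
            $keep = exists $ids->{$1};
        }
        push @out, $line if $keep;
    }
    return @out;
}

# Made-up demonstration data: the first ID carries a CRLF ending.
my $ids = load_ids("542_at\r\n", "31799_at");
my @fasta = (
    ">probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;\n",
    "GCGCAGCAGCGAGAATTTCGACGAG\n",
    ">probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;\n",
    "TGGCTCCTGCTGAGGTCCCCTTTCC\n",
    ">probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; Antisense;\n",
    "GTTCATCACAAATCTATTGTGCTTG\n",
);
my @kept = filter_fasta($ids, @fasta);
print @kept;
```

A quick way to test the assumption on a real ID file would be to print each ID between delimiters (for example "[$id]") and look for a line break inside the brackets.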
At this point I have no idea why this is happening, and I am not sure how to
interpret Malcolm's comment:

oops,

s/matches on of/matches one of/
s/nothing that/noting that/

Any ideas? Thanks again................!

Mike O.

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

unless (open(IDFILE, $IDs)) {
    print "Could not open file $IDs!\n";
}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
    print "Could not open file $probes!\n";
}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
chomp @ID;
my %ID = map {($_, 1)} @ID;  #Note: This creates a hash with keys=PSIDs and all values=1.

while (<PROBES>) {
    my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
    if ($idmatch){
        print OUT;
        print OUT scalar(<PROBES>);
    }
}
exit;

-----Original Message-----
From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Monday, June 12, 2006 8:48 AM
To: Cook, Malcolm; Chris Fields; Michael Oldham
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single largefile

oops,

s/matches on of/matches one of/
s/nothing that/noting that/

--Malcolm

>-----Original Message-----
>From: bioperl-l-bounces at lists.open-bio.org
>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>Cook, Malcolm
>Sent: Monday, June 12, 2006 10:29 AM
>To: Chris Fields; Michael Oldham
>Cc: bioperl-l at lists.open-bio.org
>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a
>single largefile
>
>Michael,
>
>I don't think you can call perl's `print` on just a filehandle as you
>are doing. This is probably your problem.
>
>If you call `select OUT` after opeining it, print will print $_ to it.
>And, every line in the fasta record whose header matches on of the IDS
>will get printed, not just the fasta header lines. Read the code again
>nothing that $idmatch is only getting reset when a correctly formatted
>fasta header line is matched.
>>> Version: 7.1.394 / Virus Database: 268.8.3/360 - Release Date: >>> 6/9/2006 >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >>Christopher Fields >>Postdoctoral Researcher >>Lab of Dr. Robert Switzer >>Dept of Biochemistry >>University of Illinois Urbana-Champaign >> >> >> >> > >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- No virus found in this incoming message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.8.3/361 - Release Date: 6/11/2006 -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.8.4/363 - Release Date: 6/13/2006 From s_maheshwari84 at rediffmail.com Thu Jun 15 07:42:24 2006 From: s_maheshwari84 at rediffmail.com (saurabh maheshwari) Date: 15 Jun 2006 11:42:24 -0000 Subject: [Bioperl-l] simple problem plz look Message-ID: <20060615114224.21669.qmail@webmail31.rediffmail.com> I m using this statement : my $data[0][0] = 'P_p'; what is wrong i this as i am getting syntax error> with Regards SAURABH MAHESHWARI M.Sc. (BIOINFORMATICS) JAMIA MILLIA ISLAMIA NEW DELHI From rkulasekaran at accelrys.com Thu Jun 15 08:06:30 2006 From: rkulasekaran at accelrys.com (rkulasekaran at accelrys.com) Date: Thu, 15 Jun 2006 17:36:30 +0530 Subject: [Bioperl-l] simple problem plz look In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com> Message-ID: Hi, Can you declare the array ( my @data ) before reading the index. I guess that will work fine. 
- Raja

"saurabh maheshwari"
Sent by: bioperl-l-bounces at lists.open-bio.org
15/06/2006 17:12
Please respond to saurabh maheshwari

To bioperl-l at lists.open-bio.org
cc
Subject [Bioperl-l] simple problem plz look

I m using this statement :

my $data[0][0] = 'P_p';

what is wrong i this as i am getting syntax error>

with Regards
SAURABH MAHESHWARI
M.Sc. (BIOINFORMATICS)
JAMIA MILLIA ISLAMIA
NEW DELHI

From sb at mrc-dunn.cam.ac.uk Thu Jun 15 08:52:53 2006
From: sb at mrc-dunn.cam.ac.uk (Sendu Bala)
Date: Thu, 15 Jun 2006 13:52:53 +0100
Subject: [Bioperl-l] simple problem plz look
In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com>
References: <20060615114224.21669.qmail@webmail31.rediffmail.com>
Message-ID: <44915825.8040902@mrc-dunn.cam.ac.uk>

saurabh maheshwari wrote:
> I m using this statement :
>
> my $data[0][0] = 'P_p';
>
> what is wrong i this as i am getting syntax error>

I don't think general Perl problems are appropriate for this list. Try
subscribing to the beginners mailing list via http://learn.perl.org/

But in any case, say:

my @data;
$data[0][0] = 'P_p';

From cjfields at uiuc.edu Thu Jun 15 11:18:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 15 Jun 2006 10:18:32 -0500
Subject: [Bioperl-l] simple problem plz look
In-Reply-To: <20060615114224.21669.qmail@webmail31.rediffmail.com>
Message-ID: <002001c6908e$f8b11b30$15327e82@pyrimidine>

And exactly how is this applicable to BioPerl?

Start here:

http://learn.perl.org/

My guess: you need to declare 'my @data;' first.
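To spell out why the declaration matters: my declares a variable, and an array element such as $data[0][0] is not something that can be declared, so "my $data[0][0]" fails to compile. Declaring @data once is enough, because Perl creates the inner array on first assignment. A minimal sketch in plain Perl (no BioPerl involved; the second value is made up for illustration):

```perl
use strict;
use warnings;

my @data;                # declare the outer array once
$data[0][0] = 'P_p';     # the inner array reference is created automatically
$data[0][1] = 'Q_q';     # made-up second value, for illustration only

print "$data[0][0]\n";
print scalar(@{ $data[0] }), " elements in row 0\n";
```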
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of saurabh maheshwari > Sent: Thursday, June 15, 2006 6:42 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] simple problem plz look > > I m using this statement : > > my $data[0][0] = 'P_p'; > > what is wrong i this as i am getting syntax error> > > > with Regards > SAURABH MAHESHWARI > M.Sc. (BIOINFORMATICS) > JAMIA MILLIA ISLAMIA > NEW DELHI > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sjmiller at email.arizona.edu Thu Jun 15 13:42:52 2006 From: sjmiller at email.arizona.edu (Susan J. Miller) Date: Thu, 15 Jun 2006 10:42:52 -0700 Subject: [Bioperl-l] Parsing BLAST 2.2.14 output In-Reply-To: References: Message-ID: <44919C1C.1060901@email.arizona.edu> We are unable to parse BLAST 2.2.14 results from the NCBI website using SearchIO. I have updated Bio::SearchIO::blast.pm to what's in bioperl-live, but when users download either plain text or HTML blast outputs from the NCBI page, SearchIO cannot parse them. This used to work prior to BLAST 2.2.14. Should I try installing the entire bioperl-live distribution? (We are running Solaris 8 and perl 5.8 if that makes any difference.) Thanks, -susan Susan J. Miller Biotechnology Computing Facility Arizona Research Laboratories Bio West 228 University of Arizona Tucson, AZ 85721 (520) 626-2597 From sb at mrc-dunn.cam.ac.uk Thu Jun 15 15:00:38 2006 From: sb at mrc-dunn.cam.ac.uk (Sendu Bala) Date: Thu, 15 Jun 2006 20:00:38 +0100 Subject: [Bioperl-l] Parsing BLAST 2.2.14 output In-Reply-To: <44919C1C.1060901@email.arizona.edu> References: <44919C1C.1060901@email.arizona.edu> Message-ID: <4491AE56.6090505@mrc-dunn.cam.ac.uk> Susan J. Miller wrote: > We are unable to parse BLAST 2.2.14 results from the NCBI website using > SearchIO. 
I have updated Bio::SearchIO::blast.pm to what's in > bioperl-live, but when users download either plain text or HTML blast > outputs from the NCBI page, SearchIO cannot parse them. This used to > work prior to BLAST 2.2.14. Should I try installing the entire > bioperl-live distribution? (We are running Solaris 8 and perl 5.8 if > that makes any difference.) Parsing saved results from the website works fine here. Please be more specific in what you mean by 'unable to parse'. What error messages do you get? What exact code did you use to get those errors? Exactly what input data did you use? Exactly how did you generate that data? Cheers, Sendu. From cjfields at uiuc.edu Thu Jun 15 17:06:13 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Jun 2006 16:06:13 -0500 Subject: [Bioperl-l] Parsing BLAST 2.2.14 output In-Reply-To: <44919C1C.1060901@email.arizona.edu> Message-ID: <002701c690bf$8b732410$15327e82@pyrimidine> Bio::SearchIO can't handle HTML output directly; you have to junk the tags first, and we can't really guarantee anymore that will work either (I haven't tried it). The FAQ tells you how: http://www.bioperl.org/wiki/FAQ I would avoid HTML parsing altogether. The only sure-fire method that will always work, according to NCBI, is XML output, and that's parsable using Bio::SearchIO::blastxml. You can also try tabular format, which Bio::SearchIO::blasttable can parse as well. However, like Sendu, I get BLASTP 2.2.14 output (saved from NCBI directly) to parse using bioperl-live; Bio::Tools::Run::RemoteBlast also seems to work as well using BLASTP (and that's still set up to parse text output using SearchIO I believe). Could you give us an example of the type of BLAST you were running, the sequence you used, and the error you had? It could be program-specific output that may be causing the problems. The last time text parsing broke it was changes specifically to only BLASTN/TBLASTX output or something along those lines. 
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Susan J. Miller > Sent: Thursday, June 15, 2006 12:43 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Parsing BLAST 2.2.14 output > > We are unable to parse BLAST 2.2.14 results from the NCBI website using > SearchIO. I have updated Bio::SearchIO::blast.pm to what's in > bioperl-live, but when users download either plain text or HTML blast > outputs from the NCBI page, SearchIO cannot parse them. This used to > work prior to BLAST 2.2.14. Should I try installing the entire > bioperl-live distribution? (We are running Solaris 8 and perl 5.8 if > that makes any difference.) > > Thanks, > -susan > > Susan J. Miller > Biotechnology Computing Facility > Arizona Research Laboratories > Bio West 228 > University of Arizona > Tucson, AZ 85721 > (520) 626-2597 > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From sjmiller at email.arizona.edu Thu Jun 15 17:43:59 2006 From: sjmiller at email.arizona.edu (Susan J. Miller) Date: Thu, 15 Jun 2006 14:43:59 -0700 Subject: [Bioperl-l] Parsing BLAST 2.2.14 output In-Reply-To: <002701c690bf$8b732410$15327e82@pyrimidine> References: <002701c690bf$8b732410$15327e82@pyrimidine> Message-ID: <4491D49F.4030208@email.arizona.edu> Chris Fields wrote: > > However, like Sendu, I get BLASTP 2.2.14 output (saved from NCBI directly) > to parse using bioperl-live; Bio::Tools::Run::RemoteBlast also seems to work > as well using BLASTP (and that's still set up to parse text output using > SearchIO I believe). Could you give us an example of the type of BLAST you > were running, the sequence you used, and the error you had? It could be > program-specific output that may be causing the problems. 
The last time
> text parsing broke it was changes specifically to only BLASTN/TBLASTX output
> or something along those lines.

Hi Chris and Sendu,

Thanks for your replies. I am using blastp from the NCBI BLAST page,
with this input sequence:

MSRLLWRKVAGATVGPGPVPAPGRWVSSSVPASDPSDGQRRRQQQQQQQQQQQQQPQQPQVLSSEGGQLR
HNPLDIQMLSRGLHEQIFGQGGEMPGEAAVRRSVEHLQKHGLWGQPAVPLPDVELRLPPLYGDNLDQHFR
LLAQKQSLPYLEAANLLLQAQLPPKPPAWAWAEGWTRYGPEGEAVPVAIPEERALVFDVEVCLAEGTCPT
LAVAISPSAWYSWCSQRLVEERYSWTSQLSPADLIPLEVPTGASSPTQRDWQEQLVVGHNVSFDRAHIRE
QYLIQGSRMRFLDTMSMHMAISGLSSFQRSLWIAAKQGKHKVQPPTKQGQKSQRKARRGPAISSWDWLDI

I have tried saving HTML (with and without the graphical overview),
plain text, and XML. I am parsing with this script:

#!/usr/local/bin/perl -w

use Bio::SearchIO;

while ($fil = shift(@ARGV)) {
    $srchio = new Bio::SearchIO ('-format' => 'blast', '-file' => $fil);
    while ($result = $srchio->next_result) {
        $db = $result->database_name;
        $alg = $result->algorithm;
        print "DB $db\n ALG $alg\n";
        $qid = $result->query_name;
        print "QRY $qid\n";
        while ($hit = $result->next_hit) {
            $hitnam = $hit->name;
            print "\t$hitnam\n";
            $nhsp = 0;
            while ($hit->next_hsp) {
                $nhsp++;
            }
            print "\tHSPS: $nhsp\n";
        } # end next_hit
    }
}

Interestingly, the results are different (but never correct) for the
different types of output I've tried. For xml, the script runs but
produces no output, for plain text the script hangs with no output, and
for html, I get these errors:

-------------------- WARNING ---------------------
MSG: No HSPs for this Hit (gi|27502689|gb|AAH42571.1|)
---------------------------------------------------
Use of uninitialized value in numeric le (<=) at
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm
line 349, line 308.
Use of uninitialized value in numeric le (<=) at
/usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm
line 304, line 308.
-------------------- WARNING --------------------- MSG: No HSPs for this Hit (gi|21779923|gb|AAM77583.1|) --------------------------------------------------- Use of uninitialized value in numeric le (<=) at /usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm line 349, line 333. Use of uninitialized value in numeric le (<=) at /usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm line 304, line 333. -------------------- WARNING --------------------- MSG: No HSPs for this Hit (gi|1644239|dbj|BAA12223.1|) --------------------------------------------------- Use of uninitialized value in numeric le (<=) at /usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm line 349, line 358. Use of uninitialized value in numeric le (<=) at /usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/IteratedSearchResultEventBuilder.pm line 304, line 358. ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: no data for midline Positives = 270/273 (98%), Gaps = 0/273 (0%) Query 78 STACK: Error::throw STACK: Bio::Root::Root::throw /usr/local/lib/perl5/site_perl/5.6.0/Bio/Root/Root.pm:328 STACK: Bio::SearchIO::blast::next_result /usr/local/lib/perl5/site_perl/5.6.0/Bio/SearchIO/blast.pm:1172 STACK: ./srchio.pl:8 At this point I should probably try installing all of bioperl-live, or at least get IteratedSearchResultEventBuilder.pm - or would you recommend something else? Let me know if you need more info. Thanks again, -susan Susan J. Miller Biotechnology Computing Facility Arizona Research Laboratories Bio West 228 University of Arizona Tucson, AZ 85721 (520) 626-2597 From cjfields at uiuc.edu Thu Jun 15 19:03:37 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 15 Jun 2006 18:03:37 -0500 Subject: [Bioperl-l] Parsing BLAST 2.2.14 output In-Reply-To: <4491D49F.4030208@email.arizona.edu> Message-ID: <002b01c690cf$efa05510$15327e82@pyrimidine> ... 
> Hi Chris and Sendu, > > Thanks for your replies. I am using blastp from the NCBI BLAST page, > with this input sequence: ... > I have tried saving HTML (with and without the graphical overview), > plain text, and XML. I am parsing with this script: > #!/usr/local/bin/perl -w > > use Bio::SearchIO; > ... > } I got this script to work. I used your sequence and retrieved BLASTP text output from NCBI BLASTP 2.2.14, then saved it from the web browser, and just copied it to three separate files. Using those files as input, they all parse fine, with output like this: DB All non-redundant GenBank CDStranslations+PDB+SwissProt+PIR+PRF excluding environmental samples ALG BLASTP QRY gi|27502689|gb|AAH42571.1| HSPS: 1 gi|21779923|gb|AAM77583.1| HSPS: 1 ... > Interestingly, the results are different (but never correct) for the > different types of output I've tried. For xml, the script runs but > produces no output, for plain text the script hangs with no output, and > for html, I get these errors: What's interesting is that HTML did anything at all. You MUST strip out the HTML tags as per the FAQ, which I pointed out before: http://www.bioperl.org/wiki/FAQ See the question : Does Bio::SearchIO parse the HTML output that BLAST creates using the -T option? Again, I would NOT attempt parsing HTML. The only reason we have a FAQ question about it is b/c it popped up on the list many many times in the past (i.e. it is a FAQ) and someone found out that HTML::Strip works. We will never adequately support it beyond suggesting stripping the tags out. NCBI changes their HTML output more often than their text output. If you tried parsing XML with the format set to 'blast' you'll get nothing (the blast text parser looks for text output using regexes, so it just bypasses all the XML tags). You must set: -format => 'blastxml' You'll also need to install XML::SAX, and I would suggest installing XML::SAX::ExpatXS and the Expat XML parser for your system to speed things up. 
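The point about matching -format to the kind of file saved can be illustrated with a small helper. This is only a sketch under stated assumptions: guess_searchio_format is a hypothetical name, not a BioPerl function, and it looks only at the start of a saved report. The string it returns is meant to be passed as -format to Bio::SearchIO->new; HTML is reported as unsupported, in line with the advice above to strip the tags first or avoid HTML altogether.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): inspect the first chunk of a
# saved BLAST report and suggest a Bio::SearchIO format name, or return
# undef when the file should not be handed to the parser as it stands.
sub guess_searchio_format {
    my ($head) = @_;
    return 'blastxml' if $head =~ /\A\s*<\?xml/ || $head =~ /<BlastOutput>/;
    return            if $head =~ /<html|<pre>|<a\s+href/i;   # HTML: strip tags first
    return 'blast'    if $head =~ /^\s*T?BLAST[NPX]\s+\d/m;   # e.g. "BLASTP 2.2.14"
    return;
}

# Made-up sample beginnings of three saved reports.
my @samples = (
    "BLASTP 2.2.14 [May-07-2006]\n",
    "<?xml version=\"1.0\"?>\n<BlastOutput>\n",
    "<HTML><PRE>BLASTP 2.2.14\n",
);

for my $sample (@samples) {
    my $format = guess_searchio_format($sample);
    print defined $format
        ? "use -format => '$format'\n"
        : "not parsable as saved; strip the HTML tags first\n";
    # With BioPerl installed, the parser would then be created along the
    # lines of: Bio::SearchIO->new(-format => $format, -file => $file);
}
```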
The 'hanging' you mention using text parsing sounds like the old bug where it got caught in an infinite loop. I don't have this problem. It could be a couple of things: 1) You have an old version of bioperl and updated Bio::SearchIO, but you haven't updated Bio::SearchIO::blast. That's the plugin module where the error was (not Bio::SearchIO). Try updating either that or install the entire distribution from scratch. 2) You have two versions of Bioperl installed (an old one and bioperl-live) and perl is using the old version of bioperl (and the old version of SearchIO::blast). Make sure you only have one version installed and that it is bioperl-live. > At this point I should probably try installing all of bioperl-live, or > at least get IteratedSearchResultEventBuilder.pm - or would you > recommend something else? Let me know if you need more info. If you have the entire distribution installed, you should have ISREB anyway. ISREB (IteratedSearchResultEventBuilder) has nothing to do with the problems here, though. Chris > Thanks again, > -susan From cain at cshl.edu Thu Jun 15 11:25:54 2006 From: cain at cshl.edu (Scott Cain) Date: Thu, 15 Jun 2006 11:25:54 -0400 Subject: [Bioperl-l] Set::Scalar missing causing a test failure for Bio::Tree::Compatible Message-ID: <1150385154.2622.152.camel@localhost.localdomain> Hi all, When running make test on a fairly new system, I got the following failure: t/Compatible.................No Set::Scalar. Unable to test Bio::Tree::Compatible Can't locate Set/Scalar.pm in @INC .... BEGIN failed--compilation aborted at /home/cain/cvs_stuff/bioperl-live/blib/lib/Bio/Tree/Compatible.pm line 138. Compilation failed in require at t/Compatible.t line 42. BEGIN failed--compilation aborted at t/Compatible.t line 42. 
t/Compatible.................dubious Test returned status 2 (wstat 512, 0x200) after all the subtests completed successfully Set::Scalar is mentioned in Makefile.PL as an optional package (but not required) and isn't mentioned in the INSTALL doc anywhere. It looks like the author of the test (t/Compatible.t) is trying to skip this test if Set::Scalar isn't found, but the 'dubious' result gets marked ultimately as a failure. What is the right thing to do here? Thanks, Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory
From hlapp at gmx.net Fri Jun 16 00:42:25 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 16 Jun 2006 00:42:25 -0400 Subject: [Bioperl-l] Set::Scalar missing causing a test failure for Bio::Tree::Compatible In-Reply-To: <1150385154.2622.152.camel@localhost.localdomain> References: <1150385154.2622.152.camel@localhost.localdomain> Message-ID: Should be fixed on the main trunk. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : ===========================================================
From rmb32 at cornell.edu Thu Jun 15 21:37:03 2006 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 15 Jun 2006 18:37:03 -0700 Subject: [Bioperl-l] reading and writing GFF3 Message-ID: <44920B3F.90405@cornell.edu> There is stuff in bioperl for reading and writing GFF3. There's Bio::Tools::GFF. There's Bio::FeatureIO::gff. Are there more? Which is the 'best' one to use? Neither of these is working very well for me. My proximate use case is reading in a RepeatMasker report with Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then writing those out to a GFF3 file.
Bio::Tools::GFF will take these things and write out something that closely resembles GFF3, but with Target attributes that don't seem to comply with Lincoln's GFF3 spec, since its coordinates are join()ed with commas instead of spaces. I'm attaching a little script that illustrates this. Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the features contained in them, throwing 'only Bio::SeqFeature::Annotated objects are writeable'. This seems a bit silly, since one of the whole points of Bioperl is using polymorphism to make it easy to connect things together. I've attached a little script to illustrate this one too. So my questions are: what _should_ I be doing here? Is Bio::Tools::GFF deprecated? Why does Bio::FeatureIO::gff only accept Bio::SeqFeature::Annotated objects? Thanks in advance. Rob -- Robert Buels SGN Bioinformatics Analyst 252A Emerson Hall, Cornell University Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu -------------- next part -------------- A non-text attachment was scrubbed... Name: bio_featureio_gff_test.pl Type: application/x-perl Size: 1455 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060615/17e565f4/attachment.bin -------------- next part -------------- A non-text attachment was scrubbed... Name: bio_tools_gff_test.pl Type: application/x-perl Size: 1436 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060615/17e565f4/attachment-0001.bin From cain at cshl.edu Fri Jun 16 10:18:13 2006 From: cain at cshl.edu (Scott Cain) Date: Fri, 16 Jun 2006 10:18:13 -0400 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <44920B3F.90405@cornell.edu> References: <44920B3F.90405@cornell.edu> Message-ID: <1150467493.2622.209.camel@localhost.localdomain> Hi Rob, I sympathize with your pain--writing GFF3 isn't as easy as writing GFF2, but that is actually a good thing. 
The tighter constraints result in a better, more consistent file format. The reason only BSF::Annotated features are writable is that there needs to be tight control on the 'type' of the feature, to ensure that the type is part of the Sequence Ontology. It also makes it much easier to properly write out the attributes in the ninth column, particularly the ones that are 'reserved', like Parent, Dbxref, and Ontology_term. BTG is still usable, but the GFF3 it puts out is actually more 'GFF3-like'; that is, it looks like GFF3, but because there are no constraints on the type and the terms that are used in the ninth column, you have to be very careful using it to produce GFF3, by making sure that your feature objects conform to the standard before BTG tries to write them out. (Of course, one way to do that would be to convert your feature objects to BSF::Annotated objects, but then you could use BFIO::gff :-) [Long pause while scott goes and monkeys with Bio::Tools::GFF] OK, I just committed a fix to Bio::Tools::GFF which produces valid GFF3 for your sample data. Conveniently, since 'nucleotide_motif' is a SO term, this is completely valid. (I even fixed the escaping of the stray '=' in 'hind_R=2046'.) The output I get is this: ##gff-version 3 C08HBa0001K22.1 RepeatMasker nucleotide_motif 2095 2556 918 - . Target=Contig151 325 832 C08HBa0001K22.1 RepeatMasker nucleotide_motif 2590 2736 488 - . Target=Contig386 1 124 C08HBa0001K22.1 RepeatMasker nucleotide_motif 2787 3105 1718 + . Target=Contig358 1 311 C08HBa0001K22.1 RepeatMasker nucleotide_motif 3974 4036 312 - . Target=hind_R%3D2046 59 120 Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory
From rmb32 at cornell.edu Fri Jun 16 14:36:22 2006 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 16 Jun 2006 11:36:22 -0700 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <1150467493.2622.209.camel@localhost.localdomain> References: <44920B3F.90405@cornell.edu> <1150467493.2622.209.camel@localhost.localdomain> Message-ID: <4492FA26.6030909@cornell.edu> Thanks for the reply Scott. It's good that the BSF::Annotated features control the type to be in the SO. I sort of came to the "BTG is only gff3-/like/" conclusion myself as I poked around in the two modules in question, so I'd much rather use BFIO::gff. So I guess the question now is (and this will probably be a pretty common use case) how does one take an "old" Bio::SeqFeature::Generic or the like object and make it into a Bio::SeqFeature::Annotated? Rob -- Robert Buels SGN Bioinformatics Analyst 252A Emerson Hall, Cornell University Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu
From cjfields at uiuc.edu Fri Jun 16 15:12:28 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 16 Jun 2006 14:12:28 -0500 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <1150467493.2622.209.camel@localhost.localdomain> Message-ID: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> Scott, Looks like Robert also submitted a bug report related to this as well. Could you check into it (pretty-please)? I'm still GFF3-illiterate. http://bugzilla.open-bio.org/show_bug.cgi?id=2025 Chris
From rmb32 at cornell.edu Fri Jun 16 15:30:23 2006 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 16 Jun 2006 12:30:23 -0700 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> Message-ID: <449306CF.1030301@cornell.edu> Woops, I should have said something about that. I submitted it before I saw that Scott had already done the escaping in CVS. -- Robert Buels SGN Bioinformatics Analyst 252A Emerson Hall, Cornell University Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu
From rmb32 at cornell.edu Fri Jun 16 15:34:16 2006 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 16 Jun 2006 12:34:16 -0700 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <1150486453.4412.30.camel@localhost.localdomain> References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> <449306CF.1030301@cornell.edu> <1150486453.4412.30.camel@localhost.localdomain> Message-ID: <449307B8.5040802@cornell.edu> So about that converting ye olde feature objects into Bio::SeqFeature::Annotated objects. How do I do it? -- Robert Buels SGN Bioinformatics Analyst 252A Emerson Hall, Cornell University Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu
From cain at cshl.edu Fri Jun 16 15:28:52 2006 From: cain at cshl.edu (Scott Cain) Date: Fri, 16 Jun 2006 15:28:52 -0400 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> Message-ID: <1150486133.4412.25.camel@localhost.localdomain> I tweaked the patch and applied it, and closed the bug. Thanks for pointing it out--I doubt I would have noticed it in the bioperl-guts mailing, which I generally don't look too closely at :-o Scott -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory
From cain at cshl.edu Fri Jun 16 15:34:13 2006 From: cain at cshl.edu (Scott Cain) Date: Fri, 16 Jun 2006 15:34:13 -0400 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <449306CF.1030301@cornell.edu> References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> <449306CF.1030301@cornell.edu> Message-ID: <1150486453.4412.30.camel@localhost.localdomain> That's OK--You added a few items that should be escaped that weren't, so I added those too. Thanks, Scott On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote: > Woops, I should have said something about that. I submitted it before > I saw that Scott had already done the escaping in CVS. > > Chris Fields wrote: > > Scott, > > > > Looks like Robert also submitted a bug report related to this as well. > > Could you check into it (pretty-please)? I'm still GFF3-illiterate.
> > > > http://bugzilla.open-bio.org/show_bug.cgi?id=2025 > > > > Chris > > > > > > > -----Original Message----- > > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > > bounces at lists.open-bio.org] On Behalf Of Scott Cain > > > Sent: Friday, June 16, 2006 9:18 AM > > > To: Robert Buels > > > Cc: bioperl-l at bioperl.org > > > Subject: Re: [Bioperl-l] reading and writing GFF3 > > > > > > Hi Rob, > > > > > > I sympathize with your pain--writing GFF3 isn't as easy as writing GFF2, > > > but that is actually a good thing. The tighter constraints results in a > > > better, more consistent file format. > > > > > > The reason only BSF::Annotated features are writable is that there needs > > > to be tight control on the 'type' of the feature, to insure that the > > > type is part of the Sequence Ontology. It also makes it much easier to > > > properly write out the attributes in the ninth column, particularly the > > > ones that are 'reserved', like Parent, Dbxref, and Ontology_term. > > > > > > BTG is still usable, but the GFF3 it puts out is actually more > > > 'GFF3-like'; that is, it looks like GFF3, but because there are no > > > constraints on the type and the terms that are used in the ninth column, > > > you have to be very careful using it to produce GFF3, by making sure > > > that your feature objects conform to the standard before BTG tries to > > > write them out. (Of course, one way to do that would be to convert your > > > feature objects to BSF::Annotated objects, but then you could use > > > BFIO::gff :-) > > > > > > [Long pause while scott goes and monkeys with Bio::Tools::GFF] > > > > > > OK, I just committed a fix Bio::Tools::GFF which produces valid GFF3 for > > > your sample data. Conveniently, since 'nucleotide_motif' is a SO term, > > > this is completely valid. (I even fixed the escaping the of the stray > > > '=' in 'hind_R=2046'.) 
The output I get is this: > > > > > > ##gff-version 3 > > > C08HBa0001K22.1 RepeatMasker nucleotide_motif 2095 2556 > > > 918 - . Target=Contig151 325 832 > > > C08HBa0001K22.1 RepeatMasker nucleotide_motif 2590 2736 > > > 488 - . Target=Contig386 1 124 > > > C08HBa0001K22.1 RepeatMasker nucleotide_motif 2787 3105 > > > 1718 + . Target=Contig358 1 311 > > > C08HBa0001K22.1 RepeatMasker nucleotide_motif 3974 4036 > > > 312 - . Target=hind_R%3D2046 59 120 > > > > > > Scott > > > > > > > > > > > > On Thu, 2006-06-15 at 18:37 -0700, Robert Buels wrote: > > > > > > > There is stuff in bioperl for reading and writing GFF3. There's > > > > Bio::Tools::GFF. There's Bio::FeatureIO::gff. Are there more? Which > > > > is the 'best' one to use? > > > > > > > > Neither of these is working very well for me. > > > > > > > > My proximate use case is reading in a RepeatMasker report with > > > > Bio::Tools::RepeatMasker, which spits out FeaturePair objects, then > > > > writing those out to a GFF3 file. > > > > > > > > Bio::Tools::GFF will take these things and write out something that > > > > closely resembles GFF3, but with Target attributes that don't seem to > > > > comply with Lincoln's GFF3 spec, since its coordinates are join()ed with > > > > commas instead of spaces. I'm attaching a little script that > > > > illustrates this. > > > > > > > > Bio::FeatureIO::gff refuses to take these FeaturePairs or either of the > > > > features contained in them, throwing 'only Bio::SeqFeature::Annotated > > > > objects are writeable'. This seems a bit silly, since one of the whole > > > > points of Bioperl is using polymorphism to make it easy to connect > > > > things together. I've attached a little script to illustrate this one > > > > > > > too. > > > > > > > So my questions are: what _should_ I be doing here? Is Bio::Tools::GFF > > > > deprecated? Why does Bio::FeatureIO::gff only accept > > > > Bio::SeqFeature::Annotated objects? > > > > > > > > Thanks in advance. 
> > > > > > > > Rob > > > > > > > > _______________________________________________ > > > > Bioperl-l mailing list > > > > Bioperl-l at lists.open-bio.org > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > -- > > > ------------------------------------------------------------------------ > > > Scott Cain, Ph. D. cain at cshl.edu > > > GMOD Coordinator (http://www.gmod.org/) 216-392-3087 > > > Cold Spring Harbor Laboratory > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > -- > Robert Buels > SGN Bioinformatics Analyst > 252A Emerson Hall, Cornell University > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060616/3dfde2ea/attachment.bin From cain at cshl.edu Fri Jun 16 15:55:31 2006 From: cain at cshl.edu (Scott Cain) Date: Fri, 16 Jun 2006 15:55:31 -0400 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <449307B8.5040802@cornell.edu> References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> <449306CF.1030301@cornell.edu> <1150486453.4412.30.camel@localhost.localdomain> <449307B8.5040802@cornell.edu> Message-ID: <1150487731.4412.35.camel@localhost.localdomain> Um, yeah, good question. 
The reason I didn't answer you when you wrote before is that I was hoping for divine inspiration for an answer (or for somebody else to answer, which would have been really great :-) The short answer (and easy one for me to type) is that you will probably need an ad hoc method to do it, which is the same thing I do when I need to convert gff2 to gff3, to make sure the things I need mapped get mapped the 'right' way (that is, the way I want them to go). I don't have any sample code that does this, but if you want to start working up an ad hoc method, I will certainly try to help you as much as I can. Scott On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote: > So about that converting ye olde feature objects into > Bio::SeqFeature::Annotated objects. How do I do it? > > > Scott Cain wrote: > > That's OK--You added a few items that should be escaped that weren't, so > > I added those too. > > > > Thanks, > > Scott > > > > > > On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote: > > > >> Woops, I should have said something about that. I submitted it before > >> I saw that Scott had already done the escaping in CVS. > >> > >> Chris Fields wrote: > >> > >>> Scott, > >>> > >>> Looks like Robert also submitted a bug report related to this as well From rmb32 at cornell.edu Fri Jun 16 16:31:08 2006 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 16 Jun 2006 13:31:08 -0700 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <1150487731.4412.35.camel@localhost.localdomain> References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> <449306CF.1030301@cornell.edu> <1150486453.4412.30.camel@localhost.localdomain> <449307B8.5040802@cornell.edu> <1150487731.4412.35.camel@localhost.localdomain> Message-ID: <4493150C.1080909@cornell.edu> Rather than cobble together some ad-hoc solution, I would be interested in working on a good solution to this problem, because it seems like it's just going to get more common as more people start wanting to write GFF3. 
What about some code in whatever customarily makes these objects (probably BSF::Annotated's new() method?) that could take another type of Feature object and attempt to shoehorn its data into a new BSF::Annotated? If it failed (because the type isn't in SO or whatever), it could throw() some informative error message.

Then, people could write straightforward code something like:

    while (my $oldstylefeature = $features_in->next_feature) {
        $oldstylefeature->primary_tag('something_that_is_in_so');
        $oldstylefeature->something_else('some other something that needs to be changed for compliance');
        my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature);
        $gff3_out->write_feature($newfeature);
    }

Does that sound like a good idea? I'd be more than willing to implement this, since I'm going to need to do this sort of thing with many more things than just RepeatMasker.

Rob

Scott Cain wrote: > Um, yeah, good question. The reason I didn't answer you when you wrote > before is that I was hoping for divine inspiration for an answer (or for > somebody else to answer, which would have been really great :-) > > The short answer (and easy one for me to type) is that you will probably > need an ad hoc method to do it, which is the same thing I do when I need > to convert gff2 to gff3, to make sure the things I need mapped get > mapped the 'right' way (that is, the way I want them to go). I don't > have any sample code that does this, but if you want to start working up > an ad hoc method, I will certainly try to help you as much as I can. > > Scott > > > On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote: > >> So about that converting ye olde feature objects into >> Bio::SeqFeature::Annotated objects. How do I do it? >> >> >> Scott Cain wrote: >> >>> That's OK--You added a few items that should be escaped that weren't, so >>> I added those too.
>>> >>> Thanks, >>> Scott >>> >>> >>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote: >>> >>> >>>> Woops, I should have said something about that. I submitted it before >>>> I saw that Scott had already done the escaping in CVS. >>>> >>>> Chris Fields wrote: >>>> >>>> >>>>> Scott, >>>>> >>>>> Looks like Robert also submitted a bug report related to this as well= >>>>> ------------------------------------------------------------------------ >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Robert Buels SGN Bioinformatics Analyst 252A Emerson Hall, Cornell University Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From rmb32 at cornell.edu Sat Jun 17 06:36:59 2006 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 17 Jun 2006 03:36:59 -0700 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <1150516605.2600.9.camel@localhost.localdomain> References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> <449306CF.1030301@cornell.edu> <1150486453.4412.30.camel@localhost.localdomain> <449307B8.5040802@cornell.edu> <1150487731.4412.35.camel@localhost.localdomain> <4493150C.1080909@cornell.edu> <1150516605.2600.9.camel@localhost.localdomain> Message-ID: <4493DB4B.4020509@cornell.edu> Yep. I'm almost finished with the first draft of a function that does this. I'll polish it up over the weekend then on Monday I'll submit a bugzilla bug and patch with it so you can take a look. 
Rob Scott Cain wrote: > Rob, > > I came to the same conclusion as well; I wrote my response as I was > heading out the door and while I was running errands, I realized the > right thing to do is to write a Bio::SeqFeature::Annotated method called > new_from_object, whose usage would be: > > my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args); > > where you would give it a Bio::SeqFeatureI compliant object and try to > create a BSFA like use suggested below. You could allow passing in args > to control how different things are handled, like mapping non-SO types > to SO types. I'll think about this over the weekend and let you know if > brilliance strikes me. > > Scott > > > On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote: > >> Rather than cobble together some ad-hoc solution, I would be interested >> in working on a good solution to this problem, because it seems like >> it's just going to get more common as more people start wanting to write >> GFF3. What about some code in whatever customarily makes these objects >> (probably BSF::Annotated's new() method?) that could take another type >> of Feature object and attempt to shoehorn its data into a new >> BSF::Annotated? If it failed (because the type isn't in SO or >> whatever), it could throw() some informative error message. >> >> Then, people could write straightforward code something like: >> >> while(my $oldstylefeature = $features_in->next_feature) { >> $oldstylefeature->primary_tag('something_that_is_in_so'); >> $oldstylefeature->something_else('some other something that needs to >> be changed for compliance'); >> my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature); >> $gff3_out->write_feature($newfeature); >> } >> >> Does that sound like a good idea? I'd be more than willing to implement >> this, since I'm going to need to do this sort of thing with many more >> things than just RepeatMasker. >> >> Rob >> >> Scott Cain wrote: >> >>> Um, yeah, good question. 
The reason I didn't answer you when you wrote >>> before is that I was hoping for divine inspiration for an answer (or for >>> somebody else to answer, which would have been really great :-) >>> >>> The short answer (and easy one for me to type) is that you will probably >>> need an ad hoc method to do it, which is the same thing I do when I need >>> to convert gff2 to gff3, to make sure the things I need mapped get >>> mapped the 'right' way (that is, the way I want them to go). I don't >>> have any sample code that does this, but if you want to start working up >>> an ad hoc method, I will certainly try to help you as much as I can. >>> >>> Scott >>> >>> >>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote: >>> >>> >>>> So about that converting ye olde feature objects into >>>> Bio::SeqFeature::Annotated objects. How do I do it? >>>> >>>> >>>> Scott Cain wrote: >>>> >>>> >>>>> That's OK--You added a few items that should be escaped that weren't, so >>>>> I added those too. >>>>> >>>>> Thanks, >>>>> Scott >>>>> >>>>> >>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote: >>>>> >>>>> >>>>> >>>>>> Woops, I should have said something about that. I submitted it before >>>>>> I saw that Scott had already done the escaping in CVS. 
>>>>>> >>>>>> Chris Fields wrote: >>>>>> >>>>>> >>>>>> >>>>>>> Scott, >>>>>>> >>>>>>> Looks like Robert also submitted a bug report related to this as well= >>>>>>> ------------------------------------------------------------------------ >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>> -- Robert Buels SGN Bioinformatics Analyst 252A Emerson Hall, Cornell University Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cain at cshl.edu Fri Jun 16 23:56:44 2006 From: cain at cshl.edu (Scott Cain) Date: Fri, 16 Jun 2006 23:56:44 -0400 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <4493150C.1080909@cornell.edu> References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> <449306CF.1030301@cornell.edu> <1150486453.4412.30.camel@localhost.localdomain> <449307B8.5040802@cornell.edu> <1150487731.4412.35.camel@localhost.localdomain> <4493150C.1080909@cornell.edu> Message-ID: <1150516605.2600.9.camel@localhost.localdomain> Rob, I came to the same conclusion; I wrote my response as I was heading out the door, and while I was running errands I realized the right thing to do is to write a Bio::SeqFeature::Annotated method called new_from_object, whose usage would be:

    my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, %args);

where you would give it a Bio::SeqFeatureI compliant object and try to create a BSFA like you suggested below. You could allow passing in args to control how different things are handled, like mapping non-SO types to SO types. I'll think about this over the weekend and let you know if brilliance strikes me.
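The conversion idea discussed in this thread (map the feature type to a Sequence Ontology term, and fail with an informative error if that is not possible) can be sketched in plain Perl. This is only an illustration under stated assumptions: it uses plain hash references rather than real Bio::SeqFeatureI or Bio::SeqFeature::Annotated objects, and the names convert_feature, %SO_TYPES and the type-map argument are hypothetical, not part of BioPerl. The small %SO_TYPES hash merely stands in for a real Sequence Ontology lookup.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Sequence Ontology lookup; a real
# implementation would consult the ontology itself.
my %SO_TYPES = map { $_ => 1 } qw(gene mRNA exon CDS nucleotide_motif);

# Convert a loosely typed feature (a plain hash reference here, NOT a
# BioPerl object) into a copy whose type is a known SO term.
# $type_map is an optional hash reference mapping non-SO types to SO types.
sub convert_feature {
    my ($feature, $type_map) = @_;
    $type_map ||= {};

    my $type = $feature->{type};
    die "feature has no type\n" unless defined $type;

    # Apply the caller-supplied mapping, if one exists for this type.
    $type = $type_map->{$type} if exists $type_map->{$type};

    # Fail loudly with an informative message instead of writing bad GFF3.
    die "type '$type' is not a known SO term; supply a mapping for it\n"
        unless $SO_TYPES{$type};

    # Shallow copy, so the original feature is left untouched.
    my %copy = (%$feature, type => $type);
    return \%copy;
}

# 'my_repeat_hit' is a made-up, non-SO type used only for this example.
my $old = {
    seq_id => 'C08HBa0001K22.1',
    source => 'RepeatMasker',
    type   => 'my_repeat_hit',
    start  => 2095,
    end    => 2556,
};
my $new = convert_feature($old, { my_repeat_hit => 'nucleotide_motif' });
print "$new->{type}\n";
```

Keeping the type mapping as an argument, rather than hard-coding it, mirrors the suggestion of passing in args to control how non-SO types are handled; a feature whose type cannot be resolved makes the function die instead of passing through silently.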
Scott On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote: > Rather than cobble together some ad-hoc solution, I would be interested > in working on a good solution to this problem, because it seems like > it's just going to get more common as more people start wanting to write > GFF3. What about some code in whatever customarily makes these objects > (probably BSF::Annotated's new() method?) that could take another type > of Feature object and attempt to shoehorn its data into a new > BSF::Annotated? If it failed (because the type isn't in SO or > whatever), it could throw() some informative error message. > > Then, people could write straightforward code something like: > > while(my $oldstylefeature = $features_in->next_feature) { > $oldstylefeature->primary_tag('something_that_is_in_so'); > $oldstylefeature->something_else('some other something that needs to > be changed for compliance'); > my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature); > $gff3_out->write_feature($newfeature); > } > > Does that sound like a good idea? I'd be more than willing to implement > this, since I'm going to need to do this sort of thing with many more > things than just RepeatMasker. > > Rob > > Scott Cain wrote: > > Um, yeah, good question. The reason I didn't answer you when you wrote > > before is that I was hoping for divine inspiration for an answer (or for > > somebody else to answer, which would have been really great :-) > > > > The short answer (and easy one for me to type) is that you will probably > > need an ad hoc method to do it, which is the same thing I do when I need > > to convert gff2 to gff3, to make sure the things I need mapped get > > mapped the 'right' way (that is, the way I want them to go). I don't > > have any sample code that does this, but if you want to start working up > > an ad hoc method, I will certainly try to help you as much as I can. 
> > > > Scott > > > > > > On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote: > > > >> So about that converting ye olde feature objects into > >> Bio::SeqFeature::Annotated objects. How do I do it? > >> > >> > >> Scott Cain wrote: > >> > >>> That's OK--You added a few items that should be escaped that weren't, so > >>> I added those too. > >>> > >>> Thanks, > >>> Scott > >>> > >>> > >>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote: > >>> > >>> > >>>> Woops, I should have said something about that. I submitted it before > >>>> I saw that Scott had already done the escaping in CVS. > >>>> > >>>> Chris Fields wrote: > >>>> > >>>> > >>>>> Scott, > >>>>> > >>>>> Looks like Robert also submitted a bug report related to this as well= > >>>>> ------------------------------------------------------------------------ > >>>>> > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060616/7ff49e0d/attachment.bin From hlapp at gmx.net Sat Jun 17 12:20:08 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 17 Jun 2006 12:20:08 -0400 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <1150516605.2600.9.camel@localhost.localdomain> References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> <449306CF.1030301@cornell.edu> <1150486453.4412.30.camel@localhost.localdomain> <449307B8.5040802@cornell.edu> <1150487731.4412.35.camel@localhost.localdomain> <4493150C.1080909@cornell.edu> <1150516605.2600.9.camel@localhost.localdomain> Message-ID: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

You don't need a new method for this. Instead, support a -feature argument.

    my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature);

This should work for any instance of Bio::SeqFeatureI. If it is a B::SF::Annotated already it is obviously just a deep copy (if copy is desired - could be another parameter). Otherwise more will be involved.

Alternatively, and possibly better, is to write a specialized SeqFeatureI factory (that would implement Bio::Factory::ObjectFactoryI) and then delegate this job to it:

    my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new(
        -type_ontology   => $sequence_ontology,
        -source_ontology => $feature_source_ontology,
        -unflatten       => 1);
    my $bsfa = $feat_factory->create_object({-feature => $feature});

This is preferable because it separates business logic that isn't necessarily related into defined units. I.e., the logic necessary to convert an ordinary feature into a strongly typed one is different from how to represent a strongly typed feature. IMHO anyway ...

Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan started as the result of a discussion thread earlier this (or last?) year.
Bio::SeqFeature::Annotated as such may as well be obsoleted, though not in concept. Maybe we need to get together again and thrash out a strategy; or a BOF at the GMOD meeting? I feel this does need a core group of people who care, hash out a strategy that will also solve the backwards compatibility problem with the current Bio::SeqFeatureI state-of- limbo, and allow us to implement the decisions with a few people in a concentrated effort. This will then also remove the only real large stumbling block towards a 1.6 release. Maybe we should think about a little pre-GMOD hackathon to clear up this mess? Scott, you'll be there a day early? I'll be already back and Jason I believe will still be in town, although he may have other commitments already. Nonetheless, it shouldn't really take that much but rather dedicated time, a whiteboard, and a few people who care thrashing this out and then do it. Thoughts? -hilmar On Jun 16, 2006, at 11:56 PM, Scott Cain wrote: > Rob, > > I came to the same conclusion as well; I wrote my response as I was > heading out the door and while I was running errands, I realized the > right thing to do is to write a Bio::SeqFeature::Annotated method > called > new_from_object, whose usage would be: > > my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object > ($my_BSFI, %args); > > where you would give it a Bio::SeqFeatureI compliant object and try to > create a BSFA like use suggested below. You could allow passing in > args > to control how different things are handled, like mapping non-SO types > to SO types. I'll think about this over the weekend and let you > know if > brilliance strikes me. > > Scott > > > On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote: >> Rather than cobble together some ad-hoc solution, I would be >> interested >> in working on a good solution to this problem, because it seems like >> it's just going to get more common as more people start wanting to >> write >> GFF3. 
What about some code in whatever customarily makes these >> objects >> (probably BSF::Annotated's new() method?) that could take another >> type >> of Feature object and attempt to shoehorn its data into a new >> BSF::Annotated? If it failed (because the type isn't in SO or >> whatever), it could throw() some informative error message. >> >> Then, people could write straightforward code something like: >> >> while(my $oldstylefeature = $features_in->next_feature) { >> $oldstylefeature->primary_tag('something_that_is_in_so'); >> $oldstylefeature->something_else('some other something that >> needs to >> be changed for compliance'); >> my $newfeature = Bio::SeqFeature::Annotated->new >> ($oldstylefeature); >> $gff3_out->write_feature($newfeature); >> } >> >> Does that sound like a good idea? I'd be more than willing to >> implement >> this, since I'm going to need to do this sort of thing with many more >> things than just RepeatMasker. >> >> Rob >> >> Scott Cain wrote: >>> Um, yeah, good question. The reason I didn't answer you when you >>> wrote >>> before is that I was hoping for divine inspiration for an answer >>> (or for >>> somebody else to answer, which would have been really great :-) >>> >>> The short answer (and easy one for me to type) is that you will >>> probably >>> need an ad hoc method to do it, which is the same thing I do when >>> I need >>> to convert gff2 to gff3, to make sure the things I need mapped get >>> mapped the 'right' way (that is, the way I want them to go). I >>> don't >>> have any sample code that does this, but if you want to start >>> working up >>> an ad hoc method, I will certainly try to help you as much as I can. >>> >>> Scott >>> >>> >>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote: >>> >>>> So about that converting ye olde feature objects into >>>> Bio::SeqFeature::Annotated objects. How do I do it? 
>>>> >>>> >>>> Scott Cain wrote: >>>> >>>>> That's OK--You added a few items that should be escaped that >>>>> weren't, so >>>>> I added those too. >>>>> >>>>> Thanks, >>>>> Scott >>>>> >>>>> >>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote: >>>>> >>>>> >>>>>> Woops, I should have said something about that. I submitted >>>>>> it before >>>>>> I saw that Scott had already done the escaping in CVS. >>>>>> >>>>>> Chris Fields wrote: >>>>>> >>>>>> >>>>>>> Scott, >>>>>>> >>>>>>> Looks like Robert also submitted a bug report related to this >>>>>>> as well= >>>>>>> ---------------------------------------------------------------- >>>>>>> -------- >>>>>>> >>>>>>> _______________________________________________ >>>>>>> Bioperl-l mailing list >>>>>>> Bioperl-l at lists.open-bio.org >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > -- > ---------------------------------------------------------------------- > -- > Scott Cain, Ph. D. > cain at cshl.edu > GMOD Coordinator (http://www.gmod.org/) > 216-392-3087 > Cold Spring Harbor Laboratory > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l - -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (Darwin) iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V ImoAXD/jrbF0gXzSr2CY4tQ= =XfDq -----END PGP SIGNATURE----- From rmb32 at cornell.edu Sat Jun 17 14:36:28 2006 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 17 Jun 2006 11:36:28 -0700 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net> References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> <449306CF.1030301@cornell.edu> <1150486453.4412.30.camel@localhost.localdomain> 
<449307B8.5040802@cornell.edu> <1150487731.4412.35.camel@localhost.localdomain> <4493150C.1080909@cornell.edu> <1150516605.2600.9.camel@localhost.localdomain> <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net> Message-ID: <44944BAC.7000302@cornell.edu> I'd love to help more with this, since, with the new tomato genome coming in, my job is going to involve working more and more with annotations, but I'm not a core person and I can't go to the meeting in NC. In the interests of getting my job done right now, I've implemented a -feature argument to Bio::SeqFeature::Annotated's constructor, which uses a from_feature() method I added. If you guys want it, it's attached to bug 2026. From the perspective of a casual bioperl user, anything you guys can do to make the handling of features and annotations less fragmented and more robust would be wonderful. I'd be happy to help with implementation if one of you grizzled veterans would give me marching orders. :-) Rob Hilmar Lapp wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > You don't need a new method for this. Instead, support a -feature > argument. > > my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature); > > This should work for any instance of Bio::SeqFeatureI. If it is a > B::SF::Annotated already it is obviously just a deep copy (if copy is > desired - could be another parameter). Otherwise more will be involved. > > Alternatively, and possibly better, is to write a specialized > SeqFeatureI factory (that would implement > Bio::Factory::ObjectFactoryI) and then delegate this job to it: > > my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new( > -type_ontology => $sequence_ontology, > -source_ontology => $feature_source_ontology, > -unflatten => 1); > my $bsfa = $feat_factory->create_object({-feature => $feature}); > > This is preferable because it separates business logic that isn't > necessarily related into defined units.
I.e., the logic necessary to > convert an ordinary feature into a strongly typed one is different > from how to represent a strongly typed feature. IMHO anyway ... > > Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan > started as the result of a discussion thread earlier this (or last?) > year. Bio::SeqFeature::Annotated as such may as well be obsoleted, > though not in concept. > > Maybe we need to get together again and thrash out a strategy; or a > BOF at the GMOD meeting? I feel this does need a core group of people > who care, hash out a strategy that will also solve the backwards > compatibility problem with the current Bio::SeqFeatureI > state-of-limbo, and allow us to implement the decisions with a few > people in a concentrated effort. This will then also remove the only > real large stumbling block towards a 1.6 release. > > Maybe we should think about a little pre-GMOD hackathon to clear up > this mess? Scott, you'll be there a day early? I'll be already back > and Jason I believe will still be in town, although he may have other > commitments already. Nonetheless, it shouldn't really take that much > but rather dedicated time, a whiteboard, and a few people who care > thrashing this out and then do it. > > Thoughts? > > -hilmar > > On Jun 16, 2006, at 11:56 PM, Scott Cain wrote: > >> Rob, >> >> I came to the same conclusion as well; I wrote my response as I was >> heading out the door and while I was running errands, I realized the >> right thing to do is to write a Bio::SeqFeature::Annotated method called >> new_from_object, whose usage would be: >> >> my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object($my_BSFI, >> %args); >> >> where you would give it a Bio::SeqFeatureI compliant object and try to >> create a BSFA like use suggested below. You could allow passing in args >> to control how different things are handled, like mapping non-SO types >> to SO types. 
I'll think about this over the weekend and let you know if >> brilliance strikes me. >> >> Scott >> >> >> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote: >>> Rather than cobble together some ad-hoc solution, I would be interested >>> in working on a good solution to this problem, because it seems like >>> it's just going to get more common as more people start wanting to >>> write >>> GFF3. What about some code in whatever customarily makes these objects >>> (probably BSF::Annotated's new() method?) that could take another type >>> of Feature object and attempt to shoehorn its data into a new >>> BSF::Annotated? If it failed (because the type isn't in SO or >>> whatever), it could throw() some informative error message. >>> >>> Then, people could write straightforward code something like: >>> >>> while(my $oldstylefeature = $features_in->next_feature) { >>> $oldstylefeature->primary_tag('something_that_is_in_so'); >>> $oldstylefeature->something_else('some other something that >>> needs to >>> be changed for compliance'); >>> my $newfeature = Bio::SeqFeature::Annotated->new($oldstylefeature); >>> $gff3_out->write_feature($newfeature); >>> } >>> >>> Does that sound like a good idea? I'd be more than willing to >>> implement >>> this, since I'm going to need to do this sort of thing with many more >>> things than just RepeatMasker. >>> >>> Rob >>> >>> Scott Cain wrote: >>>> Um, yeah, good question. The reason I didn't answer you when you >>>> wrote >>>> before is that I was hoping for divine inspiration for an answer >>>> (or for >>>> somebody else to answer, which would have been really great :-) >>>> >>>> The short answer (and easy one for me to type) is that you will >>>> probably >>>> need an ad hoc method to do it, which is the same thing I do when I >>>> need >>>> to convert gff2 to gff3, to make sure the things I need mapped get >>>> mapped the 'right' way (that is, the way I want them to go). 
I don't >>>> have any sample code that does this, but if you want to start >>>> working up >>>> an ad hoc method, I will certainly try to help you as much as I can. >>>> >>>> Scott >>>> >>>> >>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote: >>>> >>>>> So about that converting ye olde feature objects into >>>>> Bio::SeqFeature::Annotated objects. How do I do it? >>>>> >>>>> >>>>> Scott Cain wrote: >>>>> >>>>>> That's OK--You added a few items that should be escaped that >>>>>> weren't, so >>>>>> I added those too. >>>>>> >>>>>> Thanks, >>>>>> Scott >>>>>> >>>>>> >>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote: >>>>>> >>>>>> >>>>>>> Woops, I should have said something about that. I submitted it >>>>>>> before >>>>>>> I saw that Scott had already done the escaping in CVS. >>>>>>> >>>>>>> Chris Fields wrote: >>>>>>> >>>>>>> >>>>>>>> Scott, >>>>>>>> >>>>>>>> Looks like Robert also submitted a bug report related to this >>>>>>>> as well= >>>>>>>> ------------------------------------------------------------------------ >>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Bioperl-l mailing list >>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> -------------------------------------------------------------------------- >> >> Scott Cain, Ph. D. 
cain at cshl.edu >> GMOD Coordinator (http://www.gmod.org/) 216-392-3087 >> Cold Spring Harbor Laboratory >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > - -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (Darwin) > > iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V > ImoAXD/jrbF0gXzSr2CY4tQ= > =XfDq > -----END PGP SIGNATURE----- -- Robert Buels SGN Bioinformatics Analyst 252A Emerson Hall, Cornell University Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at uiuc.edu Sat Jun 17 16:21:37 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 17 Jun 2006 15:21:37 -0500 Subject: [Bioperl-l] OT : Re: reading and writing GFF3 In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net> References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> <449306CF.1030301@cornell.edu> <1150486453.4412.30.camel@localhost.localdomain> <449307B8.5040802@cornell.edu> <1150487731.4412.35.camel@localhost.localdomain> <4493150C.1080909@cornell.edu> <1150516605.2600.9.camel@localhost.localdomain> <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net> Message-ID: <1D0C8412-3705-47EF-9AAA-1DD0B09AD6B5@uiuc.edu> On Jun 17, 2006, at 11:20 AM, Hilmar Lapp wrote: > > Maybe we need to get together again and thrash out a strategy; or a > BOF at the GMOD meeting? I feel this does need a core group of people > who care, hash out a strategy that will also solve the backwards > compatibility problem with the current Bio::SeqFeatureI state-of- > limbo, and allow us to implement the decisions with a few people in a > concentrated effort. This will then also remove the only real large > stumbling block towards a 1.6 release. 
That would be fantastic! A bit OT, but if plans are afoot for a 1.6 release maybe the 'core group' that meets at NC could start drawing up a list of ideas/plans towards that release, even if it is still a ways off. A roadmap of sorts so the community knows where to put forth the majority of their effort and focus. Chris Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bernd.web at gmail.com Mon Jun 19 06:16:57 2006 From: bernd.web at gmail.com (Bernd Web) Date: Mon, 19 Jun 2006 12:16:57 +0200 Subject: [Bioperl-l] doc.bioperl Message-ID: <716af09c0606190316l5038da7j480f96f423617d80@mail.gmail.com> Hi, I just noted that it can happen that the pages at doc.bioperl.org state "No synopsis" whereas there is one in the PM file (use perldoc or the CVS). An example: http://doc.bioperl.org/releases/bioperl-current/bioperl-live/Bio/DB/Fasta.html No synopsis, No description, but http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/DB/Fasta.pm?rev=1.43&content-type=text/vnd.viewcvs-markup shows both. So, if you're looking for documentation don't forget to do e.g. "perldoc Bio::DB::Fasta" regards, bernd From cjfields at uiuc.edu Mon Jun 19 10:38:01 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 19 Jun 2006 09:38:01 -0500 Subject: [Bioperl-l] doc.bioperl In-Reply-To: <716af09c0606190316l5038da7j480f96f423617d80@mail.gmail.com> Message-ID: <001501c693ad$f7689790$15327e82@pyrimidine> This has been reported as a bug: http://bugzilla.open-bio.org/show_bug.cgi?id=1926 Jason mentions in the bug report that the POD may contain something that messes with the way PDOC deals with code so should be rewritten. 
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Bernd Web > Sent: Monday, June 19, 2006 5:17 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] doc.bioperl > > Hi, > > I just noted that it can happen that the pages at doc.bioperl.org > state "No synopsis" whereas there is one in the PM file (use perldoc > or the CVS). > An example: > > http://doc.bioperl.org/releases/bioperl-current/bioperl- > live/Bio/DB/Fasta.html > No synopsis, No description, but > > http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl- > live/Bio/DB/Fasta.pm?rev=1.43&content-type=text/vnd.viewcvs-markup > > shows both. > > So, if you're looking for documentation don't forget to do e.g. > "perldoc Bio::DB::Fasta" > > regards, > bernd > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From dmessina at wustl.edu Mon Jun 19 10:59:23 2006 From: dmessina at wustl.edu (David Messina) Date: Mon, 19 Jun 2006 09:59:23 -0500 Subject: [Bioperl-l] YAPC anyone? Message-ID: <83485BEB-2457-4FD6-90B8-353228868C3A@wustl.edu> Hi, Just curious if any other BioPerlers will be at the YAPC conference in Chicago next week (http://yapcchicago.org/). Some of us from the WashU GSC will be there, and it might be fun to meet some other BioPerl people over lunch or something. If there's enough interest, I will organize. By the way, if you're unfamiliar with the conference and are interested in attending, I think registration is still open. The fee is low ($100). 
Dave -- Dave Messina Informatics Analyst WashU Genome Sequencing Center dmessina at wustl.edu 314-286-1415 From ClarkeW at AGR.GC.CA Mon Jun 19 18:34:37 2006 From: ClarkeW at AGR.GC.CA (Clarke, Wayne) Date: Mon, 19 Jun 2006 18:34:37 -0400 Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq Message-ID: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca> Hi, I am getting the following warning and then exception -------------------- WARNING --------------------- MSG: seq doesn't validate, mismatch is 1 --------------------------------------------------- ------------- EXCEPTION: Bio::Root::Exception ------------- MSG: Attempting to set the sequence to [ACTG*] which does not look healthy NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA sequence) when extracting display name and sequence from a MYSQL database. My code is as follows: my $sql = "select Clone_Name,Sequence from tbl_bgene"; my $sth = $dbh->prepare($sql); $sth->execute(); while (my $hash = $sth->fetchrow_hashref()) { # print("Name: ".$hash->{'Clone_Name'}."\n"); my $seq = new Bio::Seq( -display_id => $hash->{'Clone_Name'}, -seq => $hash->{'Sequence'}); $handle->write_seq($seq); # print("Sequence: ".$hash->{'Sequence'}."\n"); } For some reason it is failing on a particular sequence, which is a valid DNA sequence. If anyone has any ideas on why this is I would appreciate it. 
Thanks, Wayne From torsten.seemann at infotech.monash.edu.au Mon Jun 19 19:30:19 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 20 Jun 2006 09:30:19 +1000 Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca> References: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca> Message-ID: <4497338B.3030609@infotech.monash.edu.au> > -------------------- WARNING --------------------- > MSG: seq doesn't validate, mismatch is 1 > MSG: Attempting to set the sequence to [ACTG*] which does not look > healthy > NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA > sequence) > For some reason it is failing on a particular sequence, which is a valid > DNA sequence. If anyone has any ideas on why this is I would appreciate > it. Usually a '*' indicates a STOP codon in a protein sequence. I don't think it is valid in a DNA sequence? So my guess is that BioPerl is auto-detecting it as Protein sequence, as A,C,T,G are all valid amino acids, and * is a stop codon. So I think BioPerl is doing the right thing. If you want to force it, try adding a " -alphabet=>'protein' " parameter to the Bio:Seq constructor. -- Torsten Seemann Victorian Bioinformatics Consortium, Monash University, Australia http://www.vicbioinformatics.com/ From taerwin at gmail.com Mon Jun 19 21:38:14 2006 From: taerwin at gmail.com (Tim Erwin) Date: Tue, 20 Jun 2006 11:38:14 +1000 Subject: [Bioperl-l] cap3 runnable? Message-ID: Hi all, Does anyone have a runnable for cap3? There seems to be some discussion about one in the mailing archives ( http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I cannot find any code. Regards, Tim From osborne1 at optonline.net Mon Jun 19 22:23:43 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Mon, 19 Jun 2006 22:23:43 -0400 Subject: [Bioperl-l] cap3 runnable? 
In-Reply-To: Message-ID: Tim, The code seems to be here, not clear if there's an executable: http://seq.cs.iastate.edu/download.html Brian O. On 6/19/06 9:38 PM, "Tim Erwin" wrote: > Hi all, > > Does anyone have a runnable for cap3? There seems to be some discussion > about one in the mailing archives ( > http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I cannot > find any code. > > > > Regards, > > Tim > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Mon Jun 19 23:23:26 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 19 Jun 2006 22:23:26 -0500 Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca> Message-ID: <000701c69418$e53b9110$15327e82@pyrimidine> You really haven't given us much to work with more than "this doesn't work." We need the following information, otherwise we can't do anything. 1) Bioperl version (1.4, 1.5.1, live) 2) OS 3) The exception trace (not just the chunk you've shown) 4) The full script. What is $handle? A Bio::SeqIO object? At first glance I would say Torsten's right, that it could be the '*' in the sequence. The problem is, I don't think validate_seq (from PrimarySeq and where the warning came from) distinguishes between nucleotides and amino acids, and it allows for '*' and various gap symbols in sequences. If this caused the problem, the error would be: MSG: seq doesn't validate, mismatch is * The actual error is: MSG: seq doesn't validate, mismatch is 1 It looks like something is being evaluated in the wrong context (scalar context is expected, but looks like it's evaluating a list). Maybe it thinks $hash->{'Sequence'} is a complex data type such as an array; hence the mismatch is 1. What do you get printing $hash using Data::Dumper? 
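A minimal sketch of the kind of check being asked for here, for inspecting each fetched value before it is handed to the Bio::Seq constructor. It is plain Perl with no BioPerl calls. The helper name describe_sequence_problem and the accepted character class (letters, '-', '.', '*') are assumptions made for illustration; they are not taken from Bio::PrimarySeq::validate_seq. The sample rows only mimic the shape of what fetchrow_hashref returns in the script quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: describe why a value fetched from
# the database might be rejected before it reaches the Bio::Seq constructor.
# The accepted character class below is an assumption for illustration only.
sub describe_sequence_problem {
    my ($value) = @_;

    return 'sequence is undef (NULL column or missing hash key?)'
        unless defined $value;
    return 'sequence is a reference, not a plain string'
        if ref $value;
    return 'sequence is empty'
        if $value eq '';

    # Locate the first character that is not a letter, a gap symbol or '*'.
    if ($value =~ /([^A-Za-z\-\.\*])/) {
        my $char = $1;
        my $pos  = $-[1];    # zero-based offset where the capture starts
        return sprintf 'unexpected character 0x%02X at offset %d',
            ord($char), $pos;
    }

    return;    # nothing suspicious found (undef in scalar context)
}

# Made-up rows shaped like the hashes returned by fetchrow_hashref.
my @rows = (
    { Clone_Name => 'clean',      Sequence => 'ACTGACTG' },
    { Clone_Name => 'whitespace', Sequence => "ACTG\nACTG" },
    { Clone_Name => 'null_value', Sequence => undef },
);

for my $row (@rows) {
    my $problem = describe_sequence_problem($row->{Sequence});
    printf "%s: %s\n", $row->{Clone_Name},
        defined $problem ? $problem : 'looks fine';
}
```

Run against the real table, a check along these lines would probably show quickly whether the failing row holds a NULL, stray whitespace, or some other non-sequence character, without having to post the confidential sequence itself.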
I tried using this anon hash and it work fine when a new Bio::Seq is constructed. my $hash = {'Clone_Name' => 'test', 'Sequence' => 'ACTG*'}; Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne > Sent: Monday, June 19, 2006 5:35 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq > > Hi, > > I am getting the following warning and then exception > > > > -------------------- WARNING --------------------- > > MSG: seq doesn't validate, mismatch is 1 > > --------------------------------------------------- > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Attempting to set the sequence to [ACTG*] which does not look > healthy > > > > NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA > sequence) > > > > when extracting display name and sequence from a MYSQL database. My code > is as follows: > > > > my $sql = "select Clone_Name,Sequence from tbl_bgene"; > > my $sth = $dbh->prepare($sql); > > $sth->execute(); > > while (my $hash = $sth->fetchrow_hashref()) { > > # print("Name: ".$hash->{'Clone_Name'}."\n"); > > my $seq = new Bio::Seq( -display_id => > $hash->{'Clone_Name'}, > > -seq => $hash->{'Sequence'}); > > $handle->write_seq($seq); > > # print("Sequence: ".$hash->{'Sequence'}."\n"); > > } > > > > For some reason it is failing on a particular sequence, which is a valid > DNA sequence. If anyone has any ideas on why this is I would appreciate > it. > > > > Thanks, Wayne > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From taerwin at gmail.com Mon Jun 19 23:05:13 2006 From: taerwin at gmail.com (Tim Erwin) Date: Tue, 20 Jun 2006 13:05:13 +1000 Subject: [Bioperl-l] cap3 runnable? 
In-Reply-To: References: Message-ID: Sorry, I meant a bioperl wrapper (Bio::Tools::Run) for cap3 Regards, Tim On 6/20/06, Brian Osborne wrote: > > Tim, > > The code seems to be here, not clear if there's an executable: > > http://seq.cs.iastate.edu/download.html > > > Brian O. > > > On 6/19/06 9:38 PM, "Tim Erwin" wrote: > > > Hi all, > > > > Does anyone have a runnable for cap3? There seems to be some discussion > > about one in the mailing archives ( > > http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I > cannot > > find any code. > > > > > > > > Regards, > > > > Tim > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > From torsten.seemann at infotech.monash.edu.au Mon Jun 19 23:07:12 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 20 Jun 2006 13:07:12 +1000 Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq In-Reply-To: <4497338B.3030609@infotech.monash.edu.au> References: <320530F83FA47047823E57F110DDEAADB15A64@onncrxms4.agr.gc.ca> <4497338B.3030609@infotech.monash.edu.au> Message-ID: <44976660.7030107@infotech.monash.edu.au> > If you want to force it, try adding a " -alphabet=>'protein' " parameter to the > Bio:Seq constructor. That should be -alphabet => 'dna'. D'oh! -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From Marc.Logghe at DEVGEN.com Tue Jun 20 03:13:22 2006 From: Marc.Logghe at DEVGEN.com (Marc Logghe) Date: Tue, 20 Jun 2006 09:13:22 +0200 Subject: [Bioperl-l] cap3 runnable? Message-ID: <0C528E3670D8CE4B8E013F6749231AA6D3D60B@ANTARESIA.be.devgen.com> It is about 3 years old and did not test it with the current bioperl release. Feel free to play with it. 
Cheers, Marc > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Tim Erwin > Sent: Tuesday, June 20, 2006 5:05 AM > To: Brian Osborne > Cc: Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] cap3 runnable? > > Sorry, I meant a bioperl wrapper (Bio::Tools::Run) for cap3 > > Regards, > > Tim > > On 6/20/06, Brian Osborne wrote: > > > > Tim, > > > > The code seems to be here, not clear if there's an executable: > > > > http://seq.cs.iastate.edu/download.html > > > > > > Brian O. > > > > > > On 6/19/06 9:38 PM, "Tim Erwin" wrote: > > > > > Hi all, > > > > > > Does anyone have a runnable for cap3? There seems to be some > > > discussion about one in the mailing archives ( > > > > http://bioperl.org/pipermail/bioperl-l/2004-July/016371.html) but I > > cannot > > > find any code. > > > > > > > > > > > > Regards, > > > > > > Tim > > > _______________________________________________ > > > Bioperl-l mailing list > > > Bioperl-l at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -------------- next part -------------- A non-text attachment was scrubbed... Name: Cap3.pm Type: application/octet-stream Size: 3374 bytes Desc: Cap3.pm Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060620/0976a7d9/attachment.obj From G.Tzotzos at unido.org Tue Jun 20 05:18:48 2006 From: G.Tzotzos at unido.org (George Tzotzos) Date: Tue, 20 Jun 2006 11:18:48 +0200 Subject: [Bioperl-l] Error message Message-ID: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org> I'm a BioPerl novice. 
I used CPAN to install BioPerl and run the following script to test the installation: use Bio::Perl; use strict; use warnings; my $seq_object = get_sequence('swissprot', "P09651"); write_sequence(">roa1.fasta", 'fasta', $seq_object); I used as argument both "ROA1_HUMAN" and "P09651". In both cases I get the message below. Any help on the nature of the problem and how to overcome it would be greatly appreciated. Thanks George ------------- EXCEPTION ------------- MSG: swissprot stream with no ID. Not swissprot in my book STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/ swiss.pm:179 STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/ WebDBSeqI.pm:153 STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513 STACK toplevel tut2.pl:5 George T. Tzotzos Ph.D Wagramerstrasse 5 A-1400 Vienna Austria Email: g.tzotzos at unido.org From G.Tzotzos at unido.org Tue Jun 20 07:36:18 2006 From: G.Tzotzos at unido.org (George Tzotzos) Date: Tue, 20 Jun 2006 13:36:18 +0200 Subject: [Bioperl-l] Error message Message-ID: <9B7CFB0E-7F0D-481F-83B4-FDE1A7000D20@unido.org> I'm a BioPerl novice. I used CPAN to install BioPerl and run the following script to test the installation: use Bio::Perl; use strict; use warnings; my $seq_object = get_sequence('swissprot', "P09651"); write_sequence(">roa1.fasta", 'fasta', $seq_object); I used as argument both "ROA1_HUMAN" and "P09651". In both cases I get the message below. Any help on the nature of the problem and how to overcome it would be greatly appreciated. Thanks George ------------- EXCEPTION ------------- MSG: swissprot stream with no ID. Not swissprot in my book STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/ swiss.pm:179 STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/ WebDBSeqI.pm:153 STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513 STACK toplevel tut2.pl:5 George T. 
Tzotzos Ph.D Vienna, Austria From s-merchant at northwestern.edu Tue Jun 20 10:41:33 2006 From: s-merchant at northwestern.edu (Sohel Merchant) Date: Tue, 20 Jun 2006 09:41:33 -0500 Subject: [Bioperl-l] YAPC anyone? Message-ID: <002701c69477$9ffa7c10$c2987ca5@pc13> Hey Dave, I am doing a talk on dictyBase at the YAPC . I think it would be great to meet for lunch. Cheers, Sohel Merchant. dictyBase Northwestern University, Chicago > >Just curious if any other BioPerlers will be at the YAPC conference in >Chicago next week ( http://yapcchicago.org/). Some of us from the WashU >GSC will be there, and it might be fun to meet some other BioPerl >people over lunch or something. If there's enough interest, I will >organize. > >By the way, if you're unfamiliar with the conference and are interested >in attending, I think registration is still open. The fee is low >($100). > >Dave > > >-- From cain at cshl.edu Tue Jun 20 12:03:26 2006 From: cain at cshl.edu (Scott Cain) Date: Tue, 20 Jun 2006 12:03:26 -0400 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net> References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> <449306CF.1030301@cornell.edu> <1150486453.4412.30.camel@localhost.localdomain> <449307B8.5040802@cornell.edu> <1150487731.4412.35.camel@localhost.localdomain> <4493150C.1080909@cornell.edu> <1150516605.2600.9.camel@localhost.localdomain> <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net> Message-ID: <1150819406.2585.27.camel@localhost.localdomain> Hi Hilmar, Of course you are right--I was under the influence of a perl module that I work with that does something similar, but both of your solutions are better. I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a look this week. As for next week, I plan on spending the day at NESCent on Wednesday (though I haven't told Todd or Jeff that I am arriving early yet) just to make sure all the details are in place. 
I imagine I'll have a fair amount of free time to hash this stuff out. Anyone else who is in town (that is, in Durham, NC, USA) is welcome to come draw on a white board too. :-) Scott On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > You don't need a new method for this. Instead, support a -feature > argument. > > my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature); > > This should work for any instance of Bio::SeqFeatureI. If it is a > B::SF::Annotated already it is obviously just a deep copy (if copy is > desired - could be another parameter). Otherwise more will be involved. > > Alternatively, and possibly better, is to write a specialized > SeqFeatureI factory (that would implement > Bio::Factory::ObjectFactoryI) and then delegate this job to it: > > my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new( > -type_ontology => $sequence_ontology, > -source_ontology => $feature_source_ontology, > -unflatten => 1); > my $bsfa = $feat_factory->create_object({-feature => $feature}); > > This is preferable because it separates business logic that isn't > necessarily related into defined units. I.e., the logic necessary to > convert an ordinary feature into a strongly typed one is different > from how to represent a strongly typed feature. IMHO anyway ... > > Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan > started as the result of a discussion thread earlier this (or last?) > year. Bio::SeqFeature::Annotated as such may as well be obsoleted, > though not in concept. > > Maybe we need to get together again and thrash out a strategy; or a > BOF at the GMOD meeting? I feel this does need a core group of people > who care, hash out a strategy that will also solve the backwards > compatibility problem with the current Bio::SeqFeatureI state-of- > limbo, and allow us to implement the decisions with a few people in a > concentrated effort. 
This will then also remove the only real large > stumbling block towards a 1.6 release. > > Maybe we should think about a little pre-GMOD hackathon to clear up > this mess? Scott, you'll be there a day early? I'll be already back > and Jason I believe will still be in town, although he may have other > commitments already. Nonetheless, it shouldn't really take that much > but rather dedicated time, a whiteboard, and a few people who care > thrashing this out and then do it. > > Thoughts? > > -hilmar > > On Jun 16, 2006, at 11:56 PM, Scott Cain wrote: > > > Rob, > > > > I came to the same conclusion as well; I wrote my response as I was > > heading out the door and while I was running errands, I realized the > > right thing to do is to write a Bio::SeqFeature::Annotated method > > called > > new_from_object, whose usage would be: > > > > my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object > > ($my_BSFI, %args); > > > > where you would give it a Bio::SeqFeatureI compliant object and try to > > create a BSFA like use suggested below. You could allow passing in > > args > > to control how different things are handled, like mapping non-SO types > > to SO types. I'll think about this over the weekend and let you > > know if > > brilliance strikes me. > > > > Scott > > > > > > On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote: > >> Rather than cobble together some ad-hoc solution, I would be > >> interested > >> in working on a good solution to this problem, because it seems like > >> it's just going to get more common as more people start wanting to > >> write > >> GFF3. What about some code in whatever customarily makes these > >> objects > >> (probably BSF::Annotated's new() method?) that could take another > >> type > >> of Feature object and attempt to shoehorn its data into a new > >> BSF::Annotated? If it failed (because the type isn't in SO or > >> whatever), it could throw() some informative error message. 
> >> > >> Then, people could write straightforward code something like: > >> > >> while(my $oldstylefeature = $features_in->next_feature) { > >> $oldstylefeature->primary_tag('something_that_is_in_so'); > >> $oldstylefeature->something_else('some other something that > >> needs to > >> be changed for compliance'); > >> my $newfeature = Bio::SeqFeature::Annotated->new > >> ($oldstylefeature); > >> $gff3_out->write_feature($newfeature); > >> } > >> > >> Does that sound like a good idea? I'd be more than willing to > >> implement > >> this, since I'm going to need to do this sort of thing with many more > >> things than just RepeatMasker. > >> > >> Rob > >> > >> Scott Cain wrote: > >>> Um, yeah, good question. The reason I didn't answer you when you > >>> wrote > >>> before is that I was hoping for divine inspiration for an answer > >>> (or for > >>> somebody else to answer, which would have been really great :-) > >>> > >>> The short answer (and easy one for me to type) is that you will > >>> probably > >>> need an ad hoc method to do it, which is the same thing I do when > >>> I need > >>> to convert gff2 to gff3, to make sure the things I need mapped get > >>> mapped the 'right' way (that is, the way I want them to go). I > >>> don't > >>> have any sample code that does this, but if you want to start > >>> working up > >>> an ad hoc method, I will certainly try to help you as much as I can. > >>> > >>> Scott > >>> > >>> > >>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote: > >>> > >>>> So about that converting ye olde feature objects into > >>>> Bio::SeqFeature::Annotated objects. How do I do it? > >>>> > >>>> > >>>> Scott Cain wrote: > >>>> > >>>>> That's OK--You added a few items that should be escaped that > >>>>> weren't, so > >>>>> I added those too. > >>>>> > >>>>> Thanks, > >>>>> Scott > >>>>> > >>>>> > >>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote: > >>>>> > >>>>> > >>>>>> Woops, I should have said something about that. 
I submitted > >>>>>> it before > >>>>>> I saw that Scott had already done the escaping in CVS. > >>>>>> > >>>>>> Chris Fields wrote: > >>>>>> > >>>>>> > >>>>>>> Scott, > >>>>>>> > >>>>>>> Looks like Robert also submitted a bug report related to this > >>>>>>> as well= > >>>>>>> ---------------------------------------------------------------- > >>>>>>> -------- > >>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> Bioperl-l mailing list > >>>>>>> Bioperl-l at lists.open-bio.org > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > > -- > > ---------------------------------------------------------------------- > > -- > > Scott Cain, Ph. D. > > cain at cshl.edu > > GMOD Coordinator (http://www.gmod.org/) > > 216-392-3087 > > Cold Spring Harbor Laboratory > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > - -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.2.2 (Darwin) > > iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V > ImoAXD/jrbF0gXzSr2CY4tQ= > =XfDq > -----END PGP SIGNATURE----- > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- ------------------------------------------------------------------------ Scott Cain, Ph. D. cain at cshl.edu GMOD Coordinator (http://www.gmod.org/) 216-392-3087 Cold Spring Harbor Laboratory -------------- next part -------------- A non-text attachment was scrubbed... 
Name: not available Type: application/pgp-signature Size: 189 bytes Desc: This is a digitally signed message part Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060620/4b71554e/attachment-0001.bin From osborne1 at optonline.net Tue Jun 20 12:13:51 2006 From: osborne1 at optonline.net (Brian Osborne) Date: Tue, 20 Jun 2006 12:13:51 -0400 Subject: [Bioperl-l] Error message In-Reply-To: <9B7CFB0E-7F0D-481F-83B4-FDE1A7000D20@unido.org> Message-ID: George, The docs I'm reading say to use 'swiss', not 'swissprot' but I think there's some other problem that may be specific to SwissProt. Can you retrieve from GenBank? E.g.: my $seq_object = get_sequence('genbank', 2); Brian O. On 6/20/06 7:36 AM, "George Tzotzos" wrote: > I'm a BioPerl novice. I used CPAN to install BioPerl and run the > following script to test the installation: > > use Bio::Perl; > use strict; > use warnings; > > my $seq_object = get_sequence('swissprot', "P09651"); > > write_sequence(">roa1.fasta", 'fasta', $seq_object); > > I used as argument both "ROA1_HUMAN" and "P09651". In both cases I > get the message below. > > Any help on the nature of the problem and how to overcome it would be > greatly appreciated. > > Thanks > > George > > > ------------- EXCEPTION ------------- > MSG: swissprot stream with no ID. Not swissprot in my book > STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/ > swiss.pm:179 > STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/ > WebDBSeqI.pm:153 > STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513 > STACK toplevel tut2.pl:5 > > > > George T. 
Tzotzos Ph.D > Vienna, Austria > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From G.Tzotzos at unido.org Tue Jun 20 12:21:32 2006 From: G.Tzotzos at unido.org (George Tzotzos) Date: Tue, 20 Jun 2006 18:21:32 +0200 Subject: [Bioperl-l] Error message In-Reply-To: References: Message-ID: <76750E11-3BD6-42EB-832D-3A12BC6B4BEE@unido.org> Brian Neither nor work. However, your suggestion does work fine. So does Chandan's. Many thanks to both. Cheers George On 20 Jun 2006, at 18:13, Brian Osborne wrote: > George, > > The docs I'm reading say to use 'swiss', not 'swissprot' but I > think there's > some other problem that may be specific to SwissProt. Can you > retrieve from > GenBank? E.g.: > > my $seq_object = get_sequence('genbank', 2); > > Brian O. > > > On 6/20/06 7:36 AM, "George Tzotzos" wrote: > >> I'm a BioPerl novice. I used CPAN to install BioPerl and run the >> following script to test the installation: >> >> use Bio::Perl; >> use strict; >> use warnings; >> >> my $seq_object = get_sequence('swissprot', "P09651"); >> >> write_sequence(">roa1.fasta", 'fasta', $seq_object); >> >> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I >> get the message below. >> >> Any help on the nature of the problem and how to overcome it would be >> greatly appreciated. >> >> Thanks >> >> George >> >> >> ------------- EXCEPTION ------------- >> MSG: swissprot stream with no ID. Not swissprot in my book >> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/ >> swiss.pm:179 >> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/ >> WebDBSeqI.pm:153 >> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513 >> STACK toplevel tut2.pl:5 >> >> >> >> George T. 
Tzotzos Ph.D >> Vienna, Austria >> >> >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From ClarkeW at AGR.GC.CA Tue Jun 20 12:57:34 2006 From: ClarkeW at AGR.GC.CA (Clarke, Wayne) Date: Tue, 20 Jun 2006 12:57:34 -0400 Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq Message-ID: <320530F83FA47047823E57F110DDEAADB15A65@onncrxms4.agr.gc.ca> The Bioperl version is 1.4, the OS is Redhat linux fedora core 4, the trace is STACK: Error::throw STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.0/Bio/Root/Root.pm:328 STACK: Bio::PrimarySeq::seq /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:267 STACK: Bio::PrimarySeq::new /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:217 STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.0/Bio/Seq.pm:498 STACK: /home/wayne/bin/mast_fasta.pl:59 And the full script is attached. However I would like to clarify that the actual sequence is not ACTG*, this was a notation to represent that I had checked it to be sure that it was a valid DNA sequence but due to confidentiality I cannot disclose the actual sequence. I know this makes it more difficult and that I perhaps should have been clearer about this originally. The $handle is a Bio::SeqIO object. When I use Data::Dumper I get a printout of the clone name 'Clone_Name' => 'sJ1485' }; then the error message. I hope this is more helpful than my last message. Thanks, Wayne -----Original Message----- From: Chris Fields [mailto:cjfields at uiuc.edu] Sent: Monday, June 19, 2006 9:23 PM To: Clarke, Wayne; bioperl-l at lists.open-bio.org Subject: RE: [Bioperl-l] Bio::Root::Exception when using Bio::Seq You really haven't given us much to work with more than "this doesn't work." We need the following information, otherwise we can't do anything. 
1) Bioperl version (1.4, 1.5.1, live) 2) OS 3) The exception trace (not just the chunk you've shown) 4) The full script. What is $handle? A Bio::SeqIO object? At first glance I would say Torsten's right, that it could be the '*' in the sequence. The problem is, I don't think validate_seq (from PrimarySeq and where the warning came from) distinguishes between nucleotides and amino acids, and it allows for '*' and various gap symbols in sequences. If this caused the problem, the error would be: MSG: seq doesn't validate, mismatch is * The actual error is: MSG: seq doesn't validate, mismatch is 1 It looks like something is being evaluated in the wrong context (scalar context is expected, but looks like it's evaluating a list). Maybe it thinks $hash->{'Sequence'} is a complex data type such as an array; hence the mismatch is 1. What do you get printing $hash using Data::Dumper? I tried using this anon hash and it work fine when a new Bio::Seq is constructed. my $hash = {'Clone_Name' => 'test', 'Sequence' => 'ACTG*'}; Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne > Sent: Monday, June 19, 2006 5:35 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq > > Hi, > > I am getting the following warning and then exception > > > > -------------------- WARNING --------------------- > > MSG: seq doesn't validate, mismatch is 1 > > --------------------------------------------------- > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Attempting to set the sequence to [ACTG*] which does not look > healthy > > > > NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA > sequence) > > > > when extracting display name and sequence from a MYSQL database. 
My code
> is as follows:
>
> my $sql = "select Clone_Name,Sequence from tbl_bgene";
> my $sth = $dbh->prepare($sql);
> $sth->execute();
> while (my $hash = $sth->fetchrow_hashref()) {
>     # print("Name: ".$hash->{'Clone_Name'}."\n");
>     my $seq = new Bio::Seq( -display_id => $hash->{'Clone_Name'},
>                             -seq        => $hash->{'Sequence'});
>     $handle->write_seq($seq);
>     # print("Sequence: ".$hash->{'Sequence'}."\n");
> }
>
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.
>
> Thanks, Wayne
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mast_fasta.pl
Type: application/octet-stream
Size: 1998 bytes
Desc: mast_fasta.pl
Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060620/53770697/attachment.obj

From cjfields at uiuc.edu Tue Jun 20 13:16:32 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 12:16:32 -0500
Subject: [Bioperl-l] Error message
In-Reply-To:
Message-ID: <000c01c6948d$46e992d0$15327e82@pyrimidine>

Brian,

Looks like EBI switched the url parameter for swissprot from 'swall' to
'UniProtKB'. I committed a change to Bio::DB::SwissProt in CVS which fixes
the issue.

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Brian Osborne
> Sent: Tuesday, June 20, 2006 11:14 AM
> To: George Tzotzos; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Error message
>
> George,
>
> The docs I'm reading say to use 'swiss', not 'swissprot' but I think
> there's some other problem that may be specific to SwissProt. Can you
> retrieve from GenBank?
E.g.: > > my $seq_object = get_sequence('genbank', 2); > > Brian O. > > > On 6/20/06 7:36 AM, "George Tzotzos" wrote: > > > I'm a BioPerl novice. I used CPAN to install BioPerl and run the > > following script to test the installation: > > > > use Bio::Perl; > > use strict; > > use warnings; > > > > my $seq_object = get_sequence('swissprot', "P09651"); > > > > write_sequence(">roa1.fasta", 'fasta', $seq_object); > > > > I used as argument both "ROA1_HUMAN" and "P09651". In both cases I > > get the message below. > > > > Any help on the nature of the problem and how to overcome it would be > > greatly appreciated. > > > > Thanks > > > > George > > > > > > ------------- EXCEPTION ------------- > > MSG: swissprot stream with no ID. Not swissprot in my book > > STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/ > > swiss.pm:179 > > STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/ > > WebDBSeqI.pm:153 > > STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513 > > STACK toplevel tut2.pl:5 > > > > > > > > George T. Tzotzos Ph.D > > Vienna, Austria > > > > > > > > > > > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From chandan.kr.singh at gmail.com Tue Jun 20 10:46:01 2006 From: chandan.kr.singh at gmail.com (CHANDAN SINGH) Date: Tue, 20 Jun 2006 20:16:01 +0530 Subject: [Bioperl-l] Error message In-Reply-To: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org> References: <7D3D8F5D-DBA6-4BCE-A396-5F3DB831DB35@unido.org> Message-ID: <2d4f320606200746ja53cebs73923c510b535c44@mail.gmail.com> Hi It seems the 'swall' servertype on EBI no longer exists. May be this has already been reported and debugged. I hope somebody throws light on it. 
As for George, if you are in a hurry you can use the Bio::DB::SwissProt
module directly. Here is typical code to do this:

use strict ;
use warnings ;
use Bio::DB::SwissProt ;
use Bio::Perl ;

my $seq_obj = new Bio::DB::SwissProt('-servertype'   => 'expasy' ,
                                     '-hostlocation' => 'us') ;
my $seq = $seq_obj->get_Seq_by_id('ROA1_HUMAN') ;
write_sequence(">roa.sp" , 'fasta' , $seq) ;

See the module documentation for further help.

cheers
Chandan

On 6/20/06, George Tzotzos wrote:
>
> I'm a BioPerl novice. I used CPAN to install BioPerl and run the
> following script to test the installation:
>
> use Bio::Perl;
> use strict;
> use warnings;
>
> my $seq_object = get_sequence('swissprot', "P09651");
>
> write_sequence(">roa1.fasta", 'fasta', $seq_object);
>
> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I
> get the message below.
>
> Any help on the nature of the problem and how to overcome it would be
> greatly appreciated.
>
> Thanks
>
> George
>
>
> ------------- EXCEPTION -------------
> MSG: swissprot stream with no ID. Not swissprot in my book
> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/swiss.pm:179
> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/WebDBSeqI.pm:153
> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513
> STACK toplevel tut2.pl:5
>
>
> George T. Tzotzos Ph.D
>
> Wagramerstrasse 5
> A-1400 Vienna
> Austria
>
> Email: g.tzotzos at unido.org
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From osborne1 at optonline.net Tue Jun 20 13:33:07 2006
From: osborne1 at optonline.net (Brian Osborne)
Date: Tue, 20 Jun 2006 13:33:07 -0400
Subject: [Bioperl-l] Error message
In-Reply-To: <000c01c6948d$46e992d0$15327e82@pyrimidine>
Message-ID:

Chris,

You beat me to it!

Brian O.
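
[Editorial sketch] The two retrieval routes mentioned in this thread (get_sequence() through the default server, and Bio::DB::SwissProt pointed at ExPASy as Chandan suggests) could be tried in order, falling back when one throws. The following is only a minimal sketch of that idea: first_success() is a hypothetical helper, not a Bioperl function, and the fetchers are stand-ins so the example runs without network access or Bioperl installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): run each fetcher in turn and
# return the first defined result together with its label. A fetcher that
# dies is recorded and skipped instead of aborting the script.
sub first_success {
    my @attempts = @_;                 # list of [ label, coderef ] pairs
    my @errors;
    for my $attempt (@attempts) {
        my ($label, $code) = @$attempt;
        my $result = eval { $code->() };
        return ($result, $label) if defined $result;
        push @errors, "$label: " . ($@ || "no result\n");
    }
    die "all sources failed:\n" . join('', @errors);
}

# Stand-in fetchers. In a real script the bodies would be the calls quoted
# in this thread, for example:
#   sub { get_sequence('swiss', 'P09651') }
#   sub { Bio::DB::SwissProt->new(-servertype   => 'expasy',
#                                 -hostlocation => 'us')
#           ->get_Seq_by_id('ROA1_HUMAN') }
my ($seq, $source) = first_success(
    [ ebi    => sub { die "swissprot stream with no ID\n" } ],
    [ expasy => sub { 'ROA1_HUMAN placeholder' } ],
);
print "retrieved via $source\n";       # prints: retrieved via expasy
```

Whether either server still answers is outside what this thread can confirm; the point is only that a thrown Bioperl exception can be caught with eval and a second source tried.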
On 6/20/06 1:16 PM, "Chris Fields" wrote: > Brian, > > Brian, > > Looks like EBI switched the url parameter for swissprot 'swall' to > 'UniProtKB'. I committed a change to Bio::DB::SwissProt in CVS which fixes > this and solves the issue. > > Chris > >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Brian Osborne >> Sent: Tuesday, June 20, 2006 11:14 AM >> To: George Tzotzos; bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Error message >> >> George, >> >> The docs I'm reading say to use 'swiss', not 'swissprot' but I think >> there's >> some other problem that may be specific to SwissProt. Can you retrieve >> from >> GenBank? E.g.: >> >> my $seq_object = get_sequence('genbank', 2); >> >> Brian O. >> >> >> On 6/20/06 7:36 AM, "George Tzotzos" wrote: >> >>> I'm a BioPerl novice. I used CPAN to install BioPerl and run the >>> following script to test the installation: >>> >>> use Bio::Perl; >>> use strict; >>> use warnings; >>> >>> my $seq_object = get_sequence('swissprot', "P09651"); >>> >>> write_sequence(">roa1.fasta", 'fasta', $seq_object); >>> >>> I used as argument both "ROA1_HUMAN" and "P09651". In both cases I >>> get the message below. >>> >>> Any help on the nature of the problem and how to overcome it would be >>> greatly appreciated. >>> >>> Thanks >>> >>> George >>> >>> >>> ------------- EXCEPTION ------------- >>> MSG: swissprot stream with no ID. Not swissprot in my book >>> STACK Bio::SeqIO::swiss::next_seq /Library/Perl/5.8.6/Bio/SeqIO/ >>> swiss.pm:179 >>> STACK Bio::DB::WebDBSeqI::get_Seq_by_id /Library/Perl/5.8.6/Bio/DB/ >>> WebDBSeqI.pm:153 >>> STACK Bio::Perl::get_sequence /Library/Perl/5.8.6/Bio/Perl.pm:513 >>> STACK toplevel tut2.pl:5 >>> >>> >>> >>> George T. 
Tzotzos Ph.D >>> Vienna, Austria >>> >>> >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ClarkeW at AGR.GC.CA Tue Jun 20 13:44:42 2006 From: ClarkeW at AGR.GC.CA (Clarke, Wayne) Date: Tue, 20 Jun 2006 13:44:42 -0400 Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq Message-ID: <320530F83FA47047823E57F110DDEAADB15A66@onncrxms4.agr.gc.ca> Hi all, It seems that there is a newline character which is causing the problem, this wasn't obvious at first due to the size of my shell window but that is what is giving the mismatch error. Thanks to Chris and Torsten for the help and for pointing me in the direction of validate_seq which was helpful in finding the problem. Cheers, Wayne -----Original Message----- From: Chris Fields [mailto:cjfields at uiuc.edu] Sent: Monday, June 19, 2006 9:23 PM To: Clarke, Wayne; bioperl-l at lists.open-bio.org Subject: RE: [Bioperl-l] Bio::Root::Exception when using Bio::Seq You really haven't given us much to work with more than "this doesn't work." We need the following information, otherwise we can't do anything. 1) Bioperl version (1.4, 1.5.1, live) 2) OS 3) The exception trace (not just the chunk you've shown) 4) The full script. What is $handle? A Bio::SeqIO object? At first glance I would say Torsten's right, that it could be the '*' in the sequence. 
The problem is, I don't think validate_seq (from PrimarySeq and where the warning came from) distinguishes between nucleotides and amino acids, and it allows for '*' and various gap symbols in sequences. If this caused the problem, the error would be: MSG: seq doesn't validate, mismatch is * The actual error is: MSG: seq doesn't validate, mismatch is 1 It looks like something is being evaluated in the wrong context (scalar context is expected, but looks like it's evaluating a list). Maybe it thinks $hash->{'Sequence'} is a complex data type such as an array; hence the mismatch is 1. What do you get printing $hash using Data::Dumper? I tried using this anon hash and it work fine when a new Bio::Seq is constructed. my $hash = {'Clone_Name' => 'test', 'Sequence' => 'ACTG*'}; Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Clarke, Wayne > Sent: Monday, June 19, 2006 5:35 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq > > Hi, > > I am getting the following warning and then exception > > > > -------------------- WARNING --------------------- > > MSG: seq doesn't validate, mismatch is 1 > > --------------------------------------------------- > > > > ------------- EXCEPTION: Bio::Root::Exception ------------- > > MSG: Attempting to set the sequence to [ACTG*] which does not look > healthy > > > > NOTE: ACTG* represents a sequence of those 4 characters(a valid DNA > sequence) > > > > when extracting display name and sequence from a MYSQL database. 
My code
> is as follows:
>
> my $sql = "select Clone_Name,Sequence from tbl_bgene";
> my $sth = $dbh->prepare($sql);
> $sth->execute();
> while (my $hash = $sth->fetchrow_hashref()) {
>     # print("Name: ".$hash->{'Clone_Name'}."\n");
>     my $seq = new Bio::Seq( -display_id => $hash->{'Clone_Name'},
>                             -seq        => $hash->{'Sequence'});
>     $handle->write_seq($seq);
>     # print("Sequence: ".$hash->{'Sequence'}."\n");
> }
>
> For some reason it is failing on a particular sequence, which is a valid
> DNA sequence. If anyone has any ideas on why this is I would appreciate
> it.
>
> Thanks, Wayne
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Tue Jun 20 13:55:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 12:55:28 -0500
Subject: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A65@onncrxms4.agr.gc.ca>
Message-ID: <000e01c69492$b74e0ec0$15327e82@pyrimidine>

> -----Original Message-----
> From: Clarke, Wayne [mailto:ClarkeW at AGR.GC.CA]
> Sent: Tuesday, June 20, 2006 11:58 AM
> To: Chris Fields
> Cc: bioperl-l at lists.open-bio.org
> Subject: RE: [Bioperl-l] Bio::Root::Exception when using Bio::Seq
>
>
> The Bioperl version is 1.4, the OS is Redhat linux fedora core 4, the
> trace is
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.0/Bio/Root/Root.pm:328
> STACK: Bio::PrimarySeq::seq /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:267
> STACK: Bio::PrimarySeq::new /usr/lib/perl5/site_perl/5.8.0/Bio/PrimarySeq.pm:217
> STACK: Bio::Seq::new /usr/lib/perl5/site_perl/5.8.0/Bio/Seq.pm:498
> STACK: /home/wayne/bin/mast_fasta.pl:59
>
> And the full script is attached.

Have you tried a newer version of Bioperl to see if it fixed the issue?
v. 1.5.1 has been out for a bit now and it's pretty stable.
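
[Editorial sketch] Wayne's follow-up earlier in this thread traces the failure to a newline inside the stored sequence. One way to guard against that, sketched here under the assumption that the column comes back from fetchrow_hashref() as a plain string, is to normalise the value before it is passed as -seq. The helper name clean_sequence() is hypothetical and not part of Bioperl; the example uses literal strings in place of $hash->{'Sequence'} so it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Bioperl function): normalise a raw database
# value before it is handed to Bio::Seq->new as -seq.
sub clean_sequence {
    my ($raw) = @_;
    die "sequence is undefined\n" unless defined $raw;
    # A reference (array, hash, object) is not sequence text; refuse it
    # here rather than letting it reach the sequence validation.
    die "sequence is a " . ref($raw) . " reference, expected a string\n"
        if ref $raw;
    (my $clean = $raw) =~ s/\s+//g;    # drop newlines, spaces and tabs
    return $clean;
}

# Literal values standing in for $hash->{'Sequence'}
print clean_sequence("ACTG\nACTG\r\n"), "\n";    # prints ACTGACTG

my $ok = eval { clean_sequence(['ACTG']); 1 };
print "rejected: $@" unless $ok;
```

In the loop quoted above this would be used as -seq => clean_sequence($hash->{'Sequence'}). Whether newer Bioperl releases strip such whitespace themselves is not established in this thread.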
> However I would like to clarify that the actual sequence is not ACTG*,
> this was a notation to represent that I had checked it to be sure that
> it was a valid DNA sequence but due to confidentiality I cannot disclose
> the actual sequence. I know this makes it more difficult and that I
> perhaps should have been clearer about this originally.

That's not a problem. We run into that here a bit. Example data is fine.

> The $handle is a
> Bio::SeqIO object. When I use Data::Dumper I get a printout of the clone
> name
>
> 'Clone_Name' => 'sJ1485'
> };
> then the error message. I hope this is more helpful than my last
> message.
>
> Thanks, Wayne

Make sure you aren't using bioperl-specific methods when you run
Data::Dumper on your hash or the script crashes.

Okay, I was able to reproduce your error using PrimarySeq from v. 1.4 (BTW,
the error message changes if you use a newer version of Bioperl but it is
still there). See if you can follow me here...

I used this script:

-------------------------
use Bio::Seq;
use Bio::SeqIO;
use Data::Dumper;

my $hash = {'Clone' => 'test', 'Sequence' => 'ACTG*'};

my $seqout = Bio::SeqIO->new (-format => 'fasta',
                              -fh     => \*STDOUT);

print Dumper($hash);

my $seq = Bio::Seq->new(-seq        => $hash->{'Sequence'},
                        -display_id => $hash->{'Clone'});

$seqout->write_seq($seq);
-------------------------

And everything works fine, with this output:

$VAR1 = {
          'Clone' => 'test',
          'Sequence' => 'ACTG*'
        };
>test
ACTG*

Changing the anonymous hash to this causes the crash and error.
my $hash = {'Clone' => 'test', 'Sequence' => ['ACTG*']};

Gets this:

$VAR1 = {
          'Clone' => 'test',
          'Sequence' => [
                          'ACTG*'
                        ]
        };

-------------------- WARNING ---------------------
MSG: seq doesn't validate, mismatch is 1
---------------------------------------------------

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Attempting to set the sequence to [ARRAY(0x2354b0)] which does not look healthy
STACK: Error::throw
STACK: Bio::Root::Root::throw C:\Perl\src\bioperl\core/Bio/Root/Root.pm:328
STACK: Bio::PrimarySeq::seq C:\Perl\src\bioperl\core/Bio/PrimarySeq.pm:268
STACK: Bio::PrimarySeq::new C:\Perl\src\bioperl\core/Bio/PrimarySeq.pm:217
STACK: Bio::Seq::new C:\Perl\src\bioperl\core/Bio/Seq.pm:497
STACK: C:\Perl\Scripts\seq-test\test.pl:17
-----------------------------------------------------------

It could be that the sequence data is stored in another complex data type
(object, hash) that's causing the problem. Looks like you retrieve your
hash from another method ('my $hash = $sth->fetchrow_hashref()'); you might
want to check that method to make sure you're getting the right kind of
data into your hash.

Chris

From rmb32 at cornell.edu Tue Jun 20 14:09:38 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Tue, 20 Jun 2006 12:09:38 -0600
Subject: [Bioperl-l] reading and writing GFF3
In-Reply-To: <1150819406.2585.27.camel@localhost.localdomain>
References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine>
 <449306CF.1030301@cornell.edu>
 <1150486453.4412.30.camel@localhost.localdomain>
 <449307B8.5040802@cornell.edu>
 <1150487731.4412.35.camel@localhost.localdomain>
 <4493150C.1080909@cornell.edu>
 <1150516605.2600.9.camel@localhost.localdomain>
 <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net>
 <1150819406.2585.27.camel@localhost.localdomain>
Message-ID: <449839E2.5080402@cornell.edu>

Getting to know this code a little better, I notice a couple of little
things:

1.)
my patch attached to bug 2026 draws unnecessary distinctions between feature types that use tags, and those that use annotations, since all features are now Bio::AnnotatableI's and the *_tags_* methods are implemented in AnnotatableI in terms of annotation objects now. You guys should probably just ignore it, since from the sound of it you're going to be changing all of this around anyway. Wish I could be there to help and learn more. 2.) the %tag2text hash in Bio::AnnotatableI stores a list of scalar accessors to use when translating Bio::Annotation::* objects to and from scalar tags. Seems to me, this would be much better accomplished by using polymorphism of some sort, probably adding a multipurpose as_tag() accessor in Bio::AnnotationI and the objects that implement it, then using that in Bio::AnnotatableI instead of %tag2text. Does this make sense, or am I misinterpreting something here? Reason I've noticed this is because I've been wrestling with how to translate Bio::Annotation::Target objects to and from scalar tag values, since a Target is being represented as an ordered list of 3 or 4 scalar tags in old things that were designed to interoperate with gff2, and I can't figure out a nice way to do it using the rather inflexible %tag2text mechanism. Sorry to be a pain, just wanted to get that in there before you guys start your jam session in Durham. Rob Scott Cain wrote: > Hi Hilmar, > > Of course you are right--I was under the influence of a perl module that > I work with that does something similar, but both of your solutions are > better. > > I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a > look this week. > > As for next week, I plan on spending the day at NESCent on Wednesday > (though I haven't told Todd or Jeff that I am arriving early yet) just > to make sure all the details are in place. I imagine I'll have a fair > amount of free time to hash this stuff out. 
Anyone else who is in town > (that is, in Durham, NC, USA) is welcome to come draw on a white board > too. :-) > > Scott > > > On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote: > >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> You don't need a new method for this. Instead, support a -feature >> argument. >> >> my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature); >> >> This should work for any instance of Bio::SeqFeatureI. If it is a >> B::SF::Annotated already it is obviously just a deep copy (if copy is >> desired - could be another parameter). Otherwise more will be involved. >> >> Alternatively, and possibly better, is to write a specialized >> SeqFeatureI factory (that would implement >> Bio::Factory::ObjectFactoryI) and then delegate this job to it: >> >> my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new( >> -type_ontology => $sequence_ontology, >> -source_ontology => $feature_source_ontology, >> -unflatten => 1); >> my $bsfa = $feat_factory->create_object({-feature => $feature}); >> >> This is preferable because it separates business logic that isn't >> necessarily related into defined units. I.e., the logic necessary to >> convert an ordinary feature into a strongly typed one is different >> from how to represent a strongly typed feature. IMHO anyway ... >> >> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan >> started as the result of a discussion thread earlier this (or last?) >> year. Bio::SeqFeature::Annotated as such may as well be obsoleted, >> though not in concept. >> >> Maybe we need to get together again and thrash out a strategy; or a >> BOF at the GMOD meeting? I feel this does need a core group of people >> who care, hash out a strategy that will also solve the backwards >> compatibility problem with the current Bio::SeqFeatureI state-of- >> limbo, and allow us to implement the decisions with a few people in a >> concentrated effort. 
This will then also remove the only real large >> stumbling block towards a 1.6 release. >> >> Maybe we should think about a little pre-GMOD hackathon to clear up >> this mess? Scott, you'll be there a day early? I'll be already back >> and Jason I believe will still be in town, although he may have other >> commitments already. Nonetheless, it shouldn't really take that much >> but rather dedicated time, a whiteboard, and a few people who care >> thrashing this out and then do it. >> >> Thoughts? >> >> -hilmar >> >> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote: >> >> >>> Rob, >>> >>> I came to the same conclusion as well; I wrote my response as I was >>> heading out the door and while I was running errands, I realized the >>> right thing to do is to write a Bio::SeqFeature::Annotated method >>> called >>> new_from_object, whose usage would be: >>> >>> my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object >>> ($my_BSFI, %args); >>> >>> where you would give it a Bio::SeqFeatureI compliant object and try to >>> create a BSFA like use suggested below. You could allow passing in >>> args >>> to control how different things are handled, like mapping non-SO types >>> to SO types. I'll think about this over the weekend and let you >>> know if >>> brilliance strikes me. >>> >>> Scott >>> >>> >>> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote: >>> >>>> Rather than cobble together some ad-hoc solution, I would be >>>> interested >>>> in working on a good solution to this problem, because it seems like >>>> it's just going to get more common as more people start wanting to >>>> write >>>> GFF3. What about some code in whatever customarily makes these >>>> objects >>>> (probably BSF::Annotated's new() method?) that could take another >>>> type >>>> of Feature object and attempt to shoehorn its data into a new >>>> BSF::Annotated? If it failed (because the type isn't in SO or >>>> whatever), it could throw() some informative error message. 
>>>> >>>> Then, people could write straightforward code something like: >>>> >>>> while(my $oldstylefeature = $features_in->next_feature) { >>>> $oldstylefeature->primary_tag('something_that_is_in_so'); >>>> $oldstylefeature->something_else('some other something that >>>> needs to >>>> be changed for compliance'); >>>> my $newfeature = Bio::SeqFeature::Annotated->new >>>> ($oldstylefeature); >>>> $gff3_out->write_feature($newfeature); >>>> } >>>> >>>> Does that sound like a good idea? I'd be more than willing to >>>> implement >>>> this, since I'm going to need to do this sort of thing with many more >>>> things than just RepeatMasker. >>>> >>>> Rob >>>> >>>> Scott Cain wrote: >>>> >>>>> Um, yeah, good question. The reason I didn't answer you when you >>>>> wrote >>>>> before is that I was hoping for divine inspiration for an answer >>>>> (or for >>>>> somebody else to answer, which would have been really great :-) >>>>> >>>>> The short answer (and easy one for me to type) is that you will >>>>> probably >>>>> need an ad hoc method to do it, which is the same thing I do when >>>>> I need >>>>> to convert gff2 to gff3, to make sure the things I need mapped get >>>>> mapped the 'right' way (that is, the way I want them to go). I >>>>> don't >>>>> have any sample code that does this, but if you want to start >>>>> working up >>>>> an ad hoc method, I will certainly try to help you as much as I can. >>>>> >>>>> Scott >>>>> >>>>> >>>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote: >>>>> >>>>> >>>>>> So about that converting ye olde feature objects into >>>>>> Bio::SeqFeature::Annotated objects. How do I do it? >>>>>> >>>>>> >>>>>> Scott Cain wrote: >>>>>> >>>>>> >>>>>>> That's OK--You added a few items that should be escaped that >>>>>>> weren't, so >>>>>>> I added those too. 
>>>>>>> >>>>>>> Thanks, >>>>>>> Scott >>>>>>> >>>>>>> >>>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Woops, I should have said something about that. I submitted >>>>>>>> it before >>>>>>>> I saw that Scott had already done the escaping in CVS. >>>>>>>> >>>>>>>> Chris Fields wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Scott, >>>>>>>>> >>>>>>>>> Looks like Robert also submitted a bug report related to this >>>>>>>>> as well= >>>>>>>>> ---------------------------------------------------------------- >>>>>>>>> -------- >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Bioperl-l mailing list >>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>> >>> -- >>> ---------------------------------------------------------------------- >>> -- >>> Scott Cain, Ph. D. >>> cain at cshl.edu >>> GMOD Coordinator (http://www.gmod.org/) >>> 216-392-3087 >>> Cold Spring Harbor Laboratory >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> - -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> >> >> >> -----BEGIN PGP SIGNATURE----- >> Version: GnuPG v1.4.2.2 (Darwin) >> >> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V >> ImoAXD/jrbF0gXzSr2CY4tQ= >> =XfDq >> -----END PGP SIGNATURE----- >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> ------------------------------------------------------------------------ >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> 
http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Robert Buels SGN Bioinformatics Analyst 252A Emerson Hall, Cornell University Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From hlapp at gmx.net Tue Jun 20 14:24:45 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 20 Jun 2006 14:24:45 -0400 Subject: [Bioperl-l] reading and writing GFF3 In-Reply-To: <449839E2.5080402@cornell.edu> References: <000b01c69178$cf0aeaa0$15327e82@pyrimidine> <449306CF.1030301@cornell.edu> <1150486453.4412.30.camel@localhost.localdomain> <449307B8.5040802@cornell.edu> <1150487731.4412.35.camel@localhost.localdomain> <4493150C.1080909@cornell.edu> <1150516605.2600.9.camel@localhost.localdomain> <35DEC5EE-B10F-42C5-A687-6B5FEC4B23AB@gmx.net> <1150819406.2585.27.camel@localhost.localdomain> <449839E2.5080402@cornell.edu> Message-ID: Yes, this is the sore problem area. AnnotatableI used to have only a single method (annotation()), the *_tag_* methods are new since 1.5 (and truly a developer release feature - don't rely on them staying). Likewise, the tag2text is an utterly ugly artifact (after all, this is an interface) rooted in the above addition. If we can't manage to remove it I'll remove my name from that module ;) -hilmar On Jun 20, 2006, at 2:09 PM, Robert Buels wrote: > Getting to know this code a little better, I notice a couple of little > things: > > 1.) my patch attached to bug 2026 draws unnecessary distinctions > between > feature types that use tags, and those that use annotations, since all > features are now Bio::AnnotatableI's and the *_tags_* methods are > implemented in AnnotatableI in terms of annotation objects now. You > guys should probably just ignore it, since from the sound of it you're > going to be changing all of this around anyway. Wish I could be there > to help and learn more. > > 2.) 
the %tag2text hash in Bio::AnnotatableI stores a list of scalar > accessors to use when translating Bio::Annotation::* objects to and > from > scalar tags. Seems to me, this would be much better accomplished by > using polymorphism of some sort, probably adding a multipurpose > as_tag() > accessor in Bio::AnnotationI and the objects that implement it, then > using that in Bio::AnnotatableI instead of %tag2text. Does this make > sense, or am I misinterpreting something here? Reason I've noticed > this > is because I've been wrestling with how to translate > Bio::Annotation::Target objects to and from scalar tag values, since a > Target is being represented as an ordered list of 3 or 4 scalar > tags in > old things that were designed to interoperate with gff2, and I can't > figure out a nice way to do it using the rather inflexible %tag2text > mechanism. > > Sorry to be a pain, just wanted to get that in there before you guys > start your jam session in Durham. > > Rob > > Scott Cain wrote: >> Hi Hilmar, >> >> Of course you are right--I was under the influence of a perl >> module that >> I work with that does something similar, but both of your >> solutions are >> better. >> >> I wasn't familiar with Bio::SeqFeature::TypedSeqFeatureI; I'll take a >> look this week. >> >> As for next week, I plan on spending the day at NESCent on Wednesday >> (though I haven't told Todd or Jeff that I am arriving early yet) >> just >> to make sure all the details are in place. I imagine I'll have a >> fair >> amount of free time to hash this stuff out. Anyone else who is in >> town >> (that is, in Durham, NC, USA) is welcome to come draw on a white >> board >> too. :-) >> >> Scott >> >> >> On Sat, 2006-06-17 at 12:20 -0400, Hilmar Lapp wrote: >> >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> You don't need a new method for this. Instead, support a -feature >>> argument. 
>>> >>> my $bsfa = Bio::SeqFeature::Annotated->new(-feature => $feature); >>> >>> This should work for any instance of Bio::SeqFeatureI. If it is a >>> B::SF::Annotated already it is obviously just a deep copy (if >>> copy is >>> desired - could be another parameter). Otherwise more will be >>> involved. >>> >>> Alternatively, and possibly better, is to write a specialized >>> SeqFeatureI factory (that would implement >>> Bio::Factory::ObjectFactoryI) and then delegate this job to it: >>> >>> my $feat_factory = Bio::SeqFeature::TypedFeatureFactory->new( >>> -type_ontology => $sequence_ontology, >>> -source_ontology => $feature_source_ontology, >>> -unflatten => 1); >>> my $bsfa = $feat_factory->create_object({-feature => $feature}); >>> >>> This is preferable because it separates business logic that isn't >>> necessarily related into defined units. I.e., the logic necessary to >>> convert an ordinary feature into a strongly typed one is different >>> from how to represent a strongly typed feature. IMHO anyway ... >>> >>> Also, don't dismiss the Bio::SeqFeature::TypedSeqFeatureI that Ewan >>> started as the result of a discussion thread earlier this (or last?) >>> year. Bio::SeqFeature::Annotated as such may as well be obsoleted, >>> though not in concept. >>> >>> Maybe we need to get together again and thrash out a strategy; or a >>> BOF at the GMOD meeting? I feel this does need a core group of >>> people >>> who care, hash out a strategy that will also solve the backwards >>> compatibility problem with the current Bio::SeqFeatureI state-of- >>> limbo, and allow us to implement the decisions with a few people >>> in a >>> concentrated effort. This will then also remove the only real large >>> stumbling block towards a 1.6 release. >>> >>> Maybe we should think about a little pre-GMOD hackathon to clear up >>> this mess? Scott, you'll be there a day early? 
I'll be already back >>> and Jason I believe will still be in town, although he may have >>> other >>> commitments already. Nonetheless, it shouldn't really take that much >>> but rather dedicated time, a whiteboard, and a few people who care >>> thrashing this out and then do it. >>> >>> Thoughts? >>> >>> -hilmar >>> >>> On Jun 16, 2006, at 11:56 PM, Scott Cain wrote: >>> >>> >>>> Rob, >>>> >>>> I came to the same conclusion as well; I wrote my response as I was >>>> heading out the door and while I was running errands, I realized >>>> the >>>> right thing to do is to write a Bio::SeqFeature::Annotated method >>>> called >>>> new_from_object, whose usage would be: >>>> >>>> my $my_BSFA = Bio::SeqFeature::Annotated->new_from_object >>>> ($my_BSFI, %args); >>>> >>>> where you would give it a Bio::SeqFeatureI compliant object and >>>> try to >>>> create a BSFA like use suggested below. You could allow passing in >>>> args >>>> to control how different things are handled, like mapping non-SO >>>> types >>>> to SO types. I'll think about this over the weekend and let you >>>> know if >>>> brilliance strikes me. >>>> >>>> Scott >>>> >>>> >>>> On Fri, 2006-06-16 at 13:31 -0700, Robert Buels wrote: >>>> >>>>> Rather than cobble together some ad-hoc solution, I would be >>>>> interested >>>>> in working on a good solution to this problem, because it seems >>>>> like >>>>> it's just going to get more common as more people start wanting to >>>>> write >>>>> GFF3. What about some code in whatever customarily makes these >>>>> objects >>>>> (probably BSF::Annotated's new() method?) that could take another >>>>> type >>>>> of Feature object and attempt to shoehorn its data into a new >>>>> BSF::Annotated? If it failed (because the type isn't in SO or >>>>> whatever), it could throw() some informative error message. 
>>>>> >>>>> Then, people could write straightforward code something like: >>>>> >>>>> while(my $oldstylefeature = $features_in->next_feature) { >>>>> $oldstylefeature->primary_tag('something_that_is_in_so'); >>>>> $oldstylefeature->something_else('some other something that >>>>> needs to >>>>> be changed for compliance'); >>>>> my $newfeature = Bio::SeqFeature::Annotated->new >>>>> ($oldstylefeature); >>>>> $gff3_out->write_feature($newfeature); >>>>> } >>>>> >>>>> Does that sound like a good idea? I'd be more than willing to >>>>> implement >>>>> this, since I'm going to need to do this sort of thing with >>>>> many more >>>>> things than just RepeatMasker. >>>>> >>>>> Rob >>>>> >>>>> Scott Cain wrote: >>>>> >>>>>> Um, yeah, good question. The reason I didn't answer you when you >>>>>> wrote >>>>>> before is that I was hoping for divine inspiration for an answer >>>>>> (or for >>>>>> somebody else to answer, which would have been really great :-) >>>>>> >>>>>> The short answer (and easy one for me to type) is that you will >>>>>> probably >>>>>> need an ad hoc method to do it, which is the same thing I do when >>>>>> I need >>>>>> to convert gff2 to gff3, to make sure the things I need mapped >>>>>> get >>>>>> mapped the 'right' way (that is, the way I want them to go). I >>>>>> don't >>>>>> have any sample code that does this, but if you want to start >>>>>> working up >>>>>> an ad hoc method, I will certainly try to help you as much as >>>>>> I can. >>>>>> >>>>>> Scott >>>>>> >>>>>> >>>>>> On Fri, 2006-06-16 at 12:34 -0700, Robert Buels wrote: >>>>>> >>>>>> >>>>>>> So about that converting ye olde feature objects into >>>>>>> Bio::SeqFeature::Annotated objects. How do I do it? >>>>>>> >>>>>>> >>>>>>> Scott Cain wrote: >>>>>>> >>>>>>> >>>>>>>> That's OK--You added a few items that should be escaped that >>>>>>>> weren't, so >>>>>>>> I added those too. 
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> Scott >>>>>>>> >>>>>>>> >>>>>>>> On Fri, 2006-06-16 at 12:30 -0700, Robert Buels wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Woops, I should have said something about that. I submitted >>>>>>>>> it before >>>>>>>>> I saw that Scott had already done the escaping in CVS. >>>>>>>>> >>>>>>>>> Chris Fields wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Scott, >>>>>>>>>> >>>>>>>>>> Looks like Robert also submitted a bug report related to this >>>>>>>>>> as well= >>>>>>>>>> ------------------------------------------------------------- >>>>>>>>>> --- >>>>>>>>>> -------- >>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Bioperl-l mailing list >>>>>>>>>> Bioperl-l at lists.open-bio.org >>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>>>>>>> >>>> -- >>>> ------------------------------------------------------------------- >>>> --- >>>> -- >>>> Scott Cain, Ph. D. >>>> cain at cshl.edu >>>> GMOD Coordinator (http://www.gmod.org/) >>>> 216-392-3087 >>>> Cold Spring Harbor Laboratory >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> - -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >>> >>> >>> >>> -----BEGIN PGP SIGNATURE----- >>> Version: GnuPG v1.4.2.2 (Darwin) >>> >>> iD8DBQFElCvAuV6N2JxL7qsRAhw1AJ9SaMR4tMFZCTrzimnEnDdjKqbPGgCgk38V >>> ImoAXD/jrbF0gXzSr2CY4tQ= >>> =XfDq >>> -----END PGP SIGNATURE----- >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -------------------------------------------------------------------- >>> ---- >>> >>> _______________________________________________ >>> 
Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Robert Buels > SGN Bioinformatics Analyst > 252A Emerson Hall, Cornell University > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bix at sendu.me.uk Tue Jun 20 16:22:45 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 20 Jun 2006 21:22:45 +0100 Subject: [Bioperl-l] Bio::Map changes Message-ID: <44985915.8010607@sendu.me.uk> Some initial changes have been made to some modules in Bio::Map to allow Positions to have a range (Bio::Map::PositionI implements Bio::RangeI). (see http://bugzilla.open-bio.org/show_bug.cgi?id=1998) Further changes are needed in some remaining Bio::Map modules for this addition to be complete (a number of Bio::Map related tests in the test suite currently fail), notably Bio::Map::Cyto* since they had implemented their own Range-related features. I propose bringing all Bio::Map into line so it behaves with and makes good use of the RangeI nature of Position. Beyond this initial change I want to add relative positioning and more, but I'll describe that in a future post to this thread. Can anyone see any issues with ranged positions (it's done in a backward compatible way)? Do any developers want to maintain control of a Bio::Map module or shall I just dive in? 
From cjfields at uiuc.edu Tue Jun 20 23:50:55 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 20 Jun 2006 22:50:55 -0500
Subject: [Bioperl-l] EUtilities interface
Message-ID: <002301c694e5$e5f3a750$15327e82@pyrimidine>

I'm working on a new eutilities interface which I hope to commit by late
summer. It's basically a rewrite of WebDBSeqI/NCBIHelper. I set up a
generic web database interface, which I call Bio::DB::WebDBI, and the
EUtilities interface, Bio::DB::EUtilitiesI. The idea is that you can query
NCBI for any information available via Entrez Utilities (e.g. taxonomy,
pubmed, sequences, dbSNP, Gene, etc.); you're not limited to sequence-only
info like Bio::DB::WebDBSeqI.

My only concern is confusion over names, particularly WebDBI vs. WebDBSeqI.
Does anyone think this will be an issue?

Christopher Fields
Postdoctoral Researcher - Switzer Lab
Dept. of Biochemistry
University of Illinois Urbana-Champaign

From bix at sendu.me.uk Wed Jun 21 04:20:37 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 21 Jun 2006 09:20:37 +0100
Subject: [Bioperl-l] Bio::RangeI intersection proposal
Message-ID: <44990155.6050501@sendu.me.uk>

Bio::Map::PositionI (in bioperl-live) needs intersections of a list of
ranges. It inherits from Bio::RangeI but, unlike RangeI's union,
intersection does not take a list. PositionI currently calls
intersection repeatedly to handle a list.

If there is no particular reason for this limitation, I propose making
RangeI intersection handle lists natively. This won't do any harm to
existing code at the time of the change, but it's possible that someone
has written a module that implements RangeI but overrides intersection
(without making it accept a list), so that future code that expects a
RangeI to handle lists will break when getting a RangeI from that module.

So the question is, has anyone overridden intersection in RangeI?
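For readers following along, "calling intersection repeatedly to handle a list" can be sketched roughly as below. This is plain Perl, not the Bio::RangeI code: ranges are assumed to be simple [start, end] pairs, and intersect_pair and intersect_list are hypothetical helper names used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::RangeI): intersect two ranges given
# as [start, end] array refs; returns undef when they do not overlap.
sub intersect_pair {
    my ($r1, $r2) = @_;
    my $start = $r1->[0] > $r2->[0] ? $r1->[0] : $r2->[0];   # larger start
    my $end   = $r1->[1] < $r2->[1] ? $r1->[1] : $r2->[1];   # smaller end
    return $start <= $end ? [$start, $end] : undef;
}

# Hypothetical helper: fold the pairwise intersection over a whole list,
# which is what a list-aware intersection would do internally.
sub intersect_list {
    my @ranges = @_;
    return unless @ranges;
    my $result = shift @ranges;
    for my $next (@ranges) {
        $result = intersect_pair($result, $next);
        return unless defined $result;   # empty intersection, stop early
    }
    return $result;
}

my $common = intersect_list([10, 50], [20, 60], [15, 40]);
print defined $common ? "@$common\n" : "no overlap\n";
```

A native list-taking intersection would move this loop into the range class itself, so callers such as PositionI would not have to repeat it.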
Is the small risk of possible breakage compensated by the benefit of
intersections of a list of ranges (which is surely useful in lots of
situations, not just for PositionI)?

I'm tempted to go ahead with this unless there are objections.

From Bernhard.Schmalhofer at biomax.com Wed Jun 21 03:19:12 2006
From: Bernhard.Schmalhofer at biomax.com (Bernhard Schmalhofer)
Date: Wed, 21 Jun 2006 09:19:12 +0200
Subject: [Bioperl-l] YAPC anyone?
In-Reply-To: <002701c69477$9ffa7c10$c2987ca5@pc13>
References: <002701c69477$9ffa7c10$c2987ca5@pc13>
Message-ID: <4498F2F0.7010203@biomax.com>

Sohel Merchant wrote:
>
>>Just curious if any other BioPerlers will be at the YAPC conference in
>>Chicago next week ( http://yapcchicago.org/).

Not in Chicago, but yesterday I got the OK from Biomax management to go
to YAPC::Europe, http://www.birmingham2006.com/. So at the end of
August I'll be in Birmingham. Yeah!

Is anybody interested in writing parsers for Perl 6 there?

CU, Bernhard

--
**************************************************
Dipl.-Physiker Bernhard Schmalhofer
Senior Developer
Biomax Informatics AG
Lochhamer Str. 11
82152 Martinsried, Germany
Tel: +49 89 895574-839
Fax: +49 89 895574-825
eMail: Bernhard.Schmalhofer at biomax.com
Website: www.biomax.com
**************************************************

From cjfields at uiuc.edu Wed Jun 21 11:08:28 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 21 Jun 2006 10:08:28 -0500
Subject: [Bioperl-l] YAPC anyone?
In-Reply-To: <4498F2F0.7010203@biomax.com>
Message-ID: <000301c69544$8d537710$15327e82@pyrimidine>

Speaking of Perl6, there was interest here at one point in getting a
bioperl-experimental going, which at this point in the game should involve
Perl6. If there were enough interest in it we could probably get it set up
via CVS and moving along.
We might need to split the Perl6 stuff from Perl5
experimental modules in some way to prevent confusion (bioperl6-live???),
though I'm not up to speed Perl6-wise so I'm not sure about namespace
collisions and so on.

bioperl-experimental would be, as the name implies, a sort of testing
ground for ideas (good and bad). It seemed like it was going to take off a
few years ago but it lost steam, I guess.

As for your parsers, would you build them from the ground up (i.e. from
Bio::Root::Root on up)?

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Bernhard Schmalhofer
> Sent: Wednesday, June 21, 2006 2:19 AM
> To: bioperl-l at lists.open-bio.org
> Cc: Sohel Merchant
> Subject: Re: [Bioperl-l] YAPC anyone?
>
> Sohel Merchant wrote:
> >
> >>Just curious if any other BioPerlers will be at the YAPC conference in
> >>Chicago next week ( http://yapcchicago.org/).
>
> Not in chicago, but yesterday I got the OK from Biomax management to go
> the YAPC::Europe, http://www.birmingham2006.com/. So in the end of
> August I'll be in Birmingham. Yeah!
>
> Is anybody interested in writing parsers for Perl 6 there?
>
> CU, Bernhard
>
>
>
> --
> **************************************************
> Dipl.-Physiker Bernhard Schmalhofer
> Senior Developer
> Biomax Informatics AG
> Lochhamer Str.
11 > 82152 Martinsried, Germany > Tel: +49 89 895574-839 > Fax: +49 89 895574-825 > eMail: Bernhard.Schmalhofer at biomax.com > Website: www.biomax.com > ************************************************** > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed Jun 21 11:16:17 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Jun 2006 10:16:17 -0500 Subject: [Bioperl-l] Bio::RangeI intersection proposal In-Reply-To: <44990155.6050501@sendu.me.uk> Message-ID: <000401c69545$a4a3ad30$15327e82@pyrimidine> I personally have no objections as long as it doesn't break API. Don't know how the senior guys feel (Jason, Brian, Heikki, Hilmar...); I'm not a user of Bio::Map modules myself. Actually, sounds weird to have me say "senior guys"; I'm 35 years old! Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Wednesday, June 21, 2006 3:21 AM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::RangeI intersection proposal > > Bio::Map::PositionI (in bioperl-live) needs intersections of a list of > ranges. It inherits from Bio::RangeI but unlike RangeI's union, > intersection does not take a list. PositionI currently calls > intersection repeatedly to handle a list. > > If there is no particular reason for this limitation, I propose making > RangeI intersection handle lists natively. This won't do any harm to > existing code at the time of the change, but its possible that someone > has written a module that implements RangeI but overrides intersection > (without making it accept a list), so that future code written that > expects a RangeI to handle lists will break when getting a RangeI from > that module. > > So the question is, has anyone overridden intersection in RangeI? 
Is the > small risk of possible breakage compensated by the benefit of > intersections of a list of ranges (which is surely useful in lots of > situations, not just for PositionI)? > > I'm tempted to go ahead with this unless there are objections. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From hlapp at gmx.net Wed Jun 21 11:24:47 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 21 Jun 2006 11:24:47 -0400 Subject: [Bioperl-l] Bio::RangeI intersection proposal In-Reply-To: <000401c69545$a4a3ad30$15327e82@pyrimidine> References: <000401c69545$a4a3ad30$15327e82@pyrimidine> Message-ID: <1BC663A6-FBFC-477F-8533-7A8077DABE33@gmx.net> On Jun 21, 2006, at 11:16 AM, Chris Fields wrote: > Actually, sounds weird to have me say "senior guys"; I'm 35 years old! Actually, it doesn't go by age but by the amount of hair you still have. ;) -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Jun 21 11:28:58 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Jun 2006 10:28:58 -0500 Subject: [Bioperl-l] Bio::RangeI intersection proposal In-Reply-To: <1BC663A6-FBFC-477F-8533-7A8077DABE33@gmx.net> Message-ID: <000501c69547$6a9f28b0$15327e82@pyrimidine> Then I'm really a senior guy... ; { Chris > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Wednesday, June 21, 2006 10:25 AM > To: Chris Fields > Cc: 'Sendu Bala'; bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::RangeI intersection proposal > > > On Jun 21, 2006, at 11:16 AM, Chris Fields wrote: > > > Actually, sounds weird to have me say "senior guys"; I'm 35 years old! > > Actually, it doesn't go by age but by the amount of hair you still > have. 
;) > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From hlapp at gmx.net Wed Jun 21 11:53:08 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 21 Jun 2006 11:53:08 -0400 Subject: [Bioperl-l] Bio::RangeI intersection proposal In-Reply-To: <000501c69547$6a9f28b0$15327e82@pyrimidine> References: <000501c69547$6a9f28b0$15327e82@pyrimidine> Message-ID: <9E23326C-CB5D-4980-A48A-85CFA4A9B8DF@gmx.net> We could run a Mr Seniority competition at BOSC with the attendees judging who got the weirdest looking hair loss. You'd take the challenge? The judging panel would need to be gender-mixed though. On Jun 21, 2006, at 11:28 AM, Chris Fields wrote: > Then I'm really a senior guy... > > ; { > > Chris > >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Wednesday, June 21, 2006 10:25 AM >> To: Chris Fields >> Cc: 'Sendu Bala'; bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bio::RangeI intersection proposal >> >> >> On Jun 21, 2006, at 11:16 AM, Chris Fields wrote: >> >>> Actually, sounds weird to have me say "senior guys"; I'm 35 years >>> old! >> >> Actually, it doesn't go by age but by the amount of hair you still >> have. 
;) >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Wed Jun 21 12:08:17 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Jun 2006 11:08:17 -0500 Subject: [Bioperl-l] Bio::RangeI intersection proposal In-Reply-To: <9E23326C-CB5D-4980-A48A-85CFA4A9B8DF@gmx.net> Message-ID: <000301c6954c$e89c7a60$15327e82@pyrimidine> I'd love to be at BOSC but I can't go (finishing up my postdoc this year, which is probably the primary cause of my hair loss). Would the judges accept a recent picture? Chris > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Wednesday, June 21, 2006 10:53 AM > To: Chris Fields > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::RangeI intersection proposal > > We could run a Mr Seniority competition at BOSC with the attendees > judging who got the weirdest looking hair loss. You'd take the > challenge? The judging panel would need to be gender-mixed though. > > On Jun 21, 2006, at 11:28 AM, Chris Fields wrote: > > > Then I'm really a senior guy... > > > > ; { > > > > Chris > > > >> -----Original Message----- > >> From: Hilmar Lapp [mailto:hlapp at gmx.net] > >> Sent: Wednesday, June 21, 2006 10:25 AM > >> To: Chris Fields > >> Cc: 'Sendu Bala'; bioperl-l at lists.open-bio.org > >> Subject: Re: [Bioperl-l] Bio::RangeI intersection proposal > >> > >> > >> On Jun 21, 2006, at 11:16 AM, Chris Fields wrote: > >> > >>> Actually, sounds weird to have me say "senior guys"; I'm 35 years > >>> old! > >> > >> Actually, it doesn't go by age but by the amount of hair you still > >> have. 
;)
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> >> ===========================================================
> >>
> >>
> >>
> >
> >
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
>
>
>

From Bernhard.Schmalhofer at biomax.com Wed Jun 21 12:25:50 2006
From: Bernhard.Schmalhofer at biomax.com (Bernhard Schmalhofer)
Date: Wed, 21 Jun 2006 18:25:50 +0200
Subject: [Bioperl-l] Perl 6 hacking was YAPC anyone?
In-Reply-To: <000301c69544$8d537710$15327e82@pyrimidine>
References: <000301c69544$8d537710$15327e82@pyrimidine>
Message-ID: <4499730E.8090800@biomax.com>

Chris Fields wrote:
> Speaking of Perl6, there was interest here at one point in getting a
> bioperl-experimental going, which at this point in the game should involve
> Perl6. If there were enough interest in it we could probably get it set up
> via CVS and moving along. We might need to split the Perl6 stuff from Perl5
> experimental modules in some way to prevent confusion (bioperl6-live???),
> though I'm not up to speed Perl6-wise so I'm not sure about namespace
> collisions and so on.

As far as I understood it, the plan is to have a very smooth migration
path from Perl 5 to Perl 6. You start with plain old Perl 5 code. When
new stuff is coming along, or when refactoring is done, you drop in

    use v6;

or

    use v6-pugs;

and continue with Perl 6. See http://svn.openfoundry.org/pugs/lib/v6.pm
or Audrey Tang's presentation at the Nordic Perl Workshop:
http://pugs.blogs.com/talks/npw06-deploying-perl6.pdf.
So I would argue against having a completely separate Perl6 experimental
repository.

> bioperl-experimental would be, like the name implies, a sort of testing
> ground for ideas (good and bad). It seemed like it was going to take off a
> few years ago but it lost steam, I guess.
> > As for your parsers, would you build them from the ground up (i.e. from > Bio::Root::Root on up)? I'm just a casual Bio::Perl user and never hacked on any internals. So I don't know whether the current Bio::Perl framework is a good fit. The idea that is floating in my mind is to make a showcase of Perl 6 parsing, by tackling the various sequences and alignment formats. So this would involve shopping around for the cleanest parser implementations and porting that to Perl6. Which repository to use is more a question of social engineering. Are there more Pugs/Perl6 hackers interested in cool biological hacking, or biologist aching to try out Perl6? Regards, Bernhard Schmalhofer -- ************************************************** Dipl.-Physiker Bernhard Schmalhofer Senior Developer Biomax Informatics AG Lochhamer Str. 11 82152 Martinsried, Germany Tel: +49 89 895574-839 Fax: +49 89 895574-825 eMail: Bernhard.Schmalhofer at biomax.com Website: www.biomax.com ************************************************** From cjfields at uiuc.edu Wed Jun 21 14:01:02 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Jun 2006 13:01:02 -0500 Subject: [Bioperl-l] Perl 6 hacking was YAPC anyone? In-Reply-To: <4499730E.8090800@biomax.com> Message-ID: <000b01c6955c$ad0e6750$15327e82@pyrimidine> > Chris Fields wrote: > > Speaking of Perl6, there was interest here at one point in getting a > > bioperl-experimental going, which at this point in the game should > involve > > Perl6. If there were enough interest in it we could probably get it set > up > > via CVS and moving along. We might need to split the Perl6 stuff from > Perl5 > > experimental modules in some way to prevent confusion (bioperl6- > live???), > > though I'm not up to speed Perl6-wise so I'm not sure about namespace > > collisions and so on. > > As far as I understood it, the plan is to have a very smooth migration > path from Perl 5 to Perl 6. You start with plain old Perl 5 code. 
When > new stuff is coming along, or when refactoring is done, you drop in > > use v6; > > or > > use v6-pugs; > > and contiue with Perl 6. See http://svn.openfoundry.org/pugs/lib/v6.pm > or Audrey Tangs presentation at the Nordic Perl Workshop: > http://pugs.blogs.com/talks/npw06-deploying-perl6.pdf. > So I would argue against having a completely seperate Perl6 experimental > repository. Makes sense. I know Pugs is the Perl6 implementation in Haskell but I also know eventually Parrot will be taking over as the compiler (hopefully). Perl6 is pretty exciting since it's built to support OOP from the ground up, unlike the bolted-on OOP for Perl5, and has several other features that make it very useful (the new way regexes are handled). I just haven't had time to play around with it seriously enough. I may try using Pugs a bit more, though. So, as long as Perl5-Perl6 work together a separate repository wouldn't be necessary. > > bioperl-experimental would be, like the name implies, a sort of testing > > ground for ideas (good and bad). It seemed like it was going to take > off a > > few years ago but it lost steam, I'm guess. > > > > As for your parsers, would you build them from the ground up (i.e. from > > Bio::Root::Root on up)? > > I'm just a casual Bio::Perl user and never hacked on any internals. So I > don't know whether the current Bio::Perl framework is a good fit. > > The idea that is floating in my mind is to make a showcase of Perl 6 > parsing, by tackling the various sequences and alignment formats. > So this would involve shopping around for the cleanest parser > implementations and porting that to Perl6. > > Which repository to use is more a question of social engineering. > Are there more Pugs/Perl6 hackers interested in cool biological hacking, > or biologist aching to try out Perl6? I suppose the best way is initially to use a non-bioperl approach using Perl6, then try working the parsers in using 'use v6-pugs;'. 
Bioperl is
heavily object-oriented so the code would probably need to be refactored
from the bottom up (or top down, depending on your view) to fit Perl6.
Having a perl5->perl6 translator helps, though. And, again, having Perl5
and Perl6 work together helps as well.

Chris

> Regards,
> Bernhard Schmalhofer
>
> --
> **************************************************
> Dipl.-Physiker Bernhard Schmalhofer
> Senior Developer
> Biomax Informatics AG
> Lochhamer Str. 11
> 82152 Martinsried, Germany
> Tel: +49 89 895574-839
> Fax: +49 89 895574-825
> eMail: Bernhard.Schmalhofer at biomax.com
> Website: www.biomax.com
> **************************************************

From dwaner at scitegic.com Wed Jun 21 14:14:00 2006
From: dwaner at scitegic.com (dwaner at scitegic.com)
Date: Wed, 21 Jun 2006 11:14:00 -0700
Subject: [Bioperl-l] EMBL release 87 format changes.
Message-ID:

With release 87 of EMBL (June 19th, 2006), there have been some minor
changes to the flat file record format. In particular, the SV (sequence
version) tag has been moved from its own line to a field in the ID line.
See http://www.ebi.ac.uk/embl/Documentation/changesdetails.html.

Is someone already working on updating the SeqIO::embl parser, or should I
volunteer?

- David

From bix at sendu.me.uk Wed Jun 21 14:23:28 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 21 Jun 2006 19:23:28 +0100
Subject: [Bioperl-l] EUtilities interface
In-Reply-To: <002301c694e5$e5f3a750$15327e82@pyrimidine>
References: <002301c694e5$e5f3a750$15327e82@pyrimidine>
Message-ID: <44998EA0.1010406@sendu.me.uk>

Chris Fields wrote:
> I'm working on a new eutilities interface which I hope to commit by late
> summer. It's basically a rewrite of WebDBSeqI/NCBIHelper. I set up a
> generic web database interface, which I call Bio::DB::WebDBI, and the
> EUtilities interface, Bio::DB::EUtilitiesI. The idea is that you can query
> NCBI for any information available via Entrez Utilities (i.e.
taxonomy, > pubmed, sequences, dbSNP, Gene, etc); you're not limited to sequence-only > info like Bio::DB::WebDBSeqI. > > My only concern is confusion over names, particularly WebDBI vs. WebDBSeqI. > Does anyone think this will be an issue? Well, I don't. Sounds good to me. What's the intended relationship between WebDBI and EUtilitiesI? Would your work end up in the removal of direct XML parsing from Bio::DB::Taxonomy::entrez? Or would it just convert the code that gets the XML to a one line statement or so? From cjfields at uiuc.edu Wed Jun 21 15:00:02 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Jun 2006 14:00:02 -0500 Subject: [Bioperl-l] EMBL release 87 format changes. In-Reply-To: Message-ID: <000c01c69564$e68b39b0$15327e82@pyrimidine> That would be great! Post a patch/fix via bugzilla: http://www.bioperl.org/wiki/HOWTO:SubmitPatch and we can add it and test it out. Or if you have CVS access you can do it yourself. Not sure who's taking care of SeqIO::embl at the moment.... Added bit : you'll need to update both next_seq and write_seq. next_seq should probably handle both old and new EMBL format and write_seq should only write new format (unless someone else disagrees???) Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of dwaner at scitegic.com > Sent: Wednesday, June 21, 2006 1:14 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] EMBL release 87 format changes. > > With release 87 of EMBL (June 19th, 2006), there have been some minor > changes to the flat file record format. In particular, the SV (sequence > version) tag has been moved from its own line to a field in the ID line. > See http://www.ebi.ac.uk/embl/Documentation/changesdetails.html. > > Is somone already working on updating the SeqIO::embl parser, or should I > volunteer? 
> > - David > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed Jun 21 17:16:38 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 21 Jun 2006 16:16:38 -0500 Subject: [Bioperl-l] EUtilities interface In-Reply-To: <44998EA0.1010406@sendu.me.uk> Message-ID: <001b01c69577$fc7068f0$15327e82@pyrimidine> > -----Original Message----- > From: Sendu Bala [mailto:bix at sendu.me.uk] > Sent: Wednesday, June 21, 2006 1:23 PM > To: Chris Fields > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] EUtilities interface > > Chris Fields wrote: > > I'm working on a new eutilities interface which I hope to commit by late > > summer. It's basically a rewrite of WebDBSeqI/NCBIHelper. I set up a > > generic web database interface, which I call Bio::DB::WebDBI, and the > > EUtilities interface, Bio::DB::EUtilitiesI. The idea is that you can > query > > NCBI for any information available via Entrez Utilities (i.e. taxonomy, > > pubmed, sequences, dbSNP, Gene, etc); you're not limited to sequence- > only > > info like Bio::DB::WebDBSeqI. > > > > My only concern is confusion over names, particularly WebDBI vs. > WebDBSeqI. > > Does anyone think this will be an issue? > > Well, I don't. Sounds good to me. What's the intended relationship > between WebDBI and EUtilitiesI? Would your work end up in the removal of > direct XML parsing from Bio::DB::Taxonomy::entrez? Or would it just > convert the code that gets the XML to a one line statement or so? Well, right now all it does is use URI to build queries, submit them to Entrez Utilities, then grab the response; I've been hacking at it on and off for a few months now. It needs some error handling and added methods (mainly for proxies and handling WebEnv/query_key), though once I have it in decent enough shape I'll go ahead and add it to CVS. 
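The "use URI to build queries" step could look something like the sketch below. It is only an illustration under stated assumptions, not the Bio::DB::EUtilitiesI code: the base URL and the parameter names (db, term, usehistory) are taken from NCBI's public E-utilities documentation as it stands today, and eutil_uri is a hypothetical helper name. The fetch itself is left commented out so the sketch runs offline.

```perl
use strict;
use warnings;
use URI;

# Assumption: current NCBI E-utilities base URL, not taken from the thread.
my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/';

# Hypothetical helper: build the request URL for one eutility
# (esearch, efetch, epost, ...) from a hash of query parameters.
sub eutil_uri {
    my ($eutil, %params) = @_;
    my $uri = URI->new($base . $eutil . '.fcgi');
    # Sort the keys so the generated URL is reproducible between runs.
    $uri->query_form(map { $_ => $params{$_} } sort keys %params);
    return $uri;
}

my $uri = eutil_uri('esearch',
                    db         => 'taxonomy',
                    term       => 'Homo sapiens',
                    usehistory => 'y');
print $uri->as_string, "\n";

# Submitting the query and grabbing the response would then be roughly:
# use LWP::UserAgent;
# my $response = LWP::UserAgent->new->get($uri);
# print $response->decoded_content if $response->is_success;
```

Keeping URL construction separate from fetching means the same helper can serve any Entrez database, which seems to be the point of not tying the interface to sequences.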
Theoretically once the response is returned it can be parsed like any stream (see WebDBSeqI/NCBIHelper for an idea of how sequences are parsed and returned using SeqIO). This should work as long as there is an appropriate class to handle the data stream and the appropriate 'plugin' to parse the data into objects; i.e. dbSNP can be handled by ClusterIO::dbSNP, sequences by SeqIO::genbank/fasta, pubmed by Bio::Biblio::IO::pubmedxml, and so on. If you don't have an object or want the raw data stream, you could submit a request using the various eutility (efetch, epost, esearch) and save as raw format to an output file or STDOUT. Here's a rough diagram: |------------------->Bio::DB::DBFetch (EBI interface)----->plugins for Bio* classes Bio::Root::Root | LWP::UserAgent ------Bio::DB::WebDBI------>Bio::DB::EUtilitiesI (NCBI interface)----->plugins for Bio* classes | |------------------->others? You probably don't need a Bio::*IO::plugin for each type; tax data in Bioperl seems to primarily utilizes the NCBI Tax database, so Bio::DB::Taxonomy::entrez shouldn't be too hard to adapt to act as a plugin. Bio::DB::Taxonomy::entrez uses XML::Twig to parse everything into Bio::Taxonomy::Node objects and is able to retrieve single and multiple ID's using the same method, though I would probably use XML::SAX instead. If I remember correctly there were issues with Bio::DB::Taxonomy that you brought up... Chris From bix at sendu.me.uk Thu Jun 22 09:28:25 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 22 Jun 2006 14:28:25 +0100 Subject: [Bioperl-l] Bio::Map changes In-Reply-To: <44985915.8010607@sendu.me.uk> References: <44985915.8010607@sendu.me.uk> Message-ID: <449A9AF9.2000305@sendu.me.uk> Sendu Bala wrote: > Some initial changes have been made to some modules in Bio::Map to allow > Positions to have a range (Bio::Map::PositionI implements Bio::RangeI). 
> (see http://bugzilla.open-bio.org/show_bug.cgi?id=1998) > > Further changes are needed in some remaining Bio::Map modules for this > addition to be complete Range is now done. The next step is to tidy up all of Bio::Map*, which involves a major reimplementation of the whole system (but with no significant API change). Basically, the current system is a awkward mix of older 'marker has a single position on a map' and new 'markers have multiple positions on multiple maps'. This gives us strange things like SimpleMap's add_element method which adds a reference to the element to the map without the element itself knowing it is now on the map (because it is Position that defines what maps an element is on). The reimplementation will make Position central to the model, allowing for lots of other things to work properly without anything becoming inconsistent (as is currently the case). The general tidy up will involve redoing and perhaps even removing things. For instance, OrderedPositionWithDistance has never worked so will be deleted (with OrderedPosition gaining the distance functionality its docs says it already has). But now is the time to speak up and change my mind if necessary! From golharam at umdnj.edu Thu Jun 22 17:05:00 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Thu, 22 Jun 2006 17:05:00 -0400 Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package) Message-ID: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1> Hi all, I'm trying to use Bio::Tools::Phylo::PAML to parse the results from baseml in the PAML package to measure the distances of some non-coding regions. I started with the coding regions, and used the script bp_pairwise_kaks.pl without much trouble. Now, I'm trying to do something similar for non-coding regions. However, when I call Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef' meaning matrix was never defined. 
I wanted to find out if anyone on here has done this before or knows a way to measure substitution frequencies of non-coding regions with the PAML package. The documentation with PAML is sparse so I'm not sure how to interpret its output directly - that's why I'm using Bioperl.

Hopefully someone can help me before I start digging into the code...Thanks.

Ryan

From n.haigh at sheffield.ac.uk Fri Jun 23 02:43:48 2006
From: n.haigh at sheffield.ac.uk (Nathan S. Haigh)
Date: Fri, 23 Jun 2006 07:43:48 +0100
Subject: [Bioperl-l] CVS Export
Message-ID: <000001c69690$61afb540$b07f6f58@nathan243dd61f>

I may have asked this previously, but I can't find the answer to my question anywhere so I'll have to ask it again, sorry.

Is it possible to export files/directories from cvs that have changed between two tags/branches/head? Specifically, I'd like to export (as I don't want the cvs administrative directories) files that have been added to Bioperl since the 1.4 release.

Cheers
Nath

---------------------------------------------------------------------------- ------
Dr. Nathan S. Haigh MPharmacol. Ph.D.
Bioinformatics PostDoctoral Research Associate
Room B2 211                                Tel: +44 (0)114 22 20112
Department of Animal and Plant Sciences    Mob: +44 (0)7742 533 569
University of Sheffield                    Fax: +44 (0)114 22 20002
Western Bank                               Web: www.bioinf.shef.ac.uk
Sheffield                                       www.petraea.shef.ac.uk
S10 2TN
---------------------------------------------------------------------------- ------ From cjfields at uiuc.edu Fri Jun 23 10:58:24 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 23 Jun 2006 09:58:24 -0500 Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package) In-Reply-To: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1> Message-ID: <000301c696d5$7da6c640$15327e82@pyrimidine> ***sounds of crickets*** Ryan, It's a pretty good possibility that Jason and the rest are on the road to conferences and such. There's been mention of a Durham, NC meeting and, of course, YAPC is happening soon as well. I wish I could help but I know diddly about PAML besides the HOWTO on the wiki (though I may be using it myself soon). Sorry, you may have to be a bit patient for a more productive response. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Ryan Golhar > Sent: Thursday, June 22, 2006 4:05 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML > package) > > Hi all, > > I'm trying to use Bio::Tools::Phylo::PAML to parse the results from > baseml in the PAML package to measure the distances of some non-coding > regions. > > I started with the coding regions, and used the script > bp_pairwise_kaks.pl without much trouble. Now, I'm trying to do > something similar for non-coding regions. However, when I call > Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef' > meaning matrix was never defined. > > I wanted to find out if anyone on here has done this before or knows a > way to measure substitution frequencies of non-coding regions with the > PAML package. The documentation with PAML is sparse so I'm not sure how > to interpret its output directly - that's why I'm using Bioperl. > > Hopefully someone can help me before I start digging into the > code...Thanks. 
> > Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From MEC at stowers-institute.org Fri Jun 23 14:27:19 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Fri, 23 Jun 2006 13:27:19 -0500 Subject: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate output Message-ID: Guy, I've just downloaded and installed your latest 1.1.0 version of exonerate but unfortunately did not find any mention in the ChangeLog of addressing this bug, though I still see in the TODO: o Should GFF show all coordinates on the +ve strand? (jason_p2g eg) I was half expecting to see this fixed in this version based on this old thread. Can you please confirm that it has not yet been addressed, and accept my request that you continue to keep this change on your list for future versions... Also, might you elaborate on this entry from the ChangeLog. I don't see it mentioned in the manpage. o Added %tcs etc to --ryo for dumping coding sequences Thanks, Malcolm Cook >-----Original Message----- >From: bioperl-l-bounces at portal.open-bio.org >[mailto:bioperl-l-bounces at portal.open-bio.org] On Behalf Of Guy Slater >Sent: Friday, September 02, 2005 11:52 AM >To: Cook, Malcolm >Cc: bioperl-l >Subject: RE: [Bioperl-l] methods, etc. for Bio::SearchIO on >exonerate output > >On Fri, 2 Sep 2005, Cook, Malcolm wrote: > >> Hmmmm - I'd better get some clarification from Guy too. >> >> Guy, if you don't mind reading the thread below and chiming in on our >> discussion of interpreting the output of your excellent exonerate >> program: >> >> The sections of the manpage ( >> >> http://www.ebi.ac.uk/~guy/exonerate/exonerate.man.1.html) that appear >> relevant are these 2 excerpts: >> >> 1) When an alignment is reported on the reverse complement of a >> sequence, the coordinates are simply given on the reverse complement >> copy of the sequence. 
Hence positions on the sequences are never >> negative. Generally, the forward strand is indicated by '+', >the reverse >> strand by '-', and an unknown or not-applicable strand (as >in the case >> of a protein sequence) is indicated by '.' " >> >> 2) --forwardcoordinates By default, all coordinates are >> reported on the forward strand. Setting this option to false >reverts to >> the old behaviour (pre-0.8.3) whereby alignments on the reverse >> complement of a sequence are reported using coordinates on >the reverse >> complement. >> >> We see GFF DUMP coordinates still reported on the reverse stand >> regardless of the setting of --forwardcoordinates. So these two >> excerpts from you manpage seem contradictory to me. Unless I >> understand `--forwardcoordinates FALSE` to only effect the >coordinates >> reported in the alignment section, not in the GFF DUMP >section, which is >> what it appears to do in practice. >> >> Guy, can you confirm that the --forwardcoordinates option >has no effect >> on GFF output? >> > >Hi, > >Yes, it has no effect, and this is a bug >(sorry - it was due to my misinterpretation of the GFF2 spec) >- its on the list of things to be fixed for exonerate 1.1 (soon) > >> Further, can you tell us if you plan to comport more closely >to the GFF >> spec, in particular in this case by reporting >forwardcoordinates in the >> GFF DUMP section too? I see >> I see in your TODO list " o Should GFF show all coordinates on the >> +ve strand? (jason_p2g eg)". Hear hear! I second the motion. >> >> And TODO item " GFF3 support ? http://song.sf.net/" gets my >vote too.... >> though this is more of a sticky wicket.... >> > >Yup, GFF3 support is on the list, >but probably it will not be done in time for exonerate 1.1 >Of course, I'd welcome a patch ... ;) > >(I'm mainly working on getting the cdna2genome > and genome2genome models working properly for 1.1) > >Cheers, > >Guy. > >> Cheers and Thanks! 
>> >> Malcolm Cook >> >> >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich at duke.edu] >> Sent: Friday, September 02, 2005 9:46 AM >> To: Cook, Malcolm >> Cc: bioperl-l >> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate >> output >> >> >> I've already talked to Guy about some of this and I assume >fixes will be >> part of the next release, but it can't hurt to have more people >> requesting. The main problem right now is reverse strand hits in GFF >> output are still screwed up even if you provide the >--forwardcoordinates >> option. >> >> If someone wanted to write/donate a VULGAR to GFF subroutine (okay >> VULGAR to a list of Bio::Search::HSP::GenericHSP). We can also >> reconstruct everything needed from that, I gave a stab at it >once, but >> there was something missing (or maybe it was pre --forwardcoordinates >> option). >> >> >> -jason >> >> On Sep 2, 2005, at 10:36 AM, Cook, Malcolm wrote: >> >> >> Jason, >> >> Thanks for the scripts and clues (esp re: using the --ryo option to >> inject the needed length into the exonerate output to compensate). >> >> I'm considering asking exonerate author to comport with GFF spec. Do >> you think this is a road to take? >> >> Cheers, >> >> Malcolm >> >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich at duke.edu] >> Sent: Wednesday, August 31, 2005 12:35 PM >> To: Cook, Malcolm >> Cc: bioperl-l >> Subject: Re: [Bioperl-l] methods, etc. for Bio::SearchIO on exonerate >> output >> >> >> >http://fungal.genome.duke.edu/~jes12/software/scripts/process_e >xonerate_ >> gff3.pl >> >> You may still want to massage it some, but I use the script in this >> basic form, maybe with a few tweaks: >> >> Note that it requires you to run exonerate with specific >--ryo options >> so that it includes the length of the query and hit sequences in the >> report output. should be covered in the perldoc in the script. 
>> >> Without the ryo options enabled, you'll need to modify the >script more >> to have access to the original sequence db, use >Bio::DB::Fasta, and put >> in some $dbh->length($seqid) calls instead. >> >> I don't think the part which writes HSP/match lines is >actually correct >> - it is trying to roll gapped HSPs from the similarity features. >> >> I end up ignoring all but the 'exon' and 'gene' lines for my gbrowse >> instance and/or grepping out the lines I really think I need. >> You may want to s/exon/CDS/ for the protein2genome output as well. >> >> -jason >> >> On Aug 31, 2005, at 1:04 PM, Cook, Malcolm wrote: >> >> >> Jason, >> >> This message is in regards to an old thread in which you offered to >> shared a 'script for munging over' exonerate output for lading in >> DB::GFF (c.f. >> >> http://bioperl.org/pipermail/bioperl-l/2005-April/018741.html) >> >> Would you be willing to still share that script, if you've got it >> around? >> >> Thanks, and regards, >> >> Malcolm Cook - >> mec at stowers-institute.org - 816-926-4449 >> Database Applications Manager - Bioinformatics >> Stowers Institute for Medical Research - Kansas City, MO USA >> >> >> >> >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> >> >> > >-- >%!PS % <------ Guy St.C. 
Slater ------>
>http://www.ebi.ac.uk/~guy/ <------
>210 297/a{def}def/b{translate}a b 36/c{rotate}a c 0 1 0 1
>12/d{exch moveto}
>a/e{closepath stroke}a/f{index}a/g{0 0 0 0 4
>f}a/h{setlinewidth newpath dup
>g}a{pop exch 1 f add 0 h neg d lineto 72 c lineto e 2 h d 3 f
>0 108 arc d e
>18 c 0 2 f neg b 18 c}for 72 c newpath add g 0 7 arc d e pop showpage
>
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at portal.open-bio.org
>http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

From oldham at ucla.edu Fri Jun 23 12:18:39 2006
From: oldham at ucla.edu (Michael Oldham)
Date: Fri, 23 Jun 2006 09:18:39 -0700
Subject: [Bioperl-l] Output a subset of FASTA data from a single largefile
In-Reply-To:
Message-ID:

Hello again,

I finally got it to work, using the following script. However, it takes about 5 hours to run on a fast computer. Using grep (in bash), on the other hand, takes about 5 minutes (see below if you are interested). Thanks to everyone for your help!

SLOW perl script:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_all_X';

unless (open(IDFILE, $IDs)) {
    print "Could not open file $IDs!\n";
}

my $probes = 'HG_U95Av2_probe_fasta';

unless (open(PROBES, $probes)) {
    print "Could not open file $probes!\n";
}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";

my @ID = <IDFILE>;
print @ID;
chomp @ID;

while (my $line = <PROBES>) {
    foreach my $identifier (@ID) {
        if($line=~/^>probe:\w+:$identifier:/) {
            print OUT $line;
            print OUT scalar(<PROBES>);
        }
    }
}
exit;

FAST bash script:

#!/usr/bin/bash

exec<"ID_all_X"
while read line; do
    echo $line
    grep -A 1 :$line: HG_U95Av2_probe_fasta >>myresults.txt
done

-----Original Message-----
From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
Sent: Wednesday, June 14, 2006 6:48 AM
To: Michael Oldham; Chris Fields
Cc: bioperl-l at lists.open-bio.org
Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single largefile

Did you try my one-liner?
Anyway, try this

1) predeclare $idmatch before the while loop
2) use `select OUT` and print with no args to get $_ into it

like this:

#!/usr/bin/perl -w

use strict;

my $IDs = 'ID_dat.txt';

unless (open(IDFILE, $IDs)) {
    print "Could not open file $IDs!\n";
}

my $probes = 'HG_U95Av2_probe_fasta.txt';

unless (open(PROBES, $probes)) {
    print "Could not open file $probes!\n";
}

open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
select OUT;
my @ID = <IDFILE>;
chomp @ID;
my %ID = map {($_, 1)} @ID;  #Note: This creates a hash with keys=PSIDs and all values=1.

my $idmatch;
while (<PROBES>) {
    $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
    if ($idmatch){
        print;
    }
}
exit;

>-----Original Message-----
>From: Michael Oldham [mailto:oldham at ucla.edu]
>Sent: Tuesday, June 13, 2006 9:03 PM
>To: Cook, Malcolm; Chris Fields
>Cc: bioperl-l at lists.open-bio.org
>Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>single largefile
>
>Dear Malcolm, Chris, et al,
>
>Thanks to everyone for your helpful suggestions. When I run the code
>below using an ID list ('ID_dat.txt') containing all 8175 IDs, the
>output file is still blank. If I replace this list with a single ID
>("542_at"), it works:
>
>>probe:HG_U95Av2:542_at:628:567; Interrogation_Position=116; Antisense;
>GCGCAGCAGCGAGAATTTCGACGAG
>>probe:HG_U95Av2:542_at:610:383; Interrogation_Position=128; Antisense;
>GAATTTCGACGAGCTGCTGAAGGCA
>>probe:HG_U95Av2:542_at:289:599; Interrogation_Position=134; Antisense;
>CGACGAGCTGCTGAAGGCACTGGGT
>........etc.
>
>If I try a list of two IDs ("542_at" and "31799_at"), only the last one
>is present in the output:
>
>>probe:HG_U95Av2:31799_at:181:1; Interrogation_Position=1086; Antisense;
>GTTCATCACAAATCTATTGTGCTTG
>>probe:HG_U95Av2:31799_at:534:511; Interrogation_Position=1126; Antisense;
>GTCCACTAAATGTAGTAACGAAATG
>>probe:HG_U95Av2:31799_at:226:183; Interrogation_Position=1127; Antisense;
>TCCACTAAATGTAGTAACGAAATGT
>........etc.
> >The same thing seems to happen if I go to 3 IDs, or 4 IDs >(only the last >ID is present in the output file). At this point I have no idea why >this is happening, and I am not sure how to interpret >Malcolm's comment: > >oops, > >s/matches on of/matches one of/ >s/nothing that/noting that/ > >Any ideas? Thanks again................! > >Mike O. > > >#!/usr/bin/perl -w > >use strict; > >my $IDs = 'ID_dat.txt'; > >unless (open(IDFILE, $IDs)) { > print "Could not open file $IDs!\n"; > } > >my $probes = 'HG_U95Av2_probe_fasta.txt'; > >unless (open(PROBES, $probes)) { > print "Could not open file $probes!\n"; > } > >open (OUT,'>','probe_subset.txt') or die "Can't write output: $!"; > >my @ID = ; >chomp @ID; >my %ID = map {($_, 1)} @ID; #Note: This creates a hash with keys=PSIDs >and all values=1. > > while () { > my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/; > if ($idmatch){ > print OUT; > print OUT scalar(); > } > } >exit; > > >-----Original Message----- >From: Cook, Malcolm [mailto:MEC at stowers-institute.org] >Sent: Monday, June 12, 2006 8:48 AM >To: Cook, Malcolm; Chris Fields; Michael Oldham >Cc: bioperl-l at lists.open-bio.org >Subject: RE: [Bioperl-l] Output a subset of FASTA data from a single >largefile > > >oops, > >s/matches on of/matches one of/ >s/nothing that/noting that/ > >--Malcolm > > >>-----Original Message----- >>From: bioperl-l-bounces at lists.open-bio.org >>[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of >>Cook, Malcolm >>Sent: Monday, June 12, 2006 10:29 AM >>To: Chris Fields; Michael Oldham >>Cc: bioperl-l at lists.open-bio.org >>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a >>single largefile >> >>Michael, >> >>I don't think you can call perl's `print` on just a filehandle as you >>are doing. This is probably your problem. >> >>If you call `select OUT` after opeining it, print will print $_ to it. 
>>And, every line in the fasta record whose header matches on of the IDS >>will get printed, not just the fasta header lines. Read the >code again >>nothing that $idmatch is only getting reset when a correctly formatted >>fasta header line is matched. >> >>--Malcolm >> >> >>>-----Original Message----- >>>From: Chris Fields [mailto:cjfields at uiuc.edu] >>>Sent: Saturday, June 10, 2006 11:32 PM >>>To: Michael Oldham >>>Cc: Cook, Malcolm; bioperl-l at lists.open-bio.org >>>Subject: Re: [Bioperl-l] Output a subset of FASTA data from a >>>single large file >>> >>>What happens if you just print $idmatch or $1 (i.e. check to see if >>>the regex matches anything)? If there is nothing printed >>then either >>>the regex isn't working as expected or there is something logically >>>wrong. The problem may be that the captured string must >>match the id >>>exactly, the id being the key to the %ID hash; any extra characters >>>picked up by the regex outside of your id key and you will not get >>>anything. Looking at Malcolm's regex it should work just fine, but >>>we only had one example sequence to try here. >>> >>>If your while loop is set up like this won't it only print only the >>>matched description lines to the outfile (no sequence) even if there >>>is a match? Or is this what you wanted? If you want the sequence >>>you should add 'print OUT ;' after the 'print OUT;' line. >>> >>>Chris >>> >>>On Jun 9, 2006, at 8:39 PM, Michael Oldham wrote: >>> >>>> Thanks to everyone for their helpful advice. I think I am getting >>>> closer, >>>> but no cigar quite yet. The script below runs quickly with no >>>> errors--but >>>> the output file is empty. It seems that the problem must lie >>>> somewhere in >>>> the 'while' loop, and I'm sure it's quite obvious to a more >>>> experienced >>>> eye--but not to mine! Any suggestions? Thanks again for >your help. >>>> >>>> --Mike O. 
>>>>
>>>>
>>>> #!/usr/bin/perl -w
>>>>
>>>> use strict;
>>>>
>>>> my $IDs = 'ID.dat.txt';
>>>>
>>>> unless (open(IDFILE, $IDs)) {
>>>>     print "Could not open file $IDs!\n";
>>>> }
>>>>
>>>> my $probes = 'HG_U95Av2_probe_fasta.txt';
>>>>
>>>> unless (open(PROBES, $probes)) {
>>>>     print "Could not open file $probes!\n";
>>>> }
>>>>
>>>> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>>>>
>>>> my @ID = <IDFILE>;
>>>> chomp @ID;
>>>> my %ID = map {($_, 1)} @ID;  #Note: This creates a hash with keys=PSIDs and all values=1.
>>>>
>>>> while (<PROBES>) {
>>>>     my $idmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/;
>>>>     if ($idmatch){
>>>>         print OUT;
>>>>     }
>>>> }
>>>> exit;
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Cook, Malcolm [mailto:MEC at stowers-institute.org]
>>>> Sent: Friday, June 09, 2006 7:58 AM
>>>> To: Michael Oldham; bioperl-l at lists.open-bio.org
>>>> Subject: RE: [Bioperl-l] Output a subset of FASTA data from a
>>>> single large file
>>>>
>>>>
>>>>
>>>> I wouldn't use bioperl for this, or create an index. Perl would do fine
>>>> and probably be faster.
>>>>
>>>> Assuming your ids are one per line in a file named id.dat looking like
>>>> this
>>>>
>>>> 1138_at
>>>> 1134_at
>>>> etc..
>>>>
>>>> this should work:
>>>>
>>>> perl -n -e 'BEGIN{open(idfile, shift) or die "no can open"; @ID = <idfile>; chomp @ID; %ID = map {($_, 1)} @ID;} $inmatch = exists($ID{$1}) if /^>probe:\w+:(\w+):/; print if $inmatch' id.dat mybigfile.fa
>>>>
>>>> good luck
>>>>
>>>> --Malcolm Cook
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org
>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of
>>>>> Michael Oldham
>>>>> Sent: Thursday, June 08, 2006 9:08 PM
>>>>> To: bioperl-l at lists.open-bio.org
>>>>> Subject: [Bioperl-l] Output a subset of FASTA data from a
>>>>> single large file
>>>>>
>>>>> Dear all,
>>>>>
>>>>> I am a total Bioperl newbie struggling to accomplish a conceptually simple
>>>>> task. I have a single large fasta file containing about 200,000 probe
>>>>> sequences (from an Affymetrix microarray), each of which looks like this:
>>>>>
>>>>> >probe:HG_U95Av2:1138_at:395:301; Interrogation_Position=2631; Antisense;
>>>>> TGGCTCCTGCTGAGGTCCCCTTTCC
>>>>>
>>>>> What I would like to do is extract from this file a subset of ~130,800
>>>>> probes (both the header and the sequence) and output this subset into a new
>>>>> fasta file. These 130,800 probes correspond to 8,175 probe set IDs
>>>>> ("1138_at" is the probe set ID in the header listed above); I have these
>>>>> 8,175 IDs listed in a separate file. I *think* that I managed to create an
>>>>> index of all 200,000 probes in the original fasta file using the following
>>>>> script:
>>>>>
>>>>> #!/usr/bin/perl -w
>>>>>
>>>>> # script 1: create the index
>>>>>
>>>>> use Bio::Index::Fasta;
>>>>> use strict;
>>>>> my $Index_File_Name = shift;
>>>>> my $inx = Bio::Index::Fasta->new(
>>>>>     -filename => $Index_File_Name,
>>>>>     -write_flag => 1);
>>>>> $inx->make_index(@ARGV);
>>>>>
>>>>> I'm not sure if this is the most sensible approach, and even if it is, I'm
>>>>> not sure what to do next.
Any help would be greatly appreciated! >>>>> >>>>> Many thanks, >>>>> Mike O. >>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> No virus found in this outgoing message. >>>>> Checked by AVG Free Edition. >>>>> Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date: >>>>> 6/8/2006 >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> -- >>>> No virus found in this incoming message. >>>> Checked by AVG Free Edition. >>>> Version: 7.1.394 / Virus Database: 268.8.3/359 - Release Date: >>>> 6/8/2006 >>>> >>>> -- >>>> No virus found in this outgoing message. >>>> Checked by AVG Free Edition. >>>> Version: 7.1.394 / Virus Database: 268.8.3/360 - Release Date: >>>> 6/9/2006 >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>>Christopher Fields >>>Postdoctoral Researcher >>>Lab of Dr. Robert Switzer >>>Dept of Biochemistry >>>University of Illinois Urbana-Champaign >>> >>> >>> >>> >> >>_______________________________________________ >>Bioperl-l mailing list >>Bioperl-l at lists.open-bio.org >>http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >-- >No virus found in this incoming message. >Checked by AVG Free Edition. >Version: 7.1.394 / Virus Database: 268.8.3/361 - Release Date: >6/11/2006 > >-- >No virus found in this outgoing message. >Checked by AVG Free Edition. >Version: 7.1.394 / Virus Database: 268.8.4/363 - Release Date: >6/13/2006 > > -- No virus found in this incoming message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.8.4/363 - Release Date: 6/13/2006 -- No virus found in this outgoing message. Checked by AVG Free Edition. 
Version: 7.1.394 / Virus Database: 268.9.2/373 - Release Date: 6/22/2006 From pmiguel at purdue.edu Sat Jun 24 10:17:46 2006 From: pmiguel at purdue.edu (Phillip SanMiguel) Date: Sat, 24 Jun 2006 10:17:46 -0400 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: References: Message-ID: <449D498A.9020107@purdue.edu> Brian Osborne wrote: > Jay, > > Excellent! Now we need to answer a few more questions for ourselves: > > - Do we remove the file bptutorial.pl from the package now? I'd say yes, we > don't want to have to maintain two bptutorials. > I would be very disappointed to lose one part of bptutorial.pl--this was described in Tisdall's _Mastering Perl for Bioinformatics_. It is the only purpose I've ever used bptutorial.pl for--to find all the methods available to any given object. Eg: bptutorial.pl 100 Bio::PrimarySeq ***Methods for Object Bio::PrimarySeq ******** Methods taken from package Bio::IdentifiableI lsid_string namespace_string Methods taken from package Bio::PrimarySeq accession accession_number alphabet authority can_call_new desc description direct_seq_set display_id display_name id is_circular length namespace new object_id primary_id seq subseq validate_seq version Methods taken from package Bio::PrimarySeqI moltype revcom translate trunc Methods taken from package Bio::Root::Root DESTROY confess debug throw verbose Methods taken from package Bio::Root::RootI carp deprecated stack_trace stack_trace_dump throw_not_implemented warn warn_not_implemented Phillip SanMiguel From sdavis2 at mail.nih.gov Sat Jun 24 10:45:52 2006 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Sat, 24 Jun 2006 10:45:52 -0400 Subject: [Bioperl-l] Output a subset of FASTA data from a singlelargefile References: Message-ID: <001a01c6979c$ff576dd0$6501a8c0@WATSON> ----- Original Message ----- From: "Michael Oldham" To: "Cook, Malcolm" ; "Chris Fields" Cc: Sent: Friday, June 23, 2006 12:18 PM Subject: Re: [Bioperl-l] Output a subset of FASTA data from a 
singlelargefile

> Hello again,
>
> I finally got it to work, using the following script. However, it takes
> about 5 hours to run on a fast computer. Using grep (in bash), on the
> other hand, takes about 5 minutes (see below if you are interested).
> Thanks to everyone for your help!
>
> SLOW perl script:
>
> #!/usr/bin/perl -w
>
> use strict;
>
> my $IDs = 'ID_all_X';
>
> unless (open(IDFILE, $IDs)) {
>     print "Could not open file $IDs!\n";
> }
>
> my $probes = 'HG_U95Av2_probe_fasta';
>
> unless (open(PROBES, $probes)) {
>     print "Could not open file $probes!\n";
> }
>
> open (OUT,'>','probe_subset.txt') or die "Can't write output: $!";
>
> my @ID = <IDFILE>;
> print @ID;
> chomp @ID;
>
> while (my $line = <PROBES>) {
>     foreach my $identifier (@ID) {
>         if($line=~/^>probe:\w+:$identifier:/) {
>             print OUT $line;
>             print OUT scalar(<PROBES>);
>         }
>     }
> }

This could probably be done MUCH faster using a hash on the sequence identifier. (I have to admit that I didn't follow the first part of this conversation, so I could be misunderstanding some part of what you are trying to do.) If you have a couple hundred-thousand sequences, my guess is that it could be done in under 30 seconds, but I could be wrong about the exact time.

The important part is to make a hash of your sequences with the key being the identifier. Then, loop through your @ID array doing something like (untested):

#open files as before and read in @ID as before
my %seq_hash;
while (my $line = <PROBES>) {
    # key each sequence on the ID captured from its header line;
    # a later record with the same ID overwrites an earlier one
    if ($line =~ /^>probe:\w+:(\w+):/) {
        $seq_hash{$1} = <PROBES>;
    }
}
foreach my $id (@ID) {
    print OUT ">$id\n" .
$seq_hash{$id}; } From arareko at campus.iztacala.unam.mx Sat Jun 24 11:27:03 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Sat, 24 Jun 2006 10:27:03 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <449D498A.9020107@purdue.edu> References: <449D498A.9020107@purdue.edu> Message-ID: <449D59C7.4030008@campus.iztacala.unam.mx> Hi Philip, Have you tried the Deobfuscator interface? It's a newer and better way to browse all the methods available in BioPerl: http://bioperl.org/wiki/Deobfuscator http://bioperl.org/cgi-bin/deob_interface.cgi Regards, Mauricio. Phillip SanMiguel wrote: > Brian Osborne wrote: >> Jay, >> >> Excellent! Now we need to answer a few more questions for ourselves: >> >> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we >> don't want to have to maintain two bptutorials. >> > I would be very disappointed to lose one part of bptutorial.pl--this was > described in Tisdall's _Mastering Perl for Bioinformatics_. It is the > only purpose I've ever used bptutorial.pl for--to find all the methods > available to any given object. 
Eg: > > bptutorial.pl 100 Bio::PrimarySeq > > ***Methods for Object Bio::PrimarySeq ******** > > > Methods taken from package Bio::IdentifiableI > lsid_string namespace_string > > Methods taken from package Bio::PrimarySeq > accession accession_number alphabet authority can_call_new desc > description direct_seq_set display_id display_name id is_circular > length namespace new object_id primary_id seq > subseq validate_seq version > > Methods taken from package Bio::PrimarySeqI > moltype revcom translate trunc > > Methods taken from package Bio::Root::Root > DESTROY confess debug throw verbose > > Methods taken from package Bio::Root::RootI > carp deprecated stack_trace stack_trace_dump > throw_not_implemented warn > warn_not_implemented > > > Phillip SanMiguel > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From golharam at umdnj.edu Sat Jun 24 10:43:29 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Sat, 24 Jun 2006 10:43:29 -0400 Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package) In-Reply-To: <2B2F07C4-E4C8-484D-BA72-FFB2A2C7EBE1@bioperl.org> Message-ID: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1> I've managed to code three methods to calculate K into a Perl script using the algorithms as described in "Molecular Evolution" by Wen-Hsiung Li. I'd be happy to contribute it as a script... 
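The script mentioned above was not posted, and the message does not say which three methods it implements. Purely as an illustration, and assuming two standard estimators for aligned non-coding DNA (the Jukes-Cantor one-parameter and Kimura two-parameter distances, which are my choice here and not confirmed by the thread), K for one pair of sequences might be computed roughly as follows. The sub names are hypothetical and are not part of BioPerl.

```perl
use strict;
use warnings;

# Count compared sites, transitions and transversions between two aligned
# sequences. Columns with a gap or an ambiguous base are skipped.
sub count_differences {
    my ($seq1, $seq2) = @_;
    die "sequences must be the same length\n"
        unless length($seq1) == length($seq2);
    my %is_purine = (A => 1, G => 1);
    my ($sites, $ts, $tv) = (0, 0, 0);
    for my $i (0 .. length($seq1) - 1) {
        my $x = uc substr($seq1, $i, 1);
        my $y = uc substr($seq2, $i, 1);
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $sites++;
        next if $x eq $y;
        my $px = $is_purine{$x} ? 1 : 0;
        my $py = $is_purine{$y} ? 1 : 0;
        # Same chemical class (purine or pyrimidine) means a transition.
        if ($px == $py) { $ts++ } else { $tv++ }
    }
    return ($sites, $ts, $tv);
}

# Jukes-Cantor: K = -(3/4) ln(1 - 4p/3), p = proportion of differing sites.
sub jukes_cantor {
    my ($sites, $ts, $tv) = @_;
    return undef unless $sites;
    my $arg = 1 - 4 * (($ts + $tv) / $sites) / 3;
    return undef if $arg <= 0;    # too divergent, distance undefined
    return -0.75 * log($arg);
}

# Kimura two-parameter: K = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)],
# P = proportion of transitions, Q = proportion of transversions.
sub kimura_2p {
    my ($sites, $ts, $tv) = @_;
    return undef unless $sites;
    my ($P, $Q) = ($ts / $sites, $tv / $sites);
    my ($w1, $w2) = (1 - 2 * $P - $Q, 1 - 2 * $Q);
    return undef if $w1 <= 0 || $w2 <= 0;
    return -0.5 * log($w1 * sqrt($w2));
}

# Example with two made-up 10 bp intron fragments.
my @counts = count_differences('AAAAAAAAAA', 'GAAAAAAAAC');
my $jc  = jukes_cantor(@counts);
my $k2p = kimura_2p(@counts);
printf "sites=%d ts=%d tv=%d JC=%.4f K2P=%.4f\n", @counts, $jc, $k2p;
```

Both subs return undef when the sequences are too divergent for the logarithm to be defined, so a caller should check for that before averaging or printing.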
-----Original Message----- From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich Sent: Saturday, June 24, 2006 9:40 AM To: golharam at umdnj.edu Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package) baseml is not well-supported to my knowledge - I think I started with attempt to capture a small amount of the data in the file. There are some people who have made modifications to possible parse it in-house but I know of no submitted patches. Many of the knowledgeable people are probably at the evolution meetings this week. I have no idea about the full set of information in the report files without going back to the Yang papers first. It depends on how much of that information you really want to capture of just the substitution rates. I'm Ccing Alisha in case she has ideas/solutions from her drosophila work+PAML. -jason On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote: > Hi all, > > I'm trying to use Bio::Tools::Phylo::PAML to parse the results from > baseml in the PAML package to measure the distances of some non-coding > regions. > > I started with the coding regions, and used the script > bp_pairwise_kaks.pl without much trouble. Now, I'm trying to do > something similar for non-coding regions. However, when I call > Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef' > meaning matrix was never defined. > > I wanted to find out if anyone on here has done this before or knows a > way to measure substitution frequencies of non-coding regions with the > PAML package. The documentation with PAML is sparse so I'm not > sure how > to interpret its output directly - that's why I'm using Bioperl. > > Hopefully someone can help me before I start digging into the > code...Thanks. 
> > Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From pmiguel at purdue.edu Sat Jun 24 12:59:21 2006 From: pmiguel at purdue.edu (Phillip SanMiguel) Date: Sat, 24 Jun 2006 12:59:21 -0400 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <449D59C7.4030008@campus.iztacala.unam.mx> References: <449D498A.9020107@purdue.edu> <449D59C7.4030008@campus.iztacala.unam.mx> Message-ID: <449D6F69.1090104@purdue.edu> Yes I have. It is very useful. But in situations where I don't have web access? Or I am working with Bioperl 1.5? Mauricio Herrera Cuadra wrote: > Hi Philip, > > Have you tried the Deobfuscator interface? It's a newer and better way > to browse all the methods available in BioPerl: > > http://bioperl.org/wiki/Deobfuscator > http://bioperl.org/cgi-bin/deob_interface.cgi > > Regards, > Mauricio. > > Phillip SanMiguel wrote: > >> Brian Osborne wrote: >> >>> Jay, >>> >>> Excellent! Now we need to answer a few more questions for ourselves: >>> >>> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we >>> don't want to have to maintain two bptutorials. >>> >>> >> I would be very disappointed to lose one part of bptutorial.pl--this was >> described in Tisdall's _Mastering Perl for Bioinformatics_. It is the >> only purpose I've ever used bptutorial.pl for--to find all the methods >> available to any given object. 
Eg: >> >> bptutorial.pl 100 Bio::PrimarySeq >> >> ***Methods for Object Bio::PrimarySeq ******** >> >> >> Methods taken from package Bio::IdentifiableI >> lsid_string namespace_string >> >> Methods taken from package Bio::PrimarySeq >> accession accession_number alphabet authority can_call_new desc >> description direct_seq_set display_id display_name id is_circular >> length namespace new object_id primary_id seq >> subseq validate_seq version >> >> Methods taken from package Bio::PrimarySeqI >> moltype revcom translate trunc >> >> Methods taken from package Bio::Root::Root >> DESTROY confess debug throw verbose >> >> Methods taken from package Bio::Root::RootI >> carp deprecated stack_trace stack_trace_dump >> throw_not_implemented warn >> warn_not_implemented >> >> >> Phillip SanMiguel >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > From arareko at campus.iztacala.unam.mx Sat Jun 24 13:35:54 2006 From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra) Date: Sat, 24 Jun 2006 12:35:54 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <449D6F69.1090104@purdue.edu> References: <449D498A.9020107@purdue.edu> <449D59C7.4030008@campus.iztacala.unam.mx> <449D6F69.1090104@purdue.edu> Message-ID: <449D77FA.70103@campus.iztacala.unam.mx> Currently I'm modifying the Deobfuscator so it'd be capable of browsing the different BioPerl packages as well as their respective releases, but I haven't had much spare time to finish it :( Dave and I committed the Deobfuscator into the bioperl-live source tree (in the /doc directory), so it'd be included in future releases of BioPerl. I'm also working on a command line version which won't need a CGI environment to have the same functionality; this would address the web access situation that you mention. Phillip SanMiguel wrote: > Yes I have. It is very useful. 
> But in situations where I don't have web access? Or I am working with > Bioperl 1.5? > > Mauricio Herrera Cuadra wrote: >> Hi Philip, >> >> Have you tried the Deobfuscator interface? It's a newer and better way >> to browse all the methods available in BioPerl: >> >> http://bioperl.org/wiki/Deobfuscator >> http://bioperl.org/cgi-bin/deob_interface.cgi >> >> Regards, >> Mauricio. >> >> Phillip SanMiguel wrote: >> >>> Brian Osborne wrote: >>> >>>> Jay, >>>> >>>> Excellent! Now we need to answer a few more questions for ourselves: >>>> >>>> - Do we remove the file bptutorial.pl from the package now? I'd say yes, we >>>> don't want to have to maintain two bptutorials. >>>> >>>> >>> I would be very disappointed to lose one part of bptutorial.pl--this was >>> described in Tisdall's _Mastering Perl for Bioinformatics_. It is the >>> only purpose I've ever used bptutorial.pl for--to find all the methods >>> available to any given object. Eg: >>> >>> bptutorial.pl 100 Bio::PrimarySeq >>> >>> ***Methods for Object Bio::PrimarySeq ******** >>> >>> >>> Methods taken from package Bio::IdentifiableI >>> lsid_string namespace_string >>> >>> Methods taken from package Bio::PrimarySeq >>> accession accession_number alphabet authority can_call_new desc >>> description direct_seq_set display_id display_name id is_circular >>> length namespace new object_id primary_id seq >>> subseq validate_seq version >>> >>> Methods taken from package Bio::PrimarySeqI >>> moltype revcom translate trunc >>> >>> Methods taken from package Bio::Root::Root >>> DESTROY confess debug throw verbose >>> >>> Methods taken from package Bio::Root::RootI >>> carp deprecated stack_trace stack_trace_dump >>> throw_not_implemented warn >>> warn_not_implemented >>> >>> >>> Phillip SanMiguel >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> >> > > 
_______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- MAURICIO HERRERA CUADRA arareko at campus.iztacala.unam.mx Laboratorio de Genética Unidad de Morfofisiología y Función Facultad de Estudios Superiores Iztacala, UNAM From jason at bioperl.org Sat Jun 24 09:39:56 2006 From: jason at bioperl.org (Jason Stajich) Date: Sat, 24 Jun 2006 09:39:56 -0400 Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package) In-Reply-To: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1> References: <007101c6963f$865b1520$2f01a8c0@GOLHARMOBILE1> Message-ID: <2B2F07C4-E4C8-484D-BA72-FFB2A2C7EBE1@bioperl.org> baseml is not well-supported to my knowledge - I think I started with an attempt to capture a small amount of the data in the file. There are some people who have made modifications to possibly parse it in-house but I know of no submitted patches. Many of the knowledgeable people are probably at the evolution meetings this week. I have no idea about the full set of information in the report files without going back to the Yang papers first. It depends on how much of that information you really want to capture, or just the substitution rates. I'm Ccing Alisha in case she has ideas/solutions from her Drosophila work+PAML. -jason On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote: > Hi all, > > I'm trying to use Bio::Tools::Phylo::PAML to parse the results from > baseml in the PAML package to measure the distances of some non-coding > regions. > > I started with the coding regions, and used the script > bp_pairwise_kaks.pl without much trouble. Now, I'm trying to do > something similar for non-coding regions. However, when I call > Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef' > meaning matrix was never defined. 
> > I wanted to find out if anyone on here has done this before or knows a > way to measure substitution frequencies of non-coding regions with the > PAML package. The documentation with PAML is sparse so I'm not > sure how > to interpret its output directly - that's why I'm using Bioperl. > > Hopefully someone can help me before I start digging into the > code...Thanks. > > Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From pmiguel at purdue.edu Sat Jun 24 13:48:15 2006 From: pmiguel at purdue.edu (Phillip SanMiguel) Date: Sat, 24 Jun 2006 13:48:15 -0400 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <449D77FA.70103@campus.iztacala.unam.mx> References: <449D498A.9020107@purdue.edu> <449D59C7.4030008@campus.iztacala.unam.mx> <449D6F69.1090104@purdue.edu> <449D77FA.70103@campus.iztacala.unam.mx> Message-ID: <449D7ADF.3030604@purdue.edu> Yes, that would be better than bptutorial.pl 100 then. For some modules bptutorial.pl 100 doesn't seem to give any of the methods they have access to. Whereas the deobfuscator does. Mauricio Herrera Cuadra wrote: > Currently I'm modifying the Deobfuscator so it'd be capable of > browsing the different BioPerl packages as well as their respective > releases, but haven't got many spare time to finish it :( > > Dave and I committed the Deobfuscator into the bioperl-live source > tree (in /doc directory), so it'd be included in future releases of > BioPerl. I'm also working on a command line version which won't need a > CGI environment to have the same functionality, this would address the > web access situation that you mention. > > Phillip SanMiguel wrote: >> Yes I have. It is very useful. >> But in situations where I don't have web access? Or I am working with >> Bioperl 1.5? 
>> >> Mauricio Herrera Cuadra wrote: >>> Hi Philip, >>> >>> Have you tried the Deobfuscator interface? It's a newer and better >>> way to browse all the methods available in BioPerl: >>> >>> http://bioperl.org/wiki/Deobfuscator >>> http://bioperl.org/cgi-bin/deob_interface.cgi >>> >>> Regards, >>> Mauricio. >>> >>> Phillip SanMiguel wrote: >>> >>>> Brian Osborne wrote: >>>> >>>>> Jay, >>>>> >>>>> Excellent! Now we need to answer a few more questions for ourselves: >>>>> >>>>> - Do we remove the file bptutorial.pl from the package now? I'd >>>>> say yes, we >>>>> don't want to have to maintain two bptutorials. >>>>> >>>> I would be very disappointed to lose one part of >>>> bptutorial.pl--this was described in Tisdall's _Mastering Perl for >>>> Bioinformatics_. It is the only purpose I've ever used >>>> bptutorial.pl for--to find all the methods available to any given >>>> object. Eg: >>>> >>>> bptutorial.pl 100 Bio::PrimarySeq >>>> >>>> ***Methods for Object Bio::PrimarySeq ******** >>>> >>>> >>>> Methods taken from package Bio::IdentifiableI >>>> lsid_string namespace_string >>>> >>>> Methods taken from package Bio::PrimarySeq >>>> accession accession_number alphabet authority >>>> can_call_new desc >>>> description direct_seq_set display_id display_name id >>>> is_circular >>>> length namespace new object_id primary_id seq >>>> subseq validate_seq version >>>> >>>> Methods taken from package Bio::PrimarySeqI >>>> moltype revcom translate trunc >>>> >>>> Methods taken from package Bio::Root::Root >>>> DESTROY confess debug throw verbose >>>> >>>> Methods taken from package Bio::Root::RootI >>>> carp deprecated stack_trace stack_trace_dump >>>> throw_not_implemented warn >>>> warn_not_implemented >>>> >>>> >>>> Phillip SanMiguel >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>>> >>> >> >> 
_______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From jason at bioperl.org Sat Jun 24 14:42:57 2006 From: jason at bioperl.org (Jason Stajich) Date: Sat, 24 Jun 2006 14:42:57 -0400 Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package) In-Reply-To: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1> References: <000101c6979c$910a1fd0$2f01a8c0@GOLHARMOBILE1> Message-ID: You should look at the Align::DNAStatistics module if you just want pairwise DNA distance. I put in several different distance methods. Or you can use the distance methods implemented in PHYLIP or EMBOSS programs -- I thought you wanted the somewhat more sophisticated ML approaches that are implemented in PAML? --jason On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote: > I've managed to code three methods to calculate K into a perl script > using the algorithms as described in "Molecular Evolution" by Wen- > Hsuing > Li. I'd be happy to contribute it as a script... > > > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of > Jason > Stajich > Sent: Saturday, June 24, 2006 9:40 AM > To: golharam at umdnj.edu > Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway > Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from > PAML > package) > > > baseml is not well-supported to my knowledge - I think I started with > attempt to capture a small amount of the data in the file. There are > some people who have made modifications to possible parse it in-house > but I know of no submitted patches. Many of the knowledgeable > people are probably at the evolution meetings this week. > > I have no idea about the full set of information in the report files > without going back to the Yang papers first. It depends on how much > of that information you really want to capture of just the > substitution rates. 
> > I'm Ccing Alisha in case she has ideas/solutions from her drosophila > work+PAML. > > -jason > On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote: > >> Hi all, >> >> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from >> baseml in the PAML package to measure the distances of some non- >> coding > >> regions. >> >> I started with the coding regions, and used the script >> bp_pairwise_kaks.pl without much trouble. Now, I'm trying to do >> something similar for non-coding regions. However, when I call >> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef' >> meaning matrix was never defined. >> >> I wanted to find out if anyone on here has done this before or >> knows a > >> way to measure substitution frequencies of non-coding regions with >> the > >> PAML package. The documentation with PAML is sparse so I'm not >> sure how >> to interpret its output directly - that's why I'm using Bioperl. >> >> Hopefully someone can help me before I start digging into the >> code...Thanks. 
>> >> Ryan >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Sat Jun 24 15:07:06 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 24 Jun 2006 14:07:06 -0500 Subject: [Bioperl-l] http://www.bioperl.org/wiki/Bptutorial.pl In-Reply-To: <449D7ADF.3030604@purdue.edu> References: <449D498A.9020107@purdue.edu> <449D59C7.4030008@campus.iztacala.unam.mx> <449D6F69.1090104@purdue.edu> <449D77FA.70103@campus.iztacala.unam.mx> <449D7ADF.3030604@purdue.edu> Message-ID: As a quickie method I use the script from the FAQ; you have to install Class::Inspector: #!/usr/bin/perl -w use Class::Inspector; $class = shift || die "Usage: methods perl_class_name\n"; eval "require $class"; print join ("\n", sort @{Class::Inspector->methods ($class,'full','public')}), "\n"; Works well, though doesn't have the links and so on like Deobfuscator; I use HTML-generated ActiveState docs: glaciers-115 chris$ methods.pl Bio::SeqIO Bio::Root::IO::catfile Bio::Root::IO::close Bio::Root::IO::dup Bio::Root::IO::exists_exe Bio::Root::IO::file Bio::Root::IO::flush Bio::Root::IO::gensym Bio::Root::IO::mode Bio::Root::IO::noclose Bio::Root::IO::qualify Bio::Root::IO::qualify_to_ref Bio::Root::IO::rmtree Bio::Root::IO::tempdir Bio::Root::IO::tempfile Bio::Root::IO::ungensym Bio::Root::Root::debug Bio::Root::Root::except Bio::Root::Root::finally Bio::Root::Root::otherwise Bio::Root::Root::throw Bio::Root::Root::try Bio::Root::Root::verbose Bio::Root::Root::with Bio::Root::RootI::carp Bio::Root::RootI::confess Bio::Root::RootI::deprecated Bio::Root::RootI::stack_trace Bio::Root::RootI::stack_trace_dump Bio::Root::RootI::throw_not_implemented Bio::Root::RootI::warn Bio::Root::RootI::warn_not_implemented Bio::SeqIO::DESTROY 
Bio::SeqIO::PRINT Bio::SeqIO::READLINE Bio::SeqIO::TIEHANDLE Bio::SeqIO::alphabet Bio::SeqIO::fh Bio::SeqIO::location_factory Bio::SeqIO::new Bio::SeqIO::newFh Bio::SeqIO::next_seq Bio::SeqIO::object_factory Bio::SeqIO::sequence_builder Bio::SeqIO::sequence_factory Bio::SeqIO::write_seq Chris On Jun 24, 2006, at 12:48 PM, Phillip SanMiguel wrote: > > Yes, that would be better than bptutorial.pl 100 then. For some > modules > bptutorial.pl 100 doesn't seem to give any of the methods they have > access to. Whereas the deobfuscator does. > > Mauricio Herrera Cuadra wrote: >> Currently I'm modifying the Deobfuscator so it'd be capable of >> browsing the different BioPerl packages as well as their respective >> releases, but haven't got many spare time to finish it :( >> >> Dave and I committed the Deobfuscator into the bioperl-live source >> tree (in /doc directory), so it'd be included in future releases of >> BioPerl. I'm also working on a command line version which won't >> need a >> CGI environment to have the same functionality, this would address >> the >> web access situation that you mention. >> >> Phillip SanMiguel wrote: >>> Yes I have. It is very useful. >>> But in situations where I don't have web access? Or I am working >>> with >>> Bioperl 1.5? >>> >>> Mauricio Herrera Cuadra wrote: >>>> Hi Philip, >>>> >>>> Have you tried the Deobfuscator interface? It's a newer and better >>>> way to browse all the methods available in BioPerl: >>>> >>>> http://bioperl.org/wiki/Deobfuscator >>>> http://bioperl.org/cgi-bin/deob_interface.cgi >>>> >>>> Regards, >>>> Mauricio. >>>> >>>> Phillip SanMiguel wrote: >>>> >>>>> Brian Osborne wrote: >>>>> >>>>>> Jay, >>>>>> >>>>>> Excellent! Now we need to answer a few more questions for >>>>>> ourselves: >>>>>> >>>>>> - Do we remove the file bptutorial.pl from the package now? I'd >>>>>> say yes, we >>>>>> don't want to have to maintain two bptutorials. 
>>>>>> >>>>> I would be very disappointed to lose one part of >>>>> bptutorial.pl--this was described in Tisdall's _Mastering Perl for >>>>> Bioinformatics_. It is the only purpose I've ever used >>>>> bptutorial.pl for--to find all the methods available to any given >>>>> object. Eg: >>>>> >>>>> bptutorial.pl 100 Bio::PrimarySeq >>>>> >>>>> ***Methods for Object Bio::PrimarySeq ******** >>>>> >>>>> >>>>> Methods taken from package Bio::IdentifiableI >>>>> lsid_string namespace_string >>>>> >>>>> Methods taken from package Bio::PrimarySeq >>>>> accession accession_number alphabet authority >>>>> can_call_new desc >>>>> description direct_seq_set display_id display_name id >>>>> is_circular >>>>> length namespace new object_id primary_id seq >>>>> subseq validate_seq version >>>>> >>>>> Methods taken from package Bio::PrimarySeqI >>>>> moltype revcom translate trunc >>>>> >>>>> Methods taken from package Bio::Root::Root >>>>> DESTROY confess debug throw verbose >>>>> >>>>> Methods taken from package Bio::Root::RootI >>>>> carp deprecated stack_trace stack_trace_dump >>>>> throw_not_implemented warn >>>>> warn_not_implemented >>>>> >>>>> >>>>> Phillip SanMiguel >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>>> >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From pmiguel at purdue.edu Sat Jun 24 15:37:08 2006 From: pmiguel at purdue.edu (Phillip SanMiguel) Date: Sat, 24 Jun 2006 15:37:08 -0400 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN? Message-ID: <449D9464.6030508@purdue.edu> Here is an example bug: http://bugzilla.open-bio.org/show_bug.cgi?id=1682 It was a bug fixed in a module in BioPerl 1.4 back in October of 2004. The module was Bio::Seq::QualI. The patch resulted in v. 1.7 of the module. However the version of the module currently available from CPAN is 1.6. (That is the current "stable" release, BioPerl 1.4.0) I've written a script that relies on that bug being fixed. How should I deal with this when I want to give the script to others to use? Just tell them "You must have BioPerl 1.5 installed". Give them instructions for patching the module code? How long before the next "stable" release? Maybe a year? Should not a BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or would that be very difficult? By the way, I think the revision graph viewer is great for someone, at best, peripherally involved in BioPerl to figure out which module version is associated with which BioPerl version, for example: http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/QualI.pm?graph=1 Phillip SanMiguel From golharam at umdnj.edu Sat Jun 24 14:57:52 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Sat, 24 Jun 2006 14:57:52 -0400 Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package) In-Reply-To: Message-ID: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1> Hi Jason, It looks like DNAStatistics is only for coding sequences. I'm trying to calculate the Ks of exons and the K (or Ki) of introns. All the methods in bioperl are based on coding sequences. Only the PAUP package (that I've found) does non-coding sequences. 
I would have used it but you need to pay for it and we don't have the funding to purchase much at the moment. I briefly looked at PHYLIP and EMBOSS but they didn't look as straightforward as I was hoping they would be. Either that, or I was getting frustrated looking for a simple solution. In the end, I found a molecular evolution book that talks about several methods used for non-coding sequences so I went ahead and implemented them. They seem to work well. Ryan -----Original Message----- From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason Stajich Sent: Saturday, June 24, 2006 2:43 PM To: golharam at umdnj.edu Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway' Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package) You should look at the Align::DNAStatistics module if you just want pairwise DNA distance. I put in several different distance methods. Or you can use the distance methods implemented in PHYLIP or EMBOSS programs -- I thought you wanted the somewhat more sophisticated ML approaches that are implemented in PAML? --jason On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote: > I've managed to code three methods to calculate K into a perl script > using the algorithms as described in "Molecular Evolution" by Wen- > Hsuing > Li. I'd be happy to contribute it as a script... > > > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of > Jason > Stajich > Sent: Saturday, June 24, 2006 9:40 AM > To: golharam at umdnj.edu > Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway > Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from > PAML > package) > > > baseml is not well-supported to my knowledge - I think I started with > attempt to capture a small amount of the data in the file. There are > some people who have made modifications to possible parse it in-house > but I know of no submitted patches. 
Many of the knowledgeable > people are probably at the evolution meetings this week. > > I have no idea about the full set of information in the report files > without going back to the Yang papers first. It depends on how much > of that information you really want to capture of just the > substitution rates. > > I'm Ccing Alisha in case she has ideas/solutions from her drosophila > work+PAML. > > -jason > On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote: > >> Hi all, >> >> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from >> baseml in the PAML package to measure the distances of some non- >> coding > >> regions. >> >> I started with the coding regions, and used the script >> bp_pairwise_kaks.pl without much trouble. Now, I'm trying to do >> something similar for non-coding regions. However, when I call >> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef' >> meaning matrix was never defined. >> >> I wanted to find out if anyone on here has done this before or >> knows a > >> way to measure substitution frequencies of non-coding regions with >> the > >> PAML package. The documentation with PAML is sparse so I'm not >> sure how >> to interpret its output directly - that's why I'm using Bioperl. >> >> Hopefully someone can help me before I start digging into the >> code...Thanks. >> >> Ryan >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From golharam at umdnj.edu Sat Jun 24 18:37:15 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Sat, 24 Jun 2006 18:37:15 -0400 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output redirect? 
Message-ID: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1> I'm using Bio::Tools::Run::Alignment::ClustalW to generate some alignments and parsing the resulting alignments. The ClustalW output is being sent to STDOUT. Is there a way I can redirect the output to STDERR instead? Here's how I'm using it: my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); my $aa_aln = $aln_factory->align(\@aa_seq); (Forgive me if it is in the docs - I've been coding for a week straight now including Saturday) Thanks, Ryan From cjfields at uiuc.edu Sat Jun 24 20:16:40 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 24 Jun 2006 19:16:40 -0500 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN? In-Reply-To: <449D9464.6030508@purdue.edu> References: <449D9464.6030508@purdue.edu> Message-ID: <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote: > Here is an example bug: > > http://bugzilla.open-bio.org/show_bug.cgi?id=1682 > > It was a bug fixed in a module in BioPerl 1.4 back in October of > 2004. > The module was Bio::Seq::QualI. The patch resulted in v. 1.7 of the > module. However the version of the module currently available from > CPAN > is 1.6. (That is the current "stable" release, BioPerl 1.4.0) > > I've written a script that relies on that bug being fixed. How > should I > deal with this when I want to give the script to others to use? Just > tell them "You must have BioPerl 1.5 installed". Give them > instructions > for patching the module code? A BioPerl module version is not the same as the distribution version. All the modules have different version numbers corresponding to CVS commits for various code changes. If you want to see the version for the distribution, read this: http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F Many 'bug fixes', you'll find, have less to do with problems/bugs in BioPerl code than they do with outside code changes beyond our control. 
By that I mean changes to other programs modify output so parsers break (BLAST, PAML, etc), or changes to API for remote databases that break queries (recent changes in EBI database concerning Swissprot, for example). So, the code is considered 'stable' at the time of release, but past that point issues beyond our control may break certain modules parsing output, accessing remote databases, and so on, at any time. This link: http://www.bioperl.org/wiki/FAQ#BioPerl_in_General should answer a few more questions you may have. The FAQ is very helpful... In general, if there are problems with code you could look at the latest developer's release (1.5.1, released in Oct 2005) to see if any bugs have been fixed. They may be fixed post-1.5.1 and will be in CVS; you can always suggest using 1.5.1 (it's pretty stable) and updating only the fixed modules from CVS if needed. > How long before the next "stable" release? Maybe a year? Should not a > BioPerl 1.4.1 be released so CPAN would get bug fixes like this > one? Or > would that be very difficult? No, it's not that easy. BioPerl isn't like most CPAN modules with one or two developers. See the wiki page for details on planning releases to see why: http://www.bioperl.org/wiki/Making_a_BioPerl_release It takes a lot of effort and coordination, much more so than the average CPAN module. I believe some of the core developers are meeting this weekend; maybe something will come of that and we'll get an idea of a next release. 
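On the question of what to tell people who receive a script that depends on a fix: one option, sketched here only as an illustration, is to have the script check the installed version itself and stop with a readable message. The helper name `module_at_least` is hypothetical. Which module reports the distribution version, and what version string the release containing a given fix carries, are assumptions to be checked against the FAQ linked above, so the minimum is left as a command-line argument.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: true if $module can be loaded
# and reports a $VERSION of at least $min.
sub module_at_least {
    my ($module, $min) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    # Module not installed (or fails to compile): treat as "too old".
    return 0 unless eval { require $file; 1 };
    # UNIVERSAL::VERSION dies when the module's $VERSION is missing or
    # lower than the requested minimum.
    return eval { $module->VERSION($min); 1 } ? 1 : 0;
}

# Assumption: Bio::Perl carries the distribution version. The minimum is
# taken from the command line rather than guessed here.
my $min = shift @ARGV;
if (defined $min && !module_at_least('Bio::Perl', $min)) {
    die "This script needs BioPerl $min or later; please upgrade.\n";
}
```

The same check cannot see fixes that were applied to a single module in CVS after a release, so for those cases a short note naming the patched file is probably still needed.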
Chris > By the way, I think the revision graph viewer is great for someone, at > best, peripherally involved in BioPerl to figure out which module > version is associated with which BioPerl version, for example: > > http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/ > QualI.pm?graph=1 > Phillip SanMiguel > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From hlapp at gmx.net Sat Jun 24 21:02:36 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 24 Jun 2006 21:02:36 -0400 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN? In-Reply-To: <449D9464.6030508@purdue.edu> References: <449D9464.6030508@purdue.edu> Message-ID: On Jun 24, 2006, at 3:37 PM, Phillip SanMiguel wrote: > Here is an example bug: > > http://bugzilla.open-bio.org/show_bug.cgi?id=1682 > > It was a bug fixed in a module in BioPerl 1.4 back in October of > 2004. > The module was Bio::Seq::QualI. The patch resulted in v. 1.7 of the > module. However the version of the module currently available from > CPAN > is 1.6. (That is the current "stable" release, BioPerl 1.4.0) > > I've written a script that relies on that bug being fixed. How > should I > deal with this when I want to give the script to others to use? Just > tell them "You must have BioPerl 1.5 installed". Give them > instructions > for patching the module code? Either way. If the patch is trivial you could also provide the patch as an option. Generally we don't support that though. (Not everything that we don't support we don't support because it doesn't work. Sometimes it's just a statement along 'it-probably-works-but-don't- bug-us-if-it-doesn't'.) > > How long before the next "stable" release? Maybe a year? 
Should not a > BioPerl 1.4.1 be released so CPAN would get bug fixes like this > one? Or > would that be very difficult? 1.5.1 fixes a number of other problems too, so there isn't really much reason to upgrade to 1.4.1 if there was one instead of to 1.5.1, so investing time into creating 1.4.1 we think is not the best investment we can make. Our current goal is to release 1.5.2 and possibly more development versions all leading on a steady path to 1.6.0. There's very few (but significant) stumbling blocks on this path that will require I believe some dedicated time from a couple of people and after that there shouldn't be any real obstacles. It's quite possible that at BOSC or as early as next week at the GMOD meeting we could see a leap forward, typically it's those meetings that pull the respective people away from their daily obligations (short of an actual hackathons). Some time back in spring 1.6 was put in proximity to BOSC, but that's probably not going to happen, but quite possibly not that much afterwards. -hilmar > > By the way, I think the revision graph viewer is great for someone, at > best, peripherally involved in BioPerl to figure out which module > version is associated with which BioPerl version, for example: > > http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Seq/ > QualI.pm?graph=1 > Phillip SanMiguel > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sat Jun 24 21:21:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 24 Jun 2006 20:21:56 -0500 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output redirect? 
 In-Reply-To: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1> Message-ID: <000301c697f5$c08d1150$15327e82@pyrimidine> According to the docs ( ;> ) the default behaviour is to return "a BioPerl Bio::SimpleAlign object which can then be printed and/or saved in multiple formats using the AlignIO.pm module"; you should be able to do something like: use Bio::AlignIO; ... my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); my $aa_aln = $aln_factory->align(\@aa_seq); my $al_out = Bio::AlignIO->new(-format => 'clustalw', -fh => \*STDERR); $al_out->write_aln($aa_aln); Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Ryan Golhar > Sent: Saturday, June 24, 2006 5:37 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output redirect? > > I'm using Bio::Tools::Run::Alignment::ClustalW to generate some > alignments and parsing the resulting alignments. > > The ClustalW output is being sent to STDOUT. Is there a way I can > redirect the output to STDERR instead?
 > > Here's how I'm using it: > > my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); > my $aa_aln = $aln_factory->align(\@aa_seq); > > (Forgive me if it is in the docs - I've been coding for a week straight now > including Saturday) > > Thanks, Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From jason at bioperl.org Sat Jun 24 21:38:06 2006 From: jason at bioperl.org (Jason Stajich) Date: Sat, 24 Jun 2006 21:38:06 -0400 Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package) In-Reply-To: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1> References: <000601c697c0$19c42dc0$2f01a8c0@GOLHARMOBILE1> Message-ID: <3DDB9678-C8DC-4E7F-8564-EF385A768688@bioperl.org> They make no assumption about coding sequence; where do you get that impression? The Ka/Ks methods are for coding sequence, but the Tamura/Nei, Kimura, and Jukes-Cantor methods are all for any type of sequence. PHYLIP and EMBOSS are pretty straightforward IMHO - you give them an alignment and you get out a matrix of pairwise numbers. But whatever makes sense to you - we are using the same methods as are in Li's book (that is where I took the equations from). -j On Jun 24, 2006, at 2:57 PM, Ryan Golhar wrote: > Hi Jason, > > It looks like DNAStatistics is only for coding sequences. I'm > trying to > calculate the Ks of exons and the K (or Ki) of introns. All the > methods > in bioperl are based on coding sequences. Only the PAUP package > (that > I've found) does non-coding sequences. I would have used it but you > need to pay for it and we don't have the funding to purchase much > at the > moment. > > I briefly looked at PHYLIP and EMBOSS but it didn't look as > straight-forward as I was hoping it would be. Either that, or I was > getting frustrated looking for a simple solution.
> > In the end, I found a molecular evolution book that talks about > several > methods used for non-coding sequences so I went ahead and implemented > them. They seem to work well. > > Ryan > > > -----Original Message----- > From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of > Jason > Stajich > Sent: Saturday, June 24, 2006 2:43 PM > To: golharam at umdnj.edu > Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway' > Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from > PAML > package) > > > You should look at the Align::DNAStatistics module if you just want > pairwise DNA distance. I put in several different distance methods. > Or you can use the distance methods implemented in PHYLIP or EMBOSS > programs -- I thought you wanted the somewhat more sophisticated ML > approaches that are implemented in PAML? > > --jason > > On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote: > >> I've managed to code three methods to calculate K into a perl script >> using the algorithms as described in "Molecular Evolution" by Wen- >> Hsuing >> Li. I'd be happy to contribute it as a script... >> >> >> >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of >> Jason >> Stajich >> Sent: Saturday, June 24, 2006 9:40 AM >> To: golharam at umdnj.edu >> Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway >> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from >> PAML >> package) >> >> >> baseml is not well-supported to my knowledge - I think I started with >> attempt to capture a small amount of the data in the file. There are >> some people who have made modifications to possible parse it in-house >> but I know of no submitted patches. Many of the knowledgeable >> people are probably at the evolution meetings this week. >> >> I have no idea about the full set of information in the report files >> without going back to the Yang papers first. 
It depends on how much >> of that information you really want to capture of just the >> substitution rates. >> >> I'm Ccing Alisha in case she has ideas/solutions from her drosophila >> work+PAML. >> >> -jason >> On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote: >> >>> Hi all, >>> >>> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from >>> baseml in the PAML package to measure the distances of some non- >>> coding >> >>> regions. >>> >>> I started with the coding regions, and used the script >>> bp_pairwise_kaks.pl without much trouble. Now, I'm trying to do >>> something similar for non-coding regions. However, when I call >>> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef' >>> meaning matrix was never defined. >>> >>> I wanted to find out if anyone on here has done this before or >>> knows a >> >>> way to measure substitution frequencies of non-coding regions with >>> the >> >>> PAML package. The documentation with PAML is sparse so I'm not >>> sure how >>> to interpret its output directly - that's why I'm using Bioperl. >>> >>> Hopefully someone can help me before I start digging into the >>> code...Thanks. >>> >>> Ryan >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Sat Jun 24 21:40:49 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sat, 24 Jun 2006 20:40:49 -0500 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN? In-Reply-To: Message-ID: <000401c697f8$62d41e70$15327e82@pyrimidine> ... > > I've written a script that relies on that bug being fixed. How > > should I > > deal with this when I want to give the script to others to use? 
Just > > tell them "You must have BioPerl 1.5 installed". Give them > > instructions > > for patching the module code? > > Either way. If the patch is trivial you could also provide the patch > as an option. Generally we don't support that though. (Not everything > that we don't support we don't support because it doesn't work. > Sometimes it's just a statement along 'it-probably-works-but-don't- > bug-us-if-it-doesn't'.) The bug was fixed post-1.4 release according to the link, so Phillip should use v1.5.1 or newer. Hilmar's right. It's hard to address every single complaint about code not working or method not implemented w/o having patches or fixes submitted. It's not my top priority to fix bugs in modules submitted by other authors when I don't know the code. I'll try if I have the free time, but that's getting to be a precious commodity lately... > > How long before the next "stable" release? Maybe a year? Should not a > > BioPerl 1.4.1 be released so CPAN would get bug fixes like this > > one? Or > > would that be very difficult? > > 1.5.1 fixes a number of other problems too, so there isn't really > much reason to upgrade to 1.4.1 if there was one instead of to 1.5.1, > so investing time into creating 1.4.1 we think is not the best > investment we can make. > > Our current goal is to release 1.5.2 and possibly more development > versions all leading on a steady path to 1.6.0. There's very few (but > significant) stumbling blocks on this path that will require I > believe some dedicated time from a couple of people and after that > there shouldn't be any real obstacles. It's quite possible that at > BOSC or as early as next week at the GMOD meeting we could see a leap > forward, typically it's those meetings that pull the respective > people away from their daily obligations (short of an actual > hackathons). > > Some time back in spring 1.6 was put in proximity to BOSC, but that's > probably not going to happen, but quite possibly not that much > afterwards. 
 > > -hilmar ... Nice to know. I guess a Release Pumpkin will be picked as well. BOSC is right around the corner so I guess we can expect something announced soon as to a possible roadmap (we can't talk about 'timelines' in the States, it's not patriotic). Chris From golharam at umdnj.edu Sat Jun 24 23:03:01 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Sat, 24 Jun 2006 23:03:01 -0400 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output redirect? In-Reply-To: <000301c697f5$c08d1150$15327e82@pyrimidine> Message-ID: <000301c69803$df899f20$2f01a8c0@GOLHARMOBILE1> Thanks Chris. It is in fact when you call align() that clustalw generates the output that you see on the console. The alignment it generates I'm parsing right away. Here's the output (an example) of what I'm referring to: -- BEGIN -- CLUSTAL W (1.83) Multiple Sequence Alignments Sequence format is Pearson Sequence 1: human 271 aa Sequence 2: mouse 264 aa Start of Pairwise alignments Aligning... Sequences (1:2) Aligned. Score: 90 Guide tree file created: [/tmp/TX4yxP9uKQ/80W87TkT5Z.dnd] Start of Multiple Alignment There are 1 groups Aligning... Group 1: Sequences: 2 Score:5469 Alignment Score 1480 GCG-Alignment file created [/tmp/TX4yxP9uKQ/xE4GNyY7Rc] -- END -- How do I get this to go to stderr instead of stdout? Ryan -----Original Message----- From: Chris Fields [mailto:cjfields at uiuc.edu] Sent: Saturday, June 24, 2006 9:22 PM To: golharam at umdnj.edu; bioperl-l at bioperl.org Subject: RE: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output redirect? According to the docs ( ;> ) the default behaviour is to return "a BioPerl Bio::SimpleAlign object which can then be printed and/or saved in multiple formats using the AlignIO.pm module"; you should be able to do something like: use Bio::AlignIO; ...
 my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); my $aa_aln = $aln_factory->align(\@aa_seq); my $al_out = Bio::AlignIO->new(-format => 'clustalw', -fh => \*STDERR); $al_out->write_aln($aa_aln); Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Ryan Golhar > Sent: Saturday, June 24, 2006 5:37 PM > To: bioperl-l at bioperl.org > Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output > redirect? > > I'm using Bio::Tools::Run::Alignment::ClustalW to generate some > alignments and parsing the resulting alignments. > > The ClustalW output is being sent to STDOUT. Is there a way I can > redirect the output to STDERR instead? > > Here's how I'm using it: > > my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); > my $aa_aln = $aln_factory->align(\@aa_seq); > > (Forgive me if it is in the docs - I've been coding for a week straight > now including Saturday) > > Thanks, Ryan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From golharam at umdnj.edu Sat Jun 24 23:05:41 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Sat, 24 Jun 2006 23:05:41 -0400 Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package) In-Reply-To: <3DDB9678-C8DC-4E7F-8564-EF385A768688@bioperl.org> Message-ID: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1> >>they make no assumption about coding sequence, >>where do you get that impression I get that information from the 1.5 API docs: http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/ It's documented under the description section. Oh well, I have it coded and working...might as well use it.
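For anyone reading this thread in the archive, here is a minimal sketch of the simplest distance named in the discussion, the Jukes-Cantor correction, in plain Perl. It is neither the script mentioned above nor the Bio::Align::DNAStatistics code; the function name jukes_cantor_distance and the choice to skip gapped or ambiguous columns are assumptions made only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): Jukes-Cantor distance between
# two aligned DNA sequences of equal length.
#   d = -(3/4) * ln(1 - (4/3) * p)
# where p is the proportion of compared sites that differ.
sub jukes_cantor_distance {
    my ($seq1, $seq2) = @_;
    die "sequences must be aligned to the same length\n"
        unless length($seq1) == length($seq2);

    my ($compared, $diffs) = (0, 0);
    for my $i (0 .. length($seq1) - 1) {
        my $x = uc substr($seq1, $i, 1);
        my $y = uc substr($seq2, $i, 1);
        # Assumption: ignore columns with a gap or ambiguous base in either sequence.
        next unless $x =~ /^[ACGT]$/ && $y =~ /^[ACGT]$/;
        $compared++;
        $diffs++ if $x ne $y;
    }
    die "no comparable sites\n" unless $compared;

    my $p = $diffs / $compared;
    # The correction is undefined once p reaches 3/4 (saturation); return undef.
    return if $p >= 0.75;
    return -0.75 * log(1 - 4 * $p / 3);
}

# Works the same for coding or non-coding sequence: only the aligned bases matter.
my $d = jukes_cantor_distance('ACGTACGTAC', 'ACGTACGAAC');
printf "Jukes-Cantor distance: %.4f\n", $d;
```

Nothing in this calculation depends on a reading frame, which is why a distance of this kind can be applied to introns as well as exons.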
 Ryan From bix at sendu.me.uk Sun Jun 25 07:33:58 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 25 Jun 2006 12:33:58 +0100 Subject: [Bioperl-l] Bio::Tools::Run::Alignment::ClustalW output redirect? In-Reply-To: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1> References: <000201c697de$be82b9d0$2f01a8c0@GOLHARMOBILE1> Message-ID: <449E74A6.3020709@sendu.me.uk> Ryan Golhar wrote: > I'm using Bio::Tools::Run::Alignment::ClustalW to generate some > alignments and parsing the resulting alignments. > > The ClustalW output is being sent to STDOUT. Is there a way I can > redirect the output to STDERR instead?
 > > Here's how I'm using it: > > my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new(); > my $aa_aln = $aln_factory->align(\@aa_seq); You can suppress the output completely using $aln_factory->quiet(1); (supplying quiet => 1 to new() should also work according to the docs, but doesn't seem to be implemented, though I could be wrong) If you really want the messages on STDERR you could try redirecting STDOUT to STDERR before calling align(): open(OLDOUT, ">&STDOUT") or die "Can't dup STDOUT: $!"; open(STDOUT, ">&STDERR") or die "Can't redirect STDOUT: $!"; my $aa_aln = $aln_factory->align(\@aa_seq); open(STDOUT, ">&OLDOUT") or die "Can't restore STDOUT: $!"; I haven't tested either of these ideas, but I think they should both work - try them out and let us know. Ideally there would be a saner way of doing this, but it isn't readily apparent to me. From jason at bioperl.org Sun Jun 25 08:37:11 2006 From: jason at bioperl.org (Jason Stajich) Date: Sun, 25 Jun 2006 08:37:11 -0400 Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML with (baseml from PAML package)] In-Reply-To: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1> References: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1> Message-ID: On Jun 24, 2006, at 11:05 PM, Ryan Golhar wrote: >>> they make no assumption about coding sequence, >>> where do you get that impression > > I get that information from the 1.5 api docs: > > http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/ > great - I would also always point people to the LIVE code documentation, not the 1.5.0-RC1 docs, which are over a year old, but nothing particular has changed in this module since 1.5.0 that I know of. Someday someone will put a new ball of docs up on the site, but I hope that will come with the next development or stable release. > Its documented under the description section.
 > I don't really see what you refer to since there is a lot of documentation, but perhaps it should be clarified - I had hoped this was a sufficient description: "This object contains routines for calculating various statistics and distances for DNA alignments." > Oh well, I have it coded and working...might as well use it. > Sounds like your best bet for your situation. For the record and in the mailing list archives - as long as you don't call a method whose name contains "KaKs" it will work fine. You can calculate distances using the currently implemented distance methods: JukesCantor, Uncorrected, F81, Kimura, Tamura, F84 (Felsenstein 84), TajimaNei, JinNei. It will be more productive to just drop the discussion since you seem to be fine without all of this anyway - if you decide you would like to use it and contribute new distance methods or doc fixes I am sure we'll enjoy your contributions. -jason -- Jason Stajich jason at bioperl.org http://jason.open-bio.org/ From cjfields at uiuc.edu Sun Jun 25 13:05:34 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Jun 2006 12:05:34 -0500 Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML with(baseml from PAML package)] In-Reply-To: Message-ID: <000901c69879$97b7d5b0$15327e82@pyrimidine> > On Jun 24, 2006, at 11:05 PM, Ryan Golhar wrote: > > >>> they make no assumption about coding sequence, > >>> where do you get that impression > > > > I get that information from the 1.5 api docs: > > > > http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/ > > > great - I would also always point people to the LIVE code > documentation not the 1.5.0-RC1 which is +1 years old, but nothing > particular has changed in this module since 1.5.0 that I know of. > Someday someone will put a new ball of docs up on the site, but I > hope that will come with the next development or stable release.
Though I agree that the bioperl-live code is the best place for docs as it's the most up-to-date, that fact isn't really emphasized much on the docs page; the link is along with the other toolkits at the bottom of the page and is listed as Bioperl Core Code (some users don't seem to get that, in general, bioperl=bioperl core). Could be this is causing a bit of confusion for some. The page hasn't really been updated in a while so that could explain a lot; I'm not sure I can actually do any updating myself (or that I should be able to!). Maybe the best way to go is to have a wiki page for this instead...(thinking aloud, sorry). I was also thinking we should add something to http://doc.bioperl.org indicating the age of the various docs as most are over 2 years old, or at least link to the Release Pumpkin page which indicates the code release date for the various releases: http://www.bioperl.org/wiki/Release_pumpkin Besides that, I agree with pretty much everything that you said; the Bio::Align::DNAStatistics docs seem self-explanatory. BTW, is the following still true (from Bio::Align::DNAStatistics): "The routines are not well tested and do contain errors at this point. Work is underway to correct them, but do not expect this code to give you the right answer currently! Use dnadist/distmat in the PHLYIP or EMBOSS packages to calculate the distances." I'll likely be using this and Bio::Align::ProteinStatistics at some point relatively soon myself so I may be up to some testing on one/both of these modules if needed. Chris .... From golharam at umdnj.edu Sun Jun 25 13:20:12 2006 From: golharam at umdnj.edu (Ryan Golhar) Date: Sun, 25 Jun 2006 13:20:12 -0400 Subject: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML with(baseml from PAML package)] In-Reply-To: <000901c69879$97b7d5b0$15327e82@pyrimidine> Message-ID: <000801c6987b$9e65f840$2f01a8c0@GOLHARMOBILE1> Exactly. 
 Also on the page it says (in the description for Bio::Align::DNAStatistics): In order to use these methods there are several pre-requisites for the alignment. 1 DNA alignment must be based on protein alignment. Use the subroutine aa_to_dna_aln in Bio::Align::Utilities to achieve this. Etc etc etc The rest of the pre-reqs also mention that the sequences should be coding sequences. Because of this, I thought DNAStatistics was only for coding sequences and could not be used for non-coding sequences... Anyway, I've gotten past my troubles and am on to finish this project. I think others might run into the same issues I did. I'd be happy to contribute what I can but need to finish this stuff first... Thanks for all your help Jason, Chris, Sendu! Ryan -----Original Message----- From: Chris Fields [mailto:cjfields at uiuc.edu] Sent: Sunday, June 25, 2006 1:06 PM To: 'Jason Stajich'; golharam at umdnj.edu Cc: bioperl-l at lists.open-bio.org Subject: RE: [Bioperl-l] DNA distance methods [was Bio::Tools::Phylo::PAML with(baseml from PAML package)] > On Jun 24, 2006, at 11:05 PM, Ryan Golhar wrote: > > >>> they make no assumption about coding sequence, > >>> where do you get that impression > > > > I get that information from the 1.5 api docs: > > > > http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/ > > > great - I would also always point people to the LIVE code > documentation not the 1.5.0-RC1 which is +1 years old, but nothing > particular has changed in this module since 1.5.0 that I know of. > Someday someone will put a new ball of docs up on the site, but I hope > that will come with the next development or stable release. Though I agree that the bioperl-live code is the best place for docs as it's the most up-to-date, that fact isn't really emphasized much on the docs page; the link is along with the other toolkits at the bottom of the page and is listed as Bioperl Core Code (some users don't seem to get that, in general, bioperl=bioperl core).
Could be this is causing a bit of confusion for some. The page hasn't really been updated in a while so that could explain a lot; I'm not sure I can actually do any updating myself (or that I should be able to!). Maybe the best way to go is to have a wiki page for this instead...(thinking aloud, sorry). I was also thinking we should add something to http://doc.bioperl.org indicating the age of the various docs as most are over 2 years old, or at least link to the Release Pumpkin page which indicates the code release date for the various releases: http://www.bioperl.org/wiki/Release_pumpkin Besides that, I agree with pretty much everything that you said; the Bio::Align::DNAStatistics docs seem self-explanatory. BTW, is the following still true (from Bio::Align::DNAStatistics): "The routines are not well tested and do contain errors at this point. Work is underway to correct them, but do not expect this code to give you the right answer currently! Use dnadist/distmat in the PHLYIP or EMBOSS packages to calculate the distances." I'll likely be using this and Bio::Align::ProteinStatistics at some point relatively soon myself so I may be up to some testing on one/both of these modules if needed. Chris .... From pmiguel at purdue.edu Sun Jun 25 15:02:14 2006 From: pmiguel at purdue.edu (Phillip SanMiguel) Date: Sun, 25 Jun 2006 15:02:14 -0400 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN? In-Reply-To: <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu> References: <449D9464.6030508@purdue.edu> <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu> Message-ID: <449EDDB6.8020401@purdue.edu> Chris Fields wrote: > On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote: > > [...] >> How long before the next "stable" release? Maybe a year? Should not a >> BioPerl 1.4.1 be released so CPAN would get bug fixes like this one? Or >> would that be very difficult? > > No, it's not that easy. BioPerl isn't like most CPAN modules with one > or two developers. 
 See the wiki page for details on planning releases > to see why: > > http://www.bioperl.org/wiki/Making_a_BioPerl_release > > It takes a lot of effort and coordination, much more so than the > average CPAN module. I believe some of the core developers are > meeting this weekend; maybe something will come of that and we'll get > an idea of a next release. > > Chris Hi Chris, Thanks for the information--the key part being that a bug fix from a couple of years ago has not propagated into the current stable release. Below I'll try to convince you that this is a serious problem. (Not because it is your fault, of course. I'm just trying to deliver my take on the situation to the bioperl-programmer-warriors who happen to be listening...) It isn't a problem for me to edit the offending statement in the QualI.pm module on systems I generally use. Or even install a developer's release of bioperl. My problem is one of advocacy. Maybe I have a warped view of the world, but it seems that except for those directly involved in the bioperl or GMOD projects, everyone looks to CPAN when they install bioperl. I write scripts that I sometimes want to send to biologists even less programming-capable than I am. I can just barely envision those biologists pestering their sysadmin to do a CPAN install of bioperl modules so that my script will work. But installing a non-CPAN set of modules probably isn't going to happen. So, this being the case, how can I, with a clear conscience, advocate bioperl to the junior bioinformaticians with whom I happen to interact? My take, for what it is worth, is that 1.5 has become an unratified stable release. How hard would it be to take 1.5.1--as is--and deposit that in CPAN? What would be the downside? Phillip SanMiguel From hlapp at gmx.net Sun Jun 25 15:42:20 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 25 Jun 2006 15:42:20 -0400 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN?
In-Reply-To: <449EDDB6.8020401@purdue.edu> References: <449D9464.6030508@purdue.edu> <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu> <449EDDB6.8020401@purdue.edu> Message-ID: <6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net> We did not and will not deposit 1.5.1 into CPAN due to the API issues in some (rather central) interfaces. These issues are changes over the 1.4 API and some of those changes are going to go away. Once we deposit it into CPAN we would sanction the changed API as the new 'official' API and would open a huge can of backward liability worms. If you just continue to use the 1.4 API on the 1.5.1 release you don't need to be concerned about an API method you're using going away. As I said, the people from the core group of developers who have traditionally shepherded releases all think that doing a 1.4.1 release wouldn't be the best investment of their time. You are most welcome to disagree and volunteer your time to coordinate the 1.4.1 release, and a lot of people will appreciate your efforts - including the bioperl developers and 'core'. It shouldn't be much work theoretically. -hilmar On Jun 25, 2006, at 3:02 PM, Phillip SanMiguel wrote: > Chris Fields wrote: >> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote: >> >> [...] >>> How long before the next "stable" release? Maybe a year? Should >>> not a >>> BioPerl 1.4.1 be released so CPAN would get bug fixes like this >>> one? Or >>> would that be very difficult? >> >> No, it's not that easy. BioPerl isn't like most CPAN modules with >> one >> or two developers. See the wiki page for details on planning >> releases >> to see why: >> >> http://www.bioperl.org/wiki/Making_a_BioPerl_release >> >> It takes a lot of effort and coordination, much more so than the >> average CPAN module. I believe some of the core developers are >> meeting this weekend; maybe something will come of that and we'll get >> an idea of a next release. 
>> >> Chris > Hi Chris, > Thanks for the information--the key part being that a bug fix > from a > couple of years ago has not propagated into the current stable > release. > Below I'll try to convince you that this is a serious problem. (Not > because it is your fault, of course. I'm just trying to deliver my > take > on the situation to the bioperl-programmer-warriors who happen to be > listening...) > It isn't a problem for me to edit the offending statement in the > QualI.pm module on systems I generally use. Or even install a > developer's release of bioperl. My problem is one of advocacy. Maybe I > have a warped view of the world, but it seems that except for those > directly involved in the bioperl or GMOD projects, everyone looks to > CPAN when they install bioperl. > I write scripts that I sometimes want to send to biologists even > less programming-capable than I am. I can just barely envision those > biologists pestering their sysadmin to do a CPAN install of bioperl > modules so that my script will work. But installing a non-CPAN set of > modules probably isn't going to happen. > So, this being the case, how can I, with a clear conscience, > advocate > bioperl to the junior bioinformaticians with whom I happen to > interact? > My take, for what it is worth, is that 1.5 has become an > unratified > stable release. How hard would it be to take 1.5.1--as is--and deposit > that in CPAN? What would be the downside? 
> > Phillip SanMiguel > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at uiuc.edu Sun Jun 25 16:20:20 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Sun, 25 Jun 2006 15:20:20 -0500 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN? In-Reply-To: <449EDDB6.8020401@purdue.edu> References: <449D9464.6030508@purdue.edu> <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu> <449EDDB6.8020401@purdue.edu> Message-ID: <7C28EA28-031A-4B1C-9625-A643247445FD@uiuc.edu> On Jun 25, 2006, at 2:02 PM, Phillip SanMiguel wrote: > Chris Fields wrote: >> On Jun 24, 2006, at 2:37 PM, Phillip SanMiguel wrote: >> >> [...] >>> How long before the next "stable" release? Maybe a year? Should >>> not a >>> BioPerl 1.4.1 be released so CPAN would get bug fixes like this >>> one? Or >>> would that be very difficult? >> >> No, it's not that easy. BioPerl isn't like most CPAN modules with >> one or two developers. See the wiki page for details on planning >> releases to see why: >> >> http://www.bioperl.org/wiki/Making_a_BioPerl_release >> >> It takes a lot of effort and coordination, much more so than the >> average CPAN module. I believe some of the core developers are >> meeting this weekend; maybe something will come of that and we'll >> get an idea of a next release. >> >> Chris > Hi Chris, > Thanks for the information--the key part being that a bug fix > from a couple of years ago has not propagated into the current > stable release. Below I'll try to convince you that this is a > serious problem. (Not because it is your fault, of course. I'm just > trying to deliver my take on the situation to the bioperl- > programmer-warriors who happen to be listening...) 
> It isn't a problem for me to edit the offending statement in the > QualI.pm module on systems I generally use. Or even install a > developer's release of bioperl. My problem is one of advocacy. > Maybe I have a warped view of the world, but it seems that except > for those directly involved in the bioperl or GMOD projects, > everyone looks to CPAN when they install bioperl. Again, it's not as easy as you make it seem. The idea is to upgrade the CPAN version to stable releases (even numbered) and that odd- numbered releases would be developer versions. Yes, it has been a while since the last stable version; it could be a while until the next as there have been suggestions of an interim 1.5.x release or so before that occurs (though he did say 1.6 could be soon after BOSC which is in August). Hilmar has explained that there are some stumbling blocks to get around before the next major release (if those 'stumbling blocks' are what I think they are, I agree). It's very likely implementation of changes that he mentions may require refactoring code, changing API, etc. Not easy in a project like this, a large core of contributors and with the developers scattered all over the world, all with different priorities (we all have $jobs after all). That's why we have a Release Pumpkin, akin to the Pumpkings that have ushered forth regular perl releases. It requires a large, coordinated effort with one person acting as overseer, pushing everybody to meet deadlines. Not easy and not, by a long shot, your typical CPAN module. > I write scripts that I sometimes want to send to biologists even > less programming-capable than I am. I can just barely envision > those biologists pestering their sysadmin to do a CPAN install of > bioperl modules so that my script will work. But installing a non- > CPAN set of modules probably isn't going to happen. 
> So, this being the case, how can I, with a clear conscience, > advocate bioperl to the junior bioinformaticians with whom I happen > to interact? Give those biologists some credit. Quite frankly, I would expect any bioinformaticist or computational biologist, junior or otherwise, to know or at least learn how to install from CPAN or from CVS, otherwise they need to change their job title. And, as a microbiologist myself (i.e. one of those biologists you mention) and as one who regularly interacts with biologists with little to no computer science experience, I believe I can speak from experience. I find the install documents that come with BioPerl and available on the wiki pretty much cover everything, from how to install to the dependencies required to problems one may encounter. The web site has a ton of documentation, including the FAQ (*cough* which covers this ground *cough*). If they are running perl scripts and using a system that requires sysadmin privileges they probably know what they are doing anyway. If not they probably have students/employees that do know what's going on (and who may be the ones actually running the scripts). You can't please everybody, so I think you can proceed with a clear conscience knowing you did the best that you can to help! > My take, for what it is worth, is that 1.5 has become an > unratified stable release. How hard would it be to take 1.5.1--as > is--and deposit that in CPAN? What would be the downside? Ah I see Hilmar has responded. I think he adequately answers this. API is everything; changing API suddenly is bad bad bad. > Phillip SanMiguel Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From akholloway at ucdavis.edu Mon Jun 26 00:15:16 2006 From: akholloway at ucdavis.edu (Alisha Holloway) Date: Sun, 25 Jun 2006 21:15:16 -0700 Subject: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML package) In-Reply-To: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1> References: <000401c69804$3ea30500$2f01a8c0@GOLHARMOBILE1> Message-ID: Hi Ryan & Jason, Sorry I didn't get back to you sooner. I escaped the central valley heat (108!) and went to the coast for the weekend. I do have a script that will call baseml and then parse the results. Here it is and, Ryan, I can show you how to retrieve other parts of the data as well, but you may already know how to do this. I know it's ugly, I got it working and didn't clean it up. Just let me know if you need more info. Alisha At 11:05 PM -0400 6/24/06, Ryan Golhar wrote: > >>they make no assumption about coding sequence, >>>where do you get that impression > >I get that information from the 1.5 api docs: > >http://doc.bioperl.org/releases/bioperl-1.5.0-RC1/ > >Its documented under the description section. > >Oh well, I have it coded and working...might as well use it. > >Ryan >-----Original Message----- >From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of Jason >Stajich >Sent: Saturday, June 24, 2006 9:38 PM >To: golharam at umdnj.edu >Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway' >Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from PAML >package) > > >they make no assumption about coding sequence,where do you get that >impression. the ka,ks are for coding but the tamura/nei kimura, >jukes-cantor are all for any type of sequence. > >the phylip and emboss are pretty straightforward IMHO - you give it >an alignment and you get out a matrix of pairwise numbers.... >\ >but whatever makes sense to you - we are using the same methods as >are in Li's book (that is where I took the equations from). 
>-j >On Jun 24, 2006, at 2:57 PM, Ryan Golhar wrote: > >> Hi Jason, >> >> It looks like DNAStatistics is only for coding sequences. I'm >> trying to >> calculate the Ks of exons and the K (or Ki) of introns. All the >> methods >> in bioperl are based on coding sequences. Only the PAUP package >> (that >> I've found) does non-coding sequences. I would have used it but you >> need to pay for it and we don't have the funding to purchase much >> at the >> moment. >> >> I briefly looked at PHYLIP and EMBOSS but it didn't look as >> straight-forward as I was hoping it would be. Either that, or I was >> getting frustrated looking for a simple solution. >> >> In the end, I found a molecular evolution book that talks about >> several >> methods used for non-coding sequences so I went ahead and implemented >> them. They seem to work well. >> >> Ryan >> >> >> -----Original Message----- >> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of >> Jason >> Stajich >> Sent: Saturday, June 24, 2006 2:43 PM >> To: golharam at umdnj.edu >> Cc: bioperl-l at lists.open-bio.org; 'Alisha Holloway' >> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from >> PAML >> package) >> >> >> You should look at the Align::DNAStatistics module if you just want >> pairwise DNA distance. I put in several different distance methods. >> Or you can use the distance methods implemented in PHYLIP or EMBOSS >> programs -- I thought you wanted the somewhat more sophisticated ML >> approaches that are implemented in PAML? >> >> --jason >> >> On Jun 24, 2006, at 10:43 AM, Ryan Golhar wrote: >> >>> I've managed to code three methods to calculate K into a perl script >>> using the algorithms as described in "Molecular Evolution" by Wen- >>> Hsiung >>> Li. I'd be happy to contribute it as a script... 
>>> >>> >>> >>> -----Original Message----- >>> From: Jason Stajich [mailto:jason.stajich at gmail.com] On Behalf Of >>> Jason >>> Stajich >>> Sent: Saturday, June 24, 2006 9:40 AM >>> To: golharam at umdnj.edu >>> Cc: bioperl-l at lists.open-bio.org list; Alisha Holloway >>> Subject: Re: [Bioperl-l] Bio::Tools::Phylo::PAML with (baseml from >>> PAML >>> package) >>> >>> >>> baseml is not well-supported to my knowledge - I think I started with >>> attempt to capture a small amount of the data in the file. There are > >> some people who have made modifications to possible parse it in-house >>> but I know of no submitted patches. Many of the knowledgeable >>> people are probably at the evolution meetings this week. >>> >>> I have no idea about the full set of information in the report files >>> without going back to the Yang papers first. It depends on how much >>> of that information you really want to capture of just the >>> substitution rates. >>> >>> I'm Ccing Alisha in case she has ideas/solutions from her drosophila >>> work+PAML. >>> >>> -jason >>> On Jun 22, 2006, at 5:05 PM, Ryan Golhar wrote: >>> >>>> Hi all, >>>> >>>> I'm trying to use Bio::Tools::Phylo::PAML to parse the results from >>>> baseml in the PAML package to measure the distances of some non- >>>> coding >>> >>>> regions. >>>> >>>> I started with the coding regions, and used the script >>>> bp_pairwise_kaks.pl without much trouble. Now, I'm trying to do >>>> something similar for non-coding regions. However, when I call >>>> Bio::Tools::Phylo::PAML::Result->get_MLmatrix(), I'm getting 'undef' >>>> meaning matrix was never defined. >>>> >>>> I wanted to find out if anyone on here has done this before or >>>> knows a >>> >>>> way to measure substitution frequencies of non-coding regions with >>>> the >>> >>>> PAML package. The documentation with PAML is sparse so I'm not >>>> sure how >>>> to interpret its output directly - that's why I'm using Bioperl. 
>>>> >>>> Hopefully someone can help me before I start digging into the >>>> code...Thanks. >>>> >>>> Ryan >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> Jason Stajich >>> Duke University >>> http://www.duke.edu/~jes12 >>> >> >> -- >> Jason Stajich >> Duke University >> http://www.duke.edu/~jes12 >> > >-- >Jason Stajich >Duke University >http://www.duke.edu/~jes12 -- Alisha Holloway Postdoctoral Fellow Section of Evolution & Ecology 3347 Storer Hall University of California Davis, CA 95616 530-754-9551 Office 512-297-3958 Cell 530-752-1449 Fax -------------- next part -------------- A non-text attachment was scrubbed... Name: batch_baseml_50nt.pl Type: application/octet-stream Size: 5395 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060625/a93124d8/attachment.obj -------------- next part -------------- A non-text attachment was scrubbed... Name: baseml.ctl Type: application/octet-stream Size: 1699 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060625/a93124d8/attachment-0001.obj From fernan at iib.unsam.edu.ar Mon Jun 26 08:47:30 2006 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Mon, 26 Jun 2006 09:47:30 -0300 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN? In-Reply-To: <6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net> References: <449D9464.6030508@purdue.edu> <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu> <449EDDB6.8020401@purdue.edu> <6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net> Message-ID: <20060626124730.GA53298@iib.unsam.edu.ar> +----[ Hilmar Lapp (26.Jun.2006 07:22): | | We did not and will not deposit 1.5.1 into CPAN due to the API issues | in some (rather central) interfaces. These issues are changes over | the 1.4 API and some of those changes are going to go away. 
Once we | deposit it into CPAN we would sanction the changed API as the new | 'official' API and would open a huge can of backward liability worms. | If you just continue to use the 1.4 API on the 1.5.1 release you | don't need to be concerned about an API method you're using going away. | | As I said, the people from the core group of developers who have | traditionally shepherded releases all think that doing a 1.4.1 | release wouldn't be the best investment of their time. You are most | welcome to disagree and volunteer your time to coordinate the 1.4.1 | release, and a lot of people will appreciate your efforts - including | the bioperl developers and 'core'. It shouldn't be much work | theoretically. | | -hilmar | +----] I understand that, being a volunteer project, people can decide where to best invest their time. If core developers are no longer using 1.4 in their production setups, it is reasonable to expect that they invest all of their time in 1.5 or any other bioperl version that they're using. However, when viewed as an issue related to the setting of a policy for the whole project, then it makes sense to have a policy saying for how long a stable release will be supported, and when and in which case bugfixes that are committed to and tested in the development branch (as it should be) will get merged back to stable. I'm not knowledgeable enough about the bioperl release engineering process, nor about the internal development process, but just guessing I'd expect that whenever anyone submits a bugfix, it should be the responsibility of the committer to check (against the project policy, (written or implicit) or with the core developers in a difficult case) whether the fix should be committed to more than one branch. A patch like the one that started this thread, should have been committed to the 1.4 branch without too much thinking. And it would have cost the committer only a few seconds more of her/his time. 
But you only get this by setting and enforcing a policy. After a number of these fixes has accumulated, then making a new release shouldn't represent too much effort, nor should it be expected that the tests that passed before would break now. And in the worst case (no tarball release), people can be directed to obtain the most current 'stable' code from the repository, containing all bugfixes. I guess that this is what was meant by Phillip. Fernan From hlapp at gmx.net Mon Jun 26 09:59:00 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 26 Jun 2006 09:59:00 -0400 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN? In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar> References: <449D9464.6030508@purdue.edu> <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu> <449EDDB6.8020401@purdue.edu> <6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net> <20060626124730.GA53298@iib.unsam.edu.ar> Message-ID: On Jun 26, 2006, at 8:47 AM, Fernan Aguero wrote: > I'm not knowledgeable enough about the bioperl release > engineering process, nor about the internal development > process, but just guessing I'd expect that whenever anyone > submits a bugfix, it should be the responsibility of > the committer to check (against the project policy, > (written or implicit) or with the core developers in a > difficult case) whether the fix should be committed to more > than one branch. > > A patch like the one that started this thread, should have > been committed to the 1.4 branch without too much thinking. > And it would have cost the committer only a few seconds more > of her/his time. Sure. But for some reason he or she forgot. So what do you suggest we do - and I mean as a community, because this is a community project. Come after the guy until he commits it to the branch? Or post an email to the list saying what you think is the right way and then do it (yourself)? > > But you only get this by setting and enforcing a policy. Man, this is not a company. 
Take a step back and think again. What do you suggest we - again we as a community - do to enforce a policy? Take increasing levels of disciplinary action if someone keeps forgetting to commit to the branch? While there are clearly some rules everybody needs to follow and if you violate them deliberately and repeatedly you will get your CVS privileges withdrawn, by and large we as a community need to accept some responsibility for making the project what we think it should be - and do so not by invoking disciplinary action but by living by example and by taking action yourself when you think action is due. If Bioperl were a company and you asked for a 1.4.1 release and the customer service rep told you nope there's a 1.5.1 that you should use instead and that will do just fine, what will you do? Argue with him about the company policies and whether they are properly enforced or not? Obviously doing so will be a waste of your time. In Bioperl it is at the bottom of it no less waste of your time, because instead you now have the opportunity to make happen what you believe needs to happen. We have had a history of rapidly and un-bureaucratically putting people in power of what they wanted to do. We have also had a history of not listening much to people who don't want to put their feet where their mouth is. I'm sorry if what I'm saying puts people off, but really this is an open-source project and if you ask me it's one with the least barriers of entry for new developers or 'activists' that you can find in the open source arena. This doesn't come without some degree of anarchy, but really IMHO that's more of an advantage than a disadvantage. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From jason at bioperl.org Mon Jun 26 10:13:00 2006 From: jason at bioperl.org (Jason Stajich) Date: Mon, 26 Jun 2006 10:13:00 -0400 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN? In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar> References: <449D9464.6030508@purdue.edu> <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu> <449EDDB6.8020401@purdue.edu> <6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net> <20060626124730.GA53298@iib.unsam.edu.ar> Message-ID: fair enough - we can certainly merge fixes onto the branch - I am not sure why that is such a big deal. once the changes are made to the branch, If someone then wants to update to the latest code on 1.4 branch, they would need to volunteer to do the last step of: cvs export -r branch-1-4 -d bioperl-1-4-1 bioperl-live then validate it, then make a tar ball, we can submit a 1.4.x to CPAN, but honestly a lot of other fixes have accumulated since the 1.4 branch and I don't think we want to keep merging back to it, we'd rather move forward. the not-so-compatible changes that got checked in after the 1.4 branch (having to do with Annotateable) has been part of the problem as this has not been fully fixed to make things backwards compatible. Nathan asked earlier on the list about how to get a list of modules added since 1.4 and I can only say how to generate a diff to the current version of the code which might be more than what he is asking for. read the docs on cvs diff where you specify the two tags you want to diff between. We certainly have a problem of meeting the needs of several different user groups - developers who need latest code, and users who want stable releases. 
We either get funding to support stable releases more deliberately, things that don't seem to be on the main radar screen of primary developers or people who are tied to working with older stable releases. Since most of us who are coding and making changes are just working from a CVS checkout we don't have a lot of pressure to make a release -- and we don't want to dump newly buggy (or broken interfaces) into CPAN on purpose. It also seems like many reported bugs have already been fixed on the latest branch but people are less interested in back-fixing on the old branch. Our hope is that 1.6 would be a good replacement for 1.4 - presumably API consistent for the most part, but we are suffering from lack of time of people willing to do the work to make this happen. I have mentioned in the past that I cannot be the release master for the project and it is time for someone else to step up and make this happen. Chris Fields has done a phenomenal job answering questions, fixing bugs, and helping run the project as some of us have started to have too busy of a schedule to keep daily tabs on Bioperl. But he too will probably have to cycle off as his career responsibilities (and job search) takes more time. I don't have a good answer for anyone on how to make this happen more smoothly, I am hopeful that the gmod mtg will spur some more commits and a roadplan for releasing the next dev release and seeing what can happen with 1.6. If we funded a Bioperl coordinator I am sure that would help things more and manage the different sets of priorities of the user groups. I think a dedicated hackathon to bioperl work could get 1.6 out after one week of solid work with some bug squashing followup. Barring that we'll have to see what everyone else wants to see done to get the next release out. 
The person leading the release doesn't have to really program things they just need to organize people around a time-frame, a set of features that need to be tested and fixed, and commitments from people of what they will do. Much of the release process is documented on the bioperl wiki site, if this is not clear enough please make a note on the page/talk page and we can start . My hope is that the wiki can be a good repository of the thought process behind the project. right now too much of it is floating in the minds of former and current project coordinators. ...just some of my thoughts as I get ready to be off-line starting next week for 4 weeks... -jason On Jun 26, 2006, at 8:47 AM, Fernan Aguero wrote: > +----[ Hilmar Lapp (26.Jun.2006 07:22): > | > | We did not and will not deposit 1.5.1 into CPAN due to the API > issues > | in some (rather central) interfaces. These issues are changes over > | the 1.4 API and some of those changes are going to go away. Once we > | deposit it into CPAN we would sanction the changed API as the new > | 'official' API and would open a huge can of backward liability > worms. > | If you just continue to use the 1.4 API on the 1.5.1 release you > | don't need to be concerned about an API method you're using going > away. > | > | As I said, the people from the core group of developers who have > | traditionally shepherded releases all think that doing a 1.4.1 > | release wouldn't be the best investment of their time. You are most > | welcome to disagree and volunteer your time to coordinate the 1.4.1 > | release, and a lot of people will appreciate your efforts - > including > | the bioperl developers and 'core'. It shouldn't be much work > | theoretically. > | > | -hilmar > | > +----] > > I understand that, being a volunteer project, people can > decide where to best invest their time. 
If core developers > are no longer using 1.4 in their production setups, it is > reasonable to expect that they invest all of their time in > 1.5 or any other bioperl version that they're using. > > However, when viewed as an issue related to the setting of a > policy for the whole project, then it makes sense to have a > policy saying for how long a stable release will be > supported, and when and in which case bugfixes that are committed > to and tested in the development branch (as it should be) > will get merged back to stable. > > I'm not knowledgeable enough about the bioperl release > engineering process, nor about the internal development > process, but just guessing I'd expect that whenever anyone > submits a bugfix, it should be the responsibility of > the committer to check (against the project policy, > (written or implicit) or with the core developers in a > difficult case) whether the fix should be committed to more > than one branch. > > A patch like the one that started this thread, should have > been committed to the 1.4 branch without too much thinking. > And it would have cost the committer only a few seconds more > of her/his time. > > But you only get this by setting and enforcing a policy. > > After a number of these fixes has accumulated, then making a > new release shouldn't represent too much effort, nor should it > be expected that the tests that passed before would > break now. And in the worst case (no tarball release), > people can be directed to obtain the most current 'stable' > code from the repository, containing all bugfixes. > > I guess that this is what was meant by Phillip. 
> > Fernan > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From bix at sendu.me.uk Mon Jun 26 10:44:55 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 26 Jun 2006 15:44:55 +0100 Subject: [Bioperl-l] Tests Message-ID: <449FF2E7.3040101@sendu.me.uk> What level of testing is expected to be done in a test file? Is there such a thing as too many tests? Tests for every possible (documented) way of achieving a result with a module's method? Tests for every conceivable way of misusing a method? If I come across a test for a module that doesn't test for everything the module can do, should I add tests as a matter of course? Would this be beneficial, or a waste of time (given that the module probably is bug-free already)? From cjfields at uiuc.edu Mon Jun 26 11:24:00 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Jun 2006 10:24:00 -0500 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN? In-Reply-To: <20060626124730.GA53298@iib.unsam.edu.ar> Message-ID: <001301c69934$909f83c0$15327e82@pyrimidine> ... > However, when view as an issue related to the setting of a > policy for the whole project, then it makes sense to have a > policy saying for how long a stable release will be > supported, and when and in which case bugfixes that are committed > to and tested in the development branch (as it should be) > will get merged back to stable. In a project this large which relies on a lot of outside resources maintaining API and availability at all times, having a completely bug-free fix for any reasonable length of time is impossible. As a small example, almost every time NCBI changes BLAST output, it breaks our text parsers, and though we recommend using the BLAST XML format parser (which is much more stable), almost everybody continues using text parsing and wants that fixed. 
Now, NCBI routinely changes their BLAST version about every 3-6 months w/o notification, so remote BLAST parsing can break at any time. Fold into that any software changes that change output or API (PAML comes to mind). Fold into that remote database changes (EBI interface to Swissprot). Oh, let's not forget sequence format changes (recent SwissProt and GenBank changes). And, worst of all, we can't expect them to maintain API or output b/c they're updating based on user input/suggestions or bug fixes which require them to make changes. What's 'stable' about that? It's very easy to say you want something and then not volunteer to do it; if you want something then put forth the time and effort to get it done. Put your money where your mouth is (as they say in my home state). Again (for the third or fourth time now), putting together a release takes some time and effort. I actually think it takes more effort than Hilmar suggests; either way, it requires someone to act as the leader (release pumpkin) to handle changes, and I don't see anybody stepping forward. Personally, if I have the time, maybe I'll handle an interim release, but I'm looking for a job starting in the fall as well as finishing up research for publication so that will take up almost all the time I have. As Hilmar says, if you want to do it, fine. Realize, though, many many changes have been made since 1.4 and many more will likely be made on the road to 1.6 > I'm not knowledgeable enough about the bioperl release > engineering process, nor about the internal development > process, but just guessing I'd expect that whenever anyone > submits a bugfix, it should be the responsibility of > the committer to check (against the project policy, > (written or implicit) or with the core developers in a > difficult case) whether the fix should be committed to more > than one branch. This is a large open-source project with a ton of developers all over the world. 
Check out the AUTHORS file; it's at best incomplete and still has about 100 contributors. (Hey, my name's not on there!!!) > A patch like the one that started this thread, should have > been committed to the 1.4 branch without too much thinking. > And it would have cost the committer only a few seconds more > of her/his time. > > But you only get this by setting and enforcing a policy. You need to realize what this project is, what it is not, and how it evolved. A little history lesson might get you (and others) to understand just how complex it all is (and how old some of the code is). http://www.bioperl.org/wiki/FAQ#Can_you_explain_the_Object_Model_design_and_rationale.3F explains a bit on the project design. http://www.bioperl.org/wiki/History_of_BioPerl explains how BioPerl came to be. This is not a job or a company but an open-source project; its origins are based in the scientific community. You're probably right about the person not committing the change to the 1.4 branch. We probably should have a policy for commits to stable releases. But how can we logically rationalize doing so now for 1.4, almost three years on? We're post 1.5.1 and likely going into 1.6 as we speak. It's too late for 1.4 changes IMHO, frankly, but you're welcome to try. I don't think it's worth the effort. As for policy enforcement, what would you want us to do? This is a volunteer effort. Fire him/her? Frankly they should be commended for getting the fix committed in the first place, and if someone points out that it should be committed to the 1.4 branch then fine; it shouldn't be hard to do so even long after the commit to the main branch is made. It just requires someone to do so. Again, this is NOT your typical CPAN module with one or two developers or a project that relies on doing one thing very well. This project has over 100 developers and is supposed to do everything adequately (and many things very well).
> After a number of these fixes has accumulated, then making a > new release shouldn't represent too much effort, nor it > should be expected that the tests that passed before would > break now. And in the worst case (no tarball release), > people can be directed to obtain the most current 'stable' > code from the repository, containing all bugfixes. You can download a tarball from the latest CVS code at any time. There is a link for doing just that at the bottom of the anonymous CVS page: http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/ Chris From hlapp at gmx.net Mon Jun 26 11:30:05 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Mon, 26 Jun 2006 11:30:05 -0400 Subject: [Bioperl-l] Tests In-Reply-To: <449FF2E7.3040101@sendu.me.uk> References: <449FF2E7.3040101@sendu.me.uk> Message-ID: On Jun 26, 2006, at 10:44 AM, Sendu Bala wrote: > What level of testing is expected to be done in a test file? Is there > such a thing as too many tests? No, not really. > Tests for every possible (documented) > way of achieving a result with a module's method? Ideally that's the minimum. > Tests for every conceivable way of misusing a method? If some are known already (from reports) or you think they can be anticipated, yes. Generally, if a method documents which values are invalid for its input, it's a good idea to test what the method does if supplied with such values. The one thing it shouldn't do is silently ignore them, or produce a result anyway (which presumably would be a wrong result by definition). > > If I come across a test for a module that doesn't test for everything > the module can do, should I add tests as a matter of course? Would this > be beneficial, or a waste of time (given that the module probably is > bug-free already)? It would certainly be beneficial. It'd be great if you were willing to volunteer for this. Note that a module being bug free now doesn't mean it always will be.
The main point of tests is not only to weed out bugs at the time it is written, but also to make sure that future changes to the module itself, or to other modules it interacts with or inherits from, don't break it. -hilmar > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bix at sendu.me.uk Mon Jun 26 11:39:25 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 26 Jun 2006 16:39:25 +0100 Subject: [Bioperl-l] Tests In-Reply-To: References: <449FF2E7.3040101@sendu.me.uk> Message-ID: <449FFFAD.40506@sendu.me.uk> Hilmar Lapp wrote: > > On Jun 26, 2006, at 10:44 AM, Sendu Bala wrote: > >> If I come across a test for a module that doesn't test for everything >> the module can do, should I add tests as a matter of course? Would this >> be beneficial, or a waste of time (given that the module probably is >> bug-free already)? > > It would certainly be beneficial. It'd be great if you were willing to > volunteer for this. I doubt I have time to do this on the global scale[*], but certainly I will for the modules I work on. Cheers, Sendu. * Though... it would certainly be a good way of getting to know all of Bioperl intimately! From bix at sendu.me.uk Mon Jun 26 11:42:33 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 26 Jun 2006 16:42:33 +0100 Subject: [Bioperl-l] Bio::Map changes In-Reply-To: <449A9AF9.2000305@sendu.me.uk> References: <44985915.8010607@sendu.me.uk> <449A9AF9.2000305@sendu.me.uk> Message-ID: <44A00069.6010107@sendu.me.uk> Sendu Bala wrote: > The reimplementation will make Position central to the model, allowing > for lots of other things to work properly without anything becoming > inconsistent (as is currently the case). 
> > The general tidy up will involve redoing and perhaps even removing > things. Does anyone know what the intent behind the split between Bio::Map::MappableI and Bio::Map::MarkerI was? I somehow get the impression these started as one interface but then became two. The split /seems/ to be MappableI as a map element with one position on one map, whilst MarkerI is a map element with multiple positions on multiple maps. But MarkerI has no synopsis or description, and MappableI says it does what MarkerI does (but doesn't). So I'm left guessing atm. Do we want to keep the split? If yes, what exactly should be the difference between the two? If no, would it be ok to just get rid of MarkerI (folding it back into MappableI)? From cjfields at uiuc.edu Mon Jun 26 11:45:51 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Jun 2006 10:45:51 -0500 Subject: [Bioperl-l] Tests In-Reply-To: <449FF2E7.3040101@sendu.me.uk> Message-ID: <001a01c69937$9a1c1320$15327e82@pyrimidine> My opinion: tests should cover methods and expected results and are based on what the module actually accomplishes. Some classes (like SeqIO, SearchIO) are normally relatively easy to build tests for b/c the expected results are in the file being parsed. Tests which check calculated results from modules (Bio::Align::DNAStatistics for instance) I would think are trickier since you should confirm the calculations are correct through independent means. Links: http://www.bioperl.org/wiki/Advanced_BioPerl#Designing_Good_Tests http://search.cpan.org/~mschwern/Test-Simple-0.62/lib/Test/Tutorial.pod The link above uses Test::Simple or Test::More; we use Test (but have considered moving to Test::More using Devel::Cover).
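To make the points in this thread concrete (check documented usage against independently worked-out values, and check that invalid input is rejected rather than silently accepted), here is a minimal sketch. It assumes Test::More, which ships with Perl, rather than the Test module mentioned above, and the function gc_fraction is purely hypothetical; it is not a BioPerl method.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 5;

# Hypothetical function under test (NOT part of BioPerl): returns the
# fraction of G and C bases in a DNA string, and dies on invalid input
# instead of silently returning a number.
sub gc_fraction {
    my ($seq) = @_;
    die "sequence must be defined and non-empty\n"
        unless defined $seq && length $seq;
    die "invalid character in sequence '$seq'\n"
        if $seq =~ /[^ACGTacgt]/;
    my $gc = ( $seq =~ tr/GCgc// );    # tr/// returns the match count
    return $gc / length($seq);
}

# Documented usage: expected values worked out by hand.
is( gc_fraction('GGCC'), 1,   'all G/C gives 1' );
is( gc_fraction('ATAT'), 0,   'no G/C gives 0' );
is( gc_fraction('acgt'), 0.5, 'lower case is accepted' );

# Misuse: invalid input must raise an error, not be ignored.
my $result = eval { gc_fraction('ACGX') };
my $error  = $@;
ok( !defined $result, 'invalid base does not produce a result' );
like( $error, qr/invalid character/, 'error message names the problem' );
```

Saved as a .t file, this could be run with prove or plain perl. The plan (tests => 5) makes the run fail if a test is accidentally dropped later.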
My 2c Chris From bix at sendu.me.uk Mon Jun 26 12:15:32 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Mon, 26 Jun 2006 17:15:32 +0100 Subject: [Bioperl-l] Bio::Map changes In-Reply-To: <449A9AF9.2000305@sendu.me.uk> References: <44985915.8010607@sendu.me.uk> <449A9AF9.2000305@sendu.me.uk> Message-ID: <44A00824.20002@sendu.me.uk> Sendu Bala wrote: > The reimplementation will make Position central to the model, allowing > for lots of other things to work properly without anything becoming > inconsistent (as is currently the case). To do this I actually need to make some slightly more significant API changes than I had hoped. To make Position central, all maps, mappables and markers need to be able to add and remove Positions (and similar things). As I see it, we can say that such methods are fundamental to the coordination required between Bio::Map modules.
I feel that I'm therefore justified in implementing these kinds of methods in the interfaces (which would allow all the downstream modules that implement those interfaces to work in the new system without much/any alteration). Am I justified? Should I try harder to do it without implementations in the interfaces? From pmiguel at purdue.edu Mon Jun 26 12:53:56 2006 From: pmiguel at purdue.edu (Phillip San Miguel) Date: Mon, 26 Jun 2006 12:53:56 -0400 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN? In-Reply-To: <001301c69934$909f83c0$15327e82@pyrimidine> References: <001301c69934$909f83c0$15327e82@pyrimidine> Message-ID: <44A01124.5040102@purdue.edu> Chris Fields wrote: > ... > >> However, when view as an issue related to the setting of a >> policy for the whole project, then it makes sense to have a >> policy saying for how long a stable release will be >> supported, and when and in which case bugfixes that are committed >> to and tested in the development branch (as it should be) >> will get merged back to stable. >> > > In a project this large which relies on a lot of outside resources > maintaining API and availability at all times, having a completely bug-free > fix for any reasonable length of time is impossible. As a small example, > almost every time NCBI changes BLAST output, it breaks our text parsers, and > though we recommend using the BLAST XML format parser (which is much more > stable), almost everybody continues using text parsing and wants that fixed. > Now, NCBI routinely changes their BLAST version about every 3-6 months w/o > notification, so remote BLAST parsing can break at any time. Fold into that > any software changes that change output or API (PAML comes to mind). Fold > into that remote database changes (EBI interface to Swissprot). Oh, let's > not forget sequence format changes (recent SwissProt and GenBank changes). 
> And, worst of all, we can't expect them to maintain API or output b/c > they're updating based on user input/suggestions or bug fixes which require > them to make changes. What's 'stable' about that? > > It's very easy to say you want something and then not volunteer to do it; if > you want something then put forth the time and effort to get it done. Put > your money where your mouth is (as they say in my home state). > > Again (for the third or fourth time now), putting together a release takes > some time and effort. I actually think it takes more effort than Hilmar > suggests; either way, it requires someone to act as the leader (release > pumpkin) to handle changes, and I don't see anybody stepping forward. > Personally, if I have the time, maybe I'll handle an interim release, but > I'm looking for a job starting in the fall as well as finishing up research > for publication so that will take up almost all the time I have. As Hilmar > says, if you want to do it, fine. Realize, though, many many changes have > been made since 1.4 and many more will likely be made on the road to 1.6 > > Hi Chris et al., I was just reporting the situation from where I sit. I think this issue was important enough to bring to everyone's attention. I've done so and I'm more than satisfied with the response. I hope my emails were not too abrasive. I have now read the wiki about coordinating a release. You are right, that does sound hard. At least to me--I've never even used CVS, nor contributed a module to CPAN. I just don't see myself as being qualified to coordinate a 1.4.1 release. So since I'm not, for that reason, able to volunteer to do it myself, I'll withdraw my request for a new release to CPAN. That being said, I think Fernan's suggestion bears keeping in mind once 1.6 has been released and bug fixes are being committed. By that time, I hope I'll be savvy enough to help out in the process.
Thanks for your attention, Phillip SanMiguel From fernan at iib.unsam.edu.ar Mon Jun 26 15:24:51 2006 From: fernan at iib.unsam.edu.ar (Fernan Aguero) Date: Mon, 26 Jun 2006 16:24:51 -0300 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN? In-Reply-To: References: <449D9464.6030508@purdue.edu> <87F415B3-9EED-496F-91EB-F8FF63A2DE92@uiuc.edu> <449EDDB6.8020401@purdue.edu> <6F9752AE-B384-447F-969A-498D6EDEE2E0@gmx.net> <20060626124730.GA53298@iib.unsam.edu.ar> Message-ID: <20060626192451.GB53298@iib.unsam.edu.ar> +----[ Hilmar Lapp (26.Jun.2006 11:01): | | On Jun 26, 2006, at 8:47 AM, Fernan Aguero wrote: | | >I'm not knowledgeable enough about the bioperl release | >engineering process, nor about the internal development | >process, but just guessing I'd expect that whenever anyone | >submits a bugfix, it should be the responsibility of | >the committer to check (against the project policy, | >(written or implicit) or with the core developers in a | >difficult case) whether the fix should be committed to more | >than one branch. | > | >A patch like the one that started this thread, should have | >been committed to the 1.4 branch without too much thinking. | >And it would have cost the committer only a few seconds more | >of her/his time. | | Sure. But for some reason he or she forgot. So what do you suggest we | do - and I mean as a community, because this is a community project. | Come after the guy until he commits it to the branch? No, I never said or implied that. | Or post an email to the list saying what you think is the | right way and then do it (yourself)? Of course I could volunteer some of my time to do that (that is, go over the commit history and see what changes could be merged back to 1.4, if that seems to be useful), provided I get a polite reply to my 'email to the list saying what [I] think is the right way'. 
I'm a volunteer in other open source, community projects, and I do contribute regularly so I see no problem except the obvious scarcity of free time in doing the same for bioperl. | >But you only get this by setting and enforcing a policy. | | Man, this is not a company. Take a step back and think again. What do | you suggest we - again we as a community - do to enforce a policy? | Take increasing levels of disciplinary action if someone keeps | forgetting to commit to the branch? Seems like you were pissed off by what I said ... What I was just trying to say is that merely by formulating and communicating a policy you could be taking steps towards making it a reality. Maybe 'enforcing' was an unfortunate word to use here ... You don't have to punish anyone; just sending a polite email to the list reminding people about the policy once in a while should be enough. It's OK if some committer doesn't care, or just forgets about doing the right thing once in a while ... But of course, you might be pissed off by me talking about something that I know nothing about (the development of bioperl), given that I'm just a bioperl user. Perhaps my mistake was to bring here ideas from other projects (in which I do contribute regularly) without realizing that, not being a contributor, I could be punished for suggesting how things could be done better. | While there are clearly some rules everybody needs to follow and if | you violate them deliberately and repeatedly you will get your CVS | privileges withdrawn, by and large we as a community need to accept | some responsibility for making the project what we think it should be | - and do so not by invoking disciplinary action but by living by | example and by taking action yourself when you think action is due. I completely agree. When I said 'setting a policy' I just meant something along the lines of clearly stating what are those 'rules everybody needs to follow'.
My suggestion was to add a 'merge trivial fixes back to stable' rule to that list. I agree with Jason: why is that such a big deal? | If Bioperl were a company and you asked for a 1.4.1 release and the | customer service rep told you nope there's a 1.5.1 that you should | use instead and that will do just fine, what will you do? Argue with | him about the company policies and whether they are properly enforced | or not? | | Obviously doing so will be a waste of your time. In Bioperl it is at | the bottom of it no less waste of your time, because instead you now | have the opportunity to make happen what you believe needs to happen. Right, but first I have to realize what needs to happen. I realized it when I read your reply to Phillip's message. I then proceeded to write my thoughts and send them to the list, to see what kind of feedback I get. Hopefully, someone with commit privileges would think that what I said makes sense and just proceed to doing it (saving me from the task :) Or perhaps, someone, as Jason did, would say that it's not worth trying to merge things back to 1.4 and move forward instead. In his message he even explained what the problems and needs are (lack of man-time, need for volunteers) and politely asked for help. | We have had a history of rapidly and un-bureaucratically putting | people in power of what they wanted to do. We have also had a history | of not listening much to people who don't want to put their feet | where their mouth is. I would call your reply (this message) a barrier of entry for new developers. In the above paragraph I guess you are referring to the bioperl motto: 'whoever codes it wins'. That is true in any open source project. But at least to me, that doesn't say that you should not listen to people just because they haven't contributed a single line of code.
| I'm sorry if what I'm saying puts people off, but really this is an | open-source project and if you ask me it's one with the least | barriers of entry for new developers or 'activists' that you can find | in the open source arena. Let me disagree. The barriers of entry are not just the giving away of developer accounts and/or repository write privileges. I'm a regular contributor in another open source, community project (FreeBSD) that has more and higher barriers of entry with respect to giving away privileges (for example for committing changes to the repository). Nonetheless FreeBSD has historically had few and low barriers of entry for incorporating people into the project (without the need to give away commit privileges, making them responsible for parts of the FreeBSD source code/documentation/ports/etc). IMO, that comes from a very good communication of the direction of the project, what needs to be done, how to do it, and a tendency of privileged and older members to listen to people's suggestions, inviting and helping people to jump the fence and become part of the project. It's no accident that FreeBSD has 'mentors' that introduce new members, help them to get acquainted with how the project works, policies, etc. and supervise their actions. | This doesn't come without some degree of | anarchy, but really IMHO that's more of an advantage than a | disadvantage. | -hilmar | +----] Fernan PS: finally, let me just add that English is not my native language. Although I'm quite familiar with it, once in a while, an unfortunate choice of words might blur my intended meaning or the strength I wanted to convey. If that has happened here, let me put clearly that it has not been my intention to criticize the way the project does things, but to suggest ideas for the future (merge back trivial changes to a 'stable' branch as a policy) based on my experience with other projects.
Whether that fits bioperl or not was what I would have expected as a reply. From cjfields at uiuc.edu Mon Jun 26 16:18:40 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Mon, 26 Jun 2006 15:18:40 -0500 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN? In-Reply-To: <20060626192451.GB53298@iib.unsam.edu.ar> Message-ID: <002701c6995d$b738f790$15327e82@pyrimidine> > | >A patch like the one that started this thread, should have > | >been committed to the 1.4 branch without too much thinking. > | >And it would have cost the committer only a few seconds more > | >of her/his time. > | > | Sure. But for some reason he or she forgot. So what do you suggest we > | do - and I mean as a community, because this is a community project. > | Come after the guy until he commits it to the branch? > > No, I never said or implied that. Right, you didn't say that. But you didn't clarify your statements either. I think you're treading into dangerous waters when you come in and criticize something w/o bothering to read up on how things have been done here. As you say yourself below, it's 'something that I know nothing about (the devleopment of bioperl), given that I'm just a bioperl user'. It's akin to "I don't think you're coding things correctly, here's the right way to do it" w/o knowing what the code is used for. > | Or post an email to the list saying what you think is the > | right way and then do it (yourself)? > > Of course I could volunteer some of my time to > do that (that is, go over the commit history and see what > changes could be merged back to 1.4, if that seems to be > useful), provided I get a polite reply to my 'email > to the list saying what [I] think is the right way'. You will get a polite email when you respond politely. I actually agree with many things you say, but you sure aren't making any friends here by the way you consistently take the opposite stance and judge what other people do. 
I think you have a point about having a stable release be supported for a period of time. My point is, how long? We didn't really get an idea of that from you, did we? > I'm a volunteer in other open source, community projects, > and I do contribute regularly so I see no problem except the > obvious scarcity of free time in doing the same for bioperl. And others here also volunteer elsewhere (GMOD, DAS, Ensembl, etc). Don't presume we don't have experience in open-source. That's being pretty judgmental. > | >But you only get this by setting and enforcing a policy. > | > | Man, this is not a company. Take a step back and think again. What do > | you suggest we - again we as a community - do to enforce a policy? > | Take increasing levels of disciplinary action if someone keeps > | forgetting to commit to the branch? > > Seems like you were pissed off by what I said ... ????Ya think???? You know, okay, forget it. This is completely non-productive. We'll all agree to disagree, argue, whatever. The points made here, as I see them: 1) Commits should be made to stable releases (as well as to the main branch in CVS) to fix bugs as long as that release is supported. I agree with this, but someone has to volunteer, and the length of time a release is supported also needs to be worked out. It would almost be better to go to a regular release schedule (once every 3-6 months or so) where the code is given as is to CPAN, whether it passes tests or not. 2) More communication about the direction Bioperl is heading; personally I haven't seen a problem with this as much as there is no information about a roadmap. That is being alleviated soon I believe, though people out there need to be patient. 3) Volunteer. If you have something you believe needs to be done and you believe so fervently, then put up or shut up. Make (nice polite) suggestions otherwise. Don't judge code or "the way things are done" and don't presume what kind of experience people have that you don't know and haven't met.
End of story. Chris From torsten.seemann at infotech.monash.edu.au Mon Jun 26 22:57:47 2006 From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann) Date: Tue, 27 Jun 2006 12:57:47 +1000 Subject: [Bioperl-l] Comments on new PDOC documentation Message-ID: <44A09EAB.2030401@infotech.monash.edu.au> Hello all, I am very happy to see the PDOC software has been improved, as I use the online web documentation frequently. Thanks to Jason, Raphael and Patrick for making this happen. http://doc.bioperl.org/bioperl-live/ Now for some comments... 1. CSS It uses CSS which is excellent, reducing HTML size and allowing easy tweaks to the design. However its current implementation has some issues: A. it seems to only use ID, rather than CLASS, to specify styles. ID values must be unique in a page, and are for one-off styles. CLASS may be re-used throughout a page. eg "sub" and "subArea". Many browsers do not enforce this however... B. it seems to be doing unusual, but possibly deliberate, things with the POD when determining what CSS ID to give it, but perhaps this is more to do with how Bioperl formats the POD on some subheadings eg. C. the "Description" sections etc are in a proportional font, but I think it should be "font-family: monospace" as many authors have exploited the traditional monospace of most editors to format their comments, and that formatting is now lost 2. FRAMES I notice it still uses HTML Frames. Although this reduces code size also, it makes it impossible to LINK directly to a specific documentation page with all the frames intact. It may be better to use 3 DIV elements which are part of each page, and they could be server-side included so there is no HTML duplication. 3. MERGING OF BIOPERL DOCS One facet of the docs I find frustrating is that bioperl-live and bioperl-run (and the others) are separate!
This means that you have to keep switching between them, and more importantly, links from class names to classes in other packages are not present; this is particularly bad when browsing bioperl-run. Is there any chance of creating a "merged" bioperl-doc page somehow? 4. STYLE Choice of colours and layouts is such a personal thing. I guess people can download http://doc.bioperl.org/css/perl.css and re-edit it, and get their Browser to over-ride the supplied CSS with their version. 5. CONCLUSION Please don't get the wrong idea, I love the new PDOC, I would just like to love it more. And yes I understand the nightmare that is parsing Perl/POD and generating compatible CSS :-) -- Dr Torsten Seemann http://www.vicbioinformatics.com Victorian Bioinformatics Consortium, Monash University, Australia From bix at sendu.me.uk Tue Jun 27 06:21:57 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 27 Jun 2006 11:21:57 +0100 Subject: [Bioperl-l] Bio::Score of interest? Message-ID: <44A106C5.9040706@sendu.me.uk> Please see http://bugzilla.bioperl.org/show_bug.cgi?id=2033 Is the idea of a Bio::Score of interest? See bug, but basically an object that can handle multiple kinds of scores effectively. I would like to use such a thing in Bioperl, but what standard needs to be met before Bioperl gets a new kind of object? From hlapp at gmx.net Tue Jun 27 08:24:16 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 27 Jun 2006 08:24:16 -0400 Subject: [Bioperl-l] Bio::Score of interest? In-Reply-To: <44A106C5.9040706@sendu.me.uk> References: <44A106C5.9040706@sendu.me.uk> Message-ID: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net> So you basically want to attach semantic information to a number, and type the number thereby? If so, an ontology would be the more natural choice (and in the end more flexible one) for expressing this kind of information. Have you looked at the concept of 'quantitation types', e.g. in MAGE (the XML [MAGE-ML] or the object model [MAGE-OM])?
There is no quantitation type ontology at a repository I know of. I have used my own ones in the past and they have been pretty useful. -hilmar On Jun 27, 2006, at 6:21 AM, Sendu Bala wrote: > Please see http://bugzilla.bioperl.org/show_bug.cgi?id=2033 > Is the idea of a Bio::Score of interest? See bug, but basically an > object that can handle multiple kinds of scores effectively. > > I would like to use such a thing in Bioperl, but what standard > needs to > be met before Bioperl gets a new kind of object? > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bix at sendu.me.uk Tue Jun 27 08:52:05 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 27 Jun 2006 13:52:05 +0100 Subject: [Bioperl-l] Bio::Score of interest? In-Reply-To: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net> References: <44A106C5.9040706@sendu.me.uk> <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net> Message-ID: <44A129F5.3030500@sendu.me.uk> Hilmar Lapp wrote: > So you basically want to attach semantic information to a number, and > type the number thereby? Basically, I want to be able to stick a bunch of (different kinds of) numbers into an object, and later get the 'best' one out (of a particular kind), or sort multiple of those objects. > If so, an ontology would be the more natural choice (and in the end more > flexible one) for expressing this kind of information. I'm not really sure I understand 'and type the number', or what (useful) flexibility doing it with an ontology would provide. > Have you looked at the concept of 'quantitation types', e.g. in MAGE > (the XML [MGAE-ML] or the object model [MAGE-OM])? I had a quick look, but not really sure what you intended to suggest here. 
> There is no quantitation type ontology at a repository I know of. I have > used my own ones in the past and they have been pretty useful. Can you provide a brief example of what you mean? If it would be appropriate to implement a Bio::Score with an ontology that's fine. Would we want a Bio::Score implemented though? Or are you suggesting each module make its own quantitation type ontology when it wants to deal with numerous scores? I like the idea of a Bio::Score because then you can compare complex scores from multiple different unrelated modules. From cjfields at uiuc.edu Tue Jun 27 10:08:57 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 27 Jun 2006 09:08:57 -0500 Subject: [Bioperl-l] Bio::Score of interest? In-Reply-To: <44A129F5.3030500@sendu.me.uk> Message-ID: <001e01c699f3$3b6cda50$15327e82@pyrimidine> > Hilmar Lapp wrote: > > So you basically want to attach semantic information to a number, and > > type the number thereby? > > Basically, I want to be able to stick a bunch of (different kinds of) > numbers into an object, and later get the 'best' one out (of a > particular kind), or sort multiple of those objects. The 'best one' might be tricky when dealing with different kinds of scores, esp. scores calculated in different ways. For instance, I run RNA motif programs quite frequently (RNAMotif, ERPIN, Infernal), but all generate 'scores' based on different criteria (algorithms, different parameters, how the author slept, and so on). RNAMotif in particular is hard to deal with (though a great program) b/c the scores are based on criteria in the descriptor file (the file used to describe the motif), so they aren't comparable to scores from other descriptors, which may have their own method of generating scores, let alone output from other programs. Which one would be 'the best?' It's a bit subjective since the scores are predictive based upon your input, various program limitations, specific program parameter implementations, etc.
I do like the idea of grouping together scores for comparison, such as when a particular region of DNA has multiple hits from different programs with different scores. It would at least suffice as a test on how various programs or experimental data would compare with one another. > > If so, an ontology would be the more natural choice (and in the end more > > flexible one) for expressing this kind of information. > > I'm not really sure I understand 'and type the number', or what (useful) > flexibility doing it with an ontology would provide. I'm not sure, but maybe something along the lines of what the number (the score) actually means, especially when compared to other scores. In other words, how you could compare one score or number versus the other. An ontology would allow more complex information to be included along with the score information so one could make more informed choices based on how the score was obtained, the algorithm used, the program involved, etc. Hence flexible. Is that close, Hilmar? To use my RNA program example above, I could include the information about how the scores were obtained, the programs involved, parameters used, the various raw scores, the time it took to run the program, etc. (i.e. you could make it as specific as you wanted). This could also be extended to other data types as well besides program, such as wet bench experimental data and so on, which I deal with quite a bit. I think there are a few XML specs out there besides MAGE that do this as well but I can't think of any off the top of my head. > > Have you looked at the concept of 'quantitation types', e.g. in MAGE > > (the XML [MGAE-ML] or the object model [MAGE-OM])? > > I had a quick look, but not really sure what you intended to suggest here. I think the idea is that MAGE, strictly as an example, deals with microarray data from different sources or different data systems for comparison. Sounds a little like what you want to do. 
> > There is no quantitation type ontology at a repository I know of. I have > > used my own ones in the past and they have been pretty useful. > > Can you provide a brief example of what you mean? > > If it would be appropriate to implement a Bio::Score with an ontology > that's fine. Would we want a Bio::Score implemented though? Or are you > suggesting each module make it's own quantitation type ontology when it > wants to deal with numerous scores? > > I like the idea of a Bio::Score because then you can compare complex > scores from multiple different unrelated modules. Which is what MAGE does in a way, but more specifically, i.e. just microarray data from different sources. So the array data may be calculated in different ways based upon the specs for different machines, the way array slides were prepared, how the experimenter slept, etc. Chris From hlapp at gmx.net Tue Jun 27 10:27:55 2006 From: hlapp at gmx.net (Hilmar Lapp) Date: Tue, 27 Jun 2006 10:27:55 -0400 Subject: [Bioperl-l] Bio::Score of interest? In-Reply-To: <44A129F5.3030500@sendu.me.uk> References: <44A106C5.9040706@sendu.me.uk> <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net> <44A129F5.3030500@sendu.me.uk> Message-ID: <46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net> I would have suggested initiating a quantitation type ontology, not one individual per module. An ontology would capture all your semantic information (min/max or range, higher or lower is better, what is a reasonable default [not sure there would be one], etc) and you would have a hierarchical structure. 
You type a score by associating it with an ontology term:

    BLAST_e-value is-a expectation_value
    expectation_value has-min-value 0
    expectation_value has-max-value positive_infinity
    BLAST_p-value is-a probability_value
    probability_value has-min-value 0
    probability_value has-max-value 1
    etc

and then something being an expectation_value for instance would imply several attributes laid down in the ontology (probably through has-a statements).

It seems to me that essentially what you are trying to do is capturing knowledge for particular types of scores, which you would then use in more general purpose programs to sort from more to less significant, and possibly filter? If so, then hard-coding this into objects (all over the place or in a single place) is typically not the best practice; rather, the usual best-practice approach is using (and if necessary, constructing) an ontology. This is also the most re-usable approach.

-hilmar

On Jun 27, 2006, at 8:52 AM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>> So you basically want to attach semantic information to a number, and
>> type the number thereby?
>
> Basically, I want to be able to stick a bunch of (different kinds of)
> numbers into an object, and later get the 'best' one out (of a
> particular kind), or sort multiple of those objects.
>
>
>> If so, an ontology would be the more natural choice (and in the end more
>> flexible one) for expressing this kind of information.
>
> I'm not really sure I understand 'and type the number', or what (useful)
> flexibility doing it with an ontology would provide.
>
>
>> Have you looked at the concept of 'quantitation types', e.g. in MAGE
>> (the XML [MGAE-ML] or the object model [MAGE-OM])?
>
> I had a quick look, but not really sure what you intended to suggest here.
>
>
>> There is no quantitation type ontology at a repository I know of. I have
>> used my own ones in the past and they have been pretty useful.
>
> Can you provide a brief example of what you mean?
>
> If it would be appropriate to implement a Bio::Score with an ontology
> that's fine. Would we want a Bio::Score implemented though? Or are you
> suggesting each module make it's own quantitation type ontology when it
> wants to deal with numerous scores?
>
> I like the idea of a Bio::Score because then you can compare complex
> scores from multiple different unrelated modules.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From bix at sendu.me.uk Tue Jun 27 11:25:06 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 16:25:06 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>
References: <44A106C5.9040706@sendu.me.uk> <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net> <44A129F5.3030500@sendu.me.uk> <46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net>
Message-ID: <44A14DD2.7000402@sendu.me.uk>

Hilmar Lapp wrote:
> I would have suggested initiating a quantitation type ontology, not one
> individual per module.

Where would such a thing 'live'? Would it be some static file somewhere that gets read in with Bio::OntologyIO? Or an in-memory Bio::Ontology that can be added to by a module when it needs extra terms to describe its particular kind of scores?

> An ontology would capture all your semantic information [snip]

Thanks, I agree that an ontology would be the way to do it...

> It seems to me that essentially what you are trying to do is capturing
> knowledge for particular types of scores, which you would then use in
> more general purpose programs to sort from more to less significant, and
> possibly filter?

Yes.
> If so, then hard-coding this into objects (all over the
> place or in a single place) is typically not the best practice; rather,
> the usual best-practice approach is using (and if necessary,
> constructing) an ontology. This is also the most re-usable approach.

Not having any experience with ontologies, I can't think how this would all be done in practice though. Don't we need some central module (Bio::Score) to create the ontology (or read it in) and then present some suitable interface to it? For example, modules that wanted to store some scores might just ask Bio::Score for the ontology and type their scores by associating with an available ontology term, creating new terms if necessary (or is that something you would never do; the ontology needed to have been set up to cover all possible terms?). Then when the user has a bunch of these typed scores, surely he doesn't want to deal with going through the ontology himself to work out what it all means? Well, he could if he needs that level of control, but also he just wants to say Bio::Score->sort(x y z) or something.

From bix at sendu.me.uk Tue Jun 27 12:13:46 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 27 Jun 2006 17:13:46 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To:
References:
Message-ID: <44A1593A.809@sendu.me.uk>

Cook, Malcolm wrote:
>
> All this semantic cruft is overkill for a moving target and will never
> settle down until your analysis results are no longer relevant.

I'm not sure what you mean by that. What moves? An evalue will always be an evalue. Once you know that you are in fact dealing with an evalue, and once your sorting algorithm knows that lower evalues are better, nothing changes. Likewise for other kinds of scores.

Instead of having to discover that a particular program is giving you an evalue, and then writing code to deal with an evalue appropriately, I thought it would be nicer to have a single module that knew how to deal with it already.
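
A minimal sketch of the idea in the message above, namely that a sorting routine only needs to know whether lower or higher values are better for a given kind of score. This is plain Perl and not an existing BioPerl module; the table %better_when, the sub sort_by_score, and the hit records are hypothetical names and made-up data used only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical lookup: for each kind of score, which direction is "better".
my %better_when = (
    evalue   => 'lower',
    bitscore => 'higher',
);

# Return the records (hash refs) ordered best-first for one kind of score.
sub sort_by_score {
    my ($kind, @records) = @_;
    my $direction = $better_when{$kind}
        or die "Don't know how to compare scores of kind '$kind'\n";
    # Sort ascending, then flip the order when higher values are better.
    my @sorted = sort { $a->{$kind} <=> $b->{$kind} } @records;
    @sorted = reverse @sorted if $direction eq 'higher';
    return @sorted;
}

# Made-up example hits, each carrying two kinds of score.
my @hits = (
    { name => 'hit1', evalue => 1e-5,  bitscore => 48  },
    { name => 'hit2', evalue => 1e-20, bitscore => 110 },
    { name => 'hit3', evalue => 0.3,   bitscore => 21  },
);

for my $kind (qw(evalue bitscore)) {
    my @names = map { $_->{name} } sort_by_score($kind, @hits);
    print "$kind: @names\n";
}
```

An ontology, as suggested elsewhere in the thread, would replace the hard-coded table with terms that carry the same "lower is better" knowledge; the sorting logic itself would stay this small.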
From MEC at stowers-institute.org Tue Jun 27 12:01:45 2006 From: MEC at stowers-institute.org (Cook, Malcolm) Date: Tue, 27 Jun 2006 11:01:45 -0500 Subject: [Bioperl-l] Bio::Score of interest? Message-ID: For the use case of TFBS analysis demonstrated in the attachment to the bug, I would expect to find potentially three scores, ala, {evalue, bitscore, and percentmatch}. To deal with this in existing framework (i.e. GFF/bioperl analysis modules/TFBS), I would try to make GFFx eat scalars as scores and pack the three values into a string and unpack them as needed for sorting, etc. Else put the one score I know I'm going to 'use' in a particular analysis into 'score' and adorn column 9 with the rest. All this semantic cruft is overkill for a moving target and will never settle down until your analysis results are no longer relevant. my $.02 --Malcolm >-----Original Message----- >From: bioperl-l-bounces at lists.open-bio.org >[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Sendu Bala >Sent: Tuesday, June 27, 2006 5:22 AM >To: bioperl-l at lists.open-bio.org >Subject: [Bioperl-l] Bio::Score of interest? > >Please see http://bugzilla.bioperl.org/show_bug.cgi?id=2033 >Is the idea of a Bio::Score of interest? See bug, but basically an >object that can handle multiple kinds of scores effectively. > >I would like to use such a thing in Bioperl, but what standard >needs to >be met before Bioperl gets a new kind of object? >_______________________________________________ >Bioperl-l mailing list >Bioperl-l at lists.open-bio.org >http://lists.open-bio.org/mailman/listinfo/bioperl-l > From bix at sendu.me.uk Tue Jun 27 14:07:44 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 27 Jun 2006 19:07:44 +0100 Subject: [Bioperl-l] Bio::Score of interest? 
In-Reply-To: <001e01c699f3$3b6cda50$15327e82@pyrimidine> References: <001e01c699f3$3b6cda50$15327e82@pyrimidine> Message-ID: <44A173F0.4040302@sendu.me.uk> Chris Fields wrote: >> Hilmar Lapp wrote: >>> So you basically want to attach semantic information to a number, and >>> type the number thereby? >> Basically, I want to be able to stick a bunch of (different kinds of) >> numbers into an object, and later get the 'best' one out (of a >> particular kind), or sort multiple of those objects. > > The 'best one' might be tricky when dealing with different kinds of scores, > esp. scores calculated different ways. I didn't make myself very clear, but you don't compare different kinds of scores. When you want to compare two different Score objects, each of which may contain multiple different kinds of scores, you pick the kind of score you're interested in, and for that kind of score ask which object has the 'best' score. I can't readily think of any exceptions to the rule that 'best' is either the higher score or the lower score, depending on what kind of score you've chosen. I may not have made myself clear in another way. One of the ideas behind a Bio::Score is to have a container object for multiple different kinds of scores (and even multiple values per kind) all generated by one program in one analysis on one data set. The container then lets you pick the kind of score you want to work with and compare its scores with those in other Bio::Score objects that contain the same kind of score (most probably, ones made by the same analysis program but on different data sets). Furthermore, the kind of score you want to work with could have multiple values from that single analysis. So the container also lets you summarise these values (eg. average them) before trying to compare with another Score object. Often, it may be that for a certain kind of score it makes sense (it is intended by the score-generating program) to always summarise the values in a certain way. 
So the container needs to know about that and 'do the right thing' so the user can just compare things without having to trouble himself. So this is why I feel that to just 'use an ontology' isn't enough. Certainly one ought to be used when defining the kinds, but you need some single interface with useful methods that lets you deal with the actual score values easily. From cjfields at uiuc.edu Tue Jun 27 14:56:40 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Tue, 27 Jun 2006 13:56:40 -0500 Subject: [Bioperl-l] How to handle bugs in bioperl 1.4 on CPAN? In-Reply-To: <20060627181439.GD51742@iib.unsam.edu.ar> Message-ID: <000a01c69a1b$6d0338c0$15327e82@pyrimidine> > | 1) Commits should be made to stable releases (as well as to the main > branch > | in CVS) to fix bugs as long as that release is supported. I agree with > | this, but someone has to volunteer, and the length of time a release is > | supported also worked out. > > I volunteer to do that (merge approved changes/fixes back to > a stable branch), though as said by others, 1.4 may not be > the most appropriate 'stable' branch, as too many changes > have accumulated, and maybe it's not worth it. But I could > do that for the next 'stable' release, 1.6 or 2.0 whichever > comes next. > > As per the length of time, I would say that a stable release > should be supported at least until another 'stable' release > is made. Or until it's no longer being used in production > setups, which is only feasible to know in small > communities. I'm posting this to the mail list so that others can respond. Kevin Brown (in a response to me) made some good points about updating and maintaining stable releases in that only bug fixes are committed (i.e. no refactoring, no new modules or features). I personally wouldn't have a problem in someone doing this, releasing periodic updates to stable or developer releases to fix bugs only but I may be in the minority here. 
The rest of the core guys and others need to also speak their thoughts. I hate forwarding this to Jason since he's in the middle of getting ready for a move but I think this is important enough to do so.

I can say that I am unequivocally against updating 1.4. Too much has changed since then and I think it would be a mess trying to figure out what bug fixes to include, etc. I also am very much against placing developer's releases in CPAN; those releases are not intended to be completely stable as they may be implementing new features that haven't been tested completely and may contain various other bugs. v 1.5.1 is remarkably stable for a developer's release but several bug fixes have been made since. If someone wants to try out the developer's versions or bioperl-live they are most welcome to it; the web site docs give all the instructions one needs to install from pretty much any platform.

Beyond that, I'm spent on this thread.

Chris

From lstein at cshl.edu Tue Jun 27 18:35:08 2006
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 27 Jun 2006 18:35:08 -0400
Subject: [Bioperl-l] Output a subset of FASTA data from a single large file
In-Reply-To:
References:
Message-ID: <200606271835.09558.lstein@cshl.edu>

Hi All,

This is rather late, but just for future reference on the mailing list, here is how I would do the task using Bio::DB::Fasta.

Script 1: index the file for future use:

#!/usr/bin/perl

use strict;
use Bio::DB::Fasta;

my $filename = shift; # name of file to index on command line
Bio::DB::Fasta->new($filename,-makeid=>\&make_my_id) or die "Indexing failed";
print "Indexing succeeded!\n";
exit 0;

sub make_my_id {
    my $description_line = shift;
    $description_line =~ /(\d+_at)/ or die "malformed description line";
    return $1;
}

Run this script once to create a reusable index of the file. The index will be stored in the same directory as the FASTA file.
Script 2: extract the sequences using the IDs stored in a second file:

#!/usr/bin/perl

use strict;
use Bio::DB::Fasta;
use Bio::SeqIO;
use IO::File;

my $indexed_fasta_file = shift;
my $probe_id_file = shift;

# open up the indexed fasta file
my $db = Bio::DB::Fasta->new($indexed_fasta_file) or die;

# open up a FASTA writer
my $out = Bio::SeqIO->new(-format=>'Fasta',-fh=>\*STDOUT) or die;

# open the probe id file
my $in = IO::File->new($probe_id_file) or die;

# do the work
while (my $id = <$in>) {
    chomp $id;
    my $seq = $db->get_Seq_by_id($id) or die;
    $out->write_seq($seq);
}

exit 0;

Bio::Index::Fasta will work in almost exactly the same way. The only difference is that the Bio::DB::Fasta will allow you to retrieve subsequences efficiently.

Lincoln

--
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING, PLEASE CONTACT MY ASSISTANT, SANDRA MICHELSEN, AT michelse at cshl.edu

From awitney at sgul.ac.uk Tue Jun 27 10:08:20 2006
From: awitney at sgul.ac.uk (Adam Witney)
Date: Tue, 27 Jun 2006 15:08:20 +0100
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
References: <44A106C5.9040706@sendu.me.uk> <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net>
Message-ID: <44A13BD4.60802@sgul.ac.uk>

> Have you looked at the concept of 'quantitation types', e.g. in MAGE
> (the XML [MGAE-ML] or the object model [MAGE-OM])?

the MGED Ontology has a concept of quantitation type if that helps

http://mged.sourceforge.net/ontologies/MGEDontology.php#QuantitationType

--
This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
From william.hsiao at gmail.com Tue Jun 27 15:52:03 2006
From: william.hsiao at gmail.com (William Hsiao)
Date: Tue, 27 Jun 2006 12:52:03 -0700
Subject: [Bioperl-l] strange error parsing a specific NCBI gff file
Message-ID: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>

Hi all,

I've encountered a strange problem while parsing a gff file from NCBI using perl. I'm hoping that someone on the list may have a solution even though this is not a bioperl issue. Maybe someone familiar with gff3 parsing can help :)

Essentially, I'm parsing a gff file into a nested hash structure using the following functions:

sub parse_gff {
    my $file = shift;
    my %hash_gff;
    open (INFILE, $file) or die "Cannot find file $file\n";
    while(<INFILE>){
        next if (/^\#/);
        chomp;
        my ($seqid, $source, $type, $start, $end, $score, $strand, $phase, $attributes) = split /\t/;
        my $attri_ref = &process_attributes($attributes);
        my %record = ('seqid' => $seqid, 'source' => $source, 'type' => $type, 'start' => $start, 'end' => $end, 'score' => $score, 'strand' => $strand, 'phase' => $phase, 'attribute' => $attri_ref);
        push @{$hash_gff{$type}}, \%record;
    }
    close INFILE;
    print Dumper %hash_gff;
    return \%hash_gff;
}

sub process_attributes {
    my $attr_string = shift;
    my @attributes = split (/\;/, $attr_string);
    my %attr;
    foreach (@attributes){
        my ($key, $value) = split /=/;
        if ($value=~/\:/){
            my ($subkey, $subvalue) = split (/:/, $value);
            $attr{$key}{$subkey}=$subvalue;
        }
        else{
            $attr{$key}=$value;
        }
    }
    return \%attr;
}

It works for all the gff files we downloaded from NCBI's microbial genomes refseq ftp repository. However, 3 lines from one particular file NC_005966.gff (of Acinetobacter_sp_ADP1) cannot be parsed properly. These lines are:

NC_005966.1 RefSeq CDS 635836 636489 .
- 0 locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1 NC_005966.1 RefSeq start_codon 636487 636489 . - 0 locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1 NC_005966.1 RefSeq stop_codon 635833 635835 . - 0 locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1 They generate an error: Can't use string ("adaptation%20to%20stress") as a HASH ref while "strict refs" in use. The strange part is that all I have to do is replace the word "function" in front of "=adaptation%20to%20stress;" with another word or simply change it to functions or functio or Function, etc, then the line parses properly. If I retype the word "function", it doesn't solve the problem. 
For some strange reason, when the word "function" is there, perl tried to use "adaptation%20to%20stress" as the hash key and failed. The word "function" is used in other lines as well so I don't think the problem is caused by the word alone.

Any suggestion on what might be happening would be greatly appreciated. Thank you.

Cheers,
Will

--
William Hsiao
PhD Student, Brinkman Laboratory
Department of Molecular Biology and Biochemistry
Simon Fraser University, 8888 University Dr. Burnaby, BC, Canada V5A 1S6
Phone: 604-291-4206 Fax: 604-291-5583

From bix at sendu.me.uk Wed Jun 28 04:25:52 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Jun 2006 09:25:52 +0100
Subject: [Bioperl-l] strange error parsing a specific NCBI gff file
In-Reply-To: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>
References: <679a35b20606271252x1a321756m51d203b28c259d40@mail.gmail.com>
Message-ID: <44A23D10.1010308@sendu.me.uk>

William Hsiao wrote:
>
> sub process_attributes {
>     my $attr_string = shift;
>     my @attributes = split (/\;/, $attr_string);
>     my %attr;
>     foreach (@attributes){
>         my ($key, $value) = split /=/;
>         if ($value=~/\:/){
>             my ($subkey, $subvalue) = split (/:/, $value);

# assign hashref to $key, assign key => value pair to that

>             $attr{$key}{$subkey}=$subvalue;
>         }
>         else{

# assign scalar $key

>             $attr{$key}=$value;
>         }
>     }
>     return \%attr;
> }

> NC_005966.1 RefSeq CDS 635836 636489 .
- 0 locus_tag=ACIAD0647;function=adaptation%20to%20stress;function=protection%20%28MultiFun:5.5%29;note=Multifun:5.6%0AEvidence%203%20:%20Function%20proposed%20based%20on%20presence%20of%20conserved%20amino%20acid%20motif%2C%20structural%20feature%20or%20limited%20homolgy;inference=non-experimental%20evidence%2C%20no%20additional%20details%20recorded;transl_table=11;product=putative%20antioxidant%20protein;protein_id=YP_045389.1;db_xref=GI:50083879;db_xref=GeneID:2878732;exon_number=1

> They generate an error: Can't use string
> ("adaptation%20to%20stress") as a HASH ref while "strict refs" in use.
> The strange part is that all I have to do is replace the word
> "function" in front of "=adaptation%20to%20stress;" with another word
> or simply change it to functions or functio or Function, etc, then the
> line parses properly.

The problem is that these lines contain function=x twice, where the second x contains a colon. So your code first assigns $attr{function} = $scalar, and then tries to do $attr{function}{before_colon} = "after_colon".

Normally the latter would autovivify $attr{function} as a hash reference:
$attr{function} == HASH(xyz)
and then set before_colon => after_colon as a key value pair of HASH(xyz).

But in this case, $attr{function} already exists:
$attr{function} == "adaptation%20to%20stress".
But you try and set before_colon => after_colon as a key value pair of that string. Which you can't do.

Basically, your data structure isn't so great, mixing scalars and hash references as values of %attr. The solution may be to parse using Bioperl instead ;).
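
A minimal sketch of one way around the collision described above, for anyone who wants to keep a hand-rolled parser. It is not code from the thread: it assumes that storing every value in an array reference is acceptable, so a repeated key such as function= simply gets a second element and %attr never mixes scalars with hash references. The %XX decoding step and the decision to leave colons inside values untouched are also assumptions.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse a GFF column-9 string into a hash of array references, so that
# repeated keys (e.g. two function= entries) never collide.
sub process_attributes {
    my $attr_string = shift;
    my %attr;
    foreach my $pair (split /;/, $attr_string) {
        # Split on the first '=' only, in case a value contains '='.
        my ($key, $value) = split /=/, $pair, 2;
        next unless defined $key && length $key && defined $value;
        # Decode %XX escapes such as %20 (space) and %2C (comma).
        $value =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
        push @{ $attr{$key} }, $value;
    }
    return \%attr;
}

# A shortened version of the attribute string from the problem lines.
my $attr = process_attributes(
    'locus_tag=ACIAD0647;function=adaptation%20to%20stress;'
  . 'function=protection%20%28MultiFun:5.5%29;db_xref=GI:50083879;'
  . 'db_xref=GeneID:2878732'
);

print "function: $_\n" for @{ $attr->{function} };
print "db_xref:  $_\n" for @{ $attr->{db_xref} };
```

With this layout every lookup has the same shape, $attr->{$key}[0] for the first value, and a value such as GI:50083879 can be split on the colon later by whichever caller needs the two parts.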
From selvik at ufl.edu Tue Jun 27 08:54:48 2006
From: selvik at ufl.edu (Kadirvel, Selvi)
Date: Tue, 27 Jun 2006 08:54:48 -0400 (EDT)
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score, evalue, description)
Message-ID: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>

All,

(I am new to Bioinformatics and Bioperl, so please apologize if I get my terminology wrong)

I am currently using Bio::SearchIO to parse HMMPFAM reports. This report consists of three sections namely;

1. A ranked list of the best scoring HMMs
2. A list of the best scoring domains in order of their occurrence in the sequence
3. Alignments for all the best scoring domains.

Section 3 can be truncated to a specific number using the "-A" option when building the report.

Though the Bio::SearchIO::hmmer module parses through the entire HMMER report (Section 1, 2 and 3), the set of values made available through Bio::Search::Result::ResultI seem to be using Section 3 alone. So when we use the -A option to truncate, we lose otherwise useful information in Section 1. This information is lost (only) for those models that do not have any of their domains in the top -A number of best scoring domains. The fields that are not available are:

1. Description of a model
2. Score of a model
3. Evalue of a model

If I use the older Bio::Tools::HMMER::Results module, NEITHER Bio::Tools::HMMER::Domain or Bio::Tools::HMMER::Set allow me to retrieve the above listed values. Scores and Evalues are available for each domain but not for the model it belongs to.

I was wondering if there is any other method to access these values or do I have to write my own module to do this?

Any ideas/suggestions would be greatly appreciated.

Thank you!
Selvi Kadirvel

Graduate Research Assistant
High Performance Computing Center
University of Florida

From hlapp at gmx.net Tue Jun 27 20:18:36 2006
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 27 Jun 2006 20:18:36 -0400
Subject: [Bioperl-l] Bio::Score of interest?
In-Reply-To: <44A14DD2.7000402@sendu.me.uk>
References: <44A106C5.9040706@sendu.me.uk> <43499206-5A11-4162-B1ED-72F5AC92C9B4@gmx.net> <44A129F5.3030500@sendu.me.uk> <46797034-EF95-45A4-9AA6-EE6FEA3A5270@gmx.net> <44A14DD2.7000402@sendu.me.uk>
Message-ID:

On Jun 27, 2006, at 11:25 AM, Sendu Bala wrote:

> Hilmar Lapp wrote:
>> I would have suggested initiating a quantitation type ontology, not one
>> individual per module.
>
> Where would such a thing 'live'? Would it be some static file somewhere
> that gets read in with Bio::OntologyIO? Or an in-memory Bio::Ontology
> that can be added to by a module when it needs extra terms to describe its
> particular kind of scores?

For instance, yes. Once you read in an ontology (through Bio::OntologyIO indeed) it sits essentially in memory.

> [...]
> Not having any experience with ontologies, I can't think how this would
> all be done in practice though. Don't we need some central module
> (Bio::Score) to create the ontology (or read it in) and then present
> some suitable interface to it?

Possibly - the problem is how to get the ontology-typed term given an analysis program and attribute name (e.g. 'score' of a feature object). There is no method for doing this on a feature object and bolting one on would be a bad idea I think. So, the Bio::Score would be a little hybrid between an objectified score value that now doesn't just have a numeric value but also a type term, and a factory for creating the ontology (e.g., by reading it in from a specified or default location).
I.e., you'd have

    my $value = $score->value();
    my $type = $score->type(); # $type is-a Bio::Ontology::TermI
    my $quant_ont = $type->ontology();
    # see what type of score we have
    my @ancestors = $quant_ont->get_ancestor_terms($type);
    if (grep {$_->name eq 'expectation_value'} @ancestors) {
        # it's an e-value
    } elsif ( ...test for some other type...) {
        # etc
    }

> For example, modules that wanted to store
> some scores might just ask Bio::Score for the ontology and type their
> scores by associating with an available ontology term, creating new
> terms if necessary (or is that something you would never do; the
> ontology needed to have been set up to cover all possible terms?).

Yes. You'd extend it as you encounter types that aren't in the ontology yet, until the ontology fully captures the knowledge domain.

> Then
> when the user has a bunch of these typed scores, surely he doesn't want
> to deal with going through the ontology himself to work out what it all
> means? Well, he could if he needs that level of control, but also he
> just wants to say Bio::Score->sort(x y z) or something.

See above for a quick example of the logic. I'd separate that into its own module, like Bio::Score::Utils.

-hilmar

> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From cjfields at uiuc.edu Wed Jun 28 10:29:17 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 09:29:17 -0500
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score, evalue, description)
In-Reply-To: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu>
Message-ID: <002b01c69abf$3cc0fef0$15327e82@pyrimidine>

Selvi,

Can you send me the report you are trying to parse as an attachment?
I'll give it a look.

Judging by the pdoc this is mapped for the event handler so it should be there. From the %MAPPING hash:

    'HMMER_program' => 'RESULT-algorithm_name',
    'HMMER_version' => 'RESULT-algorithm_version',
    'HMMER_query-def' => 'RESULT-query_name',
    'HMMER_query-len' => 'RESULT-query_length',
    'HMMER_query-acc' => 'RESULT-query_accession',
    'HMMER_querydesc' => 'RESULT-query_description',
    'HMMER_hmm' => 'RESULT-hmm_name',
    'HMMER_seqfile' => 'RESULT-sequence_file',
    'HMMER_db' => 'RESULT-database_name',

Chris

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Kadirvel, Selvi
> Sent: Tuesday, June 27, 2006 7:55 AM
> To: bioperl-l at lists.open-bio.org
> Cc: selvik at ufl.edu
> Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score,
> evalue, description)
>
> All,
>
> (I am new to Bioinformatics and Bioperl, so please apologize if I
> get my terminology wrong)
>
> I am currently using Bio::SearchIO to parse HMMPFAM reports. This
> report consists of three sections namely;
>
> 1. A ranked list of the best scoring HMMs
> 2. A list of the best scoring domains in order of their occurrence
> in the sequence
> 3. Alignments for all the best scoring domains.
>
> Section 3 can be truncated to a specific number using the "-A"
> option when building the report.
>
> Though the Bio::SearchIO::hmmer module parses through the entire
> HMMER report (Section 1, 2 and 3), the set of values made
> available through Bio::Search::Result::ResultI seem to be using
> Section 3 alone. So when we use the -A option to truncate, we lose
> otherwise useful information in Section 1. This information is
> lost (only) for those models that do not have any of their domains
> in the top -A number of best scoring domains. The fields that are
> not available are:
>
> 1. Description of a model
> 2. Score of a model
> 3.
Evalue of a model
>
> If I use the older Bio::Tools::HMMER::Results module, NEITHER
> Bio::Tools::HMMER::Domain nor Bio::Tools::HMMER::Set allow me to
> retrieve the above listed values. Scores and Evalues are available
> for each domain but not for the model it belongs to.
>
> I was wondering if there is any other method to access these
> values or do I have to write my own module to do this?
>
> Any ideas/suggestions would be greatly appreciated.
>
> Thank you!
>
>
>
>
> Selvi Kadirvel
>
> Graduate Research Assistant
> High Performance Computing Center
> University of Florida
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu Wed Jun 28 10:55:31 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 09:55:31 -0500
Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score, evalue, description)
In-Reply-To: <002b01c69abf$3cc0fef0$15327e82@pyrimidine>
Message-ID: <003501c69ac2$e70623b0$15327e82@pyrimidine>

I hate responding to myself!! Forgot to add that there is also Bio::Tools::Hmmpfam :

http://www.bioperl.org/wiki/Module:Bio::Tools::Hmmpfam

I'll check if Bio::SearchIO catches this data and let you know what I find out. It should catch at least some, according to the mapping.

Chris

> Selvi,
>
> Can you send me the report you are trying to parse as an attachment? I'll
> give it a look.
>
> Judging by the pdoc this is mapped for the event handler so it should be
> there.
From the %MAPPING hash: > > 'HMMER_program' => 'RESULT-algorithm_name', > 'HMMER_version' => 'RESULT-algorithm_version', > 'HMMER_query-def' => 'RESULT-query_name', > 'HMMER_query-len' => 'RESULT-query_length', > 'HMMER_query-acc' => 'RESULT-query_accession', > 'HMMER_querydesc' => 'RESULT-query_description', > 'HMMER_hmm' => 'RESULT-hmm_name', > 'HMMER_seqfile' => 'RESULT-sequence_file', > 'HMMER_db' => 'RESULT-database_name', > > Chris > > > -----Original Message----- > > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > > bounces at lists.open-bio.org] On Behalf Of Kadirvel, Selvi > > Sent: Tuesday, June 27, 2006 7:55 AM > > To: bioperl-l at lists.open-bio.org > > Cc: selvik at ufl.edu > > Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score, > > evalue, description) > > > > All, > > > > (I am new to Bioinformatics and Bioperl, so please apologize if I > > get my terminology wrong) > > > > I am currently using Bio::SearchIO to parse HMMPFAM reports. This > > report consists of three sections namely; > > > > 1. A ranked list of the best scoring HMMs > > 2. A list of the best scoring domains in order of their occurrence > > in the sequence > > 3. Alignments for all the best scoring domains. > > > > Section 3 can be truncated to a specific number using the ??A? > > option when building the report. > > > > Though the Bio::SearchIO::hmmer module parses through the entire > > HMMER report (Section 1, 2 and 3), the set of values made > > available through Bio::Search::Result::ResultI seem to be using > > Section 3 alone. So when we use the ?A option to truncate, we lose > > otherwise useful information in Section 1. This information is > > lost (only) for those models that do not have any of their domains > > in the top ?A number of? best scoring domains. The fields that are > > not available are: > > > > 1. Description of a model > > 2. Score of a model > > 3. 
Evalue of a model > > > > If I use the older Bio::Tools::HMMER:Results module, NEITHER > > Bio::Tools::HMMER::Domain or Bio::Tools::HMMER::Set allow me to > > retrieve the above listed values. Scores and Evalues are available > > for each domain but not for the model it belongs to. > > > > I was wondering if there is any other method to access these > > values or do I have to write my own module to do this? > > > > Any ideas/suggestions would be greatly appreciated. > > > > Thank you! > > > > > > > > > > Selvi Kadirvel > > > > Graduate Research Assistant > > High Performance Computing Center > > University of Florida > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bix at sendu.me.uk Wed Jun 28 11:04:29 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Wed, 28 Jun 2006 16:04:29 +0100 Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score, evalue, description) In-Reply-To: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu> References: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu> Message-ID: <44A29A7D.7020602@sendu.me.uk> Kadirvel, Selvi wrote: > All, > > (I am new to Bioinformatics and Bioperl, so please apologize if I > get my terminology wrong) > > I am currently using Bio::SearchIO to parse HMMPFAM reports. This > report consists of three sections namely; > > 1. A ranked list of the best scoring HMMs > 2. A list of the best scoring domains in order of their occurrence > in the sequence > 3. Alignments for all the best scoring domains. > > Section 3 can be truncated to a specific number using the ??A? > option when building the report. What do you mean by this? What is ??A? ? 
Is this an option you're supplying to hmmpfam or a bioperl module?

> Though the Bio::SearchIO::hmmer module parses through the entire
> HMMER report (Section 1, 2 and 3), the set of values made
> available through Bio::Search::Result::ResultI seem to be using
> Section 3 alone. So when we use the ?A option to truncate, we lose
> otherwise useful information in Section 1. This information is
> lost (only) for those models that do not have any of their domains
> in the top ?A number of? best scoring domains. The fields that are
> not available are:
>
> 1. Description of a model
> 2. Score of a model
> 3. Evalue of a model

Each hit you get back from each result of the SearchIO is a Bio::Search::Hit::HMMERHit and represents the results of a particular model (you can also say $result->next_model).

So you can say:

    $hit->name, " ", $hit->description, " ", $hit->significance, " ",
    $hit->score;

To get the information you want.

General information about the result can be had like so:

    print $result->query_name, " ", $result->algorithm, " ",
    $result->hmm_name, "\n";

I have another problem (or the same one as you? I can't tell...) in that I can only get a single result, hit and hsp from my hmmpfam file! It is doing my head in, but I might be doing something wrong so will look into it further before posting a bug report.

From bix at sendu.me.uk Wed Jun 28 12:46:57 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 28 Jun 2006 17:46:57 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A29A7D.7020602@sendu.me.uk>
References: <648707510.16101151412888032.JavaMail.osg@osgjas03.cns.ufl.edu> <44A29A7D.7020602@sendu.me.uk>
Message-ID: <44A2B281.7030806@sendu.me.uk>

Sendu Bala wrote:
[ from thread Bio::SearchIO - Accessing Model parameters (score, evalue, description) ]
[ concerning hmmpfam output ]
> I have another problem (or the same one as you? I can't tell...) in
> that I can only get a single result, hit and hsp from my hmmpfam file!
> It is doing my head in, but I might be doing something wrong so will > look into it further before posting a bug report. I was just doing something wrong, but... Revision 1.27 of Bio::SearchIO::hmmer did 'Change HMMER parser to report a single HSP per Hit so domains with multiple alignments get separate Hits (more FASTA like) since they aren't really HSPs' Strangely 1.25 (Bioperl 1.4) seems to behave like that already. In any case, this is extremely counter-intuitive, especially given that next_domain is a synonym of next_hsp. I think either the synonym relationship remains and hits have multiple hsps (and there is only one hit per model), or next_domain goes off and finds the hsp that is the next domain of the current model. But that would be incredibly broken in the current model since it would be found in a different hit object... What hmmpfam does is take a database of models which can be thought of as database sequences. Then it aligns each one against your query sequences. A model could align in multiple locations along a query sequence. Each one of these locations is called a domain of the model. A user of hmmpfam is model-centric (wants to know which models are on his query), and so you want to know all about how well the model did in one go. So you should be able to get the results for a model ($hit = $result->next_model), get overall info about it ($hit->score etc.), then get more detailed information about each domain of it (while ($hsp = $hit->next_domain) {...}). But right now you only get one domain and you have to go searching through all your other hits to find a hit with the same ->name() as your model of interest to get the next domain of your model. In my view this is less than ideal. What do people think? Should it be changed? 
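
Until the parser changes, the regrouping described above can be done by hand after parsing. What follows is only a minimal sketch of that idea, not the parser's behaviour or a proposed patch: plain hash references stand in for the one-domain-per-hit objects, and the model names, coordinates and evalues are made up for illustration. With real objects the values would come from $hit->name and the single domain returned by $hit->next_domain.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the one-domain-per-hit objects the parser
# currently returns. All names and numbers here are invented.
my @hits = (
    { name => 'pkinase', domain => { start => 10,  end => 250, evalue => 1e-40 } },
    { name => 'fn3',     domain => { start => 300, end => 380, evalue => 2e-10 } },
    { name => 'fn3',     domain => { start => 400, end => 480, evalue => 5e-08 } },
);

# Group domains by model name, remembering the order in which each
# model was first seen so the output order is stable.
sub group_domains_by_model {
    my @hits = @_;
    my ( %domains_for, @order );
    for my $hit (@hits) {
        my $name = $hit->{name};
        push @order, $name unless exists $domains_for{$name};
        push @{ $domains_for{$name} }, $hit->{domain};
    }
    return ( \@order, \%domains_for );
}

my ( $order, $domains_for ) = group_domains_by_model(@hits);

# One entry per model, with every domain of that model listed under it.
for my $name (@$order) {
    my @domains = @{ $domains_for->{$name} };
    printf "%s: %d domain(s)\n", $name, scalar @domains;
    printf "\t%d-%d evalue %g\n", @{$_}{qw(start end evalue)} for @domains;
}
```

This only papers over the issue on the caller's side: a model that is listed in the report but has no alignment would still be missing, because no hit object exists for it to be grouped under.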
From selvik at ufl.edu Wed Jun 28 11:21:37 2006 From: selvik at ufl.edu (Selvi Kadirvel) Date: Wed, 28 Jun 2006 11:21:37 -0400 Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score, evalue, description) In-Reply-To: <003501c69ac2$e70623b0$15327e82@pyrimidine> References: <003501c69ac2$e70623b0$15327e82@pyrimidine> Message-ID: <2679E8D1-E225-4414-8925-1EB73B83523B@ufl.edu> Thanks for your reply Chris. I am attaching a part of the report I am trying to parse. Also I see that, Bio::SearchIO::hmmer.pm is parsing all three sections. I am not sure how (or whether) fields from Section 1 are actually being made available through Bio::SearchIO or Bio::Search:: [Hit | Hsp | Result]. I'll look into Bio::Tools::Hmmpfam and let you know if that works for me. -Selvi -------------- next part -------------- A non-text attachment was scrubbed... Name: ManyQueries.hmmer Type: application/octet-stream Size: 3684451 bytes Desc: not available Url : http://lists.open-bio.org/pipermail/bioperl-l/attachments/20060628/53dcc875/attachment-0001.obj -------------- next part -------------- On Jun 28, 2006, at 10:55 AM, Chris Fields wrote: > I hate responding to myself!! Forgot to add that there is also > Bio::Tools::Hmmpfam : > > http://www.bioperl.org/wiki/Module:Bio::Tools::Hmmpfam > > I'll check if Bio::SearchIO catches this data and let you know what > I find > out. It should at least some according to the mapping. > > Chris > >> Selvi, >> >> Can you send me the report you are trying to parse as an >> attachment? I'll >> give it a look. >> >> Judging by the pdoc this is mapped for the event handler so it >> should be >> there. 
From the %MAPPING hash: >> >> 'HMMER_program' => 'RESULT-algorithm_name', >> 'HMMER_version' => 'RESULT-algorithm_version', >> 'HMMER_query-def' => 'RESULT-query_name', >> 'HMMER_query-len' => 'RESULT-query_length', >> 'HMMER_query-acc' => 'RESULT-query_accession', >> 'HMMER_querydesc' => 'RESULT-query_description', >> 'HMMER_hmm' => 'RESULT-hmm_name', >> 'HMMER_seqfile' => 'RESULT-sequence_file', >> 'HMMER_db' => 'RESULT-database_name', >> >> Chris >> >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Kadirvel, Selvi >>> Sent: Tuesday, June 27, 2006 7:55 AM >>> To: bioperl-l at lists.open-bio.org >>> Cc: selvik at ufl.edu >>> Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters >>> (score, >>> evalue, description) >>> >>> All, >>> >>> (I am new to Bioinformatics and Bioperl, so please apologize if I >>> get my terminology wrong) >>> >>> I am currently using Bio::SearchIO to parse HMMPFAM reports. This >>> report consists of three sections namely; >>> >>> 1. A ranked list of the best scoring HMMs >>> 2. A list of the best scoring domains in order of their occurrence >>> in the sequence >>> 3. Alignments for all the best scoring domains. >>> >>> Section 3 can be truncated to a specific number using the ??A? >>> option when building the report. >>> >>> Though the Bio::SearchIO::hmmer module parses through the entire >>> HMMER report (Section 1, 2 and 3), the set of values made >>> available through Bio::Search::Result::ResultI seem to be using >>> Section 3 alone. So when we use the ?A option to truncate, we lose >>> otherwise useful information in Section 1. This information is >>> lost (only) for those models that do not have any of their domains >>> in the top ?A number of? best scoring domains. The fields that are >>> not available are: >>> >>> 1. Description of a model >>> 2. Score of a model >>> 3. 
Evalue of a model >>> >>> If I use the older Bio::Tools::HMMER:Results module, NEITHER >>> Bio::Tools::HMMER::Domain or Bio::Tools::HMMER::Set allow me to >>> retrieve the above listed values. Scores and Evalues are available >>> for each domain but not for the model it belongs to. >>> >>> I was wondering if there is any other method to access these >>> values or do I have to write my own module to do this? >>> >>> Any ideas/suggestions would be greatly appreciated. >>> >>> Thank you! >>> >>> >>> >>> >>> Selvi Kadirvel >>> >>> Graduate Research Assistant >>> High Performance Computing Center >>> University of Florida >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From akarger at CGR.Harvard.edu Wed Jun 28 15:49:54 2006 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Wed, 28 Jun 2006 15:49:54 -0400 Subject: [Bioperl-l] Bio::Perl::get_sequence for Swiss-Prot doesn't work Message-ID: >perl -MBio::Perl -e '$database="swiss", $id="P09651", $format="fasta";' -e '$sequence = get_sequence($database, $id);' -------------------- WARNING --------------------- MSG: acc (P09651) does not exist --------------------------------------------------- >perl -MBio::Perl -e '$database="swiss", $id="ROA1_HUMAN", $format="fasta";' -e '$sequence = get_sequence($database, $id);' -------------------- WARNING --------------------- MSG: id (ROA1_HUMAN) does not exist --------------------------------------------------- But of course, it does exist: http://ca.expasy.org/uniprot/ROA1_HUMAN Same error for a couple other proteins. Works for a GenBank protein. perl 5.8.6 Perl.pm,v 1.23.2.1 2005/10/09 15:16:18 jason Exp This worked a few months ago. What's going on? 
-Amir Karger From cjfields at uiuc.edu Wed Jun 28 16:27:15 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Jun 2006 15:27:15 -0500 Subject: [Bioperl-l] Bio::Perl::get_sequence for Swiss-Prot doesn't work In-Reply-To: Message-ID: <006901c69af1$412c3590$15327e82@pyrimidine> This was a recent bug due to recent changes in EBI's remote database; they changed the name of the database from 'swall' to 'uniprot'. Update to bioperl-live from CVS (or just Bio::DB::SwissProt) and that should fix it. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Amir Karger > Sent: Wednesday, June 28, 2006 2:50 PM > To: bioperl-l at lists.open-bio.org > Subject: [Bioperl-l] Bio::Perl::get_sequence for Swiss-Prot doesn't work > > >perl -MBio::Perl -e '$database="swiss", $id="P09651", $format="fasta";' > -e '$sequence = get_sequence($database, $id);' > > -------------------- WARNING --------------------- > MSG: acc (P09651) does not exist > --------------------------------------------------- > >perl -MBio::Perl -e '$database="swiss", $id="ROA1_HUMAN", > $format="fasta";' -e '$sequence = get_sequence($database, $id);' > > -------------------- WARNING --------------------- > MSG: id (ROA1_HUMAN) does not exist > --------------------------------------------------- > > But of course, it does exist: http://ca.expasy.org/uniprot/ROA1_HUMAN > Same error for a couple other proteins. > Works for a GenBank protein. > > perl 5.8.6 > Perl.pm,v 1.23.2.1 2005/10/09 15:16:18 jason Exp > > This worked a few months ago. > What's going on? 
> > -Amir Karger > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From Kevin.M.Brown at asu.edu Wed Jun 28 16:39:43 2006 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 28 Jun 2006 13:39:43 -0700 Subject: [Bioperl-l] FW: How to handle bugs in bioperl 1.4 on CPAN? Message-ID: <1A4207F8295607498283FE9E93B775B4019719A4@EX02.asurite.ad.asu.edu> This was supposed to go to the list... Still not used to Outlook... > The points made here, as I see them: > > 1) Commits should be made to stable releases (as well as to > the main branch in CVS) to fix bugs as long as that release is supported. I > agree with this, but someone has to volunteer, and the length of time a > release is supported also worked out. Almost would be better going to a regular > release schedule (once every 3-6 months or so) where the code is given as is > to CPAN, whether it passes tests or not. What I've seen in other projects is that stable is supported and bug patched up till the next stable release. After that support is dropped. Once a branch was tagged stable the ONLY thing that went into it was fixes for bugs based on the code already present. No new features, no refactoring of any code or modules. I'm not certain how often things like a stable patch release happened since most of the bugs were worked on long before while it was still tagged as dev. I could see, worst case a .x release to stable every 6 months to a year until the next stable came out if there were patches to it. It looks like the wiki has most of this kind of stuff documented in the previously posted link: http://www.bioperl.org/wiki/Making_a_BioPerl_release. I guess it would just need a pumpkin/monkey/whatever to step up to keep things rolling... 
> 2) More communication about the direction Bioperl is heading; personally I
> haven't seen a problem with this as much as there is no information about a
> roadmap. That is being alleviated soon I believe, though people out there
> need to be patient.
>
> 3) Volunteer. If you have something you believe needs to be done and you
> believe so fervently, then put up or shut up. Make (nice polite)
> suggestions otherwise. Don't judge code or "the way things are done" and
> don't presume what kind of experience people have that you don't know and
> haven't met. End of story.
>
> Chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From cjfields at uiuc.edu Wed Jun 28 18:14:09 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 28 Jun 2006 17:14:09 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A2B281.7030806@sendu.me.uk>
Message-ID: <007e01c69b00$2e091410$15327e82@pyrimidine>

> Sendu Bala wrote:
> [ from thread Bio::SearchIO - Accessing Model parameters (score, evalue,
> description) ]
> [ concerning hmmpfam output ]
> > I have another problem (or the same one as you? I can't tell...) in
> > that I can only get a single result, hit and hsp from my hmmpfam file!
> > It is doing my head in, but I might be doing something wrong so will
> > look into it further before posting a bug report.
>
> I was just doing something wrong, but...
>
> Revision 1.27 of Bio::SearchIO::hmmer did 'Change HMMER parser to report
> a single HSP per Hit so domains with multiple alignments get separate
> Hits (more FASTA like) since they aren't really HSPs'
>
> Strangely 1.25 (Bioperl 1.4) seems to behave like that already.
>
> In any case, this is extremely counter-intuitive, especially given that
> next_domain is a synonym of next_hsp.
I think either the synonym > relationship remains and hits have multiple hsps (and there is only one > hit per model), or next_domain goes off and finds the hsp that is the > next domain of the current model. But that would be incredibly broken in > the current model since it would be found in a different hit object... > > What hmmpfam does is take a database of models which can be thought of > as database sequences. Then it aligns each one against your query > sequences. A model could align in multiple locations along a query > sequence. Each one of these locations is called a domain of the model. A > user of hmmpfam is model-centric (wants to know which models are on his > query), and so you want to know all about how well the model did in one > go. So you should be able to get the results for a model ($hit = > $result->next_model), get overall info about it ($hit->score etc.), then > get more detailed information about each domain of it (while ($hsp = > $hit->next_domain) {...}). But right now you only get one domain and you > have to go searching through all your other hits to find a hit with the > same ->name() as your model of interest to get the next domain of your > model. > > In my view this is less than ideal. What do people think? Should it be > changed? The model (hit-like) table scores are retained and can be retrieved via $model->significance and the individual domain (hsp-like) evalues via $model->evalue. The reason you don't get all the individual domain evalues is that only five alignments are returned by default. You might try changing the 'A' parameter to see if you can get more alignments; that may work around the problem of missing domains for now. You'll note that the Model/Domain results returned are not based on top score but what looks like the position of the domain in the sequence (seq-t in the last table); that's what is stated in the hmmpfam docs. 
Anyway, I tried this loop with the reports Selvi sent and it works, but only for the ones that return alignments:

    my $result_count = 1;
    while ( my $result = $searchio->next_result() ) {
        print "Result $result_count : ",$result->query_name,"\n";
        print "Result models: ",$result->num_hits,"\n";
        while (my $model = $result->next_hit) {
            print "\tModel : ",$model->name,"\n";
            print "\tSignif: ",$model->significance,"\n";
            while (my $domain = $model->next_hsp) {
                print "\t\tDomain : ",$domain->name,"\n";
                print "\t\tEvalue : ",$domain->evalue,"\n";
            }
        }
        $result_count++;
    }

From the HMMER docs:

"Say you have a new sequence that, according to a BLAST analysis, shows a slew of hits to receptor tyrosine kinases. Before you decide to call your sequence an RTK homologue, you suspiciously recall that RTK's are, like many proteins, composed of multiple functional domains, and these domains are often found promiscuously in proteins with a wide variety of functions. Is your sequence really an RTK? Or is it a novel sequence that just happens to have a protein kinase catalytic domain or fibronectin type III domain?"

Model/domain pairs really aren't Hits/HSPs by definition, like the CVS commit from Jason states. The way Pfam is set up, you actually have your query(ies) scanned using a database of Pfam domains (HMM's, built from protein alignments for various protein families), hence the alignment in the report is not a HSP since HSPs come from pairwise sequence alignments. An HSP is a pair of sequences which, when aligned, meet or exceed a maximal cutoff. The hmmpfam report has alignments of the sequence and the consensus for the alignment the HMM is based on (not another sequence, so not an HSP).

This is also the same reason you can't get alignments from Bio::Search::HSP::HMMERHSP objects since the model 'sequence' isn't a true sequence but a consensus of sequences, so it's 'inappropriate' to use that as an actual alignment. Bad Bioperl user! Bad!
I think the reasoning for keeping single model-domain pairs is that you should consider each domain's location in the sequence as well as the number of times they appear, regardless of whether they belong to the same model or not. One protein could have three ATP-binding domains and another two, and they could be located in different positions on the sequence. But where they are on the sequence in relation to other domains and to each other (i.e. positional information) is just as important, maybe more so, than how many times that domain appears. Well, that and SearchIO is set up as a SAX-like parser, so I believe it processes the model-domain alignments as the file is parsed. My 2c: there should be a way to get all model-domain pairs in the "parsed for domains" table (which is like a list of HSPs). Seems the last few w/o alignments are not retained; this may be the way the parser is set up. I would try getting the handler to return just evalues and similar stuff for those and leave out sequence/alignment info, if that's possible. Not sure how this is handled with BLAST reports where there are more hits reported than alignments... Chris _____________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Wed Jun 28 18:16:38 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Jun 2006 17:16:38 -0500 Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour Message-ID: <000001c69b00$86adcc00$15327e82@pyrimidine> Arghhhh! 
Made a mistake:

> my $result_count = 1;
> while ( my $result = $searchio->next_result() ) {
>     print "Result $result_count : ",$result->query_name,"\n";
>     print "Result models: ",$result->num_hits,"\n";
>     while (my $model = $result->next_hit) {
>         print "\tModel : ",$model->name,"\n";
>         print "\tSignif: ",$model->significance,"\n";
>         while (my $domain = $model->next_hsp) {
>             print "\t\tDomain : ",$domain->name,"\n";
                                    ^^^^^^^ Should be: $model
>             print "\t\tEvalue : ",$domain->evalue,"\n";
>         }
>     }
>     $result_count++;
> }

My bad!

Chris

From bix at sendu.me.uk Wed Jun 28 19:00:11 2006
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 29 Jun 2006 00:00:11 +0100
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <007e01c69b00$2e091410$15327e82@pyrimidine>
References: <007e01c69b00$2e091410$15327e82@pyrimidine>
Message-ID: <44A309FB.2050009@sendu.me.uk>

Chris Fields wrote:
>> Sendu Bala wrote:
[snip]
>> In any case, this is extremely counter-intuitive, especially given
>> that next_domain is a synonym of next_hsp. I think either the
>> synonym relationship remains and hits have multiple hsps (and there
>> is only one hit per model)
[snip]

> The model (hit-like) table scores are retained and can be retrieved
> via $model->significance and the individual domain (hsp-like) evalues
> via $model->evalue.

I know, see my earlier post.

> The reason you don't get all the individual domain evalues is that
> only five alignments are returned by default. You might try changing
> the 'A' parameter to see if you can get more alignments; that may
> work around the problem of missing domains for now.

[I'm using my own data, not the OP's]
No, I have all the alignments: 'A' isn't a problem. And I can get all the domains. The problem is I have to check multiple different hits to find them all.
> You'll note that the Model/Domain results returned are not based on > top score but what looks like the position of the domain in the > sequence (seq-t in the last table); that's what is stated in the > hmmpfam docs. [...] > Well, that and SearchIO is set up as a SAX-like parser, so I believe > it processes the model-domain alignments as the file is parsed. Yes, this is the problem. The parser does the obvious thing, but in my view it does not do the correct thing. > Model/domain pairs really aren't Hits/HSPs by definition, like the > CVS commit from Jason states. The way Pfam is set up, you actually > have your query(ies) scanned using a database of Pfam domains (HMM's, > built from protein alignments for various protein families), hence > the alignment in the report is not a HSP since HSPs come from > pairwise sequence alignments. An HSP is a pair of sequences which, > when aligned, meet or exceed a maximal cutoff. The hmmpfam report > has alignments of the sequence and the consensus for the alignment > the HMM is based on (not another sequence, so not an HSP). But this is just semantics. It doesn't /matter/ that its not really truly a sequence that's being aligned. The parser needs to present to the user the information in the file. As we see in the OP's example, it simply fails to do this because the parser isn't model-centric while the file it is parsing /is/. And in any case, your argument doesn't hold because even the current parser /does/ store domains in hsp objects! It just only stores one hsp per hit, repeatedly, which is nonsensical. [to avoid confusion, in the following the use of 'model' is in the programming sense, whilst 'Model' refers to the things generated by hmmer] The correct model to describe the file being parsed is one that is able provide to the user all the available results for all Models that hit a query sequence, even when there are no alignments in the file. To make this fit the SearchIO scheme, we must have one hit per Model. 
The hit has hsps which are the domains. This perfectly matches the information in the file. It matches something like a Blast, where you have one hit per database sequence/query sequence combo. A hit could end up with no hsps (no domains), but we may not even care. Sometimes you really do just want to know if a particular model hit at all, and with what evalue/score. The current parsing model isn't guaranteed to tell you this even when you can read it yourself in the file being parsed. You can guess at the intent of the original authors, I think, just by looking at those method synonyms. next_hit == next_model. next_hsp == next_domain. This makes perfect sense. This is the way to correctly model the information in the file. The problem is that next_model doesn't give you the next Model (because each Model has multiple hits), and next_domain doesn't give you the next domain (because each hit only has one domain). > I think the reasoning for keeping single model-domain pairs is that > you should consider each domain's location in the sequence as well as > the number of times they appear, regardless of whether they belong to > the same model or not. One protein could have three ATP-binding > domains and another two, and they could be located in different > positions on the sequence. But where they are on the sequence in > relation to other domains and to each other (i.e. positional > information) is just as important, maybe more so, than how many times > that domain appears. Well, that's for the user to decide. But the way the results are presented needs to make sense. If blast results came back with all hsps listed out in sequence position order, would you have multiple hits per database sequence each with one hsp? No, because the meaning is completely wrong. The 'hit' is the collection of alignments of a particular database sequence hitting a query sequence. The alignments are stored in a bunch of hsps. 
It is absurd to have more than one hit object for a database+query sequence combo, because then we have multiple hit objects duplicating the exact same information, and 'hit' no longer has any meaning - it is a collection of /some/ of the alignments? Yet this is exactly what we have with hmmpfam result parsing. From selvik at ufl.edu Wed Jun 28 16:11:56 2006 From: selvik at ufl.edu (Selvi Kadirvel) Date: Wed, 28 Jun 2006 16:11:56 -0400 Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score, evalue, description) In-Reply-To: References: Message-ID: Sendu, > > What do you mean by this? What is ??A? ? > Is this an option you're supplying to hmmpfam or a bioperl module? I was referring to the '-A' option when running hmmpfam. So if I were to use '-A 5', Section 3 will have only the top scoring (first) five HSPs. > > So you can say: > $hit->name, " ", $hit->description, " ", $hit->significance, " ", > $hit->score; > > To get the information you want. > General information about the result can be had like so: > print $result->query_name, " ", $result->algorithm, " ", > $result->hmm_name, "\n"; I do use the same methods that you have suggested. Let me try to explain my problem in detail. Lets say I have a report that was generated using this "-A 5" option. I want to get the description, score, evalue of a model that *does not* have a domain in the top 5 high scoring HSPs. This information *exists* in the report in Section 1 but neither $result->next_hit or $hit->next_hsp can see it. Details of ALL domains are available through: foreach $domain ($result->each_Domain) { $domain-> [ hmmname, hmmacc, start, end, hstart, hend, evalue ] } where $result is a Bio::Tools::HMMER::Results object. But this again represents information in Section 2. It gives us domain scores and evalues (and not model scores and evalues.) I am working around this by finding the sum of scores (evalues) of all domains in a model. 
But there seems to be no work-around to retrieve the description. $domain->hmmacc contains only the first string of the description. -Selvi From jason at bioperl.org Wed Jun 28 22:53:25 2006 From: jason at bioperl.org (Jason Stajich) Date: Wed, 28 Jun 2006 22:53:25 -0400 Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour In-Reply-To: <44A309FB.2050009@sendu.me.uk> References: <007e01c69b00$2e091410$15327e82@pyrimidine> <44A309FB.2050009@sendu.me.uk> Message-ID: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org> I don't have any time to really debate this sadly - I definitely went back and forth on how to solve this and not many people ever spoke up about what they WANTED. So glad to hear there are opinions out there now. I think the bug fix you refer to had to do with not returning things ordered by E-value -- the creation machinery only builds Hit objects when there are HSP objects being built. Basically the parsing is linear in terms of the file: we read "Model" (Hit) data first and store them in a hash keyed by the name of the domain, but we only >>build<< the "Hits" when we see HSPs, hence the problem when the -A option limits alignments but reports Hits that don't have individual alignments. This has to do with the order of things not syncing up and/or dealing with the -A option when there is leftover Hit data but no HSPs to populate them. We also had this problem in BLAST reports and had to work around that, but I never bothered solving it in HMMER I guess. Glad there are other people who are going to fix the problems! The one "alignment" (HSP) per hit was a workaround to the problem that Hits were being returned in the order the HSPs came in (Sequence order) -- because that is the order they were being built in -- not in the sorted order of the Hits as seen in the report. Feel free to propose an alternative implementation for the parser as you see fit as long as the API is preserved.
You can contribute a new SearchIO plugin and HMMERSearchResultListener to deal with it - or I guess do what I also do and just run hmmer2table and deal with things in a tab-delimited format. Personally my interests lie in the actual domains so the Hit objects are superfluous in my own work so it never bothered me to have one per Hit and it flows more naturally to things like GFF, etc. You can aggregate them however you like after the fact pretty simply so I don't find this too hard to deal with, but if this is a major deterrent for people I guess have at it (I think the speed of object creation is a larger problem that I hope that someone will work on soon). I'd appreciate you including the salient points of how the report is interpreted on the wiki at some point (with 8X10 glossy pictures and circles and arrows on the back... http://en.wikipedia.org/wiki/Alice%27s_Restaurant) so the debate can be archived too. -jason On Jun 28, 2006, at 7:00 PM, Sendu Bala wrote: > Chris Fields wrote: >>> Sendu Bala wrote: > [snip] >>> In any case, this is extremely counter-intuitive, especially given >>> that next_domain is a synonym of next_hsp. I think either the >>> synonym relationship remains and hits have multiple hsps (and there >>> is only one hit per model) > [snip] > >> The model (hit-like) table scores are retained and can be retrieved >> via $model->significance and the individual domain (hsp-like) evalues >> via $model->evalue. > > I know, see my earlier post. > >> The reason you don't get all the individual domain evalues is that >> only five alignments are returned by default. You might try changing >> the 'A' parameter to see if you can get more alignments; that may >> work around the problem of missing domains for now. > > [I'm using my own data, not the OP's] > No, I have all the alignments: 'A' isn't a problem. And I can get all > the domains. The problem is I have to check multiple different hits to > find them all.
> > >> You'll note that the Model/Domain results returned are not based on >> top score but what looks like the position of the domain in the >> sequence (seq-t in the last table); that's what is stated in the >> hmmpfam docs. > [...] >> Well, that and SearchIO is set up as a SAX-like parser, so I believe >> it processes the model-domain alignments as the file is parsed. > > Yes, this is the problem. The parser does the obvious thing, but in my > view it does not do the correct thing. > > >> Model/domain pairs really aren't Hits/HSPs by definition, like the >> CVS commit from Jason states. The way Pfam is set up, you actually >> have your query(ies) scanned using a database of Pfam domains (HMM's, >> built from protein alignments for various protein families), hence >> the alignment in the report is not a HSP since HSPs come from >> pairwise sequence alignments. An HSP is a pair of sequences which, >> when aligned, meet or exceed a maximal cutoff. The hmmpfam report >> has alignments of the sequence and the consensus for the alignment >> the HMM is based on (not another sequence, so not an HSP). > > But this is just semantics. It doesn't /matter/ that its not really > truly a sequence that's being aligned. The parser needs to present to > the user the information in the file. As we see in the OP's > example, it > simply fails to do this because the parser isn't model-centric > while the > file it is parsing /is/. > > And in any case, your argument doesn't hold because even the current > parser /does/ store domains in hsp objects! It just only stores one > hsp > per hit, repeatedly, which is nonsensical. > > [to avoid confusion, in the following the use of 'model' is in the > programming sense, whilst 'Model' refers to the things generated by > hmmer] > > The correct model to describe the file being parsed is one that is > able > provide to the user all the available results for all Models that > hit a > query sequence, even when there are no alignments in the file. 
To make > this fit the SearchIO scheme, we must have one hit per Model. The hit > has hsps which are the domains. This perfectly matches the information > in the file. It matches something like a Blast, where you have one hit > per database sequence/query sequence combo. > > A hit could end up with no hsps (no domains), but we may not even > care. > Sometimes you really do just want to know if a particular model hit at > all, and with what evalue/score. The current parsing model isn't > guaranteed to tell you this even when you can read it yourself in the > file being parsed. > > You can guess at the intent of the original authors, I think, just by > looking at those method synonyms. next_hit == next_model. next_hsp == > next_domain. This makes perfect sense. This is the way to correctly > model the information in the file. The problem is that next_model > doesn't give you the next Model (because each Model has multiple > hits), > and next_domain doesn't give you the next domain (because each hit > only > has one domain). > > >> I think the reasoning for keeping single model-domain pairs is that >> you should consider each domain's location in the sequence as well as >> the number of times they appear, regardless of whether they belong to >> the same model or not. One protein could have three ATP-binding >> domains and another two, and they could be located in different >> positions on the sequence. But where they are on the sequence in >> relation to other domains and to each other (i.e. positional >> information) is just as important, maybe more so, than how many times >> that domain appears. > > Well, that's for the user to decide. But the way the results are > presented needs to make sense. If blast results came back with all > hsps > listed out in sequence position order, would you have multiple hits > per > database sequence each with one hsp? No, because the meaning is > completely wrong. 
The 'hit' is the collection of alignments of a > particular database sequence hitting a query sequence. The alignments > are stored in a bunch of hsps. It is absurd to have more than one hit > object for a database+query sequence combo, because then we have > multiple hit objects duplicating the exact same information, and 'hit' > no longer has any meaning - it is a collection of /some/ of the > alignments? Yet this is exactly what we have with hmmpfam result > parsing. > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Wed Jun 28 23:40:28 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Wed, 28 Jun 2006 22:40:28 -0500 Subject: [Bioperl-l] Bio::SearchIO - Accessing Model parameters (score, evalue, description) In-Reply-To: Message-ID: <000301c69b2d$c3fdc6a0$15327e82@pyrimidine> According to CVS, using -A0 (no alignments) is supposed to work since v. 1.5.1 and (I'm guessing here) should return HMMERHit/HMMERHSP objects with no sequences, just the values from the table. By this reasoning using -A5 should work but the first five Hit/HSP pairs will give you sequences and any remaining should give nothing, just the Sequence Model combined evalue (which you can get by $model->significance) and individual Domain (HSP-like) evalues ($domain->evalue). I don't get these either (I only get a max of 5 model/domain pairs). 
So, I tried a little experiment using the first single result output for this query from your combined file (nbd27e02.y1 716 69 831 ; translated), which was the first one I came across with more than five model/domain pairs, and this scripted loop:

    while ( my $result = $searchio->next_result() ) {
        print "Query: ",$result->query_name,"\n";
        while (my $model = $result->next_model) {
            print "\tModel : ",$model->name,"\n";
            print "\tSignif: ",$model->significance,"\n";
            while (my $domain = $model->next_domain) {
                print "\t\tEvalue : ",$domain->evalue,"\n";
            }
        }
    }

I get this with the file containing the alignments. For anyone following, I'm using bioperl-live, perl 5.8, WinXP:

    Query: nbd27e02.y1 716 69 831 ; translated
        Model : IBB
        Signif: 2.6e-43
            Evalue : 2.6e-43
        Model : HEAT
        Signif: 1.2e-11
            Evalue : 40
        Model : IBN_N
        Signif: 2.1
            Evalue : 2.1
        Model : Arm
        Signif: 6e-38
            Evalue : 3.5e-12
        Model : HEAT
        Signif: 1.2e-11
            Evalue : 0.0096

If I manually delete the alignments (make it like -A0 output) I get this:

    Query: nbd27e02.y1 716 69 831 ; translated
        Model : IBB
        Signif: 157.3
            Evalue : 2.6e-43
        Model : HEAT
        Signif: 52.1
            Evalue : 40
        Model : IBN_N
        Signif: -3.6
            Evalue : 2.1
        Model : Arm
        Signif: 139.5
            Evalue : 3.5e-12
        Model : HEAT
        Signif: 52.1
            Evalue : 0.0096
        Model : Arm
        Signif: 139.5
            Evalue : 2.2e-13
        Model : HEAT
        Signif: 52.1
            Evalue : 0.0032
        Model : Arm
        Signif: 139.5
            Evalue : 0.00019

i.e. all the model/domain pairs! So I think it's safe to say that this is a bug; the last few don't get processed but should. I'll drop a bug report into Bugzilla along with the test files and script so it can be confirmed. This shouldn't be too hard to fix but it may take a few days; I'm pretty busy here until Saturday.
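As a rough sketch of how the one-domain-per-model pairs listed above could be regrouped by model name after parsing: this uses plain Perl data copied from the -A0 style output rather than SearchIO objects, and group_by_model is a hypothetical helper, not a BioPerl method.

```perl
use strict;
use warnings;

# Model/domain pairs copied from the -A0 style output above, in report order.
# Each entry is [ model name, domain E-value ].
my @pairs = (
    [ 'IBB',   '2.6e-43' ],
    [ 'HEAT',  '40'      ],
    [ 'IBN_N', '2.1'     ],
    [ 'Arm',   '3.5e-12' ],
    [ 'HEAT',  '0.0096'  ],
    [ 'Arm',   '2.2e-13' ],
    [ 'HEAT',  '0.0032'  ],
    [ 'Arm',   '0.00019' ],
);

# Hypothetical helper: collect domain E-values under their model name,
# remembering the order in which each model was first seen.
sub group_by_model {
    my @model_pairs = @_;
    my ( %domains_for, @order );
    for my $pair (@model_pairs) {
        my ( $model, $evalue ) = @$pair;
        push @order, $model unless exists $domains_for{$model};
        push @{ $domains_for{$model} }, $evalue;
    }
    return ( \%domains_for, \@order );
}

my ( $domains_for, $order ) = group_by_model(@pairs);

for my $model (@$order) {
    my @evalues = @{ $domains_for->{$model} };
    printf "%-6s %d domain(s): %s\n",
        $model, scalar @evalues, join( ', ', @evalues );
}
```

With real parser output the pairs would presumably come from the $model->name and $domain->evalue calls in the loop above instead of a hard-coded list.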
Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Selvi Kadirvel > Sent: Wednesday, June 28, 2006 3:12 PM > To: bioperl-l at lists.open-bio.org > Cc: Selvi Kadirvel > Subject: Re: [Bioperl-l] Bio::SearchIO - Accessing Model parameters > (score,evalue, description) > > Sendu, > > > > > What do you mean by this? What is ??A? ? > > Is this an option you're supplying to hmmpfam or a bioperl module? > > I was referring to the '-A' option when running hmmpfam. So if I were > to use '-A 5', Section 3 will have only the top scoring (first) five > HSPs. > > > > > So you can say: > > $hit->name, " ", $hit->description, " ", $hit->significance, " ", > > $hit->score; > > > > To get the information you want. > > General information about the result can be had like so: > > print $result->query_name, " ", $result->algorithm, " ", > > $result->hmm_name, "\n"; > > I do use the same methods that you have suggested. Let me try to > explain my problem in detail. Lets say I have a report that was > generated using this "-A 5" option. I want to get the description, > score, evalue of a model that *does not* have a domain in the top 5 > high scoring HSPs. This information *exists* in the report in Section > 1 but neither $result->next_hit or $hit->next_hsp can see it. > > Details of ALL domains are available through: > > foreach $domain ($result->each_Domain) > { > $domain-> [ hmmname, hmmacc, start, end, hstart, hend, > evalue ] > } > > where $result is a Bio::Tools::HMMER::Results object. But this again > represents information in Section 2. It gives us domain scores and > evalues (and not model scores and evalues.) > > I am working around this by finding the sum of scores (evalues) of > all domains in a model. But there seems to be no work-around to > retrieve the description. $domain->hmmacc contains only the first > string of the description. 
> > -Selvi > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at uiuc.edu Thu Jun 29 01:20:10 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Jun 2006 00:20:10 -0500 Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour In-Reply-To: <44A309FB.2050009@sendu.me.uk> Message-ID: <000d01c69b3b$b17776d0$15327e82@pyrimidine> > I know, see my earlier post. ... > [I'm using my own data, not the OP's] ... Sorry, I was typing that one up over a three-hour period in between experiments, so I didn't go back and check everything before I sent it. Pretty much the entire file Selvi sent me (and the entire group, grrr) shows that the domains in the domain table are not completely parsed, and the number of reported hits correlates with the number of alignments present. In other words, only five or less hits are reported based on the alignments and the default max alignments reported per result is five. I figured out that it is a bug and plan on submitting it to Bugzilla. What you are talking about and what Selvi describes are two separate issues. I dealt with Selvi's for the moment; let's deal with yours. > > Well, that and SearchIO is set up as a SAX-like parser, so I believe ... > Yes, this is the problem. The parser does the obvious thing, but in my > view it does not do the correct thing. Yes, and that's your opinion. To tell the truth I'm quite neutral on this; I'm trying to reason along the lines the contributors for the module intended. The fact of the matter is the parser is set up to do it this way, and it was set up this way by others (not you or I); modifying it to suit one's personal wants and needs is not our job here. I don't have issues while I'm running it so I really don't see what the problem is, well, besides the reported bug I found along with Selvi's help. 
My view on all this before I quit for the night: I really don't want to get into what I consider nit-picky issues (the 'semantics' you mention; it's a simple difference in opinion and a small one at that). We can agree to disagree, whatever. The issue immediately at hand, what I consider the most important, is that Selvi has uncovered a bug with the code, as is. But I'm going to vent here a bit. It's late, I'm tired, and this whole thing irks me. It irks me a great deal. Personally, I don't think right now is the time to think about refactoring this particular module, esp. since I find it essentially works. I believe that energy is better spent elsewhere, such as SeqIO::genbank/swiss/embl for instance, or refactoring SearchIO::blast etc to use hashes instead of objects to speed things up. Or creating something yourself. Or doing what you currently are doing (Bio::Map). In other words, areas where use is high, code is aging, and refactoring is more productive. I'll add that I'm not trying to dissuade you from trying to build your own variation of a SearchIO HMMER parser; by all means go ahead. The above is how I feel. You can build your own parser to do what you want; you can even base it off the current SearchIO HMMER parser and see if you can set it up to give you the results you want, using a different handler and so on. Just don't break the API or modify the current code based strictly on your opinion of how it should work. It was probably set up this way for a particular reason. According to the SearchIO HOWTO the intent for SearchIO was to 'genericize' parsing reports with 'similar' styles, like BLAST, FASTA, HMMER, and so on.
The most prevalently parsed reports, by a long stretch, are BLAST reports, which is what the system is based on: http://www.bioperl.org/wiki/HOWTO:SearchIO#Design So the SearchIO system is based on the >assumption< that these reports can be divi'd up with the data mapped into categories (Results, Hits, HSPs), so similar objects should be able to handle them. Domain data are currently stored in HSP objects (HMMERHSP), but that's nothing more than a convenient way to store HMMER report data in my opinion; the alignment matches, strictly speaking, are not HSP's. You could rename HMMERHit HMMERModel and HMMERHsp HMMERDomain, but they would still, if they fit into SearchIO and used the current event handlers, implement HitI/HSPI by inheriting from GenericHit/GenericHSP. Ergo, any easy way you go about it here, HMMERHit is-a HitI and HMMERHsp is-a HSPI. You could probably work around it by building the 'correct' object hierarchy by setting up your own handler and SearchIO plugin, but that risks changing API. And, really, if you decide to go down that path, consider what Jason is talking about when he mentions using "under-the-hood" hashes. > A hit could end up with no hsps (no domains), but we may not even care. > Sometimes you really do just want to know if a particular model hit at > all, and with what evalue/score. The current parsing model isn't > guaranteed to tell you this even when you can read it yourself in the > file being parsed. For every model (hit) you should have a corresponding domain (HSP) or more depending on your view of how the parser works, even if the domain (HSP) is only present in the table and not in an alignment. You shouldn't have models w/o domains from your query (hits w/o hsps); that doesn't make any sense. If hmmpfam output has this then it's a serious issue, but, again, that doesn't make sense. All that information is in the tables in the hmmpfam output; you can even build objects w/o alignments present (-A0) straight from the tables. 
If you wanted to know whether a particular model hit at all, grab all the model objects ($result->models) and run through them to see if your expected model (Annexin, Phosphoribosyl, or whatever) is there using a map/grep block, regex, or whatever; you could autovivify a hash or similar data structure indicating that a particular sequence has x domains of y type. Or iterate through them like you would for a BLAST report. I don't see what's difficult about this; I do it for BLAST sequences, SeqFeatures, and many other BioPerl objects all the time! Yes, it can be slow; that's an issue with object instantiation and Perl and there is no easy way around it besides refactoring the SearchIO parsers/eventhandlers to send back hashes, as Jason has suggested. > You can guess at the intent of the original authors, I think, just by > looking at those method synonyms. next_hit == next_model. next_hsp == > next_domain. This makes perfect sense. This is the way to correctly > model the information in the file. The problem is that next_model > doesn't give you the next Model (because each Model has multiple hits), > and next_domain doesn't give you the next domain (because each hit only > has one domain). .... > Well, that's for the user to decide. But the way the results are > presented needs to make sense. If blast results came back with all hsps > listed out in sequence position order, would you have multiple hits per > database sequence each with one hsp? No, because the meaning is > completely wrong. The 'hit' is the collection of alignments of a > particular database sequence hitting a query sequence. The alignments > are stored in a bunch of hsps. It is absurd to have more than one hit > object for a database+query sequence combo, because then we have > multiple hit objects duplicating the exact same information, and 'hit' > no longer has any meaning - it is a collection of /some/ of the > alignments? Yet this is exactly what we have with hmmpfam result parsing.
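A minimal sketch of the lookup described above (check whether an expected model turned up, and how many times), under stated assumptions: MockHit is a stand-in class invented here so the example runs without BioPerl, and count_models is a hypothetical helper. In real use the list would presumably be the hit objects from a parsed result (for example $result->hits), each of which provides name().

```perl
use strict;
use warnings;

# Stand-in for a parsed hit object (assumption: only name() is needed here).
package MockHit;
sub new  { my ( $class, %args ) = @_; return bless {%args}, $class }
sub name { return $_[0]{name} }

package main;

# Hypothetical helper: count how many hit objects carry each model name.
# With one domain per hit, this is also the number of domains per model.
sub count_models {
    my @hits = @_;
    my %count;
    $count{ $_->name }++ for @hits;
    return \%count;
}

# Model names in the order they might appear in a report.
my @hits  = map { MockHit->new( name => $_ ) } qw(IBB HEAT IBN_N Arm HEAT);
my $count = count_models(@hits);

for my $wanted (qw(Arm Annexin)) {
    my $n = $count->{$wanted} || 0;
    print "$wanted: ", ( $n ? "hit ($n domain(s))" : "no hit" ), "\n";
}
```

The hash is built once per result, so checking several expected models afterwards does not require walking the hit list again.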
The problem is that the module is geared to parse the output as simply as possible, so it does it by sequence order, just like the output. And, as is, it makes sense to me why Eddy and Co. set it that way, not that I completely agree with it. Hmmpfam output is designed for annotating sequences using Pfam HMM's, so the results are hard-coded to appear in sequence order, not based on score or evalue. That's the way it is; not necessarily the best way IMHO (I would have a way to sort by evalue or model myself as an option), but it's the only way that's currently available. Yes, each Model can match more than one domain on a query sequence. Again, that this is the 'correct way' to set up this parser is your opinion; if you want, design your own SearchIO parser. Like I said, I don't have a problem with using this module myself. And I'm a bit reluctant to spend the energy overhaulin' this module when I could spend my time working on something else I consider more constructive (or destructive, depending on your view). And, frankly, it's not up to the user when using code they didn't create. You have to deal with it. Or code something yourself to do things the way you want. You have the power to do that; most bioperl users don't simply b/c they probably don't understand the class structure and OO nature of Bioperl. It's just a matter of where you want to spend your energy: dealing with something that interests you or fixing other people's broken code. Chris From cjfields at uiuc.edu Thu Jun 29 01:23:03 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Jun 2006 00:23:03 -0500 Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour In-Reply-To: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org> Message-ID: <000e01c69b3c$18d58fb0$15327e82@pyrimidine> ... > I think the bug fix you refer to had to do with not returning things > ordered by E-value -- the creation machinery only only builds Hit > objects when there are HSP objects being built.
Basically the > parsing is linear in terms of the file, we read "Model" (Hit) data > first and store them in a hash keyed by the name of the domain, but > we only >>build<< the "Hits" when seen HSPs, hence the problem when > the -A option limits alignments but reports Hits that don't have > individual alignments. This has to do with the order of things not > syncing up and/or dealing with the -A option when there is leftover > Hit data but no HSPs to populate them. We also had this problem in > BLAST reports and had to work around that, but I never bothered > solving it in HMMER I guess. Glad there are other people who are > going to fix the problems! Yeah, just figured that one out. I see the two tables are parsed into two arrays, so it is feasible to have the leftover (Hit/HSP|Model/Domain) whatever converted into the proper objects like without any alignments (-A0 optional output). I plan on reporting this in Bugzilla and will work on it, but can't get to it immediately (probably not 'til Friday-Saturday at the earliest). If Sendu wants to tackle it I don't have a problem. > The one "alignment" (HSP) per hit was a workaround to the problem > that Hits were being returned in the order the HSPs came in (Sequence > order) -- because that is the order they were being built in -- not > in the sorted order of the Hits as seen in the report. The SAX method, I gather, getting in the way. > Feel free to propose an alternative implement for parser as you see > fit as long as the API is preserved. you can contibute a new > SearchIO plugin and HMMERSearchResultListener to deal with it - or I > guess do what I also do and just run hmmer2table and deal with things > in a tab-delimited format. Or set it up as hashes, which you have mentioned before for BLAST. > Personally my interests lie in the actual domains so the Hit objects > are superfluous in my own work so it never bothered me to have one > per Hit and it flows more naturally to things like GFF, etc. 
You can > aggregate them however you like after the fact pretty simply so I > don't find this too hard to deal with, but if this a major deterrent > for people I guess have at it ( I think the speed of object creation > is a larger problem that I hope that someone will work on soon). Agreed, though now it's finding the time.... Chris > I'd appreciate you including the salient points of how the report is > interpreted on the wiki at some point (with 8X10 glossy pictures and > circles and arrows on the back...http://en.wikipedia.org/wiki/Alice% > 27s_Restaurant) so the debate can be archived too. > > -jason > > On Jun 28, 2006, at 7:00 PM, Sendu Bala wrote: > > > Chris Fields wrote: > >>> Sendu Bala wrote: > > [snip] > >>> In any case, this is extremely counter-intuitive, especially given > >>> that next_domain is a synonym of next_hsp. I think either the > >>> synonym relationship remains and hits have multiple hsps (and there > >>> is only one hit per model) > > [snip] > > > >> The model (hit-like) table scores are retained and can be retrieved > >> via $model->significance and the individual domain (hsp-like) evalues > >> via $model->evalue. > > > > I know, see my earlier post. > > > >> The reason you don't get all the individual domain evalues is that > >> only five alignments are returned by default. You might try changing > >> the 'A' parameter to see if you can get more alignments; that may > >> work around the problem of missing domains for now. > > > > [I'm using my own data, not the OP's] > > No, I have all the alignments: 'A' isn't a problem. And I can get all > > the domains. The problem is I have to check multiple different hits to > > find them all. > > > > > >> You'll note that the Model/Domain results returned are not based on > >> top score but what looks like the position of the domain in the > >> sequence (seq-t in the last table); that's what is stated in the > >> hmmpfam docs. > > [...] 
> >> Well, that and SearchIO is set up as a SAX-like parser, so I believe > >> it processes the model-domain alignments as the file is parsed. > > > > Yes, this is the problem. The parser does the obvious thing, but in my > > view it does not do the correct thing. > > > > > >> Model/domain pairs really aren't Hits/HSPs by definition, like the > >> CVS commit from Jason states. The way Pfam is set up, you actually > >> have your query(ies) scanned using a database of Pfam domains (HMM's, > >> built from protein alignments for various protein families), hence > >> the alignment in the report is not a HSP since HSPs come from > >> pairwise sequence alignments. An HSP is a pair of sequences which, > >> when aligned, meet or exceed a maximal cutoff. The hmmpfam report > >> has alignments of the sequence and the consensus for the alignment > >> the HMM is based on (not another sequence, so not an HSP). > > > > But this is just semantics. It doesn't /matter/ that its not really > > truly a sequence that's being aligned. The parser needs to present to > > the user the information in the file. As we see in the OP's > > example, it > > simply fails to do this because the parser isn't model-centric > > while the > > file it is parsing /is/. > > > > And in any case, your argument doesn't hold because even the current > > parser /does/ store domains in hsp objects! It just only stores one > > hsp > > per hit, repeatedly, which is nonsensical. > > > > [to avoid confusion, in the following the use of 'model' is in the > > programming sense, whilst 'Model' refers to the things generated by > > hmmer] > > > > The correct model to describe the file being parsed is one that is > > able > > provide to the user all the available results for all Models that > > hit a > > query sequence, even when there are no alignments in the file. To make > > this fit the SearchIO scheme, we must have one hit per Model. The hit > > has hsps which are the domains. 
This perfectly matches the information > > in the file. It matches something like a Blast, where you have one hit > > per database sequence/query sequence combo. > > > > A hit could end up with no hsps (no domains), but we may not even > > care. > > Sometimes you really do just want to know if a particular model hit at > > all, and with what evalue/score. The current parsing model isn't > > guaranteed to tell you this even when you can read it yourself in the > > file being parsed. > > > > You can guess at the intent of the original authors, I think, just by > > looking at those method synonyms. next_hit == next_model. next_hsp == > > next_domain. This makes perfect sense. This is the way to correctly > > model the information in the file. The problem is that next_model > > doesn't give you the next Model (because each Model has multiple > > hits), > > and next_domain doesn't give you the next domain (because each hit > > only > > has one domain). > > > > > >> I think the reasoning for keeping single model-domain pairs is that > >> you should consider each domain's location in the sequence as well as > >> the number of times they appear, regardless of whether they belong to > >> the same model or not. One protein could have three ATP-binding > >> domains and another two, and they could be located in different > >> positions on the sequence. But where they are on the sequence in > >> relation to other domains and to each other (i.e. positional > >> information) is just as important, maybe more so, than how many times > >> that domain appears. > > > > Well, that's for the user to decide. But the way the results are > > presented needs to make sense. If blast results came back with all > > hsps > > listed out in sequence position order, would you have multiple hits > > per > > database sequence each with one hsp? No, because the meaning is > > completely wrong. The 'hit' is the collection of alignments of a > > particular database sequence hitting a query sequence. 
The alignments > > are stored in a bunch of hsps. It is absurd to have more than one hit > > object for a database+query sequence combo, because then we have > > multiple hit objects duplicating the exact same information, and 'hit' > > no longer has any meaning - it is a collection of /some/ of the > > alignments? Yet this is exactly what we have with hmmpfam result > > parsing. > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bix at sendu.me.uk Thu Jun 29 03:02:49 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 29 Jun 2006 08:02:49 +0100 Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour In-Reply-To: <000d01c69b3b$b17776d0$15327e82@pyrimidine> References: <000d01c69b3b$b17776d0$15327e82@pyrimidine> Message-ID: <44A37B19.7030908@sendu.me.uk> Chris Fields wrote: > > Personally, I don't think right now is the time to think about refactoring > this particular module, esp. since I find it essentially works. I believe > that energy is better spent elsewhere, such as SeqIO::genbank/swiss/embl for > instance, or refactoring SearchIO::blast etc to use hashes instead of > objects to speed things up. Or creating something yourself. Or doing what > you currently are doing (Bio::Map). In other words, areas where use is > high, code is aging, and refactoring is more productive. Hmmer parsing happens to be important to me, in fact vital for my work. I've been using my own parser up till now, so didn't know what the Bioperl one was like. I'd like to use Bioperl for more things, preferably everything. 
> I'll add that I'm not trying to dissuade you from trying to build your own > variation of a SearchIO HMMER parser; by all means go ahead. The above is > how I feel. You can build your own parser to do what you want; you can even > base it off the current SearchIO HMMER parser and see if you can set it up > to give you the results you want, using a different handler and so on. Just > don't break the API or modify the current code based strictly on what your > opinion of how it should work is. It was probably set up this way for a > particular reason. Well, I don't like the idea of there being multiple SearchIO parsers for the same thing. [...] > And, frankly, it's not up to the user when using code they didn't create. > You have to deal with it. Or code something yourself to do things the way > you want. You have the power to do that; most bioperl users don't simply > b/c they probably don't understand the class structure and OO nature of > Bioperl. It's just a matter of where you want to spend your energy: dealing > with something that interests you or fixing other's people's broken code. My original question was essentially: does doing it my way make sense? And implicitly: would doing it my way be of any harm? Ie. can I go ahead and change how the parser reports results and groups them together? I don't think it will involve an API change, but the results it generates will obviously be very different. From bix at sendu.me.uk Thu Jun 29 03:54:50 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 29 Jun 2006 08:54:50 +0100 Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour In-Reply-To: <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org> References: <007e01c69b00$2e091410$15327e82@pyrimidine> <44A309FB.2050009@sendu.me.uk> <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org> Message-ID: <44A3874A.9040803@sendu.me.uk> Jason Stajich wrote: > > Feel free to propose an alternative implement for parser as you see > fit as long as the API is preserved. 
> You can contribute a new SearchIO plugin and HMMERSearchResultListener
> to deal with it - or
[snip]

What's the thinking behind the way SearchIOs work? Is it necessary or desirable to always do it with events and listeners? Or is it enough to simply return a ResultI regardless of how you made it?

From cjfields at uiuc.edu Thu Jun 29 09:27:00 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 08:27:00 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A37B19.7030908@sendu.me.uk>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine> <44A37B19.7030908@sendu.me.uk>
Message-ID:

On Jun 29, 2006, at 2:02 AM, Sendu Bala wrote:

> Chris Fields wrote:
>>
>> Personally, I don't think right now is the time to think about refactoring
>> this particular module, esp. since I find it essentially works. I believe
>> that energy is better spent elsewhere, such as SeqIO::genbank/swiss/embl for
>> instance, or refactoring SearchIO::blast etc to use hashes instead of
>> objects to speed things up. Or creating something yourself. Or doing what
>> you currently are doing (Bio::Map). In other words, areas where use is
>> high, code is aging, and refactoring is more productive.
>
> Hmmer parsing happens to be important to me, in fact vital for my work.
> I've been using my own parser up till now, so didn't know what the
> Bioperl one was like. I'd like to use Bioperl for more things,
> preferably everything.

We're not deterring you from setting up your own parser, something both Jason and I suggested. I just don't see what the major issue is; hmmpfam results never really contain the same number of hits per query that BLAST does (I get at the very most 30-40 and that is usually based on repeats). I believe the best place to spend this energy first and foremost is fixing the bug.

>> I'll add that I'm not trying to dissuade you from trying to build your own
>> variation of a SearchIO HMMER parser; by all means go ahead.
The >> above is >> how I feel. You can build your own parser to do what you want; >> you can even >> base it off the current SearchIO HMMER parser and see if you can >> set it up >> to give you the results you want, using a different handler and so >> on. Just >> don't break the API or modify the current code based strictly on >> what your >> opinion of how it should work is. It was probably set up this way >> for a >> particular reason. > > Well, I don't like the idea of there being multiple SearchIO > parsers for > the same thing. See, here's the thing: if the community-at-large decides to use your version of the parser then, by default it will become the only HMMER SearchIO parser and we'll deprecate the old one. I just don't think this is the way I would go about it. Jason has mentioned that object instantiation is a bigger issue with parsing (speed) than anything else; why not, if you plan on doing this, set up a Handler to return hashes, or do it completely under-the-hood? Have it be the 'new, faster way to run SearchIO.' Don't rehash (pardon the bad pun) the way things were esp. when proposals are out there to improve the toolkit. > [...] >> And, frankly, it's not up to the user when using code they didn't >> create. >> You have to deal with it. Or code something yourself to do things >> the way >> you want. You have the power to do that; most bioperl users don't >> simply >> b/c they probably don't understand the class structure and OO >> nature of >> Bioperl. It's just a matter of where you want to spend your >> energy: dealing >> with something that interests you or fixing other's people's >> broken code. > > My original question was essentially: does doing it my way make sense? > And implicitly: would doing it my way be of any harm? Ie. can I go > ahead > and change how the parser reports results and groups them together? I > don't think it will involve an API change, but the results it > generates > will obviously be very different. 
And my point is that both ways make sense, at least to me (and, it sounds like, to Jason, though I could be wrong). Again, create a new version of the parser based on what you want to do and accomplish. Don't just modify something the community at large uses based on your whims. Make the changes in a new module and let the community decide. As an example, BioPerl had several BLAST parsers for the longest time; we directed everybody over to SearchIO and most people seem to like it, hence the others are deprecated.

And changing the results returned could be considered by some to be an API change or a bug. If someone using this module has an automated pipeline set up for annotation using Pfam, hmmpfam, Bioperl, and a database, and their setup expects single model/domain pairs, then yeah, your changes will break that. The breakage may be small, inconsequential even, but it's possible (and even true; many genome annotation pipelines are set up exactly how I describe).

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From ClarkeW at AGR.GC.CA Thu Jun 29 10:31:14 2006
From: ClarkeW at AGR.GC.CA (Clarke, Wayne)
Date: Thu, 29 Jun 2006 10:31:14 -0400
Subject: [Bioperl-l] BioPerl and quality files
Message-ID: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca>

Hi all,

Recently I was working on a project which required some manipulation of Quality files. I may be wrong in this, but I don't believe that there is a Quality format for Bio::SeqIO. If there is, perhaps someone could point me in the right direction, as I could write a much nicer script than what I currently have; if not, I was wondering if anyone here has any use for such a thing. I am pretty new to developing but would be willing to give it a shot: for all the use I get out of BioPerl, with no thanks given to anyone who spent time writing something I used, I feel I could try to contribute my limited amount.
Any comments would be appreciated, and don't be afraid to tell me this is a lost cause. I realize that quality files tend to be less important than FASTA sequence files. I will give you a little information on me so that you know what to expect/what I am working with. I am a fourth year bioinformatics student, and am currently working as a summer student. I have some limited experience with writing perl modules and test scripts. Mostly I write perl to do specific jobs, that I or someone else has come up with to fill some immediate need of the company. I am interested in most things bioinformatics/computer sci/biology and am hoping to do Graduate studies when I finish my degree. Well that's enough for now, if you have any comments/suggestions I would appreciate it. Cheers, Wayne From cjfields at uiuc.edu Thu Jun 29 10:55:16 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Thu, 29 Jun 2006 09:55:16 -0500 Subject: [Bioperl-l] BioPerl and quality files In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca> Message-ID: <001601c69b8c$08cdce70$15327e82@pyrimidine> > Recently I was working on a project which required some manipulation of > Quality files. I may be wrong in this, but I don't believe that there is > a Quality format for Bio:SeqIO. If there is, someone could point me in > the right direction as I could write a much nicer script then what I > currently have, if not I was wondering if anyone here has any use for > such a thing. I am pretty new to developing but would be willing to give > it a shot, as I feel that for all the use I get out of BioPerl with no > thanks to anyone who spent time on writing something I used, I could try > and contribute my limited amount. Any comments would be appreciated, and > don't be afraid to tell me this is a lost cause. I realize that quality > files tend to be less important than FASTA sequence files. I will give > you a little information on me so that you know what to expect/what I am > working with. 
Here's a list I dredged up when looking for Bio::Seq::Quality in BioPerl, which is the sequence implementation for sequences with quality data and/or trace values:

Instances: 2    Module : Bio::Assembly::Contig
Instances: 2    Module : Bio::Assembly::IO::ace
Instances: 1    Module : Bio::Assembly::Singlet
Instances: 1    Module : Bio::Index::Fastq
Instances: 2    Module : Bio::Seq::Meta::Array
Instances: 1    Module : Bio::Seq::MetaI
Instances: 8    Module : Bio::Seq::Quality
Instances: 1    Module : Bio::Seq::SeqWithQuality
Instances: 6    Module : Bio::Seq::SequenceTrace
Instances: 1    Module : Bio::Seq::TraceI
Instances: 2    Module : Bio::SeqIO::abi
Instances: 2    Module : Bio::SeqIO::ctf
Instances: 2    Module : Bio::SeqIO::exp
Instances: 10   Module : Bio::SeqIO::fastq
Instances: 5    Module : Bio::SeqIO::phd
Instances: 5    Module : Bio::SeqIO::qual
Instances: 3    Module : Bio::SeqIO::raw
Instances: 13   Module : Bio::SeqIO::scf
Instances: 2    Module : Bio::SeqIO::ztr

Does that help?

> I am a fourth year bioinformatics student, and am currently working as a
> summer student. I have some limited experience with writing perl modules
> and test scripts. Mostly I write perl to do specific jobs, that I or
> someone else has come up with to fill some immediate need of the
> company. I am interested in most things bioinformatics/computer
> sci/biology and am hoping to do Graduate studies when I finish my
> degree.
>
> Well that's enough for now, if you have any comments/suggestions I would
> appreciate it.

Always can use an extra hand!
Chris > > > Cheers, Wayne > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From ClarkeW at AGR.GC.CA Thu Jun 29 11:01:52 2006 From: ClarkeW at AGR.GC.CA (Clarke, Wayne) Date: Thu, 29 Jun 2006 11:01:52 -0400 Subject: [Bioperl-l] BioPerl and quality files Message-ID: <320530F83FA47047823E57F110DDEAADB15A70@onncrxms4.agr.gc.ca> Thanks Chris, I don't know how I didn't come up with this before. Can I use Bio::SeqIO::qual as follows? my $in = Bio::SeqIO->new(-file => $qual_file , -format => 'qual'); Cheers, Wayne -----Original Message----- From: Chris Fields [mailto:cjfields at uiuc.edu] Sent: Thursday, June 29, 2006 8:55 AM To: Clarke, Wayne; bioperl-l at lists.open-bio.org Subject: RE: [Bioperl-l] BioPerl and quality files > Recently I was working on a project which required some manipulation of > Quality files. I may be wrong in this, but I don't believe that there is > a Quality format for Bio:SeqIO. If there is, someone could point me in > the right direction as I could write a much nicer script then what I > currently have, if not I was wondering if anyone here has any use for > such a thing. I am pretty new to developing but would be willing to give > it a shot, as I feel that for all the use I get out of BioPerl with no > thanks to anyone who spent time on writing something I used, I could try > and contribute my limited amount. Any comments would be appreciated, and > don't be afraid to tell me this is a lost cause. I realize that quality > files tend to be less important than FASTA sequence files. I will give > you a little information on me so that you know what to expect/what I am > working with. 
Chris

> Cheers, Wayne

From cjfields at uiuc.edu Thu Jun 29 11:21:21 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 10:21:21 -0500
Subject: [Bioperl-l] BioPerl and quality files
In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A70@onncrxms4.agr.gc.ca>
Message-ID: <002001c69b8f$ad754450$15327e82@pyrimidine>

It should work that way, yes:

    my $in = Bio::SeqIO->new(-file => $qual_file, -format => 'qual');
    # the below should return a Bio::Seq::Quality object
    my $seq = $in->next_seq;

You might want to check the other SeqIO modules as well depending on your format:

...
Instances: 2    Module : Bio::SeqIO::abi
Instances: 2    Module : Bio::SeqIO::ctf
Instances: 2    Module : Bio::SeqIO::exp
Instances: 10   Module : Bio::SeqIO::fastq
Instances: 5    Module : Bio::SeqIO::phd
Instances: 3    Module : Bio::SeqIO::raw
Instances: 13   Module : Bio::SeqIO::scf
Instances: 2    Module : Bio::SeqIO::ztr
...

Chris

> Thanks Chris,
>
> I don't know how I didn't come up with this before. Can I use
> Bio::SeqIO::qual as follows?
>
> my $in = Bio::SeqIO->new(-file => $qual_file , -format => 'qual');
>
> Cheers, Wayne
...

From cjfields at uiuc.edu Thu Jun 29 11:23:20 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 10:23:20 -0500
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <44A3874A.9040803@sendu.me.uk>
Message-ID: <002101c69b8f$f48bd070$15327e82@pyrimidine>

Sendu,

The HOWTO explains everything:

http://www.bioperl.org/wiki/HOWTO:SearchIO

under "Implementation." I learned this the hard way when I started working on SearchIO::blast and wondered why it had so many *_element methods.

Yes, you will need an EventHandler if you implement SearchIO; the EventHandler should implement the Bio::SearchIO::EventHandlerI interface. You might not need one that returns objects though (i.e.
it could return hashes). And you could possibly get around the event handler somehow, though if you plan on doing that, why not just work on Bio::Tools::Hmmpfam as an alternative parser? We've had other BLAST parsers before (Bio::Tools::BPLite comes to mind); if they aren't maintained and there is a viable alternative they can be deprecated. Hence the reason I mentioned working on your own version of SearchIO::hmmer; if that module becomes most prevalently used we can deprecate the older version. The idea that a SearchIO plugin should act like a SAX parser is based on the fact that many files being parsed are quite large, so it would be nice to have everything parsed as a stream (on-the-go) as opposed to preprocessing everything into an object hierarchy (which can be very memory intensive for large files). Whether this is done in practice in all SearchIO modules is another thing; it may be based upon what particular fixes were made over time or the contributor's intentions. Chris > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Sendu Bala > Sent: Thursday, June 29, 2006 2:55 AM > To: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour > > Jason Stajich wrote: > > > > Feel free to propose an alternative implement for parser as you see > > fit as long as the API is preserved. you can contibute a new > > SearchIO plugin and HMMERSearchResultListener to deal with it - or > [snip] > > What's the thinking behind the way SearchIOs work? Is it necessary or > desirable to always do it with events and listeners? Or is it enough to > simply return a ResultI regardless of how you made it? 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From roy at colibase.bham.ac.uk Thu Jun 29 11:05:54 2006 From: roy at colibase.bham.ac.uk (Roy Chaudhuri) Date: Thu, 29 Jun 2006 16:05:54 +0100 Subject: [Bioperl-l] BioPerl and quality files In-Reply-To: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca> References: <320530F83FA47047823E57F110DDEAADB15A6F@onncrxms4.agr.gc.ca> Message-ID: <44A3EC52.7030502@colibase.bham.ac.uk> Hi Wayne. I think Bio::SeqIO::qual is what you are looking for. Roy. -- Dr. Roy Chaudhuri Bioinformatics Research Fellow Division of Immunity and Infection University of Birmingham, U.K. http://xbase.bham.ac.uk From jason at bioperl.org Thu Jun 29 14:04:12 2006 From: jason at bioperl.org (Jason Stajich) Date: Thu, 29 Jun 2006 14:04:12 -0400 Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour In-Reply-To: <44A3874A.9040803@sendu.me.uk> References: <007e01c69b00$2e091410$15327e82@pyrimidine> <44A309FB.2050009@sendu.me.uk> <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org> <44A3874A.9040803@sendu.me.uk> Message-ID: <166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org> however you want - the idea of listeners at the time was to make it more SAX like so we could throw away events we didn't want and speed up the whole system when there was some idea of how you wanted the data filtered. That may have been too much wishful thinking and I just couldn't do it alone. On Jun 29, 2006, at 3:54 AM, Sendu Bala wrote: > Jason Stajich wrote: >> >> Feel free to propose an alternative implement for parser as you see >> fit as long as the API is preserved. you can contibute a new >> SearchIO plugin and HMMERSearchResultListener to deal with it - or >> [snip] > > What's the thinking behind the way SearchIOs work? Is it necessary or > desirable to always do it with events and listeners? 
> Or is it enough to simply return a ResultI regardless of how you made it?

--
Jason Stajich
Duke University
http://www.duke.edu/~jes12

From prettyblondegirl222 at yahoo.com Thu Jun 29 14:23:56 2006
From: prettyblondegirl222 at yahoo.com (S S)
Date: Thu, 29 Jun 2006 11:23:56 -0700 (PDT)
Subject: [Bioperl-l] TAKE ME OFF
Message-ID: <20060629182356.93810.qmail@web51305.mail.yahoo.com>

From cjfields at uiuc.edu Thu Jun 29 23:53:22 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 29 Jun 2006 22:53:22 -0500
Subject: [Bioperl-l] SearchIO::blast, was Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org>
References: <007e01c69b00$2e091410$15327e82@pyrimidine> <44A309FB.2050009@sendu.me.uk> <3C40FA86-86A8-4627-9F49-072627D552E1@bioperl.org> <44A3874A.9040803@sendu.me.uk> <166CA2FE-DE49-43D9-A628-2155A30AC653@bioperl.org>
Message-ID: <7511BE75-3A87-4E78-BFEA-2B38210BAD85@uiuc.edu>

If we can work around the listener/handler that'll definitely speed things up. I was thinking about tackling the SearchIO::blast parser next, refactoring it to use hashes as a separate plugin module; if I don't need the handler for that then it'll speed things up a bit.

Chris

On Jun 29, 2006, at 1:04 PM, Jason Stajich wrote:

> however you want - the idea of listeners at the time was to make it
> more SAX like so we could throw away events we didn't want and speed
> up the whole system when there was some idea of how you wanted the
> data filtered. That may have been too much wishful thinking and I
> just couldn't do it alone.
> > > On Jun 29, 2006, at 3:54 AM, Sendu Bala wrote: > >> Jason Stajich wrote: >>> >>> Feel free to propose an alternative implement for parser as you see >>> fit as long as the API is preserved. you can contibute a new >>> SearchIO plugin and HMMERSearchResultListener to deal with it - or >>> [snip] >> >> What's the thinking behind the way SearchIOs work? Is it necessary or >> desirable to always do it with events and listeners? Or is it >> enough to >> simply return a ResultI regardless of how you made it? >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bernd.web at gmail.com Fri Jun 30 08:45:15 2006 From: bernd.web at gmail.com (Bernd Web) Date: Fri, 30 Jun 2006 14:45:15 +0200 Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour In-Reply-To: <44A37B19.7030908@sendu.me.uk> References: <000d01c69b3b$b17776d0$15327e82@pyrimidine> <44A37B19.7030908@sendu.me.uk> Message-ID: <716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com> Hi, >My original question was essentially: does doing it my way make sense? With respect to Sendu's points, I can only say that a colleague (developer) and I were surprised that the HMMer parser did not group the hits as the blast parser does, in "Hit" and "Hsp". When we realized how hmmer parsing worked we continued with to use it but used a check for multiple hits of one domain on 1 query sequence (e.g. in hmmpfam). 
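The check described above (spotting several hits for the same domain model on one query) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code anyone in this thread actually uses: read_domains and group_by_model are hypothetical helper names, the sample records are made up, and it assumes the 'hmmer' SearchIO format hands back one domain per hit as discussed in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten a report into one record per domain.
# Uses Bio::SearchIO with the 'hmmer' format; BioPerl is loaded lazily
# so the grouping code below can be tried without it installed.
sub read_domains {
    my ($file) = @_;
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'hmmer', -file => $file);
    my @domains;
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                push @domains, {
                    query => $result->query_name,
                    model => $hit->name,
                    start => $hsp->start('query'),
                    end   => $hsp->end('query'),
                };
            }
        }
    }
    return @domains;
}

# Regroup one-domain-per-hit records as query -> model -> [domains].
sub group_by_model {
    my @domains = @_;
    my %grouped;
    for my $d (@domains) {
        push @{ $grouped{ $d->{query} }{ $d->{model} } }, $d;
    }
    return \%grouped;
}

# Made-up records standing in for read_domains('report.hmmpfam').
my @domains = (
    { query => 'seq1', model => 'ABC_tran', start => 30,  end => 210 },
    { query => 'seq1', model => 'PAS',      start => 220, end => 280 },
    { query => 'seq1', model => 'ABC_tran', start => 340, end => 520 },
);

my $grouped = group_by_model(@domains);
for my $query (sort keys %$grouped) {
    for my $model (sort keys %{ $grouped->{$query} }) {
        my $n = @{ $grouped->{$query}{$model} };
        print "$query\t$model\t$n domain(s)\n";
    }
}
```

Grouping after parsing leaves the parser itself untouched, so existing pipelines that expect single model/domain pairs would keep working.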
Regards,
Bernd

From jason at bioperl.org Fri Jun 30 10:05:01 2006
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 30 Jun 2006 10:05:01 -0400
Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour
In-Reply-To: <716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
References: <000d01c69b3b$b17776d0$15327e82@pyrimidine> <44A37B19.7030908@sendu.me.uk> <716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com>
Message-ID: <54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org>

I understand the confusion. The original intention was to have HSPs grouped together under the same Hit, just like BLAST reports, but somewhere in the bug-fix cycle, dealing with the fact that "HSPs" aren't ordered by the overall Hit table led to this design decision. The problem before was something to do with the ordering, but I must admit I can't remember specifically what it was, or exactly why I changed things to do this. Does 1.4 actually do it the way you expect?

Again, more user feedback is definitely critical to making these tools useful to everyone, so please don't be bashful about reporting your preferences.

-j

On Jun 30, 2006, at 8:45 AM, Bernd Web wrote:

> Hi,
>
>> My original question was essentially: does doing it my way make
>> sense?
> With respect to Sendu's points, I can only say that a colleague
> (developer) and I were surprised that the HMMer parser did not group
> the hits as the blast parser does, in "Hit" and "Hsp".
> When we realized how hmmer parsing worked we continued to use it
> but used a check for multiple hits of one domain on 1 query sequence
> (e.g. in hmmpfam).
> > Regards, > Bernd > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich Duke University http://www.duke.edu/~jes12 From cjfields at uiuc.edu Fri Jun 30 11:56:09 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 30 Jun 2006 10:56:09 -0500 Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour In-Reply-To: <54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org> References: <000d01c69b3b$b17776d0$15327e82@pyrimidine> <44A37B19.7030908@sendu.me.uk> <716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com> <54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org> Message-ID: <6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu> It may have been just simpler to have it be one HSP (domain) per Hit (model) as that's how the reports are generated. My reasoning was that using the one domain per model made sense based on what you are actually trying to do, which is annotate the sequence based on the order the domain appears. Most others may not view it that way, which is fine. One can always gather the relevant HSP's, convert to seqfeatures, then sort them if order is important, I suppose. I would say, if the overall consensus is to modify it to have multiple domain hits per model (similar to BLAST) then Sendu should go ahead and make those changes then announce it on the list so no one can gripe about it later. My main concern was not changing things so dramatically that it'll break for someone, but seeing as we've had a lengthy discussion about it already they should have piped up by now! Well, that and trying to return everything as hashes as Jason suggested. From looking at SearchIO::hmmer we need to make sure that both hmmsearch and hmmpfam work the same way (looks like they have different sections) and that the reported bug about missing hits (Bug 2036) is fixed as well. 
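The suggestion above of gathering the relevant HSPs, converting them to seqfeatures and sorting them by position might look something like the sketch below. It is a minimal illustration under assumptions, not an agreed design: sort_by_position and to_seqfeatures are hypothetical names, the records are made-up stand-ins for values that would come from $hit->name, $hsp->start('query') and $hsp->end('query'), and the 'domain' and 'hmmpfam' tag values are arbitrary choices.

```perl
use strict;
use warnings;

# Order domain records by start, then end, on the query sequence.
sub sort_by_position {
    my @domains = @_;
    my @sorted = sort {
        $a->{start} <=> $b->{start} or $a->{end} <=> $b->{end}
    } @domains;
    return @sorted;
}

# Hypothetical conversion to Bio::SeqFeature::Generic objects. BioPerl is
# loaded lazily, so this sub only needs it when it is actually called.
sub to_seqfeatures {
    my @domains = @_;
    require Bio::SeqFeature::Generic;
    my @features;
    for my $d (@domains) {
        push @features, Bio::SeqFeature::Generic->new(
            -start        => $d->{start},
            -end          => $d->{end},
            -primary      => 'domain',     # arbitrary tag for this sketch
            -source_tag   => 'hmmpfam',    # arbitrary tag for this sketch
            -display_name => $d->{model},
        );
    }
    return @features;
}

# Made-up records in report order (grouped by model, not by position).
my @domains = (
    { model => 'ABC_tran', start => 340, end => 520 },
    { model => 'ABC_tran', start => 30,  end => 210 },
    { model => 'PAS',      start => 220, end => 280 },
);

# Print the domain architecture in sequence order.
print join(' ', map { "$_->{model}($_->{start}-$_->{end})" }
                sort_by_position(@domains)), "\n";
```

Sorting plain records first and converting afterwards keeps the ordering logic independent of whichever feature class ends up being used.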
Chris On Jun 30, 2006, at 9:05 AM, Jason Stajich wrote: > I understand the confusion and it was the intention of having HSPs > grouped together under the same Hit initialy just like BLAST reports > - but somehow in the bug-fix-cycle the way to deal with the fact that > "HSPs" aren't ordered by the overall Hit table led to this design > decision - the problem before was something with the ordering, but I > must admit to not being able to remember what specifically was the > problem t I can't really remember why I changed things to do this. > Does 1.4 actually do it the way you expect? > > Again, more user feedback is definitely critical to make these tools > useful to everyone so please don't bashful about reporting your > preferences. > > -j > > On Jun 30, 2006, at 8:45 AM, Bernd Web wrote: > >> Hi, >> >>> My original question was essentially: does doing it my way make >>> sense? >> With respect to Sendu's points, I can only say that a colleague >> (developer) and I were surprised that the HMMer parser did not group >> the hits as the blast parser does, in "Hit" and "Hsp". >> When we realized how hmmer parsing worked we continued with to use it >> but used a check for multiple hits of one domain on 1 query sequence >> (e.g. in hmmpfam). >> >> Regards, >> Bernd >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > Duke University > http://www.duke.edu/~jes12 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l Christopher Fields Postdoctoral Researcher Lab of Dr. 
Robert Switzer Dept of Biochemistry University of Illinois Urbana-Champaign From bix at sendu.me.uk Fri Jun 30 12:14:05 2006 From: bix at sendu.me.uk (Sendu Bala) Date: Fri, 30 Jun 2006 17:14:05 +0100 Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour In-Reply-To: <6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu> References: <000d01c69b3b$b17776d0$15327e82@pyrimidine> <44A37B19.7030908@sendu.me.uk> <716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com> <54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org> <6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu> Message-ID: <44A54DCD.3050708@sendu.me.uk> Chris Fields wrote: > It may have been just simpler to have it be one HSP (domain) per Hit > (model) as that's how the reports are generated. My reasoning was that > using the one domain per model made sense based on what you are actually > trying to do, which is annotate the sequence based on the order the > domain appears. Most others may not view it that way, which is fine. > One can always gather the relevant HSP's, convert to seqfeatures, then > sort them if order is important, I suppose. > > I would say, if the overall consensus is to modify it to have multiple > domain hits per model (similar to BLAST) then Sendu should go ahead and > make those changes then announce it on the list so no one can gripe > about it later. My main concern was not changing things so dramatically > that it'll break for someone Going on your earlier suggestion, I was thinking about making SearchIO::hmmpfam instead, which would get used if you set the format to 'hmmpfam' instead of the generic 'hmmer' when making a SearchIO. I suppose I would make a SearchIO::hmmsearch as well, if necessary. [...] > that the reported bug about missing hits (Bug 2036) is fixed as well. However, having never made a SearchIO plugin before, it will be some time before I get my head around it. 
I'll want to make one the current HOWTO:SearchIO way before I can think about doing it a better way (hashes) as well. So I can say I'll make a move on this at some point in the future, but if someone wants to fix Bug 2036 in the mean time, they are welcome to. Again as suggested, my priority is Bio::Map right now. From rmb32 at cornell.edu Fri Jun 30 13:01:38 2006 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 30 Jun 2006 10:01:38 -0700 Subject: [Bioperl-l] parser for GeneSeqer Message-ID: <44A558F2.2050304@cornell.edu> Hi all, I find myself needing a parser for GeneSeqer output, so I'm writing one (which I will submit for your consideration when it's working). In a nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of ESTs to genomic sequence, then using those alignments to predict where in the genomic sequence the genes are. So really what you get from this is a bunch of hierarchical features. I don't really know where I should put it in the bioperl hierarchy though. Probably FeatureIO? And what's the current fashion for objects it should emit? Bio::SeqFeature::Generic? Bio::SeqFeature::Annotated? Rob -- Robert Buels SGN Bioinformatics Analyst 252A Emerson Hall, Cornell University Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From cjfields at uiuc.edu Fri Jun 30 13:43:56 2006 From: cjfields at uiuc.edu (Chris Fields) Date: Fri, 30 Jun 2006 12:43:56 -0500 Subject: [Bioperl-l] Bio::SearchIO::hmmer hsp behaviour In-Reply-To: <44A54DCD.3050708@sendu.me.uk> References: <000d01c69b3b$b17776d0$15327e82@pyrimidine> <44A37B19.7030908@sendu.me.uk> <716af09c0606300545q2c105158w87e4a603692f0357@mail.gmail.com> <54A02827-74E9-450B-8A37-7942507CD2C1@bioperl.org> <6752BF57-99DF-4619-A637-9592FFFE28DD@uiuc.edu> <44A54DCD.3050708@sendu.me.uk> Message-ID: I'll try looking at it this weekend. A suggested workaround is to either try setting -A for no alignments or setting it to a high number to retrieve all of them. 
It's pretty serious as the error silently dumps those domains, so those
using automated annotation pipelines would miss it unless they are also
checking the raw output.

You could design a SearchIO::hmmpfam parser then expand it to take in
hmmsearch output at a later point, or keep them separate. I like the
idea of having modules that are more specific about what they parse;
seems at some point you reach serious code bloat and maintenance becomes
an issue. Look at SearchIO::blast; it parses various text BLAST output
very well but with some serious obfuscation. Just don't know how
productive it would be to separate out the PSI-BLAST and bl2seq stuff
since they are pretty close to a standard BLAST report... oh well.

To Jason: good luck on your move. Drop us a line here to let us know
everything went well.

Chris

On Jun 30, 2006, at 11:14 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> It may have been just simpler to have it be one HSP (domain) per Hit
>> (model) as that's how the reports are generated. My reasoning was that
>> using the one domain per model made sense based on what you are actually
>> trying to do, which is annotate the sequence based on the order the
>> domain appears. Most others may not view it that way, which is fine.
>> One can always gather the relevant HSP's, convert to seqfeatures, then
>> sort them if order is important, I suppose.
>>
>> I would say, if the overall consensus is to modify it to have multiple
>> domain hits per model (similar to BLAST) then Sendu should go ahead and
>> make those changes then announce it on the list so no one can gripe
>> about it later. My main concern was not changing things so dramatically
>> that it'll break for someone
>
> Going on your earlier suggestion, I was thinking about making
> SearchIO::hmmpfam instead, which would get used if you set the format to
> 'hmmpfam' instead of the generic 'hmmer' when making a SearchIO.
> I suppose I would make a SearchIO::hmmsearch as well, if necessary.
>
>
> [...]
>> that the reported bug about missing hits (Bug 2036) is fixed as well.
>
> However, having never made a SearchIO plugin before, it will be some
> time before I get my head around it. I'll want to make one the current
> HOWTO:SearchIO way before I can think about doing it a better way
> (hashes) as well. So I can say I'll make a move on this at some point in
> the future, but if someone wants to fix Bug 2036 in the meantime, they
> are welcome to. Again as suggested, my priority is Bio::Map right now.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From cjfields at uiuc.edu Fri Jun 30 13:54:23 2006
From: cjfields at uiuc.edu (Chris Fields)
Date: Fri, 30 Jun 2006 12:54:23 -0500
Subject: [Bioperl-l] parser for GeneSeqer
In-Reply-To: <44A558F2.2050304@cornell.edu>
References: <44A558F2.2050304@cornell.edu>
Message-ID: <2FB066C7-12E6-46D8-8F4A-BD096BE2A0CA@uiuc.edu>

If you plan on generating seqfeatures from this output you could check
out the Bio::Tools core modules for examples. There are a few there that
take program output and convert them to Bio::SeqFeature::Generic
objects, including Bio::Tools::RNAMotif and Bio::Tools::tRNAscanSE. If
alignments are involved you might want something like
Bio::SeqFeature::FeaturePair. Not sure about using
Bio::SeqFeature::Annotated or others; I thought that some of the
Annotation/Annotatable stuff might be changing soon but I may be wrong.

Chris

On Jun 30, 2006, at 12:01 PM, Robert Buels wrote:

> Hi all,
>
> I find myself needing a parser for GeneSeqer output, so I'm writing one
> (which I will submit for your consideration when it's working).
> In a nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
> ESTs to genomic sequence, then using those alignments to predict where
> in the genomic sequence the genes are. So really what you get from this
> is a bunch of hierarchical features.
>
> I don't really know where I should put it in the bioperl hierarchy
> though. Probably FeatureIO?
>
> And what's the current fashion for objects it should emit?
> Bio::SeqFeature::Generic? Bio::SeqFeature::Annotated?
>
> Rob
>
> --
> Robert Buels
> SGN Bioinformatics Analyst
> 252A Emerson Hall, Cornell University
> Ithaca, NY 14853
> Tel: 503-889-8539
> rmb32 at cornell.edu
> http://www.sgn.cornell.edu
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign

From rmb32 at cornell.edu Fri Jun 30 15:32:11 2006
From: rmb32 at cornell.edu (Robert Buels)
Date: Fri, 30 Jun 2006 12:32:11 -0700
Subject: [Bioperl-l] Fwd: FW: parser for GeneSeqer
In-Reply-To: <29201430510651801@webmail.iastate.edu>
References: <29201430510651801@webmail.iastate.edu>
Message-ID: <44A57C3B.8040808@cornell.edu>

Aha! Isn't it amazing what gets revealed when you just get off your
butt and ask on the mailing list?

I'll look at that code straightaway. The concept is quite attractive to
me, since GenomeThreader is the next program that I'm going to be
integrating into my analysis stuff. Unfortunately, (I am under the
impression that) my GeneSeqer parser is almost finished.

This brings us to the next question: what about parsing the
GenomeThreader XML? Would be lovely to have a Bioperl interface for
that. Is there some code floating about for that too?

Rob

Michael E Sparks wrote:
> Hi Rob,
>
> For what it's worth, I wrote a fair amount of code for parsing GeneSeqer output.
> You may want to have a look here: http://www.public.iastate.edu/~mespar1/gthxml/
>
> There is a perl script--GSQ2XML.pl--to convert GeneSeqer plain text output into
> an XML format also used by the GenomeThreader spliced alignment program, whose
> schema is specified in a RELAX NG document, GenomeThreader.rng.txt. The file
> 0README in the above directory will give you an overview of what tools I've made
> available. Hope you find it useful!
>
> Regards,
> Michael
>
> --
> Thanks,
> Michael E Sparks
> Graduate Assistant, Brendel Lab
> 2128 Molecular Biology Building
> Iowa State University
> Ames, IA 50011-3260
> 1-515-294-4063
> http://www.public.iastate.edu/~mespar1/
>
>
> Forwarded Message:
>
>> To:
>> From: "Shannon D Schlueter"
>> Subject: FW: [Bioperl-l] parser for GeneSeqer
>> Date: Fri, 30 Jun 2006 13:01:46 -0500
>> -----
>>
>>> Date: Fri, 30 Jun 2006 10:01:38 -0700
>>> From: Robert Buels
>>> User-Agent: Thunderbird 1.5.0.2 (X11/20060516)
>>> To: bioperl-l at bioperl.org
>>> Subject: [Bioperl-l] parser for GeneSeqer
>>> Sender: bioperl-l-bounces at lists.open-bio.org
>>>
>>> Hi all,
>>>
>>> I find myself needing a parser for GeneSeqer output, so I'm writing one
>>> (which I will submit for your consideration when it's working). In a
>>> nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
>>> ESTs to genomic sequence, then using those alignments to predict where
>>> in the genomic sequence the genes are. So really what you get from this
>>> is a bunch of hierarchical features.
>>>
>>> I don't really know where I should put it in the bioperl hierarchy
>>> though. Probably FeatureIO?
>>>
>>> And what's the current fashion for objects it should emit?
>>> Bio::SeqFeature::Generic? Bio::SeqFeature::Annotated?
>>>
>>> Rob
>>>
>>> --
>>> Robert Buels
>>> SGN Bioinformatics Analyst
>>> 252A Emerson Hall, Cornell University
>>> Ithaca, NY 14853
>>> Tel: 503-889-8539
>>> rmb32 at cornell.edu
>>> http://www.sgn.cornell.edu
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>
>
>
>

--
Robert Buels
SGN Bioinformatics Analyst
252A Emerson Hall, Cornell University
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From mespar1 at iastate.edu Fri Jun 30 15:20:29 2006
From: mespar1 at iastate.edu (Michael E Sparks)
Date: Fri, 30 Jun 2006 14:20:29 -0500 (CDT)
Subject: [Bioperl-l] Fwd: FW: parser for GeneSeqer
Message-ID: <29201430510651801@webmail.iastate.edu>

Hi Rob,

For what it's worth, I wrote a fair amount of code for parsing GeneSeqer output.
You may want to have a look here: http://www.public.iastate.edu/~mespar1/gthxml/

There is a perl script--GSQ2XML.pl--to convert GeneSeqer plain text output into
an XML format also used by the GenomeThreader spliced alignment program, whose
schema is specified in a RELAX NG document, GenomeThreader.rng.txt. The file
0README in the above directory will give you an overview of what tools I've made
available. Hope you find it useful!

Regards,
Michael

--
Thanks,
Michael E Sparks
Graduate Assistant, Brendel Lab
2128 Molecular Biology Building
Iowa State University
Ames, IA 50011-3260
1-515-294-4063
http://www.public.iastate.edu/~mespar1/


Forwarded Message:

> To:
> From: "Shannon D Schlueter"
> Subject: FW: [Bioperl-l] parser for GeneSeqer
> Date: Fri, 30 Jun 2006 13:01:46 -0500
> -----
> >Date: Fri, 30 Jun 2006 10:01:38 -0700
> >From: Robert Buels
> >User-Agent: Thunderbird 1.5.0.2 (X11/20060516)
> >To: bioperl-l at bioperl.org
> >Subject: [Bioperl-l] parser for GeneSeqer
> >Sender: bioperl-l-bounces at lists.open-bio.org
> >
> >Hi all,
> >
> >I find myself needing a parser for GeneSeqer output, so I'm writing one
> >(which I will submit for your consideration when it's working). In a
> >nutshell, GeneSeqer is a (kind of old) program for aligning a bunch of
> >ESTs to genomic sequence, then using those alignments to predict where
> >in the genomic sequence the genes are. So really what you get from this
> >is a bunch of hierarchical features.
> >
> >I don't really know where I should put it in the bioperl hierarchy
> >though. Probably FeatureIO?
> >
> >And what's the current fashion for objects it should emit?
> >Bio::SeqFeature::Generic? Bio::SeqFeature::Annotated?
> >
> >Rob
> >
> >--
> >Robert Buels
> >SGN Bioinformatics Analyst
> >252A Emerson Hall, Cornell University
> >Ithaca, NY 14853
> >Tel: 503-889-8539
> >rmb32 at cornell.edu
> >http://www.sgn.cornell.edu
> >
> >
> >_______________________________________________
> >Bioperl-l mailing list
> >Bioperl-l at lists.open-bio.org
> >http://lists.open-bio.org/mailman/listinfo/bioperl-l
>