From maj at fortinbras.us Sat Aug 1 00:35:04 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 1 Aug 2009 00:35:04 -0400 Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl In-Reply-To: References: Message-ID: <99E27D08408340B9B0611751A17DF266@NewLife> Sorry, I cut off the last script. The entire thing follows:

/usr/local/bin/conv-ASMake.sh :

#!/usr/bin/sed -f
#converting an ActiveState PERL Makefile to run under cygwin make:
s/^DIRFILESEP = ^\\/DIRFILESEP = \//
s/^NOOP = rem/NOOP = :/
# -or- NOOP = echo -n
# byebye volume
s/C:/\/cygdrive\/c/
# sed to convert directory \ to /
s/\([\)0-9a-zA-Z.]\)\\\([\(0-9a-zA-Z]\)/\1\/\2/g
# convert full perl
s/\/usr\/bin\/perl/\/cygdrive\/c\/Perl\/bin\/perl/
# a key conversion for DOC_INSTALL action
/^DESTINSTALLVENDORHTMLDIR/ a\
DECYGDESTINSTALLARCHLIB = $(subst /cygdrive/c,c:,$(DESTINSTALLARCHLIB))
# --- MakeMaker tools_other section:
# let cygwin do native linux commands
/^MAKE/ c\
MAKE = make
/^CHMOD/ c\
CHMOD = chmod
/^CP/ c\
CP = cp
/^MV/ c\
MV = mv
/^NOOP/ c\
NOOP = :
/^RM_F/ c\
RM_F = rm -f
/^RM_RF/ c\
RM_RF = rm -rf
/^TEST_F[^I]/ c\
TEST_F = test -f
/^TOUCH/ c\
TOUCH = touch
/^TEST_S/ c\
TEST_S = test -s
/^DEV_NULL/ c\
DEV_NULL = > /dev/null 2>&1
/^ECHO[^_]/ c\
ECHO = echo
/^ECHO_N/ c\
ECHO_N = echo -n
# override OS-specific File::Spec
/^MOD_INSTALL/ c\
MOD_INSTALL = $(ABSPERLRUN) -MExtUtils::Install -e "use File::Spec::Cygwin;@File::Spec::ISA=('File::Spec::Cygwin');" -e "map { s[/cygdrive/c][] } @ARGV;install({@ARGV}, '$(VERBINST)', 0, '$(UNINST)');" --
/^FIXIN/ c\
FIXIN = $(PERLRUN) "-MExtUtils::MY" -e "MY->fixin(shift)"
# remove cygwin volume prefix for doc installs
/Appending installation info to/ s/DESTIN/DECYGDESTIN/
/perllocal\.pod/ s/DESTIN/DECYGDESTIN/
/NOECHO) \$(MKPATH/ s/DESTIN/DECYGDESTIN/
#end conv-ASMake.sh

----- Original Message ----- From: "Jonathan Cline" To: Cc: Sent: Friday, July 31, 2009 11:24 PM Subject: [Bioperl-l] Module issue with cygwin-perl vs. 
Activestate Perl >I recently mentioned working on Bio::Robotics for Tecan. Vendors > being MS-Win specific, the vendor software allows third-party software > communication through a named pipe (the literal filename is > "\\\\.\\pipe\\gemini" where the multiple backslashes are MS specific > and this pseudo-pipe is opened with sysopen()). This is broken under > cygwin-perl due to cygwin's method of handling paths -- the sysopen > fails. However it works under ActiveState Perl and communication > through the named pipe (to the robot hardware) is OK. The standard > workaround is usually to use cygwin bash, and force the PATH to use > ActiveState perl. (Typical MS Windows incompatibility problem.) The > issue is: Perl module libraries for CPAN work under cygwin-perl > (only?). Attempts to run "activestate-perl Makefile.PL" for CPAN > module use, or "make test", result in a bad list of incompatibility > problems. Yet ActiveState Perl is required for communicating with the > vendor application (unless there is some workaround to raw filesystem > access in cygwin-perl that I haven't found in 2 days of working this). > The stand-alone scripts I have work fine to access the named pipe > (using ActiveState Perl) since the standalone scripts have no module > @INC dependencies, no CPAN module test harness, etc etc. > > This isn't specifically a Bio:: issue, though if anyone has > suggestions please email. I could try msys and see if it handles the > named-pipe-special-file better, if msys has an msys-perl distribution.
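For reference, the ActiveState-side open described above can be sketched as follows. This is a minimal sketch that assumes only what the post states: the pipe path is \\.\pipe\gemini and it is opened with sysopen(). The helper names pipe_path and open_pipe are hypothetical. Nothing is written to the pipe, because the Gemini command protocol is not described in this thread.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR);

# Build the Win32 named-pipe path for a pipe name (hypothetical helper).
# In a single-quoted Perl string '\\' is one backslash, so this yields
# the literal path \\.\pipe\<name>, e.g. \\.\pipe\gemini.
sub pipe_path {
    my ($name) = @_;
    return '\\\\.\\pipe\\' . $name;
}

# Open the pipe read/write with sysopen (no O_CREAT: the vendor software
# must already have created it). Returns a handle, or undef with $! set,
# e.g. when the vendor application is not running or this perl does not
# understand the path.
sub open_pipe {
    my ($path) = @_;
    sysopen(my $fh, $path, O_RDWR) or return;
    binmode $fh;
    return $fh;
}
```

Under ActiveState Perl one would call `open_pipe(pipe_path('gemini'))` and die with `$!` if it returns undef. Under cygwin-perl, the same sysopen is the call the post reports as failing.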
> > -- > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From jncline at gmail.com Sun Aug 2 23:32:20 2009 From: jncline at gmail.com (Jonathan Cline) Date: Sun, 02 Aug 2009 22:32:20 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> Message-ID: <4A765A44.7030902@gmail.com> Smithies, Russell wrote: > I "acquired" an old Biomek 1000 that I'm thinking of modernising. It was originally controlled by a monstrously large but slow pc (IBM Value Point Model 466DX2 computer with Microsoft Windows* Version 3.1) > My plan is to fit a 3-axis CAD/CAM stepper controller (about $60) and use software like mach3 www.machsupport.com along with G-code to control it. > I come from an engineering background so it seemed like the easy way to me :-) > > Now I just need a bit of free time to get it working... > > --Russell > > > I agree, that's probably the best way to go. It's hard to know how much s/w processing was done on the host PC vs. the embedded controller. If you were able to connect directly to the robot hardware with serial port(s) or whatever it's using, it would be tough to find out the comm protocol unless someone has already reverse engineered it (which is doubtful). Also, from what I have seen online, attempting to run the old software under a virtual machine is unpredictable due to timing differences in the serial port communication. So removal of the old electronics is probably the best bet. If it has one arm, then it's much easier.
As for robots with working workstation software, it seems the annoyance factor is that while the scripting languages are powerful (for GUI scripting, that is), they are still relatively low level. Bio types with a bit of CS seem to immediately turn to Visual Basic, LabVIEW, or even Excel spreadsheets and macros, in order to provide a higher-level abstraction for the workstation software. To me, it seems natural that there should be a "protocol compiler" which takes biology protocols as input, and gives robot instructions as output (google "protolexer"). The huge bottleneck of course is that everyone's robotics work tables and equipment are somewhat unique to their needs. ## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## >> -----Original Message----- >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >> bounces at lists.open-bio.org] On Behalf Of Jonathan Cline >> Sent: Thursday, 30 July 2009 2:07 p.m. >> To: bioperl-l at lists.open-bio.org >> Cc: Jonathan Cline >> Subject: [Bioperl-l] Bio::Robotics namespace discussion >> >> I am writing a module for communication with biology robotics, as >> discussed recently on #bioperl, and I invite your comments. >> >> Currently this module talks to a Tecan Genesis workstation robot ( >> http://images.google.com/images?q=tecan genesis ). Other vendors are >> Beckman Biomek, Agilent, etc. No such modules exist anywhere on the >> 'net with the exception of some Visual Basic and LabVIEW scripts which I >> have found. There are some computational biologists who program for >> robots via high-level s/w, but these scripts are not distributed as OSS. >> >> With Tecan, there is a datapipe interface for hardware communication, as >> an added $$ option from the vendor. I haven't checked other vendors to >> see if they likewise have an open communication path for third-party >> software. 
Since third-party communication is allowed, the natural >> next step is to create a socket client-server, especially as the robot >> vendor only supports MS Win and using the local machine has typical >> Microsoft issues (like losing real-time communication with the hardware >> due to GUI animation, bad operating system stability, no unix except >> cygwin, etc). >> >> >> On Namespace: >> >> I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are many >> s/w modules already called 'robots' (web spider robots, chat bots, www >> automate, etc) so I chose the longer name "robotics" to differentiate >> this module as manipulating real hardware. Bio::Robotics is the >> abstraction for generic robotics and Bio::Robotics::(vendor) is the >> manufacturer-specific implementation. Robot control is made more >> complex due to the very configurable nature of the work table (placement >> of equipment, type of equipment, type of attached arm, etc). The >> abstraction has to be careful not to generalize or assume too much. In >> some cases, the Bio::Robotics modules may expand to arbitrary equipment >> such as thermocyclers, tray holders, imagers, etc - that could be a >> future roadmap plan. >> >> Here is some theoretical example usage below, subject to change. At >> this time I am deciding how much state to keep within the Perl module. >> By keeping state, some robot programming might be simplified (avoiding >> deadlock or tracking tip state). In general I am aiming for a more >> "protocol friendly" method implementation. >> >> >> To use this software with locally-connected robotics hardware:
>>
>> use Bio::Robotics;
>>
>> my $tecan = Bio::Robotics->new("Tecan") || die;
>> $tecan->attach() || die;
>> $tecan->home();
>> $tecan->pipette(tips => "1", from => "rack1");
>> $tecan->pipette(aspirate => "1", dispense => "1", from => "sampleTray",
>>                 to => "DNATray");
>> ...
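The namespace plan above, a generic Bio::Robotics front end with Bio::Robotics::(vendor) classes behind it, is essentially a factory plus vendor subclasses. Below is a minimal sketch of that pattern. The package names Sketch::Robotics and Sketch::Robotics::Tecan, the %VENDORS registry, and the command logging are all hypothetical. They are not the proposed module's code, and the sketch deliberately talks to no hardware.

```perl
use strict;
use warnings;

{
    # Hypothetical generic front end (stands in for Bio::Robotics).
    package Sketch::Robotics;

    # Registry mapping vendor names to the subclasses implementing them.
    my %VENDORS = (Tecan => 'Sketch::Robotics::Tecan');

    # Factory: returns a vendor object, or undef for an unknown vendor.
    sub new {
        my ($class, $vendor, %args) = @_;
        my $impl = $VENDORS{ $vendor // '' } or return;
        return $impl->_init(%args);
    }

    # Generic interface: each vendor subclass must override these.
    sub attach  { die ref(shift) . " does not implement attach\n" }
    sub home    { die ref(shift) . " does not implement home\n" }
    sub pipette { die ref(shift) . " does not implement pipette\n" }
}

{
    # Hypothetical vendor back end (stands in for Bio::Robotics::Tecan).
    package Sketch::Robotics::Tecan;
    our @ISA = ('Sketch::Robotics');

    sub _init {
        my ($class, %args) = @_;
        return bless { attached => 0, log => [], %args }, $class;
    }

    # A real driver would talk to the Gemini named pipe here; this
    # sketch only records the commands it would have sent.
    sub attach { my $self = shift; $self->{attached} = 1; return 1 }
    sub home   { my $self = shift; push @{ $self->{log} }, 'home'; return 1 }

    sub pipette {
        my ($self, %args) = @_;
        push @{ $self->{log} },
            join ',', map { "$_=$args{$_}" } sort keys %args;
        return 1;
    }
}
```

With the vendor lookup kept in one table, a Biomek or Agilent driver would be one more subclass and one more table entry. Vendor-independent code would only ever call new(), attach(), home() and pipette().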
>>
>> To use this software with remote robotics hardware over the network:
>>
>> # On the local machine, run:
>> use Bio::Robotics;
>>
>> my @connected_hardware = Bio::Robotics->query();
>> my $tecan = Bio::Robotics->new("Tecan")
>>     || die "no tecan found in @connected_hardware\n";
>> $tecan->attach() || die;
>> $tecan->configure("my work table configuration file") || die;
>> # Run the server and process commands
>> while (1) {
>>     my $error = $tecan->server(passwordplaintext => "0xd290");
>>     if ($tecan->lastClientCommand() =~ /^shutdown/) {
>>         last;
>>     }
>> }
>> $tecan->detach();
>> exit(0);
>>
>> # On the remote machine (the client), run:
>> use Bio::Robotics;
>>
>> my $server = "heavybio.dyndns.org:8080";
>> my $password = "0xd290";
>> my $tecan = Bio::Robotics->new("Tecan");
>> $tecan->connect($server, $password) || die;
>> $tecan->home();
>> $tecan->pipette(tips => "1", from => "rack200");
>> $tecan->pipette(aspirate => "1", dispense => "1",
>>                 from => "sampleTray A1", to => "DNATray A2",
>>                 volume => "45", liquid => "Buffer");
>> $tecan->pipette(drop => "1");
>> ...
>> $tecan->disconnect();
>> exit(0);
>>
>> -- >> >> ## Jonathan Cline >> ## jcline at ieee.org >> ## Mobile: +1-805-617-0223 >> ######################## >> > ======================================================================= > Attention: The information contained in this message and/or attachments > from AgResearch Limited is intended only for the persons or entities > to which it is addressed and may contain confidential and/or privileged > material. Any review, retransmission, dissemination or other use of, or > taking of any action in reliance upon, this information by persons or > entities other than the intended recipients is prohibited by AgResearch > Limited. 
If you have received this message in error, please notify the > sender immediately. > ======================================================================= > From dan.bolser at gmail.com Tue Aug 4 08:03:00 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 4 Aug 2009 13:03:00 +0100 Subject: [Bioperl-l] problem with t/LocalDB/SeqFeature.t when host ne localhost In-Reply-To: References: <2c8757af0907310513q24bec4b0k7bec06b09e069b07@mail.gmail.com> Message-ID: <2c8757af0908040503oe2a258dkac4311bb099dc3ac@mail.gmail.com> 2009/7/31 Chris Fields : > Dan, > > Can you file this as a BioPerl bug? I'm planning on driving towards > releasing 1.6.1 alpha1 soon (next few weeks) and I would like to get this > one fixed. http://bugzilla.open-bio.org/show_bug.cgi?id=2899 Dan. From dan.bolser at gmail.com Tue Aug 4 08:14:02 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 4 Aug 2009 13:14:02 +0100 Subject: [Bioperl-l] Clear range from Bio::Seq::Quality? In-Reply-To: References: <2c8757af0904240824x63b6e17eh4d0271bb0bc038bf@mail.gmail.com> <2c8757af0904240920n34d8269ckb092e81eaf136c0c@mail.gmail.com> <90AD6534-0539-4E2B-BA4F-9B226CBB9F0E@illinois.edu> <2c8757af0904270131o66ca30a8j746998df895af2e0@mail.gmail.com> Message-ID: <2c8757af0908040514w198085cfgf4a1adc344095f36@mail.gmail.com> 2009/4/27 Heikki Lehvaslaiho : > Dan, > > Have a look at Bio/Seq/Quality.pm and t/Seq/Quality.t in bioperl-live. > > Test and extend, > > -Heikki Thanks for the help with this. I finally got round to looking at the code (after several others had done the same). I have messed with the code a bit, and added a 'mask_below_threshold' method [1] and some tests to go with it (including some extra tests) [2]. Cheers, Dan. [1] http://bugzilla.open-bio.org/show_bug.cgi?id=2897 [2] http://bugzilla.open-bio.org/show_bug.cgi?id=2898 > 2009/4/27 Heikki Lehvaslaiho : >> Dan, >> >> I'll take your code and put it into bioperl-live rewritten the way I >> suggested and add a few tests.
>> >> That should get you started, >> >> -Heikki >> >> 2009/4/27 Dan Bolser : >>> Hi Heikki, >>> >>> Thanks very much for the advice on how to better implement the clear >>> range method within the Bio::Seq::Quality object. I can understand the >>> logic of what you have written, and it all sounds reasonable. The only >>> problem is that I am very inexperienced with working on object >>> oriented Perl (my 'one man' projects to date have never really >>> required me to think beyond scripts, and it's been years since I >>> actually tried to code objects in Perl). >>> >>> To be specific, when you say, "Let's add a method that sets the >>> threshold and stores it internally as $self->_threshold", ignoring any >>> other functionality, what would that method look like? In particular, >>> how would $self->_threshold be implemented? >>> >>> I think once I see that detail, I can go ahead and try to code what >>> you suggested. >>> >>> >>> Similarly (Chris), where would I put the tests / how would they be implemented? >>> >>> >>> Thanks again for the feedback. >>> >>> All the best, >>> Dan. >>> >>> >>> >>> 2009/4/27 Heikki Lehvaslaiho : >>>> Dan, >>>> >>>> It looks like your method does two different things: >>>> >>>> 1. Returns the longest subsequence above the threshold >>>> 2. Analyses the sequence for the number of ranges the current >>>> threshold creates. >>>> >>>> Why not separate these functions? >>>> >>>> Let's add a method that sets the threshold and stores it internally as >>>> $self->_threshold. Setting it to a new value should trigger emptying >>>> all the caches (see below). >>>> >>>> Let's have two more public methods: >>>> >>>> 1. get_clean_range() - optional argument 'threshold' >>>> >>>> It returns the longest clean subseq. >>>> >>>> 2. count_clean_ranges() - again optional argument 'threshold' >>>> >>>> This returns the number of ranges detected.
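Heikki's proposal can be sketched in plain Perl: a threshold setter that empties a cache, plus count_clean_ranges() and get_clean_range() built on one internal range scan. This is a minimal sketch, assuming the quality values are a plain array reference. The class name QualityRanges is hypothetical, it returns [start, end] index pairs rather than new Bio::Seq::Quality objects, and the default threshold of 13 is borrowed from Dan's get_clear_range.

```perl
use strict;
use warnings;

{
    # Hypothetical class illustrating the cached design; not BioPerl code.
    package QualityRanges;

    sub new {
        my ($class, %args) = @_;
        my $self = bless { _qual => $args{qual} || [], _threshold => 13 }, $class;
        $self->threshold($args{threshold}) if defined $args{threshold};
        return $self;
    }

    # Getter/setter; setting a threshold empties the cache.
    sub threshold {
        my ($self, $t) = @_;
        if (defined $t) {
            $self->{_threshold} = $t;
            $self->_clear_cache;
        }
        return $self->{_threshold};
    }

    # Replacing the quality values also empties the cache.
    sub qual {
        my ($self, $q) = @_;
        if (defined $q) {
            $self->{_qual} = $q;
            $self->_clear_cache;
        }
        return $self->{_qual};
    }

    sub _clear_cache { delete $_[0]{_clean_ranges} }

    # One scan, cached: every run of values >= threshold as [start, end]
    # (0-based, inclusive).
    sub _find_clean_ranges {
        my ($self) = @_;
        return $self->{_clean_ranges} if $self->{_clean_ranges};
        my ($q, $min) = @{$self}{qw(_qual _threshold)};
        my (@ranges, $start);
        for my $i (0 .. $#$q) {
            if ($q->[$i] >= $min) {
                $start = $i unless defined $start;
            }
            elsif (defined $start) {
                push @ranges, [$start, $i - 1];
                undef $start;
            }
        }
        push @ranges, [$start, $#$q] if defined $start;
        return $self->{_clean_ranges} = \@ranges;
    }

    sub count_clean_ranges {
        my ($self, $t) = @_;
        $self->threshold($t) if defined $t;
        return scalar @{ $self->_find_clean_ranges };
    }

    # Longest clean range as [start, end] (first one wins ties), or undef.
    sub get_clean_range {
        my ($self, $t) = @_;
        $self->threshold($t) if defined $t;
        my $best;
        for my $r (@{ $self->_find_clean_ranges }) {
            $best = $r if !$best || $r->[1] - $r->[0] > $best->[1] - $best->[0];
        }
        return $best;
    }

    # All clean ranges spanning at least $min_length positions.
    sub get_all_clean_ranges {
        my ($self, $min_length) = @_;
        $min_length //= 1;
        return grep { $_->[1] - $_->[0] + 1 >= $min_length }
            @{ $self->_find_clean_ranges };
    }
}
```

Every public method goes through _find_clean_ranges(), so repeated calls at the same threshold reuse a single scan. Calling threshold() or qual() with a new value forces a rescan.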
>>>> >>>> Both methods first call the public method threshold() if the argument >>>> has been given and then an internal method _find_clean_ranges(). That >>>> method calculates all the ranges and stores them internally (as >>>> $self->_clean_ranges->[...]). The number of ranges is also stored >>>> (e.g. $self->_number_of_ranges). These internal values form the cache >>>> that needs to be emptied whenever any of the critical values of the >>>> object changes: threshold, quality or seq. Create an internal method, >>>> $self->_clear_cache, that does that. >>>> >>>> Now the new quality object does not get created until you call >>>> get_clean_range(), which accesses the cached values (or creates them if >>>> they are not there). >>>> >>>> This design allows you to have no extra penalty for adding more >>>> methods that act on cached values. For example, it might be a sensible >>>> thing to do at some point to look at all the ranges that are longer >>>> than some length. Then you could write in your program:
>>>>
>>>> $qual->threshold(10);
>>>> if ($qual->count_clean_ranges == 1) {
>>>>     my $newqual = $qual->get_clean_range();
>>>>     # do your analysis
>>>> } elsif ($qual->count_clean_ranges == 0) {
>>>>     # do some reporting and logging
>>>> } else {    # more than one range
>>>>     my @quals = $qual->get_all_clean_ranges($min_length);
>>>>     # do some more work and possibly select the best one(s)
>>>> }
>>>>
>>>> Yours, >>>> >>>> -Heikki >>>> >>>> 2009/4/24 Chris Fields : >>>>> You could submit this as a diff against Bio::Seq::Quality to bugzilla. If >>>>> possible, tests don't hurt either! >>>>> >>>>> chris >>>>> >>>>> On Apr 24, 2009, at 11:20 AM, Dan Bolser wrote: >>>>>> It's a bit rough and ready, but it does what I need...
>>>>>>
>>>>>> =head2 get_clear_range
>>>>>>
>>>>>>  Title    : get_clear_range
>>>>>>  Usage    : $subobj = $obj->get_clear_range();
>>>>>>             $subobj = $obj->get_clear_range(20);
>>>>>>  Function : Get the clear range using the given quality score as a
>>>>>>             cutoff or a default value of 13.
>>>>>>  Returns  : a new Bio::Seq::Quality object
>>>>>>  Args     : a minimum quality value, optional, default = 13
>>>>>>
>>>>>> =cut
>>>>>>
>>>>>> sub get_clear_range
>>>>>> {
>>>>>>     my $self = shift;
>>>>>>     my $qual = $self->qual;
>>>>>>     my $minQual = shift || 13;
>>>>>>
>>>>>>     my (@ranges, $rangeFlag);
>>>>>>
>>>>>>     for(my $i=0; $i<@$qual; $i++){
>>>>>>         ## Are we currently within a clear range or not?
>>>>>>         if(defined($rangeFlag)){
>>>>>>             ## Did we just leave the clear range?
>>>>>>             if($qual->[$i]<$minQual){
>>>>>>                 ## Log the range
>>>>>>                 push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>>>                 ## and reset the range flag.
>>>>>>                 $rangeFlag = undef;
>>>>>>             }
>>>>>>             ## else nothing changes
>>>>>>         }
>>>>>>         else{
>>>>>>             ## Did we just enter a clear range?
>>>>>>             if($qual->[$i]>=$minQual){
>>>>>>                 ## Better set the range flag!
>>>>>>                 $rangeFlag = $i;
>>>>>>             }
>>>>>>             ## else nothing changes
>>>>>>         }
>>>>>>     }
>>>>>>     ## Did we exit the last clear range?
>>>>>>     if(defined($rangeFlag)){
>>>>>>         my $i = scalar(@$qual);
>>>>>>         ## Log the range
>>>>>>         push @ranges, [$rangeFlag, $i-1, $i-$rangeFlag];
>>>>>>     }
>>>>>>
>>>>>>     unless(@ranges){
>>>>>>         die "There is no clear range... I don't know what to do here!\n";
>>>>>>     }
>>>>>>
>>>>>>     print "there are ", scalar(@ranges), " clear ranges\n";
>>>>>>
>>>>>>     my $sum; map {$sum += $_->[2]} @ranges;
>>>>>>
>>>>>>     print "of ", scalar(@$qual), " bases, there are $sum with ".
>>>>>>         "quality scores above the given threshold\n";
>>>>>>
>>>>>>     for (sort {$b->[2] <=> $a->[2]} @ranges){
>>>>>>         if($_->[2]/$sum < 0.5){
>>>>>>             warn "not so much a clear range as a clear chunk...\n";
>>>>>>         }
>>>>>>         print $_->[2], "\t", $_->[2]/$sum, "\n";
>>>>>>
>>>>>>         return Bio::Seq::QualityDB->new( -seq  => $self->subseq( $_->[0]+1, $_->[1]+1),
>>>>>>                                          -qual => $self->subqual($_->[0]+1, $_->[1]+1)
>>>>>>                                        );
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>> Note, for testing I made a package called Bio/Seq/QualityDB.pm (which >>>>>> is a copy of Bio/Seq/Quality.pm that just has the above method added). >>>>>> That is why the 'new Bio::Seq::Quality object' is actually a >>>>>> Bio::Seq::QualityDB object, but other than that it should slot right >>>>>> in (apart from all the debugging output that I spit out). >>>>>> >>>>>> >>>>>> Cheers, >>>>>> Dan. >>>>>> >>>>>> >>>>>> 2009/4/24 Dan Bolser : >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> I couldn't find out how to get the 'clear range' from a >>>>>>> Bio::Seq::Quality object... Am I looking in the wrong place, or should >>>>>>> this method be a part of the Bio::Seq::Quality class? >>>>>>> >>>>>>> In the latter case I'm on my way to an implementation, but I am not >>>>>>> good at navigating the bioperl docs, so I thought I should ask before >>>>>>> I take the time to finish that off. >>>>>>> >>>>>>> >>>>>>> Cheers, >>>>>>> Dan. >>>>>>> >>>>>> >>>>> >>>> >>>> >>>> -- >>>> -Heikki >>>> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >>>> cell: +27 (0)714328090 >>>> Sent from Claremont, WC, South Africa >>>> >>> >> >> >> >> -- >>
-Heikki >> Heikki Lehvaslaiho - skype:heikki_lehvaslaiho >> cell: +27 (0)714328090 >> Sent from Claremont, WC, South Africa >> > > > > -- > -Heikki > Heikki Lehvaslaiho - skype:heikki_lehvaslaiho > cell: +27 (0)714328090 > Sent from Claremont, WC, South Africa > From dan.bolser at gmail.com Tue Aug 4 12:32:31 2009 From: dan.bolser at gmail.com (Dan Bolser) Date: Tue, 4 Aug 2009 17:32:31 +0100 Subject: [Bioperl-l] Percentage Similarity In-Reply-To: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> Message-ID: <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> 2009/7/28 shalabh sharma : > Hi All, I have some protein sequences (around 100) i need to find > overall percentage similarity between them. > How i can do that? Tried using blast? You can download that. Try asking in irc://irc.freenode.net/#bioinformatics Dan. > > Thanks > Shalabh > From shameer at ncbs.res.in Tue Aug 4 12:43:40 2009 From: shameer at ncbs.res.in (K. Shameer) Date: Tue, 4 Aug 2009 22:13:40 +0530 (IST) Subject: [Bioperl-l] Percentage Similarity In-Reply-To: <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> Message-ID: <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> Hello Shalabh, You may try ALISTAT. Available as a part of SQUID library from Prof. Sean Eddy. Make an alignment of your 100 sequences and use alignment as input of ALISTAT. ftp://selab.janelia.org/pub/software/squid/ Best, Khader Shameer > 2009/7/28 shalabh sharma : >> Hi All, I have some protein sequences (around 100) i need to >> find >> overall percentage similarity between them.
>> How i can do that? > > Tried using blast? > > You can download that. > > > Try asking in irc://irc.freenode.net/#bioinformatics > > Dan. > > >> >> Thanks >> Shalabh >> > From shalabh.sharma7 at gmail.com Tue Aug 4 13:36:34 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Tue, 4 Aug 2009 13:36:34 -0400 Subject: [Bioperl-l] Percentage Similarity In-Reply-To: <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> Message-ID: <9fcc48c70908041036p4511bdebh708edfc699077b65@mail.gmail.com> Hi All, thanks a lot. @Khader Shameer, ALISTAT is what i was looking for. But still it gives you the average identity, what i need exactly is the average similarity. Thanks Shalabh Sharma On Tue, Aug 4, 2009 at 12:43 PM, K. Shameer wrote: > Hello Shalabh, > > You may try ALISTAT. Available as a part of SQUID library from Prof. Sean > Eddy. Make an alignment of your 100 sequences and use alignment as input > of ALISTAT. ftp://selab.janelia.org/pub/software/squid/ > > Best, > Khader Shameer > > > 2009/7/28 shalabh sharma : > >> Hi All, I have some protein sequences (around 100) i need to > >> find > >> overall percentage similarity between them. > >> How i can do that? > > > > Tried using blast? > > > > You can download that. > > > > > > Try asking in irc://irc.freenode.net/#bioinformatics > > > > Dan.
> > > > > >> > > >> Thanks > >> Shalabh > > > > > From shalabh.sharma7 at gmail.com Wed Aug 5 09:31:21 2009 From: shalabh.sharma7 at gmail.com (shalabh sharma) Date: Wed, 5 Aug 2009 09:31:21 -0400 Subject: [Bioperl-l] Percentage Similarity In-Reply-To: <2c8757af0908050010y76b278b2v1445b50e27c5f4d0@mail.gmail.com> References: <9fcc48c70907280846q32dacfd5od52bdb152426bafd@mail.gmail.com> <2c8757af0908040932l35dd74das644f2f99cde7d011@mail.gmail.com> <53005.192.168.1.1.1249404220.squirrel@mail.ncbs.res.in> <9fcc48c70908041036p4511bdebh708edfc699077b65@mail.gmail.com> <2c8757af0908050010y76b278b2v1445b50e27c5f4d0@mail.gmail.com> Message-ID: <9fcc48c70908050631q1a080b74x12e81985b455332e@mail.gmail.com> Hi, Thanks for the reply. I used ClustalW for the MSA. Also i was just wondering what if i use Smith-Waterman (EMBOSS' water) and pass the same library as query sequences and reference library, then just parse it and calculate average similarity. Is this the right approach? Thanks Shalabh On Wed, Aug 5, 2009 at 3:10 AM, Dan Bolser wrote: > 2009/8/4 shalabh sharma : > > Hi All, thanks a lot. > > @Khader Shameer, ALISTAT is what i was looking for. But still it gives > you > > the average identity, what i need exactly is the average similarity. > > The problem is that identity is well defined. Similarity is more > vague, and at least depends on a particular alignment scoring matrix. > How did you align your sequences? > > Dan.
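As Dan notes, "similarity" only becomes a number once you fix which residue pairs count as similar. Below is a minimal sketch of average pairwise identity and similarity over an existing MSA, such as the ClustalW one. Several choices here are assumptions, and changing any of them changes the percentages: the residue groups, the gap characters, and counting only columns where both sequences have a residue. The function names pair_stats and average_stats are hypothetical.

```perl
use strict;
use warnings;

# ASSUMED similarity groups, modelled on ClustalW's "strong" groups:
# two residues count as similar if identical or if they share a group.
my @GROUPS = qw(STA NEQK NHQK NDEQ QHRK MILV MILF HY FYW);

my %SIMILAR;
for my $group (@GROUPS) {
    my @aa = split //, $group;
    for my $x (@aa) {
        $SIMILAR{$x}{$_} = 1 for @aa;
    }
}

# Percent identity and percent similarity for two aligned sequences,
# counted over columns where neither sequence has a gap ('-' or '.').
sub pair_stats {
    my ($s1, $s2) = @_;
    die "aligned sequences must have equal length\n"
        unless length($s1) == length($s2);
    my ($cols, $ident, $sim) = (0, 0, 0);
    for my $i (0 .. length($s1) - 1) {
        my $x = uc substr($s1, $i, 1);
        my $y = uc substr($s2, $i, 1);
        next if $x =~ /[-.]/ || $y =~ /[-.]/;
        $cols++;
        if ($x eq $y) { $ident++; $sim++ }
        elsif ($SIMILAR{$x} && $SIMILAR{$x}{$y}) { $sim++ }
    }
    return (0, 0) unless $cols;
    return (100 * $ident / $cols, 100 * $sim / $cols);
}

# Average both measures over all N*(N-1)/2 sequence pairs.
sub average_stats {
    my @seqs = @_;
    my ($pairs, $id_sum, $sim_sum) = (0, 0, 0);
    for my $i (0 .. $#seqs - 1) {
        for my $j ($i + 1 .. $#seqs) {
            my ($id, $sim) = pair_stats($seqs[$i], $seqs[$j]);
            $id_sum  += $id;
            $sim_sum += $sim;
            $pairs++;
        }
    }
    return $pairs ? ($id_sum / $pairs, $sim_sum / $pairs) : (0, 0);
}
```

The gapped strings could be read from the ClustalW output with Bio::AlignIO (format 'clustalw'), calling ->seq on each sequence in the alignment. With around 100 sequences that is 4,950 pairs. A score-based definition (for example, a positive BLOSUM62 entry) could replace the group table if that matches how the alignment was scored.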
> > >> > Try asking in irc://irc.freenode.net/#bioinformatics > >> > > > ;-) > From michael.watson at bbsrc.ac.uk Wed Aug 5 09:50:35 2009 From: michael.watson at bbsrc.ac.uk (michael watson (IAH-C)) Date: Wed, 5 Aug 2009 14:50:35 +0100 Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank Message-ID: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> Hi I want to download GSS sequences using Bio::DB::GenBank. When I specify db => 'nucleotide', it gets the 3000 or so that Entrez reports are in nucleotide, but there are another ~30000 in GSS that I want, but when I try db => 'GSS' or db => 'gss' nothing comes down. I'm using bioperl 1.5.1. Any clues? Mick From rmb32 at cornell.edu Wed Aug 5 11:28:46 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Wed, 05 Aug 2009 08:28:46 -0700 Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank In-Reply-To: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> Message-ID: <4A79A52E.7000104@cornell.edu> I think you're looking for the -db => 'nucgss' option. I'll add a better listing of these (undocumented) options to the Bio::DB::Query::GenBank docs. Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu michael watson (IAH-C) wrote: > Hi > > I want to download GSS sequences using Bio::DB::GenBank. > > When I specify db => 'nucleotide', it gets the 3000 or so that Entrez reports are in nucleotide, but there are another ~30000 in GSS that I want, but when I try db => 'GSS' or db => 'gss' nothing comes down. > > I'm using bioperl 1.5.1. > > Any clues?
> > Mick > > > > From hartzell at alerce.com Wed Aug 5 12:16:04 2009 From: hartzell at alerce.com (George Hartzell) Date: Wed, 5 Aug 2009 09:16:04 -0700 Subject: [Bioperl-l] Job opening at Genentech [SSF, CA]. Message-ID: <19065.45124.4999.922147@already.dhcp.gene.com> I have an opening in my group in the Bioinformatics department at Genentech [South San Francisco, CA]. At the moment (for the next year or so) our main focus is rebuilding and extending a system for collecting, processing, and disseminating information about mutations and variations (think web interfaces, relational databases, alignments, workflows/pipelines). In the future we'll pick up projects related to next-gen sequencing (Me too!!! In the future, what isn't related to next-gen?), data integration, and/or lab-specific projects. First and foremost I'm looking for someone who's sharp and who enjoys computers, biology, and technology; someone who gets excited about picking up new tools but who also has a sense of responsibility and restraint. I'm looking for someone who's familiar with several languages and tools; modern Perl complemented with C is my first choice these days, supplemented with R and (when necessary) anything from the rest of the programming language bestiary. There's a fair amount of Java flying around here too, so familiarity with it and the JVM world will help. Relational databases are part of the picture: Oracle for the big stuff; SQLite, PostgreSQL, and MySQL play niche roles. I generally interact with them via ORMs; lately it's been Rose::DB::Object on the Perl side, though I've been convinced to take another look at DBIx::Class. Most of my web apps use CGI::Application, running as FastCGI processes, under mod_perl, or as simple CGI scripts, but (as with ORMs) I may take another look at Catalyst.
I'm looking for someone who's interested in building real software. We'll be putting together a set of tools and data that need to hang together and evolve for at least 4-5 years. Deploy and run won't cut it. Requirements will change, so it's important to me that we build things so they're as modular and flexible as possible. Testing, source control, and documentation matter. A strong candidate will have an understanding of basic bioinformatics concepts and the ability to pick up new biology and computer science concepts as necessary. At the junior end of the spectrum I'd expect a bachelor's degree + 3 years of experience; at the upper end, a master's + 5 years (or a PhD interested in moving towards the production side of the house). I can imagine running through one or more detail-oriented interview questions that drilled down (or took off on a tangent) from the following:

- What's the difference between Smith-Waterman, BLAST, sim4, GMAP, and/or Bowtie alignment algorithms or tools? Which would you use when, and why?
- Why is Moose better than Class::Accessor? (Yes, it's Perl-centered, but it could spin out into any language [e.g. why is Java better than Perl?].) What's a MOP? Who cares?
- CVS, Subversion, git, Mercurial. You've already picked one? Which one? Why? Why not?
- XML or JSON or YAML. Pick one for moving data back and forth in an Ajax-based interface. Why? Would it also work well in other contexts?
- How would you store information about positional features on a genome so that you could get fast random access? How would your solution tie into a larger data context?

Genentech's a great place to work: solid salaries, great benefits, Bay Area location (who could ask for more?). We're open-source friendly, and with the arrival of Robert Gentleman (our new Director, of Bioconductor/R fame) we're likely to become more so. The recent Roche acquisition hasn't changed life much; it seems to mostly be a source of opportunities for those of us in Research.
If you know anyone who fits the bill, have them drop me a note.

Thanks!
g.

From hilgert at cshl.edu Wed Aug 5 16:27:28 2009
From: hilgert at cshl.edu (Hilgert, Uwe)
Date: Wed, 5 Aug 2009 16:27:28 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
Message-ID:

Is my impression correct that Bio::SeqIO just assumes that sequences are being submitted in FASTA format? In our experience, implementing Bio::SeqIO led to the first line of files being cut off, regardless of whether the files were indeed FASTA files or files that only contained sequence. In the latter case, this led to sequence submissions that had the first line of nucleotides removed. Has anyone tried to write a fix for this?

Thanks,
Uwe

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Uwe Hilgert, Ph.D.
Dolan DNA Learning Center
Cold Spring Harbor Laboratory
V: (516) 367-5185
E: hilgert at cshl.edu
F: (516) 367-5182
W: http://www.dnalc.org

From cjfields at illinois.edu Wed Aug 5 17:04:14 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 Aug 2009 16:04:14 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References:
Message-ID: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu>

On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote:

> Is my impression correct that Bio::SeqIO just assumes that sequences
> are
> being submitted in FASTA format?

No. See:

http://www.bioperl.org/wiki/HOWTO:SeqIO

SeqIO tries to guess the format from the file extension, and if there isn't one it falls back on Bio::Tools::GuessSeqFormat. It's possible that the extension is causing the problem, or that GuessSeqFormat is guessing wrong (it's apt to do that, as it's forced to guess). In any case, it's always advisable to indicate the format explicitly when possible.

Relevant lines:

    return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
    ...
    return 'raw' if /\.(txt)$/i;

> In our experience, implementing
> Bio::SeqIO led to the first line of files being cut off, regardless of
> whether the files were indeed fasta files or files that only contained
> sequence.

Files that only contain sequence are 'raw'. Ones in FASTA are 'fasta'.

> Which, in the latter, led to sequence submissions that had the
> first line of nucleotides removed. Has anyone tried to write a fix for
> this?

This sounds like a bug, but we have very little to go on beyond your description. What version of BioPerl are you using, OS, etc.? What does your data look like? File extension?

chris

> Thanks,
>
> Uwe
>
> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
> Uwe Hilgert, Ph.D.
> Dolan DNA Learning Center
> Cold Spring Harbor Laboratory
> V: (516) 367-5185
> E: hilgert at cshl.edu
> F: (516) 367-5182
> W: http://www.dnalc.org

From Kevin.M.Brown at asu.edu Wed Aug 5 17:03:04 2009
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 5 Aug 2009 14:03:04 -0700
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To:
References:
Message-ID: <1A4207F8295607498283FE9E93B775B40624DA61@EX02.asurite.ad.asu.edu>

SeqIO is just a base framework for reading/writing files. If you want it to read FASTA, you tell it so when you create the object:

    $seqio = Bio::SeqIO->new(-format=>'fasta');

will tell the program to use Bio::SeqIO::fasta for the object. Look at the docs for the various formats that Bio::SeqIO supports.
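For readers who want to check their own files without installing BioPerl, here is a minimal core-Perl sketch of the two mechanisms discussed in this thread. It is an illustration under stated assumptions, not BioPerl's actual code: guess_format_from_filename() applies only the two extension rules Chris quotes above (the full list in Bio::SeqIO is elided there), looks_like_fasta() is a simple '>'-at-start check, and first_record_like_fasta_parser() imitates the splitting Chris describes later in the thread (read a chunk with $/ = "\n>", then split off the first line as the header). All three helper names are hypothetical, and the sample sequences are shortened.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): applies only the two
# extension rules quoted in the thread. Returns undef when nothing
# matches, i.e. the case where Bio::SeqIO falls back on content guessing.
sub guess_format_from_filename {
    my ($name) = @_;
    return 'fasta' if $name =~ /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
    return 'raw'   if $name =~ /\.(txt)$/i;
    return;
}

# Hypothetical helper: a FASTA file should start with a '>' header line.
sub looks_like_fasta {
    my ($text) = @_;
    return $text =~ /\A\s*>/ ? 1 : 0;
}

# Hypothetical helper imitating the record splitting described in the
# thread for Bio::SeqIO::fasta: read one chunk with $/ = "\n>", then
# split the first line off as the header. Returns (header, sequence).
sub first_record_like_fasta_parser {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    local $/ = "\n>";
    my $entry = <$fh>;
    close $fh;
    return unless defined $entry;
    chomp $entry;                 # removes a trailing "\n>" record separator
    $entry =~ s/^>//;             # strip the '>' of the very first record
    my ($top, $sequence) = split /\n/, $entry, 2;
    $sequence = '' unless defined $sequence;
    $sequence =~ s/\s+//g;        # join the wrapped sequence lines
    return ($top, $sequence);
}

my $raw   = "MWTALPLLCAGAWLLS\nHTFKMGLNQFSDMSFA\n";
my $fasta = ">CATH_RAT\n$raw";

for my $name (qw(seqs.fa reads.TXT upload_1234)) {
    my $fmt = guess_format_from_filename($name);
    printf "%-12s extension guess: %s\n", $name, defined $fmt ? $fmt : '(none)';
}

for my $input ($fasta, $raw) {
    my ($top, $seq) = first_record_like_fasta_parser($input);
    printf "looks like FASTA: %d  header: %s  sequence: %s\n",
        looks_like_fasta($input), $top, $seq;
}
```

Run as-is, the FASTA input keeps both sequence lines, while the header-less input reports its first sequence line as the "header" and returns only the second line, which matches the symptom Uwe describes. For sequence-only files the thread points to the 'raw' format rather than 'fasta'.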
-----Original Message----- From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Hilgert, Uwe Sent: Wednesday, August 05, 2009 1:27 PM To: bioperl-l at lists.open-bio.org Subject: [Bioperl-l] Bio::SeqIO issue Is my impression correct that Bio::SeqIO just assumes that sequences are being submitted in FASTA format? In our experience, implementing Bio::SeqIO led to the first line of files being cut off, regardless of whether the files were indeed fasta files or files that only contained sequence. Which, in the latter, led to sequence submissions that had the first line of nucleotides removed. Has anyone tried to write a fix for this? Thanks, Uwe - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe Hilgert, Ph.D. Dolan DNA Learning Center Cold Spring Harbor Laboratory V: (516) 367-5185 E: hilgert at cshl.edu F: (516) 367-5182 W: http://www.dnalc.org _______________________________________________ Bioperl-l mailing list Bioperl-l at lists.open-bio.org http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Wed Aug 5 17:37:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 5 Aug 2009 16:37:52 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> Message-ID: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> Uwe, Please keep replies on the list. It's very possible that's the issue; IIRC the fasta parser pulls out the full sequence in chunks (based on local $/ = "\n>") and splits the header off as the first line in that chunk. You could probably try leaving the format out and letting SeqIO guess it, or passing the file into Bio::Tools::GuessSeqFormat directly, but it's probably better to go through the files and add a file extension that corresponds to the format. chris On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > Thanks, Chris. 
The files have no extension, but we indicate what > format > to use, like in the manual: > > $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); > > I wonder now whether this could exactly cause the problem: as we are > telling that input files are in fasta format they are being treated as > such (=remove first line) - regardless of whether they really are > fasta? > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > Uwe Hilgert, Ph.D. > Dolan DNA Learning Center > Cold Spring Harbor Laboratory > > C: (516) 857-1693 > V: (516) 367-5185 > E: hilgert at cshl.edu > F: (516) 367-5182 > W: http://www.dnalc.org > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Wednesday, August 05, 2009 5:04 PM > To: Hilgert, Uwe > Cc: bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: > >> Is my impression correct that Bio::SeqIO just assumes that sequences >> are >> being submitted in FASTA format? > > No. See: > > http://www.bioperl.org/wiki/HOWTO:SeqIO > > SeqIO tries to guess at the format using the file extension, and if > one isn't present makes use of Bio::Tools::GuessSeqFormat. It's > possible that the extension is causing the problem, or that > GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to > guessing). In any case, it's always advisable to explicitly indicate > the format when possible. > > Relevant lines: > > return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ > i; > ... > return 'raw' if /\.(txt)$/i; > >> In our experience, implementing >> Bio::SeqIO led to the first line of files being cut off, regardless >> of >> whether the files were indeed fasta files or files that only >> contained >> sequence. > > Files that only contain sequence are 'raw'. Ones in FASTA are > 'fasta'. > >> Which, in the latter, led to sequence submissions that had the >> first line of nucleotides removed. 
Has anyone tried to write a fix >> for >> this? > > This sounds like a bug, but we have very little to go on beyond your > description. What version of bioperl are you using, OS, etc? What > does your data look like? File extension? > > chris > >> Thanks, >> >> Uwe >> >> >> >> >> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >> >> Uwe Hilgert, Ph.D. >> >> Dolan DNA Learning Center >> >> Cold Spring Harbor Laboratory >> >> >> >> V: (516) 367-5185 >> >> E: hilgert at cshl.edu >> >> F: (516) 367-5182 >> >> W: http://www.dnalc.org >> >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From Kevin.M.Brown at asu.edu Wed Aug 5 17:45:03 2009 From: Kevin.M.Brown at asu.edu (Kevin Brown) Date: Wed, 5 Aug 2009 14:45:03 -0700 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> Message-ID: <1A4207F8295607498283FE9E93B775B40624DA9B@EX02.asurite.ad.asu.edu> I'm not sure, but I think the module is fasta, not Fasta. So it should be -format=>'fasta', unless you're on a case-insensitive system that is forgiving the capital... Kevin Brown Center for Innovations in Medicine Biodesign Institute Arizona State University > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org > [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of > Chris Fields > Sent: Wednesday, August 05, 2009 2:38 PM > To: Hilgert, Uwe > Cc: BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Uwe, > > Please keep replies on the list. > > It's very possible that's the issue; IIRC the fasta parser pulls out > the full sequence in chunks (based on local $/ = "\n>") and > splits the > header off as the first line in that chunk. 
You could probably try > leaving the format out and letting SeqIO guess it, or passing > the file > into Bio::Tools::GuessSeqFormat directly, but it's probably > better to > go through the files and add a file extension that > corresponds to the > format. > > chris > > On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > > > Thanks, Chris. The files have no extension, but we indicate what > > format > > to use, like in the manual: > > > > $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); > > > > I wonder now whether this could exactly cause the problem: as we are > > telling that input files are in fasta format they are being > treated as > > such (=remove first line) - regardless of whether they really are > > fasta? > > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > > Uwe Hilgert, Ph.D. > > Dolan DNA Learning Center > > Cold Spring Harbor Laboratory > > > > C: (516) 857-1693 > > V: (516) 367-5185 > > E: hilgert at cshl.edu > > F: (516) 367-5182 > > W: http://www.dnalc.org > > > > -----Original Message----- > > From: Chris Fields [mailto:cjfields at illinois.edu] > > Sent: Wednesday, August 05, 2009 5:04 PM > > To: Hilgert, Uwe > > Cc: bioperl-l at lists.open-bio.org > > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > > > On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: > > > >> Is my impression correct that Bio::SeqIO just assumes that > sequences > >> are > >> being submitted in FASTA format? > > > > No. See: > > > > http://www.bioperl.org/wiki/HOWTO:SeqIO > > > > SeqIO tries to guess at the format using the file extension, and if > > one isn't present makes use of Bio::Tools::GuessSeqFormat. It's > > possible that the extension is causing the problem, or that > > GuessSeqFormat guessing wrong (it's apt to do that, as it's > forced to > > guessing). In any case, it's always advisable to > explicitly indicate > > the format when possible. 
> > > > Relevant lines: > > > > return 'fasta' if > /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ > > i; > > ... > > return 'raw' if /\.(txt)$/i; > > > >> In our experience, implementing > >> Bio::SeqIO led to the first line of files being cut off, > regardless > >> of > >> whether the files were indeed fasta files or files that only > >> contained > >> sequence. > > > > Files that only contain sequence are 'raw'. Ones in FASTA are > > 'fasta'. > > > >> Which, in the latter, led to sequence submissions that had the > >> first line of nucleotides removed. Has anyone tried to > write a fix > >> for > >> this? > > > > This sounds like a bug, but we have very little to go on beyond your > > description. What version of bioperl are you using, OS, etc? What > > does your data look like? File extension? > > > > chris > > > >> Thanks, > >> > >> Uwe > >> > >> > >> > >> > >> > >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > >> > >> Uwe Hilgert, Ph.D. > >> > >> Dolan DNA Learning Center > >> > >> Cold Spring Harbor Laboratory > >> > >> > >> > >> V: (516) 367-5185 > >> > >> E: hilgert at cshl.edu > >> > >> F: (516) 367-5182 > >> > >> W: http://www.dnalc.org > >> > >> > >> > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From hlapp at gmx.net Wed Aug 5 18:53:56 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Wed, 5 Aug 2009 18:53:56 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> Message-ID: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> I don't think that can be the problem. 
If anything, providing the format ought to be better in terms of result than not providing it? Uwe - I'd like you to go back to Chris' initial questions that you haven't answered yet: "What version of bioperl are you using, OS, etc? What does your data look like?" I'd add to that, can you show us your full script, or a smaller code snippet that reproduces the problem. I suspect that either something in your script is swallowing the line, or that the line endings in your data file are from a different OS than the one you're running the script on. (Or that you are running a very old version of BioPerl, which is entirely possible if you installed through CPAN.) -hilmar On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > Uwe, > > Please keep replies on the list. > > It's very possible that's the issue; IIRC the fasta parser pulls out > the full sequence in chunks (based on local $/ = "\n>") and splits > the header off as the first line in that chunk. You could probably > try leaving the format out and letting SeqIO guess it, or passing > the file into Bio::Tools::GuessSeqFormat directly, but it's probably > better to go through the files and add a file extension that > corresponds to the format. > > chris > > On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > >> Thanks, Chris. The files have no extension, but we indicate what >> format >> to use, like in the manual: >> >> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >> >> I wonder now whether this could exactly cause the problem: as we are >> telling that input files are in fasta format they are being treated >> as >> such (=remove first line) - regardless of whether they really are >> fasta? >> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >> Uwe Hilgert, Ph.D. 
>> Dolan DNA Learning Center >> Cold Spring Harbor Laboratory >> >> C: (516) 857-1693 >> V: (516) 367-5185 >> E: hilgert at cshl.edu >> F: (516) 367-5182 >> W: http://www.dnalc.org >> >> -----Original Message----- >> From: Chris Fields [mailto:cjfields at illinois.edu] >> Sent: Wednesday, August 05, 2009 5:04 PM >> To: Hilgert, Uwe >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >> >>> Is my impression correct that Bio::SeqIO just assumes that sequences >>> are >>> being submitted in FASTA format? >> >> No. See: >> >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> SeqIO tries to guess at the format using the file extension, and if >> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >> possible that the extension is causing the problem, or that >> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to >> guessing). In any case, it's always advisable to explicitly indicate >> the format when possible. >> >> Relevant lines: >> >> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >> i; >> ... >> return 'raw' if /\.(txt)$/i; >> >>> In our experience, implementing >>> Bio::SeqIO led to the first line of files being cut off, >>> regardless of >>> whether the files were indeed fasta files or files that only >>> contained >>> sequence. >> >> Files that only contain sequence are 'raw'. Ones in FASTA are >> 'fasta'. >> >>> Which, in the latter, led to sequence submissions that had the >>> first line of nucleotides removed. Has anyone tried to write a fix >>> for >>> this? >> >> This sounds like a bug, but we have very little to go on beyond your >> description. What version of bioperl are you using, OS, etc? What >> does your data look like? File extension? >> >> chris >> >>> Thanks, >>> >>> Uwe >>> >>> >>> >>> >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> >>> Uwe Hilgert, Ph.D. 
>>> >>> Dolan DNA Learning Center >>> >>> Cold Spring Harbor Laboratory >>> >>> >>> >>> V: (516) 367-5185 >>> >>> E: hilgert at cshl.edu >>> >>> F: (516) 367-5182 >>> >>> W: http://www.dnalc.org >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From maj at fortinbras.us Wed Aug 5 19:12:52 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 5 Aug 2009 19:12:52 -0400
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
Message-ID: <8FAB8756AD944534B49F2C4356CB6D92@NewLife>

If these items were included in a Bugzilla report, that would be most convenient (= most likely to get looked at carefully), and it is the best place for us to keep track of these kinds of issues: http://bugzilla.bioperl.org/

cheers MAJ

----- Original Message ----- From: "Hilmar Lapp" To: "Chris Fields" Cc: "BioPerl List" Sent: Wednesday, August 05, 2009 6:53 PM Subject: Re: [Bioperl-l] Bio::SeqIO issue >I don't think that can be the problem. If anything, providing the > format ought to be better in terms of result than not providing it? > > Uwe - I'd like you to go back to Chris' initial questions that you > haven't answered yet: "What version of bioperl are you using, OS, > etc? What does your data look like?"
I'd add to that, can you show us > your full script, or a smaller code snippet that reproduces the problem. > > I suspect that either something in your script is swallowing the line, > or that the line endings in your data file are from a different OS > than the one you're running the script on. (Or that you are running a > very old version of BioPerl, which is entirely possible if you > installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls out >> the full sequence in chunks (based on local $/ = "\n>") and splits >> the header off as the first line in that chunk. You could probably >> try leaving the format out and letting SeqIO guess it, or passing >> the file into Bio::Tools::GuessSeqFormat directly, but it's probably >> better to go through the files and add a file extension that >> corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. The files have no extension, but we indicate what >>> format >>> to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being treated >>> as >>> such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> Uwe Hilgert, Ph.D. 
>>> Dolan DNA Learning Center >>> Cold Spring Harbor Laboratory >>> >>> C: (516) 857-1693 >>> V: (516) 367-5185 >>> E: hilgert at cshl.edu >>> F: (516) 367-5182 >>> W: http://www.dnalc.org >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, August 05, 2009 5:04 PM >>> To: Hilgert, Uwe >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>> >>>> Is my impression correct that Bio::SeqIO just assumes that sequences >>>> are >>>> being submitted in FASTA format? >>> >>> No. See: >>> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> >>> SeqIO tries to guess at the format using the file extension, and if >>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>> possible that the extension is causing the problem, or that >>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to >>> guessing). In any case, it's always advisable to explicitly indicate >>> the format when possible. >>> >>> Relevant lines: >>> >>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>> i; >>> ... >>> return 'raw' if /\.(txt)$/i; >>> >>>> In our experience, implementing >>>> Bio::SeqIO led to the first line of files being cut off, >>>> regardless of >>>> whether the files were indeed fasta files or files that only >>>> contained >>>> sequence. >>> >>> Files that only contain sequence are 'raw'. Ones in FASTA are >>> 'fasta'. >>> >>>> Which, in the latter, led to sequence submissions that had the >>>> first line of nucleotides removed. Has anyone tried to write a fix >>>> for >>>> this? >>> >>> This sounds like a bug, but we have very little to go on beyond your >>> description. What version of bioperl are you using, OS, etc? What >>> does your data look like? File extension? 
>>> >>> chris >>> >>>> Thanks, >>>> >>>> Uwe >>>> >>>> >>>> >>>> >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> >>>> Uwe Hilgert, Ph.D. >>>> >>>> Dolan DNA Learning Center >>>> >>>> Cold Spring Harbor Laboratory >>>> >>>> >>>> >>>> V: (516) 367-5185 >>>> >>>> E: hilgert at cshl.edu >>>> >>>> F: (516) 367-5182 >>>> >>>> W: http://www.dnalc.org >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > >

From cjfields at illinois.edu Thu Aug 6 00:43:45 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 5 Aug 2009 23:43:45 -0500
Subject: [Bioperl-l] Bio::SeqIO issue
In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net>
Message-ID:

The SeqIO::fasta parser sets:

    local $/ = "\n>";

then splits the resulting chunks of data (each corresponding to a full FASTA-formatted sequence) into two pieces:

    my ($top,$sequence) = split(/\n/,$entry,2);

If there is no description line (e.g. the file is all raw sequence data), these lines read in the whole file as a single chunk and then split off its first line as the header.

chris

On Aug 5, 2009, at 5:53 PM, Hilmar Lapp wrote:

> I don't think that can be the problem.
If anything, providing the > format ought to be better in terms of result than not providing it? > > Uwe - I'd like you to go back to Chris' initial questions that you > haven't answered yet: "What version of bioperl are you using, OS, > etc? What does your data look like?" I'd add to that, can you show > us your full script, or a smaller code snippet that reproduces the > problem. > > I suspect that either something in your script is swallowing the > line, or that the line endings in your data file are from a > different OS than the one you're running the script on. (Or that you > are running a very old version of BioPerl, which is entirely > possible if you installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls >> out the full sequence in chunks (based on local $/ = "\n>") and >> splits the header off as the first line in that chunk. You could >> probably try leaving the format out and letting SeqIO guess it, or >> passing the file into Bio::Tools::GuessSeqFormat directly, but it's >> probably better to go through the files and add a file extension >> that corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. The files have no extension, but we indicate what >>> format >>> to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being >>> treated as >>> such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> Uwe Hilgert, Ph.D. 
>>> Dolan DNA Learning Center >>> Cold Spring Harbor Laboratory >>> >>> C: (516) 857-1693 >>> V: (516) 367-5185 >>> E: hilgert at cshl.edu >>> F: (516) 367-5182 >>> W: http://www.dnalc.org >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, August 05, 2009 5:04 PM >>> To: Hilgert, Uwe >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>> >>>> Is my impression correct that Bio::SeqIO just assumes that >>>> sequences >>>> are >>>> being submitted in FASTA format? >>> >>> No. See: >>> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> >>> SeqIO tries to guess at the format using the file extension, and if >>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>> possible that the extension is causing the problem, or that >>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>> to >>> guessing). In any case, it's always advisable to explicitly >>> indicate >>> the format when possible. >>> >>> Relevant lines: >>> >>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>> i; >>> ... >>> return 'raw' if /\.(txt)$/i; >>> >>>> In our experience, implementing >>>> Bio::SeqIO led to the first line of files being cut off, >>>> regardless of >>>> whether the files were indeed fasta files or files that only >>>> contained >>>> sequence. >>> >>> Files that only contain sequence are 'raw'. Ones in FASTA are >>> 'fasta'. >>> >>>> Which, in the latter, led to sequence submissions that had the >>>> first line of nucleotides removed. Has anyone tried to write a >>>> fix for >>>> this? >>> >>> This sounds like a bug, but we have very little to go on beyond your >>> description. What version of bioperl are you using, OS, etc? What >>> does your data look like? File extension? 
>>> >>> chris >>> >>>> Thanks, >>>> >>>> Uwe >>>> >>>> >>>> >>>> >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> >>>> Uwe Hilgert, Ph.D. >>>> >>>> Dolan DNA Learning Center >>>> >>>> Cold Spring Harbor Laboratory >>>> >>>> >>>> >>>> V: (516) 367-5185 >>>> >>>> E: hilgert at cshl.edu >>>> >>>> F: (516) 367-5182 >>>> >>>> W: http://www.dnalc.org >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cjfields at illinois.edu Thu Aug 6 01:12:13 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 00:12:13 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <8FAB8756AD944534B49F2C4356CB6D92@NewLife> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> Message-ID: <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> Just to confirm: the following is using bioperl-live on my macbook pro (perl 5.10.0, 64bit). We need to decide if this is a legit bug or a user issue (if it's the former, we can easily add an exception indicating lack of a header). Note that 'raw' also fails for the raw example below (doesn't appear to remove newlines). 
-c

cjfields4:fasta cjfields$ cat raw_v_fasta.pl
#!/usr/bin/perl -w

use strict;
use warnings;
use IO::String;
use Bio::SeqIO;
use Test::More qw(no_plan);

my %seq;

$seq{raw} = <<RAW;
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
RAW

$seq{fasta} = <<FASTA;
>CATH_RAT
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
FASTA

my %newdata;

for my $input (sort keys %seq) {
    my $fh = IO::String->new($seq{$input});
    my $seq = Bio::SeqIO->new(-format => 'fasta', -fh => $fh)->next_seq;
    $newdata{$input} = $seq->seq;
}

is($newdata{raw}, $newdata{fasta}, 'format');
cjfields4:fasta cjfields$ perl raw_v_fasta.pl
not ok 1 - format
#   Failed test 'format'
#   at raw_v_fasta.pl line 36.
#          got: 'HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
#     expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNHTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
1..1
# Looks like you failed 1 test of 1.

On Aug 5, 2009, at 6:12 PM, Mark A.
Jensen wrote: > If these items were included in a Bugzilla report, that would be > most convenient (= most likely to get looked carefully) > and is the best place for us to keep track of these kinds of > issues-- http://bugzilla.bioperl.org/ > cheers MAJ > ----- Original Message ----- From: "Hilmar Lapp" > To: "Chris Fields" > Cc: "BioPerl List" > Sent: Wednesday, August 05, 2009 6:53 PM > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? What does your data look like?" I'd add to that, can you show >> us your full script, or a smaller code snippet that reproduces the >> problem. >> I suspect that either something in your script is swallowing the >> line, or that the line endings in your data file are from a >> different OS than the one you're running the script on. (Or that >> you are running a very old version of BioPerl, which is entirely >> possible if you installed through CPAN.) >> -hilmar >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >>> Uwe, >>> >>> Please keep replies on the list. >>> >>> It's very possible that's the issue; IIRC the fasta parser pulls >>> out the full sequence in chunks (based on local $/ = "\n>") and >>> splits the header off as the first line in that chunk. You could >>> probably try leaving the format out and letting SeqIO guess it, >>> or passing the file into Bio::Tools::GuessSeqFormat directly, but >>> it's probably better to go through the files and add a file >>> extension that corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. 
The files have no extension, but we indicate what >>>> format >>>> to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being >>>> treated as >>>> such (=remove first line) - regardless of whether they really >>>> are fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> Uwe Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences >>>>> are >>>>> being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's >>>> forced to >>>> guessing). In any case, it's always advisable to explicitly >>>> indicate >>>> the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i; >>>> ...
>>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless of >>>>> whether the files were indeed fasta files or files that only >>>>> contained >>>>> sequence. >>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a >>>>> fix for >>>>> this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. >>>>> >>>>> Dolan DNA Learning Center >>>>> >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> >>>>> >>>>> V: (516) 367-5185 >>>>> >>>>> E: hilgert at cshl.edu >>>>> >>>>> F: (516) 367-5182 >>>>> >>>>> W: http://www.dnalc.org >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From eigenrosen at gmail.com Thu Aug 6 03:12:24 2009 From: eigenrosen at gmail.com (Michael Rosen) Date: Thu, 6 Aug 2009 00:12:24 
-0700 Subject: [Bioperl-l] Trouble with Clustalw Message-ID: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com>

I'm a complete bioperl novice, trying to run Clustalw on some fasta files, and am running into trouble:

~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta
Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 550.
Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 551.
Can't exec "align": No such file or directory at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328
STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556
STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472
STACK: TestClust:22
-----------------------------------------------------------

Here's my code:

#!/usr/bin/perl -w
use Bio::Perl;
use Bio::AlignIO;
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::SimpleAlign;
use Bio::Seq;
use strict;
use warnings;

my $factory = Bio::Tools::Run::Alignment::Clustalw->new();
my @seq_array = read_all_sequences($ARGV[0],'fasta');
for (my $i = 0; $i < @seq_array; $i++){
    (my $seq = $seq_array[$i]->seq()) =~ s/-//g;
    $seq_array[$i]->seq($seq);
}
write_sequence(">test",'fasta',@seq_array);
my $seq_array_ref = \@seq_array;
my $aln = $factory->align($seq_array_ref);
my @align_array = $aln->each_seq();
write_sequence(">testfile",'fasta',@align_array);

The loop is just there to take out gaps that were introduced by a BLAST step run before this. The write_sequence call confirms that @seq_array is a valid array of Bio::Seq objects at the time align is called. Here's some output in "test":

>A0220B0939one.1 FV584Q101DEWY9
TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC
CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT
TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT
TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG
CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG
CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA
CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA
CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT
AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG
>A0220B0939one.2 FV584Q101A4DG7
TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG
ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC
AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG
TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG
GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA
GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT
CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT
CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT
ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG
...

Thanks, Mike

From florian.mittag at uni-tuebingen.de Thu Aug 6 05:38:38 2009 From: florian.mittag at uni-tuebingen.de (Florian Mittag) Date: Thu, 6 Aug 2009 11:38:38 +0200 Subject: [Bioperl-l] DB2 driver for BioPerl In-Reply-To: <200907151500.21947.florian.mittag@uni-tuebingen.de> References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <200907061808.18651.florian.mittag@uni-tuebingen.de> <200907151500.21947.florian.mittag@uni-tuebingen.de> Message-ID: <200908061138.38809.florian.mittag@uni-tuebingen.de>

Hi! I just noticed that we didn't solve this problem completely.

On Wednesday, 15.
July 2009 15:00, Florian Mittag wrote:
> > Well, it is like this with version 9.5 of DB2 Express-C:
> >
> > SELECT NULL FROM bioentry;
> >
> > yields:
> > SQL0206N "NULL" is not valid in the context where it is used.
> > SQLSTATE=42703 SQLCODE=-206
> >
> > But if I do:
> >
> > SELECT cast(NULL AS VARCHAR(255)) FROM bioentry;
> >
> > [...]
> >
> > It ran fine without the NULL column, but that isn't necessarily a sign of
> > correctness. My problem was that (as stated above) the old version of DB2
> > requires you to cast the NULL value to a data type, which I wasn't able
> > to determine from the code. With the new version, it should work, so I'll
> > have to rerun my tests again and see if the problem is still there.
>
> You convinced me that the NULL column is supposed to be there, so I found
> another workaround around line 1273 in BaseDriver.pm:
>
>     if((! $attr) || (! $entitymap->{$tbl}) ||
>        $dont_select_attrs->{$tbl .".". $attr}) {
>         #push(@attrs, "NULL");
>         push(@attrs, "cast(NULL as VARCHAR(255))");
>     } else {
>
> Since I don't know how to determine the datatype of the column that is set
> to NULL, I simply chose VARCHAR and tested it. And it worked! (BTW: The
> column set to NULL is named "rank" in the case below.)

Although this solution works, it is not the best, because it breaks compatibility with all other database types, e.g., MySQL. Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))" only when the driver is DB2?
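As a stopgap, the DB2-only substitution could key off the DBI driver name, which DBI documents as `$dbh->{Driver}->{Name}` (e.g. `DB2`, `mysql`, `Pg`). This is a minimal sketch, not BioPerl-db code: `null_select_field` is a hypothetical helper, and whether a database handle is in scope at that point in BaseDriver.pm is an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: SQL expression to use for a placeholder NULL column
# in a SELECT list. DB2 rejects an untyped NULL there, so it gets a cast;
# every other driver keeps the plain NULL.
sub null_select_field {
    my ($dbh) = @_;
    # DBI exposes the driver name via $dbh->{Driver}->{Name}
    my $driver = $dbh->{Driver}->{Name};
    my $is_db2 = defined $driver && $driver eq 'DB2';
    return $is_db2 ? 'cast(NULL as VARCHAR(255))' : 'NULL';
}

# The quoted BaseDriver.pm branch would then read something like
# (assuming $dbh is available there):
#     push(@attrs, null_select_field($dbh));

# Plain hashrefs stand in for real DBI handles so the sketch runs without a database.
my %fake = (
    db2   => { Driver => { Name => 'DB2' } },
    mysql => { Driver => { Name => 'mysql' } },
);
print null_select_field($fake{db2}), "\n";    # cast(NULL as VARCHAR(255))
print null_select_field($fake{mysql}), "\n";  # NULL
```

The check keeps MySQL and other back ends on the plain NULL, so only DB2 sees the cast.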
- Florian

From hlapp at gmx.net Thu Aug 6 09:36:08 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 09:36:08 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> Message-ID:

Why is specifying fasta format when your input is not in fasta format not a user error? I agree that the raw format not removing newlines is a bug.

-hilmar

On Aug 6, 2009, at 1:12 AM, Chris Fields wrote:
> Just to confirm: the following is using bioperl-live on my macbook
> pro (perl 5.10.0, 64bit). We need to decide if this is a legit bug
> or a user issue (if it's the former, we can easily add an exception
> indicating lack of a header). Note that 'raw' also fails for the
> raw example below (doesn't appear to remove newlines).
>
> -c
>
> cjfields4:fasta cjfields$ cat raw_v_fasta.pl
> #!/usr/bin/perl -w
>
> use strict;
> use warnings;
> use IO::String;
> use Bio::SeqIO;
> use Test::More qw(no_plan);
>
> my %seq;
>
> $seq{raw} = <<RAW;
> MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
> HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
> TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
> QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
> VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
> RAW
>
> $seq{fasta} = <<FASTA;
> >CATH_RAT
> MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
> HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
> TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
> QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
> VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
> FASTA
>
> my %newdata;
> for my $input (sort keys %seq) {
>     my $fh = IO::String->new($seq{$input});
>     my $seq = Bio::SeqIO->new(-format => 'fasta',
>                               -fh     => $fh)->next_seq;
>     $newdata{$input} = $seq->seq;
> }
> is($newdata{raw}, $newdata{fasta}, 'format');
>
> cjfields4:fasta cjfields$ perl raw_v_fasta.pl
> not ok 1 - format
> # Failed test 'format'
> # at raw_v_fasta.pl line 36.
> # got: 'HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
> # expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNHTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
> 1..1
> # Looks like you failed 1 test of 1.
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From hlapp at gmx.net Thu Aug 6 09:42:06 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 09:42:06 -0400 Subject: [Bioperl-l] DB2 driver for BioPerl In-Reply-To: <200908061138.38809.florian.mittag@uni-tuebingen.de> References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <200907061808.18651.florian.mittag@uni-tuebingen.de> <200907151500.21947.florian.mittag@uni-tuebingen.de> <200908061138.38809.florian.mittag@uni-tuebingen.de> Message-ID: <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net>

On Aug 6, 2009, at 5:38 AM, Florian Mittag wrote:
> Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))"
> only when the driver is DB2?

Not yet, but that's the solution I had in mind, i.e., introducing a method in the Bio::DB::DBI::* (driver-specific) classes that returns whatever NULL as a SELECT field should be represented as. What will be very hard or nearly impossible to do is to cast to the actual type of the column, so if simply using VARCHAR(255) does the trick for DB2 that'd be great.

BTW you did check that simply aliasing the column does not fix the problem for DB2, right? I.e., "SELECT NULL AS col1 FROM bioentry" will throw an error, right?
-hilmar
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================

From florian.mittag at uni-tuebingen.de Thu Aug 6 10:12:21 2009 From: florian.mittag at uni-tuebingen.de (Florian Mittag) Date: Thu, 6 Aug 2009 16:12:21 +0200 Subject: [Bioperl-l] DB2 driver for BioPerl In-Reply-To: <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net> References: <200907021128.21239.florian.mittag@uni-tuebingen.de> <200908061138.38809.florian.mittag@uni-tuebingen.de> <0727DC90-B764-4CBE-B5A4-844941F1A3B4@gmx.net> Message-ID: <200908061612.21852.florian.mittag@uni-tuebingen.de>

On Thursday, 6. August 2009 15:42, Hilmar Lapp wrote:
> On Aug 6, 2009, at 5:38 AM, Florian Mittag wrote:
> > Is there a way to change the "NULL" to "cast(NULL as VARCHAR(255))"
> > only when the driver is DB2?
>
> Not yet, but that's the solution I had in mind, i.e., introducing a
> method in the Bio::DB::DBI::* (driver-specific) classes that returns
> whatever NULL as a SELECT field should be represented as.

Sounds like a good idea!

> What will be
> very hard or nearly impossible to do is to cast to the actual type of
> the column, so if simply using VARCHAR(255) does the trick for DB2
> that'd be great.

Surprisingly, it does. At least, I haven't noticed any problems if the target data type is for example an integer. With all the trouble I have with DB2, I didn't expect this.

> BTW you did check that simply aliasing the column does not fix the
> problem for DB2, right? I.e., "SELECT NULL AS col1 FROM bioentry" will
> throw an error, right?

Yepp:

SELECT term.term_id, term.identifier, term.name, term.definition,
       term.is_obsolete, NULL AS col1, term.ontology_id
FROM term WHERE identifier = ?

[IBM][CLI Driver][DB2/LINUX] SQL0418N A statement contains a use of an untyped parameter marker or a null value that is not valid.
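Hilmar's proposal, a method on each driver-specific class that returns the NULL placeholder, avoids testing the driver name inside BaseDriver.pm. The following is a minimal sketch under that assumption: the package names `Sketch::DBI::Base` and `Sketch::DBI::DB2`, the method `null_select_expr` and the helper `build_select_list` are hypothetical stand-ins for the Bio::DB::DBI::* classes, and the VARCHAR(255) cast is the one Florian reports working.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical default driver class: most databases accept a bare NULL
# as a SELECT field.
package Sketch::DBI::Base;
sub new { my ($class) = @_; return bless {}, $class }
sub null_select_expr { return 'NULL' }

# Hypothetical DB2 subclass: DB2 needs a typed NULL (SQL0206N / SQL0418N
# in this thread), and VARCHAR(255) is the cast reported to work.
package Sketch::DBI::DB2;
our @ISA = ('Sketch::DBI::Base');
sub null_select_expr { return 'CAST(NULL AS VARCHAR(255))' }

package main;

# Build a SELECT list; undef marks a column that must not be selected and is
# replaced by the driver's NULL placeholder so column positions stay fixed
# (a simplification of the BaseDriver.pm condition quoted earlier).
sub build_select_list {
    my ($driver, $tbl, @attrs) = @_;
    return join ', ',
        map { defined $_ ? "$tbl.$_" : $driver->null_select_expr } @attrs;
}

for my $driver (Sketch::DBI::Base->new, Sketch::DBI::DB2->new) {
    print build_select_list($driver, 'term', 'term_id', 'name', undef), "\n";
}
# prints:
# term.term_id, term.name, NULL
# term.term_id, term.name, CAST(NULL AS VARCHAR(255))
```

Because only the DB2 subclass overrides the method, MySQL and the other back ends keep emitting a plain NULL.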
- Florian

From hilgert at cshl.edu Thu Aug 6 11:01:05 2009 From: hilgert at cshl.edu (Hilgert, Uwe) Date: Thu, 6 Aug 2009 11:01:05 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID:

I'm not sure what version we have. Cornel may have installed it a while ago from CVS:

Module id = Bio::Root::Build
    CPAN_USERID  CJFIELDS (Christopher Fields )
    CPAN_VERSION 1.006000
    INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm
    INST_VERSION 1.006900

cpan> m Bio::Root::Version
Module id = Bio::Root::Version
    CPAN_USERID  CJFIELDS (Christopher Fields )
    CPAN_VERSION 1.006000
    INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm
    INST_VERSION 1.006900

cpan> m Bio::SeqIO
Module id = Bio::SeqIO
    CPAN_USERID  CJFIELDS (Christopher Fields )
    CPAN_VERSION 1.006000
    INST_FILE    /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm
    INST_VERSION undef

Cornel still has the checked-out "bioperl-live" directory, and the last changes are from March this year. As for why he used 'Fasta' instead of 'fasta' as the format parameter in Bio::SeqIO: that's what it says in the module's manual. He has now tried 'fasta' instead and sees no change in behavior.

When the format parameter is omitted altogether, fasta-formatted sequence continues to be treated correctly (the header line is removed). However, raw sequence is treated differently: the first line is no longer removed; instead, the program returns the first line only. In the example I am going to forward in my next message, that means 60 amino acids are returned out of a raw sequence of 300 aa. Can't win with raw sequence...

The files may be created on different platforms; we didn't notice any difference between files created on Windows or Linux.
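To see why a header-less file loses its first line when it is read as FASTA, here is a minimal pure-Perl sketch of the chunking Chris describes in this thread (records split on `$/ = "\n>"`, first line of each chunk taken as the header). It is an illustration under that assumption, not Bio::SeqIO::fasta's actual code, and `naive_fasta_records` is a hypothetical name. It also strips carriage returns (Hilmar suspected line endings) and, with `strict => 1`, dies when the input does not start with `>`, along the lines of the header check Chris suggested.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical reader mimicking the FASTA chunking described in the thread.
sub naive_fasta_records {
    my ($text, %opt) = @_;
    $text =~ s/\r//g;    # normalise Windows (CRLF) line endings
    # Optional guard: without a leading '>' the first sequence line
    # would silently become the header.
    die "Input does not start with a FASTA header ('>')\n"
        if $opt{strict} && $text !~ /^>/;
    my @records;
    open my $fh, '<', \$text or die "Cannot read string: $!";
    local $/ = "\n>";    # one chunk per record
    while (my $chunk = <$fh>) {
        chomp $chunk;           # chomp removes the trailing "\n>" separator
        $chunk =~ s/^>//;       # the first record still carries its '>'
        my ($header, @lines) = split /\n/, $chunk;
        push @records, { header => $header, seq => join '', @lines };
    }
    close $fh;
    return @records;
}

my $raw   = "MWTALPLL\nHTFKMGLN\n";
my $fasta = ">CATH_RAT\nMWTALPLL\nHTFKMGLN\n";

my ($r) = naive_fasta_records($raw);
my ($f) = naive_fasta_records($fasta);
print "raw:   header=$r->{header} seq=$r->{seq}\n";  # header=MWTALPLL seq=HTFKMGLN
print "fasta: header=$f->{header} seq=$f->{seq}\n";  # header=CATH_RAT seq=MWTALPLLHTFKMGLN
```

The raw input reproduces the symptom in Chris's test: its first sequence line is consumed as the "header". In BioPerl itself, the remedy discussed in the thread is to declare header-less files as 'raw' rather than 'fasta'.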
Thanks Uwe -----Original Message----- From: Hilmar Lapp [mailto:hlapp at gmx.net] Sent: Wednesday, August 05, 2009 6:54 PM To: Chris Fields Cc: Hilgert, Uwe; BioPerl List Subject: Re: [Bioperl-l] Bio::SeqIO issue I don't think that can be the problem. If anything, providing the format ought to be better in terms of result than not providing it? Uwe - I'd like you to go back to Chris' initial questions that you haven't answered yet: "What version of bioperl are you using, OS, etc? What does your data look like?" I'd add to that, can you show us your full script, or a smaller code snippet that reproduces the problem. I suspect that either something in your script is swallowing the line, or that the line endings in your data file are from a different OS than the one you're running the script on. (Or that you are running a very old version of BioPerl, which is entirely possible if you installed through CPAN.) -hilmar On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > Uwe, > > Please keep replies on the list. > > It's very possible that's the issue; IIRC the fasta parser pulls out > the full sequence in chunks (based on local $/ = "\n>") and splits > the header off as the first line in that chunk. You could probably > try leaving the format out and letting SeqIO guess it, or passing > the file into Bio::Tools::GuessSeqFormat directly, but it's probably > better to go through the files and add a file extension that > corresponds to the format. > > chris > > On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > >> Thanks, Chris. The files have no extension, but we indicate what >> format >> to use, like in the manual: >> >> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >> >> I wonder now whether this could exactly cause the problem: as we are >> telling that input files are in fasta format they are being treated >> as >> such (=remove first line) - regardless of whether they really are >> fasta? 
>> >> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >> Uwe Hilgert, Ph.D. >> Dolan DNA Learning Center >> Cold Spring Harbor Laboratory >> >> C: (516) 857-1693 >> V: (516) 367-5185 >> E: hilgert at cshl.edu >> F: (516) 367-5182 >> W: http://www.dnalc.org >> >> -----Original Message----- >> From: Chris Fields [mailto:cjfields at illinois.edu] >> Sent: Wednesday, August 05, 2009 5:04 PM >> To: Hilgert, Uwe >> Cc: bioperl-l at lists.open-bio.org >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >> >>> Is my impression correct that Bio::SeqIO just assumes that sequences >>> are >>> being submitted in FASTA format? >> >> No. See: >> >> http://www.bioperl.org/wiki/HOWTO:SeqIO >> >> SeqIO tries to guess at the format using the file extension, and if >> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >> possible that the extension is causing the problem, or that >> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced to >> guessing). In any case, it's always advisable to explicitly indicate >> the format when possible. >> >> Relevant lines: >> >> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i; >> ... >> return 'raw' if /\.(txt)$/i; >> >>> In our experience, implementing >>> Bio::SeqIO led to the first line of files being cut off, >>> regardless of >>> whether the files were indeed fasta files or files that only >>> contained >>> sequence. >> >> Files that only contain sequence are 'raw'. Ones in FASTA are >> 'fasta'. >> >>> Which, in the latter, led to sequence submissions that had the >>> first line of nucleotides removed. Has anyone tried to write a fix >>> for >>> this? >> >> This sounds like a bug, but we have very little to go on beyond your >> description. What version of bioperl are you using, OS, etc? What >> does your data look like? File extension?
>> >> chris >> >>> Thanks, >>> >>> Uwe >>> >>> >>> >>> >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> >>> Uwe Hilgert, Ph.D. >>> >>> Dolan DNA Learning Center >>> >>> Cold Spring Harbor Laboratory >>> >>> >>> >>> V: (516) 367-5185 >>> >>> E: hilgert at cshl.edu >>> >>> F: (516) 367-5182 >>> >>> W: http://www.dnalc.org >>> >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : ===========================================================

From hilgert at cshl.edu Thu Aug 6 11:03:53 2009 From: hilgert at cshl.edu (Hilgert, Uwe) Date: Thu, 6 Aug 2009 11:03:53 -0400 Subject: [Bioperl-l] FW: Bio::SeqIO issue Message-ID:

If you don't specify any format only the first line gets returned:

not ok 1 - format
# Failed test 'format'
# at test/test_fasta.pl line 35.
# got: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN'
# expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNHTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
1..1
# Looks like you failed 1 test of 1.

-----Original Message-----
From: Hilgert, Uwe
Sent: Thursday, August 06, 2009 9:12 AM
To: Ghiban, Cornel
Subject: FW: [Bioperl-l] Bio::SeqIO issue

-----Original Message-----
From: Chris Fields [mailto:cjfields at illinois.edu]
Sent: Thursday, August 06, 2009 1:12 AM
To: Mark A.
Jensen
Cc: Hilgert, Uwe; BioPerl List; Hilmar Lapp
Subject: Re: [Bioperl-l] Bio::SeqIO issue

Just to confirm: the following is using bioperl-live on my macbook pro (perl 5.10.0, 64bit). We need to decide if this is a legit bug or a user issue (if it's the former, we can easily add an exception indicating lack of a header). Note that 'raw' also fails for the raw example below (doesn't appear to remove newlines).

-c

cjfields4:fasta cjfields$ cat raw_v_fasta.pl
#!/usr/bin/perl -w

use strict;
use warnings;
use IO::String;
use Bio::SeqIO;
use Test::More qw(no_plan);

my %seq;

$seq{raw} = <<RAW;
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
RAW

$seq{fasta} = <<FASTA;
>CATH_RAT
MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRN
HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCW
TFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNG
QCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHA
VLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV
FASTA

my %newdata;
for my $input (sort keys %seq) {
    my $fh = IO::String->new($seq{$input});
    my $seq = Bio::SeqIO->new(-format => 'fasta',
                              -fh     => $fh)->next_seq;
    $newdata{$input} = $seq->seq;
}
is($newdata{raw}, $newdata{fasta}, 'format');

cjfields4:fasta cjfields$ perl raw_v_fasta.pl
not ok 1 - format
# Failed test 'format'
# at raw_v_fasta.pl line 36.
# got: 'HTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
# expected: 'MWTALPLLCAGAWLLSAGATAELTVNAIEKFHFTSWMKQHQKTYSSREYSHRLQVFANNWRKIQAHNQRNHTFKMGLNQFSDMSFAEIKHKYLWSEPQNCSATKSNYLRGTGPYPSSMDWRKKGNVVSPVKNQGACGSCWTFSTTGALESAVAIASGKMMTLAEQQLVDCAQNFNNHGCQGGLPSQAFEYILYNKGIMGEDSYPYIGKNGQCKFNPEKAVAFVKNVVNITLNDEAAMVEAVALYNPVSFAFEVTEDFMMYKSGVYSSNSCHKTPDKVNHAVLAVGYGEQNGLLYWIVKNSWGSNWGNNGYFLIERGKNMCGLAACASYPIPQV'
1..1
# Looks like you failed 1 test of 1.

From hlapp at gmx.net Thu Aug 6 11:18:06 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 11:18:06 -0400 Subject:
[Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID: <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> Uwe - could you send an actual data file (as an attachment) that reproduces the problem, or is that not possible? -hilmar On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > I'm not sure what version we have. Cornel may have installed it a > while > ago from CVS: > > Module id = Bio::Root::Build > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm > INST_VERSION 1.006900 > cpan> m Bio::Root::Version > Module id = Bio::Root::Version > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm > INST_VERSION 1.006900 > cpan> m Bio::SeqIO > Module id = Bio::SeqIO > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm > INST_VERSION undef > > Cornel still has the checked-out "bioperl-live" directory and the last > changes are from March this year. > > As per why he used "Fasta" instead of 'fasta" as the format > parameter in > Bio::SeqIO, it's because that what it says in the modules manual. He > now > tried 'fasta' instead and see no changes in behavior. Omitting the > format parameter altogether, fasta-formatted sequence continues to be > treated correctly, the first line being removed. However, raw sequence > is being treated differently in that the first line is not being > removed > any more. Instead, the program returns the first line only. Which, in > the example I am going to forward in my next message, will return 60 > amino acids out of raw sequence of 300 aa. Can't win with raw > sequence... 
> > > The files may be created on different platforms, we didn't notice any > difference between using files created on Windows or Linux. > > Thanks > Uwe > > > > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Wednesday, August 05, 2009 6:54 PM > To: Chris Fields > Cc: Hilgert, Uwe; BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > I don't think that can be the problem. If anything, providing the > format ought to be better in terms of result than not providing it? > > Uwe - I'd like you to go back to Chris' initial questions that you > haven't answered yet: "What version of bioperl are you using, OS, > etc? What does your data look like?" I'd add to that, can you show us > your full script, or a smaller code snippet that reproduces the > problem. > > I suspect that either something in your script is swallowing the line, > or that the line endings in your data file are from a different OS > than the one you're running the script on. (Or that you are running a > very old version of BioPerl, which is entirely possible if you > installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls out >> the full sequence in chunks (based on local $/ = "\n>") and splits >> the header off as the first line in that chunk. You could probably >> try leaving the format out and letting SeqIO guess it, or passing >> the file into Bio::Tools::GuessSeqFormat directly, but it's probably >> better to go through the files and add a file extension that >> corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. 
The files have no extension, but we indicate what >>> format >>> to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being treated >>> as >>> such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>> Uwe Hilgert, Ph.D. >>> Dolan DNA Learning Center >>> Cold Spring Harbor Laboratory >>> >>> C: (516) 857-1693 >>> V: (516) 367-5185 >>> E: hilgert at cshl.edu >>> F: (516) 367-5182 >>> W: http://www.dnalc.org >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, August 05, 2009 5:04 PM >>> To: Hilgert, Uwe >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>> >>>> Is my impression correct that Bio::SeqIO just assumes that >>>> sequences >>>> are >>>> being submitted in FASTA format? >>> >>> No. See: >>> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> >>> SeqIO tries to guess at the format using the file extension, and if >>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>> possible that the extension is causing the problem, or that >>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>> to >>> guessing). In any case, it's always advisable to explicitly >>> indicate >>> the format when possible. >>> >>> Relevant lines: >>> >>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>> i; >>> ... >>> return 'raw' if /\.(txt)$/i; >>> >>>> In our experience, implementing >>>> Bio::SeqIO led to the first line of files being cut off, >>>> regardless of >>>> whether the files were indeed fasta files or files that only >>>> contained >>>> sequence. >>> >>> Files that only contain sequence are 'raw'. 
Ones in FASTA are >>> 'fasta'. >>> >>>> Which, in the latter, led to sequence submissions that had the >>>> first line of nucleotides removed. Has anyone tried to write a fix >>>> for >>>> this? >>> >>> This sounds like a bug, but we have very little to go on beyond your >>> description. What version of bioperl are you using, OS, etc? What >>> does your data look like? File extension? >>> >>> chris >>> >>>> Thanks, >>>> >>>> Uwe >>>> >>>> >>>> >>>> >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> >>>> Uwe Hilgert, Ph.D. >>>> >>>> Dolan DNA Learning Center >>>> >>>> Cold Spring Harbor Laboratory >>>> >>>> >>>> >>>> V: (516) 367-5185 >>>> >>>> E: hilgert at cshl.edu >>>> >>>> F: (516) 367-5182 >>>> >>>> W: http://www.dnalc.org >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From bosborne11 at verizon.net Thu Aug 6 11:20:49 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 06 Aug 2009 11:20:49 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> Message-ID: <2F73C3DC-D943-4EC3-834A-EA2984FDDB5D@verizon.net> Uwe et al, Yes, this argument works irrespective of case: The format name 
is case-insensitive: 'FASTA', 'Fasta' and 'fasta' are all valid. From Bio::SeqIO. Brian O. On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > As per why he used "Fasta" instead of 'fasta" as the format > parameter in > Bio::SeqIO, it's because that what it says in the modules manual. He > now > tried 'fasta' instead and see no changes in behavior. Omitting the From cjfields at illinois.edu Thu Aug 6 12:30:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 11:30:01 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu><5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> Message-ID: <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> On Aug 6, 2009, at 8:36 AM, Hilmar Lapp wrote: > Why is specifying fasta format when your input is not in fast format > not a user error? Agreed. My point is should we worry about adding an exception (which may be a little more user-friendly). Right now the bad stuff happens silently. > I agree with the not removing newlines in raw format being a bug. > > -hilmar Acc. to the SeqIO::raw docs, this is a little trickier. The documented behavior explicitly indicates that each line (sans non- whitespace) is assumed to be a separate sequence, so changing that behavior breaks API. 
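To make the two readings of 'raw' concrete, here is a minimal plain-Perl sketch that needs no BioPerl. It contrasts the documented Bio::SeqIO::raw behavior described above, where each non-blank line becomes its own sequence, with a whole-input ("gulp") reading in which line breaks are simply dropped. The helper names read_raw_per_line and read_raw_whole are hypothetical illustrations, not BioPerl API.

```perl
use strict;
use warnings;

# Hypothetical helper: documented 'raw' semantics, where each
# non-blank line is treated as a separate sequence.
sub read_raw_per_line {
    my ($text) = @_;
    my @seqs;
    for my $line (split /\r?\n/, $text) {
        (my $s = $line) =~ s/\s+//g;    # drop all whitespace on the line
        push @seqs, $s if length $s;    # skip blank lines
    }
    return @seqs;
}

# Hypothetical helper: whole-input ("gulp") semantics, where the entire
# input is one sequence and line breaks are removed.
sub read_raw_whole {
    my ($text) = @_;
    (my $s = $text) =~ s/\s+//g;
    return $s;
}

my $raw      = "MWTALPLLCA\nHTFKMGLNQF\nTFSTTGALES\n";
my @per_line = read_raw_per_line($raw);
my $whole    = read_raw_whole($raw);
print scalar(@per_line), " sequences in per-line mode; whole-input sequence: $whole\n";
```

With a real filehandle, the whole-input reading amounts to slurping with `local $/;` ($/ set to undef) and then stripping whitespace, which is roughly the effect of setting $/ locally to undef inside the parser.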
I suppose we can have $/ set locally to a cached $/ default value or undef:

    # assumes entire file is read in
    my $io = Bio::SeqIO->new(-format => 'raw', -gulp => 1);

chris

From hlapp at gmx.net Thu Aug 6 12:42:00 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Thu, 6 Aug 2009 12:42:00 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> Message-ID: <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> On Aug 6, 2009, at 12:30 PM, Chris Fields wrote: > Agreed. My point is should we worry about adding an exception > (which may be a little more user-friendly). Right now the bad stuff > happens silently. Great point. We don't want silent failures, do we. > >> I agree with the not removing newlines in raw format being a bug. >> >> -hilmar > > Acc. to the SeqIO::raw docs, this is a little trickier. The > documented behavior explicitly indicates that each line (sans non-whitespace) > is assumed to be a separate sequence, so changing that > behavior breaks API. Ah - true indeed. I like the optional argument feature - that way it's easy for the user to choose.
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Thu Aug 6 12:49:53 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 11:49:53 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: Cornel, I'm failing to see how adding '>' would solve the problem. This is a simple validation issue: should we throw an exception on bad input (no '>'), or just argue GIGO based on user error (the assumption that the SeqIO parser will read raw sequence correctly when set to 'fasta' is wrong)? I think, in this circumstance, the former applies. It is easy to add, and the use of an exception in this case is violently user-friendly, e.g. it will stop cold and immediately point out the problem. Otherwise data is (silently) being modified, which is always a bad thing. chris On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > Hi, > > It doesn't matter what sequence we use. As Chris Fields's showed in > his test, not having > ">" as the 1st character on the first line is the problem. > We always assumed the sequence is in FASTA format and this seems to > be wrong. > > I think, the solution to our problem is to check whether the ">" > symbol is present or not. > If not present then it will be added. 
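The validation question can be illustrated with a simplified, hypothetical FASTA reader in plain Perl. It reads records in chunks ending in "\n>" and takes the first line of each chunk as the header, mirroring the chunking Chris described earlier in this thread. Given header-less raw sequence, it silently turns the first sequence line into the header, which is the data loss reported here. The optional `strict` flag (a hypothetical name) makes it die instead when the input does not start with '>', along the lines Chris proposes. This is a sketch under those assumptions, not the actual Bio::SeqIO::fasta code.

```perl
use strict;
use warnings;

# Hypothetical, simplified FASTA reader (NOT Bio::SeqIO::fasta): records are
# read in chunks ending in "\n>", and the first line of each chunk is the header.
sub parse_fasta_string {
    my ($text, %opt) = @_;

    # Optional validation (hypothetical 'strict' flag): refuse input whose
    # first non-blank character is not '>' instead of silently misreading it.
    if ($opt{strict} && $text !~ /\A\s*>/) {
        die "Input does not look like FASTA (no leading '>' header line)\n";
    }

    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "\n>";                       # record separator between entries
    my @records;
    while (defined(my $chunk = <$fh>)) {
        chomp $chunk;                       # strip the trailing "\n>" separator
        $chunk =~ s/\A>//;                  # strip the '>' of the first record
        my ($header, @lines) = split /\r?\n/, $chunk;
        next unless defined $header;        # skip empty chunks
        (my $seq = join '', @lines) =~ s/\s+//g;
        push @records, { header => $header, seq => $seq };
    }
    close $fh;
    return @records;
}

# Raw (header-less) input parsed as FASTA: the first sequence line ends up
# in the header, and no error is raised.
my $raw = "MWTALPLLCA\nHTFKMGLNQF\nTFSTTGALES\n";
my ($rec) = parse_fasta_string($raw);
print "header: $rec->{header}\nseq:    $rec->{seq}\n";

# With validation enabled, the same input is rejected up front.
eval { parse_fasta_string($raw, strict => 1); 1 }
    or print "strict mode: $@";
```

On the alternative of adding the '>' automatically: prepending '>' directly to the first sequence line would still make that line the header. To keep the full sequence, a separate header line would be needed, for example ">seq1\n" followed by the original text.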
> > Thank you, > Cornel Ghiban > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Thursday, August 06, 2009 11:18 AM > To: Hilgert, Uwe > Cc: Chris Fields; BioPerl List; Ghiban, Cornel > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Uwe - could you send an actual data file (as an attachment) that > reproduces the problem, or is that not possible? > > -hilmar > > On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > >> I'm not sure what version we have. Cornel may have installed it a >> while ago from CVS: >> >> Module id = Bio::Root::Build >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm >> INST_VERSION 1.006900 >> cpan> m Bio::Root::Version >> Module id = Bio::Root::Version >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm >> INST_VERSION 1.006900 >> cpan> m Bio::SeqIO >> Module id = Bio::SeqIO >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm >> INST_VERSION undef >> >> Cornel still has the checked-out "bioperl-live" directory and the >> last >> changes are from March this year. >> >> As per why he used "Fasta" instead of 'fasta" as the format parameter >> in Bio::SeqIO, it's because that what it says in the modules manual. >> He now tried 'fasta' instead and see no changes in behavior. Omitting >> the format parameter altogether, fasta-formatted sequence continues >> to >> be treated correctly, the first line being removed. However, raw >> sequence is being treated differently in that the first line is not >> being removed any more. Instead, the program returns the first line >> only. Which, in the example I am going to forward in my next message, >> will return 60 amino acids out of raw sequence of 300 aa. Can't win >> with raw sequence... 
>> >> >> The files may be created on different platforms, we didn't notice any >> difference between using files created on Windows or Linux. >> >> Thanks >> Uwe >> >> >> >> >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Wednesday, August 05, 2009 6:54 PM >> To: Chris Fields >> Cc: Hilgert, Uwe; BioPerl List >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? >> What does your data look like?" I'd add to that, can you show us your >> full script, or a smaller code snippet that reproduces the problem. >> >> I suspect that either something in your script is swallowing the >> line, >> or that the line endings in your data file are from a different OS >> than the one you're running the script on. (Or that you are running a >> very old version of BioPerl, which is entirely possible if you >> installed through CPAN.) >> >> -hilmar >> >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >> >>> Uwe, >>> >>> Please keep replies on the list. >>> >>> It's very possible that's the issue; IIRC the fasta parser pulls out >>> the full sequence in chunks (based on local $/ = "\n>") and splits >>> the header off as the first line in that chunk. You could probably >>> try leaving the format out and letting SeqIO guess it, or passing >>> the >>> file into Bio::Tools::GuessSeqFormat directly, but it's probably >>> better to go through the files and add a file extension that >>> corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. 
The files have no extension, but we indicate what >>>> format to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being treated >>>> as such (=remove first line) - regardless of whether they really >>>> are >>>> fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe >>>> Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences are being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>>> to guessing). In any case, it's always advisable to explicitly >>>> indicate the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>>> i; >>>> ... >>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless >>>>> of whether the files were indeed fasta files or files that only >>>>> contained sequence. 
>>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a fix >>>>> for this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. >>>>> >>>>> Dolan DNA Learning Center >>>>> >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> >>>>> >>>>> V: (516) 367-5185 >>>>> >>>>> E: hilgert at cshl.edu >>>>> >>>>> F: (516) 367-5182 >>>>> >>>>> W: http://www.dnalc.org >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From biopython at maubp.freeserve.co.uk Thu Aug 6 12:51:34 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 6 Aug 2009 17:51:34 +0100 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> 
<25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> Message-ID: <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com> On Thu, Aug 6, 2009 at 5:42 PM, Hilmar Lapp wrote: > >>> I agree with the not removing newlines in raw format being a bug. >>> >>> -hilmar >> >> Acc. to the SeqIO::raw docs, this is a little trickier. The documented >> behavior explicitly indicates that each line (sans non-whitespace) is >> assumed to be a separate sequence, so changing that behavior breaks API. > > Ah - true indeed. I like the optional argument feature - that way it's easy > for the user to choose. > For reference, "raw" as a format in EMBOSS seems to give just one sequence regardless of any line breaks. Adding an optional argument might be clearest, but have you considered using the new BioPerl SeqIO variant argument to have two forms of raw (the original variant giving one sequence per line, and a new variant where you just get one sequence regardless of any line breaks)?
Peter From cjfields at illinois.edu Thu Aug 6 12:58:07 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 11:58:07 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <8FAB8756AD944534B49F2C4356CB6D92@NewLife> <79AEB387-76AC-4A95-BA75-F64D45F9812D@illinois.edu> <72A9E556-96C1-40DA-A799-47956396372B@illinois.edu> <12BFAC40-19C5-4F34-B2F7-32739AD73BEC@gmx.net> <320fb6e00908060951n40aa750cu3df5a51d092f5398@mail.gmail.com> Message-ID: On Aug 6, 2009, at 11:51 AM, Peter wrote: > On Thu, Aug 6, 2009 at 5:42 PM, Hilmar Lapp wrote: >> >>>> I agree with the not removing newlines in raw format being a bug. >>>> >>>> -hilmar >>> >>> Acc. to the SeqIO::raw docs, this is a little trickier. The >>> documented >>> behavior explicitly indicates that each line (sans non-whitespace) >>> is >>> assumed to be a separate sequence, so changing that behavior >>> breaks API. >> >> Ah - true indeed. I like the optional argument feature - that way >> it's easy >> for the user to choose. >> > > For reference, "raw" as a format in EMBOSS seems to give just one > sequence regardless of any line breaks. Yes, and that's the behavior I would expect, actually. > Adding an optional argument might be clearest, but have you considered > using the new BioPerl SeqIO variant argument to have two forms of raw > (the original variant giving one sequence per line, and a new variant > where you just get one sequence regardless of any line breaks)? > > Peter That's a good point. 
We'd have to keep 'raw' as the prior behavior, but 'raw-complete' could be used for such a circumstance ('raw-gulp' sounds just wrong ;) chris

From rmb32 at cornell.edu Thu Aug 6 13:14:12 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 06 Aug 2009 10:14:12 -0700 Subject: [Bioperl-l] tigrxml parsing Message-ID: <4A7B0F64.9070205@cornell.edu>

Hi all,

Recently in #bioperl somebody came by trying to use Bio::SeqIO::tigrxml.pm to parse the medicago genome annotations at

http://www.medicago.org/genome/downloads/Mt2/MT2.0_medicago_chrX_20080103_NoOverlap.xml.tar.gz

svn HEAD tigrxml.pm was not at all happy with these files, eventually dying with

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: start is undefined
STACK: Error::throw
STACK: Bio::Root::Root::throw Bio/Root/Root.pm:368
STACK: Bio::RangeI::contains Bio/RangeI.pm:255
STACK: Bio::SeqFeature::Generic::add_SeqFeature Bio/SeqFeature/Generic.pm:783
STACK: Bio::SeqIO::tigrxml::start_element Bio/SeqIO/tigrxml.pm:206
STACK: try{} block /usr/share/perl5/XML/SAX/Base.pm:292
STACK: XML::SAX::Base::start_element /usr/share/perl5/XML/SAX/Base.pm:266
STACK: XML::SAX::Expat::_handle_start /usr/share/perl5/XML/SAX/Expat.pm:225
STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm:469
STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187
STACK: XML::SAX::Expat::_parse_bytestream /usr/share/perl5/XML/SAX/Expat.pm:45
STACK: XML::SAX::Base::parse /usr/share/perl5/XML/SAX/Base.pm:2602
STACK: XML::SAX::Base::parse_file /usr/share/perl5/XML/SAX/Base.pm:2631
STACK: Bio::SeqIO::tigrxml::next_seq Bio/SeqIO/tigrxml.pm:116
STACK: /crypt/rob/test2.pl:10
-----------------------------------------------------------

Looking at the medicago XML and comparing it to the bioperl-live/t/data/test.tigrxml, the two look VERY different in structure. Lots of things that are attrs in test.tigrxml seem to be elements in the medicago XML, for example.
So I guess the question is: is the medicago TIGR XML malformed? Can tigrxml.pm be expected to parse it? What, if anything, should be done about this? Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu

From hilgert at cshl.edu Thu Aug 6 15:36:36 2009 From: hilgert at cshl.edu (Hilgert, Uwe) Date: Thu, 6 Aug 2009 15:36:36 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID:

Hmmm, I fail to see how supplying raw sequence could be called "bad" input or a "problem". In our case, for example, not every user is a bioinformatics expert, and Cornel was suggesting that we account for that instead of trying to "train" the user to adhere to requirements that have little to do with what s/he is trying to accomplish. I don't really see data being modified; rather, the data format is being adapted to the needs of the software, which I would argue is something the software should be able to take care of. Uwe

-----Original Message----- From: Chris Fields [mailto:cjfields at illinois.edu] Sent: Thursday, August 06, 2009 12:50 PM To: Ghiban, Cornel Cc: Hilmar Lapp; Hilgert, Uwe; BioPerl List Subject: Re: [Bioperl-l] Bio::SeqIO issue Cornel, I'm failing to see how adding '>' would solve the problem. This is a simple validation issue: should we throw an exception on bad input (no '>'), or just argue GIGO based on user error (the assumption that the SeqIO parser will read raw sequence correctly when set to 'fasta' is wrong)? I think, in this circumstance, the former applies.
It is easy to add, and the use of an exception in this case is violently user-friendly, e.g. it will stop cold and immediately point out the problem. Otherwise data is (silently) being modified, which is always a bad thing. chris On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > Hi, > > It doesn't matter what sequence we use. As Chris Fields's showed in > his test, not having > ">" as the 1st character on the first line is the problem. > We always assumed the sequence is in FASTA format and this seems to > be wrong. > > I think, the solution to our problem is to check whether the ">" > symbol is present or not. > If not present then it will be added. > > Thank you, > Cornel Ghiban > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Thursday, August 06, 2009 11:18 AM > To: Hilgert, Uwe > Cc: Chris Fields; BioPerl List; Ghiban, Cornel > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Uwe - could you send an actual data file (as an attachment) that > reproduces the problem, or is that not possible? > > -hilmar > > On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > >> I'm not sure what version we have. Cornel may have installed it a >> while ago from CVS: >> >> Module id = Bio::Root::Build >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm >> INST_VERSION 1.006900 >> cpan> m Bio::Root::Version >> Module id = Bio::Root::Version >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm >> INST_VERSION 1.006900 >> cpan> m Bio::SeqIO >> Module id = Bio::SeqIO >> CPAN_USERID CJFIELDS (Christopher Fields ) >> CPAN_VERSION 1.006000 >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm >> INST_VERSION undef >> >> Cornel still has the checked-out "bioperl-live" directory and the >> last >> changes are from March this year. 
>> >> As per why he used "Fasta" instead of 'fasta" as the format parameter >> in Bio::SeqIO, it's because that what it says in the modules manual. >> He now tried 'fasta' instead and see no changes in behavior. Omitting >> the format parameter altogether, fasta-formatted sequence continues >> to >> be treated correctly, the first line being removed. However, raw >> sequence is being treated differently in that the first line is not >> being removed any more. Instead, the program returns the first line >> only. Which, in the example I am going to forward in my next message, >> will return 60 amino acids out of raw sequence of 300 aa. Can't win >> with raw sequence... >> >> >> The files may be created on different platforms, we didn't notice any >> difference between using files created on Windows or Linux. >> >> Thanks >> Uwe >> >> >> >> >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Wednesday, August 05, 2009 6:54 PM >> To: Chris Fields >> Cc: Hilgert, Uwe; BioPerl List >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> I don't think that can be the problem. If anything, providing the >> format ought to be better in terms of result than not providing it? >> >> Uwe - I'd like you to go back to Chris' initial questions that you >> haven't answered yet: "What version of bioperl are you using, OS, >> etc? >> What does your data look like?" I'd add to that, can you show us your >> full script, or a smaller code snippet that reproduces the problem. >> >> I suspect that either something in your script is swallowing the >> line, >> or that the line endings in your data file are from a different OS >> than the one you're running the script on. (Or that you are running a >> very old version of BioPerl, which is entirely possible if you >> installed through CPAN.) >> >> -hilmar >> >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >> >>> Uwe, >>> >>> Please keep replies on the list. 
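Hilmar's line-ending hypothesis can be checked before any parsing happens. The two helpers below are hypothetical and are not BioPerl functions. One reports which newline convention a block of text uses. The other rewrites CRLF and bare CR endings as LF, so the text that reaches a parser has a consistent form.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): name the line-ending
# convention found in a block of text.
sub line_ending_style {
    my ($text) = @_;
    return 'CRLF (Windows)'   if $text =~ /\r\n/;
    return 'CR (classic Mac)' if $text =~ /\r/;
    return 'LF (Unix)'        if $text =~ /\n/;
    return 'none';
}

# Hypothetical helper: rewrite CRLF and bare CR line endings as LF.
sub normalize_newlines {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;    # matches "\r\n" or a lone "\r"
    return $text;
}

my $windows_text = ">seq1\r\nACGT\r\nGGCC\r\n";
print line_ending_style($windows_text), "\n";    # prints "CRLF (Windows)"
my $clean = normalize_newlines($windows_text);   # ">seq1\nACGT\nGGCC\n"
```

If the uploaded files turn out to be CRLF while the script runs on Linux, normalizing first removes one variable from the debugging.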
>>> >>> It's very possible that's the issue; IIRC the fasta parser pulls out >>> the full sequence in chunks (based on local $/ = "\n>") and splits >>> the header off as the first line in that chunk. You could probably >>> try leaving the format out and letting SeqIO guess it, or passing >>> the >>> file into Bio::Tools::GuessSeqFormat directly, but it's probably >>> better to go through the files and add a file extension that >>> corresponds to the format. >>> >>> chris >>> >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>> >>>> Thanks, Chris. The files have no extension, but we indicate what >>>> format to use, like in the manual: >>>> >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>> >>>> I wonder now whether this could exactly cause the problem: as we >>>> are >>>> telling that input files are in fasta format they are being treated >>>> as such (=remove first line) - regardless of whether they really >>>> are >>>> fasta? >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe >>>> Hilgert, Ph.D. >>>> Dolan DNA Learning Center >>>> Cold Spring Harbor Laboratory >>>> >>>> C: (516) 857-1693 >>>> V: (516) 367-5185 >>>> E: hilgert at cshl.edu >>>> F: (516) 367-5182 >>>> W: http://www.dnalc.org >>>> >>>> -----Original Message----- >>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>> To: Hilgert, Uwe >>>> Cc: bioperl-l at lists.open-bio.org >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>> >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>> >>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>> sequences are being submitted in FASTA format? >>>> >>>> No. See: >>>> >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>> >>>> SeqIO tries to guess at the format using the file extension, and if >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. 
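Chris's description of the fasta parser can be made concrete. The sketch below is a simplified, hypothetical reader in core Perl and is not the actual Bio::SeqIO::fasta code. It reads records with `local $/ = "\n>"` and treats the first line of each record as the description line. That is enough to reproduce the symptom Uwe reports: a file of raw sequence loses its first line.

```perl
use strict;
use warnings;

# Simplified sketch of a record-based FASTA reader (NOT the real
# Bio::SeqIO::fasta implementation): each read stops after "\n>", and the
# first line of every record is treated as the description line.
sub naive_fasta_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "\n>";
    my @records;
    while (my $chunk = <$fh>) {
        chomp $chunk;                  # drops the trailing "\n>" separator
        $chunk =~ s/^>//;              # only the first record still has its '>'
        my ($header, $seq) = split /\n/, $chunk, 2;
        $seq = '' unless defined $seq;
        $seq =~ s/\s+//g;              # join the wrapped sequence lines
        push @records, { header => $header, seq => $seq };
    }
    close $fh;
    return @records;
}

# Raw sequence with no '>' line: the first line is swallowed as a "header".
my ($raw) = naive_fasta_records("ACGTACGT\nGGGGCCCC\nTTTT\n");
print "header: $raw->{header}\n";    # prints "header: ACGTACGT"
print "seq:    $raw->{seq}\n";       # prints "seq:    GGGGCCCCTTTT"
```

With real FASTA input the same logic yields one record per '>' line, which is why the bug only shows up with raw sequence.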
It's >>>> possible that the extension is causing the problem, or that >>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>>> to guessing). In any case, it's always advisable to explicitly >>>> indicate the format when possible. >>>> >>>> Relevant lines: >>>> >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>>> i; >>>> ... >>>> return 'raw' if /\.(txt)$/i; >>>> >>>>> In our experience, implementing >>>>> Bio::SeqIO led to the first line of files being cut off, >>>>> regardless >>>>> of whether the files were indeed fasta files or files that only >>>>> contained sequence. >>>> >>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>> 'fasta'. >>>> >>>>> Which, in the latter, led to sequence submissions that had the >>>>> first line of nucleotides removed. Has anyone tried to write a fix >>>>> for this? >>>> >>>> This sounds like a bug, but we have very little to go on beyond >>>> your >>>> description. What version of bioperl are you using, OS, etc? What >>>> does your data look like? File extension? >>>> >>>> chris >>>> >>>>> Thanks, >>>>> >>>>> Uwe >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> >>>>> Uwe Hilgert, Ph.D. 
>>>>> >>>>> Dolan DNA Learning Center >>>>> >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> >>>>> >>>>> V: (516) 367-5185 >>>>> >>>>> E: hilgert at cshl.edu >>>>> >>>>> F: (516) 367-5182 >>>>> >>>>> W: http://www.dnalc.org >>>>> >>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> Bioperl-l mailing list >>>>> Bioperl-l at lists.open-bio.org >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > From cjfields at illinois.edu Thu Aug 6 16:09:22 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 15:09:22 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: <6729F9CC-ACF9-4BC4-9905-7EA24C1DCA61@illinois.edu> If one supplies raw sequence (no descriptor) to a FASTA parser (requires a descriptor), then it is bad input. One can't reasonably expect the parser to work correctly under those circumstance. Garbage in, garbage out. The simplest and (IMHO) best solution under such circumstances is for the parser to die meaningfully ("Sequence is not FASTA format; '>' descriptor line is missing" or similar). 
Tacking a '>' onto bad data doesn't make it magically work, it's just bad data with a '>' appended. To take this one step further, what if this were genbank data? Or XML? A well-formed exception, though initially inconvenient to the user, will indicate the problem right away. Silently trying to fix the problem by appending '>' to bad input data wouldn't work, and the resulting failure downstream (likely from validate_seq) would obscure the real problem, being the user is using the wrong format parser. chris On Aug 6, 2009, at 2:36 PM, Hilgert, Uwe wrote: > Hmmm, I fail to see how supplying raw sequence could be a called "bad" > input or a "problem". In our case, for example, not every user is a > bioinformatics expert and Cornel was suggesting to account for that > instead of trying to "train" the user to adhere to requirements that > have not much to do with what s/he tries to accomplish. I don't really > see data being modified, rather that the data format is being > adopted to > the needs of the software; which I would argue should be something the > software is being able to take care of. > > Uwe > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Thursday, August 06, 2009 12:50 PM > To: Ghiban, Cornel > Cc: Hilmar Lapp; Hilgert, Uwe; BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Cornel, > > I'm failing to see how adding '>' would solve the problem. > > This is a simple validation issue: should we throw an exception on bad > input (no '>'), or just argue GIGO based on user error (the assumption > that the SeqIO parser will read raw sequence correctly when set to > 'fasta' is wrong)? > > I think, in this circumstance, the former applies. It is easy to add, > and the use of an exception in this case is violently user-friendly, > e.g. it will stop cold and immediately point out the problem. > Otherwise data is (silently) being modified, which is always a bad > thing. 
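Chris's proposal is to fail early with a clear message when the '>' descriptor line is missing. It can be sketched as a pre-flight check that runs before the text reaches a FASTA parser. The helper `assert_fasta` is hypothetical. Inside BioPerl itself, the equivalent would presumably be raised through Bio::Root::Root's `throw`. This sketch only inspects the first non-blank line and reuses the error wording suggested in the thread.

```perl
use strict;
use warnings;

# Hypothetical pre-flight check (not part of BioPerl): die with a clear
# message unless the first non-blank line starts with '>'.
sub assert_fasta {
    my ($text) = @_;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^\s*$/;    # ignore leading blank lines
        return 1 if $line =~ /^>/;   # a FASTA descriptor line comes first
        die "Sequence is not FASTA format; '>' descriptor line is missing\n";
    }
    die "Input is empty\n";
}

for my $input (">seq1 test\nACGT\n", "ACGTACGT\nGGCC\n") {
    if (eval { assert_fasta($input) }) {
        print "accepted\n";
    }
    else {
        print "rejected: $@";    # $@ already ends in a newline
    }
}
```

Wrapping the call in `eval`, as in the loop, lets a front end turn the exception into a message for the user instead of passing the input on to the parser.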
> > chris > > On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > >> Hi, >> >> It doesn't matter what sequence we use. As Chris Fields's showed in >> his test, not having >> ">" as the 1st character on the first line is the problem. >> We always assumed the sequence is in FASTA format and this seems to >> be wrong. >> >> I think, the solution to our problem is to check whether the ">" >> symbol is present or not. >> If not present then it will be added. >> >> Thank you, >> Cornel Ghiban >> >> -----Original Message----- >> From: Hilmar Lapp [mailto:hlapp at gmx.net] >> Sent: Thursday, August 06, 2009 11:18 AM >> To: Hilgert, Uwe >> Cc: Chris Fields; BioPerl List; Ghiban, Cornel >> Subject: Re: [Bioperl-l] Bio::SeqIO issue >> >> Uwe - could you send an actual data file (as an attachment) that >> reproduces the problem, or is that not possible? >> >> -hilmar >> >> On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: >> >>> I'm not sure what version we have. Cornel may have installed it a >>> while ago from CVS: >>> >>> Module id = Bio::Root::Build >>> CPAN_USERID CJFIELDS (Christopher Fields ) >>> CPAN_VERSION 1.006000 >>> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm >>> INST_VERSION 1.006900 >>> cpan> m Bio::Root::Version >>> Module id = Bio::Root::Version >>> CPAN_USERID CJFIELDS (Christopher Fields ) >>> CPAN_VERSION 1.006000 >>> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm >>> INST_VERSION 1.006900 >>> cpan> m Bio::SeqIO >>> Module id = Bio::SeqIO >>> CPAN_USERID CJFIELDS (Christopher Fields ) >>> CPAN_VERSION 1.006000 >>> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm >>> INST_VERSION undef >>> >>> Cornel still has the checked-out "bioperl-live" directory and the >>> last >>> changes are from March this year. >>> >>> As per why he used "Fasta" instead of 'fasta" as the format >>> parameter >>> in Bio::SeqIO, it's because that what it says in the modules manual. >>> He now tried 'fasta' instead and see no changes in behavior. 
>>> Omitting >>> the format parameter altogether, fasta-formatted sequence continues >>> to >>> be treated correctly, the first line being removed. However, raw >>> sequence is being treated differently in that the first line is not >>> being removed any more. Instead, the program returns the first line >>> only. Which, in the example I am going to forward in my next >>> message, >>> will return 60 amino acids out of raw sequence of 300 aa. Can't win >>> with raw sequence... >>> >>> >>> The files may be created on different platforms, we didn't notice >>> any >>> difference between using files created on Windows or Linux. >>> >>> Thanks >>> Uwe >>> >>> >>> >>> >>> -----Original Message----- >>> From: Hilmar Lapp [mailto:hlapp at gmx.net] >>> Sent: Wednesday, August 05, 2009 6:54 PM >>> To: Chris Fields >>> Cc: Hilgert, Uwe; BioPerl List >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> I don't think that can be the problem. If anything, providing the >>> format ought to be better in terms of result than not providing it? >>> >>> Uwe - I'd like you to go back to Chris' initial questions that you >>> haven't answered yet: "What version of bioperl are you using, OS, >>> etc? >>> What does your data look like?" I'd add to that, can you show us >>> your >>> full script, or a smaller code snippet that reproduces the problem. >>> >>> I suspect that either something in your script is swallowing the >>> line, >>> or that the line endings in your data file are from a different OS >>> than the one you're running the script on. (Or that you are >>> running a >>> very old version of BioPerl, which is entirely possible if you >>> installed through CPAN.) >>> >>> -hilmar >>> >>> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: >>> >>>> Uwe, >>>> >>>> Please keep replies on the list. 
>>>> >>>> It's very possible that's the issue; IIRC the fasta parser pulls >>>> out >>>> the full sequence in chunks (based on local $/ = "\n>") and splits >>>> the header off as the first line in that chunk. You could probably >>>> try leaving the format out and letting SeqIO guess it, or passing >>>> the >>>> file into Bio::Tools::GuessSeqFormat directly, but it's probably >>>> better to go through the files and add a file extension that >>>> corresponds to the format. >>>> >>>> chris >>>> >>>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >>>> >>>>> Thanks, Chris. The files have no extension, but we indicate what >>>>> format to use, like in the manual: >>>>> >>>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>>>> >>>>> I wonder now whether this could exactly cause the problem: as we >>>>> are >>>>> telling that input files are in fasta format they are being >>>>> treated >>>>> as such (=remove first line) - regardless of whether they really >>>>> are >>>>> fasta? >>>>> >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>> Uwe >>>>> Hilgert, Ph.D. >>>>> Dolan DNA Learning Center >>>>> Cold Spring Harbor Laboratory >>>>> >>>>> C: (516) 857-1693 >>>>> V: (516) 367-5185 >>>>> E: hilgert at cshl.edu >>>>> F: (516) 367-5182 >>>>> W: http://www.dnalc.org >>>>> >>>>> -----Original Message----- >>>>> From: Chris Fields [mailto:cjfields at illinois.edu] >>>>> Sent: Wednesday, August 05, 2009 5:04 PM >>>>> To: Hilgert, Uwe >>>>> Cc: bioperl-l at lists.open-bio.org >>>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>>>> >>>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>>>> >>>>>> Is my impression correct that Bio::SeqIO just assumes that >>>>>> sequences are being submitted in FASTA format? >>>>> >>>>> No. See: >>>>> >>>>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>>>> >>>>> SeqIO tries to guess at the format using the file extension, and >>>>> if >>>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. 
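The two extension rules Chris quoted from Bio::SeqIO can be tried on their own. This sketch reuses only those two patterns. The elided lines of the real method cover many other formats, and `guess_format_from_name` is a hypothetical name. A file name without an extension, like Uwe's files, gets no guess here. Per Chris, that is the point where SeqIO falls back to Bio::Tools::GuessSeqFormat.

```perl
use strict;
use warnings;

# Sketch of extension-based format guessing using only the two patterns
# quoted in the thread; the real Bio::SeqIO method checks many more formats.
sub guess_format_from_name {
    my ($filename) = @_;
    return 'fasta' if $filename =~ /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
    return 'raw'   if $filename =~ /\.(txt)$/i;
    return;    # no recognizable extension: SeqIO would try GuessSeqFormat
}

for my $name (qw(reads.fa protein.FAA notes.txt upload_1234)) {
    my $fmt = guess_format_from_name($name);
    printf "%-12s => %s\n", $name, defined $fmt ? $fmt : '(no guess)';
}
```

Giving uploaded files an extension that matches their content, as Chris suggests, makes this first step deterministic. Passing `-format` explicitly bypasses it altogether.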
It's >>>>> possible that the extension is causing the problem, or that >>>>> GuessSeqFormat guessing wrong (it's apt to do that, as it's forced >>>>> to guessing). In any case, it's always advisable to explicitly >>>>> indicate the format when possible. >>>>> >>>>> Relevant lines: >>>>> >>>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/ >>>>> i; >>>>> ... >>>>> return 'raw' if /\.(txt)$/i; >>>>> >>>>>> In our experience, implementing >>>>>> Bio::SeqIO led to the first line of files being cut off, >>>>>> regardless >>>>>> of whether the files were indeed fasta files or files that only >>>>>> contained sequence. >>>>> >>>>> Files that only contain sequence are 'raw'. Ones in FASTA are >>>>> 'fasta'. >>>>> >>>>>> Which, in the latter, led to sequence submissions that had the >>>>>> first line of nucleotides removed. Has anyone tried to write a >>>>>> fix >>>>>> for this? >>>>> >>>>> This sounds like a bug, but we have very little to go on beyond >>>>> your >>>>> description. What version of bioperl are you using, OS, etc? >>>>> What >>>>> does your data look like? File extension? >>>>> >>>>> chris >>>>> >>>>>> Thanks, >>>>>> >>>>>> Uwe >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>>>> >>>>>> Uwe Hilgert, Ph.D. 
>>>>>> >>>>>> Dolan DNA Learning Center >>>>>> >>>>>> Cold Spring Harbor Laboratory >>>>>> >>>>>> >>>>>> >>>>>> V: (516) 367-5185 >>>>>> >>>>>> E: hilgert at cshl.edu >>>>>> >>>>>> F: (516) 367-5182 >>>>>> >>>>>> W: http://www.dnalc.org >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >>> -- >>> =========================================================== >>> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >>> =========================================================== >>> >>> >> >> -- >> =========================================================== >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : >> =========================================================== >> >> >> > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 6 16:25:45 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 15:25:45 -0500 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> Message-ID: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Michael, Are you using ClustalW 2? I'm not sure but I don't think the wrapper has been updated for the latest version (I think parsing still works, though). 
chris On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote: > I'm a complete bioperl novice, trying to do Clustalw on some fasta > files, and am running into trouble: > > ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta > Use of uninitialized value in concatenation (.) or string at /usr/ > pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm > line 550. > Use of uninitialized value in concatenation (.) or string at /usr/ > pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm > line 551. > Can't exec "align": No such file or directory at /usr/pubsw/lib/ > perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555. > > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf - > output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1 > > STACK: Error::throw > STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/ > Bio/Root/Root.pm:328 > STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/ > perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556 > STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/ > perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472 > STACK: TestClust:22 > ----------------------------------------------------------- > > Here's my code: > > #!/usr/bin/perl -w > > use Bio::Perl; > use Bio::AlignIO; > use Bio::Tools::Run::Alignment::Clustalw; > use Bio::SimpleAlign; > use Bio::Seq; > use strict; > use warnings; > > my $factory = Bio::Tools::Run::Alignment::Clustalw->new(); > my @seq_array = read_all_sequences($ARGV[0],'fasta'); > > for (my $i = 0; $i < @seq_array; $i++){ > (my $seq = $seq_array[$i]->seq()) =~ s/-//g; > $seq_array[$i]->seq($seq); > } > > write_sequence(">test",'fasta',@seq_array); > > my $seq_array_ref = \@seq_array; > my $aln = $factory->align($seq_array_ref); > > my @align_array = $aln->each_seq(); > write_sequence(">testfile",'fasta',@align_array); > > > The
loop is just there to take out some gaps that were placed in a > blast previous to this. The write_sequence call confirms that > @seq_array is a valid array of Bio:Seq objects at the time align > calls it. Here's some output in "test": > > >A0220B0939one.1 FV584Q101DEWY9 > TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC > CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT > TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT > TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG > CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG > CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA > CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA > CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT > AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG > >A0220B0939one.2 FV584Q101A4DG7 > TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG > ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC > AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG > TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG > GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA > GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT > CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT > CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT > ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG > ... 
> > Thanks, > Mike > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 6 16:30:30 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 15:30:30 -0500 Subject: [Bioperl-l] tigrxml parsing In-Reply-To: <4A7B0F64.9070205@cornell.edu> References: <4A7B0F64.9070205@cornell.edu> Message-ID: Robert, This popped up recently (may be related): http://thread.gmane.org/gmane.comp.lang.perl.bio.general/19782 http://bugzilla.open-bio.org/show_bug.cgi?id=2868 It might be possible to map this into bioperl, but someone needs to take it up. chris On Aug 6, 2009, at 12:14 PM, Robert Buels wrote: > Hi all, > > Recently in #bioperl somebody came by trying to use > Bio::SeqIO::tigrxml.pm to parse the medicago genome annotations at http://www.medicago.org/genome/downloads/Mt2/MT2.0_medicago_chrX_20080103_NoOverlap.xml.tar.gz > > svn HEAD tigrxml.pm was not at all happy with these files, > eventually dieing with > ------------- EXCEPTION: Bio::Root::Exception ------------- > MSG: start is undefined > STACK: Error::throw > STACK: Bio::Root::Root::throw Bio/Root/Root.pm:368 > STACK: Bio::RangeI::contains Bio/RangeI.pm:255 > STACK: Bio::SeqFeature::Generic::add_SeqFeature Bio/SeqFeature/ > Generic.pm:783 > STACK: Bio::SeqIO::tigrxml::start_element Bio/SeqIO/tigrxml.pm:206 > STACK: try{} block /usr/share/perl5/XML/SAX/Base.pm:292 > STACK: XML::SAX::Base::start_element /usr/share/perl5/XML/SAX/ > Base.pm:266 > STACK: XML::SAX::Expat::_handle_start /usr/share/perl5/XML/SAX/ > Expat.pm:225 > STACK: XML::Parser::Expat::parse /usr/lib/perl5/XML/Parser/Expat.pm: > 469 > STACK: XML::Parser::parse /usr/lib/perl5/XML/Parser.pm:187 > STACK: XML::SAX::Expat::_parse_bytestream /usr/share/perl5/XML/SAX/ > Expat.pm:45 > STACK: XML::SAX::Base::parse /usr/share/perl5/XML/SAX/Base.pm:2602 > STACK: XML::SAX::Base::parse_file 
/usr/share/perl5/XML/SAX/Base.pm: > 2631 > STACK: Bio::SeqIO::tigrxml::next_seq Bio/SeqIO/tigrxml.pm:116 > STACK: /crypt/rob/test2.pl:10 > ----------------------------------------------------------- > > Looking at the medicago XML and comparing it to the bioperl-live/t/ > data/test.tigrxml, the two look VERY different in structure. Lots > of things that are attrs in test.tigrxml seem to be elements in the > medicago XML, for example. > > So I guess the question is: is the medicago TIGR XML malformed? > Can tigrxml.pm be expected to parse it? What, if anything, should > be done about this? > > Rob > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From eigenrosen at gmail.com Thu Aug 6 16:39:09 2009 From: eigenrosen at gmail.com (Michael Rosen) Date: Thu, 6 Aug 2009 13:39:09 -0700 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Message-ID: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> Hi Chris, I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at the top of the module being called. Mike On Aug 6, 2009, at 1:25 PM, Chris Fields wrote: > Michael, > > Are you using ClustalW 2? I'm not sure but I don't think the > wrapper has been updated for the latest version (I think parsing > still works, though). 
> > chris > > On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote: > >> I'm a complete bioperl novice, trying to do Clustalw on some fasta >> files, and am running into trouble: >> >> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta >> Use of uninitialized value in concatenation (.) or string at /usr/ >> pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm >> line 550. >> Use of uninitialized value in concatenation (.) or string at /usr/ >> pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm >> line 551. >> Can't exec "align": No such file or directory at /usr/pubsw/lib/ >> perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555. >> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf - >> output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1 >> >> STACK: Error::throw >> STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/ >> Bio/Root/Root.pm:328 >> STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/ >> perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556 >> STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/ >> perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472 >> STACK: TestClust:22 >> ----------------------------------------------------------- >> >> Here's my code: >> >> #!/usr/bin/perl -w >> >> use Bio::Perl; >> use Bio::AlignIO; >> use Bio::Tools::Run::Alignment::Clustalw; >> use Bio::SimpleAlign; >> use Bio::Seq; >> use strict; >> use warnings; >> >> my $factory = Bio::Tools::Run::Alignment::Clustalw->new(); >> my @seq_array = read_all_sequences($ARGV[0],'fasta'); >> >> for (my $i = 0; $i < @seq_array; $i++){ >> (my $seq = $seq_array[$i]->seq()) =~ s/-//g; >> $seq_array[$i]->seq($seq); >> } >> >> write_sequence(">test",'fasta',@seq_array); >> >> my $seq_array_ref = \@seq_array; >> my $aln = $factory->align($seq_array_ref); >> >> my @align_array = $aln->each_seq(); >>
testfile"">
write_sequence(">testfile",'fasta',@align_array); >> >> >> The loop is just there to take out some gaps that were placed in a >> blast previous to this. The write_sequence call confirms that >> @seq_array is a valid array of Bio:Seq objects at the time align >> calls it. Here's some output in "test": >> >> >A0220B0939one.1 FV584Q101DEWY9 >> TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC >> CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT >> TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT >> TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG >> CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG >> CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA >> CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA >> CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT >> AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG >> >A0220B0939one.2 FV584Q101A4DG7 >> TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG >> ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC >> AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG >> TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG >> GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA >> GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT >> CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT >> CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT >> ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG >> ...
>> >> Thanks, >> Mike >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > From lsbrath at gmail.com Thu Aug 6 16:49:56 2009 From: lsbrath at gmail.com (Mgavi Brathwaite) Date: Thu, 6 Aug 2009 16:49:56 -0400 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Message-ID: <69367b8f0908061349i48f4d2b1tcbccb00d5a3de5ca@mail.gmail.com> Hi Micheal, Have you considered calling clustalw from perl's "system" command and passing in the files for alignment? Mgavi On Thu, Aug 6, 2009 at 4:25 PM, Chris Fields wrote: > Michael, > > Are you using ClustalW 2? I'm not sure but I don't think the wrapper has > been updated for the latest version (I think parsing still works, though). > > chris > > On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote: > > I'm a complete bioperl novice, trying to do Clustalw on some fasta files, >> and am running into trouble: >> >> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta >> Use of uninitialized value in concatenation (.) or string at >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm >> line 550. >> Use of uninitialized value in concatenation (.) or string at >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm >> line 551. >> Can't exec "align": No such file or directory at >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm >> line 555. 
>> >> ------------- EXCEPTION: Bio::Root::Exception ------------- >> MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg >> -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1 >> >> STACK: Error::throw >> STACK: Bio::Root::Root::throw >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328 >> STACK: Bio::Tools::Run::Alignment::Clustalw::_run >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556 >> STACK: Bio::Tools::Run::Alignment::Clustalw::align >> /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472 >> STACK: TestClust:22 >> ----------------------------------------------------------- >> >> Here's my code: >> >> #!/usr/bin/perl -w >> >> use Bio::Perl; >> use Bio::AlignIO; >> use Bio::Tools::Run::Alignment::Clustalw; >> use Bio::SimpleAlign; >> use Bio::Seq; >> use strict; >> use warnings; >> >> my $factory = Bio::Tools::Run::Alignment::Clustalw->new(); >> my @seq_array = read_all_sequences($ARGV[0],'fasta'); >> >> for (my $i = 0; $i < @seq_array; $i++){ >> (my $seq = $seq_array[$i]->seq()) =~ s/-//g; >> $seq_array[$i]->seq($seq); >> } >> >> write_sequence(">test",'fasta',@seq_array); >> >> my $seq_array_ref = \@seq_array; >> my $aln = $factory->align($seq_array_ref); >> >> my @align_array = $aln->each_seq(); >> write_sequence(">testfile",'fasta',@align_array); >> >> >> The loop is just there to take out some gaps that were placed in a blast >> previous to this. The write_sequence call confirms that @seq_array is a >> valid array of Bio:Seq objects at the time align calls it.
Here's some >> output in "test": >> >> >A0220B0939one.1 FV584Q101DEWY9 >> TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC >> CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT >> TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT >> TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG >> CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG >> CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA >> CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA >> CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT >> AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG >> >A0220B0939one.2 FV584Q101A4DG7 >> TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG >> ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC >> AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG >> TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG >> GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA >> GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT >> CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT >> CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT >> ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG >> ... 
>> >> Thanks, >> Mike >> >> >> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From cjfields at illinois.edu Thu Aug 6 17:00:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 16:00:37 -0500 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> Message-ID: <2C8DF4CB-40B0-41DB-882A-AAF346A008B2@illinois.edu> Michael, No, what I meant was which version of clustalw (the actual executable) you are using. "Clustalw.pm,v 1.36" is the svn version of the bioperl wrapper. What happens if you enter 'clustalw' on the command line? Do you get: ************************************************************** ******** CLUSTAL 2.0.11 Multiple Sequence Alignments ******** ************************************************************** I think the above version has problems with bioperl, though I can't recall exactly what the problems were. chris On Aug 6, 2009, at 3:39 PM, Michael Rosen wrote: > Hi Chris, > I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at > the top of the module being called. > > Mike > On Aug 6, 2009, at 1:25 PM, Chris Fields wrote: > >> Michael, >> >> Are you using ClustalW 2? I'm not sure but I don't think the >> wrapper has been updated for the latest version (I think parsing >> still works, though).
>> >> chris >> >> On Aug 6, 2009, at 2:12 AM, Michael Rosen wrote: >> >>> I'm a complete bioperl novice, trying to do Clustalw on some fasta >>> files, and am running into trouble: >>> >>> ~/454DATA> perl TestClust BlastedReads/A0220B0939all.fasta >>> Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 550. >>> Use of uninitialized value in concatenation (.) or string at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 551. >>> Can't exec "align": No such file or directory at /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm line 555. >>> >>> ------------- EXCEPTION: Bio::Root::Exception ------------- >>> MSG: Clustalw call ( align -infile=/tmp/6g7vpegtdP/tBlfRYOnKf -output=gcg -outfile=/tmp/6g7vpegtdP/4WWjuhKS3p) crashed: -1 >>> >>> STACK: Error::throw >>> STACK: Bio::Root::Root::throw /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:328 >>> STACK: Bio::Tools::Run::Alignment::Clustalw::_run /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:556 >>> STACK: Bio::Tools::Run::Alignment::Clustalw::align /usr/pubsw/lib/perl5/site_perl/5.8.8/Bio/Tools/Run/Alignment/Clustalw.pm:472 >>> STACK: TestClust:22 >>> ----------------------------------------------------------- >>> >>> Here's my code: >>> >>> #!/usr/bin/perl -w >>> >>> use Bio::Perl; >>> use Bio::AlignIO; >>> use Bio::Tools::Run::Alignment::Clustalw; >>> use Bio::SimpleAlign; >>> use Bio::Seq; >>> use strict; >>> use warnings; >>> >>> my $factory = Bio::Tools::Run::Alignment::Clustalw->new(); >>> my @seq_array = read_all_sequences($ARGV[0],'fasta'); >>> >>> for (my $i = 0; $i < @seq_array; $i++){ >>> (my $seq = $seq_array[$i]->seq()) =~ s/-//g; >>> $seq_array[$i]->seq($seq); >>> } >>> >>> write_sequence(">test",'fasta',@seq_array); >>> >>> my $seq_array_ref = \@seq_array; >>> my $aln =
$factory->align($seq_array_ref); >>> >>> my @align_array = $aln->each_seq(); >>> write_sequence(">testfile",'fasta',@align_array); >>> >>> >>> The loop is just there to take out some gaps that were placed in a >>> blast previous to this. The write_sequence call confirms that >>> @seq_array is a valid array of Bio::Seq objects at the time align >>> calls it. Here's some output in "test": >>> >>> >A0220B0939one.1 FV584Q101DEWY9 >>> TAAAGGAGCGGTTCACTTCCCGCAGCCCGGCTACCAAGTATTCATCGAGGGGGCCGGTGC >>> CACCCGCAACCAGGGAATAGGTGATGAAGCGGAGGTAGTAGCCGATGTCGCGGGCACACT >>> TGGCCTGAAACACATCGCCGTGGCCCATTTCACCCGGCTGGGTCAAGTAAGGGAACCTCT >>> TGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTGTTGGTTAGGACACGGG >>> CCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACGGCCTGCAGCTCGCTGG >>> CATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCAGTGATGACGGTTTTCA >>> CCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATTCTGTAGGAAAGGCGGA >>> CTGGATCTCCACCTGCCTATCAGAAATGAAGGGATCTAACCGATCTAAAAAGGGACGACT >>> AAGCCAGCTTCGACCCAAAGCTCAAACGATGGCG >>> >A0220B0939one.2 FV584Q101A4DG7 >>> TCGAGGGGGCCGGTGCCACCCGCAACCAGGGAATAGGTAATGAAGCGGAGGTAGTAGCCG >>> ATGTCGCGGGCACACTTGGCCTGAAACACATCGCCGTGGCCCATTTCCCCCGGCTGGGTC >>> AAGTAAGGGAACCTCTTGAACACTTCCTGCACCGCTTCCCGCACCAGGGTTTGCTGATTG >>> TTGGTTAGGACACGGGCCGCTTCCAGAGAAGCAGCAGCACGCTGGTAACGACCATTCACG >>> GCCTGCAGCTCGCTGGCATTCAGAAAACGCCCTTGATTGTCAGCGGCAGCAATCGCTTCA >>> GTGATGACGGTTTTCACCTTGCAACTCCTAAATTCATCAATTGTGTTGTTAACGAACATT >>> CTGTAGGAAAGGCGGACTGGATCTCCACCTGCCTATTAGAAATGAAGGGATCTAACCGAT >>> CTAAAAAGGACGACTAAGCCAGCTTCGACCCAAAGCTCAAACGATGGCGGCAGCAGCCTT >>> ATCGAAGTAGCTGGCCACTTCGCTTTGCAGCG >>> ...
>>> >>> Thanks, >>> Mike >>> >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> > From bosborne11 at verizon.net Thu Aug 6 16:01:00 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Thu, 06 Aug 2009 16:01:00 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: Chris, Yes, I think so. By the way, this is related to an old bug: http://bugzilla.bioperl.org/show_bug.cgi?id=1508 Brian O. > This is a simple validation issue: should we throw an exception on > bad input (no '>') From bix at sendu.me.uk Thu Aug 6 17:18:02 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Thu, 06 Aug 2009 22:18:02 +0100 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> <2F194A7C-45C5-4252-84D2-E976A013E4BB@gmail.com> Message-ID: <4A7B488A.2060600@sendu.me.uk> Michael Rosen wrote: > Hi Chris, > I'm not sure, but I don't think so. I see "Clustalw.pm,v 1.36" at the > top of the module being called. I'm guessing your error is caused simply by not having clustalw installed. BioPerl run modules provide perl wrappers to external executables. They don't replace the need for those executables. 
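Sendu's diagnosis can be checked before align() is ever called. The run modules only wrap an external program. The bare "align -infile=..." command line and the uninitialized-value warnings at Clustalw.pm lines 550-551 in the report above fit a missing executable.

Below is a minimal pure-Perl sketch of such a pre-flight check. It uses only core File::Spec, and it is not part of BioPerl. Two things are assumptions here: the helper name find_in_path is hypothetical, and so is the guess that the binary is installed as clustalw or clustalw2.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full path of the first executable file
# called $name in the PATH directories, or undef if there is none.
sub find_in_path {
    my ($name) = @_;
    # Windows executables normally need an extension; elsewhere try the bare name.
    my @exts = $^O eq 'MSWin32' ? ('.exe', '.bat', '.cmd', '') : ('');
    for my $dir (File::Spec->path) {
        for my $ext (@exts) {
            my $candidate = File::Spec->catfile($dir, $name . $ext);
            return $candidate if -f $candidate && -x _;
        }
    }
    return;
}

# Assumption: the binary is installed under one of these names.
my ($clustal) = map { find_in_path($_) } qw(clustalw clustalw2);
if (defined $clustal) {
    print "ClustalW found at $clustal\n";
}
else {
    warn "No clustalw/clustalw2 on PATH: install ClustalW before calling align()\n";
}
```

If neither name turns up, the fix is to install ClustalW or to put its directory on PATH. The Perl wrapper cannot stand in for the program itself.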
From cjfields at illinois.edu Thu Aug 6 20:47:47 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 19:47:47 -0500 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: I added the exception and tests to svn (r15895), so I closed that bug out. Almost forgot about that one, thanks for pointing it out! chris On Aug 6, 2009, at 3:01 PM, Brian Osborne wrote: > Chris, > > Yes, I think so. > > By the way, this is related to an old bug: > > http://bugzilla.bioperl.org/show_bug.cgi?id=1508 > > > Brian O. > > >> This is a simple validation issue: should we throw an exception on >> bad input (no '>') > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Thu Aug 6 22:30:09 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 6 Aug 2009 21:30:09 -0500 Subject: [Bioperl-l] Bio::Robotics namespace discussion In-Reply-To: <4A765A44.7030902@gmail.com> References: <4A71002E.6060507@gmail.com> <18DF7D20DFEC044098A1062202F5FFF32AAB5A50FB@exchsth.agresearch.co.nz> <4A765A44.7030902@gmail.com> Message-ID: Jonathan, Just to make sure you aren't accidentally 'warnocked' by the core devs: Your code sounds quite nice! However, we will begin the process of massively restructuring bioperl pretty soon, so I don't think it's a good idea to gear your code towards fitting directly into core. The best alternative should be fairly obvious, which is to release it to CPAN listing BioPerl 1.6.0 as a dependency if it is required. 
Your modules may or may not need the Bio* namespace (that's up to you, actually); there are several non-bioperl modules that also share the Bio* namespace, and I believe there are modules that aren't Bio* that use BioPerl (Gbrowse comes to mind). If you're focusing on interaction with robotics, Robotics::Bio::X might be a better namespace for instance (b/c you could expand later into other possibly non-bio robotics interfaces). The cpan-discuss list is probably a good place to ask, or (after you register on PAUSE) you can register the module namespace and see if there are any objections to the request. chris On Aug 2, 2009, at 10:32 PM, Jonathan Cline wrote: > Smithies, Russell wrote: >> I "acquired" an old Biomek 1000 that I'm thinking of modernising. >> It was originally controlled by a monstrously large but slow pc >> (IBM Value Point Model 466DX2 computer with Microsoft Windows* >> Version 3.1) >> My plan is to fit a 3-axis CAD/CAM stepper controller (about $60) >> and use software like mach3 www.machsupport.com along with G-code >> to control it. >> I come from an engineering background so it seemed like the easy >> way to me :-) >> >> Now I just need a bit of free time to get it working... >> >> --Russell >> >> >> > I agree, that's probably the best way to go. It's hard to know what > amount of s/w processing was done on the host PC vs. the embedded > controller. If you were able to connect directly to the robot > hardware > with serial port(s) or whatever it's using, it would be tough to find > out the comm protocol unless someone has already reverse engineered it > (which is doubtful). Also from what I have seen online, attempting > to > run the old software under virtual machine is unpredictable due to > timing differences in the serial port communication. So removal of > the > old electronics is probably the best bet. If it has one arm, then > it's > much easier. 
> > As for robots with working workstation software, it seems the > annoyance > factor is that while the scripting languages are powerful (for GUI > scripting that is), they are still relatively low level. Bio types > with > a bit of CS seem to immediately turn to visual basic, labview, or even > excel spreadsheets and macros, in order to provide a higher level > abstraction for the workstation software. To me, it seems natural > that > there should be a "protocol compiler" which takes biology protocols as > input, and gives robot instructions as output (google "protolexer"). > The huge bottleneck of course is that everyone's robotics work tables > and equipment are somewhat unique to their needs. > > > ## Jonathan Cline > ## jcline at ieee.org > ## Mobile: +1-805-617-0223 > ######################## > > >>> -----Original Message----- >>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- >>> bounces at lists.open-bio.org] On Behalf Of Jonathan Cline >>> Sent: Thursday, 30 July 2009 2:07 p.m. >>> To: bioperl-l at lists.open-bio.org >>> Cc: Jonathan Cline >>> Subject: [Bioperl-l] Bio::Robotics namespace discussion >>> >>> I am writing a module for communication with biology robotics, as >>> discussed recently on #bioperl, and I invite your comments. >>> >>> Currently this mode talks to a Tecan genesis workstation robot ( >>> http://images.google.com/images?q=tecan genesis ). Other vendors >>> are >>> Beckman Biomek, Agilent, etc. No such modules exist anywhere on the >>> 'net with the exception of some visual basic and labview scripts >>> which I >>> have found. There are some computational biologists who program for >>> robots via high level s/w, but these scripts are not distributed >>> as OSS. >>> >>> With Tecan, there is a datapipe interface for hardware >>> communication, as >>> an added $$ option from the vendor. I haven't checked other >>> vendors to >>> see if they likewise have an open communication path for third party >>> software. 
By allowing third-party communication, then naturally the >>> next step is to create a socket client-server; especially as the >>> robot >>> vendor only support MS Win and using the local machine has typical >>> Microsoft issues (like losing real time communication with the >>> hardware >>> due to GUI animation, bad operating system stability, no unix except >>> cygwin, etc). >>> >>> >>> On Namespace: >>> >>> I have chosen Bio::Robotics and Bio::Robotics::Tecan. There are >>> many >>> s/w modules already called 'robots' (web spider robots, chat bots, >>> www >>> automate, etc) so I chose the longer name "robotics" to >>> differentiate >>> this module as manipulating real hardware. Bio::Robotics is the >>> abstraction for generic robotics and Bio::Robotics::(vendor) is the >>> manufacturer-specific implementation. Robot control is made more >>> complex due to the very configurable nature of the work table >>> (placement >>> of equipment, type of equipment, type of attached arm, etc). The >>> abstraction has to be careful not to generalize or assume too >>> much. In >>> some cases, the Bio::Robotics modules may expand to arbitrary >>> equipment >>> such as thermocyclers, tray holders, imagers, etc - that could be a >>> future roadmap plan. >>> >>> Here is some theoretical example usage below, subject to change. At >>> this time I am deciding how much state to keep within the Perl >>> module. >>> By keeping state, some robot programming might be simplified >>> (avoiding >>> deadlock or tracking tip state). In general I am aiming for a more >>> "protocol friendly" method implementation. >>> >>> >>> To use this software with locally-connected robotics hardware: >>> >>> use Bio::Robotics; >>> >>> my $tecan = Bio::Robotics->new("Tecan") || die; >>> $tecan->attach() || die; >>> $tecan->home(); >>> $tecan->pipette(tips => "1", from => "rack1"); >>> $tecan->pipette(aspirate => "1", dispense => "1", from => >>> "sampleTray", to >>> => "DNATray"); >>> ... 
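The split Jonathan describes can be sketched as a small factory: a generic Bio::Robotics front end that hands off to Bio::Robotics::(vendor). This is a hypothetical illustration, not his module's code. The _init and home bodies and the log field are invented for the example. A real driver would send commands to the hardware instead of recording them.

```perl
use strict;
use warnings;

# Hypothetical sketch, not the proposed module's real code: a generic
# Bio::Robotics front end that hands construction off to a vendor package.
package Bio::Robotics;

sub new {
    my ($class, $vendor, %args) = @_;
    die "vendor name required\n"
        unless defined $vendor && $vendor =~ /^\w+$/;
    my $impl = "Bio::Robotics::$vendor";
    # Load Bio/Robotics/<Vendor>.pm from @INC unless the package is already
    # defined (as it is below, for this self-contained example).
    unless ($impl->isa(__PACKAGE__)) {
        eval "require $impl; 1"
            or die "No driver for vendor '$vendor': $@";
    }
    my $self = bless { vendor => $vendor, %args }, $impl;
    $self->_init;
    return $self;
}

# Defaults that each vendor driver is expected to override.
sub _init { return }
sub home  { die ref(shift) . " does not implement home()\n" }

# Hypothetical vendor driver; a real one would talk to the robot here.
package Bio::Robotics::Tecan;
our @ISA = ('Bio::Robotics');

sub _init { my $self = shift; $self->{log} = []; return }
sub home {
    my $self = shift;
    push @{ $self->{log} }, 'home';    # record instead of moving hardware
    return 1;
}

package main;

my $tecan = Bio::Robotics->new('Tecan');
$tecan->home;
print ref($tecan), ' ran: ', join(', ', @{ $tecan->{log} }), "\n";
```

Keeping vendor state inside the blessed object fits the question Jonathan raises about how much state the module should track. Tip state or deck layout could live there per vendor.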
>>> >>> To use this software with remote robotics hardware over the network: >>> >>> # On the local machine, run: >>> use Bio::Robotics; >>> >>> my @connected_hardware = Bio::Robotics->query(); >>> my $tecan = Bio::Robotics->new("Tecan") || die "no tecan found in @connected_hardware\n"; >>> $tecan->attach() || die; >>> $tecan->configure("my work table configuration file") || die; >>> # Run the server and process commands >>> while (1) { >>> my $error = $tecan->server(passwordplaintext => "0xd290"); >>> if ($tecan->lastClientCommand() =~ /^shutdown/) { >>> last; >>> } >>> } >>> $tecan->detach(); >>> exit(0); >>> >>> # On the remote machine (the client), run: >>> use Bio::Robotics; >>> >>> my $server = "heavybio.dyndns.org:8080"; >>> my $password = "0xd290"; >>> my $tecan = Bio::Robotics->new("Tecan"); >>> $tecan->connect($server, $password) || die; >>> $tecan->home(); >>> $tecan->pipette(tips => "1", from => "rack200"); >>> $tecan->pipette(aspirate => "1", dispense => "1", >>> from => "sampleTray A1", to => "DNATray A2", >>> volume => "45", liquid => "Buffer"); >>> $tecan->pipette(drop => "1"); >>> ... >>> $tecan->disconnect(); >>> exit(0); >>> >>> >>> >>> -- >>> >>> ## Jonathan Cline >>> ## jcline at ieee.org >>> ## Mobile: +1-805-617-0223 >>> ######################## >>> >>> >>> _______________________________________________ >>> Bioperl-l mailing list >>> Bioperl-l at lists.open-bio.org >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> ===================================================================== >> Attention: The information contained in this message and/or >> attachments >> from AgResearch Limited is intended only for the persons or entities >> to which it is addressed and may contain confidential and/or >> privileged >> material.
Any review, retransmission, dissemination or other use >> of, or >> taking of any action in reliance upon, this information by persons or >> entities other than the intended recipients is prohibited by >> AgResearch >> Limited. If you have received this message in error, please notify >> the >> sender immediately. >> ===================================================================== >> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From biopython at maubp.freeserve.co.uk Fri Aug 7 05:19:14 2009 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 7 Aug 2009 10:19:14 +0100 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> Message-ID: <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com> On Thu, Aug 6, 2009 at 9:25 PM, Chris Fields wrote: > Michael, > > Are you using ClustalW 2? I'm not sure but I don't think the wrapper has > been updated for the latest version (I think parsing still works, though). > > chris That shouldn't matter. According to Des Higgins, ClustalW 2 is intended to be completely compatible with ClustalW 1.83, including the command line options; they will be adding new stuff in ClustalW 3. The only thing to worry about with ClustalW 2 is parsing the output, as the header line of the alignments has changed very slightly. I can tell you from personal experience that the Biopython command line wrappers for ClustalW work fine on both 1.83 and 2.0.10, for example, and I would expect the same to be true for BioPerl.
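Peter's point about the changed header line can be handled by a check that accepts both forms.

This is a minimal sketch, not Bio::AlignIO's actual clustalw parser. It assumes the 1.x header reads "CLUSTAL W (1.83) multiple sequence alignment" and the 2.x header reads "CLUSTAL 2.0.10 multiple sequence alignment". The helper name clustal_version is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the program version from the first line of a
# Clustal .aln file, accepting both the 1.x and 2.x header styles.
sub clustal_version {
    my ($line) = @_;
    return unless defined $line;
    # 1.x style (assumed): "CLUSTAL W (1.83) multiple sequence alignment"
    return $1 if $line =~ /^CLUSTAL\s+W\s+\((\d+(?:\.\d+)*)\)/;
    # 2.x style (assumed): "CLUSTAL 2.0.10 multiple sequence alignment"
    return $1 if $line =~ /^CLUSTAL\s+(\d+(?:\.\d+)*)\s/;
    return;    # not a Clustal header
}

for my $header ('CLUSTAL W (1.83) multiple sequence alignment',
                'CLUSTAL 2.0.10 multiple sequence alignment') {
    printf "%-45s => %s\n", $header, clustal_version($header);
}
```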
Peter From paola.bisignano at gmail.com Fri Aug 7 08:11:58 2009 From: paola.bisignano at gmail.com (Paola Bisignano via Scour) Date: Fri, 7 Aug 2009 05:11:58 -0700 Subject: [Bioperl-l] Scour Friend Invite Message-ID: <4a7c1a0e5b82d@gmail.com> Hey, Check out: http://scour.com/invite/paola82/ I'm using a new search engine called Scour.com. It shows Google/Yahoo/MSN results and user comments all on one page. Best of all we get rewarded for using it by collecting points with every search, comment and vote. The points are redeemable for Visa gift cards. Join through my invite link so we can be friends and search socially! I know you'll like it, - Paola Bisignano This message was sent to you as a friend referral to join scour.com, please feel free to review our http://scour.com/privacy page and our http://scour.com/communityguidelines/antispam page. If you prefer not to receive invitations from ANY scour members, please click here - http://www.scour.com/unsub/e/YmlvcGVybC1sQGxpc3RzLm9wZW4tYmlvLm9yZw== Write to us at: Scour, Inc., 15303 Ventura Blvd. Suite 220, Sherman Oaks, CA 91403, USA. campaignid: scour200908070001 Scour.com From hlapp at gmx.net Fri Aug 7 09:21:51 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Fri, 7 Aug 2009 09:21:51 -0400 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <4a7c1a0e5b82d@gmail.com> References: <4a7c1a0e5b82d@gmail.com> Message-ID: <8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> Just FYI, I am addressing this offline. Note to everyone: we don't tolerate this and it will get you removed from the list immediately (and banned for the second offense). This is a large list. You better spend the time and be very careful who you send this kind of stuff to before you waste everyone else's. 
-hilmar From stefan.kirov at bms.com Fri Aug 7 10:25:52 2009 From: stefan.kirov at bms.com (Stefan Kirov) Date: Fri, 07 Aug 2009 10:25:52 -0400 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> References: <4a7c1a0e5b82d@gmail.com> <8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> Message-ID: <4A7C3970.10501@bms.com> Hilmar Lapp wrote: > Just FYI, I am addressing this offline. Note to everyone: we don't > tolerate this and it will get you removed from the list immediately > (and banned for the second offense). This is a large list. You better > spend the time and be very careful who you send this kind of stuff to > before you waste everyone else's. > > -hilmar > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > It is quite possible this guy has no idea Scour is spamming people on his behalf. It seems to me there should be a spam filter trained to take care of these guys. As a reference: http://forums.digitalpoint.com/showthread.php?t=955786 http://markmail.org/message/fzlutwd3mkforbsu From jdalzell03 at qub.ac.uk Mon Aug 3 19:18:24 2009 From: jdalzell03 at qub.ac.uk (Johnathan Dalzell) Date: Tue, 4 Aug 2009 00:18:24 +0100 Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10 Message-ID: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> Hi, I've been trying to install Bioperl 1.6.0 onto Strawberry Perl 5.10 and the ActivePerl equivalent. I'm working on Vista, and over multiple attempts, this is the furthest I can get through installation.... Install [a]ll Bioperl scripts, [n]one, or choose groups [i]nteractively?
[a] a - will install all scripts Do you want to run tests that require connection to servers across the internet (likely to cause some failures)? y/n [n] y - will run internet-requiring tests Encountered CODE ref, using dummy placeholder at C:/strawberry/perl/lib/Data/Dumper.pm line 190, line 9. Creating new 'Build' script for 'BioPerl' version '1.006000' ---- Unsatisfied dependencies detected during ---- ---- CJFIELDS/BioPerl-1.6.0.tar.gz ---- SOAP::Lite [requires] GraphViz [requires] Convert::Binary::C [requires] Algorithm::Munkres [requires] XML::Twig [requires] DB_File [requires] Set::Scalar [requires] XML::Parser::PerlSAX [requires] XML::Writer [requires] XML::SAX::Writer [requires] Clone [requires] XML::DOM::XPath [requires] PostScript::TextBlock [requires] Running Build test Delayed until after prerequisites Running Build install Delayed until after prerequisites Running install for module 'SOAP::Lite' Running make for M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz Checksum for C:\strawberry\cpan\sources\authors\id\M\MK\MKUTTER\SOAP-Lite-0.710.08.tar.gz ok CPAN.pm: Going to build M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz We are about to install SOAP::Lite and for your convenience will provide you with list of modules and prerequisites, so you'll be able to choose only modules you need for your configuration. XMLRPC::Lite, UDDI::Lite, and XML::Parser::Lite are included by default. Installed transports can be used for both SOAP::Lite and XMLRPC::Lite. Press to see the detailed list.

Feature                          Prerequisites                  Install?
-----------------------------    ----------------------------   --------
Core Package                     [*] Scalar::Util               always
                                 [*] Test::More
                                 [*] URI
                                 [*] MIME::Base64
                                 [*] version
                                 [*] XML::Parser (v2.23)
Client HTTP support              [*] LWP::UserAgent             always
Client HTTPS support             [ ] Crypt::SSLeay              [ no ]
Client SMTP/sendmail support     [ ] MIME::Lite                 [ no ]
Client FTP support               [*] IO::File                   [ yes ]
                                 [*] Net::FTP
Standalone HTTP server           [*] HTTP::Daemon               [ yes ]
Apache/mod_perl server           [ ] Apache                     [ no ]
FastCGI server                   [ ] FCGI                       [ no ]
POP3 server                      [ ] MIME::Parser               [ no ]
                                 [*] Net::POP3
IO server                        [*] IO::File                   [ yes ]
MQ transport support             [ ] MQSeries                   [ no ]
JABBER transport support         [ ] Net::Jabber                [ no ]
MIME messages                    [ ] MIME::Parser               [ no ]
DIME messages                    [*] IO::Scalar (v2.105)        [ no ]
                                 [ ] DIME::Tools (v0.03)
                                 [ ] Data::UUID (v0.11)
SSL Support for TCP Transport    [ ] IO::Socket::SSL            [ no ]
Compression support for HTTP     [*] Compress::Zlib             [ yes ]
MIME interoperability w/ Axis    [ ] MIME::Parser (v6.106)      [ no ]
---
An asterix '[*]' indicates if the module is currently installed. Do you want to proceed with this configuration? [yes] yes Checking if your kit is complete...
Looks good Writing Makefile for SOAP::Lite cp lib/SOAP/Client.pod blib\lib\SOAP\Client.pod cp lib/UDDI/Lite.pm blib\lib\UDDI\Lite.pm cp lib/SOAP/Packager.pm blib\lib\SOAP\Packager.pm cp lib/XML/Parser/Lite.pm blib\lib\XML\Parser\Lite.pm cp lib/SOAP/Transport/LOOPBACK.pm blib\lib\SOAP\Transport\LOOPBACK.pm cp lib/XMLRPC/Transport/TCP.pm blib\lib\XMLRPC\Transport\TCP.pm cp lib/SOAP/Transport/JABBER.pm blib\lib\SOAP\Transport\JABBER.pm cp lib/OldDocs/SOAP/Transport/TCP.pm blib\lib\OldDocs\SOAP\Transport\TCP.pm cp lib/SOAP/Transport/MAILTO.pm blib\lib\SOAP\Transport\MAILTO.pm cp lib/OldDocs/SOAP/Transport/POP3.pm blib\lib\OldDocs\SOAP\Transport\POP3.pm cp lib/Apache/SOAP.pm blib\lib\Apache\SOAP.pm cp lib/SOAP/Schema.pod blib\lib\SOAP\Schema.pod cp lib/SOAP/Test.pm blib\lib\SOAP\Test.pm cp lib/Apache/XMLRPC/Lite.pm blib\lib\Apache\XMLRPC\Lite.pm cp lib/XMLRPC/Transport/HTTP.pm blib\lib\XMLRPC\Transport\HTTP.pm cp lib/SOAP/Transport/MQ.pm blib\lib\SOAP\Transport\MQ.pm cp lib/SOAP/Transport/POP3.pm blib\lib\SOAP\Transport\POP3.pm cp lib/SOAP/Deserializer.pod blib\lib\SOAP\Deserializer.pod cp lib/SOAP/Data.pod blib\lib\SOAP\Data.pod cp lib/SOAP/Server.pod blib\lib\SOAP\Server.pod cp lib/SOAP/Transport/IO.pm blib\lib\SOAP\Transport\IO.pm cp lib/SOAP/Lite/Utils.pm blib\lib\SOAP\Lite\Utils.pm cp lib/SOAP/Header.pod blib\lib\SOAP\Header.pod cp lib/SOAP/Constants.pm blib\lib\SOAP\Constants.pm cp lib/SOAP/Lite/Packager.pm blib\lib\SOAP\Lite\Packager.pm cp lib/SOAP/SOM.pod blib\lib\SOAP\SOM.pod cp lib/XMLRPC/Transport/POP3.pm blib\lib\XMLRPC\Transport\POP3.pm cp lib/SOAP/Lite/Deserializer/XMLSchema1999.pm blib\lib\SOAP\Lite\Deserializer\XMLSchema1999.pm cp lib/XMLRPC/Lite.pm blib\lib\XMLRPC\Lite.pm cp lib/OldDocs/SOAP/Lite.pm blib\lib\OldDocs\SOAP\Lite.pm cp lib/SOAP/Transport.pod blib\lib\SOAP\Transport.pod cp lib/OldDocs/SOAP/Transport/HTTP.pm blib\lib\OldDocs\SOAP\Transport\HTTP.pm cp lib/SOAP/Lite/Deserializer/XMLSchema2001.pm blib\lib\SOAP\Lite\Deserializer\XMLSchema2001.pm cp lib/SOAP/Trace.pod blib\lib\SOAP\Trace.pod cp lib/IO/SessionData.pm blib\lib\IO\SessionData.pm cp lib/XMLRPC/Test.pm blib\lib\XMLRPC\Test.pm cp lib/OldDocs/SOAP/Transport/MQ.pm blib\lib\OldDocs\SOAP\Transport\MQ.pm cp lib/OldDocs/SOAP/Transport/FTP.pm blib\lib\OldDocs\SOAP\Transport\FTP.pm cp lib/OldDocs/SOAP/Transport/JABBER.pm blib\lib\OldDocs\SOAP\Transport\JABBER.pm cp lib/SOAP/Transport/TCP.pm blib\lib\SOAP\Transport\TCP.pm cp lib/SOAP/Utils.pod blib\lib\SOAP\Utils.pod cp lib/IO/SessionSet.pm blib\lib\IO\SessionSet.pm cp lib/SOAP/Transport/HTTP.pm blib\lib\SOAP\Transport\HTTP.pm cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.pm blib\lib\SOAP\Lite\Deserializer\XMLSchemaSOAP1_2.pm cp lib/OldDocs/SOAP/Transport/IO.pm blib\lib\OldDocs\SOAP\Transport\IO.pm cp lib/SOAP/Serializer.pod blib\lib\SOAP\Serializer.pod cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.pm blib\lib\SOAP\Lite\Deserializer\XMLSchemaSOAP1_1.pm cp lib/OldDocs/SOAP/Transport/LOCAL.pm blib\lib\OldDocs\SOAP\Transport\LOCAL.pm cp lib/SOAP/Transport/LOCAL.pm blib\lib\SOAP\Transport\LOCAL.pm cp lib/SOAP/Fault.pod blib\lib\SOAP\Fault.pod cp lib/SOAP/Lite.pm blib\lib\SOAP\Lite.pm cp lib/OldDocs/SOAP/Transport/MAILTO.pm blib\lib\OldDocs\SOAP\Transport\MAILTO.pm cp lib/SOAP/Transport/FTP.pm blib\lib\SOAP\Transport\FTP.pm C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/SOAPsh.pl blib\script\SOAPsh.pl pl2bat.bat blib\script\SOAPsh.pl C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/stubmaker.pl blib\script\stubmaker.pl pl2bat.bat blib\script\stubmaker.pl C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/XMLRPCsh.pl blib\script\XMLRPCsh.pl pl2bat.bat blib\script\XMLRPCsh.pl MKUTTER/SOAP-Lite-0.710.08.tar.gz C:\strawberry\c\bin\dmake.EXE -- OK Running make test C:\strawberry\perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib\lib', 'blib\arch')" t/01-core.t t/010-serializer.t t/012-cloneable.t t/013-array-deserialization.t
t/014_UNIVERSAL_use.t t/015_UNIVERSAL_can.t t/02-payload.t t/03-server.t t/04-attach.t t/05-customxml.t t/06-modules.t t/07-xmlrpc_payload.t t/08-schema.t t/096_characters.t t/097_kwalitee.t t/098_pod.t t/099_pod_coverage.t t/IO/SessionData.t t/IO/SessionSet.t t/SOAP/Data.t t/SOAP/Serializer.t t/SOAP/Lite/Packager.t t/SOAP/Lite/Deserializer/XMLSchema1999.t t/SOAP/Lite/Deserializer/XMLSchema2001.t t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t t/SOAP/Schema/WSDL.t t/SOAP/Transport/FTP.t t/SOAP/Transport/HTTP.t t/SOAP/Transport/IO.t t/SOAP/Transport/LOCAL.t t/SOAP/Transport/MAILTO.t t/SOAP/Transport/MQ.t t/SOAP/Transport/POP3.t t/SOAP/Transport/HTTP/CGI.t t/XML/Parser/Lite.t t/XMLRPC/Lite.t t/01-core.t .................................. ok t/010-serializer.t ........................... ok t/012-cloneable.t ............................ ok t/013-array-deserialization.t ................ ok t/014_UNIVERSAL_use.t ........................ ok t/015_UNIVERSAL_can.t ........................ ok t/02-payload.t ............................... ok t/03-server.t ................................ ok t/04-attach.t ................................ skipped: Could not find MIME::Parser - is MIME::Tools installed? Aborting. t/05-customxml.t ............................. ok t/06-modules.t ............................... ok t/07-xmlrpc_payload.t ........................ ok t/08-schema.t ................................ ok t/096_characters.t ........................... skipped: (no reason given) t/097_kwalitee.t ............................. skipped: (no reason given) t/098_pod.t .................................. skipped: (no reason given) t/099_pod_coverage.t ......................... skipped: (no reason given) t/IO/SessionData.t ........................... ok t/IO/SessionSet.t ............................ ok t/SOAP/Data.t ................................ ok t/SOAP/Lite/Deserializer/XMLSchema1999.t .....
ok t/SOAP/Lite/Deserializer/XMLSchema2001.t ..... ok t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t .. ok t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t .. ok t/SOAP/Lite/Packager.t ....................... ok t/SOAP/Schema/WSDL.t ......................... ok t/SOAP/Serializer.t .......................... 1/12 Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. Use of uninitialized value $values[0] in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. t/SOAP/Serializer.t .......................... ok t/SOAP/Transport/FTP.t ....................... 1/7 Use of uninitialized value in split at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 55. substr outside of string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 56. Use of uninitialized value $_[1] in join or string at C:/STRAWB~1/perl/lib/IO/Socket/INET.pm line 117. Use of uninitialized value $server in concatenation (.) or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 60. t/SOAP/Transport/FTP.t ....................... ok t/SOAP/Transport/HTTP.t ...................... ok t/SOAP/Transport/HTTP/CGI.t .................. Every time I get to CGI.t at the end here, the installation won't move! Any suggestions would be greatly appreciated; I've been trying to force it through for literally 5 hours now....
cheers, jonny From ghiban at cshl.edu Thu Aug 6 12:04:38 2009 From: ghiban at cshl.edu (Ghiban, Cornel) Date: Thu, 6 Aug 2009 12:04:38 -0400 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> Message-ID: <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Hi, It doesn't matter what sequence we use. As Chris Fields showed in his test, not having ">" as the first character on the first line is the problem. We always assumed the sequence was in FASTA format, and this seems to be wrong. I think the solution to our problem is to check whether the ">" symbol is present or not; if it is not present, we will add it. Thank you, Cornel Ghiban -----Original Message----- From: Hilmar Lapp [mailto:hlapp at gmx.net] Sent: Thursday, August 06, 2009 11:18 AM To: Hilgert, Uwe Cc: Chris Fields; BioPerl List; Ghiban, Cornel Subject: Re: [Bioperl-l] Bio::SeqIO issue Uwe - could you send an actual data file (as an attachment) that reproduces the problem, or is that not possible? -hilmar On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > I'm not sure what version we have.
Cornel may have installed it a > while ago from CVS: > > Module id = Bio::Root::Build > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm > INST_VERSION 1.006900 > cpan> m Bio::Root::Version > Module id = Bio::Root::Version > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm > INST_VERSION 1.006900 > cpan> m Bio::SeqIO > Module id = Bio::SeqIO > CPAN_USERID CJFIELDS (Christopher Fields ) > CPAN_VERSION 1.006000 > INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm > INST_VERSION undef > > Cornel still has the checked-out "bioperl-live" directory and the last > changes are from March this year. > > As for why he used 'Fasta' instead of 'fasta' as the format parameter > in Bio::SeqIO, it's because that's what it says in the module's manual. > He has now tried 'fasta' instead and sees no change in behavior. When > the format parameter is omitted altogether, fasta-formatted sequence continues to > be treated correctly, the first line being removed. However, raw > sequence is then treated differently: the first line is no longer > removed; instead, the program returns the first line > only. In the example I am going to forward in my next message, > that means 60 amino acids out of a raw sequence of 300 aa. Can't win > with raw sequence... > > > The files may be created on different platforms; we didn't notice any > difference between using files created on Windows or Linux. > > Thanks > Uwe > > > > > -----Original Message----- > From: Hilmar Lapp [mailto:hlapp at gmx.net] > Sent: Wednesday, August 05, 2009 6:54 PM > To: Chris Fields > Cc: Hilgert, Uwe; BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > I don't think that can be the problem. If anything, providing the > format ought to give a better result than not providing it?
> > Uwe - I'd like you to go back to Chris' initial questions that you > haven't answered yet: "What version of bioperl are you using, OS, etc? > What does your data look like?" I'd add to that, can you show us your > full script, or a smaller code snippet that reproduces the problem. > > I suspect that either something in your script is swallowing the line, > or that the line endings in your data file are from a different OS > than the one you're running the script on. (Or that you are running a > very old version of BioPerl, which is entirely possible if you > installed through CPAN.) > > -hilmar > > On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> Uwe, >> >> Please keep replies on the list. >> >> It's very possible that's the issue; IIRC the fasta parser pulls out >> the full sequence in chunks (based on local $/ = "\n>") and splits >> the header off as the first line in that chunk. You could probably >> try leaving the format out and letting SeqIO guess it, or passing the >> file into Bio::Tools::GuessSeqFormat directly, but it's probably >> better to go through the files and add a file extension that >> corresponds to the format. >> >> chris >> >> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: >> >>> Thanks, Chris. The files have no extension, but we indicate what >>> format to use, like in the manual: >>> >>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); >>> >>> I wonder now whether this could exactly cause the problem: as we are >>> telling that input files are in fasta format they are being treated >>> as such (=remove first line) - regardless of whether they really are >>> fasta? >>> >>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe >>> Hilgert, Ph.D. 
>>> Dolan DNA Learning Center >>> Cold Spring Harbor Laboratory >>> >>> C: (516) 857-1693 >>> V: (516) 367-5185 >>> E: hilgert at cshl.edu >>> F: (516) 367-5182 >>> W: http://www.dnalc.org >>> >>> -----Original Message----- >>> From: Chris Fields [mailto:cjfields at illinois.edu] >>> Sent: Wednesday, August 05, 2009 5:04 PM >>> To: Hilgert, Uwe >>> Cc: bioperl-l at lists.open-bio.org >>> Subject: Re: [Bioperl-l] Bio::SeqIO issue >>> >>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: >>> >>>> Is my impression correct that Bio::SeqIO just assumes that >>>> sequences are being submitted in FASTA format? >>> >>> No. See: >>> >>> http://www.bioperl.org/wiki/HOWTO:SeqIO >>> >>> SeqIO tries to guess at the format using the file extension, and if >>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's >>> possible that the extension is causing the problem, or that >>> GuessSeqFormat is guessing wrong (it's apt to do that, as it's forced >>> to guess). In any case, it's always advisable to explicitly >>> indicate the format when possible. >>> >>> Relevant lines: >>> >>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i; >>> ... >>> return 'raw' if /\.(txt)$/i; >>> >>>> In our experience, implementing >>>> Bio::SeqIO led to the first line of files being cut off, regardless >>>> of whether the files were indeed fasta files or files that only >>>> contained sequence. >>> >>> Files that only contain sequence are 'raw'. Ones in FASTA are >>> 'fasta'. >>> >>>> Which, in the latter, led to sequence submissions that had the >>>> first line of nucleotides removed. Has anyone tried to write a fix >>>> for this? >>> >>> This sounds like a bug, but we have very little to go on beyond your >>> description. What version of bioperl are you using, OS, etc? What >>> does your data look like? File extension?
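A minimal plain-Perl sketch of the symptom, assuming the fasta parser works the way Chris recalls in this thread: each record is read with local $/ = "\n>" and the first line of each record is taken as the header. The subroutine name parse_fasta_like is hypothetical, and this is a simplified imitation, not the actual Bio::SeqIO::fasta code; it needs no BioPerl.

```perl
use strict;
use warnings;

# Hypothetical imitation of fasta-style record splitting (not BioPerl's code):
# read one record per "\n>" separator and treat its first line as the header.
sub parse_fasta_like {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory string: $!";
    local $/ = "\n>";
    my @records;
    while ( my $chunk = <$fh> ) {
        chomp $chunk;        # removes a trailing "\n>" record separator, if any
        $chunk =~ s/^>//;    # the very first record still carries its '>'
        my ( $header, @seq_lines ) = split /\n/, $chunk;
        push @records, { header => $header, seq => join( '', @seq_lines ) };
    }
    close $fh;
    return @records;
}

my ($fasta) = parse_fasta_like(">seq1 test\nACGTACGT\nGGCC\n");
my ($raw)   = parse_fasta_like("ACGTACGT\nGGCC\n");

print "fasta: header='$fasta->{header}' seq='$fasta->{seq}'\n";
print "raw:   header='$raw->{header}' seq='$raw->{seq}'\n";
```

For the raw input, the first line of residues (ACGTACGT) is consumed as the "header" and only GGCC survives as sequence, which matches the symptom Uwe describes.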
>>> >>> chris >>> >>>> Thanks, >>>> >>>> Uwe >>>> >>>> >>>> >>>> >>>> >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - >>>> >>>> Uwe Hilgert, Ph.D. >>>> >>>> Dolan DNA Learning Center >>>> >>>> Cold Spring Harbor Laboratory >>>> >>>> >>>> >>>> V: (516) 367-5185 >>>> >>>> E: hilgert at cshl.edu >>>> >>>> F: (516) 367-5182 >>>> >>>> W: http://www.dnalc.org >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> Bioperl-l mailing list >>>> Bioperl-l at lists.open-bio.org >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>> >> >> > > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Sat Aug 8 08:38:46 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 8 Aug 2009 08:38:46 -0400 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <4A7C3970.10501@bms.com> References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com> Message-ID: <5E86C62B77684000A9AB1758BBCBA5F8@NewLife> Thanks Stefan--this makes a lot more sense to me than supposing a priori that a previous legitimate user of this list is spamming bioperl-l intentionally. I would prefer to initially give the benefit of the doubt to the intelligence of the users, rather than scare people off who are likely to be already mortified that their emails have been commandeered like this. I would definitely support a spam filter that works.
MAJ ----- Original Message ----- From: "Stefan Kirov" To: "Hilmar Lapp" Cc: "BioPerl List" Sent: Friday, August 07, 2009 10:25 AM Subject: Re: [Bioperl-l] Scour Friend Invite > Hilmar Lapp wrote: >> Just FYI, I am addressing this offline. Note to everyone: we don't >> tolerate this and it will get you removed from the list immediately >> (and banned for the second offense). This is a large list. You better >> spend the time and be very careful who you send this kind of stuff to >> before you waste everyone else's. >> >> -hilmar >> >> > It is quite possible this guy has no idea Scour is spamming people on > his behalf. It seems to me there should be a spam filter trained to take > care of these guys. > As a reference: > http://forums.digitalpoint.com/showthread.php?t=955786 > http://markmail.org/message/fzlutwd3mkforbsu > -------------------------------------------------------------------------------- From bosborne11 at verizon.net Sat Aug 8 10:18:59 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Sat, 08 Aug 2009 10:18:59 -0400 Subject: [Bioperl-l] SeqIO documentation Message-ID: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net> Chris, Since we've been discussing formats, I just wanted to mention that I've changed this documentation from SeqIO.pm: If no format is specified and a filename is given then the module will attempt to deduce the format from the filename suffix. If there is no suffix that Bioperl understands then it will attempt to guess the format based on file content. If this is unsuccessful then Fasta format is assumed.
To: If no format is specified and a filename is given then the module will attempt to deduce the format from the filename suffix. If there is no suffix that Bioperl understands then it will attempt to guess the format based on file content. If this is unsuccessful then SeqIO will throw a fatal error. The code is clear: if SeqIO can't figure out what the format is, it dies; "fasta" is not the default format. Brian O. From cjfields at illinois.edu Sat Aug 8 12:23:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:23:44 -0500 Subject: [Bioperl-l] SeqIO documentation In-Reply-To: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net> References: <7E3EFF1F-DF0C-490F-AF8E-F23F81A0E0D5@verizon.net> Message-ID: Brian, That fits current behavior, so yes, that makes sense. chris On Aug 8, 2009, at 9:18 AM, Brian Osborne wrote: > Chris, > > Since we've been discussing formats, I just wanted to mention that > I've changed this documentation from SeqIO.pm: > > If no format is specified and a filename is given then the module > will attempt to deduce the format from the filename suffix. If there > is no suffix that Bioperl understands then it will attempt to guess > the format based on file content. If this is unsuccessful then Fasta > format is assumed. > > To: > > If no format is specified and a filename is given then the module > will attempt to deduce the format from the filename suffix. If there > is no suffix that Bioperl understands then it will attempt to guess > the format based on file content. If this is unsuccessful then SeqIO > will throw a fatal error. > > The code is clear: if SeqIO can't figure out what the format is, > it dies; "fasta" is not the default format. > > > Brian O.
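The lookup order Brian documents (filename suffix first, then file content, then a fatal error rather than a fasta default) can be sketched in plain Perl. The two suffix regexes are the ones Chris quotes from the BioPerl source in this thread. The content check, however, is a deliberately crude, hypothetical stand-in for Bio::Tools::GuessSeqFormat, and guess_format is an illustrative name, not BioPerl API.

```perl
use strict;
use warnings;

# Hypothetical sketch of the documented lookup order:
# 1) filename suffix, 2) file content, 3) fatal error (no fasta default).
sub guess_format {
    my ( $filename, $content ) = @_;

    # Step 1: suffix rules, as quoted from the SeqIO source in this thread
    return 'fasta' if $filename =~ /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i;
    return 'raw'   if $filename =~ /\.(txt)$/i;

    # Step 2: toy content check (NOT GuessSeqFormat's real rules):
    # look only at the first non-blank line
    my ($first) = grep { /\S/ } split /\n/, $content;
    if ( defined $first ) {
        return 'fasta' if $first =~ /^>/;
        return 'raw'   if $first =~ /^[A-Za-z*\-\s]+$/;
    }

    # Step 3: give up loudly instead of assuming fasta
    die "Could not determine sequence format for '$filename'\n";
}

print guess_format( 'seqs.fa', "ACGT\n" ), "\n";        # suffix wins
print guess_format( 'upload',  ">id\nACGT\n" ), "\n";
print guess_format( 'upload',  "ACGT\nGGCC\n" ), "\n";

my $ok = eval { guess_format( 'upload', "12345\n" ); 1 };
print $ok ? "guessed\n" : "error: $@";
```

With no usable suffix, "ACGT\nGGCC\n" is classified as 'raw', while content that matches neither rule, such as "12345\n", dies instead of silently falling back to fasta.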
> > From cjfields at illinois.edu Sat Aug 8 12:24:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:24:48 -0500 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <5E86C62B77684000A9AB1758BBCBA5F8@NewLife> References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com> <5E86C62B77684000A9AB1758BBCBA5F8@NewLife> Message-ID: <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu> I believe there are spam filters in place (Jason and Chris D. could probably say more about this). chris On Aug 8, 2009, at 7:38 AM, Mark A. Jensen wrote: > Thanks Stefan--this makes a lot more sense to me than supposing > a priori that a previous legitimate user of this list is spamming > bioperl-l > intentionally. I would prefer to initially give the benefit of the > doubt > to the intelligence of the users, rather than scare people off who are > likely to be already mortified that their emails have been > commandeered > like this. I would definitely support a spam filter that works. > MAJ > ----- Original Message ----- From: "Stefan Kirov" > > To: "Hilmar Lapp" > Cc: "BioPerl List" > Sent: Friday, August 07, 2009 10:25 AM > Subject: Re: [Bioperl-l] Scour Friend Invite > > >> Hilmar Lapp wrote: >>> Just FYI, I am addressing this offline. Note to everyone: we don't >>> tolerate this and it will get you removed from the list immediately >>> (and banned for the second offense). This is a large list. You >>> better >>> spend the time and be very careful who you send this kind of stuff >>> to >>> before you waste everyone else's.
>>> >>> -hilmar >>> >>> >> It is quite possible this guy has no idea Scour is spamming people on >> his behalf. It seems to me there should be a spam filter trained to >> take >> care of these guys. >> As a reference: >> http://forums.digitalpoint.com/showthread.php?t=955786 >> http://markmail.org/message/fzlutwd3mkforbsu >> > > > -------------------------------------------------------------------------------- > > From cjfields at illinois.edu Sat Aug 8 12:26:55 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:26:55 -0500 Subject: [Bioperl-l] Trouble with Clustalw In-Reply-To: <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com> References: <0F868099-4B55-4B01-B409-5FE4CFB1F490@gmail.com> <42F6668E-0C00-493A-934A-EC7501160A87@illinois.edu> <320fb6e00908070219r575dc01djadb346e0afb0194d@mail.gmail.com> Message-ID: <0A43205F-828F-4CC9-ADC3-EBCE92690765@illinois.edu> On Aug 7, 2009, at 4:19 AM, Peter wrote: > On Thu, Aug 6, 2009 at 9:25 PM, Chris Fields > wrote: >> Michael, >> >> Are you using ClustalW 2? I'm not sure, but I don't think the >> wrapper has >> been updated for the latest version (I think parsing still works, >> though). >> >> chris > > That shouldn't matter; according to Des Higgins, ClustalW 2 is intended > to be completely compatible with ClustalW 1.83, including the command > line options. They will be adding new stuff in ClustalW 3.
The only > thing to worry about with ClustalW 2 is parsing the output, as the > header line of the alignments has changed very slightly. > > I can tell you from personal experience that the Biopython command > line wrappers for ClustalW work fine on both 1.83 and 2.0.10, for > example, and would expect the same to be true for BioPerl. > > Peter I would think so as well, but I encountered some issues on my OS using ClustalW 2 with the last release: http://bugzilla.open-bio.org/show_bug.cgi?id=2728 I think it's something small, like something hard-coded in (version maybe) that's causing the problem; I just didn't have time to check. chris From cjfields at illinois.edu Sat Aug 8 12:26:38 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 8 Aug 2009 11:26:38 -0500 Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10 In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> Message-ID: <0963ED84-359B-465B-9BA2-956A0AB23587@illinois.edu> Have you tried installing SOAP::Lite directly? That seems to be the hanging point. The funny thing is that this is somehow assigning everything as a requirement (SOAP::Lite is a 'recommends'). Worth investigating, but I don't have access to a Windows box (either for XP, Vista, or Win7). Hopefully we'll get a PPM up soon; it's in the roadmap for 1.6.1. In the meantime (as a strictly temporary measure), have you tried setting PERL5LIB to point to a local copy of bioperl-1.6? chris On Aug 3, 2009, at 6:18 PM, Johnathan Dalzell wrote: > Hi, I've been trying to install Bioperl 1.6.0 onto strawberry perl > 5.10 and the activePerl equivalent. I'm working through Vista, and > over multiple times, this is the furthest I can get through > installation.... > > > Install [a]ll Bioperl scripts, [n]one, or choose groups > [i]nteractively?
[a] a > - will install all scripts > Do you want to run tests that require connection to servers across > the internet > (likely to cause some failures)? y/n [n] y > - will run internet-requiring tests > Encountered CODE ref, using dummy placeholder at C:/strawberry/perl/ > lib/Data/Dumper.pm lin > e 190, line 9. > Creating new 'Build' script for 'BioPerl' version '1.006000' > ---- Unsatisfied dependencies detected during ---- > ---- CJFIELDS/BioPerl-1.6.0.tar.gz ---- > SOAP::Lite [requires] > GraphViz [requires] > Convert::Binary::C [requires] > Algorithm::Munkres [requires] > XML::Twig [requires] > DB_File [requires] > Set::Scalar [requires] > XML::Parser::PerlSAX [requires] > XML::Writer [requires] > XML::SAX::Writer [requires] > Clone [requires] > XML::DOM::XPath [requires] > PostScript::TextBlock [requires] > Running Build test > Delayed until after prerequisites > Running Build install > Delayed until after prerequisites > Running install for module 'SOAP::Lite' > Running make for M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz > Checksum for C:\strawberry\cpan\sources\authors\id\M\MK\MKUTTER\SOAP- > Lite-0.710.08.tar.gz > ok > CPAN.pm: Going to build M/MK/MKUTTER/SOAP-Lite-0.710.08.tar.gz > We are about to install SOAP::Lite and for your convenience will > provide > you with list of modules and prerequisites, so you'll be able to > choose > only modules you need for your configuration. > XMLRPC::Lite, UDDI::Lite, and XML::Parser::Lite are included by > default. > Installed transports can be used for both SOAP::Lite and XMLRPC::Lite. > Press to see the detailed list. > Feature Prerequisites Install? 
> ----------------------------- ---------------------------- -------- > Core Package [*] Scalar::Util always > [*] Test::More > [*] URI > [*] MIME::Base64 > [*] version > [*] XML::Parser (v2.23) > Client HTTP support [*] LWP::UserAgent always > Client HTTPS support [ ] Crypt::SSLeay [ no ] > Client SMTP/sendmail support [ ] MIME::Lite [ no ] > Client FTP support [*] IO::File [ yes ] > [*] Net::FTP > Standalone HTTP server [*] HTTP::Daemon [ yes ] > Apache/mod_perl server [ ] Apache [ no ] > FastCGI server [ ] FCGI [ no ] > POP3 server [ ] MIME::Parser [ no ] > [*] Net::POP3 > IO server [*] IO::File [ yes ] > MQ transport support [ ] MQSeries [ no ] > JABBER transport support [ ] Net::Jabber [ no ] > MIME messages [ ] MIME::Parser [ no ] > DIME messages [*] IO::Scalar (v2.105) [ no ] > [ ] DIME::Tools (v0.03) > [ ] Data::UUID (v0.11) > SSL Support for TCP Transport [ ] IO::Socket::SSL [ no ] > Compression support for HTTP [*] Compress::Zlib [ yes ] > MIME interoperability w/ Axis [ ] MIME::Parser (v6.106) [ no ] > --- An asterix '[*]' indicates if the module is currently installed. > Do you want to proceed with this configuration? [yes] yes > Checking if your kit is complete... 
> Looks good > Writing Makefile for SOAP::Lite > cp lib/SOAP/Client.pod blib\lib\SOAP\Client.pod > cp lib/UDDI/Lite.pm blib\lib\UDDI\Lite.pm > cp lib/SOAP/Packager.pm blib\lib\SOAP\Packager.pm > cp lib/XML/Parser/Lite.pm blib\lib\XML\Parser\Lite.pm > cp lib/SOAP/Transport/LOOPBACK.pm blib\lib\SOAP\Transport\LOOPBACK.pm > cp lib/XMLRPC/Transport/TCP.pm blib\lib\XMLRPC\Transport\TCP.pm > cp lib/SOAP/Transport/JABBER.pm blib\lib\SOAP\Transport\JABBER.pm > cp lib/OldDocs/SOAP/Transport/TCP.pm blib\lib\OldDocs\SOAP\Transport > \TCP.pm > cp lib/SOAP/Transport/MAILTO.pm blib\lib\SOAP\Transport\MAILTO.pm > cp lib/OldDocs/SOAP/Transport/POP3.pm blib\lib\OldDocs\SOAP\Transport > \POP3.pm > cp lib/Apache/SOAP.pm blib\lib\Apache\SOAP.pm > cp lib/SOAP/Schema.pod blib\lib\SOAP\Schema.pod > cp lib/SOAP/Test.pm blib\lib\SOAP\Test.pm > cp lib/Apache/XMLRPC/Lite.pm blib\lib\Apache\XMLRPC\Lite.pm > cp lib/XMLRPC/Transport/HTTP.pm blib\lib\XMLRPC\Transport\HTTP.pm > cp lib/SOAP/Transport/MQ.pm blib\lib\SOAP\Transport\MQ.pm > cp lib/SOAP/Transport/POP3.pm blib\lib\SOAP\Transport\POP3.pm > cp lib/SOAP/Deserializer.pod blib\lib\SOAP\Deserializer.pod > cp lib/SOAP/Data.pod blib\lib\SOAP\Data.pod > cp lib/SOAP/Server.pod blib\lib\SOAP\Server.pod > cp lib/SOAP/Transport/IO.pm blib\lib\SOAP\Transport\IO.pm > cp lib/SOAP/Lite/Utils.pm blib\lib\SOAP\Lite\Utils.pm > cp lib/SOAP/Header.pod blib\lib\SOAP\Header.pod > cp lib/SOAP/Constants.pm blib\lib\SOAP\Constants.pm > cp lib/SOAP/Lite/Packager.pm blib\lib\SOAP\Lite\Packager.pm > cp lib/SOAP/SOM.pod blib\lib\SOAP\SOM.pod > cp lib/XMLRPC/Transport/POP3.pm blib\lib\XMLRPC\Transport\POP3.pm > cp lib/SOAP/Lite/Deserializer/XMLSchema1999.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchema19 > 99.pm > cp lib/XMLRPC/Lite.pm blib\lib\XMLRPC\Lite.pm > cp lib/OldDocs/SOAP/Lite.pm blib\lib\OldDocs\SOAP\Lite.pm > cp lib/SOAP/Transport.pod blib\lib\SOAP\Transport.pod > cp lib/OldDocs/SOAP/Transport/HTTP.pm blib\lib\OldDocs\SOAP\Transport > \HTTP.pm > cp 
lib/SOAP/Lite/Deserializer/XMLSchema2001.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchema20 > 01.pm > cp lib/SOAP/Trace.pod blib\lib\SOAP\Trace.pod > cp lib/IO/SessionData.pm blib\lib\IO\SessionData.pm > cp lib/XMLRPC/Test.pm blib\lib\XMLRPC\Test.pm > cp lib/OldDocs/SOAP/Transport/MQ.pm blib\lib\OldDocs\SOAP\Transport > \MQ.pm > cp lib/OldDocs/SOAP/Transport/FTP.pm blib\lib\OldDocs\SOAP\Transport > \FTP.pm > cp lib/OldDocs/SOAP/Transport/JABBER.pm blib\lib\OldDocs\SOAP > \Transport\JABBER.pm > cp lib/SOAP/Transport/TCP.pm blib\lib\SOAP\Transport\TCP.pm > cp lib/SOAP/Utils.pod blib\lib\SOAP\Utils.pod > cp lib/IO/SessionSet.pm blib\lib\IO\SessionSet.pm > cp lib/SOAP/Transport/HTTP.pm blib\lib\SOAP\Transport\HTTP.pm > cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchem > aSOAP1_2.pm > cp lib/OldDocs/SOAP/Transport/IO.pm blib\lib\OldDocs\SOAP\Transport > \IO.pm > cp lib/SOAP/Serializer.pod blib\lib\SOAP\Serializer.pod > cp lib/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.pm blib\lib\SOAP\Lite > \Deserializer\XMLSchem > aSOAP1_1.pm > cp lib/OldDocs/SOAP/Transport/LOCAL.pm blib\lib\OldDocs\SOAP > \Transport\LOCAL.pm > cp lib/SOAP/Transport/LOCAL.pm blib\lib\SOAP\Transport\LOCAL.pm > cp lib/SOAP/Fault.pod blib\lib\SOAP\Fault.pod > cp lib/SOAP/Lite.pm blib\lib\SOAP\Lite.pm > cp lib/OldDocs/SOAP/Transport/MAILTO.pm blib\lib\OldDocs\SOAP > \Transport\MAILTO.pm > cp lib/SOAP/Transport/FTP.pm blib\lib\SOAP\Transport\FTP.pm > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > SOAPsh.pl blib\script\S > OAPsh.pl > pl2bat.bat blib\script\SOAPsh.pl > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > stubmaker.pl blib\scrip > t\stubmaker.pl > pl2bat.bat blib\script\stubmaker.pl > C:\strawberry\perl\bin\perl.exe -MExtUtils::Command -e "cp" -- bin/ > XMLRPCsh.pl blib\script > \XMLRPCsh.pl > pl2bat.bat blib\script\XMLRPCsh.pl > MKUTTER/SOAP-Lite-0.710.08.tar.gz > C:\strawberry\c\bin\dmake.EXE -- OK > Running 
make test > C:\strawberry\perl\bin\perl.exe "-MExtUtils::Command::MM" "-e" > "test_harness(0, 'blib\lib' > , 'blib\arch')" t/01-core.t t/010-serializer.t t/012-cloneable.t t/ > 013-array-deserializati > on.t t/014_UNIVERSAL_use.t t/015_UNIVERSAL_can.t t/02-payload.t t/03- > server.t t/04-attach. > t t/05-customxml.t t/06-modules.t t/07-xmlrpc_payload.t t/08- > schema.t t/096_characters.t t > /097_kwalitee.t t/098_pod.t t/099_pod_coverage.t t/IO/SessionData.t > t/IO/SessionSet.t t/SO > AP/Data.t t/SOAP/Serializer.t t/SOAP/Lite/Packager.t t/SOAP/Lite/ > Deserializer/XMLSchema199 > 9.t t/SOAP/Lite/Deserializer/XMLSchema2001.t t/SOAP/Lite/ > Deserializer/XMLSchemaSOAP1_1.t t > /SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t t/SOAP/Schema/WSDL.t t/ > SOAP/Transport/FTP.t t/S > OAP/Transport/HTTP.t t/SOAP/Transport/IO.t t/SOAP/Transport/LOCAL.t > t/SOAP/Transport/MAILT > O.t t/SOAP/Transport/MQ.t t/SOAP/Transport/POP3.t t/SOAP/Transport/ > HTTP/CGI.t t/XML/Parser > /Lite.t t/XMLRPC/Lite.t > t/01-core.t .................................. ok > t/010-serializer.t ........................... ok > t/012-cloneable.t ............................ ok > t/013-array-deserialization.t ................ ok > t/014_UNIVERSAL_use.t ........................ ok > t/015_UNIVERSAL_can.t ........................ ok > t/02-payload.t ............................... ok > t/03-server.t ................................ ok > t/04-attach.t ................................ skipped: Could not > find MIME::Parser - is M > IME::Tools installed? Aborting. > t/05-customxml.t ............................. ok > t/06-modules.t ............................... ok > t/07-xmlrpc_payload.t ........................ ok > t/08-schema.t ................................ ok > t/096_characters.t ........................... skipped: (no reason > given) > t/097_kwalitee.t ............................. skipped: (no reason > given) > t/098_pod.t .................................. 
skipped: (no reason > given) > t/099_pod_coverage.t ......................... skipped: (no reason > given) > t/IO/SessionData.t ........................... ok > t/IO/SessionSet.t ............................ ok > t/SOAP/Data.t ................................ ok > t/SOAP/Lite/Deserializer/XMLSchema1999.t ..... ok > t/SOAP/Lite/Deserializer/XMLSchema2001.t ..... ok > t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_1.t .. ok > t/SOAP/Lite/Deserializer/XMLSchemaSOAP1_2.t .. ok > t/SOAP/Lite/Packager.t ....................... ok > t/SOAP/Schema/WSDL.t ......................... ok > t/SOAP/Serializer.t .......................... 1/12 Use of > uninitialized value $values[0] > in join or string at C:\strawberry\cpan\build\SOAP-Lite-0.710.08- > wfOzhM\blib\lib/SOAP/Lite > .pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > Use of uninitialized value $values[0] in join or string at C: > \strawberry\cpan\build\SOAP-L > ite-0.710.08-wfOzhM\blib\lib/SOAP/Lite.pm line 1376. > t/SOAP/Serializer.t .......................... ok > t/SOAP/Transport/FTP.t ....................... 1/7 Use of > uninitialized value in split at > C:\strawberry\cpan\build\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/ > Transport/FTP.pm line 55. > substr outside of string at C:\strawberry\cpan\build\SOAP- > Lite-0.710.08-wfOzhM\blib\lib/SO > AP/Transport/FTP.pm line 56. > Use of uninitialized value $_[1] in join or string at C:/STRAWB~1/ > perl/lib/IO/Socket/INET. > pm line 117. > Use of uninitialized value $server in concatenation (.) or string at > C:\strawberry\cpan\bu > ild\SOAP-Lite-0.710.08-wfOzhM\blib\lib/SOAP/Transport/FTP.pm line 60. > t/SOAP/Transport/FTP.t ....................... ok > t/SOAP/Transport/HTTP.t ...................... 
ok > t/SOAP/Transport/HTTP/CGI.t .................. > > every time I get to the CGI.t at the end here the installation won't > move! Any suggestions would be greatly appreciated, I've been > trying to force it through, literally for 5 hours now.... > > cheers, > jonny From bosborne11 at verizon.net Sat Aug 8 12:42:12 2009 From: bosborne11 at verizon.net (Brian Osborne) Date: Sat, 08 Aug 2009 12:42:12 -0400 Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10 In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> Message-ID: <979637B9-F2EC-47A0-9283-440AA2558481@verizon.net> Jonathan, It looks like you're not the only one having problems with SOAP::Lite on Windows. For a possible workaround: http://objectmix.com/perl/638075-how-install-soap-lite-windows.html Brian O. On Aug 3, 2009, at 7:18 PM, Johnathan Dalzell wrote: > SOAP/Transport/HTTP/CGI From stefan.kirov at bms.com Sat Aug 8 16:45:32 2009 From: stefan.kirov at bms.com (Kirov, Stefan) Date: Sat, 8 Aug 2009 16:45:32 -0400 Subject: [Bioperl-l] Scour Friend Invite In-Reply-To: <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu> References: <4a7c1a0e5b82d@gmail.com><8596CFE6-DFDA-441D-AC23-FA1322E84F7A@gmx.net> <4A7C3970.10501@bms.com> <5E86C62B77684000A9AB1758BBCBA5F8@NewLife>, <0322EF1B-260D-4210-91EC-492D4E16D5AF@illinois.edu> Message-ID: There is indeed; actually, my mail with the same header was held for a while. In any case, I think these pay-to-search/invite-colleagues/et spam-whole-address-book sites should be banned even if they are not formally spam, since the user is at least partially aware of the effect. I am not sure this is a good solution; I am just frustrated, because these companies are quite unethical.
Maybe not as unethical as others (a few come to mind, but I will not name them :-)), but still... On the other hand, they have not been a real problem before. As long as this is not a frequent thing, I guess the filter is doing a great job. Stefan ________________________________________ From: Chris Fields [cjfields at illinois.edu] Sent: Saturday, August 08, 2009 12:24 PM To: Mark A. Jensen Cc: Kirov, Stefan; Hilmar Lapp; BioPerl List Subject: Re: [Bioperl-l] Scour Friend Invite I believe there are spam filters in place (Jason and Chris D. could probably say more about this). chris On Aug 8, 2009, at 7:38 AM, Mark A. Jensen wrote: > Thanks Stefan--this makes a lot more sense to me than supposing > a priori that a previous legitimate user of this list is spamming > bioperl-l > intentionally. I would prefer to initially give the benefit of the > doubt > to the intelligence of the users, rather than scare people off who are > likely to be already mortified that their emails have been > commandeered > like this. I would definitely support a spam filter that works. > MAJ > ----- Original Message ----- From: "Stefan Kirov" > > To: "Hilmar Lapp" > Cc: "BioPerl List" > Sent: Friday, August 07, 2009 10:25 AM > Subject: Re: [Bioperl-l] Scour Friend Invite This message (including any attachments) may contain confidential, proprietary, privileged and/or private information. The information is intended to be for the use of the individual or entity designated above. If you are not the intended recipient of this message, please notify the sender immediately, and delete the message and any attachments. Any disclosure, reproduction, distribution or other use of this message or any attachments by an individual or entity other than the intended recipient is prohibited.
From j_martin at lbl.gov Sat Aug 8 22:41:53 2009 From: j_martin at lbl.gov (Joel Martin) Date: Sat, 8 Aug 2009 19:41:53 -0700 Subject: [Bioperl-l] Bio::SeqIO issue In-Reply-To: References: <8294FAED-7DAA-4019-91DA-4536DF84A2F7@illinois.edu> <5E47BFF0-2253-42CE-895B-4E338CC400D8@illinois.edu> <25C707BB-FE3D-49E0-9AF7-2CC0D3A7124A@gmx.net> <00A087B7-784E-4FC6-9F9C-51EBFB6FC082@gmx.net> <640CE06B3C0A11429127B2550390A10B835B7C@mailbox09.cshl.edu> Message-ID: <20090809024152.GA26943@eniac.jgi-psf.org> Hello, It sounds like you want a layer to figure out what they're giving your program before you open it. You could use Bio::Tools::GuessSeqFormat and spare your user the pain of knowledge. It seems reasonable that coddling happens only when requested.

use strict;
use warnings;
use IO::String;
use Bio::SeqIO;
use Bio::Tools::GuessSeqFormat;

my @files = ( 'NC_000913.fasta', '.gb' );

for my $file ( @files ) {
    # collect each file's sequences as 'raw' text in an in-memory string
    my $string = '';
    my $strio  = IO::String->new( $string );
    my $out    = Bio::SeqIO->new( -fh => $strio, -format => 'raw' );

    # let GuessSeqFormat sniff the file, then parse it with the guessed format
    my $guesser = Bio::Tools::GuessSeqFormat->new( -file => $file );
    my $in      = Bio::SeqIO->new( -format => $guesser->guess, -file => $file );

    while ( my $seq = $in->next_seq() ) {
        $out->write_seq( $seq );
        print substr( $string, 0, 30 ), "\n";
    }
}

Joel On Thu, Aug 06, 2009 at 03:36:36PM -0400, Hilgert, Uwe wrote: > Hmmm, I fail to see how supplying raw sequence could be called "bad" > input or a "problem". In our case, for example, not every user is a > bioinformatics expert and Cornel was suggesting to account for that > instead of trying to "train" the user to adhere to requirements that > have not much to do with what s/he tries to accomplish. I don't really > see data being modified, rather that the data format is being adapted to > the needs of the software, which I would argue is something the > software should be able to take care of.
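Joel's snippet leaves the decision to Bio::Tools::GuessSeqFormat. For the narrower FASTA-versus-raw question in this thread, a cheap pre-check on the first non-blank line may already be enough. The sketch below uses core Perl only; `sniff_fasta_or_raw` is a hypothetical helper, not BioPerl API, and it is not how GuessSeqFormat works internally. It splits on `\r?\n` so files saved on Windows are treated the same as Unix files.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): inspect the first non-blank
# line and return 'fasta' (header starts with '>'), 'raw' (residue letters
# only), or undef when it looks like neither, so a real guesser can decide.
sub sniff_fasta_or_raw {
    my ($text) = @_;
    for my $line ( split /\r?\n/, $text ) {
        next if $line =~ /^\s*$/;                      # skip leading blank lines
        return 'fasta' if $line =~ /^>/;               # FASTA header line
        return 'raw'   if $line =~ /^[A-Za-z*\-\s]+$/; # bare residues
        return undef;                                  # some other format
    }
    return undef;                                      # empty input
}
```

The result could be passed as `-format` to `Bio::SeqIO->new`, falling back to GuessSeqFormat when it returns undef. It is only a heuristic: any first line made purely of letters is reported as 'raw'.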
> > Uwe > > > -----Original Message----- > From: Chris Fields [mailto:cjfields at illinois.edu] > Sent: Thursday, August 06, 2009 12:50 PM > To: Ghiban, Cornel > Cc: Hilmar Lapp; Hilgert, Uwe; BioPerl List > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > Cornel, > > I'm failing to see how adding '>' would solve the problem. > > This is a simple validation issue: should we throw an exception on bad > input (no '>'), or just argue GIGO based on user error (the assumption > that the SeqIO parser will read raw sequence correctly when set to > 'fasta' is wrong)? > > I think, in this circumstance, the former applies. It is easy to add, > and the use of an exception in this case is violently user-friendly, > i.e. it will stop cold and immediately point out the problem. > Otherwise data is (silently) being modified, which is always a bad > thing. > > chris > > On Aug 6, 2009, at 11:04 AM, Ghiban, Cornel wrote: > > > Hi, > > > > It doesn't matter what sequence we use. As Chris Fields showed in > > his test, not having > > ">" as the 1st character on the first line is the problem. > > We always assumed the sequence is in FASTA format and this seems to > > be wrong. > > > > I think the solution to our problem is to check whether the ">" > > symbol is present or not. > > If not present then it will be added. > > > > Thank you, > > Cornel Ghiban > > > > -----Original Message----- > > From: Hilmar Lapp [mailto:hlapp at gmx.net] > > Sent: Thursday, August 06, 2009 11:18 AM > > To: Hilgert, Uwe > > Cc: Chris Fields; BioPerl List; Ghiban, Cornel > > Subject: Re: [Bioperl-l] Bio::SeqIO issue > > > > Uwe - could you send an actual data file (as an attachment) that > > reproduces the problem, or is that not possible? > > > > -hilmar > > > > On Aug 6, 2009, at 11:01 AM, Hilgert, Uwe wrote: > > > >> I'm not sure what version we have.
Cornel may have installed it a > >> while ago from CVS: > >> > >> Module id = Bio::Root::Build > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Build.pm > >> INST_VERSION 1.006900 > >> cpan> m Bio::Root::Version > >> Module id = Bio::Root::Version > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Version.pm > >> INST_VERSION 1.006900 > >> cpan> m Bio::SeqIO > >> Module id = Bio::SeqIO > >> CPAN_USERID CJFIELDS (Christopher Fields ) > >> CPAN_VERSION 1.006000 > >> INST_FILE /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO.pm > >> INST_VERSION undef > >> > >> Cornel still has the checked-out "bioperl-live" directory and the > >> last > >> changes are from March this year. > >> > >> As for why he used 'Fasta' instead of 'fasta' as the format parameter > >> in Bio::SeqIO, it's because that's what it says in the module's manual. > >> He has now tried 'fasta' instead and sees no change in behavior. Omitting > >> the format parameter altogether, fasta-formatted sequence continues > >> to > >> be treated correctly, the first line being removed. However, raw > >> sequence is being treated differently in that the first line is not > >> being removed any more. Instead, the program returns the first line > >> only. Which, in the example I am going to forward in my next message, > >> will return 60 amino acids out of raw sequence of 300 aa. Can't win > >> with raw sequence... > >> > >> > >> The files may be created on different platforms; we didn't notice any > >> difference between using files created on Windows or Linux.
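The two fixes proposed above, Chris's (refuse input whose first line is not a '>' header) and Cornel's (add a header when it is missing), each fit in a few lines of core Perl. This is a sketch under stated assumptions: the sub names and the placeholder id 'unnamed' are hypothetical, not BioPerl API, and the check is meant to run on the file contents before they reach Bio::SeqIO.

```perl
use strict;
use warnings;

# Strict option (Chris's suggestion): die when the first non-blank line is
# not a FASTA header, instead of silently treating sequence as a header.
sub assert_fasta {
    my ($text) = @_;
    my ($first) = grep { /\S/ } split /\r?\n/, $text;
    die "Not FASTA: first non-blank line does not start with '>'\n"
        unless defined $first && $first =~ /^>/;
    return 1;
}

# Lenient option (Cornel's suggestion): wrap bare sequence in a FASTA
# record. 'unnamed' is a hypothetical placeholder id, not a BioPerl default.
sub ensure_fasta {
    my ( $text, $id ) = @_;
    $id = 'unnamed' unless defined $id;
    my ($first) = grep { /\S/ } split /\r?\n/, $text;
    return $text if defined $first && $first =~ /^>/;
    ( my $body = $text ) =~ s/\A\s+//;    # drop leading blank lines
    return ">$id\n$body";
}
```

The strict version fails loudly and leaves the data untouched; the lenient one keeps users moving but invents an identifier, which is the "silent modification" Chris objects to.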
> >> > >> Thanks > >> Uwe > >> > >> > >> > >> > >> -----Original Message----- > >> From: Hilmar Lapp [mailto:hlapp at gmx.net] > >> Sent: Wednesday, August 05, 2009 6:54 PM > >> To: Chris Fields > >> Cc: Hilgert, Uwe; BioPerl List > >> Subject: Re: [Bioperl-l] Bio::SeqIO issue > >> > >> I don't think that can be the problem. If anything, providing the > >> format ought to be better in terms of result than not providing it? > >> > >> Uwe - I'd like you to go back to Chris' initial questions that you > >> haven't answered yet: "What version of bioperl are you using, OS, > >> etc? > >> What does your data look like?" I'd add to that, can you show us your > >> full script, or a smaller code snippet that reproduces the problem. > >> > >> I suspect that either something in your script is swallowing the > >> line, > >> or that the line endings in your data file are from a different OS > >> than the one you're running the script on. (Or that you are running a > >> very old version of BioPerl, which is entirely possible if you > >> installed through CPAN.) > >> > >> -hilmar > >> > >> On Aug 5, 2009, at 5:37 PM, Chris Fields wrote: > >> > >>> Uwe, > >>> > >>> Please keep replies on the list. > >>> > >>> It's very possible that's the issue; IIRC the fasta parser pulls out > >>> the full sequence in chunks (based on local $/ = "\n>") and splits > >>> the header off as the first line in that chunk. You could probably > >>> try leaving the format out and letting SeqIO guess it, or passing > >>> the > >>> file into Bio::Tools::GuessSeqFormat directly, but it's probably > >>> better to go through the files and add a file extension that > >>> corresponds to the format. > >>> > >>> chris > >>> > >>> On Aug 5, 2009, at 4:23 PM, Hilgert, Uwe wrote: > >>> > >>>> Thanks, Chris. 
The files have no extension, but we indicate what > >>>> format to use, like in the manual: > >>>> > >>>> $in = Bio::SeqIO->new(-file => "file_path", -format => 'Fasta'); > >>>> > >>>> I wonder now whether this could be exactly what causes the problem: as we > >>>> are > >>>> telling it that input files are in fasta format, they are being treated > >>>> as such (=remove first line) - regardless of whether they really > >>>> are > >>>> fasta? > >>>> > >>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Uwe > >>>> Hilgert, Ph.D. > >>>> Dolan DNA Learning Center > >>>> Cold Spring Harbor Laboratory > >>>> > >>>> C: (516) 857-1693 > >>>> V: (516) 367-5185 > >>>> E: hilgert at cshl.edu > >>>> F: (516) 367-5182 > >>>> W: http://www.dnalc.org > >>>> > >>>> -----Original Message----- > >>>> From: Chris Fields [mailto:cjfields at illinois.edu] > >>>> Sent: Wednesday, August 05, 2009 5:04 PM > >>>> To: Hilgert, Uwe > >>>> Cc: bioperl-l at lists.open-bio.org > >>>> Subject: Re: [Bioperl-l] Bio::SeqIO issue > >>>> > >>>> On Aug 5, 2009, at 3:27 PM, Hilgert, Uwe wrote: > >>>> > >>>>> Is my impression correct that Bio::SeqIO just assumes that > >>>>> sequences are being submitted in FASTA format? > >>>> > >>>> No. See: > >>>> > >>>> http://www.bioperl.org/wiki/HOWTO:SeqIO > >>>> > >>>> SeqIO tries to guess at the format using the file extension, and if > >>>> one isn't present makes use of Bio::Tools::GuessSeqFormat. It's > >>>> possible that the extension is causing the problem, or that > >>>> GuessSeqFormat is guessing wrong (it's apt to do that, as it's forced > >>>> to guess). In any case, it's always advisable to explicitly > >>>> indicate the format when possible. > >>>> > >>>> Relevant lines: > >>>> > >>>> return 'fasta' if /\.(fasta|fast|fas|seq|fa|fsa|nt|aa|fna|faa)$/i; > >>>> ...
> >>>> return 'raw' if /\.(txt)$/i; > >>>> > >>>>> In our experience, implementing > >>>>> Bio::SeqIO led to the first line of files being cut off, > >>>>> regardless > >>>>> of whether the files were indeed fasta files or files that only > >>>>> contained sequence. > >>>> > >>>> Files that only contain sequence are 'raw'. Ones in FASTA are > >>>> 'fasta'. > >>>> > >>>>> Which, in the latter, led to sequence submissions that had the > >>>>> first line of nucleotides removed. Has anyone tried to write a fix > >>>>> for this? > >>>> > >>>> This sounds like a bug, but we have very little to go on beyond > >>>> your > >>>> description. What version of bioperl are you using, OS, etc? What > >>>> does your data look like? File extension? > >>>> > >>>> chris > >>>> > >>>>> Thanks, > >>>>> > >>>>> Uwe > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > >>>>> > >>>>> Uwe Hilgert, Ph.D. > >>>>> > >>>>> Dolan DNA Learning Center > >>>>> > >>>>> Cold Spring Harbor Laboratory > >>>>> > >>>>> > >>>>> > >>>>> V: (516) 367-5185 > >>>>> > >>>>> E: hilgert at cshl.edu > >>>>> > >>>>> F: (516) 367-5182 > >>>>> > >>>>> W: http://www.dnalc.org > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> _______________________________________________ > >>>>> Bioperl-l mailing list > >>>>> Bioperl-l at lists.open-bio.org > >>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >>>> > >>> > >>> _______________________________________________ > >>> Bioperl-l mailing list > >>> Bioperl-l at lists.open-bio.org > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> -- > >> =========================================================== > >> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > >> =========================================================== > >> > >> > > > > -- > > =========================================================== > > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > > 
=========================================================== > > > > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bix at sendu.me.uk Sun Aug 9 06:38:30 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 11:38:30 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> Message-ID: <4A7EA726.60303@sendu.me.uk> bix at sendu.me.uk wrote: >> The three critical issues (as I've pointed out before) are: >> >> 1) Getting CPANPLUS installation working, which may be just META.yml, >> or it may be shell-related. I would like it for CPAN Testers, if for >> nothing else. That's at least 2 bug reports, maybe more. >> 2) Bio::Root::Build converted towards a Module::Build-compliant API, >> or we'll need to convert run/db/network to Module::Build. 1 bug report. >> 3) Avoid potential infinite looping. This may be Gbrowse-related via >> the net install script, but if Build.PL is being called in some way >> that potentially causes recursion we need to be aware of it. This one >> appears rarely, but I did manage to replicate it using an old >> Module::Build (I can't recall if I used the net install script or >> not). 1 bug report. > > OK, I propose to look into these. Almost certainly I'll be doing "convert > run/db/network to Module::Build". I'll try to resolve the bugs you've > mentioned. 
> > It might be a week or so before I get started since I'm currently on > holiday away from a usable computer. Chris already started on "convert run/db/network to Module::Build" for some reason, but his attempt doesn't actually result in any modules getting installed (setting pm_files() like that isn't enough). The easiest, cleanest and most standard solution is to create a lib directory and svn move Bio into it. Does anyone have an objection to me doing this for the network, db and run packages? It will only affect developers currently working on code in those packages, and they just need to be aware that an svn update will be rather dramatic after my change. From cjfields at illinois.edu Sun Aug 9 09:05:17 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:05:17 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7EA726.60303@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> Message-ID: <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >> ... > > Chris already started on "convert run/db/network to Module::Build" > for some reason, but his attempt doesn't actually result in any > modules getting installed (setting pm_files() like that isn't enough). > > The easiest, cleanest and most standard solution is to create a lib > directory and svn move Bio into it. Does anyone have an objection to > me doing this for the network, db and run packages? 
It will only > affect developers currently working on code in those packages, and > they just need to be aware that an svn update will be rather > dramatic after my change. If it stimulates you into doing this then I'm all for it, but I've waited on getting this fixed long enough I decided to take it on myself to work on it, using the simplest ones. You had mentioned several times you would do this and I hadn't seen any progress. The point: I would really like to get another point release out before we work on splitting things up. Simple as that. From what I have seen (with my few tests) everything (modules, scripts) gets copied into blib just fine and the temp folder for script generation gets cleaned up; I haven't progressed beyond to the installation step, but there isn't anything to me that indicates it wouldn't work. I won't be available until Wed. at the earliest for additional comment (out of town, no internet connection). chris From bix at sendu.me.uk Sun Aug 9 09:15:07 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 14:15:07 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> Message-ID: <4A7ECBDB.9030505@sendu.me.uk> Chris Fields wrote: > On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >> The easiest, cleanest and most standard solution is to create a lib >> directory and svn move Bio into it. Does anyone have an objection to >> me doing this for the network, db and run packages? 
It will only >> affect developers currently working on code in those packages, and >> they just need to be aware that an svn update will be rather dramatic >> after my change. > > From what I have seen (with my few tests) everything (modules, scripts) > gets copied into blib just fine and the temp folder for script > generation gets cleaned up; I haven't progressed beyond to the > installation step, but there isn't anything to me that indicates it > wouldn't work. ./Build testinstall will show you it doesn't work as-is. If you're in a rush I'll just do the svn moves and we can revert later if anyone complains. From cjfields at illinois.edu Sun Aug 9 09:19:30 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:19:30 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7ECBDB.9030505@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: <2790F9A5-43E8-47E5-B5AA-98239B95EF04@illinois.edu> On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>> The easiest, cleanest and most standard solution is to create a >>> lib directory and svn move Bio into it. Does anyone have an >>> objection to me doing this for the network, db and run packages? >>> It will only affect developers currently working on code in those >>> packages, and they just need to be aware that an svn update will >>> be rather dramatic after my change. 
>> >> From what I have seen (with my few tests) everything (modules, >> scripts) gets copied into blib just fine and the temp folder for >> script generation gets cleaned up; I haven't progressed beyond to >> the installation step, but there isn't anything to me that >> indicates it wouldn't work. > > ./Build testinstall will show you it doesn't work as-is. > > If you're in a rush I'll just do the svn moves and we can revert > later if anyone complains. Works for me. The sooner it gets done the better (next week, would be nice, but two is fine so we don't rush it too much). I'll be working on several other bits, including FASTQ, when I get back Wed, then I'll merge over and work on the next point release. chris From cjfields at illinois.edu Sun Aug 9 09:34:07 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 9 Aug 2009 08:34:07 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7ECBDB.9030505@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > Chris Fields wrote: >> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>> The easiest, cleanest and most standard solution is to create a >>> lib directory and svn move Bio into it. Does anyone have an >>> objection to me doing this for the network, db and run packages? >>> It will only affect developers currently working on code in those >>> packages, and they just need to be aware that an svn update will >>> be rather dramatic after my change. 
>> >> From what I have seen (with my few tests) everything (modules, >> scripts) gets copied into blib just fine and the temp folder for >> script generation gets cleaned up; I haven't progressed beyond to >> the installation step, but there isn't anything to me that >> indicates it wouldn't work. > > ./Build testinstall will show you it doesn't work as-is. Sorry, I'll be leaving in the next hour, but for the above, did you mean './Build fakeinstall'? As long as you're moving everything into /lib (which I fully support), we should consider hard_coding scripts into bp_foo.PLS syntax seeing as we're going through additional trouble of converting them over. That is, unless there is a specific purpose to keeping them without the 'bp_'. chris From bix at sendu.me.uk Sun Aug 9 10:00:18 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 15:00:18 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> Message-ID: <4A7ED672.20701@sendu.me.uk> Chris Fields wrote: > On Aug 9, 2009, at 8:15 AM, Sendu Bala wrote: > >> Chris Fields wrote: >>> On Aug 9, 2009, at 5:38 AM, Sendu Bala wrote: >>>> The easiest, cleanest and most standard solution is to create a lib >>>> directory and svn move Bio into it. Does anyone have an objection to >>>> me doing this for the network, db and run packages? 
It will only >>>> affect developers currently working on code in those packages, and >>>> they just need to be aware that an svn update will be rather >>>> dramatic after my change. >>> >>> From what I have seen (with my few tests) everything (modules, >>> scripts) gets copied into blib just fine and the temp folder for >>> script generation gets cleaned up; I haven't progressed beyond to the >>> installation step, but there isn't anything to me that indicates it >>> wouldn't work. >> >> ./Build testinstall will show you it doesn't work as-is. > > Sorry, I'll be leaving in the next hour, but for the above, did you mean > './Build fakeinstall'? Yes, sorry. > As long as you're moving everything into /lib (which I fully support), > we should consider hard_coding scripts into bp_foo.PLS syntax seeing as > we're going through additional trouble of converting them over. That > is, unless there is a specific purpose to keeping them without the 'bp_'. (The final suffix is supposed to be .pl - we convert from PLS to pl in core, no conversion needed in db) Yes, for only a handful of scripts, it actually makes sense to flatten them all into a new bin directory, which is the default script location for Module::Build. So for example I'd do: svn mv scripts/biosql/bioentry2flat.pl bin/bp_bioentry2flat.pl etc. 
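Sendu's proposed layout flattens each package's scripts into bin/ with a bp_ prefix. A small core-Perl sketch of that mapping, which prints the svn mv commands for review instead of running them. `bin_target_for` and `svn_mv_commands` are hypothetical helpers. Treating .PLS the same as .pl is an assumption based on the note that core converts PLS to pl.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: map scripts/<subdir>/<name>.pl (or .PLS) to
# bin/bp_<name>.pl. Names that already start with bp_ are not prefixed twice.
sub bin_target_for {
    my ($path) = @_;
    my ($name) = fileparse( $path, qr/\.(?:pl|PLS)/ );
    $name = "bp_$name" unless $name =~ /^bp_/;
    return "bin/$name.pl";
}

# Build (but do not run) the svn mv command for each script path.
sub svn_mv_commands {
    return map { "svn mv $_ " . bin_target_for($_) } @_;
}

print "$_\n" for svn_mv_commands('scripts/biosql/bioentry2flat.pl');
```

Reviewing the generated list before running it makes name collisions visible: two subdirectories holding scripts with the same base name would map to the same bin/ file.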
From bix at sendu.me.uk Sun Aug 9 12:13:03 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 09 Aug 2009 17:13:03 +0100 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> Message-ID: <4A7EF58F.9000909@sendu.me.uk> bix at sendu.me.uk wrote: >> The three critical issues (as I've pointed out before) are: >> >> 1) Getting CPANPLUS installation working, which may be just META.yml, >> or it may be shell-related. I would like it for CPAN Testers, if for >> nothing else. That's at least 2 bug reports, maybe more. >> 2) Bio::Root::Build converted towards a Module::Build-compliant API, >> or we'll need to convert run/db/network to Module::Build. 1 bug report. >> 3) Avoid potential infinite looping. This may be Gbrowse-related via >> the net install script, but if Build.PL is being called in some way >> that potentially causes recursion we need to be aware of it. This one >> appears rarely, but I did manage to replicate it using an old >> Module::Build (I can't recall if I used the net install script or >> not). 1 bug report. > > It might be a week or so before I get started since I'm currently on > holiday away from a usable computer. These issues should now be resolved. I'll note that for future cases similar to 3), if a user chooses to install an optional dependency using CPAN/CPANPLUS and the installation of that external module causes an infinite loop, it's an issue of that module or CPAN/CPANPLUS, not BioPerl. 
The solution from our end is to tell the user to choose not to install that dependency or ask on the CPAN mailing list if they really need it. (I've often got stuck in infinite loops just trying to install Bundle::CPAN! CPAN itself will detect infinite loops after a while and kill itself.) From jdalzell03 at qub.ac.uk Sun Aug 9 05:06:26 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Sun, 9 Aug 2009 02:06:26 -0700 (PDT) Subject: [Bioperl-l] bioperl 1.6 installation on vista with perl 5.10 In-Reply-To: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> References: <576B0BC4C2F0664A97DD1532491715421AC81D9B39@EX2K7-VIRT-4.ads.qub.ac.uk> Message-ID: <24885345.post@talk.nabble.com> Thanks for the replies. I emailed Chris and Brian individually, but I guess it would be helpful if I threw my solution to "the dogs". In the end I found that by downloading Subversion (you need to sign up to CollabNet for a user account first) and following the installation instructions on the relevant Subversion pages of the BioPerl site (http://www.bioperl.org/wiki/Using_Subversion), it downloaded fine the first time. No need for CPAN or a PPM: just copy and paste 'svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live' into your command line, and it auto-installs in under 30 seconds... definitely the way to go for anyone else out there trying to bust a move on a Win machine. At time of writing, I have also installed BioPerl-db (same as above: copy and paste 'svn co svn://code.open-bio.org/bioperl/bioperl-db/trunk bioperl-db' into the command line) and BioPerl-run (I typed in 'svn co svn://code.open-bio.org/bioperl/bioperl-run/trunk bio', I THINK), and it worked fine. The relevant installation instructions don't give an explicit command for BioPerl-run installation, but I think that matches the branches and trunk in the Subversion repository (if not, sorry, but you can easily cross-reference its position in there by following the links).
Both have worked without problem on Strawberry Perl 5.10 through WinVista, so far. Jonny -- View this message in context: http://www.nabble.com/bioperl-1.6-installation-on-vista-with-perl-5.10-tp24875623p24885345.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From mwhagen85 at gmail.com Mon Aug 10 14:54:53 2009 From: mwhagen85 at gmail.com (OjoLoco) Date: Mon, 10 Aug 2009 11:54:53 -0700 (PDT) Subject: [Bioperl-l] Using Bioperl Graphics to create a heat map of sequence hits Message-ID: <24905417.post@talk.nabble.com> Hello all, I have found matching sequences between two genomes and I would now like to create a graphic that contains a heat map-like track that will show areas of the genome that were found more often than others. For every nt I have the number of times it was found, so if it was found very often it would be a darker color than say a nt that wasn't found at all. Is there any way to achieve this using built in BioPerl graphics? Thank you for your time. -- View this message in context: http://www.nabble.com/Using-Bioperl-Graphics-to-create-a-heat-map-of-sequence-hits-tp24905417p24905417.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From cain.cshl at gmail.com Mon Aug 10 15:22:36 2009 From: cain.cshl at gmail.com (Scott Cain) Date: Mon, 10 Aug 2009 15:22:36 -0400 Subject: [Bioperl-l] Using Bioperl Graphics to create a heat map of sequence hits In-Reply-To: <24905417.post@talk.nabble.com> References: <24905417.post@talk.nabble.com> Message-ID: Hi, You should be able to do that with wiggle_density and wiggle_xyplot glyphs. See http://gmod.org/wiki/GBrowse/Uploading_Wiggle_Tracks for instructions on constructing wiggle plots. 
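Scott's route starts from a wiggle file. If you already hold one hit count per nucleotide, a minimal sketch of writing them in the UCSC "fixedStep" wiggle layout (a track line, a fixedStep header, then one value per base) could look like this. The chromosome name, the track name, and the assumption that the counts begin at position 1 are all placeholders. Check the GBrowse wiggle page above for the header options its loader expects before running the result through wiggle2gff3.pl.

```perl
use strict;
use warnings;

# Minimal sketch: per-base counts (array ref, first element = position 1)
# -> fixedStep wiggle text. Missing counts are written as 0.
sub counts_to_wig {
    my (%args) = @_;
    my $chrom  = $args{chrom};
    my $name   = $args{name} // 'hit_counts';    # placeholder track name
    my $counts = $args{counts};
    die "chrom is required\n" unless defined $chrom;
    die "counts must be an array ref\n" unless ref $counts eq 'ARRAY';
    my @lines = (
        qq{track type=wiggle_0 name="$name"},
        "fixedStep chrom=$chrom start=1 step=1 span=1",
        map { defined $_ ? $_ : 0 } @$counts,
    );
    return join( "\n", @lines ) . "\n";
}
```

One line per base makes for large files on a whole genome; a larger `step` and `span` with averaged counts is a common trade-off for an overview track.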
After you have a wiggle plot, you'll need the wiggle2gff3.pl script (which is part of GBrowse, but it should run fine on its own), which you can get from GMOD's CVS: http://gmod.cvs.sourceforge.net/viewvc/*checkout*/gmod/Generic-Genome-Browser/bin/wiggle2gff3.pl It will convert the wig file to a binary file. Then you can create Bio::SeqFeatureI objects that will work with Bio::Graphics to draw the density or xyplot. Note as well that Bio::Graphics is no longer part of the main BioPerl distribution, so you'll need to get the most recent version from CPAN. Also, fair warning: I've never actually done this; I've only used wiggle plots in the context of GBrowse, but it should work pretty much as described. Scott On Aug 10, 2009, at 2:54 PM, OjoLoco wrote: > > Hello all, > I have found matching sequences between two genomes and I would > now like > to create a graphic that contains a heat map-like track that will > show areas > of the genome that were found more often than others. For every nt > I have > the number of times it was found, so if it was found very often it > would be > a darker color than say a nt that wasn't found at all. Is there any > way to > achieve this using built in BioPerl graphics? Thank you for your time. > -- > View this message in context: http://www.nabble.com/Using-Bioperl-Graphics-to-create-a-heat-map-of-sequence-hits-tp24905417p24905417.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ----------------------------------------------------------------------- Scott Cain, Ph. D.
scott at scottcain dot net GMOD Coordinator (http://gmod.org/) 216-392-3087 Ontario Institute for Cancer Research From jdalzell03 at qub.ac.uk Tue Aug 11 11:07:52 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 08:07:52 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? Message-ID: <24919498.post@talk.nabble.com> Hi, I'm trying to run the example given for Bio::Tools::HMM on the BioPerl site, and when I run it, I get this on the command line... "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. Please read the install the bioperl-ext package BEGIN failed--compilation aborted at C:/strawberry/perl/site/lib/Bio/Tools/HMM.pm line 140. Compilation failed in require at HMM.txt line 4. BEGIN failed--compilation aborted at HMM.txt line 4." I have installed the entire bioperl-ext package through Subversion, and it looks like all the relevant folders are in perl/site/lib/Bio/Tools, but it won't work. Am I missing something? I'm under the impression that the C compiler comes with bioperl-ext (which installed with no reported problems)? I concede that I am extremely new to both Perl in general and BioPerl more specifically, but I have followed the instructions that I could find. I have the BioPerl core installed in addition to bioperl-db and bioperl-run. I'm using Strawberry Perl on WinVista. I appreciate that most people work on Linux systems...I am at times sorely tempted myself. Any suggestions would be gratefully welcomed, cheers, Jonny ps. this is the partial script I was trying to run...
#!/usr/bin/perl -w

use strict;
use Bio::Tools::HMM;
use Bio::SeqIO;
use Bio::Matrix::Scoring;

# Create an HMM object
# ACGT are the bases; the states N and C mean non-coding and coding
my $hmm = Bio::Tools::HMM->new('-symbols' => "ACGT", '-states' => "NC");

# Initialise some training observation sequences
my $seq1 = Bio::SeqIO->new(-file => $ARGV[0], -format => 'fasta');
my $seq2 = Bio::SeqIO->new(-file => $ARGV[1], -format => 'fasta');
my @seqs = ($seq1, $seq2);

# Train the HMM with the observation sequences
$hmm->baum_welch_training(\@seqs);

# Get parameters
my $init    = $hmm->init_prob;        # Returns an array reference
my $matrix1 = $hmm->transition_prob;  # Returns Bio::Matrix::Scoring
my $matrix2 = $hmm->emission_prob;    # Returns Bio::Matrix::Scoring

I realise that this is incomplete. -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24919498.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From shameer at ncbs.res.in Tue Aug 11 13:07:20 2009 From: shameer at ncbs.res.in (K. Shameer) Date: Tue, 11 Aug 2009 22:37:20 +0530 (IST) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> Hello Jonny, Are you sure that you have a compiled version of HMMER installed on your machine? -- K. Shameer > Hi, > > trying to run the example given for Bio::Tools::HMM on the Bioperl site, > and > when I try to run it, I get this in the command line... > > "The C-compiled engine for Hidden Markov Model (HMM) has not been > installed. > Please read the install the bioperl-ext package > > BEGIN failed--compilation aborted at > C:/strawberry/perl/site/lib/Bio/Tools/HMM.pm line 140. > Compilation failed in require at HMM.txt line 4. > BEGIN failed--compilation aborted at HMM.txt line 4."
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From jdalzell03 at qub.ac.uk Tue Aug 11 11:14:59 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 08:14:59 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <24919603.post@talk.nabble.com> I should perhaps point out that CPAN is not an option on a Windows setup...it has never worked for anything I have tried to install. Although I'm using Strawberry Perl now, I had no success getting BioPerl or any of its components through the ActiveState PPM either (one of the reasons I ended up going to Strawberry). The only option I have for installation is the Subversion server. Jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24919603.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jdalzell03 at qub.ac.uk Tue Aug 11 11:42:29 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 08:42:29 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <24920117.post@talk.nabble.com> I realise that this looks like there is a problem with Bio::Tools::HMM when looking at the source code, but I've even tried replacing the HMM.pm file I had with the HMM.pm script at http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-ext/trunk/Bio/Ext/HMM/HMM.pm, and now I'm getting... "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt line 5. BEGIN failed--compilation aborted at HMM.txt line 5." ??
jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24920117.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jdalzell03 at qub.ac.uk Tue Aug 11 14:52:21 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 11:52:21 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> Message-ID: <24923606.post@talk.nabble.com> Hi, I'm as sure as I can be. I look in the HMMER folder and it contains "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something to do with @INC, but I put "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at the top of my script, which definitely encompasses the directory it should be in, and I still get... "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt line 5. BEGIN failed--compilation aborted at HMM.txt line 5." I'm out of ideas. Jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From rmb32 at cornell.edu Tue Aug 11 15:23:56 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 11 Aug 2009 12:23:56 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24920117.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> Message-ID: <4A81C54C.5020905@cornell.edu> Jonny, For quicker help you might want to try #bioperl on freenode.
That said, the problem here is that when you get code from Subversion, you are not really 'installing' it, you are just copying it to your machine. Part of the installation process is compiling these things, and for that you need a working C compiler. I don't know anything about using BioPerl on Windows, but as a general recommendation I would say go back to the CPAN and/or PPM directions and get those working. Rob -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu Jonny Dalzell wrote: > I realise that this looks like there is a problem with Bio::Tools::HMM when > looking at the source code, but I've even tried replacing the HMM.pm file I > had with the HMM.pm script at > http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-ext/trunk/Bio/Ext/HMM/HMM.pm, > and now I'm getting... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: > C:/strawberry/perl/lib C:/strawberry/perl/site/ > lib .) at HMM.txt line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > ?? > > jonny From maj at fortinbras.us Tue Aug 11 15:22:42 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 15:22:42 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24923606.post@talk.nabble.com> References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> Message-ID: <7C7654A8A64E49158F6761EE09C9F297@NewLife> Jonny, You need the HMMER application, which is not part of BioPerl. See http://hmmer.janelia.org/ for download options. MAJ ----- Original Message ----- From: "Jonny Dalzell" To: Sent: Tuesday, August 11, 2009 2:52 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Hi, > > I'm as sure as I can be.
I look in the HHMER folder and it contains > "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something > to do with @INC, but I put > "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at > the top of my script, which definately encompasses the directory it should > be in, and I still get... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib > C:/strawberry/perl/site/lib/ > Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt > line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > I'm out of ideas. > > Jonny > -- > View this message in context: > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Tue Aug 11 15:48:11 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Tue, 11 Aug 2009 12:48:11 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81C54C.5020905@cornell.edu> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> Message-ID: <4A81CAFB.5050903@cornell.edu> Elaborating more, the 'C-compiled engine' error comes because Bio::Ext::HMM is not installed, because bioperl-ext is not installed (correctly), because Bio::Ext::HMM is an XS extension written in C. Which needs to be compiled. With a C compiler. As part of some kind of installation process, not just copying the files to a machine with subversion. Rob Robert Buels wrote: > Jonny, > > For quicker help you might want to try #bioperl on freenode. > > That said, the problem here is that when you get code from subversion, > you are not really 'installing' it, you are just copying it to your > machine. 
> Part of the installation process is compiling these things, > and for that you need a working C compiler. > > I don't know anything about using BioPerl on Windows, but as a general > recommendation I would say go back to the CPAN and/or ppm directions and > getting those working. > > Rob > > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From bix at sendu.me.uk Tue Aug 11 16:11:43 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Tue, 11 Aug 2009 21:11:43 +0100 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24923606.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> Message-ID: <4A81D07F.6000703@sendu.me.uk> Jonny Dalzell wrote: > Hi, > > I'm as sure as I can be. I look in the HHMER folder and it contains > "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something > to do with @INC, but I put > "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at > the top of my script, which definately encompasses the directory it should > be in, and I still get... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib > C:/strawberry/perl/site/lib/ > Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt > line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > I'm out of ideas. use lib (or at least one entry in your PERL5LIB) needs to point to the directory that contains the Bio directory, with one path per string. So: use lib "C:/strawberry/perl/site/lib"; Now it will be able to locate Bio::Tools::HMM. You'll still get your original error because you don't have HMMER installed. See Mark's reply.
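A minimal diagnostic sketch for the two failures discussed above, assuming a Strawberry Perl layout like Jonny's (the C:/strawberry/perl/site/lib path is taken from his error messages; adjust it to your own install). It adds the directory that contains Bio/ to @INC, then reports separately whether the compiled Bio::Ext::HMM engine and Bio::Tools::HMM can be loaded, and which C compiler perl itself was built with. The can_load helper is a hypothetical name: it only wraps require in eval, so a missing module is reported instead of aborting the script the way a failed use does.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;    # core module: the settings perl itself was built with

# Point @INC at the directory that CONTAINS Bio/, one path per string.
# The path comes from the thread; adjust it for your own installation.
use lib 'C:/strawberry/perl/site/lib';

# Hypothetical helper: try to load a module by name and return 1 on
# success or 0 on failure, instead of letting a failed 'use' abort.
sub can_load {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::Ext::HMM -> Bio/Ext/HMM.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Check the compiled XS engine first: loading Bio::Tools::HMM aborts at
# compile time when Bio::Ext::HMM is missing, so it needs this to succeed.
for my $module (qw(Bio::Ext::HMM Bio::Tools::HMM)) {
    printf "%-16s %s\n", $module,
        can_load($module) ? 'loads' : 'cannot be loaded';
}

# XS extensions should be compiled with the same compiler as perl itself.
printf "perl %s on %s was built with: %s\n", $], $Config{osname}, $Config{cc};
```

If Bio::Ext::HMM is reported as not loadable, adding more use lib paths will not help; the C part of bioperl-ext has to be built and installed, ideally with the compiler named on the last line of the output.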
From jdalzell03 at qub.ac.uk Tue Aug 11 16:29:29 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 13:29:29 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81D07F.6000703@sendu.me.uk> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> Message-ID: <24925178.post@talk.nabble.com> Hi, thanks. I did install HMMER from the site Mark suggested, and it is within the directories that perl recognizes when reading the script...still I get "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. Please read the install the bioperl-ext package" Is it possible that this module simply won't run on Windows? jonny Sendu Bala-2 wrote: > > Jonny Dalzell wrote: >> Hi, >> >> I'm as sure as I can be. I look in the HHMER folder and it contains >> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was >> something >> to do with @INC, but I put >> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at >> the top of my script, which definately encompasses the directory it >> should >> be in, and I still get... >> >> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib >> C:/strawberry/perl/site/lib/ >> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at >> HMM.txt >> line 5. >> BEGIN failed--compilation aborted at HMM.txt line 5." >> >> I'm out of ideas. > > lib (or at least one entry in your PERL5LIB) needs to point to the > directory that contains the Bio directory. So: > > use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; > > Now it will be able to locate Bio::Tools::Hmm. You'll still get your > original error because you don't have Hmmer installed. See Mark's reply.
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925178.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From jdalzell03 at qub.ac.uk Tue Aug 11 16:31:36 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Tue, 11 Aug 2009 13:31:36 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A81CAFB.5050903@cornell.edu> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> <4A81CAFB.5050903@cornell.edu> Message-ID: <24925211.post@talk.nabble.com> OK, so is there any particular C-compiler which I should use? Thanks, jonny Robert Buels wrote: > > Elaborating more, the 'C-compiled engine' error comes because > Bio::Ext::HMM is not installed, because bioperl-ext is not installed > (correctly), because Bio::Ext::HMM is an XS extension written in C. > Which needs to be compiled. With a C compiler. As part of some kind of > installation process, not just copying the files to a machine with > subversion. > > Rob > > Robert Buels wrote: >> Jonny, >> >> For quicker help you might want to try #bioperl on freenode. >> >> That said, the problem here is that when you get code from subversion, >> you are not really 'installing' it, you are just copying it to your >> machine. Part of the installation process is compiling these things, >> and for that you need a working C compiler. >> >> I don't know anything about using BioPerl on Windows, but as a general >> recommendation I would say go back to the CPAN and/or ppm directions and >> getting those working. 
>> >> Rob >> >> > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925211.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. From maj at fortinbras.us Tue Aug 11 17:05:10 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 17:05:10 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24925178.post@talk.nabble.com> References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> <24925178.post@talk.nabble.com> Message-ID: Jonny, It will run in Win/Vis but there are some caveats. The BioPerl package has some plain C components, as Rob pointed out. These need to be compiled, and the objects/libraries put in the right place. CPAN will cause this to happen when you have a compiler available; ActiveState .ppm will download the binaries directly from the repository (my understanding, anyway). CPAN is always available by doing > perl -MCPAN -e shell but you may not have a C compiler around. This is a little tricky. You can either explore Visual C/C++ options from MS here http://msdn.microsoft.com/en-us/library/ms950410.aspx, or you can do as I do, and install Cygwin (www.cygwin.com), which creates a linux-like environment with GNU compiler tools and many other (wonderful, IMHO) goodies. Not as wonderful as the real thing, I grant. 
Which brings me to a third possibility, which I haven't tried: an Ubuntu box running in a VM under Windows, or a dual-boot system (https://help.ubuntu.com/community/WindowsDualBoot). MAJ ----- Original Message ----- From: "Jonny Dalzell" To: Sent: Tuesday, August 11, 2009 4:29 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Hi, > > thanks. I did install HHMER from the site Mark suggested, and it is within > the directories that perl recognizes when reading the script...still I get > > "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. > Please read the install the bioperl-ext package" > > Is it possible that this module simply won't run through windows? > > jonny > > > > Sendu Bala-2 wrote: >> >> Jonny Dalzell wrote: >>> Hi, >>> >>> I'm as sure as I can be. I look in the HHMER folder and it contains >>> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was >>> something >>> to do with @INC, but I put >>> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at >>> the top of my script, which definately encompasses the directory it >>> should >>> be in, and I still get... >>> >>> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib >>> C:/strawberry/perl/site/lib/ >>> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at >>> HMM.txt >>> line 5. >>> BEGIN failed--compilation aborted at HMM.txt line 5." >>> >>> I'm out of ideas. >> >> lib (or at least one entry in your PERL5LIB) needs to point to the >> directory that contains the Bio directory. So: >> >> use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; >> >> Now it will be able to locate Bio::Tools::Hmm. You'll still get your >> original error because you don't have Hmmer installed. See Mark's reply.
>> >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925178.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From Russell.Smithies at agresearch.co.nz Tue Aug 11 17:39:30 2009 From: Russell.Smithies at agresearch.co.nz (Smithies, Russell) Date: Wed, 12 Aug 2009 09:39:30 +1200 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> <4A81D07F.6000703@sendu.me.uk> <24925178.post@talk.nabble.com> Message-ID: <18DF7D20DFEC044098A1062202F5FFF32AAB6F93AA@exchsth.agresearch.co.nz> Dev-C++ http://www.bloodshed.net/devcpp.html is a good (i.e. free under GPL) Windows compiler I've used before. Might save having to install Cygwin. --Russell > -----Original Message----- > From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l- > bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen > Sent: Wednesday, 12 August 2009 9:05 a.m. > To: Jonny Dalzell; Bioperl-l at lists.open-bio.org > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > Jonny, > It will run in Win/Vis but there are some caveats. The BioPerl package has > some > plain C components, as Rob pointed out. These need to be compiled, and the > objects/libraries put in the right place. CPAN will cause this to happen when > you have a compiler available; ActiveState .ppm will download the binaries > directly from the repository (my understanding, anyway). 
CPAN is always > available by doing > > > perl -MCPAN -e shell > > but you may not have a C compiler around. This is a little tricky. You can > either explore Visual C/C++ options from MS here > http://msdn.microsoft.com/en-us/library/ms950410.aspx, or you can do as I do, > and install Cygwin (www.cygwin.com), which creates a linux-like environment > with > GNU compiler tools and many other (wonderful, IMHO) goodies. Not as wonderful > as > the real thing, I grant. Which bring me to a third possibility, that I haven't > tried, which is an Ubuntu box running in a VM under Windows, or as a dual-boot > system (https://help.ubuntu.com/community/WindowsDualBoot). > MAJ > ----- Original Message ----- > From: "Jonny Dalzell" > To: > Sent: Tuesday, August 11, 2009 4:29 PM > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > > > > > > Hi, > > > > thanks. I did install HHMER from the site Mark suggested, and it is within > > the directories that perl recognizes when reading the script...still I get > > > > "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. > > Please read the install the bioperl-ext package" > > > > Is it possible that this module simply won't run through windows? > > > > jonny > > > > > > > > Sendu Bala-2 wrote: > >> > >> Jonny Dalzell wrote: > >>> Hi, > >>> > >>> I'm as sure as I can be. I look in the HHMER folder and it contains > >>> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was > >>> something > >>> to do with @INC, but I put > >>> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at > >>> the top of my script, which definately encompasses the directory it > >>> should > >>> be in, and I still get... > >>> > >>> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib > >>> C:/strawberry/perl/site/lib/ > >>> Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at > >>> HMM.txt > >>> line 5. 
> >>> BEGIN failed--compilation aborted at HMM.txt line 5." > >>> > >>> I'm out of ideas. > >> > >> lib (or at least one entry in your PERL5LIB) needs to point to the > >> directory that contains the Bio directory. So: > >> > >> use lib "strawberry/per/lib C:/strawberry/perl/site/lib/"; > >> > >> Now it will be able to locate Bio::Tools::Hmm. You'll still get your > >> original error because you don't have Hmmer installed. See Mark's reply. > >> > >> _______________________________________________ > >> Bioperl-l mailing list > >> Bioperl-l at lists.open-bio.org > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > >> > >> > > > > -- > > View this message in context: > > http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista-- > tp24919498p24925178.html > > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > > > _______________________________________________ > > Bioperl-l mailing list > > Bioperl-l at lists.open-bio.org > > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > > > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l ======================================================================= Attention: The information contained in this message and/or attachments from AgResearch Limited is intended only for the persons or entities to which it is addressed and may contain confidential and/or privileged material. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipients is prohibited by AgResearch Limited. If you have received this message in error, please notify the sender immediately. 
======================================================================= From cjfields at illinois.edu Tue Aug 11 19:44:23 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 11 Aug 2009 18:44:23 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24923606.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in> <24923606.post@talk.nabble.com> Message-ID: Bio::Tools::HMM doesn't use HMMER; it uses a C-based extension in bioperl-ext that generates HMMs (XS-based bindings, I think). I have managed to compile it successfully on Ubuntu and Mac OS X, but WinVista is a whole different bag-o-worms altogether (untested AFAIK). For the record, I do not recommend using it; I'm unsure about its maintenance status, so it may be released separately. It would be best to use something better supported, such as the HMMER wrapper in bioperl-run and the HMMER parsers in bioperl-core. We may also have wrappers for similar code available in biolib at some future point. chris On Aug 11, 2009, at 1:52 PM, Jonny Dalzell wrote: > > Hi, > > I'm as sure as I can be. I look in the HHMER folder and it contains > "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was > something > to do with @INC, but I put > "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/ > Tools/";" at > the top of my script, which definately encompasses the directory it > should > be in, and I still get... > > "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/ > per/lib > C:/strawberry/perl/site/lib/ > Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at > HMM.txt > line 5. > BEGIN failed--compilation aborted at HMM.txt line 5." > > I'm out of ideas. > > Jonny > -- > View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Tue Aug 11 19:48:08 2009 From: cjfields at illinois.edu (Chris Fields) Date: Tue, 11 Aug 2009 18:48:08 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24925211.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <24920117.post@talk.nabble.com> <4A81C54C.5020905@cornell.edu> <4A81CAFB.5050903@cornell.edu> <24925211.post@talk.nabble.com> Message-ID: <3A5CA958-3B03-4252-B78F-07BBFF1FA355@illinois.edu> Any C-based code should be compiled with the same compiler that was used to build the perl you are running. ActiveState supports both VC/C++ (as Mark indicates) and MinGW/gcc. I think Strawberry supports mainly the latter. Though you can use Cygwin, I think a native Windows module is the best way to go if possible. It will likely be a tricky road, so keep us updated and we'll attempt to help out as best we can. chris On Aug 11, 2009, at 3:31 PM, Jonny Dalzell wrote: > > OK, > > so is there any particular C-compiler which I should use? > > Thanks, > jonny > > > > Robert Buels wrote: >> >> Elaborating more, the 'C-compiled engine' error comes because >> Bio::Ext::HMM is not installed, because bioperl-ext is not installed >> (correctly), because Bio::Ext::HMM is an XS extension written in C. >> Which needs to be compiled. With a C compiler. As part of some >> kind of >> installation process, not just copying the files to a machine with >> subversion. >> >> Rob >> >> Robert Buels wrote: >>> Jonny, >>> >>> For quicker help you might want to try #bioperl on freenode. >>> >>> That said, the problem here is that when you get code from >>> subversion, >>> you are not really 'installing' it, you are just copying it to your >>> machine. Part of the installation process is compiling these >>> things, >>> and for that you need a working C compiler.
>>> >>> I don't know anything about using BioPerl on Windows, but as a >>> general >>> recommendation I would say go back to the CPAN and/or ppm >>> directions and >>> getting those working. >>> >>> Rob >>> >>> >> >> >> -- >> Robert Buels >> Bioinformatics Analyst, Sol Genomics Network >> Boyce Thompson Institute for Plant Research >> Tower Rd >> Ithaca, NY 14853 >> Tel: 503-889-8539 >> rmb32 at cornell.edu >> http://www.sgn.cornell.edu >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > > -- > View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24925211.html > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com. > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From maj at fortinbras.us Tue Aug 11 20:09:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Tue, 11 Aug 2009 20:09:01 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: References: <24919498.post@talk.nabble.com><47969.192.168.1.1.1250010440.squirrel@mail.ncbs.res.in><24923606.post@talk.nabble.com> Message-ID: <69BDE54FD5C943669BCD41A9A607634A@NewLife> [Oops, sorry about that. The compiler ideas still apply, however.] ----- Original Message ----- From: "Chris Fields" To: "Jonny Dalzell" Cc: Sent: Tuesday, August 11, 2009 7:44 PM Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > Bio::Tools::Hmm doesn't use HMMER, it uses a C-based extension in bioperl-ext > that generates HMM's (XS-based bindings I think). I have managed to compile > it successfully on Ubuntu and Mac OS X, but WinVista is a whole different > bag-o-worms altogether (untested AFAIK).
> > For the record, I do not recommend using it; I'm unsure about its > maintenance status, so it may be released separately. It would be best to > use something better supported, such as the HMMER wrapper in bioperl-run and > the HMMER parsers in bioperl-core. We may also have wrappers for similar > code available in BioLib at some future point. > > chris > > On Aug 11, 2009, at 1:52 PM, Jonny Dalzell wrote: > >> >> Hi, >> >> I'm as sure as I can be. I looked in the HMMER folder and it contains >> "Domain.pm", "Results.pm", and "Set.pm". I thought perhaps it was something >> to do with @INC, but I put >> "use lib "strawberry/per/lib C:/strawberry/perl/site/lib/Bio/Tools/";" at >> the top of my script, which definitely encompasses the directory it should >> be in, and I still get... >> >> "Can't locate Bio/Tools/HMM.pm in @INC (@INC contains: strawberry/per/lib >> C:/strawberry/perl/site/lib/Bio/Tools/ C:/strawberry/perl/lib C:/strawberry/perl/site/lib .) at HMM.txt >> line 5. >> BEGIN failed--compilation aborted at HMM.txt line 5." >> >> I'm out of ideas. >> >> Jonny >> -- >> View this message in context: >> http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24923606.html >> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
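Jonny's error is easier to read once you see how `use`/`require` maps a package name onto `@INC`. `Bio::Tools::HMM` becomes the relative path `Bio/Tools/HMM.pm`, which is appended to each `@INC` entry in turn. A `use lib` entry must therefore be the root directory that contains `Bio/`, not `Bio/Tools/` itself. Also, `use lib` takes a list, so two directories written inside one quoted string become a single (nonexistent) path. The sketch below mimics that lookup with core modules. `find_module` is a hypothetical helper, and it skips the code-ref hooks that `@INC` may also contain.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper mimicking require's search: turn a package name into
# a relative path (Bio::Tools::HMM -> Bio/Tools/HMM.pm) and look for it
# under each directory in turn (default: @INC).
sub find_module {
    my ($module, @dirs) = @_;
    @dirs = @INC unless @dirs;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    for my $dir (@dirs) {
        next if ref $dir;    # @INC may also hold code refs (hooks); skip them
        my $candidate = File::Spec->catfile($dir, @parts);
        return $candidate if -f $candidate;
    }
    return;                  # not found
}

for my $module (@ARGV ? @ARGV : 'Bio::Tools::HMM') {
    my $path = find_module($module);
    print defined $path ? "$module => $path\n" : "$module: not found in \@INC\n";
}
```

If the module is reported as missing even though its parent directory is in `@INC`, the file was never installed there. In this thread, that points back to the bioperl-ext installation rather than to `use lib`.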
From cjfields at illinois.edu Wed Aug 12 12:44:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Wed, 12 Aug 2009 11:44:37 -0500 Subject: [Bioperl-l] Regarding Bio::Root::Build In-Reply-To: <4A7ED672.20701@sendu.me.uk> References: <4239c0bb0907151625o7166edd6j3c2b13fec8adf530@mail.gmail.com> <4A5E7CE7.4040908@cornell.edu> <4A5ED518.7010504@cornell.edu> <4A60ACC6.6020003@sendu.me.uk> <4add5f940bc48d7f9e978fb951a966bf.squirrel@sendu.me.uk> <1F5CF270-63AD-4CEF-8BE1-2E0D5B2BCA8B@illinois.edu> <934d8690a76cc4a65b1a3d128b43f818.squirrel@sendu.me.uk> <0662dd641b656a6aa5648d31ce08db91.squirrel@sendu.me.uk> <4A7EA726.60303@sendu.me.uk> <0348CC9D-A860-432D-B47A-52B735DDF5B3@illinois.edu> <4A7ECBDB.9030505@sendu.me.uk> <4A7ED672.20701@sendu.me.uk> Message-ID: <1F099DCC-073E-470E-873A-608E674375C1@illinois.edu>
On Aug 9, 2009, at 9:00 AM, Sendu Bala wrote: > Chris Fields wrote: > ... >> As long as you're moving everything into /lib (which I fully >> support), we should consider hard-coding scripts into bp_foo.PLS >> syntax, seeing as we're going through the additional trouble of >> converting them over. That is, unless there is a specific purpose >> to keeping them without the 'bp_'. > > (The final suffix is supposed to be .pl - we convert from PLS to pl > in core, no conversion needed in db) Yes, I had that reversed in my commit. Thanks. > Yes, for only a handful of scripts, it actually makes sense to > flatten them all into a new bin directory, which is the default > script location for Module::Build. > > So for example I'd do: > svn mv scripts/biosql/bioentry2flat.pl bin/bp_bioentry2flat.pl > etc. Yes, exactly.
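Sendu's `svn mv` example follows a simple renaming rule: flatten each script into `bin/`, add a `bp_` prefix if it is missing, and end the name in `.pl`. Below is a dry-run sketch of that rule which only prints the commands. `bin_target` is a hypothetical helper, and the default path is the one from Sendu's example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: map scripts/<subdir>/<name>.{PLS,pl} to
# bin/bp_<name>.pl, without doubling an existing bp_ prefix.
sub bin_target {
    my ($path) = @_;
    my ($name) = fileparse($path, qr/\.(?:PLS|pl)/);
    $name = "bp_$name" unless $name =~ /^bp_/;
    return "bin/$name.pl";
}

# Dry run: print the svn commands instead of executing them.
my @scripts = @ARGV ? @ARGV : ('scripts/biosql/bioentry2flat.pl');
print "svn mv $_ ", bin_target($_), "\n" for @scripts;
```

Printing rather than running the commands makes it easy to check for name collisions in the flattened `bin/` directory before anything is moved.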
It seems we're going out of our way to keep things as they were previously when using ExtUtils::MakeMaker/Makefile.PL. I'm not quite sure why we've bent over backwards to work around these issues when it is much easier to stick to the simple standards that 99% of CPAN uses: scripts in bin (or whatever dir is passed to script_files), modules in lib. I'm not complaining; I just haven't heard an explanation one way or the other. chris
From rmb32 at cornell.edu Thu Aug 13 14:59:00 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 Aug 2009 11:59:00 -0700 Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank In-Reply-To: <4A79A52E.7000104@cornell.edu> References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> <4A79A52E.7000104@cornell.edu> Message-ID: <4A846274.4000600@cornell.edu>
OK, commit 15927 adds some more info about -db options for Bio::DB::Query::GenBank, explicitly mentioning protein, nucleotide, nuccore, nucgss, nucest, and unigene, and including a link to an (XML) page from NCBI that lists the inputs NCBI accepts. Could somebody who knows more about eUtils than me also review this patch and make corrections if necessary? Rob Robert Buels wrote: > I think you're looking for the -db => 'nucgss' option. > > I'll add a better listing of these (undocumented) options to the > Bio::DB::Query::GenBank docs. > > Rob >
From jdalzell03 at qub.ac.uk Thu Aug 13 15:27:14 2009 From: jdalzell03 at qub.ac.uk (Jonny Dalzell) Date: Thu, 13 Aug 2009 12:27:14 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <24919498.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> Message-ID: <24957222.post@talk.nabble.com>
Fellows, thanks very much for the input.
However, today I saw fit to dual-boot with Ubuntu. I've installed everything, but I still get the same "The C-compiled engine for Hidden Markov Model (HMM) has not been installed. Please read the install the bioperl-ext package " message! Is it ridiculous of me to expect Ubuntu to take care of this for me? How do I go about compiling the HMM? Thanks in advance, Jonny -- View this message in context: http://www.nabble.com/Problems-with-Bioperl-ext-package-on-WinVista--tp24919498p24957222.html Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
From jonathanmflowers at gmail.com Thu Aug 13 15:41:21 2009 From: jonathanmflowers at gmail.com (Jonathan Flowers) Date: Thu, 13 Aug 2009 12:41:21 -0700 Subject: [Bioperl-l] parsing blast XML reports with Bio::SearchIO Message-ID:
Hi, I am trying to parse BLAST reports written in XML using Bio::SearchIO. When running the following code on a set of reports (multiple query results in a single file), I only get one ResultI object. I tried running the same code on a file in 'blast' format and obtained the expected results (i.e. one ResultI object for each query), suggesting that the issue is with blastxml. I found an old thread on this listserv where someone had had a similar problem, but could not find how it was resolved. I am using BioPerl 1.5.2 and the XML reports were generated using blastall with the -m7 option.

    use Bio::SearchIO;

    my $in = Bio::SearchIO->new(-format => 'blastxml',
                                -file   => 'blastreport.xml');
    while ( my $result = $in->next_result ) {
        print $result->query_name, "\n";
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                # do something with the HSP
            }
        }
    }

Thanks Jonathan
From rmb32 at cornell.edu Thu Aug 13 17:37:21 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Thu, 13 Aug 2009 14:37:21 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <24957222.post@talk.nabble.com> References: <24919498.post@talk.nabble.com> <24957222.post@talk.nabble.com> Message-ID: <4A848791.4010402@cornell.edu>
Jonny Dalzell wrote: > Is it ridiculous of me to expect Ubuntu to take care of this for me? How do > I go about compiling the HMM? Yes. This is a very specialized thing that you're doing, and Ubuntu does not have the resources to package every single thing. Unfortunately, it looks like the bioperl-ext package is not installable under Ubuntu 9.04 anyway, which is what I'm running. For others on this list: if somebody is interested in maintaining it, I'd be happy to help out by testing on Debian-based Linux platforms. We need to clarify this package's maintenance status: if there is nobody interested in maintaining it, I would recommend that bioperl-ext be removed from distribution. It's not in anybody's interest to have unmaintained software out there causing confusion. So Jonny, in short, I would say "do not use bioperl-ext". Step back. What are you trying to accomplish? Chris already recommended some alternative methods in his email of 8/11 on this subject. Perhaps we can guide you to some software that is actively maintained and will meet your needs. Rob
From cjfields at illinois.edu Thu Aug 13 18:06:29 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 Aug 2009 17:06:29 -0500 Subject: [Bioperl-l] Access GSS sequences using Bio::DB::GenBank In-Reply-To: <4A846274.4000600@cornell.edu> References: <8D08960C647E64438CE5740657CBBDC5F8E98B7F@iahcexch1.iah.bbsrc.ac.uk> <4A79A52E.7000104@cornell.edu> <4A846274.4000600@cornell.edu> Message-ID: <916D0E26-EBB5-4E28-99AD-F689639BB93A@illinois.edu>
It looks fine.
As for the databases, you can always get the latest list of databases using a script from bioperl-live, which uses Bio::DB::EUtilities to access them directly (scripts/DB_EUtilities/einfo.PLS, which should install as bp_einfo.pl). (Looking at the list below, what is blastdbinfo?)

    cjfields4:DB_EUtilities cjfields$ perl einfo.PLS
    pubmed protein nucleotide nuccore nucgss nucest structure genome
    biosystems blastdbinfo books cancerchromosomes cdd gap domains gene
    genomeprj gensat geo gds homologene journals mesh ncbisearch
    nlmcatalog omia omim pepdome pmc popset probe proteinclusters
    pcassay pccompound pcsubstance snp sra taxonomy toolkit unigene

chris On Aug 13, 2009, at 1:59 PM, Robert Buels wrote: > OK, commit 15927 adds some more info about -db options for > Bio::DB::Query::GenBank, explicitly mentioning protein, nucleotide, > nuccore, nucgss, nucest, and unigene, and including a link to an > (XML) page from NCBI that lists inputs that NCBI accepts. > > Could somebody who knows more about eUtils than me also review this > patch and make corrections if necessary? > > Rob > > Robert Buels wrote: >> I think you're looking for the -db => 'nucgss' option. >> I'll add a better listing of these (undocumented) options to the >> Bio::DB::Query::GenBank docs.
>> Rob
From cjfields at illinois.edu Thu Aug 13 18:08:37 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 Aug 2009 17:08:37 -0500 Subject: [Bioperl-l] parsing blast XML reports with Bio::SearchIO In-Reply-To: References: Message-ID: <65CC2787-7F0A-43C1-A840-554A2E4FD76A@illinois.edu>
You should update to BioPerl 1.6; I believe I fixed this issue after the 1.5.2 release. chris On Aug 13, 2009, at 2:41 PM, Jonathan Flowers wrote: > Hi, > > I am trying to parse BLAST reports written in XML using > Bio::SearchIO. When > running the following code on a set of reports (multiple query > results in a > single file), I only get one ResultI object. I tried running the > same code > on a file in 'blast' format and obtained the expected results (i.e. one > ResultI object for each query), suggesting that the issue is with > blastxml. > I found an old thread on this listserv where someone had had a similar > problem, but could not find how it was resolved. > > I am using BioPerl 1.5.2 and the XML reports were generated using > blastall > with the -m7 option.
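Before or after upgrading, it can help to confirm how many query results the XML file actually holds, independently of any parser. The sketch below is a rough line-based count, not an XML parser: it tallies `<BlastOutput>` root elements and `<Iteration>` elements. Depending on the BLAST version, several queries may appear either as several `<Iteration>` blocks in one document or as several concatenated `<BlastOutput>` documents, so the sketch counts both. `count_blast_xml` is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count <BlastOutput> documents and <Iteration>
# elements in a BLAST XML report read from an open filehandle.
# A quick sanity check only; it does not validate or parse the XML.
sub count_blast_xml {
    my ($fh) = @_;
    my %count = (documents => 0, iterations => 0);
    while (my $line = <$fh>) {
        # exact tag names, so <BlastOutput_program> or <Iteration_hits>
        # are not counted
        $count{documents}++  while $line =~ /<BlastOutput>/g;
        $count{iterations}++ while $line =~ /<Iteration>/g;
    }
    return \%count;
}

if (my $file = shift @ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $c = count_blast_xml($fh);
    print "$file: $c->{documents} <BlastOutput> document(s), ",
          "$c->{iterations} <Iteration> element(s)\n";
    close $fh;
}
```

If these counts match the number of queries but `next_result` still yields only one result, the file is fine and the parser version is the likely culprit, which is consistent with Chris's suggestion to upgrade.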
> >
> my $in = new Bio::SearchIO(-format => 'blastxml', -file => 'blastreport.xml' );
> while( my $result = $in->next_result ) {
>     print $result->query_name,"\n";
>     while( my $hit = $result->next_hit ) {
>         while( my $hsp = $hit->next_hsp ) {
>             #do something with hsp
>         }
>     }
> }
>
> Thanks
>
> Jonathan
From cjfields at illinois.edu Thu Aug 13 18:18:57 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 Aug 2009 17:18:57 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A848791.4010402@cornell.edu> References: <24919498.post@talk.nabble.com> <24957222.post@talk.nabble.com> <4A848791.4010402@cornell.edu> Message-ID:
On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: > Jonny Dalzell wrote: >> Is it ridiculous of me to expect Ubuntu to take care of this for >> me? How do >> I go about compiling the HMM? > Yes. This is a very specialized thing that you're doing, and Ubuntu > does not have the resources to package every single thing. > > Unfortunately, it looks like the bioperl-ext package is not installable > under Ubuntu 9.04 anyway, which is what I'm running. For others on > this list, if somebody is interested in maintaining it, I'd be > happy to help out by testing on Debian-based Linux platforms. We > need to clarify this package's maintenance status: if there is > nobody interested in maintaining it, I would recommend that bioperl-ext > be removed from distribution. It's not in anybody's interest to > have unmaintained software out there causing confusion. I have cc'd Yee Man Chan for this. If there isn't a response or the message bounces, we do one of two things:
1) consider it deprecated (probably safest).
2) spin it out into a separate module.
Just tried to compile it myself and am getting errors (using 64-bit perl 5.10), so I think, unless someone wants to take this on, option #1 is best. > So Jonny, in short, I would say "do not use bioperl-ext". In general, that's a safe bet. We're moving most of our C/C++ bindings to BioLib. > Step back. What are you trying to accomplish? Chris already > recommended some alternative methods in his email of 8/11 on this > subject. Perhaps we can guide you to some software that is actively > maintained and will meet your needs. > > Rob Exactly. Lots of other (better supported!) options out there: HMMER, SeqAn, and others. chris
From cjfields at illinois.edu Thu Aug 13 20:31:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Thu, 13 Aug 2009 19:31:49 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <650586.94518.qm@web30407.mail.mud.yahoo.com> References: <650586.94518.qm@web30407.mail.mud.yahoo.com> Message-ID: <234B0B99-CCBA-4DE6-B6A9-74ABD7DBD9AF@illinois.edu>
(Just to point out to everyone, Yee Man's contact information was in the POD.) Yee Man, I have posted the output at the link below: http://gist.github.com/167542 There are similar problems popping up on 32- and 64-bit perl 5.10.0, Mac OS X 10.5. I haven't had time to debug it, unfortunately. I think we should seriously consider spinning this code off into its own distribution for CPAN. It's unfortunately bit-rotting away in bioperl-ext. If you want to continue supporting it, I can help set that up. chris On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: > Hi > > So is this an HMM-only problem? Or does it apply to other bioperl-ext modules? > > What exactly are the compilation errors for HMM? I believe my > implementation is just a simple one based on Rabiner's paper.
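For readers unfamiliar with the Rabiner tutorial Yee Man refers to, its core computation is the forward algorithm, which gives the probability of an observation sequence under a discrete HMM. What follows is a generic textbook sketch in pure Perl, not the code in Bio::Ext::HMM. The two-state model and the `H`/`T` symbols are made-up example values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Forward algorithm for a discrete HMM (Rabiner's "evaluation" problem):
# returns P(O | model) for the observation list @obs.
#   $pi->[$i]          initial probability of state i
#   $trans->[$i][$j]   transition probability from state i to state j
#   $emit->[$i]{$sym}  probability that state i emits symbol $sym
sub forward_prob {
    my ($pi, $trans, $emit, @obs) = @_;
    die "need at least one observation\n" unless @obs;
    my $n = scalar @$pi;

    # Initialisation: alpha_1(i) = pi_i * b_i(O_1)
    my @alpha = map { $pi->[$_] * ($emit->[$_]{ $obs[0] } || 0) } 0 .. $n - 1;

    # Induction: alpha_{t+1}(j) = [ sum_i alpha_t(i) * a_ij ] * b_j(O_{t+1})
    for my $t (1 .. $#obs) {
        my @next;
        for my $j (0 .. $n - 1) {
            my $sum = 0;
            $sum += $alpha[$_] * $trans->[$_][$j] for 0 .. $n - 1;
            $next[$j] = $sum * ($emit->[$j]{ $obs[$t] } || 0);
        }
        @alpha = @next;
    }

    # Termination: P(O | model) = sum_i alpha_T(i)
    my $p = 0;
    $p += $_ for @alpha;
    return $p;
}

# Made-up two-state example: a fair coin and a coin biased towards heads.
my $pi    = [0.5, 0.5];
my $trans = [[0.9, 0.1], [0.2, 0.8]];
my $emit  = [{ H => 0.5, T => 0.5 }, { H => 0.8, T => 0.2 }];
printf "P(H H T) = %.6f\n", forward_prob($pi, $trans, $emit, qw(H H T));
```

The loop costs O(N^2 T) for N states and T observations. For long sequences a real implementation would also rescale the alpha values or work in log space to avoid underflow, which this sketch does not do.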
> > http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg > > I don't think I did anything fancy that makes it machine-dependent or non-ANSI C. > > Yee Man > > --- On Thu, 8/13/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? >> To: "Robert Buels" >> Cc: "Jonny Dalzell" , "BioPerl List" , "Yee Man Chan" >> Date: Thursday, August 13, 2009, 3:18 PM >> >> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: >> >>> Jonny Dalzell wrote: >>>> Is it ridiculous of me to expect Ubuntu to take >> care of this for me? How do >>>> I go about compiling the HMM? >>> Yes. This is a very specialized thing that >> you're doing, and Ubuntu does not have the resources to >> package every single thing. >>> >>> Unfortunately, it looks like the bioperl-ext package is >> not installable under Ubuntu 9.04 anyway, which is what I'm >> running. For others on this list, if somebody is >> interested in maintaining it, I'd be happy to help out >> by testing on Debian-based Linux platforms. We need to >> clarify this package's maintenance status: if there is >> nobody interested in maintaining it, I would recommend that >> bioperl-ext be removed from distribution. It's not in >> anybody's interest to have unmaintained software out there >> causing confusion. >> >> I have cc'd Yee Man Chan for this. If there isn't a >> response or the message bounces, we do one of two things: >> >> 1) consider it deprecated (probably safest). >> 2) spin it out into a separate module. >> >> Just tried to compile it myself and am getting errors (using >> 64-bit perl 5.10), so I think, unless someone wants to take >> this on, option #1 is best. >> >>> So Jonny, in short, I would say "do not use >> bioperl-ext". >> >> In general, that's a safe bet.
We're moving most of >> our C/C++ bindings to BioLib. >> >>> Step back. What are you trying to >> accomplish? Chris already recommended some alternative >> methods in his email of 8/11 on this subject. Perhaps >> we can guide you to some software that is actively >> maintained and will meet your needs. >>> >>> Rob >> >> Exactly. Lots of other (better supported!) options >> out there: HMMER, SeqAn, and others. >> >> chris
From ymc at yahoo.com Thu Aug 13 19:58:28 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Thu, 13 Aug 2009 16:58:28 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: Message-ID: <650586.94518.qm@web30407.mail.mud.yahoo.com>
Hi So is this an HMM-only problem? Or does it apply to other bioperl-ext modules? What exactly are the compilation errors for HMM? I believe my implementation is just a simple one based on Rabiner's paper. http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg I don't think I did anything fancy that makes it machine-dependent or non-ANSI C. Yee Man --- On Thu, 8/13/09, Chris Fields wrote: > From: Chris Fields > Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? > To: "Robert Buels" > Cc: "Jonny Dalzell" , "BioPerl List" , "Yee Man Chan" > Date: Thursday, August 13, 2009, 3:18 PM > > On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: > > > Jonny Dalzell wrote: > >> Is it ridiculous of me to expect Ubuntu to take > care of this for me? How do > >> I go about compiling the HMM? > > Yes. This is a very specialized thing that > you're doing, and Ubuntu does not have the resources to > package every single thing. > > > > Unfortunately, it looks like the bioperl-ext package is > not installable under Ubuntu 9.04 anyway, which is what I'm > running.
For others on this list, if somebody is > interested in maintaining it, I'd be happy to help out > by testing on Debian-based Linux platforms. We need to > clarify this package's maintenance status: if there is > nobody interested in maintaining it, I would recommend that > bioperl-ext be removed from distribution. It's not in > anybody's interest to have unmaintained software out there > causing confusion. > > I have cc'd Yee Man Chan for this. If there isn't a > response or the message bounces, we do one of two things: > > 1) consider it deprecated (probably safest). > 2) spin it out into a separate module. > > Just tried to compile it myself and am getting errors (using > 64-bit perl 5.10), so I think, unless someone wants to take > this on, option #1 is best. > > > So Jonny, in short, I would say "do not use > bioperl-ext". > > In general, that's a safe bet. We're moving most of > our C/C++ bindings to BioLib. > > > Step back. What are you trying to > accomplish? Chris already recommended some alternative > methods in his email of 8/11 on this subject. Perhaps > we can guide you to some software that is actively > maintained and will meet your needs. > > > > Rob > > Exactly. Lots of other (better supported!) options > out there: HMMER, SeqAn, and others. > > chris >
From agulyaskov at mail.rockefeller.edu Thu Aug 13 20:40:22 2009 From: agulyaskov at mail.rockefeller.edu (Attila Gulyas-Kovacs) Date: Thu, 13 Aug 2009 20:40:22 -0400 Subject: [Bioperl-l] bus error when indexing large file Message-ID: <4A84B276.2040706@mail.rockefeller.edu>
Dear all, I can index the SwissProt database without problem, but I get a bus error when I try to index the much larger TrEMBL database. Indexing failed with both the swissprot and fasta formats (using Bio::Index::Swissprot or Bio::Index::Fasta, respectively). I broke up TrEMBL into multiple files ('chunks'), each about the size of the SwissProt database. Then I could create separate indices for each chunk.
But I got a bus error when I passed all chunks simultaneously to my script (below) to create a single index. Perl v5.10.0; BioPerl 1.6.0; Mac OS X 10.5.8; Mac Pro, 10 GB RAM. What do you suggest? Attila

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Bio::Index::Swissprot;

    my $index_file_name = shift;
    my $inx = Bio::Index::Swissprot->new(
        -filename   => $index_file_name,
        -write_flag => 1);
    $inx->make_index(@ARGV);

-- Attila Gulyas-Kovacs Postdoctoral Associate Rockefeller University Gadsby Lab (Cardiac/Membrane Physiology) D.W. Bronk Building, Room 307 1230 York Avenue New York, NY, 10065 Tel: (212)327-8617 Fax: (212)327-7589
From ymc at yahoo.com Fri Aug 14 00:15:41 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Thu, 13 Aug 2009 21:15:41 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <234B0B99-CCBA-4DE6-B6A9-74ABD7DBD9AF@illinois.edu> Message-ID: <528790.13637.qm@web30404.mail.mud.yahoo.com>
Hi all, Based on my understanding of the warning messages, the problem seems to come from the "typemap" file, where I cast the return value of SvIV from an integer to a pointer. I suppose this might cause problems on 64-bit machines. But when I look at perlguts and perlxs, it does seem to me that the way I did it in the typemap is the suggested way, because the IV type is "guaranteed to be big enough to hold a pointer". Nevertheless, I modified my typemap file to look exactly like what's in perlxs (see PS). Does anyone know how to deal with this problem? Or can any of you give me access to a 64-bit machine to sort this out? Thank you! Yee Man PS This is a typemap file using exactly the same lines suggested by perlxs. It works on my 32-bit machine. Can someone try it on a 64-bit machine?
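Whether a pointer survives the round trip through a Perl scalar depends on type widths that differ between 32- and 64-bit builds. The sketch below reports them from the core `Config` module. As Yee Man quotes, perlguts says an IV is big enough to hold a pointer, whereas a plain C `int` is often narrower than a pointer on 64-bit builds, so casting a pointer through `int` there would lose bits. `pointer_sizes` is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Hypothetical helper: sizes in bytes of a C int, Perl's IV, and a pointer
# for the perl that is running this script.
sub pointer_sizes {
    return {
        intsize => $Config{intsize},
        ivsize  => $Config{ivsize},
        ptrsize => $Config{ptrsize},
    };
}

my $s = pointer_sizes();
print "$_ = $s->{$_}\n" for sort keys %$s;
if ($s->{intsize} < $s->{ptrsize}) {
    print "A C int is narrower than a pointer here: casting a pointer ",
          "through int would lose bits.\n";
}
print "An IV can hold a pointer on this build.\n" if $s->{ivsize} >= $s->{ptrsize};
```

On a typical LP64 build (64-bit Linux or Mac OS X), one would expect `intsize = 4` and `ivsize = ptrsize = 8`. That is exactly the situation where an `int` cast truncates a pointer but an IV cast does not.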
Thanks

================================================
TYPEMAP
HMM *    T_HMM

INPUT
T_HMM
    if (sv_isobject($arg) && (SvTYPE(SvRV($arg)) == SVt_PVMG))
        $var = ($type)SvIV((SV*)SvRV( $arg ));
    else{
        warn( \"${Package}::$func_name() -- $var is not a blessed SV reference\" );
        XSRETURN_UNDEF;
    }

OUTPUT
T_HMM
    sv_setref_pv($arg, "Bio::Ext::HMM::HMM", (void*) $var);
========================================================

From ymc at yahoo.com Fri Aug 14 04:27:11 2009 From: ymc at yahoo.com (Yee Man Chan) Date: Fri, 14 Aug 2009 01:27:11 -0700 (PDT) Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? Message-ID: <168012.97676.qm@web30405.mail.mud.yahoo.com>
Ah, I find that the typemap can become as simple as this:

=====================
TYPEMAP
HMM *    T_PTROBJ
=====================

Then the generated HMM.c will use the INT2PTR macro to do the pointer conversion. I believe this should solve the warnings. Attached are the updated HMM.xs and typemap. Can someone with a 64-bit machine give it a try? Thank you Yee Man -------------- next part -------------- A non-text attachment was scrubbed... Name: HMM.xs Type: application/octet-stream Size: 5588 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: typemap Type: application/octet-stream Size: 26 bytes Desc: not available URL:
From cjfields at illinois.edu Fri Aug 14 10:20:21 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 09:20:21 -0500 Subject: [Bioperl-l] bus error when indexing large file In-Reply-To: <4A84B276.2040706@mail.rockefeller.edu> References: <4A84B276.2040706@mail.rockefeller.edu> Message-ID:
I can attempt to reproduce this (I have very similar specs). I'm wondering if it has something to do with large file support. Have you tried the perl packaged with Mac OS X? I think it's perl 5.8.8. chris On Aug 13, 2009, at 7:40 PM, Attila Gulyas-Kovacs wrote: > Dear all, > > I can index the SwissProt database without problem, but I get a bus > error when I try to index the much larger TrEMBL database. Indexing > failed with both the swissprot and fasta formats (using > Bio::Index::Swissprot or Bio::Index::Fasta, respectively).
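Chris's large-file hunch can be checked quickly before re-running the indexer. This is a minimal pre-flight sketch, assuming 2 GiB is the size limit that matters for builds without large file support. It reports whether this perl was configured with `uselargefiles` and which of the given input files reach that size. It does not diagnose the bus error itself, and `large_inputs` is a hypothetical helper.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Assumed threshold: files of 2 GiB or more need large file support.
use constant TWO_GIB => 2**31;

# Hypothetical helper: return the input files whose size is >= 2 GiB.
# Missing or unreadable files count as size 0 and are not returned.
sub large_inputs {
    my @files = @_;
    return grep { (-s $_ || 0) >= TWO_GIB } @files;
}

print "uselargefiles: ", ($Config{uselargefiles} ? 'yes' : 'no'), "\n";
print "2 GiB or larger: $_\n" for large_inputs(@ARGV);
```

Run it with the same perl and the same file list passed to the indexing script. If `uselargefiles` is `no` and any file is listed, that combination is worth ruling out first.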
I broke > up TrEMBL into multiple files ('chunks'), about the size of the > SwissProt database. Then I could create separate indices for > each chunk. But I got a bus error when I passed all chunks > simultaneously to my script (below) to create a single index. > Perl v5.10.0; Bioperl 1.6.0; Mac OS X 10.5.8; MacPro 10 GB RAM. > > What do you suggest? > > Attila > > > #! /usr/bin/perl > use warnings; > use strict; > use Bio::Index::Swissprot; > my $index_file_name = shift; > my $inx = Bio::Index::Swissprot->new( > -filename => $index_file_name, > -write_flag => 1); > $inx->make_index(@ARGV); > > -- > Attila Gulyas-Kovacs > Postdoctoral Associate > > Rockefeller University > Gadsby Lab (Cardiac/Membrane Physiology) > D.W. Bronk Building, Room 307 1230 York Avenue > New York, NY, 10065 > Tel: (212)327-8617 > Fax: (212)327-7589 > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From David.Messina at sbc.su.se Fri Aug 14 10:10:33 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 14 Aug 2009 16:10:33 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence Message-ID: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> Hi everyone, I'm using Bio::AlignIO to read in a series of multiple alignments. Occasionally, an alignment will have a sequence which consists entirely of gaps (these are actually trimmed sub-alignments; that's why). Each time I read in such an alignment, an error will be raised when the Bio::LocatableSeq object is created for the all-gap sequence (actually, the error comes from the superclass Bio::PrimarySeq). To my way of thinking, an alignment is not invalid if it contains such all-gap sequences, so there shouldn't be an error. This could be done by having Bio::AlignIO::* pass the -nowarnonempty flag when creating the sequence objects. Any thoughts on this?
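One way to see why an all-gap sequence is awkward for coordinate-based objects is that, with no residues, there is nothing for an end coordinate to point at. The sketch below uses a hypothetical `gapped_coords` helper in plain Perl. It is not Bio::LocatableSeq's code, and it assumes '-' and '.' are the only gap characters. It adopts one possible convention: start and end are left undefined when there are no residues, while the gapped length is still reported.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not BioPerl's implementation): compute residue-based
# coordinates for an aligned sequence string. Gap characters are assumed to
# be '-' and '.'; an all-gap string gets no start/end coordinates.
sub gapped_coords {
    my ($aligned, $start) = @_;
    $start = 1 unless defined $start;
    (my $ungapped = $aligned) =~ tr/-.//d;   # strip gap characters
    my $residues = length $ungapped;
    return {
        aligned_length => length $aligned,   # includes gaps
        residues       => $residues,         # gaps excluded
        start          => $residues ? $start : undef,
        # end = start + residues - 1; undefined when there are no residues
        end            => $residues ? $start + $residues - 1 : undef,
    };
}

for my $s ('-------------', '--ACG-T--', 'ACGT') {
    my $c = gapped_coords($s);
    printf "%-15s length=%2d residues=%d start=%s end=%s\n", $s,
        $c->{aligned_length}, $c->{residues},
        defined $c->{start} ? $c->{start} : 'undef',
        defined $c->{end}   ? $c->{end}   : 'undef';
}
```

Under this convention, a parser could accept an all-gap row as a valid member of the alignment and simply leave its coordinates empty, instead of raising an error.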
Is there a better way to suppress the warning than changing the behavior of all the AlignIO modules? Dave From cjfields at illinois.edu Fri Aug 14 10:42:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 09:42:51 -0500 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> Message-ID: <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> Dave, Is this using bioperl-live? I recall this being a problem but I thought it was addressed in svn (and soon in the next point release). chris On Aug 14, 2009, at 9:10 AM, Dave Messina wrote: > Hi everyone, > I'm using Bio::AlignIO to read in a series of multiple alignments. > Occasionally, an alignment will have a sequence which consists > entirely of > gaps (these are actually trimmed sub-alignments; that's why). > > Each time I read in such an alignment, an error will be raised when > the > Bio::LocatableSeq object is created for the all-gap sequence > (actually, the > error comes from the superclass Bio::PrimarySeq). > > To my way of thinking, an alignment is not invalid if it contains such > all-gap sequences, so there shouldn't be an error. This could be > done by > having Bio::AlignIO::* passing the -nowarnonempty flag when creating > the > sequence objects. > > Any thoughts on this? Is there a better way to suppress the warning > than > changing the behavior of all the AlignIO modules? 
> > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From bernd.web at gmail.com Fri Aug 14 10:44:42 2009 From: bernd.web at gmail.com (Bernd Web) Date: Fri, 14 Aug 2009 16:44:42 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> Message-ID: <716af09c0908140744i4447dffg205ec07daeaaa571@mail.gmail.com> Hi Dave, I have observed the same (with bioperl 1.52) for the same reason. It would be nice not to have these errors as also in my view an all-gaps sequence is a sequence. I also found that sometimes parsing such alignments fails when the all-gaps sequence is the last in the alignment (bug 2744, in Bio::LocatableSeq). Regards, Bernd On Fri, Aug 14, 2009 at 4:10 PM, Dave Messina wrote: > Hi everyone, > I'm using Bio::AlignIO to read in a series of multiple alignments. > Occasionally, an alignment will have a sequence which consists entirely of > gaps (these are actually trimmed sub-alignments; that's why). > > Each time I read in such an alignment, an error will be raised when the > Bio::LocatableSeq object is created for the all-gap sequence (actually, the > error comes from the superclass Bio::PrimarySeq). > > To my way of thinking, an alignment is not invalid if it contains such > all-gap sequences, so there shouldn't be an error. This could be done by > having Bio::AlignIO::* passing the -nowarnonempty flag when creating the > sequence objects. > > Any thoughts on this? Is there a better way to suppress the warning than > changing the behavior of all the AlignIO modules? 
> > > Dave > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > From David.Messina at sbc.su.se Fri Aug 14 11:12:35 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Fri, 14 Aug 2009 17:12:35 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> Message-ID: <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> > > Is this using bioperl-live? Sorry, should've said before. Yes, it's bioperl-live (r15927). I recall this being a problem but I thought it was addressed in svn (and > soon in the next point release). Hmm, the only recent somewhat related change I see (in Bio::AlignIO::*, anyway) is: ------------------------------------------------------------------------ r15753 | cjfields | 2009-06-10 05:51:38 +0200 (Wed, 10 Jun 2009) | 2 lines deprecate no_sequences/no_residues in main trunk (we can switch the version to 1.7 if deemed necessary) ------------------------------------------------------------------------ Perhaps this is what you were thinking of? Dave From cjfields at illinois.edu Fri Aug 14 11:31:49 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 10:31:49 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <168012.97676.qm@web30405.mail.mud.yahoo.com> References: <168012.97676.qm@web30405.mail.mud.yahoo.com> Message-ID: Yee Man, I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 64-bit) and on dev.open-bio.org (which is perl 5.8.8, appears to be 32-bit). The patch results in cleaning up warnings for 5.10.0 but results in similar warnings for 5.8.8 (linux or OS X). 
On OS X with perl 5.8.8, this sometimes passes (note the first attempt fails, the second succeeds), so it's not entirely a 32-bit issue: http://gist.github.com/167860 On OS X with perl 5.10.0, this always fails as the previous gist shows, but demonstrates similar behavior (multiple attempts to test get different responses): http://gist.github.com/167542 On Linux, everything passes with or w/o the patched files (patched files have warnings as indicated above): Specs for all three perl executables (they vary a bit): http://gist.github.com/167883 chris On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: > Ah.. I find that the typemap can become as simple as this > ===================== > TYPEMAP > HMM * T_PTROBJ > ===================== > > Then the generated HMM.c will have a function called INT2PTR to do > the pointer conversion. I believe this should solve the warnings. > > Attached are the updated HMM.xs and typemap. Can someone with a 64-bit > machine give it a try? > > Thank you > Yee Man > --- On Thu, 8/13/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "Jonny Dalzell" > >, "BioPerl List" >> Date: Thursday, August 13, 2009, 5:31 PM >> (just to point out to everyone, Yee >> Man's contact information was in the POD) >> >> Yee Man, >> >> I have the output in the below link: >> >> http://gist.github.com/167542 >> >> There are similar problems popping up on 32- and 64-bit >> perl 5.10.0, Mac OS X 10.5. Haven't had time to debug >> it unfortunately. >> >> I think we should seriously consider spinning this code off >> into its own distribution for CPAN. It's >> unfortunately bit-rotting away in bioperl-ext. If you >> want to continue supporting it I can help set that up. >> >> chris >> >> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >> >>> Hi >>> >>> So is this an HMM only problem? Or does >> it apply to other bioperl-ext modules?
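Some background on the typemap change, offered as one reading rather than Yee Man's own explanation: with `T_PTROBJ`, the generated XS code stores the C pointer in a Perl integer (IV) inside a blessed reference and casts it back with `INT2PTR`, which is a macro defined in perl's headers. That round trip is only lossless if an IV is at least as wide as a pointer. This is exactly the kind of detail that can differ between 32- and 64-bit perls. The hypothetical `pointer_fit` helper below only reports the relevant `%Config` values; it is a diagnostic sketch, not part of bioperl-ext.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;

# Hypothetical diagnostic: summarize the build settings that matter when
# XS code keeps a C pointer in a Perl integer (IV), as T_PTROBJ does.
sub pointer_fit {
    my $ivsize  = $Config{ivsize};    # bytes in a Perl IV
    my $ptrsize = $Config{ptrsize};   # bytes in a C pointer
    return {
        archname => $Config{archname},
        ivsize   => $ivsize,
        ptrsize  => $ptrsize,
        # The pointer survives the IV round trip only if the IV is wide enough.
        fits     => ($ivsize >= $ptrsize) ? 1 : 0,
    };
}

my $p = pointer_fit();
printf "%s: ivsize=%d, ptrsize=%d, pointer fits in IV: %s\n",
    $p->{archname}, $p->{ivsize}, $p->{ptrsize}, $p->{fits} ? 'yes' : 'no';
```

Perl itself guarantees that an IV can hold a pointer, so the check should pass. Warnings of the kind reported here usually come from C code that casts pointers through a narrower type such as `int`, which `INT2PTR` avoids.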
> >>> >>> What exactly are the compilation errors >> for HMM? I believe my implementation is just a simple one >> based on Rabiner's paper. >>> >>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>> >>> I don't think I did anything fancy that >> makes it machine dependent or non-ANSI C. >>> >>> Yee Man >>> >>> --- On Thu, 8/13/09, Chris Fields >> wrote: >>> >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Robert Buels" >>>> Cc: "Jonny Dalzell" , >> "BioPerl List" , >> "Yee Man Chan" >>>> Date: Thursday, August 13, 2009, 3:18 PM >>>> >>>> On Aug 13, 2009, at 4:37 PM, Robert Buels wrote: >>>> >>>>> Jonny Dalzell wrote: >>>>>> Is it ridiculous of me to expect ubuntu to >> take >>>> care of this for me? How do >>>>>> I go about compiling the HMM? >>>>> Yes. This is a very specialized thing >> that >>>> you're doing, and Ubuntu does not have the >> resources to >>>> package every single thing. >>>>> >>>>> Unfortunately, it looks like bioperl-ext >> package is >>>> not installable under Ubuntu 9.04 anyway, which is >> what I'm >>>> running. For others on this list, if >> somebody is >>>> interested in maintaining it, I'd be happy >> to help out >>>> by testing on Debian-based Linux platforms. >> We need to >>>> clarify this package's maintenance status: if >> there is >>>> nobody interested in maintaining it, I would >> recommend that >>>> bioperl-ext be removed from distribution. >> It's not in >>>> anybody's interest to have unmaintained software >> out there >>>> causing confusion. >>>> >>>> I have cc'd Yee Man Chan for this. If there >> isn't a >>>> response or the message bounces, we do one of two >> things: >>>> >>>> 1) consider it deprecated (probably safest). >>>> 2) spin it out into a separate module.
> >>>> >>>> Just tried to compile it myself and am getting >> errors (using >>>> 64bit perl 5.10), so I think, unless someone wants >> to take >>>> this on, option #1 is best. >>>> >>>>> So Jonny, in short, I would say "do not use >>>> bioperl-ext". >>>> >>>> In general, that's a safe bet. We're moving >> most of >>>> our C/C++ bindings to BioLib. >>>> >>>>> Step back. What are you trying to >>>> accomplish? Chris already recommended some >> alternative >>>> methods in his email of 8/11 on this >> subject. Perhaps >>>> we can guide you to some software that is >> actively >>>> maintained and will meet your needs. >>>>> >>>>> Rob >>>> >>>> Exactly. Lots of other (better supported!) >> options >>>> out there. HMMER, SeqAn, and others. >>>> >>>> chris >>>> >>> >>> >>> >> >> > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Fri Aug 14 11:53:51 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 10:53:51 -0500 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> Message-ID: <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu> On Aug 14, 2009, at 10:12 AM, Dave Messina wrote: > Is this using bioperl-live? > > Sorry, should've said before. Yes, it's bioperl-live (r15927). > > > I recall this being a problem but I thought it was addressed in svn > (and soon in the next point release).
> > Hmm, the only recent somewhat related change I see (in > Bio::AlignIO::*, anyway) is: > > ------------------------------------------------------------------------ > r15753 | cjfields | 2009-06-10 05:51:38 +0200 (Wed, 10 Jun 2009) | 2 > lines > > deprecate no_sequences/no_residues in main trunk (we can switch the > version to 1.7 if deemed necessary) > ------------------------------------------------------------------------ > > > Perhaps this is what you were thinking of? > > Dave Maybe not, then (for some reason I thought this was fixed within LocatableSeq). I know that it is possible to have an all-gap LocatableSeq; this works, but the default start/end/length aren't correct, which is part of Bernd's bug: use Modern::Perl; use Bio::LocatableSeq; my $seq = Bio::LocatableSeq->new( -seq => '-------------', -alphabet => 'dna', ); say $seq->start; # 1 say $seq->end; # undef (?) say $seq->length; # 13, counts the gaps The problem is that fixing all this relies on a whole slew of refactors for LocatableSeq and SimpleAlign. Some of this touches root components as well, so it'll need to be tried on a branch and will very likely result in some API changes (and thus may not be included in 1.6). I'll start a branch to get the process started. chris From jncline at gmail.com Fri Aug 14 15:41:21 2009 From: jncline at gmail.com (Jonathan Cline) Date: Fri, 14 Aug 2009 14:41:21 -0500 Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl In-Reply-To: <99E27D08408340B9B0611751A17DF266@NewLife> References: <99E27D08408340B9B0611751A17DF266@NewLife> Message-ID: <4A85BDE1.5020002@gmail.com> Mark A. Jensen wrote: > Sorry, I cut off the last script. The entire thing follows: > This is exactly what I was looking for - thanks. A method to modify Makefile.PL, install in Activestate, etc. is great. Perhaps your method could also be made more portable by using `cygpath` to get rid of the hardcoded "/cygdrive/x/", although few cygwin installs change this prefix from the default.
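Jonathan's point about the hardcoded /cygdrive/c prefix could also be handled in Perl rather than sed. The sketch below is a hypothetical `cyg_to_win` helper that maps any drive letter under a Cygwin prefix to a Windows drive path, in the same lowercase `c:` form that the sed script's `$(subst /cygdrive/c,c:,...)` produces. It assumes the default `/cygdrive` prefix unless a different one is passed in. On a real installation, `cygpath -m` is the authoritative converter, because it also knows about a customized prefix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a Cygwin-style path such as /cygdrive/c/Perl/lib
# into a Windows drive path (c:/Perl/lib). Assumes the default /cygdrive
# prefix unless another is given; `cygpath -m` is authoritative in practice.
sub cyg_to_win {
    my ($path, $prefix) = @_;
    $prefix = '/cygdrive' unless defined $prefix;
    # Prefix, then exactly one drive letter, then '/' or end of string.
    if ($path =~ m{^\Q$prefix\E/([A-Za-z])(?=/|\z)(.*)\z}) {
        my ($drive, $rest) = (lc $1, $2);
        return $drive . ':' . ($rest eq '' ? '/' : $rest);
    }
    return $path;   # not a Cygwin drive path; leave it unchanged
}

print cyg_to_win($_), "\n"
    for '/cygdrive/c/Perl/bin/perl', '/cygdrive/d', '/usr/local/bin';
```

Paths that do not start with the prefix pass through unchanged, so the helper can be applied to a whole argument list the way the `MOD_INSTALL` override maps over `@ARGV`.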
I will definitely save your code for later. I've implemented another workaround, which is to use Win32::Pipe and other Win32:: methods. This has problems of its own (support is not 100%), and an error-free implementation is not as easy as simply requiring Activestate Perl; however, it should work with both Activestate and cygwin-perl (and Unix). ## Jonathan Cline ## jcline at ieee.org ## Mobile: +1-805-617-0223 ######################## > ----- Original Message ----- From: "Jonathan Cline" > To: > Cc: > Sent: Friday, July 31, 2009 11:24 PM > Subject: [Bioperl-l] Module issue with cygwin-perl vs. Activestate Perl > > >> I recently mentioned working on Bio::Robotics for Tecan. Vendors >> being MS-Win specific, the vendor software allows third-party software >> communication through a named pipe (the literal filename is >> "\\\\.\\pipe\\gemini" where the multiple front slashes are MS specific >> and this pseudo-pipe is opened with sysopen() ). This is broken under >> cygwin-perl due to cygwin's method of handling paths -- the sysopen >> fails. However it works under ActiveState Perl and communication >> through the named pipe (to the robot hardware) is OK. The standard >> workaround is usually to use cygwin bash, and force the PATH to use >> ActiveState perl. (Typical MS Windows incompatibility problem.) The >> issue is: Perl module libraries for CPAN work under cygwin-perl >> (only?). Attempts to run "activestate-perl Makefile.PL" for CPAN >> module use, or "make test", result in a bad list of incompatibility >> problems. Yet ActiveState Perl is required for communicating to the >> vendor application (unless there is some workaround to raw filesystem >> access in cygwin-perl that I haven't found in 2 days of working this). >> The stand-alone scripts I have work fine to access the named pipe >> (using ActiveState Perl) since the standalone scripts have no module >> INC dependencies, no CPAN module test harness, etc etc.
>> >> This isn't specifically a Bio:: issue, though if anyone has >> suggestions please email. I could try msys and see if it handles the >> named-pipe-special-file better, if msys has an msys-perl distribution. >> >> -- >> ## Jonathan Cline >> ## jcline at ieee.org >> ## Mobile: +1-805-617-0223 >> ######################## >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> >> > From cjfields at illinois.edu Fri Aug 14 19:29:43 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 18:29:43 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring Message-ID: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> As we have pretty much everything in place for another point release (which I will start merging over this weekend into the 1.6 branch), I have gone ahead and made two branches for refactoring some of the more important pieces of bioperl code. Both refactors may require API changes; if so these will be part of a 1.7 release. 1) GFF - entail refactoring bioperl code to better handle GFF2/3. This is a large section of code, so small incremental changes may be merged to trunk over time (and thus may involve several branches). Included is refactoring of feature typing to be more consistent and lightweight, and will initially involve Bio::FeatureIO and Bio::SeqFeature::Annotated (which may be deprecated in the process). See the following for additional details: http://www.bioperl.org/wiki/GFF_Refactor 2) Align/LocatableSeq - dealing with inconsistencies in Bio::AlignI (SimpleAlign) and LocatableSeq. This is primarily to address significant bugs but will also entail cleaning up SimpleAlign methods (factoring out more utility-like methods into Bio::Align::AlignUtils or similar). This also may involve several branches. 
See the following for additional details: http://www.bioperl.org/wiki/Align_Refactor Any help/suggestions for the above two would be greatly appreciated! Robert Buels may be heading up the initial FeatureIO work; I will likely start on LocatableSeq/Align (Mark, wanna help?). chris From maj at fortinbras.us Fri Aug 14 19:45:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 14 Aug 2009 19:45:01 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: Hey Chris et al, I'm there on LocatableSeq, definitely. I do have one project to finish this weekend before I move to that: I'm planning to move Chase Miller's excellent NeXML read/write implementation into the trunk, complete with tests. If we can get it to pass the test suite, is there room in the point release for it? MAJ ----- Original Message ----- From: "Chris Fields" To: "BioPerl List" Sent: Friday, August 14, 2009 7:29 PM Subject: [Bioperl-l] GFF and LocatableSeq refactoring > As we have pretty much everything in place for another point release > (which I will start merging over this weekend into the 1.6 branch), I > have gone ahead and made two branches for refactoring some of the more > important pieces of bioperl code. Both refactors may require API > changes; if so these will be part of a 1.7 release. > > 1) GFF - entail refactoring bioperl code to better handle GFF2/3. > > This is a large section of code, so small incremental changes may be > merged to trunk over time (and thus may involve several branches). > Included is refactoring of feature typing to be more consistent and > lightweight, and will initially involve Bio::FeatureIO and > Bio::SeqFeature::Annotated (which may be deprecated in the process). 
> See the following for additional details: > > http://www.bioperl.org/wiki/GFF_Refactor > > 2) Align/LocatableSeq - dealing with inconsistencies in Bio::AlignI > (SimpleAlign) and LocatableSeq. This is primarily to address > significant bugs but will also entail cleaning up SimpleAlign methods > (factoring out more utility-like methods into Bio::Align::AlignUtils > or similar). This also may involve several branches. See the > following for additional details: > > http://www.bioperl.org/wiki/Align_Refactor > > Any help/suggestions for the above two would be greatly appreciated! > Robert Buels may be heading up the initial FeatureIO work; I will > likely start on LocatableSeq/Align (Mark, wanna help?). > > chris > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l > > From rmb32 at cornell.edu Fri Aug 14 19:50:18 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Fri, 14 Aug 2009 16:50:18 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: <4A85F83A.30800@cornell.edu> Chris Fields wrote: > Any help/suggestions for the above two would be greatly appreciated! > Robert Buels may be heading up the initial FeatureIO work; I will likely > start on LocatableSeq/Align (Mark, wanna help?). Sure, I'll head up the gff_refactor branch work. If you're interested in what changes are being planned for Bio::SeqFeature::*, Bio::Annotat*, and/or Bio::FeatureIO*, have a look at the implementation plan Chris and I developed just now on IRC, which is at http://www.bioperl.org/wiki/GFF_Refactor#Implementation_Plan Now soliciting suggestions, comments, and assistance. 
Rob From cjfields at illinois.edu Fri Aug 14 21:03:41 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 20:03:41 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: Mark, re: NeXML, yes, of course. There'll be an alpha release or two prior to core 1.6.1 (I need to test the Build.PL/Bio::Root::Build changes Sendu added in). chris On Aug 14, 2009, at 6:45 PM, Mark A. Jensen wrote: > Hey Chris et al, I'm there on LocatableSeq, definitely. I do have > one project to finish this weekend before I move to that: I'm > planning to move Chase Miller's > excellent NeXML read/write implementation into the trunk, complete > with tests. If we can get it to pass the test suite, is there room > in the point release for it? > MAJ > ----- Original Message ----- From: "Chris Fields" > > To: "BioPerl List" > Sent: Friday, August 14, 2009 7:29 PM > Subject: [Bioperl-l] GFF and LocatableSeq refactoring > > >> As we have pretty much everything in place for another point >> release (which I will start merging over this weekend into the 1.6 >> branch), I have gone ahead and made two branches for refactoring >> some of the more important pieces of bioperl code. Both refactors >> may require API changes; if so these will be part of a 1.7 release. >> 1) GFF - entail refactoring bioperl code to better handle GFF2/3. >> This is a large section of code, so small incremental changes may >> be merged to trunk over time (and thus may involve several >> branches). Included is refactoring of feature typing to be more >> consistent and lightweight, and will initially involve >> Bio::FeatureIO and Bio::SeqFeature::Annotated (which may be >> deprecated in the process). See the following for additional >> details: >> http://www.bioperl.org/wiki/GFF_Refactor >> 2) Align/LocatableSeq - dealing with inconsistencies in >> Bio::AlignI (SimpleAlign) and LocatableSeq. 
This is primarily to >> address significant bugs but will also entail cleaning up >> SimpleAlign methods (factoring out more utility-like methods into >> Bio::Align::AlignUtils or similar). This also may involve several >> branches. See the following for additional details: >> http://www.bioperl.org/wiki/Align_Refactor >> Any help/suggestions for the above two would be greatly >> appreciated! Robert Buels may be heading up the initial FeatureIO >> work; I will likely start on LocatableSeq/Align (Mark, wanna help?). >> chris >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l >> From maj at fortinbras.us Fri Aug 14 22:32:01 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Fri, 14 Aug 2009 22:32:01 -0400 Subject: [Bioperl-l] on BP documentation Message-ID: <1F899AA92F94415186CB0B25306F1114@NewLife> Hi All -- Off-list, an old colleague of mine had this insightful, if damning, comment: >I guess that from my perspective, after doing this stuff for >about 10 years, I personally would prefer to see a "summer of >documentation" for the bio* languages (or at least bioperl, as that is >the only one I ever look at). From my own experiences, and from those >of many colleagues, the documentation for bioperl has gone from >mediocre to quite poor in the last few years. I largely think the >wikification of the docs are to blame for this. Even SeqIO is hard >to figure out now--it took me an hour the other day to figure out that >"desc" returns the full Fasta header, and I had to get that from the >module code + trial-and-error, instead of the online docs. There is >far too much inside baseball going on in the documentation scheme. >So I worry more about the constant adding of features at the expense >of documenting what is already there. This is just my 2 cents, and it >is disappointing to see a downward trend for bioperl in this regard. 
I would be really interested in all responses from the list users. I must agree that BP docs are rather a rat's nest and of varying quality, but taken in toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount of useful and sophisticated information available. I think there are approaches we can take to reorganize and standardize the accession of it to make it more useful and inviting. I disagree with my pal about the wikification, but I wager that the power of the wiki could be leveraged to greater advantage (right, Dan?). I think that what we all as developers love is to code, and detest is to document. Since BP is all-volunteer, and volunteers tend to do what they like -- the beauty of open source, btw -- documentation reorg and cleanup probably must devolve to the Core. I am willing to lead such an effort, which will take some time, and more time the fewer volunteers there are. First let's hear some thoughts, and 'let it all hang out', as they said in my mom's era. cheers Mark From cjfields at illinois.edu Fri Aug 14 23:41:10 2009 From: cjfields at illinois.edu (Chris Fields) Date: Fri, 14 Aug 2009 22:41:10 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <1F899AA92F94415186CB0B25306F1114@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> Message-ID: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> On Aug 14, 2009, at 9:32 PM, Mark A. Jensen wrote: > Hi All -- > > Off-list, an old colleague of mine had this insightful, if damning, > comment: > >> I guess that from my perspective, after doing this stuff for >> about 10 years, I personally would prefer to see a "summer of >> documentation" for the bio* languages (or at least bioperl, as that >> is >> the only one I ever look at). From my own experiences, and from >> those >> of many colleagues, the documentation for bioperl has gone from >> mediocre to quite poor in the last few years. I largely think the >> wikification of the docs are to blame for this. 
Even SeqIO is hard >> to figure out now--it took me an hour the other day to figure out >> that >> "desc" returns the full Fasta header, and I had to get that from the >> module code + trial-and-error, instead of the online docs. There is >> far too much inside baseball going on in the documentation scheme. > >> So I worry more about the constant adding of features at the expense >> of documenting what is already there. This is just my 2 cents, and >> it >> is disappointing to see a downward trend for bioperl in this regard. > > I would be really interested in all responses from the list users. I > must agree > that BP docs are rather a rat's nest and of varying quality, but > taken in > toto (POD, HOWTOs, scraps, bioperl-l, etc.) there is a huge amount > of useful and sophisticated information available. I think there are > approaches we can take to reorganize and standardize the accession > of it to make it more useful and inviting. I disagree with my pal > about the > wikification, but I wager that the power of the wiki could be > leveraged > to greater advantage (right, Dan?). To me, good documentation should be a combination of both wiki docs (HOWTOs, scraps, cookbook-y code) and inline POD. We can't forsake one for the other. If I had a preference, I would take more up-to-date POD over wiki (maybe adding a Status: for the methods), but a good HOWTO goes a long way in helping. It's just too hard to cover every use case. It's unfortunate that documentation is very poor for many modules, but at the same time it's also exceptionally hard to write documentation for modules one has had no part in developing. I think this is the main reason the docs are in the state they are in (not to point the finger of blame at anyone, I'm just as much to blame).
Since BP is all-volunteer, and volunteers tend to do what > they like -- the beauty of open source, btw -- documentation reorg > and cleanup probably must devolve to the Core. I am willing to lead > such an effort, which will take some time, and more time the fewer > volunteers there are. First let's hear some thoughts, and 'let it > all hang out', > as they said in my mom's era. > > cheers > Mark Two things: 1) Take advantage of the proposed restructuring effort (as well as some of the refactoring we are doing) to add decent documentation where possible. This means updating method docs and updating the HOWTOs as needed, or adding new HOWTOs (Jason has indicated this in the past). 2) Pinpoint areas where docs are desperately needed first. Other wiki docs could also use updating. As an example, the above author's question on FASTA and desc() is actually answered in the FAQ, but the question doesn't make it easy to find: http://www.bioperl.org/wiki/FAQ#I_would_like_to_make_my_own_custom_fasta_header_-_how_do_I_do_this.3F chris From David.Messina at sbc.su.se Sat Aug 15 03:49:59 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 15 Aug 2009 09:49:59 +0200 Subject: [Bioperl-l] on BP documentation In-Reply-To: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: <628aabb70908150049h64f83b8ewb30d916f0534e40d@mail.gmail.com> > > To me good documentation should be a combination of both wiki docs (HOWTOs, > scraps, cookbook-y code) and inline POD. We can't forsake one for the > other. > I think this notion is already kinda there de facto (inside baseball? :)), but perhaps we should make clear the idea that: - POD is the reference manual, with each method's capabilities described comprehensively and in detail.
- The wiki is tutorials (bptutorial, Jason's slides), use cases (HOWTOs and Scrapbook), and FAQ. And actually all the POD is accessible online from the wiki at doc.bioperl.org, too (although maybe a little hard to find -- it's under Developer--API Docs). > If I had a preference, I would take more up-to-date POD over wiki (maybe > adding a Status: for the methods), but a good HOWTO goes a long way in > helping. It's just too hard to cover every use case. > I'd agree with this, too, partly because I think the HOWTOs are in pretty good shape, covering the most common stuff pretty well, and partly because I think the reference manual has to be complete, both for a user coming to find out how to use it and for authors ensuring that their internal model of how the code works actually hangs together. Mark, one attack point for a documentation improvement effort would be to take a survey of the PODs and see how well they are fulfilling the role of a reference manual. But part of a good reference manual is knowing how to find what you're looking for, and indeed I think that's maybe the main overall problem with trying to document anything as big and complicated as BioPerl. So for me, the organization of our copious docs might benefit from some attention. The goal of providing a way to find information is better handled by the wiki, which does searching and cross-referencing much better than POD. To take your friend's FASTA header example, I might expect to be able to search for 'FASTA' or 'FASTA header' on the wiki and find something which guides me to the answer. A search for 'FASTA' gives a list of pointers, including the 'FASTA sequence format' page. That page almost gives the right answer (see the Note section), but perhaps it might be a nice place to say that in BioPerl, a FASTA sequence is a Bio::Seq, and that the header is $seq->desc and the seq is $seq->seq.
And there could be an equivalent page for the other common formats, breaking down how the format maps to an object. [...] it's also exceptionally hard to write documentation for modules one > has had no part in developing. I think this is the main reason the docs are > in the state they are in (not to point the finger of blame at anyone, I'm > just as much to blame). Absolutely, and maybe a first step would be to contact the authors of a module with out-of-date docs and ask them to fix it, in the same way one would go to the author with a bug in their code. Core+volunteers will certainly be needed for organizing the effort and assessing the state of BioPerl documentation as a whole, but give authors the opportunity to take care of their code, too. Two things: > > 1) Take advantage of the proposed restructuring effort (as well as some of > the refactoring we are doing) to add decent documentation where possible. This > means updating method docs and updating the HOWTO's as needed, or adding new > HOWTO's (Jason has indicated this in the past). > This is a great idea. > 2) Pinpoint areas where docs are desperately needed first. > > Other wiki docs could also use updating. As an example, the above author's > question on FASTA and desc() is actually answered in the FAQ, Absolutely. Maybe some of the FAQs could actually be added back to the relevant PODs?
Dave From David.Messina at sbc.su.se Sat Aug 15 04:00:50 2009 From: David.Messina at sbc.su.se (Dave Messina) Date: Sat, 15 Aug 2009 10:00:50 +0200 Subject: [Bioperl-l] AlignIO error with aligments containing an all-gap sequence In-Reply-To: <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu> References: <628aabb70908140710q3022c71cu342be94f4998ab8a@mail.gmail.com> <62B3A229-C971-44DE-9104-8F2D028504D7@illinois.edu> <628aabb70908140812ie1177a4t1d16f95aee90398b@mail.gmail.com> <3D50B594-126D-4CFC-B5A8-EDB119BC75B2@illinois.edu> Message-ID: <628aabb70908150100ka8c21aahe2bf7d636fa94112@mail.gmail.com> > > I know that it is possible to have an all-gap LocatableSeq You can, but to avoid the "can't guess alphabet" error I'm getting you have to set the alphabet manually (which AlignIO does not). I'll start a branch to get the process started. Terrific! In the meantime, then, I'll just use the -nowarnonempty workaround in my local copy of AlignIO. Dave From bernd.web at gmail.com Sat Aug 15 07:17:44 2009 From: bernd.web at gmail.com (Bernd Web) Date: Sat, 15 Aug 2009 13:17:44 +0200 Subject: [Bioperl-l] on BP documentation In-Reply-To: <1F899AA92F94415186CB0B25306F1114@NewLife> References: <1F899AA92F94415186CB0B25306F1114@NewLife> Message-ID: <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com> Hi >> Even SeqIO is hard >>to figure out now--it took me an hour the other day to figure out that >>"desc" returns the full Fasta header, and I had to get that from the >>module code + trial-and-error, instead of the online docs. I was a bit surprised about $seq->desc retrieving the entire FASTA header line. Actually, in Bioperl 1.52 at least $seq->desc returns the description only, so without the ID. Thus, to get the entire FASTA header line, $seq->id . " " . $seq->desc would be needed. For the modules I use (mainly related to sequences, such as SeqIO, SimpleAlign), I'd be happy to contribute to docs, checking docs, or examples.
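A minimal sketch of Bernd's point about desc() versus id(), assuming BioPerl with Bio::SeqIO is installed. The one-record FASTA string is made-up example data, and reading it through an in-memory filehandle is only a convenience so that no file on disk is needed:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Made-up one-record FASTA file, read from an in-memory filehandle.
my $fasta = ">seq1 hypothetical protein, example only\nMKTAYIAKQR\n";
open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";

my $in  = Bio::SeqIO->new( -fh => $fh, -format => 'fasta' );
my $seq = $in->next_seq or die "No sequence parsed\n";

# The header line is split at the first whitespace:
#   id   -> the first word after '>'
#   desc -> everything after that word
my $id   = $seq->id;
my $desc = defined $seq->desc ? $seq->desc : '';

# Rebuild the full header; the '.' operator is needed on both
# sides of the separating space.
my $header = length $desc ? $id . ' ' . $desc : $id;

print "id:     $id\n";
print "desc:   $desc\n";
print "header: $header\n";
print "seq:    ", $seq->seq, "\n";
```

If a record has no description, desc is empty and the header is just the ID. That is why the join is conditional.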
Regards, Bernd From sanjaysingh765 at gmail.com Sat Aug 15 09:38:18 2009 From: sanjaysingh765 at gmail.com (sanjay singh) Date: Sat, 15 Aug 2009 19:08:18 +0530 Subject: [Bioperl-l] BLINK PARSER Message-ID: Hi, I want to submit a query to NCBI's BLINK and parse the result for the best hit. Does anyone have a script to do so? I would be very grateful if someone would share it with me. regards sanjay -- Happy moments, praise God. Difficult moments, seek God. Quiet moments, worship God. Painful moments, trust God. Every moment, thank God Sanjay Kumar Singh Bose Institute 93\1,A.P.C.Road Kolkata-700 009 West Bengal India From jimhu at tamu.edu Sat Aug 15 11:01:15 2009 From: jimhu at tamu.edu (Jim Hu) Date: Sat, 15 Aug 2009 10:01:15 -0500 Subject: [Bioperl-l] genbank2gff3 for prokaryotes? Message-ID: Over on the Gbrowse list, Don Gilbert explained to me why genbank2gff3.pl is having problems with prokaryotic genomes. Has anyone written an alternative? Jim Hu ===================================== Jim Hu Associate Professor Dept. of Biochemistry and Biophysics 2128 TAMU Texas A&M Univ. College Station, TX 77843-2128 979-862-4054 From cjfields at illinois.edu Sat Aug 15 11:27:01 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 10:27:01 -0500 Subject: [Bioperl-l] genbank2gff3 for prokaryotes? In-Reply-To: References: Message-ID: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu> We (bioperl devs and users) would be very interested to have something like this included. I ran into a similar problem with genbank2gff3 a year ago with some of our work here on Archaea. I managed to get enough data out to get gbrowse up-and-running, but it required quite a bit of hand-editing. In fact, seeing as we're refactoring GFF and other aspects of Features in bioperl, this may be the best time to add something in.
chris On Aug 15, 2009, at 10:01 AM, Jim Hu wrote: > Over on the Gbrowse list, Don Gilbert explained to me why > genbank2gff3.pl is having problems with prokaryotic genomes. Has > anyone written an alternative? > > Jim Hu > ===================================== > Jim Hu > Associate Professor > Dept. of Biochemistry and Biophysics > 2128 TAMU > Texas A&M Univ. > College Station, TX 77843-2128 > 979-862-4054 > > > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sat Aug 15 11:55:44 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 10:55:44 -0500 Subject: [Bioperl-l] on BP documentation In-Reply-To: <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <716af09c0908150417uadde09dr18f7dfee02d7d0f4@mail.gmail.com> Message-ID: On Aug 15, 2009, at 6:17 AM, Bernd Web wrote: > Hi > >>> Even SeqIO is hard >>> to figure out now--it took me an hour the other day to figure out >>> that >>> "desc" returns the full Fasta header, and I had to get that from the >>> module code + trial-and-error, instead of the online docs. > I was a bit surprised about $seq->desc retrieving the entire FASTA > header line. > Actually, in Bioperl 1.52 at least $seq->desc returns the description > only, so without the ID. Thus, to get the entire FASTA header line, > $seq->id . " " . $seq->desc would be needed. Odd, not seeing where a change was made that would cause this behavior. Can you post an example? > For the modules I use (mainly related to sequences, such as SeqIO, > SimpleAlign), I'd be happy to contribute to docs, checking docs, or > examples. > > Regards, > Bernd Would be nice to have an Align/SimpleAlign HOWTO, but seeing as we want to refactor large chunks of that code, it might be slightly premature.
That is, unless we want to document what behavior we expect to see as a sort of ROADMAP (maybe as part of the refactoring page). That could then be converted over to a HOWTO. Feel free to chip in on this in any way possible. The more documentation the better. chris From rmb32 at cornell.edu Sat Aug 15 12:44:03 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 15 Aug 2009 09:44:03 -0700 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <85143.35343.qm@web30404.mail.mud.yahoo.com> References: <85143.35343.qm@web30404.mail.mud.yahoo.com> Message-ID: <4A86E5D3.3030906@cornell.edu> The usual procedure for developing code is to exchange code via commits to a version control system. Yee, do you know how to use Subversion? Does Yee need a commit bit? Rob Yee Man Chan wrote: > Hi Chris > > I find that there is a memory access bug in my code. Attached is the fixed HMM.xs. This file together with the simpler typemap should fix all problems. (I hope..) > > Please let me know if it works for you. > > Sorry for the bug... > Yee Man > > --- On Fri, 8/14/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "Jonny Dalzell" , "BioPerl List" >> Date: Friday, August 14, 2009, 8:31 AM >> Yee Man, >> >> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 >> 64-bit) and on dev.open-bio.org (which is perl 5.8.8, >> appears to be 32-bit). The patch results in cleaning >> up warnings for 5.10.0 but results in similar warnings for >> 5.8.8 (linux or OS X). 
>> >> On OS X perl 5.8.8, this sometimes passes (note the first >> attempt fails, the second succeeds), so it's not entirely a >> 32-bit issue: >> >> http://gist.github.com/167860 >> >> OS X and perl 5.10.0, this always fails as the previous >> gist shows, but demonstrates similar behavior (multiple >> attempts to test get different responses): >> >> http://gist.github.com/167542 >> >> On linux, everything passes with or w/o the patched files >> (patched files have warnings as indicated above): >> >> Specs for all three perl executables (they vary a bit): >> >> http://gist.github.com/167883 >> >> chris >> >> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: >> >>> Ah.. I find that the typemap can become as simple as >> this >>> ===================== >>> TYPEMAP >>> HMM * T_PTROBJ >>> ===================== >>> >>> Then the generated HMM.c will have a function called >> INT2PTR to do the pointer conversion. I believe this should >> solve the warnings. >>> Attached are the updated HMM.xs and typemap. Can >> someone with a 64-bit machine give it a try? >>> Thank you >>> Yee Man >>> --- On Thu, 8/13/09, Chris Fields >> wrote: >>>> From: Chris Fields >>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >> package on WinVista? >>>> To: "Yee Man Chan" >>>> Cc: "Robert Buels" , >> "Jonny Dalzell" , >> "BioPerl List" >>>> Date: Thursday, August 13, 2009, 5:31 PM >>>> (just to point out to everyone, Yee >>>> Man's contact information was in the POD) >>>> >>>> Yee Man, >>>> >>>> I have the output in the below link: >>>> >>>> http://gist.github.com/167542 >>>> >>>> There are similar problems popping up on 32- and >> 64-bit >>>> perl 5.10.0, Mac OS X 10.5. Haven't had time >> to debug >>>> it unfortunately. >>>> >>>> I think we should seriously consider spinning this >> code off >>>> into its own distribution for CPAN. It's >>>> unfortunately bit-rotting away in >> bioperl-ext. If you >>>> want to continue supporting it I can help set that >> up.
>>>> chris >>>> >>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >>>> >>>>> Hi >>>>> >>>>> So is this an HMM only >> problem? Or does >>>> it apply to other bioperl-ext modules? >>>>> What exactly are the >> compilation errors >>>> for HMM? I believe my implementation is just a >> simple one >>>> based on Rabiner's paper. >>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>> >>>>> I don't think I did >> anything fancy that >>>> makes it machine dependent or non-ANSI C. >>>>> Yee Man >>>>> >>>>> --- On Thu, 8/13/09, Chris Fields >>>> wrote: >>>>>> From: Chris Fields >>>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext >>>> package on WinVista? >>>>>> To: "Robert Buels" >>>>>> Cc: "Jonny Dalzell" , >>>> "BioPerl List" , >>>> "Yee Man Chan" >>>>>> Date: Thursday, August 13, 2009, 3:18 PM >>>>>> >>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels >> wrote: >>>>>>> Jonny Dalzell wrote: >>>>>>>> Is it ridiculous of me to expect >> ubuntu to >>>> take >>>>>> care of this for me? How do >>>>>>>> I go about compiling the HMM? >>>>>>> Yes. This is a very specialized >> thing >>>> that >>>>>> you're doing, and Ubuntu does not have >> the >>>> resources to >>>>>> package every single thing. >>>>>>> Unfortunately, it looks like >> bioperl-ext >>>> package is >>>>>> not installable under Ubuntu 9.04 anyway, >> which is >>>> what I'm >>>>>> running. For others on this list, >> if >>>> somebody is >>>>>> interested in maintaining it, I'd be >> happy >>>> to help out >>>>>> by testing on Debian-based Linux >> platforms. >>>> We need to >>>>>> clarify this package's maintenance status: >> if >>>> there is >>>>>> nobody interested in maintaining it, I >> would >>>> recommend that >>>>>> bioperl-ext be removed from distribution.
>>>> It's not in >>>>>> anybody's interest to have unmaintained >> software >>>> out there >>>>>> causing confusion. >>>>>> >>>>>> I have cc'd Yee Man Chan for this. >> If there >>>> isn't a >>>>>> response or the message bounces, we do one >> of two >>>> things: >>>>>> 1) consider it deprecated (probably >> safest). >>>>>> 2) spin it out into a separate module. >>>>>> >>>>>> Just tried to compile it myself and am >> getting >>>> errors (using >>>>>> 64bit perl 5.10), so I think, unless >> someone wants >>>> to take >>>>>> this on, option #1 is best. >>>>>> >>>>>>> So Jonny, in short, I would say "do >> not use >>>>>> bioperl-ext". >>>>>> >>>>>> In general, that's a safe bet. We're >> moving >>>> most of >>>>>> our C/C++ bindings to BioLib. >>>>>> >>>>>>> Step back. What are you trying >> to >>>>>> accomplish? Chris already >> recommended some >>>> alternative >>>>>> methods in his email of 8/11 on this >>>> subject. Perhaps >>>>>> we can guide you to some software that is >>>> actively >>>>>> maintained and will meet your needs. >>>>>>> Rob >>>>>> Exactly. Lots of other (better >> supported!) >>>> options >>>>>> out there. HMMER, SeqAn, and >> others. >>>>>> chris >>>>>> >>>>> >>>>> >>>> >>> >> > > > -- Robert Buels Bioinformatics Analyst, Sol Genomics Network Boyce Thompson Institute for Plant Research Tower Rd Ithaca, NY 14853 Tel: 503-889-8539 rmb32 at cornell.edu http://www.sgn.cornell.edu From maj at fortinbras.us Sat Aug 15 13:40:26 2009 From: maj at fortinbras.us (Mark A.
Jensen) Date: Sat, 15 Aug 2009 13:40:26 -0400 Subject: [Bioperl-l] BLINK PARSER In-Reply-To: References: Message-ID: <34DBCBEA5E2D49A892E5077AA780BA4E@NewLife> Hi Sanjay- I'm not sure BioPerl has an interface specifically for BLINK (I will be corrected if I'm wrong, so stay tuned). If you can obtain the "raw" blast output for the protein you're interested in (doing [BLINK] then [Other Views: BLAST] then [Format:Show: Alignment as Plain text]), that text can be parsed using the Bio::SearchIO tools, and you can use Bio::Search::Tiling to obtain the 'best' hsps. This may not be too helpful, I'm afraid, but it is where I would start. Mark ----- Original Message ----- From: "sanjay singh" To: Sent: Saturday, August 15, 2009 9:38 AM Subject: [Bioperl-l] BLINK PARSER > Hi, > I want to submit a query to NCBI's BLINK and parse the result for the best > hit. Does anyone have a script to do so? I would be very grateful if > someone would share it with me. > regards > sanjay > > -- > Happy moments, praise God. > Difficult moments, seek God. > Quiet moments, worship God. > Painful moments, trust God. > Every moment, thank God > > Sanjay Kumar Singh > Bose Institute > 93\1,A.P.C.Road > Kolkata-700 009 > West Bengal > India > > > From cjfields at illinois.edu Sat Aug 15 15:11:48 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 14:11:48 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <4A86E5D3.3030906@cornell.edu> References: <85143.35343.qm@web30404.mail.mud.yahoo.com> <4A86E5D3.3030906@cornell.edu> Message-ID: <8B7B3664-A0E2-4E66-82D6-982096F4C75E@illinois.edu> I'm not sure, but it makes more sense to commit these changes directly. Yee, need us to set you up with a commit bit?
If so, fill out the information on this page: http://www.bioperl.org/wiki/SVN_Account_Request and forward it to support at open-bio.org. I'll sponsor you. chris On Aug 15, 2009, at 11:44 AM, Robert Buels wrote: > The usual procedure for developing code is to exchange code via > commits to a version control system. Yee, do you know how to use > Subversion? Does Yee need a commit bit? > > Rob > > Yee Man Chan wrote: >> Hi Chris >> I find that there is a memory access bug in my code. Attached is >> the fixed HMM.xs. This file together with the simpler typemap >> should fix all problems. (I hope..) >> Please let me know if it works for you. >> Sorry for the bug... >> Yee Man >> --- On Fri, 8/14/09, Chris Fields wrote: >>> From: Chris Fields >>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >>> WinVista? >>> To: "Yee Man Chan" >>> Cc: "Robert Buels" , "Jonny Dalzell" >> >, "BioPerl List" >>> Date: Friday, August 14, 2009, 8:31 AM >>> Yee Man, >>> >>> I tested this out locally (perl 5.8.8 32-bit, perl 5.10.0 >>> 64-bit) and on dev.open-bio.org (which is perl 5.8.8, >>> appears to be 32-bit). The patch results in cleaning >>> up warnings for 5.10.0 but results in similar warnings for >>> 5.8.8 (linux or OS X). >>> >>> On OS X perl 5.8.8, this sometimes passes (note the first >>> attempt fails, the second succeeds), so it's not entirely a >>> 32-bit issue: >>> >>> http://gist.github.com/167860 >>> >>> OS X and perl 5.10.0, this always fails as the previous >>> gist shows, but demonstrates similar behavior (multiple >>> attempts to test get different responses): >>> >>> http://gist.github.com/167542 >>> >>> On linux, everything passes with or w/o the patched files >>> (patched files have warnings as indicated above): >>> >>> Specs for all three perl executables (they vary a bit): >>> >>> http://gist.github.com/167883 >>> >>> chris >>> >>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan wrote: >>> >>>> Ah.. 
I find that the typemap can become as simple as >>> this >>>> ===================== >>>> TYPEMAP >>>> HMM * T_PTROBJ >>>> ===================== >>>> >>>> Then the generated HMM.c will have a function called >>> INT2PTR to do the pointer conversion. I believe this should >>> solve the warnings. >>>> Attached are the updated HMM.xs and typemap. Can >>> someone with a 64-bit machine give it a try? >>>> Thank you >>>> Yee Man >>>> --- On Thu, 8/13/09, Chris Fields >>> wrote: >>>>> From: Chris Fields >>>>> Subject: Re: [Bioperl-l] Problems with Bioperl-ext >>> package on WinVista? >>>>> To: "Yee Man Chan" >>>>> Cc: "Robert Buels" , >>> "Jonny Dalzell" , >>> "BioPerl List" >>>>> Date: Thursday, August 13, 2009, 5:31 PM >>>>> (just to point out to everyone, Yee >>>>> Man's contact information was in the POD) >>>>> >>>>> Yee Man, >>>>> >>>>> I have the output in the below link: >>>>> >>>>> http://gist.github.com/167542 >>>>> >>>>> There are similar problems popping up on 32- and >>> 64-bit >>>>> perl 5.10.0, Mac OS X 10.5. Haven't had time >>> to debug >>>>> it unfortunately. >>>>> >>>>> I think we should seriously consider spinning this >>> code off >>>>> into its own distribution for CPAN. It's >>>>> unfortunately bit-rotting away in >>> bioperl-ext. If you >>>>> want to continue supporting it I can help set that >>> up. >>>>> chris >>>>> >>>>> On Aug 13, 2009, at 6:58 PM, Yee Man Chan wrote: >>>>> >>>>>> Hi >>>>>> >>>>>> So is this an HMM only >>> problem? Or does >>>>> it apply to other bioperl-ext modules? >>>>>> What exactly are the >>> compilation errors >>>>> for HMM? I believe my implementation is just a >>> simple one >>>>> based on Rabiner's paper.
>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F >>>>>> ~murphyk%2FBayes >>>>>> %2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner >>>>>> +hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>>> >>>>>> I don't think I did >>> anything fancy that >>>>> makes it machine dependent or non-ANSI C. >>>>>> Yee Man >>>>>> >>>>>> --- On Thu, 8/13/09, Chris Fields >>>>> wrote: >>>>>>> From: Chris Fields >>>>>>> Subject: Re: [Bioperl-l] Problems with >>> Bioperl-ext >>>>> package on WinVista? >>>>>>> To: "Robert Buels" >>>>>>> Cc: "Jonny Dalzell" , >>>>> "BioPerl List" , >>>>> "Yee Man Chan" >>>>>>> Date: Thursday, August 13, 2009, 3:18 PM >>>>>>> >>>>>>> On Aug 13, 2009, at 4:37 PM, Robert Buels >>> wrote: >>>>>>>> Jonny Dalzell wrote: >>>>>>>>> Is it ridiculous of me to expect >>> ubuntu to >>>>> take >>>>>>> care of this for me? How do >>>>>>>>> I go about compiling the HMM? >>>>>>>> Yes. This is a very specialized >>> thing >>>>> that >>>>>>> you're doing, and Ubuntu does not have >>> the >>>>> resources to >>>>>>> package every single thing. >>>>>>>> Unfortunately, it looks like >>> bioperl-ext >>>>> package is >>>>>>> not installable under Ubuntu 9.04 anyway, >>> which is >>>>> what I'm >>>>>>> running. For others on this list, >>> if >>>>> somebody is >>>>>>> interested in maintaining it, I'd be >>> happy >>>>> to help out >>>>>>> by testing on Debian-based Linux >>> platforms. >>>>> We need to >>>>>>> clarify this package's maintenance status: >>> if >>>>> there is >>>>>>> nobody interested in maintaining it, I >>> would >>>>> recommend that >>>>>>> bioperl-ext be removed from distribution. >>>>> It's not in >>>>>>> anybody's interest to have unmaintained >>> software >>>>> out there >>>>>>> causing confusion. >>>>>>> >>>>>>> I have cc'd Yee Man Chan for this. >>> If there >>>>> isn't a >>>>>>> response or the message bounces, we do one >>> of two >>>>> things: >>>>>>> 1) consider it deprecated (probably >>> safest).
>>>>>>> 2) spin it out into a separate module. >>>>>>> >>>>>>> Just tried to compile it myself and am >>> getting >>>>> errors (using >>>>>>> 64bit perl 5.10), so I think, unless >>> someone wants >>>>> to take >>>>>>> this on, option #1 is best. >>>>>>> >>>>>>>> So Jonny, in short, I would say "do >>> not use >>>>>>> bioperl-ext". >>>>>>> >>>>>>> In general, that's a safe bet. We're >>> moving >>>>> most of >>>>>>> our C/C++ bindings to BioLib. >>>>>>> >>>>>>>> Step back. What are you trying >>> to >>>>>>> accomplish? Chris already >>> recommended some >>>>> alternative >>>>>>> methods in his email of 8/11 on this >>>>> subject. Perhaps >>>>>>> we can guide you to some software that is >>>>> actively >>>>>>> maintained and will meet your needs. >>>>>>>> Rob >>>>>>> Exactly. Lots of other (better >>> supported!) >>>>> options >>>>>>> out there. HMMER, SeqAn, and >>> others. >>>>>>> chris >>>>>>> >>>>>> >>>>>> >>>>> >>>>
>>> >> > > > -- > Robert Buels > Bioinformatics Analyst, Sol Genomics Network > Boyce Thompson Institute for Plant Research > Tower Rd > Ithaca, NY 14853 > Tel: 503-889-8539 > rmb32 at cornell.edu > http://www.sgn.cornell.edu From hlapp at gmx.net Sat Aug 15 15:41:56 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 15:41:56 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: On Aug 14, 2009, at 11:41 PM, Chris Fields wrote: > I would take more up-to-date POD over wiki (maybe adding a Status: > for the methods), but a good HOWTO goes a long way in helping. It's > just too hard to cover every use case. I'd very much second this. API documentation should arguably be written by the developer(s), and hence I would expect to find it in the PODs. Use cases, however, and how to solve those in BioPerl, can and should be contributed by everyone, and the wiki is just way better at facilitating this. As for the FASTA example, I can understand - I've heard repeatedly from people that one of the things that they are missing is documentation for every SeqIO format we support (such as GenBank, UniProt, FASTA, etc) about where to find a particular piece of the format in the object model. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Sat Aug 15 15:53:31 2009 From: maj at fortinbras.us (Mark A.
Jensen) Date: Sat, 15 Aug 2009 15:53:31 -0400 Subject: [Bioperl-l] on BP documentation In-Reply-To: References: <1F899AA92F94415186CB0B25306F1114@NewLife> <6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: ----- Original Message ----- From: "Hilmar Lapp" ... > As for the FASTA example, I can understand - I've heard repeatedly > from people that one of the things that they are missing is > documentation for every SeqIO format we support (such as GenBank, > UniProt, FASTA, etc) about where to find a particular piece of the > format in the object model. .... This is the right thread for list lurkers to contribute their bêtes noires such as this one. I encourage ALL to post these issues and help create our list of action items. MAJ From hlapp at gmx.net Sat Aug 15 16:09:14 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 16:09:14 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> Message-ID: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> On Aug 14, 2009, at 7:45 PM, Mark A. Jensen wrote: > I'm planning to move Chase Miller's excellent NeXML read/write > implementation into the trunk, complete with tests. If we can get it > to pass the test suite, is there room in the point release for it? We've in the past stayed away from adding new features to stable branches with the exception of new methods in existing classes and that didn't do anything complicated. I'm not sure I remember everything but I think the NeXML support does exceed that level, doesn't it? Can it be rolled into its own pre-release that is a drop-in to an existing 1.6.x installation for those who want to go there?
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sat Aug 15 16:12:35 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 16:12:35 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A85F83A.30800@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> Message-ID: Great! Two suggestions: > - deprecate the get_Annotations(Str) method in favor of > get_annotation(Str), which adheres better to standard perl method > naming Yes, but it is then also inconsistent with existing BioPerl naming, with the method name indicating what type of object you get back (Bio::AnnotationI in this case; see also e.g., get_SeqFeatures() in Bio::SeqI). > - finally, split Bio::FeatureIO modules off into their own CPAN > distribution Wouldn't one start with this? -hilmar On Aug 14, 2009, at 7:50 PM, Robert Buels wrote: > Chris Fields wrote: >> Any help/suggestions for the above two would be greatly >> appreciated! Robert Buels may be heading up the initial FeatureIO >> work; I will likely start on LocatableSeq/Align (Mark, wanna help?). > > Sure, I'll head up the gff_refactor branch work. If you're > interested in what changes are being planned for Bio::SeqFeature::*, > Bio::Annotat*, and/or Bio::FeatureIO*, have a look at the > implementation plan Chris and I developed just now on IRC, which is at > > http://www.bioperl.org/wiki/GFF_Refactor#Implementation_Plan > > Now soliciting suggestions, comments, and assistance.
> > Rob > > -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From rmb32 at cornell.edu Sat Aug 15 16:24:35 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 15 Aug 2009 13:24:35 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> Message-ID: <4A871983.4010702@cornell.edu> Hilmar Lapp wrote: > I'm not sure I remember everything but I think the NeXML support does > exceed that level, doesn't it? Can it be rolled into its own pre-release > that is a drop-in to an existing 1.6.x installation for those who want > to go there? So split it out into its own CPAN dist. Rob From maj at fortinbras.us Sat Aug 15 16:36:47 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sat, 15 Aug 2009 16:36:47 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> Message-ID: <307089ED92AD46539EEF45EE2D8F5A81@NewLife> Yes, I'd say the Nexml support exceeds the 'complicated' test. There are no modifications to existing modules (except for the addition of annotation attributes to members of the Bio::PopGen model, which are don't-cares to anything out there currently).
The manifest of a NeXML drop-in would look like Bio/NexmlIO.pm Bio/Nexml/Factory.pm Bio/SeqIO/nexml.pm Bio/AlignIO/nexml.pm Bio/TreeIO/nexml.pm and, if I get it completed, support for arbitrary characters via Bio::PopGen Bio/PopGen/IO/nexml.pm (all based on hacks of Chase's code, btw; we thought it would round out the package nicely...) Of course, the big dependency that not everyone will need or want is Rutger's Bio::Phylo, so the Nexml support will have to be optional even in 1.7, I think. I am adding run-time checks for Bio::Phylo in the modules so they die relatively gracefully and informatively, rather than just barf. Also, the tests will have appropriate skip blocks. I do want to get the code into bioperl-live, however, unless there's a gotcha there I'm not seeing-- cheers MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Mark A. Jensen" Cc: "Chris Fields" ; "BioPerl List" Sent: Saturday, August 15, 2009 4:09 PM Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring > > On Aug 14, 2009, at 7:45 PM, Mark A. Jensen wrote: > >> I'm planning to move Chase Miller's excellent NeXML read/write >> implementation into the trunk, complete with tests. If we can get it to pass >> the test suite, is there room in the point release for it? > > > We've in the past stayed away from adding new features to stable branches > with the exception of new methods in existing classes and that didn't do > anything complicated. > > I'm not sure I remember everything but I think the NeXML support does exceed > that level, doesn't it? Can it be rolled into its own pre-release that is a > drop-in to an existing 1.6.x installation for those who want to go there?
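Mark's message above describes adding run-time checks so the NeXML modules die informatively when Bio::Phylo is missing, plus skip blocks in the tests. What follows is a minimal core-Perl sketch of that general pattern, not the actual NeXML code. The helper names `have_module` and `assert_bio_phylo` and the error text are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Test::More;

# Return 1 if the named module can be loaded, 0 otherwise.
# eval traps the "Can't locate ... in @INC" error that require() throws.
sub have_module {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::Phylo -> Bio/Phylo.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Hypothetical guard a NeXML reader/writer could call before using Bio::Phylo.
sub assert_bio_phylo {
    return 1 if have_module('Bio::Phylo');
    die "NeXML support requires Bio::Phylo, which is not installed; "
      . "please install it from CPAN\n";
}

# Test-suite side: report the NeXML tests as skipped, not failed,
# when the optional dependency is absent.
SKIP: {
    skip 'Bio::Phylo not installed; skipping NeXML tests', 1
        unless have_module('Bio::Phylo');
    ok( assert_bio_phylo(), 'Bio::Phylo is available for NeXML I/O' );
}

done_testing();
```

Wrapping `require` in `eval` replaces the generic "Can't locate Bio/Phylo.pm in @INC" failure with a message that says what to install. The `SKIP` block reports the tests as skipped rather than failed on systems that do not have the optional dependency.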
> > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > From hlapp at gmx.net Sat Aug 15 16:49:22 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sat, 15 Aug 2009 16:49:22 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <307089ED92AD46539EEF45EE2D8F5A81@NewLife> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> Message-ID: <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> On Aug 15, 2009, at 4:36 PM, Mark A. Jensen wrote: > I do want to get the code into bioperl-live, however, unless there's > a gotcha there I'm not seeing-- That sounds great to me, though it may make some of Chris' hair stand on end if he wants this to go into a separate module from the start :) Maybe a phylogenetics module can be carved out that this would become part of? Though I recall someone saying recently that Bio::Species and by extension Bio::SeqIO is dependent on Bio::Tree::Node, so maybe that's not realistic to split out. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From maj at fortinbras.us Sat Aug 15 17:07:30 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Sat, 15 Aug 2009 17:07:30 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> Message-ID: <659CA35CE3AD464AA516D18B313311BE@NewLife> I'm all for an attempt to split out phylogenetic stuff, it seems natural, and think in terms of a phylo package dependent upon a sequence package, and if necessary vice versa -- although if the Bio::Species - Bio::Tree::Node connection is relatively loose, perhaps we can refactor to make some attributes/methods optional features that carp when the phylo package is not installed. (Roles, anyone?) However, probably 1.6.x doesn't sound like the place to do that! I myself wouldn't have any problem waiting till 1.7 for 'official' Nexml support--but I hope Chase will chime in on that. What does Chris think? MAJ ----- Original Message ----- From: "Hilmar Lapp" To: "Mark A. Jensen" Cc: "Chris Fields" ; "BioPerl List" Sent: Saturday, August 15, 2009 4:49 PM Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring > > On Aug 15, 2009, at 4:36 PM, Mark A. Jensen wrote: > >> I do want to get the code into bioperl-live, however, unless there's a >> gotcha there I'm not seeing-- > > > That sounds great to me, though it may make some of Chris' hair stand on end > if he wants this to go into a separate module from the start :) Maybe a > phylogenetics module can be carved out that this would become part of? Though > I recall someone saying recently that Bio::Species and by extension > Bio::SeqIO is dependent on Bio::Tree::Node, so maybe that's not realistic to > split out. 
> > -hilmar > -- > =========================================================== > : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : > =========================================================== > > > > > From rmb32 at cornell.edu Sat Aug 15 17:23:40 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 15 Aug 2009 14:23:40 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <4A85F83A.30800@cornell.edu> Message-ID: <4A87275C.5040300@cornell.edu> Hilmar Lapp wrote: >> * deprecate the get_Annotations(Str) method in favor of >> get_annotation(Str), which adheres better to standard perl method naming > > Yes, but also is then inconsistent with existing BioPerl naming, with > the method name indicating what type of object you get back > (Bio::AnnotationI in this case; see also e.g., get_SeqFeatures() in > Bio::SeqI). Blech. OK never mind about the method rename then. > >> * finally, split Bio::FeatureIO modules off into their own CPAN >> distribution > > Wouldn't one start with this? Yeah....I've kind of been vacillating back and forth about whether it would be best to *start* with this, or to end with this. Probably makes more sense to start with it, since it gives more freedom to add dependencies on more CPAN stuff without worrying too much. Like...oh...I don't know...Moose? Thoughts on this? Rob From rmb32 at cornell.edu Sat Aug 15 17:25:51 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sat, 15 Aug 2009 14:25:51 -0700 Subject: [Bioperl-l] genbank2gff3 for prokaryotes? In-Reply-To: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu> References: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu> Message-ID: <4A8727DF.7000204@cornell.edu> Chris Fields wrote: > In fact, seeing as we're refactoring GFF and other aspects of Features > in bioperl, this may be the best time to add something in.
Reading that thread, it sounds like most of the issues revolve around when and how to use the unflattener. Perhaps just adding another command line switch or two to the script would be appropriate? Editorializing a bit, it's really disheartening that Genbank stores features in such a lossy way. Rob From cjfields at illinois.edu Sat Aug 15 22:05:41 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 21:05:41 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <241652.96493.qm@web30404.mail.mud.yahoo.com> References: <241652.96493.qm@web30404.mail.mud.yahoo.com> Message-ID: I'm still seeing the same errors on Mac OS X for 64-bit perl 5.10.0. Mac OS X, native perl (v5.8.8) passes fine now (as well as perl 5.8.8 on dev.open-bio.org). I'm wondering if this is a problem with my local perl build. I'm very tempted to push the HMM-related code into a separate distribution (bioperl-hmm) and make a CPAN release out of it so it gets wider testing via CPAN testers; it would just require a minimum bioperl 1.6 installation for Bio::Tools::HMM and any related modules. Yee, would that be okay with you? chris On Aug 15, 2009, at 8:23 PM, Yee Man Chan wrote: > > I just committed HMM.xs and typemap to SVN. Can you test it to > confirm it works in 64-bit machines? > > Thanks > Yee Man > > --- On Sat, 8/15/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Robert Buels" >> Cc: "Yee Man Chan" , "BioPerl List" > > >> Date: Saturday, August 15, 2009, 12:11 PM >> I'm not sure, but it makes more sense >> to commit these changes directly. Yee, need us to set >> you up with a commit bit? If so, fill out the >> information on this page: >> >> http://www.bioperl.org/wiki/SVN_Account_Request >> >> and forward it to support at open-bio.org. >> I'll sponsor you. 
>> >> chris >> >> On Aug 15, 2009, at 11:44 AM, Robert Buels wrote: >> >>> The usual procedure for developing code is to exchange >> code via commits to a version control system. Yee, do >> you know how to use Subversion? Does Yee need a commit bit? >>> >>> Rob >>> >>> Yee Man Chan wrote: >>>> Hi Chris >>>> I find that there is a memory >> access bug in my code. Attached is the fixed HMM.xs. This >> file together with the simpler typemap should fix all >> problems. (I hope..) >>>> Please let me know if it works >> for you. >>>> Sorry for the bug... >>>> Yee Man >>>> --- On Fri, 8/14/09, Chris Fields >> wrote: >>>>> From: Chris Fields >>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext package on WinVista? >>>>> To: "Yee Man Chan" >>>>> Cc: "Robert Buels" , >> "Jonny Dalzell" , >> "BioPerl List" >>>>> Date: Friday, August 14, 2009, 8:31 AM >>>>> Yee Man, >>>>> >>>>> I tested this out locally (perl 5.8.8 32-bit, >> perl 5.10.0 >>>>> 64-bit) and on dev.open-bio.org (which is perl >> 5.8.8, >>>>> appears to be 32-bit). The patch results >> in cleaning >>>>> up warnings for 5.10.0 but results in similar >> warnings for >>>>> 5.8.8 (linux or OS X). >>>>> >>>>> On OS X perl 5.8.8, this sometimes passes >> (note the first >>>>> attempt fails, the second succeeds), so it's >> not entirely a >>>>> 32-bit issue: >>>>> >>>>> http://gist.github.com/167860 >>>>> >>>>> OS X and perl 5.10.0, this always fails as the >> previous >>>>> gist shows, but demonstrates similar behavior >> (multiple >>>>> attempts to test get different responses): >>>>> >>>>> http://gist.github.com/167542 >>>>> >>>>> On linux, everything passes with or w/o the >> patched files >>>>> (patched files have warnings as indicated >> above): >>>>> >>>>> Specs for all three perl executables (they >> vary a bit): >>>>> >>>>> http://gist.github.com/167883 >>>>> >>>>> chris >>>>> >>>>> On Aug 14, 2009, at 3:27 AM, Yee Man Chan >> wrote: >>>>> >>>>>> Ah.. 
I find that the typemap can become as >> simple as >>>>> this >>>>>> ===================== >>>>>> TYPEMAP >>>>>> HMM * T_PTROBJ >>>>>> ===================== >>>>>> >>>>>> Then the generated HMM.c will have a >> function called >>>>> INT2PTR to do the pointer conversion. I >> believe this should >>>>> solve the warnings. >>>>>> Attached are the updated HMM.xs and >> typemap. Can >>>>> someone with a 64-bit machine give it a try? >>>>>> Thank you >>>>>> Yee Man >>>>>> --- On Thu, 8/13/09, Chris Fields >>>>> wrote: >>>>>>> From: Chris Fields >>>>>>> Subject: Re: [Bioperl-l] Problems with >> Bioperl-ext >>>>> package on WinVista? >>>>>>> To: "Yee Man Chan" >>>>>>> Cc: "Robert Buels" , >>>>> "Jonny Dalzell" , >>>>> "BioPerl List" >>>>>>> Date: Thursday, August 13, 2009, 5:31 >> PM >>>>>>> (just to point out to everyone, Yee >>>>>>> Man's contact information was in the >> POD) >>>>>>> >>>>>>> Yee Man, >>>>>>> >>>>>>> I have the output in the below link: >>>>>>> >>>>>>> http://gist.github.com/167542 >>>>>>> >>>>>>> There are similar problems popping up >> on 32- and >>>>> 64-bit >>>>>>> perl 5.10.0, Mac OS X 10.5. >> Haven't had time >>>>> to debug >>>>>>> it unfortunately. >>>>>>> >>>>>>> I think we should seriously consider >> spinning this >>>>> code off >>>>>>> into its own distribution for >> CPAN. It's >>>>>>> unfortunately bit-rotting away in >>>>> bioperl-ext. If you >>>>>>> want to continue supporting it I can >> help set that >>>>> up. >>>>>>> chris >>>>>>> >>>>>>> On Aug 13, 2009, at 6:58 PM, Yee Man >> Chan wrote: >>>>>>> >>>>>>>> Hi >>>>>>>> >>>>>>>> So is this >> an HMM only >>>>> problem? Or does >>>>>>> it apply to other bioperl-ext >> modules? >>>>>>>> What >> exactly are the >>>>> compilation errors >>>>>>> for HMM? I believe my implementation >> is just a >>>>> simple one >>>>>>> based on Rabiner's paper.
>>>>>>>> http://www.google.com/url?sa=t&source=web&ct=res&cd=1&url=http%3A%2F%2Fwww.cs.ubc.ca%2F~murphyk%2FBayes%2Frabiner.pdf&ei=QqiESvClHNCfkQXbh8mWBw&rct=j&q=rabiner+hmm&usg=AFQjCNHeXLhTHmuKUXKKCHYSs58TxVGfZg >>>>>>>> >>>>>>>> I don't >> think I did >>>>> anything fancy that >>>>>>> makes it machine dependent or non-ANSI >> C. >>>>>>>> Yee Man >>>>>>>> >>>>>>>> --- On Thu, 8/13/09, Chris Fields >> >>>>>>> wrote: >>>>>>>>> From: Chris Fields >>>>>>>>> Subject: Re: [Bioperl-l] >> Problems with >>>>> Bioperl-ext >>>>>>> package on WinVista? >>>>>>>>> To: "Robert Buels" >>>>>>>>> Cc: "Jonny Dalzell" , >>>>>>> "BioPerl List" , >>>>>>> "Yee Man Chan" >>>>>>>>> Date: Thursday, August 13, >> 2009, 3:18 PM >>>>>>>>> >>>>>>>>> On Aug 13, 2009, at 4:37 PM, >> Robert Buels >>>>> wrote: >>>>>>>>>> Jonny Dalzell wrote: >>>>>>>>>>> Is it ridiculous of me >> to expect >>>>> ubuntu to >>>>>>> take >>>>>>>>> care of this for me? How >> do >>>>>>>>>>> I go about compiling >> the HMM? >>>>>>>>>> Yes. This is a very >> specialized >>>>> thing >>>>>>> that >>>>>>>>> you're doing, and Ubuntu does >> not have >>>>> the >>>>>>> resources to >>>>>>>>> package every single thing. >>>>>>>>>> Unfortunately, it looks >> like >>>>> bioperl-ext >>>>>>> package is >>>>>>>>> not installable under Ubuntu >> 9.04 anyway, >>>>> which is >>>>>>> what I'm >>>>>>>>> running. For others on >> this list, >>>>> if >>>>>>> somebody is >>>>>>>>> interested in doing >> maintaining it, I'd be >>>>> happy >>>>>>> to help out >>>>>>>>> by testing on Debian-based >> Linux >>>>> platforms. >>>>>>> We need to >>>>>>>>> clarify this package's >> maintenance status: >>>>> if >>>>>>> there is >>>>>>>>> nobody interested in >> maintaining it, I >>>>> would >>>>>>> recommend that >>>>>>>>> bioperl-ext be removed from >> distribution. >>>>>>> It's not in >>>>>>>>> anybody's interest to have >> unmaintained >>>>> software >>>>>>> out there >>>>>>>>> causing confusion.
>>>>>>>>> >>>>>>>>> I have cc'd Yee Man Chan for >> this. >>>>> If there >>>>>>> isn't a >>>>>>>>> response or the message >> bounces, we do one >>>>> of two >>>>>>> things: >>>>>>>>> 1) consider it deprecated >> (probably >>>>> safest). >>>>>>>>> 2) spin it out into a separate >> module. >>>>>>>>> >>>>>>>>> Just tried to compile it myself >> and am >>>>> getting >>>>>>> errors (using >>>>>>>>> 64bit perl 5.10), so I think, >> unless >>>>> someone wants >>>>>>> to take >>>>>>>>> this on, option #1 is best. >>>>>>>>> >>>>>>>>>> So Jonny, in short, I >> would say "do >>>>> not use >>>>>>>>> bioperl-ext". >>>>>>>>> >>>>>>>>> In general, that's a safe >> bet. We're >>>>> moving >>>>>>> most of >>>>>>>>> our C/C++ bindings to BioLib. >>>>>>>>> >>>>>>>>>> Step back. What are >> you trying >>>>> to >>>>>>>>> accomplish? Chris >> already >>>>> recommended some >>>>>>> alternative >>>>>>>>> methods in his email of 8/11 >> on this >>>>>>> subject. Perhaps >>>>>>>>> we can guide you to some >> software that is >>>>>>> actively >>>>>>>>> maintained and will meet your >> needs. >>>>>>>>>> Rob >>>>>>>>> Exactly. Lots of other >> (better >>>>> supported!) >>>>>>> options >>>>>>>>> out there. HMMER, SeqAn, >> and >>>>> others. >>>>>>>>> chris >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >> __________________________________________________ >>>>>> Do You Yahoo!? >>>>>> Tired of spam? Yahoo!
Mail has the >> best spam >>>>> protection around >>>>>> http://mail.yahoo.com >>>>> >> _______________________________________________ >>>>>> Bioperl-l mailing list >>>>>> Bioperl-l at lists.open-bio.org >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l >>>>> >>>> >>> >>> >>> --Robert Buels >>> Bioinformatics Analyst, Sol Genomics Network >>> Boyce Thompson Institute for Plant Research >>> Tower Rd >>> Ithaca, NY 14853 >>> Tel: 503-889-8539 >>> rmb32 at cornell.edu >>> http://www.sgn.cornell.edu >> >> > > > From cjfields at illinois.edu Sat Aug 15 22:49:25 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 21:49:25 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <659CA35CE3AD464AA516D18B313311BE@NewLife> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> Message-ID: <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> On Aug 15, 2009, at 4:07 PM, Mark A. Jensen wrote: > I'm all for an attempt to split out phylogenetic stuff, it > seems natural, and think in terms of a phylo package > dependent upon a sequence package, and if necessary > vice versa -- although if the Bio::Species - Bio::Tree::Node > connection is relatively loose, perhaps we can refactor to > make some attributes/methods optional features that carp > when the phylo package is not installed. (Roles, anyone?) I'm pretty sure they're linked very tightly (Species is-a Bio::Taxon is-a Bio::Tree::Node). This may be something Sendu needs to chime in on; he refactored much of that code prior to 1.5.2. As a suggestion, maybe we can use a combined strategy: fall back to a very simple Bio::Species container class if bioperl-phylo isn't installed, but utilize Bio::Taxon when it is. > However, probably 1.6.x doesn't sound like the place to > do that!
I myself wouldn't have any problem waiting till > 1.7 for 'official' Nexml support--but I hope Chase will chime > in on that. What does Chris think? > MAJ Robert's suggestion of a separate distribution makes sense; it may be one avenue for slowly migrating phylo-specific code out into its own distribution. Not sure about calling it bioperl-phylo (which might be confused with Rutger's Bio::Phylo). chris From cjfields at illinois.edu Sat Aug 15 22:47:36 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sat, 15 Aug 2009 21:47:36 -0500 Subject: [Bioperl-l] genbank2gff3 for prokaryotes? In-Reply-To: <4A8727DF.7000204@cornell.edu> References: <24272770-A7BD-41EB-934E-8E1B448CF66C@illinois.edu> <4A8727DF.7000204@cornell.edu> Message-ID: <81C3E545-4F0E-4B1F-9F06-398D1EE7A3CF@illinois.edu> On Aug 15, 2009, at 4:25 PM, Robert Buels wrote: > Chris Fields wrote: > > In fact, seeing as we're refactoring GFF and other aspects of > Features > > in bioperl, this may be the best time to add something in. > > Reading that thread, it sounds like most of the issues revolve > around when and how to use the unflattener. Perhaps just adding > another command line switch or two to the script would be appropriate? > > Editorializing a bit, it's really disheartening that Genbank stores > features in such a lossy way. > > Rob Just remembered: NCBI does supply GFF3 files for bacterial genomes, but I'm not sure how well they correspond to the GFF3 specification. For example: ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Aquifex_aeolicus/NC_000918.gff A quick glance looks okay, but they don't include FASTA sequence. I think much of the problem with NCBI/GenBank has to do with lack of curation on how submissions are made (lots of inconsistencies). I'm not sure how easy they will be to deal with, but the only way we can deal with that is by looking at examples of problematic data (IIRC the Sulfolobus solfataricus genome GB file was a mess, so maybe that's worth a look).
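You can spot-check whether a file such as NC_000918.gff follows the GFF3 specification against a few structural rules from the spec:

- the first line is the `##gff-version 3` directive;
- every feature line has nine tab-separated columns;
- an optional `##FASTA` directive marks the point after which only sequence follows.

The Perl sketch below applies those rules. It is not a full validator: it ignores column contents, attribute syntax and escaping, and the `check_gff3` name is hypothetical.

```perl
use strict;
use warnings;

# Rough GFF3 structure check on an open filehandle. Reports whether the
# header directive is present, how many 9-column feature lines were seen,
# which lines did not have 9 columns, and whether a ##FASTA section exists.
sub check_gff3 {
    my ($fh) = @_;
    my %report = (header => 0, features => 0, bad_lines => [], has_fasta => 0);
    my $line_no = 0;
    while (my $line = <$fh>) {
        $line_no++;
        chomp $line;
        if ($line_no == 1) {
            $report{header} = ($line =~ /^##gff-version\s+3/) ? 1 : 0;
        }
        if ($line =~ /^##FASTA/) {
            $report{has_fasta} = 1;
            last;                         # the rest of the file is sequence
        }
        next if $line =~ /^#/ || $line =~ /^\s*$/;
        my @cols = split /\t/, $line, -1; # -1 keeps trailing empty columns
        if (@cols == 9) { $report{features}++ }
        else            { push @{ $report{bad_lines} }, $line_no }
    }
    return \%report;
}
```

For a downloaded file, open it with `open my $fh, '<', 'NC_000918.gff' or die $!;` and inspect the returned hash. A zero `has_fasta` is expected for files that, as noted above, ship without sequence.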
chris From cjfields at illinois.edu Sun Aug 16 01:38:46 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 Aug 2009 00:38:46 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <846546.73578.qm@web30404.mail.mud.yahoo.com> References: <846546.73578.qm@web30404.mail.mud.yahoo.com> Message-ID: <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu> Yee, I took the liberty of making a few simple changes to Bio::Tools::HMM in svn to point out the problem and possible solutions. Feel free to revert these as needed. I'm seeing two errors, which appear randomly when running 'make test'. The first is easily fixable; the second, I'm not so sure about. I'll let you make the decisions on both.

1) There is an assumption in the module that adding up floating-point probabilities will always give exactly 1.0. You may run into problems: see 'perldoc -q long decimals'. Lines like this (two places in the module):

    ...
    if ($sum != 1.0) {
        $self->throw("Sum of probabilities for each state must be 1.0; got $sum\n");
    }
    ...

won't work as expected (note I added a simple diagnostic that just prints out the 'bad' sum).
With perl 5.8.8, this appears to work fine, but this is what I get with perl 5.10 (64-bit):

    pyrimidine1:HMM cjfields$ make test
    PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
    Baum-Welch Training
    ===================
    Initial Probability Array:
    0.499978 0.500022
    Transition Probability Matrix:
    0.499978 0.500022
    0.499978 0.500022
    Emission Probability Matrix:
    0.133333 0.143333 0.163333 0.123333 0.143333 0.293333
    0.133333 0.143333 0.163333 0.123333 0.143333 0.293333
    Log Probability of sequence 1: -521.808
    Log Probability of sequence 2: -426.057
    Statistical Training
    ====================
    Initial Probability Array:
    1 0
    Transition Probability Matrix:
    ------------- EXCEPTION -------------
    MSG: Sum of probabilities for each from-state must be 1.0; got 0.999999999999999976
    STACK Bio::Tools::HMM::transition_prob /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499
    STACK toplevel test.pl:82
    -------------------------------------
    make: *** [test_dynamic] Error 255

I'm assuming this needs to simply be rounded up to 1.0. That could be accomplished with something like 'if (sprintf("%.2f", $sum) != 1.0) {...}'

2) The second error is a little stranger. I have been randomly getting this:

    pyrimidine1:HMM cjfields$ make test
    PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
    Baum-Welch Training
    ===================
    S should be monotonic increasing!
    make: *** [test_dynamic] Error 255

When I add strict and warnings pragmas to Bio::Tools::HMM (with a little additional cleanup to get things running), I get an additional warning (arrow):

    pyrimidine1:HMM cjfields$ make test
    PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" "-Iblib/arch" test.pl
    Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line 188.  <----
    Baum-Welch Training
    ===================
    S should be monotonic increasing!
    make: *** [test_dynamic] Error 255

So something is not being converted as expected.
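Both failures come from comparing floating-point results exactly: a probability sum that lands on 0.999999999999999976, and a likelihood that must never drop even by round-off. The Perl sketch below shows tolerance-based alternatives for both checks. It assumes two hypothetical helpers, `sums_to_one` and `run_em`, which are not part of Bio::Tools::HMM, and uses an assumed default tolerance of 1e-6.

```perl
use strict;
use warnings;

# True if the probabilities in @$probs sum to 1 within $eps.
# An exact test such as ($sum != 1.0) fails on tiny round-off errors.
sub sums_to_one {
    my ($probs, $eps) = @_;
    $eps = 1e-6 unless defined $eps;    # assumed tolerance
    my $sum = 0;
    $sum += $_ for @$probs;
    return abs($sum - 1.0) < $eps ? 1 : 0;
}

# Repeat an EM-style update until the log-likelihood stops improving.
# $step is a hypothetical code ref that performs one update and returns
# the new log-likelihood. Decreases smaller than $tol are treated as
# round-off; larger decreases are reported as errors.
sub run_em {
    my ($step, $tol, $max_iter) = @_;
    $tol      = 1e-6 unless defined $tol;
    $max_iter = 1000 unless defined $max_iter;
    my $prev = $step->();
    for my $iter (1 .. $max_iter) {
        my $curr  = $step->();
        my $delta = $curr - $prev;
        die "log-likelihood decreased by " . (-$delta) . "\n" if $delta < -$tol;
        return ($curr, $iter) if $delta < $tol;    # converged
        $prev = $curr;
    }
    return ($prev, $max_iter);                     # hit the iteration cap
}
```

Rounding with `sprintf("%.2f", $sum)` works too, but it accepts any sum within about 0.005 of 1. An explicit epsilon keeps the tolerance visible and adjustable.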
chris On Aug 15, 2009, at 11:32 PM, Yee Man Chan wrote: > When are you going to release 1.6? Maybe let me work on it before it > releases. If it doesn't resolve the problem, then we can think about > other alternatives. > > Also, please show me the latest errors you have for 5.10.0. > > Thanks > Yee Man From abhishek.vit at gmail.com Sun Aug 16 04:06:49 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Sun, 16 Aug 2009 04:06:49 -0400 Subject: [Bioperl-l] About binning data for histograms Message-ID: Hi All After a lot of searching on forums through Google, I am finally posting my question here. It may not be appropriate for this mailing list, so I apologize for that up front. The question is about dynamic binning of data points for histogram plots. I have many hashes, each holding numerical coverage data obtained from next-generation sequencing data analysis. Each hash may have from a couple of hundred to thousands of "contig_name => coverage" entries. What I want to do is plot a histogram for each hash/dataset: "Coverage vs. count of contigs with coverage > #N" (N has to be binned according to the data size). I am using Chart::Gnuplot for this but I am not able to figure out how to bin the data points to fit nicely on a screen. Is there any smart/quick method to do this? Any pointers will help a great deal.
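Sturges' rule is one common rule of thumb for the number of bins: k = ceil(log2 n) + 1 for n values. The Perl sketch below applies it to a `contig_name => coverage` hash and also derives the "count of contigs with coverage at or above N" view. The helper names `sturges_bins`, `bin_coverage` and `counts_at_or_above` are hypothetical. To use another rule, such as Freedman-Diaconis or the square-root choice, change how `$k` is computed.

```perl
use strict;
use warnings;
use List::Util qw(min max);
use POSIX qw(ceil);

# Sturges' rule: k = ceil(log2 n) + 1 bins for n values.
sub sturges_bins {
    my ($n) = @_;
    return 1 if $n < 2;
    return ceil(log($n) / log(2)) + 1;
}

# Split the values of a contig_name => coverage hash into k equal-width
# bins. Returns array refs of bin lower edges and per-bin counts.
sub bin_coverage {
    my ($cov) = @_;
    my @vals = values %$cov;
    return ([], []) unless @vals;
    my ($lo, $hi) = (min(@vals), max(@vals));
    my $k      = sturges_bins(scalar @vals);
    my $width  = ($hi - $lo) / $k || 1;   # width 1 if all values are equal
    my @counts = (0) x $k;
    for my $v (@vals) {
        my $i = int(($v - $lo) / $width);
        $i = $k - 1 if $i >= $k;          # the maximum falls in the last bin
        $counts[$i]++;
    }
    my @edges = map { $lo + $_ * $width } 0 .. $k - 1;
    return (\@edges, \@counts);
}

# Cumulative counts from the top bin down: the number of contigs whose
# coverage falls in each bin or any higher one.
sub counts_at_or_above {
    my ($counts) = @_;
    my @cum = @$counts;
    for (my $i = $#cum - 1; $i >= 0; $i--) {
        $cum[$i] += $cum[$i + 1];
    }
    return \@cum;
}
```

The edges and counts (or cumulative counts) could then be passed to Chart::Gnuplot as x and y data, one dataset per hash.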
Best Regards, -Abhi From bix at sendu.me.uk Sun Aug 16 05:21:11 2009 From: bix at sendu.me.uk (Sendu Bala) Date: Sun, 16 Aug 2009 10:21:11 +0100 Subject: [Bioperl-l] About binning data for histograms In-Reply-To: References: Message-ID: <4A87CF87.7030803@sendu.me.uk> Abhishek Pratap wrote: > I am using Chart::Gnuplot for this but I am not able to figure out how > to bin the data points to fit nicely on a screen. Is there any > smart/quick method to do this. http://en.wikipedia.org/wiki/Histogram#Number_of_bins_and_width Like it says, it depends on the data, but it's worth trying them out to see if one of them gives you anything sensible. From sdavis2 at mail.nih.gov Sun Aug 16 07:48:23 2009 From: sdavis2 at mail.nih.gov (Sean Davis) Date: Sun, 16 Aug 2009 07:48:23 -0400 Subject: [Bioperl-l] About binning data for histograms In-Reply-To: References: Message-ID: <264855a00908160448i2691fc08t472fc0d83afbb356@mail.gmail.com> On Sun, Aug 16, 2009 at 4:06 AM, Abhishek Pratap wrote: > Hi All > > After a lot of look up on forums I could google, I am finally posting > my question here. I think it may not be appropriate for this mailing > list. I apologize for this first up. The question is regarding dynamic > binning of data points for histogram plots. > > So I have many hashes, each having a "numerical" coverage data > obtained from Next generation sequencing data analysis. Now each hash > may have couple of hundred to thousands entry "contig_name => > coverage". What I want to do is to plot a histogram for each > hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N > has to be binned according to the data size). > > I am using Chart::Gnuplot for this but I am not able to figure out how > to bin the data points to fit nicely on a screen. Is there any > smart/quick method to do this. > > Any pointers will help a great deal. > Hi, Abhi. You could use R, but you got that already. ; ) However, you might look here for a perl solution. 
http://search.cpan.org/~whizdog/GDGraph-histogram-1.1/lib/GD/Graph/histogram.pm Sean From cjfields at illinois.edu Sun Aug 16 08:53:29 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 Aug 2009 07:53:29 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <217259.7083.qm@web30408.mail.mud.yahoo.com> References: <217259.7083.qm@web30408.mail.mud.yahoo.com> Message-ID: <05D89C95-261C-47B5-A4C6-794D36DD5FB8@illinois.edu> That worked! Thanks Yee Man! chris ps - let me know how you want to deal with a release. On Aug 16, 2009, at 4:36 AM, Yee Man Chan wrote: > Hi Chris > > Thanks for your suggestions. I think it is indeed better to check > sum to 1.0 using sprintf. I fixed this in the newly committed HMM.pm > > I also fixed codes that will lead to warnings with use warnings. > > So now the only problem left is that "monotonic increasing" error. > For that part of the code, I was trying to perform an expectation > maximization step. Theoretically, the expectation should > monotonically increase in every step. But I suppose this is not > necessarily true when double precision floating point numbers are > involved. I don't know why I used a 1e-100 tolerance for this. > Therefore I "fixed" it by using the same tolerance to terminate the > maximization step (ie .000001). I suppose this "fix" will make it > much more unlikely to throw exception with your 5.10.0 perl. > > Can you give that a try again and see if it works now. > > Thank you > Yee Man > > > > --- On Sat, 8/15/09, Chris Fields wrote: > >> From: Chris Fields >> Subject: Re: [Bioperl-l] Problems with Bioperl-ext package on >> WinVista? >> To: "Yee Man Chan" >> Cc: "Robert Buels" , "BioPerl List" > > >> Date: Saturday, August 15, 2009, 10:38 PM >> Yee, >> >> I took the liberty of making a few simple changes to >> Bio::Tools::HMM in svn to point out the problem and possible >> solutions. Feel free to revert these as needed. 
>> >> I'm seeing two errors, which appear randomly when running >> 'make test'. The first is easily fixable, the second, >> I'm not so sure. I'll let you make the decisions on >> both. >> >> 1) There is an assumption in the module that, when >> adding floating points, you will always get 1.0. You >> may run into problems: see 'perldoc -q long decimals'. >> Lines like this (two places in the module): >> ... >> if ($sum != 1.0) { >> $self->throw("Sum of >> probabilities for each state must be 1.0; got $sum\n"); >> } >> ... >> >> won't work as expected (note I added a simple diagnostic, >> just print out the 'bad' sum). With perl 5.8.8, this >> appears to work fine, but this is what I get with perl 5.10 >> (64-bit): >> >> pyrimidine1:HMM cjfields$ make test >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" >> "-Iblib/arch" test.pl >> Baum-Welch Training >> =================== >> Initial Probability Array: >> 0.499978 0.500022 >> Transition Probability Matrix: >> 0.499978 0.500022 >> 0.499978 0.500022 >> Emission Probability Matrix: >> 0.133333 0.143333 >> 0.163333 0.123333 >> 0.143333 0.293333 >> 0.133333 0.143333 >> 0.163333 0.123333 >> 0.143333 0.293333 >> >> Log Probability of sequence 1: -521.808 >> Log Probability of sequence 2: -426.057 >> >> Statistical Training >> ==================== >> Initial Probability Array: >> 1 0 >> Transition Probability Matrix: >> >> ------------- EXCEPTION ------------- >> MSG: Sum of probabilities for each from-state must be 1.0; >> got 0.999999999999999976 >> >> STACK Bio::Tools::HMM::transition_prob >> /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm:499 >> STACK toplevel test.pl:82 >> ------------------------------------- >> >> make: *** [test_dynamic] Error 255 >> >> I'm assuming this needs to simply be rounded up to >> 1.0. That could be accomplished with something like >> 'if (sprintf("%.2f", $sum) != 1.0) {...}' >> >> 2) The second error is a little stranger. 
I have been >> randomly getting this: >> >> pyrimidine1:HMM cjfields$ make test >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" >> "-Iblib/arch" test.pl >> Baum-Welch Training >> =================== >> S should be monotonic increasing! >> make: *** [test_dynamic] Error 255 >> >> When I add strict and warnings pragmas to Bio::Tools::HMM >> (with a little additional cleanup to get things running), I >> get an additional warning (arrow): >> >> pyrimidine1:HMM cjfields$ make test >> PERL_DL_NONLAZY=1 /opt/perl/bin/perl "-Iblib/lib" >> "-Iblib/arch" test.pl >> Argument "FL" isn't numeric in numeric lt (<) at >> /Users/cjfields/bioperl/bioperl-live/Bio/Tools/HMM.pm line >> 188. <---- >> Baum-Welch Training >> =================== >> S should be monotonic increasing! >> make: *** [test_dynamic] Error 255 >> >> So something is not being converted as expected. >> >> chris > > > From hlapp at gmx.net Sun Aug 16 11:07:39 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 16 Aug 2009 11:07:39 -0400 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu> References: <846546.73578.qm@web30404.mail.mud.yahoo.com> <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu> Message-ID: <40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net> On Aug 16, 2009, at 1:38 AM, Chris Fields wrote: > I'm assuming this needs to simply be rounded up to 1.0. That could > be accomplished with something like 'if (sprintf("%.2f", $sum) != > 1.0) {...}' Couldn't you just test for the absolute difference being smaller than some reasonable epsilon? That might be more efficient (and more explicit) than printing to a string. -hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From hlapp at gmx.net Sun Aug 16 11:13:54 2009 From: hlapp at gmx.net (Hilmar Lapp) Date: Sun, 16 Aug 2009 11:13:54 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> Message-ID: On Aug 15, 2009, at 10:49 PM, Chris Fields wrote: > Not sure about calling it bioperl-phylo (which might be confused > with Rutger's Bio::Phylo). Frankly, it seems to me that either is more powerful in combination with the other, so I don't quite see how the name suggesting some linkage isn't a Good Thing rather than bad. 
-hilmar -- =========================================================== : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net : =========================================================== From cjfields at illinois.edu Sun Aug 16 11:42:50 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 Aug 2009 10:42:50 -0500 Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista? In-Reply-To: <40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net> References: <846546.73578.qm@web30404.mail.mud.yahoo.com> <91A9ADBF-B93F-4C78-838F-67CAA6C2B47D@illinois.edu> <40C4DAAA-F815-4DC6-8384-0B3C714AE439@gmx.net> Message-ID: On Aug 16, 2009, at 10:07 AM, Hilmar Lapp wrote: > > On Aug 16, 2009, at 1:38 AM, Chris Fields wrote: > >> I'm assuming this needs to simply be rounded up to 1.0. That could >> be accomplished with something like 'if (sprintf("%.2f", $sum) != >> 1.0) {...}' > > > Couldn't you just test for the absolute difference being smaller > than some reasonable epsilon? That might be more efficient (and more > explicit) than printing to a string. > > -hilmar Yes, either way is fine. Re: floating point and sprintf, acc. to the perlfaq4, as perl doesn't have a round() function the sprintf() idiom is suggested (and commonly used). chris From cjfields at illinois.edu Sun Aug 16 11:48:52 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 Aug 2009 10:48:52 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> Message-ID: On Aug 16, 2009, at 10:13 AM, Hilmar Lapp wrote: > On Aug 15, 2009, at 10:49 PM, Chris Fields wrote: > >> Not sure about calling it bioperl-phylo (which might be confused >> with Rutger's Bio::Phylo). 
> > > Frankly, it seems to me that either is more powerful in combination > with the other, so I don't quite see how the name suggesting some > linkage isn't a Good Thing rather than bad. > > -hilmar I don't have a problem as long as there is some emphasis they are two separate, but related, projects. There is quite a bit of crossover between the two (particularly with the last few bioperl-related GSoC projects), but I would rather not have to worry about users emailing the list wondering why something in bioperl-phylo doesn't work when they installed Bio::Phylo instead (or vice-versa). Maybe Bio::Phylo could be added as a recommended module with bioperl-phylo to alleviate that? chris From maj at fortinbras.us Sun Aug 16 12:59:40 2009 From: maj at fortinbras.us (Mark A. Jensen) Date: Sun, 16 Aug 2009 12:59:40 -0400 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> Message-ID: <44D32BE895F446A9917A5550485AB102@NewLife> I see both points- I think Chris's suggestion is good. The nexml support won't work without Bio::Phylo, but not everyone will need that support, so if the install can be chatty about this that would be great- ----- Original Message ----- From: "Chris Fields" To: "Hilmar Lapp" Cc: "BioPerl List" ; "Mark A. Jensen" ; "chase Miller" Sent: Sunday, August 16, 2009 11:48 AM Subject: Re: [Bioperl-l] GFF and LocatableSeq refactoring > > On Aug 16, 2009, at 10:13 AM, Hilmar Lapp wrote: > >> On Aug 15, 2009, at 10:49 PM, Chris Fields wrote: >> >>> Not sure about calling it bioperl-phylo (which might be confused with >>> Rutger's Bio::Phylo). 
>> >> >> Frankly, it seems to me that either is more powerful in combination with the >> other, so I don't quite see how the name suggesting some linkage isn't a >> Good Thing rather than bad. >> >> -hilmar > > I don't have a problem as long as there is some emphasis they are two > separate, but related, projects. There is quite a bit of crossover between > the two (particularly with the last few bioperl-related GSoC projects), but I > would rather not have to worry about users emailing the list wondering why > something in bioperl-phylo doesn't work when they installed Bio::Phylo > instead (or vice-versa). Maybe Bio::Phylo could be added as a recommended > module with bioperl-phylo to alleviate that? > > chris > > From rmb32 at cornell.edu Sun Aug 16 13:16:18 2009 From: rmb32 at cornell.edu (Robert Buels) Date: Sun, 16 Aug 2009 10:16:18 -0700 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <44D32BE895F446A9917A5550485AB102@NewLife> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> <44D32BE895F446A9917A5550485AB102@NewLife> Message-ID: <4A883EE2.3060101@cornell.edu> Mark A. Jensen wrote: > I see both points- I think Chris's suggestion is good. The nexml support > won't work without Bio::Phylo, but not everyone will need that support, > so if the install can be chatty about this that would be great- Maybe the parts that have differing dependencies should be in different distros then? 
Rob From jason at bioperl.org Sun Aug 16 13:25:08 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 16 Aug 2009 13:25:08 -0400 Subject: [Bioperl-l] About binning data for histograms In-Reply-To: References: Message-ID: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org> For binning of a distribution see the perl module Statistics::Descriptive - http://search.cpan.org/~colink/Statistics-Descriptive-2.6/Descriptive.pm function: frequency_distribution I would also look at R histogram function for the plotting. This would be one of the easiest ways - I would just make a perl script that generates the correct R code that can be used to make the plots. On Aug 16, 2009, at 4:06 AM, Abhishek Pratap wrote: > Hi All > > After a lot of look up on forums I could google, I am finally posting > my question here. I think it may not be appropriate for this mailing > list. I apologize for this first up. The question is regarding dynamic > binning of data points for histogram plots. > > So I have many hashes, each having a "numerical" coverage data > obtained from Next generation sequencing data analysis. Now each hash > may have couple of hundred to thousands entry "contig_name => > coverage". What I want to do is to plot a histogram for each > hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N > has to be binned according to the data size). > > I am using Chart::Gnuplot for this but I am not able to figure out how > to bin the data points to fit nicely on a screen. Is there any > smart/quick method to do this. > > Any pointers will help a great deal.
> > Best Regards, > -Abhi > _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l -- Jason Stajich jason.stajich at gmail.com jason at bioperl.org From abhishek.vit at gmail.com Sun Aug 16 13:34:54 2009 From: abhishek.vit at gmail.com (Abhishek Pratap) Date: Sun, 16 Aug 2009 13:34:54 -0400 Subject: [Bioperl-l] About binning data for histograms In-Reply-To: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org> References: <3594EADE-7127-43FB-AB2F-D66CC179DF4C@bioperl.org> Message-ID: Thanks All. I completely forgot and didn't realize that the histogram function in R could auto bin based on the data. Cheers, -Abhi On Sun, Aug 16, 2009 at 1:25 PM, Jason Stajich wrote: > For binning of a distribution see the perl module Statistics::Descriptive - > http://search.cpan.org/~colink/Statistics-Descriptive-2.6/Descriptive.pm function: > frequency_distribution > > I would also look at R histogram function for the plotting. This would be > one of the easiest ways - I would just make a perl script that generates the > correct R code that can be used to make the plots. > > > On Aug 16, 2009, at 4:06 AM, Abhishek Pratap wrote: > >> Hi All >> >> After a lot of look up on forums I could google, I am finally posting >> my question here. I think it may not be appropriate for this mailing >> list. I apologize for this first up. The question is regarding dynamic >> binning of data points for histogram plots. >> >> So I have many hashes, each having a "numerical" coverage data >> obtained from Next generation sequencing data analysis. Now each hash >> may have couple of hundred to thousands entry "contig_name => >> coverage". What I want to do is to plot a histogram for each >> hash/dataset. "Coverage v/s Count of contigs with coverage > #N " ( N >> has to be binned according to the data size).
>> >> I am using Chart::Gnuplot for this but I am not able to figure out how >> to bin the data points to fit nicely on a screen. Is there any >> smart/quick method to do this. >> >> Any pointers will help a great deal. >> >> Best Regards, >> -Abhi >> _______________________________________________ >> Bioperl-l mailing list >> Bioperl-l at lists.open-bio.org >> http://lists.open-bio.org/mailman/listinfo/bioperl-l > > -- > Jason Stajich > jason.stajich at gmail.com > jason at bioperl.org > > From robert.bradbury at gmail.com Sun Aug 16 15:16:09 2009 From: robert.bradbury at gmail.com (Robert Bradbury) Date: Sun, 16 Aug 2009 15:16:09 -0400 Subject: [Bioperl-l] Limit on sequence file size fetches? Message-ID: Hello, I am trying to use get_sequence() to fetch the sequence NS_000198 for the fungus *Podospora anserina* with the database "GenBank" and, when that didn't work, "Gene". This is a simple script which fetches the sequence then writes out the FASTA and GenBank files from the data structure. The errors I got suggested that the system was running out of memory, which I thought was unlikely since I've got something like 3GB of main memory and 9GB of swap space. After running strace on the script (which takes a while) I determined that the brk() calls were generating ENOMEM at ~3GB. This turns out to be due to the limit of the Linux memory model I am using (3GB/1GB) on a Pentium IV (Prescott). Now, I think the total genome size for the fungus is ~70MB but haven't verified this so I "should" be able to fetch it unless Bioperl (or perl itself) is doing extremely poor memory management (perhaps not coalescing memory segments into one large sequence) as the reads take place? [1]. Has anyone encountered this problem (fetching say large mammalian chromosomes)? Does anyone know what the limits are for "fetching" sequence files (on 32/64-bit machines)?
The reason I am using get_sequence and BioPerl is that I can't seem to find the *Podospora anserina* sequence in an FTP database anywhere (so I can't use "wget or ftp"). I haven't tested accessing the GenBank file in a browser (I don't know what browsers would do with an HTML file that large but suspect it would not be pretty). Thanks in advance, Robert Bradbury 1. The strace seems to indicate periodic brk() calls to expand the process data segment size between which there are lots of read() calls of size 4096, presumably reading the socket from NCBI. I don't know if there is an easy way to trace perl's memory allocation/manipulation at a higher level. From jason at bioperl.org Sun Aug 16 15:22:35 2009 From: jason at bioperl.org (Jason Stajich) Date: Sun, 16 Aug 2009 15:22:35 -0400 Subject: [Bioperl-l] Limit on sequence file size fetches? In-Reply-To: References: Message-ID: <93672502-26EB-4C30-A37E-F3B593E57279@bioperl.org> Robert - Posting your script will help us replicate and diagnose - I am not sure which GenBank fetch option you are using. I have a feeling it is trying to do recursive calls to stitch together the pseudoscaffold. I presume it works fine though if you request each chromosome scaffold like CU607053,CU633438, ... I guess posting it via a Bugzilla bug is the best way unless you have a GitHub account and wanted to post it as a 'gist'. -jason -- Jason Stajich jason at bioperl.org http://fungalgenomes.org/ On Aug 16, 2009, at 3:16 PM, Robert Bradbury wrote: > Hello, > > I am trying to use get_sequence() to fetch the sequence NS_000198 > for the > fungus *Podospora anserina* with the databases "GenBank" and when that > didn't work "Gene". This is a simple script which fetches the > sequence then > writes out the fasta and genbank files from the data structure. > > The errors I got suggested that the system was running out of memory > which I > thought was unlikely since I've got something like 3GB of main > memory and > 9GB of swap space. After running strace on the script (which takes > a while)
After running strace on the script (which takes > a while) > I determined that the brk() calls were generating ENOMEM at ~3GB. > This > turns out to be due to the limit of the Linux memory model I am using > (3GB/1GB) on a Pentium IV (Prescott). > > Now, I think the total genome size for the fungus is ~70MB but haven't > verified this so I "should" be able to fetch it unless Bioperl (or > perl > itself) is doing extremely poor memory management (perhaps not > coalescing > memory segments into one large sequence) as the reads take place? [1]. > > Has anyone encountered this problem (fetching say large mammalian > chromosomes)? Does anyone know what the limits are for "fetching" > sequence > files (on 32/64 bit machines?. The reason I am using get_sequence and > BioPerl is that I can't seem to find the *Podospora anserina* > sequence in a > FTP database anywhere (so I can't use "wget or ftp"). I haven't > tested > accessing the GenBank file in a browser (I don't know what browsers > would do > with a HTML file that large but suspect it would not be pretty). > > Thanks in advance, > Robert Bradbury > > 1. The strace seems to indicate periodic brk() calls to expand the > process > data segment size between which there are lots of read() calls of > size 4096, > presumably reading the socket from NCBI. I don't know if there is > an easy > way to trace perl's memory allocation/manipulation at a higher level. 
> _______________________________________________ > Bioperl-l mailing list > Bioperl-l at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/bioperl-l From cjfields at illinois.edu Sun Aug 16 15:42:56 2009 From: cjfields at illinois.edu (Chris Fields) Date: Sun, 16 Aug 2009 14:42:56 -0500 Subject: [Bioperl-l] GFF and LocatableSeq refactoring In-Reply-To: <4A883EE2.3060101@cornell.edu> References: <2997D937-B062-45CD-86D6-41F570E4899D@illinois.edu> <375A06F4-F711-4DD3-9FB3-E2FE3670A573@gmx.net> <307089ED92AD46539EEF45EE2D8F5A81@NewLife> <63558691-29FA-40AB-9654-8740D80ED60F@gmx.net> <659CA35CE3AD464AA516D18B313311BE@NewLife> <671FAD60-9FCB-4535-9254-94762B4AA305@illinois.edu> <44D32BE895F446A9917A5550485AB102@NewLife> <4A883EE2.3060101@cornell.edu> Message-ID: <69B8C887-1C5E-47B4-9168-8509BB0A5528@illinois.edu> On Aug 16, 2009, at 12:16 PM, Robert Buels wrote: > Mark A. Jensen wrote: >> I see both points- I think Chris's suggestion is good. The nexml >> support >> won't work without Bio::Phylo, but not everyone will need that >> support, >> so if the install can be chatty about this that would be great- > > Maybe the parts that have differing dependencies should be in > different distros then? > > Rob I'm guessing large chunks of that code would have Bio::Root::Root as a base, so I think maintaining related code split into two distributions too problematic. Simple to indicate that Bio::Phylo is required only for NeXML (so listing it as a 'recommends') and keep everything NeXML- related and requiring Bio::Root::Root in one spot. It's possible something inheriting from Bio::Phylo could go there, but that's up to Rutger. chris From maj at fortinbras.us Mon Aug 17 08:43:33 2009 From: maj at fortinbras.us (Mark A. 
Jensen) Date: Mon, 17 Aug 2009 08:43:33 -0400 Subject: [Bioperl-l] new NeXML I/O modules Message-ID: Hi All- I'm pleased to announce that my Google Summer of Code student Chase Miller and I have successfully migrated his modules for NeXML I/O into bioperl-live. NeXML (http://www.nexml.org) is Rutger Vos' highly flexible, highly annotable standard for evolutionary data exchange, that is catching on in the evolutionary DB world. We hope these modules will help move that process along. I also want to say that Chase has been a terrific student and collaborator. He learned the not only the complexities of BioPerl IO from scratch, but also grokked Rutger's Bio::Phylo internals, and became familiar with and applied modern OO concepts. He also wrote tests (which pass!), complete POD, and a HOWTO (at http://www.bioperl.org/wiki/HOWTO:Nexml) to accompany this work. Best of all, he finished! (Well, as much as anything is ever finished around here.) I for one hope he will continue to use his commit bit for good and not evil. cheers, Mark From deequan at gmail.com Mon Aug 17 09:06:44 2009 From: deequan at gmail.com (David Quan) Date: Mon, 17 Aug 2009 09:06:44 -0400 Subject: [Bioperl-l] blast hit to feature gene sequence in bioperl? Message-ID: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com> Hello there, I've been browsing around bioperl documentation and have used a blast parser, but am wondering if it is possible to use the start and end information for a hit to trace back to a gene in genbank and extract the sequence for that gene? I have not been able to find elements that would work in such a way. Hints and recommendations for elements that would be capable of behaving in such a way would be greatly appreciated. Thanks very much. David N. 
Quan -- Love of country is, at heart, trust in a nation's people, faith in their better nature, esteem for their best hopes, understanding for the magnificence and the distinctiveness and the huge, infinitely shaded cultural palette of their simple humanity. --Bradley Burston From akarger at CGR.Harvard.edu Mon Aug 17 09:04:29 2009 From: akarger at CGR.Harvard.edu (Amir Karger) Date: Mon, 17 Aug 2009 09:04:29 -0400 Subject: [Bioperl-l] on BP documentation References: <1F899AA92F94415186CB0B25306F1114@NewLife><6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> Message-ID: <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu> > -----Original Message----- > From: Mark A. Jensen [mailto:maj at fortinbras.us] > > From: "Hilmar Lapp" > ... > > As for the FASTA example, I can understand - I've heard > repeatedly > > from people that one of the things that they are missing is > > documentation for every SeqIO format we support (such as > GenBank, > > UniProt, FASTA, etc) about where to find a particular piece of > the > > format in the object model. > > This is the right thread for list lurkers to contribute their betes > noires > such as this one. I encourage ALL to post these issues and help > create > our list of action items. > MAJ I wish you the best of luck on this ambitious and crucial project. I teach intro Perl classes to biologists and always tell them that Bioperl is amazingly useful, but only if you can figure out how to use it. If what you want to do isn't in the howtos, you can be in big trouble. I was trying to remember specific examples of where I've gotten lost, and unfortunately can't give any. But I can tell you that often I've run into trouble because the particular method I'm looking for is three parent classes away from the module I'm actually looking at. The deobfuscator helps some, but only for people who know about that. Do you think you could automate a tool that would add the following to the bottom of each module? 
=head2 Inherited methods

=over 4

=item desc

See Bio::Seq::Basic

=back

This would make browsing through the docs on bioperl.org more fun too.

-Amir Karger

From cjfields at illinois.edu Mon Aug 17 10:06:15 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 17 Aug 2009 09:06:15 -0500
Subject: [Bioperl-l] new NeXML I/O modules
In-Reply-To: 
References: 
Message-ID: 

Congrats Chase!

chris

From cjfields at illinois.edu Mon Aug 17 10:22:26 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 17 Aug 2009 09:22:26 -0500
Subject: [Bioperl-l] blast hit to feature gene sequence in bioperl?
In-Reply-To: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com>
References: <470b4b060908170606t42266fc6i3366830cb2289b6f@mail.gmail.com>
Message-ID: <74D10663-5770-43DA-ABDB-27FA5D532497@illinois.edu>

That's possible, yes. Use the hit information with Bio::DB::GenBank to pull the sequence out, as in the example below. Note that efetch's strand convention differs from BioPerl's -1/0/1: for efetch, 1 = plus (the default) and 2 = minus (complement).

================================
use Bio::DB::GenBank;

my $factory = Bio::DB::GenBank->new(-format    => 'genbank',
                                    -seq_start => $seqstart,
                                    -seq_stop  => $seqend,
                                    -strand    => $strand, # 1=plus, 2=minus
                                   );
# $id should be a UID; use get_Seq_by_acc() for accessions
my $seq = $factory->get_Seq_by_id($id);
================================

This pulls everything into a Bio::Seq, though, so you'll need to push it out to a SeqIO output stream. You can also use Bio::DB::EUtilities to get the raw sequence via efetch, something like (untested):

================================
use Bio::DB::EUtilities;

my $fetcher = Bio::DB::EUtilities->new(-eutil   => 'efetch',
                                       -db      => 'nucleotide',
                                       -rettype => 'gb');

# loop: for each hit/HSP, grab sequence...
$fetcher->set_parameters(-id        => $id,       # UID or accession
                         -seq_start => $seqstart, # hit start
                         -seq_stop  => $seqend,   # hit end
                         -strand    => $strand,   # 1=plus, 2=minus
                        );

# then get the raw content, written straight to a file
$fetcher->get_Response(-file => ">$id.gb");
================================

You could probably plug into Ensembl similarly if the db versions match; see:

http://www.bioperl.org/wiki/HOWTO:Getting_Genomic_Sequences

chris

From cjfields at illinois.edu Mon Aug 17 10:47:31 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 17 Aug 2009 09:47:31 -0500
Subject: [Bioperl-l] on BP documentation
In-Reply-To: <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu>
References: <1F899AA92F94415186CB0B25306F1114@NewLife><6373FCCF-4A2B-48EC-91FA-AE5CB8DF4462@illinois.edu> <72AF30DC2881964CB911FD08E57157E7035C0510@lsdiv-msxbe-001.nucleus.harvard.edu>
Message-ID: 

On Aug 17, 2009, at 8:04 AM, Amir Karger wrote:

>> -----Original Message-----
>> From: Mark A. Jensen [mailto:maj at fortinbras.us]
>>
>> From: "Hilmar Lapp"
>> ...
>>> As for the FASTA example, I can understand - I've heard repeatedly
>>> from people that one of the things that they are missing is
>>> documentation for every SeqIO format we support (such as GenBank,
>>> UniProt, FASTA, etc) about where to find a particular piece of the
>>> format in the object model.
>>
>> This is the right thread for list lurkers to contribute their betes noires
>> such as this one. I encourage ALL to post these issues and help create
>> our list of action items.
>> MAJ
>
> I wish you the best of luck on this ambitious and crucial project. I
> teach intro Perl classes to biologists and always tell them that Bioperl
> is amazingly useful, but only if you can figure out how to use it.
> If what you want to do isn't in the howtos, you can be in big trouble.
>
> I was trying to remember specific examples of where I've gotten lost,
> and unfortunately can't give any. But I can tell you that often I've run
> into trouble because the particular method I'm looking for is three
> parent classes away from the module I'm actually looking at. The
> deobfuscator helps some, but only for people who know about that. Do you
> think you could automate a tool that would add the following to the
> bottom of each module?
>
> =head2 Inherited methods
>
> =over 4
>
> =item desc
>
> See Bio::Seq::Basic
>
> =back
>
> This would make browsing through the docs on bioperl.org more fun too.
>
> -Amir Karger

For many modules this is already in place, but yes, this could be improved. One thing I suggest we avoid when doing this is interspersing the POD within the code. It has been demonstrated that doing so actually slows down the perl interpreter slightly; it has to slog through lots of POD to find the code at the compilation step. This occurs only upon initial compilation, but it is significant enough that the overall recommendation by most perl brethren (and in Perl Best Practices) has been to place any POD after an __END__ marker. This way the compiler doesn't have to look at it at all, but perldoc can still find it.

Also, according to PBP, although inline POD would seemingly be easier to take care of, apparently the opposite is true in most cases (though it can come down to styling differences). Interspersed POD is much harder to maintain in a consistent state, tends to be choppier, and can be laid out in odd ways due to being scattered throughout the file. I know this can come down to a difference in style, but the arguments make enough sense to me that in Biome I am pushing to have all docs after the __END__ marker.
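Amir's request (an automatically generated "Inherited methods" list that could sit after a module's __END__ marker) can be sketched in plain Perl. The following is a minimal, hypothetical illustration, not an existing BioPerl tool: it walks @ISA depth-first (Perl's default method-resolution order), uses symbol-table introspection to find subs each package defines itself, and prints a POD section. The classes My::Base and My::Child and the helper names are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy classes standing in for a real BioPerl hierarchy.
package My::Base;
sub new  { my $class = shift; return bless {}, $class }
sub desc { return 'base description' }
sub id   { return 'base id' }

package My::Child;
our @ISA = ('My::Base');
sub id { return 'child id' }    # overrides My::Base::id

package main;

# Names of subroutines defined directly in a package's symbol table.
sub local_methods {
    my ($class) = @_;
    no strict 'refs';
    return grep { defined &{"${class}::$_"} } keys %{"${class}::"};
}

# All ancestors of $class, depth-first and left-to-right through @ISA.
sub ancestors {
    my ($class) = @_;
    no strict 'refs';
    my @found;
    for my $parent (@{"${class}::ISA"}) {
        push @found, $parent, ancestors($parent);
    }
    return @found;
}

# Build an "Inherited methods" POD section listing every method $class
# inherits without overriding, pointing at the ancestor that supplies it.
sub inherited_methods_pod {
    my ($class) = @_;
    my %provider = map { $_ => $class } local_methods($class);
    my @items;
    for my $ancestor (ancestors($class)) {
        for my $method (local_methods($ancestor)) {
            next if exists $provider{$method};    # overridden or already found
            $provider{$method} = $ancestor;
            push @items, $method;
        }
    }
    return '' unless @items;
    my $pod = "=head2 Inherited methods\n\n=over 4\n\n";
    $pod .= "=item $_\n\nSee $provider{$_}\n\n" for sort @items;
    return $pod . "=back\n";
}

print inherited_methods_pod('My::Child');
```

Run as-is, it prints a section in the same shape as Amir's example, listing desc and new as coming from My::Base and omitting the overridden id. A real tool would have to load each module first, and functions imported via Exporter also live in a package's symbol table, so they would show up as "methods" unless filtered out.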
Lincoln already practices this within bioperl and Bio::Graphics, and I plan on moving much of my documentation similarly within my code in BioPerl. The additional comments in the PBP chapter "Documentation" are well worth reading if you can get your hands on it.

chris

From rmb32 at cornell.edu Mon Aug 17 11:21:08 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 17 Aug 2009 08:21:08 -0700
Subject: [Bioperl-l] new NeXML I/O modules
In-Reply-To: 
References: 
Message-ID: <4A897564.2090203@cornell.edu>

Hurrah! GSoC strikes again!

Rob

From rmb32 at cornell.edu Mon Aug 17 11:45:18 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 17 Aug 2009 08:45:18 -0700
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <474354.59886.qm@web30408.mail.mud.yahoo.com>
References: <474354.59886.qm@web30408.mail.mud.yahoo.com>
Message-ID: <4A897B0E.7060208@cornell.edu>

Yee Man Chan wrote:
> As to the release, my thinking is that I do understand that your desire to maintain a high level of quality in BioPerl code base. So if the HMM doesn't meet that standard, I am ok with it being spinned off.

We're not pushing to spin it off because of code quality; we're pushing to spin it off because we're spinning everything off. The plan is to break BioPerl up into many discrete distributions on CPAN with the dependencies between them well-known and codified. This will make maintenance of BioPerl *much* easier in the long run.

So this means that the plan of action should be
1.) get the code so that it's working on all platforms,
2.) create a CPAN distribution for it and put it on CPAN,
3.) remove it from bioperl-ext

Also, doing a search for bioperl-ext on CPAN brings to light a couple of issues that probably need to be dealt with. To wit:

1.) there is an ancient version of bioperl-ext that probably needs to be removed; it's under ~birney's account. Thoughts on this?

2.) Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend on bioperl-ext, which suggests that these really need to be split off, each with the Bio::Ext::Modules they depend on. Bio::Tools::HMM could be the first case of this:
 * make a dir in the repos called Bio-Tools-HMM alongside bioperl-live, having trunk/ and branches/ subdirs
 * move Bio::Tools::HMM out of bioperl-live into that
 * move Bio::Ext::HMM stuff out of bioperl-ext into that
 * repeat with Bio::Tools::dpAlign and pSW, which would probably go together into a Bio-Tools-Align distro, I think

Sounds like this is moving along nicely.

Rob

From rmb32 at cornell.edu Mon Aug 17 11:48:10 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 17 Aug 2009 08:48:10 -0700
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <4A897B0E.7060208@cornell.edu>
References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu>
Message-ID: <4A897BBA.2070204@cornell.edu>

Also, I volunteer to make this branch and module machinery and such if you want. I just don't want to step on any ongoing development you guys are doing in the bioperl-ext trunk.

If you want me to do it, just say the word, either here or in #bioperl.

Rob

--
Robert Buels
Bioinformatics Analyst, Sol Genomics Network
Boyce Thompson Institute for Plant Research
Tower Rd
Ithaca, NY 14853
Tel: 503-889-8539
rmb32 at cornell.edu
http://www.sgn.cornell.edu

From cjfields at illinois.edu Mon Aug 17 12:58:24 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 17 Aug 2009 11:58:24 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <4A897B0E.7060208@cornell.edu>
References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu>
Message-ID: <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu>

On Aug 17, 2009, at 10:45 AM, Robert Buels wrote:

> Yee Man Chan wrote:
>> As to the release, my thinking is that I do understand that your
>> desire to maintain a high level of quality in BioPerl code base.
>> So if the HMM doesn't meet that standard, I am ok with it being
>> spinned off.
>
> We're not pushing to spin it off because of code quality, we're
> pushing to spin it off because we're spinning everything off. The
> plan is to break BioPerl up into many discrete distributions on CPAN
> with the dependencies between them well-known and codified. This
> will make maintenance of BioPerl *much* easier in the long run.
>
> So this means that the plan of action should be
> 1.) get the code so that it's working on all platforms,
> 2.) create a CPAN distribution for it and put it on CPAN,
> 3.) remove it from bioperl-ext
>
> Also, doing a search for bioperl-ext on CPAN brings to light a
> couple of issues that probably need to be dealt with. To wit:
>
> 1.) there is an ancient version of bioperl-ext that probably needs
> to be removed, it's under ~birney's account. Thoughts on this?

This subject just recently popped up on perl.module.authors, more in relation to abandonware, but it's a similar thing. Andreas has indicated there is an abandoned flag that can be set, so it's worth looking into, but using it requires another release. I have been in contact with that group about ideas for the split; libwin32 did the same thing, so I'll contact Jan Dubois on the matter for some pointers.

> 2.) Bio::Tools::( dpAlign | HMM | pSW ) all state that they depend
> on bioperl-ext, which suggests that these really need to be split
> off, each with the Bio::Ext::Modules they depend on.
> Bio::Tools::HMM could be the first case of this:
> * make a dir in the repos called Bio-Tools-HMM alongside bioperl-live,
>   having trunk/, and branches/ subdirs
> * move Bio::Tools::HMM out of bioperl-live into that
> * move Bio::Ext::HMM stuff out of bioperl-ext into that
> * repeat with Bio::Tools::dpAlign and pSW, which would probably
>   go together into a Bio-Tools-Align distro, I think
>
> Sounds like this is moving along nicely.
>
> Rob

Yes, that's essentially the idea.
The more significant impact of this (both here and in core) is allowing updates to be made as needed, without being blocked by issues in unrelated modules. We have been waiting years for fixes to pSW, Staden::read and Align without progress, which has hindered overall releases of bioperl-ext. Similar problems exist in bp-core.

Re: bioperl-ext, BioLib has rendered some of those implementations obsolete. I would rather replace them incrementally (individual implementations) than wait for a full-blown bioperl-ext release, so splitting these up makes that possible.

chris

From robert.bradbury at gmail.com Mon Aug 17 13:14:57 2009
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 17 Aug 2009 13:14:57 -0400
Subject: [Bioperl-l] Homology/Phylogeny pretty-print for non-bioinformatics researchers
Message-ID: 

One of the questions facing people working in bioinformatics is "How do we present information so that it can be effectively interpreted by non-informatics specialists?"

My expertise lies in computer science (esp. operating systems & databases) and, as a second vocation, the biology of aging (DNA damage & repair and, to a lesser extent, cancer and the pathologies of aging). By my estimate there are perhaps 5 people in the world who are able to effectively discuss computer science X aging (gerontology) [3]. There are perhaps several dozen people in whose work those areas, esp. aging, may overlap with DNA damage & repair. Then there is a wider audience of perhaps a few hundred members of AGE, and maybe a thousand or so who are members of the scientific subgroup of GSA. But most of those individuals are "old school" scientists who know relatively little about bioinformatics, so there are barriers to presenting bioinformatics information in ways they can actually use.
I have found in my limited experience that homology graphs of conserved protein domains, such as those displayed in HomoloGene or those in Ensembl (including phylogeny graphs), can be quite useful in reaching interesting conclusions. For example, double-strand break repair processes, which may involve 8-10 relatively conserved proteins, may have a critical role in the mechanisms of aging. In particular, two of those proteins, WRN & DCLRE1C (Artemis), contain complementary exonuclease activities which chew up the DNA in order to prepare the strands for ligation. Of course, programmers may appreciate better than gerontologists the significance of deleting random bytes from instruction sequences in one's code.

At the recent AGE meeting in June, several discussions arose as to possible differences in "aging" in yeast, *C. elegans* and mammals [1]. A quick database search showed that *C. elegans* seems to be lacking the exonuclease domain on the WRN homologue and may be missing a DCLRE1C homologue entirely (which, if true, would suggest that aging in *C. elegans* may be fundamentally different from aging in vertebrates). Explaining this to researchers can best be done using pictures.

I've been through PubMed and found several papers (NAR / BMC Bioinformatics) regarding programs to do homology comparisons and phylogeny trees. However, these seem to lean towards producing less condensed, bioinformatics-ish information. I do not know whether the outputs from databases like NCBI HomoloGene or Ensembl have been packaged in tools that might be part of BioPerl.

I am interested in programs that can be run on a regular basis to draw "pretty pictures" that can be used for publication and/or internet browsing. In particular, I'm interested in running such programs on species of interest to various gerontological communities [2], which involves subsets of databases that seem to be scattered around the world.

Thanks.

1. Of course there has been lots of discussion and rationalization over the last 15+ years about how "aging" is largely the same in more complex and simpler organisms, in part to justify sequencing some organisms and in part to justify funding research at certain laboratories. A closer examination based on some of the complete and emerging genome sequences may suggest this is a very swampy discussion.

2. For example, nematode DNA repair gene comparisons would be interesting to nematode researchers, insect DNA repair gene comparisons to insect researchers, both to invertebrate researchers, etc.

3. The recently published textbooks *Aging of the Genome* by Jan Vijg and the 2nd edition of *DNA Repair and Mutagenesis* by Errol Friedberg *et al.* go a long way towards moving these areas from the stacks of research libraries into areas for more general discussion. Both volumes deal extensively with the ~150 DNA repair genes.

From cjfields at illinois.edu Mon Aug 17 13:15:46 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 17 Aug 2009 12:15:46 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <4A897BBA.2070204@cornell.edu>
References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <4A897BBA.2070204@cornell.edu>
Message-ID: 

I say go for it if Yee Man is okay with the idea. It gets the code out there that much faster. This also doesn't depend on core being split up (it only needs a 'requires' on bioperl 1.6.0).

chris

From chmille4 at gmail.com Mon Aug 17 14:44:09 2009
From: chmille4 at gmail.com (Chase Miller)
Date: Mon, 17 Aug 2009 14:44:09 -0400
Subject: [Bioperl-l] new NeXML I/O modules
In-Reply-To: <4A897564.2090203@cornell.edu>
References: <4A897564.2090203@cornell.edu>
Message-ID: <991fb8210908171144t3f7107f0ldaf02dfdc762ae27@mail.gmail.com>

Thanks! It was a great experience. I couldn't have done it without Mark, who was a fantastic mentor.

cheers,
Chase

From rmb32 at cornell.edu Mon Aug 17 16:32:14 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 17 Aug 2009 13:32:14 -0700
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu>
References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu>
Message-ID: <4A89BE4E.7090901@cornell.edu>

OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro at Bio-Tools-HMM in the repo. The tests are not passing; I think that some bugs need to be fixed in the logic of things.

Yee Man, could you have a look?
To download the newly repackaged code:

  svn co svn+ssh://your_login at dev.open-bio.org/home/svn-repositories/bioperl/Bio-Tools-HMM/trunk Bio-Tools-HMM
  perl Build.PL; ./Build test

Please check that things are compiling OK, check the test logic, upgrade the tests to use Test::More, and get the tests to the point where they are passing. At that point, it should be ready for CPAN, but we need to decide how we want to coordinate that with releases of bioperl-live and bioperl-ext.

Rob

From rmb32 at cornell.edu Mon Aug 17 16:45:42 2009
From: rmb32 at cornell.edu (Robert Buels)
Date: Mon, 17 Aug 2009 13:45:42 -0700
Subject: [Bioperl-l] new NeXML I/O modules
In-Reply-To:
References:
Message-ID: <4A89C176.3050109@cornell.edu>

Mark A. Jensen wrote:
> wrote tests (which pass!), complete POD, and a HOWTO (at

The tests for this are depending on Bio::Phylo and fail if it's not installed. Are we going to add Bio::Phylo as a bioperl dependency, or band-aid it as a "recommended" module, or what?

Gotta clarify our dependencies.

Rob

From cjfields at illinois.edu Mon Aug 17 16:54:05 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 17 Aug 2009 15:54:05 -0500
Subject: [Bioperl-l] new NeXML I/O modules
In-Reply-To: <4A89C176.3050109@cornell.edu>
References: <4A89C176.3050109@cornell.edu>
Message-ID:

On Aug 17, 2009, at 3:45 PM, Robert Buels wrote:

> Mark A. Jensen wrote:
>> wrote tests (which pass!), complete POD, and a HOWTO (at
>
> The tests for this are depending on Bio::Phylo and fail if it's not
> installed. Are we going to add Bio::Phylo as a bioperl dependency,
> or band-aid it as a "recommended" module, or what?
>
> Gotta clarify our dependencies.
>
> Rob

'recommends'; the tests should skip all as a 'pass' with a message that 'Bio::Phylo is required' or somesuch.

chris

From maj at fortinbras.us Mon Aug 17 16:55:19 2009
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 17 Aug 2009 16:55:19 -0400
Subject: [Bioperl-l] new NeXML I/O modules
In-Reply-To: <4A89C176.3050109@cornell.edu>
References: <4A89C176.3050109@cornell.edu>
Message-ID: <3D65CA5234EB4BDF892F280D575FB01D@NewLife>

I meant to add a test skip based on a runtime check for Bio::Phylo. Gotta do that. It's necessary only for these modules.

----- Original Message -----
From: "Robert Buels"
To: "Mark A. Jensen"
Cc: "BioPerl List" ; "Rutger Vos" ; "Chase Miller"
Sent: Monday, August 17, 2009 4:45 PM
Subject: Re: [Bioperl-l] new NeXML I/O modules

> Mark A. Jensen wrote:
>> wrote tests (which pass!), complete POD, and a HOWTO (at
>
> The tests for this are depending on Bio::Phylo and fail if it's not installed.
> Are we going to add Bio::Phylo as a bioperl dependency, or band-aid it as a
> "recommended" module, or what?
>
> Gotta clarify our dependencies.
>
> Rob

From cjfields at illinois.edu Mon Aug 17 17:22:00 2009
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 17 Aug 2009 16:22:00 -0500
Subject: [Bioperl-l] Problems with Bioperl-ext package on WinVista?
In-Reply-To: <4A89BE4E.7090901@cornell.edu>
References: <474354.59886.qm@web30408.mail.mud.yahoo.com> <4A897B0E.7060208@cornell.edu> <7F616861-0C3A-4C68-BE9C-405A377718B4@illinois.edu> <4A89BE4E.7090901@cornell.edu>
Message-ID: <062C4E69-B72D-471B-8588-2FEC9F798983@illinois.edu>

Still seeing that odd warning popping up:

cjfields4:Bio-Tools-HMM cjfields$ ./Build test --verbose
t/001_basics.t .. Argument "FL" isn't numeric in numeric lt (<) at /Users/cjfields/bioperl/Bio-Tools-HMM/blib/lib/Bio/Tools/HMM.pm line 185.

Have you tried using Yee Man's original Makefile.PL to see if it works better? There appear to be some differences in the compilation, including a linking warning popping up.
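For context on the warning above: under "use warnings", Perl emits this diagnostic whenever an operand of the numeric "<" operator is a string that does not look like a number (here "FL"), silently converting it to 0 for the comparison. The sketch below only illustrates that behaviour; it is not code from Bio::Tools::HMM, both subroutine names are hypothetical, and the idea that line 185 of HMM.pm is comparing a symbol string where it expects a number is an assumption to be checked against the module source.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper (not from Bio::Tools::HMM): returns true if using
# $value with numeric '<' triggers the "isn't numeric in numeric lt (<)"
# warning seen in the test run above.
sub warns_in_numeric_lt {
    my ($value) = @_;
    my $warned = 0;
    # Trap warnings locally instead of letting them go to STDERR.
    local $SIG{__WARN__} = sub {
        $warned = 1 if $_[0] =~ /isn't numeric in numeric lt \(<\)/;
    };
    # A non-numeric string such as "FL" is numified to 0 here, which warns.
    my $is_less = ($value < 1);
    return $warned;
}

# Hypothetical guard: compare numerically only when both operands look
# like numbers; return undef otherwise so the caller can handle it.
sub safe_lt {
    my ($x, $y) = @_;
    return undef unless looks_like_number($x) && looks_like_number($y);
    return $x < $y ? 1 : 0;
}

print warns_in_numeric_lt("FL")  ? "warns\n" : "quiet\n";   # warns
print warns_in_numeric_lt("0.5") ? "warns\n" : "quiet\n";   # quiet
```

Scalar::Util ships with core Perl, so a guard like safe_lt adds no dependency; whether such a guard or a fix to the caller is appropriate depends on what HMM.pm actually passes at that line.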
chris

On Aug 17, 2009, at 3:32 PM, Robert Buels wrote:

> OK, I split Bio::Tools::HMM and Bio::Ext::HMM off into a new distro
> at Bio-Tools-HMM in the repo. The tests are not passing, I think
> that some bugs need to be fixed in the logic of things.
>
> Yee Man, could you have a look? To download the newly repackaged